The Thumbprint Classification tool provides a document classification solution that automatically learns document layouts from a predefined set of keywords and the positions of the text in the OCR data.
Each incoming image generates a "thumbprint" that is stored in a database. If the thumbprint is unique, the image is tagged as an exemplar image and assigned a new numeric identifier. If the thumbprint matches an existing exemplar in the database, the image is assigned that exemplar's numeric identifier. In this way, images are classified by giving the same identifier to all images whose thumbprints are very similar.
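
The match-or-create loop described above can be sketched as follows. This is a minimal illustration, not the tool's actual algorithm: it assumes a thumbprint is a set of `(keyword, column, row)` features and uses Jaccard similarity as a stand-in for the tool's match percentage. The class name `ExemplarStore` and the status labels `"New"` and `"Matched"` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical thumbprint: a frozenset of (keyword, column, row) features.
Thumbprint = frozenset


def similarity(a: Thumbprint, b: Thumbprint) -> float:
    """Jaccard similarity as a percentage (an assumed stand-in metric)."""
    union = a | b
    if not union:
        return 100.0
    return 100.0 * len(a & b) / len(union)


@dataclass
class ExemplarStore:
    """Hypothetical in-memory stand-in for the exemplar database."""
    match_percentage: float = 80.0
    exemplars: dict = field(default_factory=dict)  # exemplar id -> thumbprint
    next_id: int = 1

    def classify(self, thumbprint: Thumbprint) -> tuple:
        """Return (exemplar_id, status) for one incoming thumbprint."""
        # Find the most similar stored exemplar, if any.
        best_id, best_score = None, -1.0
        for ex_id, ex_tp in self.exemplars.items():
            score = similarity(thumbprint, ex_tp)
            if score > best_score:
                best_id, best_score = ex_id, score
        # Close enough: reuse the existing exemplar's identifier.
        if best_id is not None and best_score >= self.match_percentage:
            return best_id, "Matched"
        # Otherwise this image becomes a new exemplar with a new identifier.
        new_id = self.next_id
        self.exemplars[new_id] = thumbprint
        self.next_id += 1
        return new_id, "New"


store = ExemplarStore(match_percentage=75.0)
invoice_a = frozenset({("INVOICE", 1, 0), ("DATE", 3, 1), ("TOTAL", 4, 9)})
invoice_b = invoice_a | {("PO", 0, 2)}
letter = frozenset({("DEAR", 0, 2), ("SINCERELY", 0, 8)})
print(store.classify(invoice_a))  # (1, 'New')
print(store.classify(invoice_b))  # (1, 'Matched')
print(store.classify(letter))     # (2, 'New')
```

In this sketch, raising `match_percentage` produces more, narrower exemplar classes, while lowering it merges more layouts into a single class.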
Simple text objects cannot be used as input, because the tool requires the positional information contained in the OCR data.

| Job Objects (In) | Description |
|---|---|
| Input Image | Name of the image to be classified. |
| Input OCR Data | Name of the OCR data job object, created previously by another tool, that supplies the text and positional data used to classify the image. |

| System | Description |
|---|---|
| Database Setup | Configure and test the connection to a SQL Server database. |
| Manage | Evaluate the results of the tool over time. |
| Settings | Configure the properties of the thumbprint creation (match percentage, page height percentage, keywords, regular expressions, etc.). |

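
To illustrate why positional OCR data is required, and how settings such as keywords, regular expressions, and page height percentage might shape a thumbprint, here is a hedged sketch. The `OcrWord` record, the grid quantization, and the reading of "page height percentage" as "only consider text in the top N percent of the page" are all assumptions made for illustration; the tool's actual thumbprint format is not documented here.

```python
import re
from typing import NamedTuple


class OcrWord(NamedTuple):
    """Hypothetical OCR word: recognized text plus its top-left position in pixels."""
    text: str
    x: float
    y: float


def build_thumbprint(words, page_width, page_height, keywords, patterns,
                     page_height_pct=100.0, grid=10):
    """Build a frozenset of (label, column, row) features from positioned OCR words."""
    # Assumed meaning of page height percentage: ignore text below this line.
    limit = page_height * page_height_pct / 100.0
    keyword_set = {k.upper() for k in keywords}
    compiled = [re.compile(p) for p in patterns]
    features = set()
    for w in words:
        if w.y >= limit:
            continue
        token = w.text.upper()
        if token in keyword_set:
            label = token
        else:
            # Fall back to the first regular expression that matches the whole word.
            label = next((c.pattern for c in compiled if c.fullmatch(w.text)), None)
            if label is None:
                continue  # Neither a keyword nor a pattern match: not part of the layout.
        # Quantize the position onto a coarse grid so small shifts still match.
        col = min(int(w.x / page_width * grid), grid - 1)
        row = min(int(w.y / page_height * grid), grid - 1)
        features.add((label, col, row))
    return frozenset(features)


words = [
    OcrWord("Invoice", 100, 50),
    OcrWord("12/31/2023", 700, 60),
    OcrWord("Total", 600, 1000),
    OcrWord("hello", 10, 10),
]
top_half = build_thumbprint(words, 850, 1100, ["invoice", "total"],
                            [r"\d{2}/\d{2}/\d{4}"], page_height_pct=50)
print(sorted(top_half))
```

Without the `x` and `y` positions, two documents that share the same keywords in different places would produce identical features, which is why plain text input is not enough.
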
| Data Fields | Description |
|---|---|
| Exemplar | The identifier of the exemplar image that the current job matched, or the new identifier created if the current job became a new exemplar. |
| Status | Indicates whether the current job became a new exemplar image or matched an existing exemplar in the database. |
| User Data | A meaningful output value that you assign to each exemplar class. Every job that matches this exemplar receives the same User Data value for use in the workflow. |

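
A minimal sketch of how the three data fields relate, assuming exemplar identifiers are integers and User Data is kept in a per-exemplar lookup table. The function name `data_fields`, the status labels, and the empty-string default for an exemplar with no User Data are hypothetical.

```python
def data_fields(exemplar_id: int, is_new: bool, user_data: dict) -> dict:
    """Return the Exemplar, Status, and User Data values for one job."""
    return {
        "Exemplar": exemplar_id,
        "Status": "New" if is_new else "Matched",
        # Every job classified under the same exemplar inherits its User Data value.
        "User Data": user_data.get(exemplar_id, ""),
    }


# User Data assigned once per exemplar class (hypothetical values).
user_data = {1: "Invoice", 2: "Letter"}
print(data_fields(1, False, user_data))  # {'Exemplar': 1, 'Status': 'Matched', 'User Data': 'Invoice'}
print(data_fields(3, True, user_data))   # {'Exemplar': 3, 'Status': 'New', 'User Data': ''}
```

A workflow could then route on the User Data value, for example sending every "Invoice" job to the same downstream step, without depending on the numeric exemplar identifiers.
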
| Database | Description |
|---|---|
| System Name | Specify a system name if you want multiple tools and workflows to use a single database. |

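
One way to picture the System Name setting is as a namespace inside a shared database: each system keeps its own exemplar identifiers, so several tools and workflows can share one database without mixing their exemplars. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the `exemplar` table and its columns are hypothetical; the tool's real schema is not described here.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical exemplar table keyed by (system_name, exemplar_id)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS exemplar (
               system_name TEXT NOT NULL,
               exemplar_id INTEGER NOT NULL,
               thumbprint  TEXT NOT NULL,
               PRIMARY KEY (system_name, exemplar_id)
           )"""
    )


def add_exemplar(conn: sqlite3.Connection, system_name: str, thumbprint: str) -> int:
    """Store a new exemplar and return its identifier within system_name."""
    # Identifiers are numbered independently for each system name.
    (next_id,) = conn.execute(
        "SELECT COALESCE(MAX(exemplar_id), 0) + 1 FROM exemplar WHERE system_name = ?",
        (system_name,),
    ).fetchone()
    conn.execute(
        "INSERT INTO exemplar (system_name, exemplar_id, thumbprint) VALUES (?, ?, ?)",
        (system_name, next_id, thumbprint),
    )
    conn.commit()
    return next_id


def exemplars_for(conn: sqlite3.Connection, system_name: str) -> list:
    """List the exemplar identifiers that belong to one system name."""
    rows = conn.execute(
        "SELECT exemplar_id FROM exemplar WHERE system_name = ? ORDER BY exemplar_id",
        (system_name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_exemplar(conn, "Invoices", "thumbprint-a")
add_exemplar(conn, "Invoices", "thumbprint-b")
add_exemplar(conn, "HR", "thumbprint-c")
print(exemplars_for(conn, "Invoices"))  # [1, 2]
print(exemplars_for(conn, "HR"))        # [1]
```

Computing the next identifier with `MAX(...) + 1` is only safe with a single writer; a database shared by concurrent workflows would need a transaction or an identity column instead.
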
| Options | Description |
|---|---|
| Process Page One Only | Process only the first page of each job. This reduces processing time when the first page is enough to classify the document. |
| Save Match Data | Save the data for images that match existing exemplars. Turning this option off limits the ability to evaluate the success rate of the tool, but it may be safe to do so after a suitable exemplar database has been created. |
| Training Mode | If True, the tool trains and creates new exemplars. If False, the tool does not create any additional exemplars. |

Creating new exemplars takes up space in the database. To control the size of the database, you can set Training Mode to False once you are certain that Thumbprint Classification training is complete.
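
The three options can be read as switches around each classification step. The sketch below is hypothetical: `find_match` and `create_exemplar` are placeholder callables standing in for the tool's matching and exemplar creation, and returning `(None, "Unmatched")` when Training Mode is False and no exemplar matches is an assumption, since this section does not say how such images are reported.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Options:
    """Hypothetical mirror of the Options table."""
    process_page_one_only: bool = True
    save_match_data: bool = True
    training_mode: bool = True


def run_job(pages: List[str], options: Options,
            find_match: Callable[[List[str]], Optional[int]],
            create_exemplar: Callable[[List[str]], int],
            match_log: list) -> tuple:
    """Classify one job and return (exemplar_id, status)."""
    # Process Page One Only: classify from the first page alone.
    used = pages[:1] if options.process_page_one_only else pages
    exemplar_id = find_match(used)
    if exemplar_id is not None:
        # Save Match Data: keep a record of matches for later evaluation.
        if options.save_match_data:
            match_log.append((exemplar_id, used))
        return exemplar_id, "Matched"
    # Training Mode off: never add exemplars (reporting "Unmatched" is assumed).
    if not options.training_mode:
        return None, "Unmatched"
    return create_exemplar(used), "New"


log = []
print(run_job(["invoice-p1", "invoice-p2"], Options(),
              lambda p: 1 if p == ["invoice-p1"] else None, lambda p: 2, log))
print(run_job(["letter-p1"], Options(training_mode=False),
              lambda p: None, lambda p: 2, log))
```

Keeping the matching logic behind callables lets the option handling be read, and tested, separately from how thumbprints are actually compared.
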