Thumbprint Classification - Infiniworx - Foundation 23.1 - AnyDoc

The Thumbprint Classification tool is a document classification solution that automatically learns a document layout from a predefined set of keywords and the positions of the text data.

Each incoming image generates a "thumbprint" that is stored in a database. If the thumbprint is unique, the image is tagged as an exemplar image and assigned a new numeric identifier. If the thumbprint instead matches an existing exemplar in the database, the image is assigned that exemplar's numeric identifier. In this way, all images whose thumbprints are very similar share the same identifier.
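
The matching flow described above can be sketched roughly as follows. This is an illustration only, not the tool's actual implementation: the in-memory `ExemplarStore`, the Jaccard `similarity` measure, the 0.75 threshold, and the status strings are all assumptions made for the example, and a thumbprint is modeled as a set of (keyword, column, row) features.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExemplarStore:
    """Hypothetical in-memory stand-in for the exemplar database."""
    exemplars: dict[int, frozenset] = field(default_factory=dict)
    next_id: int = 1


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two thumbprints, from 0.0 to 1.0 (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(thumbprint: frozenset, store: ExemplarStore,
             match_threshold: float = 0.75) -> tuple[int, str]:
    """Return (identifier, status) for one incoming thumbprint."""
    best_id, best_score = None, 0.0
    for exemplar_id, stored in store.exemplars.items():
        score = similarity(thumbprint, stored)
        if score > best_score:
            best_id, best_score = exemplar_id, score
    if best_id is not None and best_score >= match_threshold:
        # Similar enough: reuse the identifier of the existing exemplar.
        return best_id, "matched"
    # Unique thumbprint: store it as a new exemplar with a new identifier.
    new_id = store.next_id
    store.exemplars[new_id] = thumbprint
    store.next_id += 1
    return new_id, "new"
```

In this sketch, the first image of a given layout becomes exemplar 1, and a later image that shares most of its features is returned with identifier 1 and the status "matched".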

Simple text objects cannot be used as input because the tool requires positional information for the OCR data.
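
To illustrate why position matters, here is a minimal sketch that builds a thumbprint from positional OCR words. The `OcrWord` structure, the normalized coordinates, and the 10 x 10 grid are hypothetical choices for the example and are not taken from the product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognized word with its position (hypothetical structure)."""
    text: str
    x: float  # left edge as a fraction of page width, 0.0 to 1.0
    y: float  # top edge as a fraction of page height, 0.0 to 1.0


def make_thumbprint(words: list[OcrWord], keywords: list[str],
                    grid: int = 10) -> frozenset:
    """Map each keyword occurrence to the grid cell where it appears."""
    wanted = {k.lower() for k in keywords}
    cells = set()
    for word in words:
        token = word.text.lower()
        if token in wanted:
            col = min(int(word.x * grid), grid - 1)
            row = min(int(word.y * grid), grid - 1)
            cells.add((token, col, row))
    return frozenset(cells)
```

Two pages that contain exactly the same words in different places produce different thumbprints here, a distinction that plain text without coordinates cannot express.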

| Job Objects (In) | Description |
| --- | --- |
| Input Image | Name of the image to be classified. |
| Input OCR Data | Name of the OCR data job object, previously created by another tool, to be classified by this tool. |

| System | Description |
| --- | --- |
| Database Setup | Configure and test the connection to a SQL Server database. |
| Manage | Evaluate the results of the tool over time. |
| Settings | Configure the properties of the thumbprint creation (match percentage, page height percentage, keywords, regular expressions, etc.). |
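
As a rough sketch of how the Settings properties might interact, the example below decides which OCR tokens count toward a thumbprint. The field names, the default values, and the reading of "page height percentage" as "only the top portion of the page is considered" are assumptions for illustration and are not documented behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class ThumbprintSettings:
    """Hypothetical container for the thumbprint creation properties."""
    match_percentage: float = 80.0        # minimum similarity to count as a match
    page_height_percentage: float = 50.0  # assumed: top portion of the page to use
    keywords: tuple = ()
    patterns: tuple = ()                  # regular expressions


def is_feature(text: str, y_fraction: float, settings: ThumbprintSettings) -> bool:
    """Return True if a token at this vertical position contributes."""
    if y_fraction * 100.0 > settings.page_height_percentage:
        return False  # below the configured portion of the page
    if text.lower() in {k.lower() for k in settings.keywords}:
        return True
    return any(re.fullmatch(p, text) for p in settings.patterns)


def is_match(score: float, settings: ThumbprintSettings) -> bool:
    """Compare a 0.0 to 1.0 similarity score against the match percentage."""
    return score * 100.0 >= settings.match_percentage
```

With `keywords=("Invoice",)` and a date pattern such as `\d{2}/\d{2}/\d{4}`, a token like "01/02/2023" near the top of the page would count as a feature, while the same token in the bottom half would be ignored.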

| Data Fields | Description |
| --- | --- |
| Exemplar | The identifier of the exemplar image that the current job matched against, or the identifier that was created if the current job generated a new exemplar. |
| Status | Specifies whether the current job is considered a new exemplar image or was matched to an existing exemplar in the database. |
| User Data | Assign a meaningful output value for each exemplar class. Every job that matches the exemplar carries its User Data value for use in the workflow. |
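
A downstream workflow step could consume these data fields along the following lines. The `ClassificationResult` record, the status strings, and the queue names are hypothetical; the sketch only shows the idea of routing on User Data and holding new exemplars for review.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical record holding the tool's three output data fields."""
    exemplar: int
    status: str     # assumed values: "New" or "Matched"
    user_data: str  # label assigned to the exemplar class, "" if none yet


def route(result: ClassificationResult) -> str:
    """Pick a workflow queue from the classification output."""
    if result.status == "New" or not result.user_data:
        # No meaningful label yet: have a person review and label it.
        return "manual_review"
    return f"queue_{result.user_data.lower()}"
```

For example, a job matched to an exemplar labeled "Invoice" would go to `queue_invoice`, while a job that created a new, unlabeled exemplar would go to `manual_review`.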

| Database | Description |
| --- | --- |
| System Name | Specify a system name if you want multiple tools and workflows to use a single database. |

| Options | Description |
| --- | --- |
| Process Page One Only | Process only the first page of each job. This reduces processing time when the first page is enough to classify the document. |
| Save Match Data | Archive the data for the images that match against existing exemplars. Doing so limits the ability to evaluate the success rate of the tool, but it may be safe to do after a suitable exemplar database has been created. |
| Training Mode | If True, the tool trains and creates new exemplars. If False, the tool does not create any additional exemplars. |

Tip:

Creating new exemplars takes up space in the database. To control the size of the database, you can set Training Mode to False once you are certain that Thumbprint Classification training is complete.
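
The effect of Training Mode can be sketched as below. The function name, the similarity scores, and especially the "unmatched" outcome when Training Mode is False are assumptions for illustration; this page does not state how the tool reports an image that matches nothing once training is off.

```python
from __future__ import annotations


def assign_identifier(scores: dict[int, float], threshold: float,
                      training_mode: bool) -> tuple[int | None, str]:
    """scores maps existing exemplar ids to similarity with the incoming image."""
    if scores:
        best_id = max(scores, key=scores.get)
        if scores[best_id] >= threshold:
            return best_id, "matched"
    if training_mode:
        # Training on: the unmatched image becomes a new exemplar.
        return max(scores, default=0) + 1, "new"
    # Training off: no exemplar is created, so the database does not grow.
    return None, "unmatched"
```

With Training Mode off, the set of exemplars stays fixed, which is what keeps the database size under control in this sketch.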