The Settings dialog box is used to set the properties of the matching algorithm. Click Setup... to open the dialog box.
-
Image Match Percentage- The minimum match percentage required for an incoming image to be considered a match of the exemplar image thumbprint.
-
Page Height Percentage- The percentage of the image's page height that is considered when generating thumbprints. This setting is useful when processing images that are uniquely identified by the top portion of the image, such as invoices and certain health forms.
-
Keywords- Enter the keywords that the tool searches for on all incoming images. These keywords should be representative of the types of forms that will be processed with the tool.
-
Use Stemming- This check box provides the option to stem keywords before comparison, causing the data to be compared using versions of the words that are considered the same regardless of pluralization or conjugation.
For example, the keywords "insure," "insures," and "insurance" all share the same stemmed keyword "insur." If using stemming, the tool would look for any of these keyword variants (rather than an exact keyword match) when comparing incoming images against the exemplar image.
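The effect of stemming on keyword comparison can be sketched as follows. This is only an illustration: the suffix list and the names naive_stem and keyword_found are hypothetical, the comparison is assumed to ignore case, and the tool's actual stemming algorithm is not described here and may differ.

```python
# Toy suffix-stripping stemmer, for illustration only. The suffix list is an
# assumption chosen to reproduce the "insur" example; it is not the tool's
# actual stemming algorithm.
SUFFIXES = ("ance", "es", "s", "e")  # checked in this order


def naive_stem(word):
    """Return the word lowercased with the first matching suffix removed."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left unchanged.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_found(keyword, page_words, use_stemming):
    """Check whether a keyword occurs among the words read from an image."""
    if use_stemming:
        # Compare stems, so any variant of the keyword counts as a hit.
        target = naive_stem(keyword)
        return any(naive_stem(w) == target for w in page_words)
    # Without stemming, require an exact (case-insensitive) keyword match.
    return keyword.lower() in [w.lower() for w in page_words]


page_words = ["Proof", "of", "insurance"]
print(keyword_found("insures", page_words, use_stemming=True))
print(keyword_found("insures", page_words, use_stemming=False))
```

With stemming enabled, the keyword "insures" is found on a page that contains only "insurance", because both reduce to "insur". With stemming disabled, the same keyword is not found.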
-
Stemmed Keywords- This box lists the stemmed keywords. The list is shaded in gray if the Use Stemming check box is not selected, indicating that the stemmed keywords are not being used.
-
Regular Expressions- As an alternative to static keywords, you can use regular expressions to find data that matches a particular layout or pattern, such as ZIP codes or phone numbers.
-
Style- Determines how incoming images are compared to the stored exemplar image thumbprints. The available styles are Expression, Layout, and Data.
-
Expression- Memorizes the regular expression itself. Any data found in the same position that matches the regular expression is treated as a match.
-
Layout- Memorizes the layout of the data that matched the regular expression, remembering character types and counts.
-
Data- Memorizes the exact found data itself.
-
For example, to locate ZIP codes on a form, a user could create a regular expression that finds 5-digit or 9-digit ZIP codes. With the Expression style, the tool would remember only that a ZIP code was found. With Layout, it would remember the type of ZIP code (5-digit or 9-digit). With Data, it would remember the ZIP code itself and match only against exactly the same data.
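The difference between the three styles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the pattern ZIP_PATTERN and the helpers layout_of and matches are hypothetical names, the sketch ignores the position of the data on the page, and it does not represent how the tool is implemented.

```python
import re

# Hypothetical regular expression for 5-digit or 9-digit (ZIP+4) ZIP codes.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def layout_of(text):
    """Reduce text to character types: digits become '9', letters become 'A'."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in text
    )


def matches(style, exemplar_value, incoming_value):
    """Compare data found on an incoming image with the exemplar's data."""
    # Every style requires both values to satisfy the regular expression.
    if not (ZIP_PATTERN.fullmatch(exemplar_value)
            and ZIP_PATTERN.fullmatch(incoming_value)):
        return False
    if style == "Expression":
        # Only the expression is remembered: any ZIP code matches.
        return True
    if style == "Layout":
        # Character types and counts must agree (5-digit vs. 9-digit).
        return layout_of(exemplar_value) == layout_of(incoming_value)
    if style == "Data":
        # The exact found data must be identical.
        return exemplar_value == incoming_value
    raise ValueError("unknown style: " + style)


print(matches("Expression", "30301", "94105-1234"))
print(matches("Layout", "30301", "94105-1234"))
print(matches("Layout", "30301", "94105"))
print(matches("Data", "30301", "94105"))
```

In this sketch, Expression accepts any ZIP code, Layout accepts only a ZIP code of the same type as the exemplar's, and Data accepts only the identical ZIP code, so each style is stricter than the one before it.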