A Text Data Field Zone is configured using the options on the Text Field tab of the Data Field Zone dialog box.
Data Field Zone Keyword Field Options |
Description |
---|---|
Keyword type |
Using this drop-down list, select the Keyword Type that values extracted from the Data Field Zone are to be assigned to. If the currently-displayed form has an assigned Document Type, only the Keyword Types associated with the assigned Document Type are available in the drop-down list. If no Document Type is assigned to the currently-displayed form, only the <None> and >>Document Date options are available in the drop-down list. Some notes about selecting a Keyword Type:
Note:
When the Person name extraction Data Application option is selected, the Keyword type drop-down is replaced by the Last Name Keyword drop-down. |
XML Node |
Note:
This field is only displayed if the Advanced Capture form is configured to create an XML rendition of documents matched to it (i.e., the Create XML data rendition option is selected for the Advanced Capture form). Enter the name of the XML node (i.e., the element) that Keyword Values extracted from this zone are contained in when the XML rendition of the document is created. Note:
While the XML Node name may include alphanumeric characters, underscores (_), hyphens (-), or periods (.), it must begin with either a letter or an underscore. |
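The naming rule above can be checked before a name is entered in the XML Node field. The following is a minimal sketch, not part of the product: the function name and the pattern are illustrative, and restricting the check to ASCII letters and digits is an assumption, since the rule only says "alphanumeric".

```python
import re

# Assumed pattern derived from the rule above: the first character is a letter
# or an underscore; the remaining characters are letters, digits, "_", "-", or ".".
# ASCII-only matching is an assumption of this sketch.
XML_NODE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_xml_node_name(name: str) -> bool:
    """Return True if name satisfies the XML Node naming rule (hypothetical helper)."""
    return XML_NODE_NAME.fullmatch(name) is not None
```

For example, `Invoice_Number` and `_total.due` pass this check, while `1stLine` and `Course ID` do not.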
Data Application |
Determine how the Data Field Zone is processed by selecting one of the Data Application radio buttons. Entire zone value: Select this option if all text detected in the Data Field Zone should be processed and assigned to the Keyword Value. To allow for the removal of line breaks in the Keyword Value, which might be present if multiple lines of text are displayed in the Data Field Zone, the Remove line breaks check box is selected by default. To prevent line breaks from being removed in the Keyword Value, deselect this option. Specific line(s): Select this option if multiple lines of text may be displayed in the Data Field Zone, but only the text from specific detected lines should be assigned to the Keyword Value. Enter the lines from which text is to be extracted into the field next to the option. Multiple line numbers and line ranges can be included in the field. Individual line numbers and line ranges must be separated by commas, and line ranges are denoted by a dash between the first included line and the last included line. For example, entering 1, 3-5, 7 into the field next to the option will assign values extracted from lines 1, 3, 4, 5, and 7 to the Keyword Value. Note the following:
Table style/vertical list: Select this option if multiple lines of text should be read as a table. Each line will be assigned as a separate instance of the Keyword. Multi-Instance Keyword Type Groups can be used with values read as a table to group related values together accordingly. Each Multi-Instance Keyword Type Group is constructed with Keyword Values in the order they were detected. For example: You have an invoice for multiple line items. You would like to extract indexing information using three Keyword Types (Course ID, Course Name, and Credits) for each individual line item, and you would like to keep the Keyword Values for each line item grouped together using Multi-Instance Keyword Type Groups. |
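The Specific line(s) syntax described earlier in this row (comma-separated line numbers and dash-separated ranges) can be illustrated with a short sketch. The function names are hypothetical, and the handling of stray commas and out-of-range line numbers is an assumption; the product's behavior in those cases is not documented here.

```python
def parse_line_spec(spec: str) -> list[int]:
    """Expand a spec such as "1, 3-5, 7" into sorted 1-based line numbers."""
    lines: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: stray commas are ignored
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
            if first > last:
                raise ValueError(f"range {part!r} runs backwards")
            lines.update(range(first, last + 1))
        else:
            lines.add(int(part))
    return sorted(lines)


def select_lines(zone_text: str, spec: str) -> list[str]:
    """Return only the detected lines named by the spec.

    Assumption: line numbers beyond the detected text are skipped.
    """
    detected = zone_text.splitlines()
    return [detected[n - 1] for n in parse_line_spec(spec) if 1 <= n <= len(detected)]
```

With this sketch, `parse_line_spec("1, 3-5, 7")` yields lines 1, 3, 4, 5, and 7, matching the example given for the option.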
Data Application (cont.) |
In order to correctly identify and group the Keyword Values, you must create a separate Data Field Zone for each of the Keyword Types, and each zone must be configured to be read as a table. Each recognized Keyword Value will be associated with the others that compose the Multi-Instance Keyword Type Group in the order they were read (i.e., the first Course ID Keyword Value is grouped with the first Course Name and Credits Keyword Values, the second Course ID Keyword Value is grouped with the second Course Name and Credits Keyword Values, etc.). CAUTION:
In order to read Data Field Zones as a table to create Multi-Instance Keyword Type Groups, a value must exist for every Keyword Value in the Multi-Instance Keyword Type Group. If a value is missing, the subsequent values for the Keyword Type may be unintentionally assigned to the wrong Multi-Instance Keyword Type Group. When the Entire zone value, Specific line(s), or Table style option is selected, the Remove text clipped by zone boundaries option is displayed in the Filtering section below. Select this option if you want to exclude from the Keyword Value any characters that straddle a boundary of the Data Field Zone. Note:
The Remove text clipped by zone boundaries option only excludes characters that actually straddle boundaries of the zone. All whole characters in the zone, including those that stray into the zone from nearby text, will be included in the Keyword Value. |
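The order-based grouping described for the Table style/vertical list option, and the reason the CAUTION above matters, can be sketched as follows. This is an illustration only: the function name and the sample values are hypothetical, and raising an error on a count mismatch is a choice made for this sketch (the CAUTION indicates that values may instead be silently assigned to the wrong group).

```python
def group_by_order(columns: dict[str, list[str]]) -> list[dict[str, str]]:
    """Pair the Nth value of every zone into the Nth group.

    columns maps a Keyword Type name to the values read from its zone,
    in the order they were detected.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # A missing value would shift every later value into the wrong group.
        raise ValueError("zones returned different numbers of values")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Hypothetical values read from three table-style zones.
groups = group_by_order({
    "Course ID": ["PSY 103", "BIO 201"],
    "Course Name": ["Intro to Psychology", "Cell Biology"],
    "Credits": ["3", "4"],
})
```

Here the second group contains `BIO 201`, `Cell Biology`, and `4`. If the Credits zone had returned only one value, the sketch would refuse to group rather than misalign the line items.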
Data Application (cont.) |
Find by tag/expression: Select this option if you want to identify information to be stored as Keyword Values based on a specific tag or regular expression. Note:
When extracting Keyword Values using text tags or regular expressions, you may configure the entire page as a Data Field Zone instead of configuring individual Data Field Zones. For more information, see Creating an Entire Page Zone for Find by Tag/Expression. When this option is selected, the Find near tag/expression and Find matching reg. expression options are displayed in the Filtering section (as in the following image), allowing you to define a text tag or regular expression to identify the Keyword Value to be extracted. Select either the Find near tag/expression or the Find matching reg. expression radio button and enter the tag or the regular expression rule in the associated text field. Note:
To access the Regular Expression Library, click in the field and press F2. See The Regular Expression Library for more information.
|
Data Application (cont.) |
Note:
All regular expressions must be ECMA compliant. |
Data Application (cont.) |
Note:
When the Applied data format field is blank, the entire value captured in the Data Field Zone is shown in highlights and image snippets (that is, when indexing). When you specify capture groups in this field, however, only the matching capture groups in the Data Field Zone are shown in the highlights and image snippets. Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box. |
Data Application (cont.) |
To create a new search tag synonym, type the value in the text field and click Add. The synonym is added to the list in the top field. Once you have added multiple synonyms to the list, you can select them and then click the Move Up and Move Down buttons to reorder them for search precedence. Select the Find multiple occurrences check box if you would like to extract multiple Keyword Values from the zone (for example, multiple values in the zone meet the regular expression rule or the tag appears multiple times in the zone and you would like to extract each instance as a Keyword Value). Person name extraction: Select this option if you want the OCR engine to identify and extract any text in the zone that is formatted like a name. When this option is selected, the Person Name options are displayed in the Filtering section below, allowing you to define how the name should be identified and extracted as a Keyword Value. The following name formats are supported by default.
Note:
A middle name value is optional. If the name being extracted is displayed in the Last First [Middle] format, select the Format is Last First [Middle] with or without comma check box to distinguish it from other supported formats. |
Data Application (cont.) |
While the entire name is captured in the zone, the text is parsed into separate values (first name, middle name, and last name) and each value can be stored as an individual Keyword Value.
Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box. From here, you can add multiple synonyms and order them by search preference. To ignore the case of the text value being extracted as a Keyword Value, select the Match against any letter case check box. Tip:
The Match against any letter case option should be used when the zone contains only the name and is not surrounded by other text. Address extraction: Select this option if you want the OCR engine to identify and extract any text in the zone that is formatted like an address. When this option is selected, the Address options are displayed in the Filtering section below, allowing you to define how the address should be identified and extracted as a Keyword Value. The following address formats are supported by default:
Note:
State/Province Code values must be two characters in length. Street Address 2 and Secondary Postal Code values are optional. |
Data Application (cont.) |
While the entire address is captured in the zone, the text is parsed into separate values (street address, city, state/province, and postal code) and each value can be stored as an individual Keyword Value.
Note:
While all four of the above values must be captured within the zone for address extraction to be processed, you do not have to assign each of the values to a Keyword Type if one or more of the values are not needed. You may also identify a text tag or a regular expression to find the name values within the zone. When a text tag is configured, the OCR engine searches the zone for the specified text and extracts the next logical value (either to the right of or underneath the tag) as a Keyword Value. When a regular expression is configured, the text matching the pattern of the regular expression is identified as a name value. To configure a tag or regular expression, enter the text tag or regular expression in the Find after tag/expression: field. Text tags must be enclosed in quotation marks so that they are not treated as regular expressions. Multiple tags and/or regular expressions can be configured; they must be separated by a space in the Find after tag/expression field. Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box. From here, you can add multiple synonyms and order them by search preference. |
Remove all whitespace from result |
Note:
This option is not displayed if the Person name extraction radio button is selected in the Data Application section. Select this check box if you would like to remove all whitespace (for example, spaces or paragraph returns) from the Keyword Value after it is read by the OCR engine. For example, if the Keyword Value read by the OCR engine is PSY 103 and the Remove all whitespace from result check box is selected, the Keyword Value is modified to PSY103. |
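The effect of the Remove all whitespace from result option can be illustrated with a one-line sketch. The function name is hypothetical, and treating tabs and other Unicode whitespace the same as spaces and line breaks is an assumption.

```python
def remove_all_whitespace(value: str) -> str:
    """Drop every whitespace character (spaces, tabs, line breaks) from value."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(value.split())
```

Applied to the example above, `remove_all_whitespace("PSY 103")` gives `PSY103`.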
Delete this page from document |
Select this check box if you would like the page to be deleted from the document if a value is successfully extracted for the Data Field Zone and this value is not suspect. If a value is not extracted for the Data Field Zone, or if the extracted value is suspect, the page will not be deleted. |
Redact zone from document after keyword assignment |
Note:
This option is only available when the system performing Advanced Capture is licensed for Automated Redaction. Select this check box if you would like to redact a Data Field Zone from a document once the Keyword Value has been extracted from the zone and assigned. Tip:
If you would like to redact a Data Field Zone from a document without capturing and assigning a Keyword Value, select <None> from the Keyword type drop-down list. |
Character Search Hints |
These options are available to assist the OCR engine in determining if the value it reads is correct. Select the check box next to each character type that is allowed to be included in the Keyword Value. The available options are:
When a character is recognized by the OCR engine that is not part of an allowable character set, the character is replaced by a tilde (~) and the value is automatically marked as suspect. Tip:
Using the allowed character type options can sometimes help the OCR engine more easily determine the correct value by eliminating characters that are obviously not correct (e.g., an I is correctly identified instead of a 1 because numeric characters are filtered and prevented from being recognized as part of the value). Select the Disable Asian Text Support check box to instruct the OCR engine to skip Asian (i.e., double-byte) characters. Note:
The Disable Asian Text Support check box is only available if the OCR format assigned to the Document Type is configured for an Asian language (e.g., Japanese, Korean, etc.). Tip:
Selecting the Disable Asian Text Support check box allows you to identify numeric data (e.g., Date Keyword Values, Currency Keyword Values, and Document Dates) when performing OCR on documents configured to contain Asian (i.e., double-byte) characters. |
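The tilde replacement described above can be modeled as a post-processing step. This is only a sketch: the character-type names in `CHARACTER_SETS` are hypothetical stand-ins for the check boxes, and the actual OCR engine applies the hints during recognition rather than afterward.

```python
import string

# Hypothetical character types; these stand in for the check boxes on the tab.
CHARACTER_SETS = {
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "digits": set(string.digits),
    "space": {" "},
}


def apply_character_hints(raw: str, allowed_types: list[str]) -> tuple[str, bool]:
    """Replace disallowed characters with "~" and report whether the value is suspect."""
    allowed: set[str] = set()
    for type_name in allowed_types:
        allowed |= CHARACTER_SETS[type_name]
    suspect = any(ch not in allowed for ch in raw)
    cleaned = "".join(ch if ch in allowed else "~" for ch in raw)
    return cleaned, suspect
```

For instance, with only upper-case letters allowed, the value `INV1O3` becomes `INV~O~` and is flagged as suspect.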
Page Location(s) |
The Page Location(s) options control the pages that the OCR engine searches for a particular Data Field Zone.
|
Specific allowed characters |
The Specific allowed characters field allows you to configure the OCR engine to identify only the characters that you specify in the field. For example, if you entered ABC in the Specific allowed characters field and the OCR engine identified the value as ABCDEFG, the DEFG characters would be stripped from the value and the Keyword Value associated with the document would be ABC. |
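The stripping behavior in the example above can be sketched as a simple filter. The function name is hypothetical, and treating a blank field as "no restriction" is an assumption not stated in this section.

```python
def keep_allowed(value: str, allowed: str) -> str:
    """Keep only the characters of value that appear in allowed."""
    if not allowed:
        return value  # assumption: a blank field means no restriction
    return "".join(ch for ch in value if ch in allowed)
```

With `ABC` as the allowed characters, `keep_allowed("ABCDEFG", "ABC")` returns `ABC`, as in the example.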
Advanced Recognition |
Note:
Options involving ICR processing below are only enabled if your solution is licensed for Intelligent Character Recognition (ICR). Note:
The OCR engine does not support Asian characters when reading dot matrix-printed text. Using this drop-down list, select the type of processing you would like to perform on this Data Field Zone.
|
Advanced Recognition (cont.) |
Note:
The Auto-detect ICR/OCR options may not work properly if the Data Field Zone contains fewer than 25 characters. Tip:
Of the two Auto-detect ICR/OCR options, Default to ICR is more likely to produce the best results when the type of text cannot be determined. This is because the OCR engine's intelligent character recognition generally reads machine-printed text more accurately than the engine's optical character recognition reads hand-written text. If one of the Handwriting/ICR options is selected, the Remove vertical/horizontal lines from zone before processing check box is displayed. Select this option if you would like the OCR engine to attempt to strip vertical or horizontal lines (i.e., constraint boxes) from the zone before it is processed. |
Bar Code Types |
Note:
This option is only available when the Bar Code option is selected in the Advanced recognition drop-down list. Click this button to access the Select Bar Code Types dialog box. Here you can select the types of bar codes for which the Advanced Capture engine will search when processing the Data Field Zone. The engine will ignore any deselected bar code types when processing this zone. By default, all bar code types are selected. Tip:
To expedite Advanced Capture processing, select only the desired bar code types. To save changes to your bar code type selections, click Save. |
VB script |
Use the VB script drop-down to select a VBScript to associate with the processing of this Data Field Zone. Click the ... button to open the VB Scripts dialog box. Here, the selected script can be reconfigured or edited. |
Keyword association |
You can logically group Data Field Zones using Keyword association. These groupings can be used to identify Keywords that belong to a Multi-Instance Keyword Type Group. Use the Keyword association field to enter or select a logical group name for the Data Field Zones that identify Keywords belonging to an MIKG. The Advanced Capture engine will attempt to maintain this Keyword grouping on the resulting document. |
Activation groups |
When you have configured multiple Form Identification Zones or Page Registration Zones for a document, you can assign individual Data Field Zones to a specific Form Identification or Page Registration Zone using activation groups. Activation groups allow you to activate only the Data Field Zones assigned to the Form Identification or Page Registration Zone that is used to match the document to an Advanced Capture form. Data Field Zones assigned to Form Identification or Page Registration Zones that are not used to match the document to a form will not be processed. Also, Data Field Zones present on pages other than the pages containing their assigned Form Identification or Page Registration Zones will not be processed, unless otherwise specified through the Page Location(s) setting or by adding a + to the front of the activation group name on the Form Identification or Page Registration Zone. This selective activation saves processing time and reduces the number of forms that need to be created for a Document Type. Use the Activation groups field to enter or select an activation group name. Add a + to the front of a group name (for example, +Group1) on a Form Identification or Page Registration Zone to set all Data Field Zones assigned to this group to be processed. Use commas to separate multiple group names.
|
Activation groups (cont.) |
Alternatively, you can assign a form definition group as the Data Field Zone's activation group to activate the zone for processing. Form definition groups can be used to extract only specific types of information (for example, header data vs. detail data) during processing. In the Activation groups drop-down list, form definition groups are enclosed in brackets (for example, [Group1]). |
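The notation used in the Activation groups field (comma-separated names, a leading + on a Form Identification or Page Registration Zone, and brackets around form definition groups) can be sketched as a small parser. All names below are hypothetical, and the matching rule in `is_zone_active` is a simplification that ignores the page restrictions described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivationGroup:
    name: str
    all_pages: bool              # True when the name was written with a leading "+"
    form_definition_group: bool  # True when the name was enclosed in brackets


def parse_activation_groups(field: str) -> list[ActivationGroup]:
    """Split a comma-separated Activation groups value into its entries."""
    groups = []
    for raw in field.split(","):
        token = raw.strip()
        if not token:
            continue
        plus = token.startswith("+")
        if plus:
            token = token[1:].strip()
        bracketed = token.startswith("[") and token.endswith("]")
        if bracketed:
            token = token[1:-1].strip()
        groups.append(ActivationGroup(token, plus, bracketed))
    return groups


def is_zone_active(data_zone_field: str, matched_zone_field: str) -> bool:
    """Simplified rule: the zone is active if it shares a group name with the matched zone."""
    wanted = {g.name for g in parse_activation_groups(data_zone_field)}
    matched = {g.name for g in parse_activation_groups(matched_zone_field)}
    return bool(wanted & matched)
```

For example, a Data Field Zone assigned to `Group1` would be treated as active when the matching Form Identification Zone lists `+Group1, Group2`.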
Colors |
If you would like to assign specific colors to the Keyword Type configured for the Data Field Zone, click the Colors button. The Display Colors dialog box is displayed. Here you can change the colors in which any regular or suspect values for the corresponding Keyword Type are displayed in the Indexing panel once Advanced Capture processing has taken place.
Note:
Any colors assigned here can be overridden by colors assigned through Keyword Lookup/Replace settings and/or VB scripting. |
Suspect level |
Enter the Suspect Level threshold, 1-99, in this field. By default, this value is set to the default Suspect Level set for the Advanced Capture form. The Suspect Level is the level of confidence placed in data values captured in this zone. The default Suspect Level set for the Advanced Capture form and the actual Suspect Level detected for the selected value are displayed below the Suspect level field. After a zone is processed, the OCR engine gives the resulting value a score between 1 and 99, depending on how confident it is in the result that was returned. The higher the score is, the lower the OCR engine's confidence is in the results. The value you enter in this field is the threshold at which the OCR engine determines if a returned value is acceptable or suspect. A score returned by the OCR engine higher than the Suspect Level threshold you set causes the value captured from the zone to be marked as suspect. All scores lower than the Suspect Level threshold indicate that the captured value is considered by the OCR engine to be acceptable. For example, setting the Suspect Level to 99 would indicate you completely trust the result returned by the OCR engine because no higher score could be returned and no result could be marked as suspect. Setting the Suspect Level to 1 would indicate you have no trust in the result, since no lower score could be returned and no result could be determined acceptable. Setting the Suspect Level to 0 reverts to the default threshold of 75. Tip:
By default, the Suspect Level threshold is set to 75 and the average score given to a processed field is 70. It is considered a best practice to set your Suspect Level to the default threshold of 75 to ensure that suspect Keyword Values are being consistently identified. |
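The threshold comparison described above can be sketched as follows. The function names are hypothetical. The section does not state how a score exactly equal to the threshold is handled, so treating it as acceptable is an assumption of this sketch.

```python
DEFAULT_SUSPECT_LEVEL = 75  # default threshold stated in this section


def effective_threshold(configured: int) -> int:
    """Return the threshold in force; 0 reverts to the default of 75."""
    if not 0 <= configured <= 99:
        raise ValueError("Suspect Level must be 0 (use default) or 1-99")
    return DEFAULT_SUSPECT_LEVEL if configured == 0 else configured


def is_suspect(score: int, configured: int = DEFAULT_SUSPECT_LEVEL) -> bool:
    """A higher score means lower confidence; scores above the threshold are suspect.

    Assumption: a score equal to the threshold is treated as acceptable.
    """
    return score > effective_threshold(configured)
```

With the default threshold, a score of 70 is acceptable and a score of 80 is suspect. With a threshold of 99, no score is marked as suspect.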
Override Default Regional Setting |
When a Date, Date & Time, or Currency Keyword Type has been selected for the Data Field Zone, the Override Default Regional Setting drop-down list is displayed to the right of the Suspect level field. Note:
If Person name extraction or Address extraction has been selected for the Data Application, the Override Default Regional Setting drop-down list is not displayed. This option allows you to override your system's default regional settings when extracting values for the applicable Keyword Types in the Data Field Zone. Select a regional language from the drop-down list to parse the applicable Keyword Values using the selected region's formatting rules and language-specific names for days, months, etc. Select <Runtime System Default> to maintain your system's default regional settings when parsing these values. Note:
This option does not affect which languages the OCR format is configured to recognize. For information on configuring OCR formats, see the Full-Page OCR module reference guide or help files. |
Find/validate in keyword dataset |
Note:
This option is not enabled if the Person name extraction option is selected in the Data Application section. Additionally, it is only enabled if the Keyword Type selected in the Keyword type drop-down list is configured to use a Keyword Data Set (but not a Cascading Data Set). Select this option if you wish to validate the value identified by the OCR engine using values specified for the Keyword Type's Data Set. If the extracted value is an exact or fuzzy match to a value found in the Data Set (i.e., the number of mismatched characters in the extracted value is less than 10% of the number of characters in the extracted value), then the value found in the Data Set is assigned as the Keyword Value. Additionally, if the extracted value is a fuzzy match to a value found in the Data Set, then the assigned Keyword Value (i.e., the value fuzzy-matched in the Data Set) is marked as suspect. Note:
Fuzzy matches are disabled for values of five characters or fewer. If the extracted value is neither an exact nor a fuzzy match to a value found in the Data Set, then the extracted value is assigned as the Keyword Value and is marked as suspect. If the Find/validate in keyword dataset check box is not selected, the OCR engine does not attempt to validate the extracted value against any values in the Keyword Data Set, and the extracted value is assigned as the Keyword Value. |
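The validation outcomes described above can be summarized in a short sketch. Several details are assumptions made for illustration: mismatched characters are counted position by position (plus any difference in length), the first Data Set value that qualifies as a fuzzy match is used, and the function names are hypothetical. The product's actual matching algorithm is not described in this section.

```python
def mismatch_count(a: str, b: str) -> int:
    """Assumed measure: position-wise differences plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def validate_against_data_set(extracted: str, data_set: list[str]) -> tuple[str, bool]:
    """Return (Keyword Value to assign, whether it is marked as suspect)."""
    if extracted in data_set:
        return extracted, False          # exact match
    if len(extracted) > 5:               # fuzzy matching is disabled for short values
        limit = 0.10 * len(extracted)    # mismatches must stay under 10% of the length
        for candidate in data_set:
            if mismatch_count(extracted, candidate) < limit:
                return candidate, True   # Data Set value assigned, marked as suspect
    return extracted, True               # no match: extracted value kept, suspect
```

For example, against a Data Set containing `INTRODUCTION TO PSYCHOLOGY`, the extracted value `INTRODUCTI0N TO PSYCHOLOGY` (one mismatched character out of 26) would be replaced by the Data Set value and marked as suspect.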
Add implied decimal point |
Note:
This check box is only enabled if a Currency Keyword Type is selected in the Keyword Type drop-down list. Select this check box if you would like the OCR engine to automatically insert a decimal point in the extracted value if one is not detected. For example, if this option is selected and the extracted value for the Total Amount Due Keyword is $12345, the OCR engine would automatically modify the value to $123.45. If this option is selected and a decimal point is detected in the value, or if this option is not selected, then the extracted value is left as-is. |
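The implied decimal behavior in the example above can be sketched as follows. The function name is hypothetical. The use of two decimal places is inferred from the $12345 example, and the handling of very short values and of values containing thousands separators is an assumption.

```python
import re


def add_implied_decimal(value: str, places: int = 2) -> str:
    """Insert a decimal point before the last `places` digits if none is present."""
    if "." in value:
        return value  # a decimal point was detected; leave the value as-is
    match = re.fullmatch(r"(\D*)(\d+)(\D*)", value)
    if match is None:
        return value  # assumption: values such as "$1,234" are left untouched
    prefix, digits, suffix = match.groups()
    digits = digits.zfill(places + 1)  # assumption: pad short values, e.g. "5" -> "0.05"
    return f"{prefix}{digits[:-places]}.{digits[-places:]}{suffix}"
```

Applied to the example, `add_implied_decimal("$12345")` gives `$123.45`, while `$123.45` is returned unchanged.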