Configuring a Text Data Field Zone

Platform: OnBase
Product: Advanced Capture
Release: Foundation 24.1
License: Standard, Premier, Essential

A Text Data Field Zone is configured using the options on the Text Field tab of the Data Field Zone dialog box.

Data Field Zone Keyword Field Options

Description

Keyword type

Using this drop-down list, select the Keyword Type that values extracted from the Data Field Zone are to be assigned to.

If the currently-displayed form has an assigned Document Type, only the Keyword Types associated with the assigned Document Type are available in the drop-down list. If no Document Type is assigned to the currently-displayed form, only the <None> and >>Document Date options are available in the drop-down list.

Some notes about selecting a Keyword Type:

  • If the Keyword Value identified by the OCR engine exceeds the maximum length of the Keyword Type it is assigned to, then the Keyword Value is truncated to fit this length.

  • If you are configuring a Data Field Zone to capture a Date Keyword Value, the date format of the value you are attempting to capture must match the workstation's regional format.

    If the value's date format does not match the workstation's regional format, then you must configure a VB script for the Data Field Zone to parse the value into the correct format.

  • If you are configuring a Data Field Zone to capture a Date Keyword Value, the year portion of the value must be less than 100 or greater than 1900 to be considered valid (e.g., 01/01/11 or 01/01/2011). If the year does not fall within these ranges, the Date Keyword Value will be marked as suspect.

    Note:

    When the year portion of a Date Keyword Value consists of two digits, the Two Digit Year Settings configured for your solution override the workstation's regional settings for the pivot year. For more information on configuring the Two Digit Year Settings, see the System Administration module reference guide or help file.

  • If you are configuring a Data Field Zone to capture a Currency Keyword Value, the Add implied decimal point check box is displayed below the Suspect Level field. Select this check box to automatically insert a decimal point into a Currency Keyword Value if one is not found when the zone is processed.
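The truncation and year rules in the notes above can be sketched in Python. This is a minimal illustration, not OnBase code: the function names are hypothetical, dropping characters from the end of the value is an assumption (the text says only that the value is truncated to fit), and treating negative years as invalid is also an assumption.

```python
def truncate_to_max_length(value: str, max_length: int) -> str:
    """Trim a Keyword Value to the Keyword Type's maximum length.

    Assumption: characters are dropped from the end of the value.
    """
    return value[:max_length]


def year_is_valid(year: int) -> bool:
    """Return True when the year is less than 100 or greater than 1900.

    A year outside these ranges would cause the Date Keyword Value to be
    marked as suspect. Rejecting negative years is an assumption.
    """
    return 0 <= year < 100 or year > 1900
```

Under these assumptions, the years in 01/01/11 and 01/01/2011 both pass the check, while a year such as 150 does not.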

Note:

When the Person name extraction Data Application option is selected, the Keyword type drop-down is replaced by the Last Name Keyword drop-down.

XML Node

Note:

This field is only displayed if the Advanced Capture form is configured to create an XML rendition of documents matched to it (i.e., the Create XML data rendition option is selected for the Advanced Capture form).

Enter the name of the XML node (i.e., the element) that Keyword Values extracted from this zone are contained in when the XML rendition of the document is created.

Note:

While the XML Node name may include alphanumeric characters, underscores (_), hyphens (-), or periods (.), it must begin with either a letter or an underscore.
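The naming rule in the note above can be checked with a short regular expression. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that "letter" and "alphanumeric" refer to ASCII characters.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, underscores, hyphens, or periods.
_XML_NODE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_xml_node_name(name: str) -> bool:
    """Return True if the whole name satisfies the XML Node naming rule."""
    return _XML_NODE_NAME.fullmatch(name) is not None
```

For example, Invoice_Number and _id would pass this check, while 1stLine and -total would fail because of their first character.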

Data Application

Determine how the Data Field Zone is processed by selecting one of the Data Application radio buttons.

Entire zone value: Select this option if all text detected in the Data Field Zone should be processed and assigned to the Keyword Value.

To allow for the removal of line breaks in the Keyword Value, which might be present if multiple lines of text are displayed in the Data Field Zone, the Remove line breaks check box is selected by default. To prevent line breaks from being removed in the Keyword Value, deselect this option.

Specific line(s): Select this option if multiple lines of text may be displayed in the Data Field Zone, but only the text from specific detected lines should be assigned to the Keyword Value. Enter the lines from which text is to be extracted into the field next to the option. Multiple line numbers and line ranges can be included in the field. Individual line numbers and line ranges must be separated by commas, and line ranges are denoted by a dash between the first included line and the last included line. For example, entering 1, 3-5, 7 into the field next to the option will assign values extracted from lines 1, 3, 4, 5, and 7 to the Keyword Value.

Note the following:

  • If Specific line(s) is selected and the Remove line breaks check box is selected, any values extracted from the specified lines will be concatenated into a single value and assigned to one instance of the Keyword. If the resulting Keyword Value exceeds the maximum length of the Keyword Type it is assigned to, then the Keyword Value is truncated to fit this length.

  • If Specific line(s) is selected and Remove line breaks is not selected, each line will be assigned as a separate instance of the Keyword.
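The line number syntax for the Specific line(s) field can be sketched in Python as follows. The function names are hypothetical, and skipping line numbers that do not exist in the zone is an assumption; the actual engine behavior for out-of-range lines is not described here.

```python
from typing import List


def parse_line_spec(spec: str) -> List[int]:
    """Expand an entry such as "1, 3-5, 7" into individual line numbers."""
    lines: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range: first included line, dash, last included line.
            first, last = part.split("-", 1)
            lines.extend(range(int(first), int(last) + 1))
        else:
            lines.append(int(part))
    return lines


def select_lines(zone_lines: List[str], spec: str) -> List[str]:
    """Pick the requested 1-based lines from the text detected in a zone.

    Assumption: line numbers beyond the detected text are skipped.
    """
    return [
        zone_lines[n - 1]
        for n in parse_line_spec(spec)
        if 1 <= n <= len(zone_lines)
    ]
```

With this sketch, parse_line_spec("1, 3-5, 7") yields lines 1, 3, 4, 5, and 7, matching the example above.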

Table style/vertical list: Select this option if multiple lines of text should be read as a table. Each line will be assigned as a separate instance of the Keyword.

Multi-Instance Keyword Type Groups can be used with values read as a table to group related values together accordingly. Each Multi-Instance Keyword Type Group is constructed with Keyword Values in the order they were detected.

For example: You have an invoice for multiple line items. You would like to extract indexing information using three Keyword Types (Course ID, Course Name, and Credits) for each individual line item, and you would like to keep the Keyword Values for each line item grouped together using Multi-Instance Keyword Type Groups.

In order to correctly identify and group the Keyword Values, you must create a separate Data Field Zone for each of the Keyword Types, and each zone must be configured to be read as a table. Each recognized Keyword Value will be associated with the others that compose the Multi-Instance Keyword Type Group in the order they were read (i.e., the first Course ID Keyword Value is grouped with the first Course Name and Credits Keyword Values, the second Course ID Keyword Value is grouped with the second Course Name and Credits Keyword Values, etc.).

CAUTION:

In order to read Data Field Zones as a table to create Multi-Instance Keyword Type Groups, a value must exist for every Keyword Value in the Multi-Instance Keyword Type Group. If a value is missing, the subsequent values for the Keyword Type may be unintentionally assigned to the wrong Multi-Instance Keyword Type Group.
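The positional grouping described above, and the risk named in the caution, can be illustrated with a small Python sketch. This models only the "group by order read" idea; the function name and sample values are hypothetical, and how the engine treats leftover values is not described here.

```python
from typing import List, Tuple


def build_groups(
    course_ids: List[str],
    course_names: List[str],
    credits: List[str],
) -> List[Tuple[str, str, str]]:
    """Group the n-th value from each zone into the n-th group."""
    return list(zip(course_ids, course_names, credits))


# All three zones returned a value for every line item.
complete = build_groups(
    ["PSY103", "BIO201", "CHM110"],
    ["Intro Psychology", "General Biology", "General Chemistry"],
    ["3", "4", "4"],
)

# The Course Name for the second line item was not read, so every later
# name shifts up by one position and lands in the wrong group.
shifted = build_groups(
    ["PSY103", "BIO201", "CHM110"],
    ["Intro Psychology", "General Chemistry"],
    ["3", "4", "4"],
)
```

In the second call, BIO201 ends up grouped with General Chemistry, which is the kind of misassignment the caution warns about.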

When the Entire zone value, Specific line(s), or Table style option is selected, the Remove text clipped by zone boundaries option is displayed in the Filtering section below. Select this option if you want to exclude from the Keyword Value any characters that straddle a boundary of the Data Field Zone.

Note:

The Remove text clipped by zone boundaries option only excludes characters that actually straddle boundaries of the zone. All whole characters in the zone, including those that stray into the zone from nearby text, will be included in the Keyword Value.

Find by tag/expression: Select this option if you want to identify information to be stored as Keyword Values based on a specific tag or regular expression.

Note:

When extracting Keyword Values using text tags or regular expressions, you may configure the entire page as a Data Field Zone instead of configuring individual Data Field Zones. For more information, see Creating an Entire Page Zone for Find by Tag/Expression.

When this option is selected, the Find near tag/expression and Find matching reg. expression options are displayed in the Filtering section, allowing you to define a text tag or regular expression to identify the Keyword Value to be extracted.

Select either the Find near tag/expression or the Find matching reg. expression radio button and enter the tag or the regular expression rule in the associated text field.

Note:

To access the Regular Expression Library, click in the field and press F2. See The Regular Expression Library for more information.

  • Find near tag/expression: The OCR engine searches the zone for a tag and extracts the next logical value (by default, either to the right of or underneath the tag) as a Keyword Value.

    Tags can be literal (that is, the text specified in the text box field is the tag) or regular expressions (that is, the text in the text box field defines a regular expression, and text matching that pattern is the tag).

    Literal text tags must be enclosed in quotation marks.

    The tag is case insensitive. However, if the case of the tag and document value match, the suspect level of the Keyword Value is lower than it would be if they did not match.

  • To change the default search direction for the tag, click Search Direction. The Search Direction and Distance dialog box is displayed.

    The eight directional search areas are separated by dotted lines, with the tag in the center. The order of area search preference is indicated by ascending numerals, from 1 up to 4. When an area is selected to be considered in the search preference order, it is filled in hatched green with the ordinal numeral in the center. Click any area to select/deselect it.

    Tip:

    To reorder the preference for the selected areas, first deselect all of them and then select up to four areas in the appropriate order.

    Note:

    The area order preference only applies when the search identifies multiple hits roughly the same distance from the tag; otherwise, proximity to the tag automatically takes precedence.

    For entire page zones, you can also enter a maximum search distance value (0.25 to 20.00 inches) in the Maximum search distance from tag upper left corner field.

  • Find matching reg. expression: The OCR engine searches the zone for text that matches the regular expression rule. If text matching the rule is found, it is extracted as a Keyword Value.

    For example: the regular expression rule for a Social Security Number is \d{3}-\d{2}-\d{4} (that is, 3 digits, dash, 2 digits, dash, 4 digits). If the OCR engine identifies text matching this pattern in the zone, that text is stored as a Keyword Value.

Note:

All regular expressions must be ECMA compliant.
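To try a rule such as the Social Security Number pattern before entering it, you could test it against sample text. The sketch below uses Python's re module, which is not an ECMAScript engine; this particular pattern should behave the same in both, but that is an assumption to verify for more complex rules. The function name and sample text are hypothetical.

```python
import re
from typing import List

SSN_RULE = r"\d{3}-\d{2}-\d{4}"


def find_matches(rule: str, zone_text: str) -> List[str]:
    """Return every piece of text in the zone that matches the rule.

    re.ASCII limits \\d to the digits 0-9, which is closer to how
    ECMAScript treats \\d.
    """
    return [m.group(0) for m in re.finditer(rule, zone_text, flags=re.ASCII)]


matches = find_matches(SSN_RULE, "SSN: 123-45-6789  Phone: 555-1234")
```

Here only 123-45-6789 matches; the phone number does not fit the 3-2-4 digit pattern.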

  • Select the Applied data format check box if you would like to use a regular expression to identify the Keyword Value, but you would like to apply a specific format to the extracted data. Enter the format in the text field, enclosing capture groups within brackets.

    For example:

    A document displays a date without separators (for example, Date: 01 01 2011). To capture this value and apply the appropriate date format (for example, DD/MM/YYYY), you would first enter the following regular expression in the Find matching reg. expression field, with parentheses enclosing the three capture groups: (\d\d)\s+(\d\d)\s+(\d\d\d\d). Then you would enter the date format in the Applied data format field, with brackets enclosing the capture groups: [1]/[2]/[3]. The Keyword Value can then be stored in the desired format (for example, 01/01/2011).

Note:

When the Applied data format field is blank, the entire value captured in the Data Field Zone is shown in highlights and image snippets (that is, when indexing). When you specify capture groups in this field, however, only the matching capture groups in the Data Field Zone are shown in the highlights and image snippets.
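The relationship between capture groups in the regular expression and bracketed placeholders in the Applied data format field can be sketched in Python. This only illustrates the substitution idea under the assumption that [n] refers to the n-th capture group; the function name is hypothetical and the engine's own implementation is not shown here.

```python
import re
from typing import Optional


def apply_data_format(rule: str, data_format: str, zone_text: str) -> Optional[str]:
    """Find the rule in the zone text and rebuild the value from its groups.

    Each [n] placeholder in data_format is replaced by capture group n.
    Returns None when the rule does not match.
    """
    match = re.search(rule, zone_text)
    if match is None:
        return None
    return re.sub(
        r"\[(\d+)\]",
        lambda placeholder: match.group(int(placeholder.group(1))),
        data_format,
    )


formatted = apply_data_format(
    r"(\d\d)\s+(\d\d)\s+(\d\d\d\d)",
    "[1]/[2]/[3]",
    "Date: 01 01 2011",
)
```

With the rule and format from the example above, the sketch produces 01/01/2011.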

Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box.

To create a new search tag synonym, type the value in the text field and click Add. The synonym is added to the list in the top field. Once you have added multiple synonyms to the list, you can select them and then click the Move Up and Move Down buttons to reorder them for search precedence.

Select the Find multiple occurrences check box if you would like to extract multiple Keyword Values from the zone (for example, multiple values in the zone meet the regular expression rule or the tag appears multiple times in the zone and you would like to extract each instance as a Keyword Value).

Person name extraction: Select this option if you want the OCR engine to identify and extract any text in the zone that is formatted like a name.

When this option is selected, the Person Name options are displayed in the Filtering section below, allowing you to define how the name should be identified and extracted as a Keyword Value.

The following name formats are supported by default.

  • Last, First [Middle]

  • LAST, FIRST [MIDDLE]

  • First [Middle] Last

  • FIRST [MIDDLE] LAST

Note:

A middle name value is optional.

If the name being extracted is displayed in the Last First [Middle] format, select the Format is Last First [Middle] with or without comma check box to distinguish it from other supported formats.

While the entire name is captured in the zone, the text is parsed into separate values (first name, middle name, and last name) and each value can be stored as an individual Keyword Value.

  • Use the Last Name Keyword drop-down list to select the Keyword Type that the last name is assigned to.

  • Use the First Name Keyword drop-down list to select the Keyword Type that the first name is assigned to.

  • Use the Middle Name Keyword drop-down list to select the Keyword Type that the middle name is assigned to.

Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box. From here, you can add multiple synonyms and order them by search preference.

To ignore the case of the text value being extracted as a Keyword Value, select the Match against any letter case check box.

Tip:

The Match against any letter case option should be used when the zone contains only the name and is not surrounded by other text.

Address extraction: Select this option if you want the OCR engine to identify and extract any text in the zone that is formatted like an address.

When this option is selected, the Address options are displayed in the Filtering section below, allowing you to define how the address should be identified and extracted as a Keyword Value.

The following address formats are supported by default:

  • Street Address 1

    [Street Address 2]

    City, State/Province Code, Postal Code[-Secondary Postal Code]

  • Street Address 1, [Street Address 2,] City, State/Province Code, Postal Code[-Secondary Postal Code]

Note:

State/Province Code values must be two characters in length. Street Address 2 and Secondary Postal Code values are optional.

While the entire address is captured in the zone, the text is parsed into separate values (street address, city, state/province, and postal code) and each value can be stored as an individual Keyword Value.

  • Use the Street Address Keyword drop-down list to select the Keyword Type that the street address is assigned to.

  • Use the City Keyword drop-down list to select the Keyword Type that the city is assigned to.

  • Use the State/Province Keyword drop-down list to select the Keyword Type that the state/province is assigned to.

  • Use the Postal Code Keyword drop-down list to select the Keyword Type that the postal code is assigned to.

Note:

While all four of the above values must be captured within the zone for address extraction to be processed, you do not have to assign each of the values to a Keyword Type if one or more of the values are not needed.

You may also identify a text tag or a regular expression to find the name values within the zone. When a text tag is configured, the OCR engine searches the zone for the specified text and extracts the next logical value (either to the right of or underneath the tag) as a Keyword Value. When a regular expression is configured, the text matching the pattern of the regular expression is identified as a name value.

To configure a tag or regular expression, enter the text tag or regular expression in the Find after tag/expression field. Text tags must be enclosed in quotation marks so that they are not treated as regular expressions. Multiple tags and/or regular expressions can be configured; they must be separated by a space in the Find after tag/expression field.

Once you have entered the tag or regular expression rule in the associated text field, you can create variations or synonyms of this value to search for when the entered value is not found. Click the ... button to open the Search Tag Synonyms dialog box. From here, you can add multiple synonyms and order them by search preference.

Remove all whitespace from result

Note:

This option is not displayed if the Person name extraction radio button is selected in the Data Application section.

Select this check box if you would like to remove all whitespace (for example, spaces or paragraph returns) from the Keyword Value after it is read by the OCR engine.

For example, if the Keyword Value read by the OCR engine is PSY 103 and the Remove all whitespace from result check box is selected, the Keyword Value is modified to PSY103.
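The effect of this option can be sketched in one line of Python. The function name is hypothetical; the sketch assumes that "all whitespace" covers spaces, tabs, and line breaks.

```python
def remove_all_whitespace(value: str) -> str:
    """Drop every whitespace character (spaces, tabs, line breaks)."""
    return "".join(value.split())
```

Applied to PSY 103, this returns PSY103, as in the example above.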

Delete this page from document

Select this check box if you would like the page to be deleted from the document if a value is successfully extracted for the Data Field Zone and this value is not suspect.

If a value is not extracted for the Data Field Zone, or if the extracted value is suspect, the page will not be deleted.

Redact zone from document after keyword assignment

Note:

This option is only available when the system performing Advanced Capture is licensed for Automated Redaction.

Select this check box if you would like to redact a Data Field Zone from a document once the Keyword Value has been extracted from the zone and assigned.

Tip:

If you would like to redact a Data Field Zone from a document without capturing and assigning a Keyword Value, select <None> from the Keyword type drop-down list.

Character Search Hints

These options are available to assist the OCR engine in determining if the value it reads is correct. Select the check box next to each character type that is allowed to be included in the Keyword Value. The available options are:

  • Numerals. Numeric characters, 0-9.

  • Uppercase. Uppercase alphabetic characters.

  • Lowercase. Lowercase alphabetic characters.

  • Punctuation. Punctuation marks (e.g., . ! ?).

  • Miscellaneous. Other ASCII characters that do not fall into one of the above categories (e.g., # $ * @).

When the OCR engine recognizes a character that is not part of an allowed character set, the character is replaced by a tilde (~) and the value is automatically marked as suspect.

Tip:

Using the allowed character type options can sometimes help the OCR engine more easily determine the correct value by eliminating characters that are obviously not correct (e.g., an I is correctly identified instead of a 1 because numeric characters are filtered and prevented from being recognized as part of the value).
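The tilde replacement behavior can be modeled with a short Python sketch. This is an illustration under several assumptions, not the engine's logic: the names are hypothetical, the exact split between punctuation and miscellaneous characters is a guess based on the examples above, and spaces are assumed to pass through unchanged.

```python
import string
from typing import Set, Tuple

# Hypothetical punctuation set; the option descriptions give only examples.
PUNCTUATION = set(".,;:!?'\"")


def character_type(ch: str) -> str:
    """Classify a single character into one of the five hint categories."""
    if ch in string.digits:
        return "numerals"
    if ch in string.ascii_uppercase:
        return "uppercase"
    if ch in string.ascii_lowercase:
        return "lowercase"
    if ch in PUNCTUATION:
        return "punctuation"
    return "miscellaneous"


def apply_character_hints(value: str, allowed_types: Set[str]) -> Tuple[str, bool]:
    """Replace disallowed characters with ~ and report whether any were found."""
    result = []
    suspect = False
    for ch in value:
        # Assumption: spaces are always kept.
        if ch == " " or character_type(ch) in allowed_types:
            result.append(ch)
        else:
            result.append("~")
            suspect = True
    return "".join(result), suspect
```

For instance, with only numerals allowed, a value read as 12A4 would become 12~4 and be flagged as suspect in this model.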

Select the Disable Asian Text Support check box to instruct the OCR engine to skip Asian (i.e., double-byte) characters.

Note:

The Disable Asian Text Support check box is only available if the OCR format assigned to the Document Type is configured for an Asian language (e.g., Japanese, Korean, etc.).

Tip:

Selecting the Disable Asian Text Support check box allows you to identify numeric data (e.g., Date Keyword Values, Currency Keyword Values, and Document Dates) when performing OCR on documents configured to contain Asian (i.e., double-byte) characters.

Page Location(s)

The Page Location(s) options control the pages that the OCR engine searches for a particular Data Field Zone.

  • Select the Absolute page radio button if the Keyword Value being read by the Data Field Zone is only displayed on one page and is always displayed on the same page (for example, the Keyword Value is always displayed on page 1). Enter the page number that the Keyword Value is located on in the associated field.

  • Select the Relative page radio button if the Keyword Value may be located on one or more pages relative to the length of the document. Select one or more of the following check boxes to indicate which page(s) the Data Field Zone may be located on.

    Select the First page check box if the Data Field Zone is located only on the first page of the document. This option can be used in conjunction with the Interior pages and/or Last page check boxes.

    Select the First page (not only page) check box if the Data Field Zone is located on the first page and other pages in the document.

    Select the Interior pages check box if the Data Field Zone is located on every page of the document other than the first or last page. This option can be used in conjunction with the First page and/or Last page check boxes.

    Select the Last page check box if the Data Field Zone is located only on the last page of the document. This option can be used in conjunction with the First page and/or the Interior pages check boxes.

    Select the Last page (not only page) check box if the Data Field Zone is located on the last page and other pages in the document.

Specific allowed characters

The Specific allowed characters field allows you to configure the OCR engine to identify only the characters that you specify in the field.

For example, if you entered ABC in the Specific allowed characters field and the OCR engine identified the value as ABCDEFG, the DEFG characters would be stripped from the value and the Keyword Value associated with the document would be ABC.
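The stripping behavior in this example amounts to a simple character filter, sketched below in Python. The function name is hypothetical, and case-sensitive matching is an assumption.

```python
def keep_allowed_characters(value: str, allowed: str) -> str:
    """Keep only the characters listed in the Specific allowed characters field.

    Assumption: matching is case-sensitive.
    """
    return "".join(ch for ch in value if ch in allowed)
```

With ABC as the allowed characters, a value read as ABCDEFG is reduced to ABC.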

Advanced Recognition

Note:

Options involving ICR processing below are only enabled if your solution is licensed for Intelligent Character Recognition (ICR).

Note:

The OCR engine does not support Asian characters when reading dot matrix-printed text.

Using this drop-down list, select the type of processing you would like to perform on this Data Field Zone.

  • Machine print text/OCR. The OCR engine will perform optical character recognition to read machine-printed text.

    This option is selected by default.

  • Machine print text/detect structure. The OCR engine will attempt to determine whether the value consists of machine-printed text, dot matrix-printed text, or hand-written text, and then use the appropriate type of character recognition to read the text.

  • Machine print text/flowing text. The OCR engine will attempt to read machine-printed text within the Data Field Zone as if the text flowed like a paragraph with similar character sizes, spacing, and fonts.

    If this option is not selected, by default, the OCR engine will attempt to read machine-printed text using the automatic zoning recognition mode, which simply attempts to determine what the character sizes, spacing, and fonts look like on their own (i.e., independent of their positioning within a paragraph or section of flowing text).

  • Machine print text/labeled form field. The OCR engine will attempt to identify and remove form tag data from the extracted value.

    If there are two or more consecutive characters at the beginning or end of the extracted value that are smaller in font height than the middle text of the line, then the smaller characters are removed from the value.

    If the image is processed in color, and some of the characters within the Data Field Zone are red, then these characters are removed from the value. However, if your OCR format is configured for black and white processing, color characters may not be removed.

    Additionally, if one set of characters appears on a different line than another set of characters, the first set of characters may be removed from the value.

    For example, if State OH is extracted from the Data Field Zone, and State has a smaller font height, appears on a different line, or has a different color than OH, then State will be identified as a tag from a form and thus be removed from the value.

  • Machine print text/Rotated 90 degrees counter-clockwise. The OCR engine will perform optical character recognition to read machine-printed text that has been rotated 90 degrees counter-clockwise.

  • Machine print text/Rotated 90 degrees clockwise. The OCR engine will perform optical character recognition to read machine-printed text that has been rotated 90 degrees clockwise.

  • Machine print text/dot matrix printer. The OCR engine will perform optical character recognition to read dot matrix-printed text.

  • Machine print text/dot matrix - rotated 90 degrees counter-clockwise. The OCR engine will perform optical character recognition to read dot matrix-printed text that has been rotated 90 degrees counter-clockwise.

  • Machine print text/dot matrix - rotated 90 degrees clockwise. The OCR engine will perform optical character recognition to read dot matrix-printed text that has been rotated 90 degrees clockwise.

  • Bar Code. The bar code found within the Data Field Zone will be converted into text and then read by the OCR engine.

    Note:

    This option is only available if your solution is licensed for the Bar Code Recognition Server.

    Note:

    Depending on your configuration, the Advanced Capture engine will only search for certain bar code types in this zone. See the Bar Code Types option for more information.

  • MICR Font. The OCR engine will perform optical character recognition to read MICR line data from check documents.

  • Handwriting/ICR - numerals/grouping punctuation (North American style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the North American style (e.g., the 7 character is not crossed).

  • Handwriting/ICR - numerals/grouping punctuation (European style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the European style (e.g., the 7 character is crossed).

  • Handwriting/ICR - alphanumeric/punctuation. The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will attempt to recognize all alphanumeric characters.

  • Auto-detect ICR/OCR - Default to ICR/Alphanumeric. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for alphanumeric characters will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric North American. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, North American-style characters (e.g., the 7 character is not crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric European. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, European-style characters (e.g., the 7 character is crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to OCR. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, optical character recognition will be used by default.

Note:

The Auto-detect ICR/OCR options may not work properly if the Data Field Zone contains fewer than 25 characters.

Tip:

Of the Auto-detect ICR/OCR options, those that default to ICR are more likely to produce better results when the type of text cannot be determined. This is because the OCR engine's intelligent character recognition generally reads machine-printed text more accurately than the engine's optical character recognition reads hand-written text.

If one of the Handwriting/ICR options is selected, the Remove vertical/horizontal lines from zone before processing check box is displayed. Select this option if you would like the OCR engine to attempt to strip vertical or horizontal lines (i.e., constraint boxes) from the zone before it is processed.

Bar Code Types

Note:

This option is only available when the Bar Code option is selected in the Advanced recognition drop-down list.

Click this button to access the Select Bar Code Types dialog box.

Here you can select the types of bar codes for which the Advanced Capture engine will search when processing the Data Field Zone. The engine will ignore any deselected bar code types when processing this zone.

By default, all bar code types are selected.

Tip:

To expedite Advanced Capture processing, select only the desired bar code types.

To save changes to your bar code type selections, click Save.

VB script

Use the VB script drop-down list to select a VBScript to associate with the processing of this Data Field Zone.

Click the ... button to open the VB Scripts dialog box. Here, the selected script can be re-configured or edited.

Keyword association

You can logically group Data Field Zones using Keyword association. These groupings can be used to identify Keywords that belong to a Multi-Instance Keyword Type Group.

Use the Keyword association field to enter or select a logical group name for the Data Field Zones that identify Keywords belonging to a Multi-Instance Keyword Type Group. The Advanced Capture engine will attempt to maintain this Keyword grouping on the resulting document.

Activation groups

When you have configured multiple Form Identification Zones or Page Registration Zones for a document, you can assign individual Data Field Zones to a specific Form Identification or Page Registration Zone using activation groups. Activation groups allow you to activate only the Data Field Zones assigned to the Form Identification or Page Registration Zone that is used to match the document to an Advanced Capture form. Data Field Zones assigned to Form Identification or Page Registration Zones that are not used to match the document to a form will not be processed. Also, Data Field Zones present on pages other than the pages containing their assigned Form Identification or Page Registration Zones will not be processed, unless otherwise specified through the Page Location(s) setting or by adding a + to the front of the activation group name on the Form Identification or Page Registration Zone. This selective activation saves processing time and reduces the number of forms that need to be created for a Document Type.

Use the Activation groups field to enter or select an activation group name. Add a + to the front of a group name (for example, +Group1) on a Form Identification or Page Registration Zone to set all Data Field Zones assigned to this group to be processed. Use commas to separate multiple group names.

  • When a Form Identification Zone or Page Registration Zone is matched to a form, all activation groups that have been configured for the zone will be activated.

  • Form Identification Zones are organized into Identification Groups (under Combined rule expressions), and only one Identification Group can be matched to a form. Once an Identification Group has been matched, any remaining Identification Groups on the document will be skipped.

  • Multiple Page Registration Zones can be matched to a form. Every Page Registration Zone on the document will be tested for a match.

  • If multiple activation groups have been configured for a Data Field Zone, the zone will be processed if any of these activation groups is activated.

  • If no activation groups have been configured for a Data Field Zone, the zone will be considered active and thus will be processed.

  • If a Data Field Zone is configured to only be searched for on certain pages (that is, through the Page Location(s) setting), the zone can only be considered active on these pages. This overrides any conflicting settings that would otherwise activate the Data Field Zone (for example, when a Data Field Zone is assigned to an activation group that is named with a + on the corresponding Form Identification or Page Registration Zone, or when a Data Field Zone is not assigned to any activation group).
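
The activation rules above can be sketched as a small decision function. This is only an illustrative model, not the product's implementation: the class names, field names, and the function is_processed are hypothetical. It also assumes that a Page Location(s) setting allows a zone to be processed on its listed pages whenever one of its groups was activated anywhere on the document. Form definition groups (shown in brackets) are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class MatchedZone:
    """A Form Identification or Page Registration Zone that matched (hypothetical model)."""
    page: int
    groups: List[str]  # group names as configured on the zone, e.g. ["+Header", "Detail"]


@dataclass
class DataFieldZone:
    """A Data Field Zone and its activation settings (hypothetical model)."""
    name: str
    groups: List[str] = field(default_factory=list)  # activation groups assigned to the zone
    page_locations: Optional[Set[int]] = None        # None means no Page Location(s) restriction


def is_processed(zone: DataFieldZone, page: int, matched: List[MatchedZone]) -> bool:
    # Page Location(s) wins over everything else: the zone is never active off those pages.
    if zone.page_locations is not None and page not in zone.page_locations:
        return False
    # A zone with no activation groups is always considered active.
    if not zone.groups:
        return True
    for m in matched:
        for raw in m.groups:
            all_pages = raw.startswith("+")
            name = raw[1:] if all_pages else raw
            if name not in zone.groups:
                continue
            # Any one activated group is enough. Without "+" or Page Location(s),
            # the zone must be on the same page as the matched zone.
            if all_pages or zone.page_locations is not None or m.page == page:
                return True
    return False
```

For example, with a matched zone on page 1 configured with the groups +Header and Detail, a Data Field Zone assigned to Header would be processed on every page, while a zone assigned to Detail would be processed on page 1 only.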

Alternatively, you can assign a form definition group as the Data Field Zone's activation group to activate the zone for processing. Form definition groups can be used to extract only specific types of information (for example, header data vs. detail data) during processing. In the Activation groups drop-down list, form definition groups are enclosed in brackets (for example, [Group1]).

Colors

If you would like to assign specific colors to the Keyword Type configured for the Data Field Zone, click the Colors button. The Display Colors dialog box is displayed.

Here you can change the colors in which any regular or suspect values for the corresponding Keyword Type are displayed in the Indexing panel once Advanced Capture processing has taken place.

  • To change the display color for regular values, click Display Color to open your machine's color palette, select a color, and click OK.

  • To change the display color for suspect values, click Suspect Color to open your machine's color palette, select a color, and click OK.

  • To revert to the default display color for regular or suspect values, click the Automatic button that corresponds to the desired type of values (i.e., the left button for regular values, the right button for suspect values).

Note:

Any colors assigned here can be overridden by colors assigned through Keyword Lookup/Replace settings and/or VB scripting.

Suspect level

Enter the Suspect Level threshold, 1-99, in this field. By default, this value is set to the default Suspect Level set for the Advanced Capture form.

The Suspect Level is the level of confidence placed in data values captured in this zone. The default Suspect Level set for the Advanced Capture form and the actual Suspect Level detected for the selected value are displayed below the Suspect level field.

After a zone is processed, the OCR engine gives the resulting value a score between 1 and 99, depending on how confident it is in the result that was returned. The higher the score is, the lower the OCR engine's confidence is in the results.

The value you enter in this field is the threshold at which the OCR engine determines whether a returned value is acceptable or suspect. A score higher than the Suspect Level threshold causes the value captured from the zone to be marked as suspect. A score lower than the Suspect Level threshold indicates that the OCR engine considers the captured value acceptable.

For example, setting the Suspect Level to 99 would indicate you completely trust the result returned by the OCR engine because no higher score could be returned and no result could be marked as suspect.

Setting the Suspect Level to 1 would indicate you have no trust in the result, since no lower score could be returned and no result could be determined acceptable.

Setting the Suspect Level to 0 reverts to the default threshold of 75.
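
The threshold comparison described above can be illustrated with a minimal sketch. The function names are hypothetical, and one detail is an assumption: a score exactly equal to the threshold is treated here as acceptable, because the description only covers scores higher or lower than the threshold.

```python
DEFAULT_SUSPECT_LEVEL = 75  # default threshold stated in the text


def effective_threshold(configured: int) -> int:
    """Return the threshold in effect; a setting of 0 falls back to the default of 75."""
    if configured == 0:
        return DEFAULT_SUSPECT_LEVEL
    if not 1 <= configured <= 99:
        raise ValueError("Suspect Level must be 0 or between 1 and 99")
    return configured


def is_suspect(score: int, configured: int) -> bool:
    """Higher scores mean lower OCR confidence; a score above the threshold is suspect.

    Assumption: a score equal to the threshold counts as acceptable.
    """
    return score > effective_threshold(configured)
```

With the default threshold of 75, a score of 70 would be acceptable and a score of 80 would be marked as suspect. With a threshold of 99, no score can exceed the threshold, so nothing is marked as suspect.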

Tip:

By default, the Suspect Level threshold is set to 75 and the average score given to a processed field is 70. It is considered a best practice to set your Suspect Level to the default threshold of 75 to ensure that suspect Keyword Values are being consistently identified.

Override Default Regional Setting

When a Date, Date & Time, or Currency Keyword Type has been selected for the Data Field Zone, the Override Default Regional Setting drop-down list is displayed to the right of the Suspect level field.

Note:

If Person name extraction or Address extraction has been selected for the Data Application, the Override Default Regional Setting drop-down list is not displayed.

This option allows you to override your system's default regional settings when extracting values for the applicable Keyword Types in the Data Field Zone.

Select a regional language from the drop-down list to parse the applicable Keyword Values using the selected region's formatting rules and language-specific names for days, months, etc.

Select <Runtime System Default> to maintain your system's default regional settings when parsing these values.
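
To see why the selected region matters, consider how the same text can resolve to different dates. The sketch below is illustrative only: the region codes and the format table are hypothetical stand-ins and do not reflect how the product stores or applies regional rules.

```python
from datetime import date, datetime

# Hypothetical region-to-format table, for illustration only.
REGION_DATE_FORMATS = {
    "en-US": "%m/%d/%Y",  # month first
    "en-GB": "%d/%m/%Y",  # day first
    "de-DE": "%d.%m.%Y",  # day first, dot separators
}


def parse_date(text: str, region: str) -> date:
    """Parse a date string using the numeric date format assumed for the region."""
    return datetime.strptime(text, REGION_DATE_FORMATS[region]).date()
```

Under these assumed formats, the text 03/04/2011 would be read as March 4, 2011 for en-US and as April 3, 2011 for en-GB, which is the kind of ambiguity the override is meant to resolve.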

Note:

This option does not affect which languages the OCR format is configured to recognize. For information on configuring OCR formats, see the Full-Page OCR module reference guide or help files.

Find/validate in keyword dataset

Note:

This option is not enabled if the Person name extraction option is selected in the Data Application section. Additionally, it is only enabled if the Keyword Type selected in the Keyword type drop-down list is configured to use a Keyword Data Set (but not a Cascading Data Set).

Select this option if you wish to validate the value identified by the OCR engine using values specified for the Keyword Type's Data Set.

If the extracted value is an exact or fuzzy match to a value found in the Data Set, then the value found in the Data Set is assigned as the Keyword Value. A fuzzy match is one in which the number of mismatched characters in the extracted value is less than 10% of the number of characters in the extracted value.

Additionally, if the extracted value is a fuzzy match to a value found in the Data Set, then the assigned Keyword Value (i.e., the value fuzzy-matched in the Data Set) is marked as suspect.

Note:

Fuzzy matches are disabled for values of five characters or fewer. For such values, if the extracted value is not an exact match to a value found in the Data Set, then the extracted value is assigned as the Keyword Value and is marked as suspect.

If the Find/validate in keyword dataset check box is not selected, the OCR engine does not attempt to validate the extracted value against any values in the Keyword Data Set, and the extracted value is assigned as the Keyword Value.
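
The validation outcomes described above can be sketched as follows. This is a rough model under stated assumptions, not the engine's actual algorithm: mismatched characters are counted here with Levenshtein edit distance, exact matching is case-sensitive, and a value that matches nothing in the Data Set is kept and marked as suspect regardless of its length. The function names are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed measure of mismatched characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def validate(extracted: str, data_set: List[str]) -> Tuple[str, bool]:
    """Return (keyword_value, is_suspect) for an extracted value."""
    # Exact match: the Data Set value is assigned and is not suspect.
    if extracted in data_set:
        return extracted, False
    # Fuzzy matching is disabled for values of five characters or fewer.
    if len(extracted) > 5:
        limit = 0.10 * len(extracted)  # mismatches must be less than 10% of the length
        close = [(edit_distance(extracted, v), v) for v in data_set]
        close = [c for c in close if c[0] < limit]
        if close:
            # Fuzzy match: the closest Data Set value is assigned but marked as suspect.
            return min(close)[1], True
    # No usable match: keep the extracted value and mark it as suspect.
    return extracted, True
```

One consequence of the 10% rule, if mismatches are counted in whole characters: a single mismatched character only qualifies as a fuzzy match when the extracted value is at least 11 characters long.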

Add implied decimal point

Note:

This check box is only enabled if a Currency Keyword Type is selected in the Keyword type drop-down list.

Select this check box if you would like the OCR engine to automatically insert a decimal point in the extracted value if one is not detected.

For example, if this option is selected and the extracted value for the Total Amount Due Keyword is $12345, the OCR engine would automatically modify the value to $123.45.

If this option is selected and a decimal point is detected in the value, or if this option is not selected, then the extracted value is left as-is.
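
The behavior of this option can be illustrated with a short sketch. It assumes two decimal places, a period as the decimal separator, and values made up of an optional non-digit prefix followed by digits. The function name and the zero-padding of very short values are hypothetical choices, not documented behavior.

```python
import re


def add_implied_decimal(value: str, places: int = 2) -> str:
    """Insert a decimal point `places` digits from the right if none is present."""
    if "." in value:
        return value  # a decimal point was detected, so the value is left as-is
    match = re.fullmatch(r"(\D*)(\d+)", value)
    if match is None:
        return value  # unexpected shape (e.g., grouping separators); untouched in this sketch
    prefix, digits = match.groups()
    digits = digits.zfill(places + 1)  # assumption: pad short values so "$5" becomes "$0.05"
    return f"{prefix}{digits[:-places]}.{digits[-places:]}"
```

Calling add_implied_decimal("$12345") returns "$123.45", matching the example above, while "$123.45" is returned unchanged.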