Configuring Additional Grouped Line Item Extraction Options

Platform: OnBase
Product: Advanced Capture
Release: Foundation 24.1
License: Standard, Essential, Premier

The following options for the zone can be configured in the Data Field Zone dialog box.

Data Field Zone Grouped Line Item Extraction Option

Description

XML Node (Parent)

Note:

This field is only displayed if the Advanced Capture form is configured to create an XML rendition of documents matched to it (i.e., the Create XML data rendition option is selected for the Advanced Capture form).

Enter the name of the XML parent node (i.e., the element) that Keyword Values extracted from this zone are contained in when the XML rendition of the document is created.

Note:

While the XML Node (Parent) name may include alphanumeric characters, underscores (_), hyphens (-), or periods (.), it must begin with either a letter or an underscore.
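
The naming rule in the note above can be checked before a name is entered. The following is a minimal sketch in Python, not part of the product: the function name is hypothetical, and restricting letters and digits to ASCII is an assumption of this example.

```python
import re

# Rule from the note above: the first character is a letter or an underscore;
# the rest may be letters, digits, underscores, hyphens, or periods.
# ASCII-only matching is an assumption of this sketch.
_XML_NODE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_xml_node_name(name: str) -> bool:
    """Return True if name satisfies the XML Node (Parent) naming rule."""
    return _XML_NODE_NAME.fullmatch(name) is not None
```

For example, a name such as LineItems or _line-item.1 passes this check, while 1LineItems or a name containing a space does not.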

Page Location(s)

The Page Location(s) options control the pages that the OCR engine searches for a particular Data Field Zone.

  • Select the Absolute page radio button if the groups are only displayed on one page and are always displayed on the same page (e.g., the groups are always displayed on page 1). Enter the page number that the groups are located on in the associated field.

  • Select the Relative page radio button if the groups may be located on one or more pages relative to the length of the document. Select one or more of the following check boxes to indicate which page(s) the groups may be located on.

    Select the First page check box if the groups are located only on the first page of the document. This option can be used in conjunction with the Interior pages and/or Last page check boxes.

    Select the First page (not only page) check box if the groups are located on the first page and other pages in the document.

    Select the Interior pages check box if the groups are located on every page of the document other than the first or last page. This option can be used in conjunction with the First page and/or Last page check boxes.

    Select the Last page check box if the groups are located only on the last page of the document. This option can be used in conjunction with the First page and/or the Interior pages check boxes.

    Select the Last page (not only page) check box if the groups are located on the last page and other pages in the document.
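
To illustrate how the Relative page check boxes combine, the following Python sketch resolves a selection to page numbers for a document of a given length. It is an illustration only: the function and parameter names are hypothetical, and the two "(not only page)" variants are left out because their exact behavior on single-page documents is not spelled out here.

```python
def relative_pages(page_count, first=False, interior=False, last=False):
    """Return the sorted 1-based page numbers selected by the check boxes.

    Hypothetical helper: models only the First page, Interior pages,
    and Last page check boxes.
    """
    if page_count < 1:
        return []
    pages = set()
    if first:
        pages.add(1)
    if interior:
        # Interior pages are every page other than the first and the last.
        pages.update(range(2, page_count))
    if last:
        pages.add(page_count)
    return sorted(pages)
```

Under these assumptions, selecting First page and Last page on a five-page document resolves to pages 1 and 5, and selecting only Interior pages on a two-page document resolves to no pages at all.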

Automatic continued-row merging

Select the Automatic continued-row merging check box to compare the average spacing between rows.

If a row is closer than average to the row that precedes it and the row does not have data in at least half of its processed columns, the OCR engine recognizes it as a continuation of the previous row and merges the extracted values.
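
The merging rule described above can be sketched as follows. This is a simplified Python illustration, not the OCR engine's implementation: the row structure (a "top" position plus one string per processed column), the function name, and joining merged text with a space are all assumptions, and every row is assumed to have the same number of columns.

```python
from statistics import mean


def merge_continued_rows(rows):
    """Merge rows that look like continuations of the row above them.

    Each row is a dict with "top" (vertical position on the page) and
    "cells" (one string per processed column; "" means no data).
    """
    if len(rows) < 2:
        return [dict(r, cells=list(r["cells"])) for r in rows]

    # Spacing between each row and the row that precedes it.
    gaps = [b["top"] - a["top"] for a, b in zip(rows, rows[1:])]
    average_gap = mean(gaps)

    merged = [dict(rows[0], cells=list(rows[0]["cells"]))]
    for row, gap in zip(rows[1:], gaps):
        filled = sum(1 for cell in row["cells"] if cell.strip())
        # "Does not have data in at least half of its processed columns."
        sparse = filled < len(row["cells"]) / 2
        if gap < average_gap and sparse:
            previous = merged[-1]["cells"]
            for i, cell in enumerate(row["cells"]):
                if cell.strip():
                    previous[i] = (previous[i] + " " + cell).strip()
        else:
            merged.append(dict(row, cells=list(row["cells"])))
    return merged
```

In this sketch, a row that sits closer than average to the previous row and holds a value in only one of three columns is folded into the previous row; a fully populated row is always kept as its own row.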

Extract only between tags

Select the Extract only between tags check box if all Keyword Values to be extracted can be found between two tags (i.e., a start marker and an end marker).

Tip:

This option should be selected when your groups contain “extra” data that can be ignored by the OCR engine because it does not contain Keyword Values.
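
As a rough illustration of extracting only between tags, the Python sketch below keeps the lines that fall after a start marker and before an end marker. The function name, the line-based input, and matching the tags by substring are assumptions of this example; how the product itself locates the tags is not described here.

```python
def lines_between_tags(lines, start_tag, end_tag):
    """Return the lines found after start_tag and before end_tag.

    The lines containing the tags themselves are excluded. If start_tag
    never appears, nothing is returned.
    """
    collected = []
    inside = False
    for line in lines:
        if not inside:
            if start_tag in line:
                inside = True
            continue
        if end_tag in line:
            break
        collected.append(line)
    return collected
```

With an invoice whose line items sit between a column header line and a subtotal line, only the line-item rows are returned and the surrounding "extra" data is skipped.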

VB script

Use the VB script drop-down list to select a VBScript to associate with the processing of this Data Field Zone.

Click the ... button to open the VB Scripts dialog box. Here, the selected script can be re-configured or edited.

Suspect level

Enter the Suspect Level threshold, 1-99, in this field. By default, this value is set to the default Suspect Level set for the Advanced Capture form.

The Suspect Level is the level of confidence placed in data values captured in this zone. The default Suspect Level set for the Advanced Capture form is displayed below the Suspect level field.

After a zone is processed, the OCR engine gives the resulting value a score between 1 and 99, depending on how confident it is in the result that was returned. The higher the score is, the lower the OCR engine's confidence is in the results.

The value you enter in this field is the threshold at which the OCR engine determines if a returned value is acceptable or suspect. A score returned by the OCR engine higher than the Suspect Level threshold you set causes the value captured from the zone to be marked as suspect. All scores lower than the Suspect Level threshold indicate that the captured value is considered by the OCR engine to be acceptable.

For example, setting the Suspect Level to 99 would indicate you completely trust the result returned by the OCR engine because no higher score could be returned and no result could be marked as suspect.

Setting the Suspect Level to 1 would indicate you have no trust in the result, since no lower score could be returned and no result could be determined acceptable.

Setting the Suspect Level to 0 reverts the threshold to the default of 75.

Tip:

By default, the Suspect Level threshold is set to 75 and the average score given to a processed field is 70. It is considered a best practice to set your Suspect Level to the default threshold of 75 to ensure that suspect Keyword Values are being consistently identified.
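
The threshold comparison described for the Suspect Level can be sketched in Python as follows. The names are hypothetical, and one detail is an assumption: the text does not say how a score exactly equal to the threshold is treated, so this sketch treats it as acceptable.

```python
DEFAULT_SUSPECT_LEVEL = 75  # default threshold stated in the text above


def effective_threshold(suspect_level):
    """Resolve the configured Suspect Level; 0 falls back to the default."""
    if suspect_level == 0:
        return DEFAULT_SUSPECT_LEVEL
    if not 1 <= suspect_level <= 99:
        raise ValueError("Suspect Level must be 0 or between 1 and 99")
    return suspect_level


def is_suspect(score, suspect_level):
    """Return True if a score is higher than the threshold.

    A higher score means lower confidence. A score equal to the
    threshold is treated as acceptable (an assumption of this sketch).
    """
    return score > effective_threshold(suspect_level)
```

With the default threshold of 75, a score of 80 is marked as suspect and a score of 70 is accepted. With a threshold of 99, no score can be marked as suspect.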

Advanced Recognition

Note:

Options involving ICR processing below are only enabled if your solution is licensed for Intelligent Character Recognition (ICR).

Note:

The OCR engine does not support Asian characters when reading dot matrix-printed text.

Using this drop-down list, select the type of processing you would like to perform on this Data Field Zone.

  • Machine print text/OCR. The OCR engine will perform optical character recognition to read machine-printed text.

    This option is selected by default.

  • Machine print text/detect structure. The OCR engine will attempt to determine whether the value consists of machine-printed text, dot matrix-printed text, or hand-written text, and then use the appropriate type of character recognition to read the text.

  • Machine print text/flowing text. The OCR engine will attempt to read machine-printed text within the Data Field Zone as if the text flowed like a paragraph with similar character sizes, spacing, and fonts.

    If this option is not selected, by default, the OCR engine will attempt to read machine-printed text using the automatic zoning recognition mode, which simply attempts to determine what the character sizes, spacing, and fonts look like on their own (i.e., independent of their positioning within a paragraph or section of flowing text).

  • Machine print text/dot matrix printer. The OCR engine will perform optical character recognition to read dot matrix-printed text.

  • Handwriting/ICR -numerals/grouping punctuation (North American style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the North American style (e.g., the 7 character is not crossed).

  • Handwriting/ICR -numerals/grouping punctuation (European style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the European style (e.g., the 7 character is crossed).

  • Handwriting/ICR - alphanumeric/punctuation. The OCR engine will perform intelligent character recognition in order to read hand-written text.

    This option will attempt to recognize all alphanumeric characters.

  • Auto-detect ICR/OCR - Default to ICR/Alphanumeric. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for alphanumeric characters will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric North American. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, North American-style characters (e.g., the 7 character is not crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric European. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, European-style characters (e.g., the 7 character is crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to OCR. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, optical character recognition will be used by default.

Note:

The Auto-detect ICR/OCR options may not work properly if the Data Field Zone contains fewer than 25 characters.

Tip:

Of the Auto-detect ICR/OCR options, those that default to ICR are more likely to produce the best results when the type of text cannot be determined. This is because the OCR engine's intelligent character recognition generally reads machine-printed text more accurately than the engine's optical character recognition reads hand-written text.

Note:

The Grouped Line Item Extraction Data Field Zone cannot contain a mixture of OCR and ICR values. All values within the zone must use one type of recognition only.

Activation groups

When you have configured multiple Form Identification Zones or Page Registration Zones for a document, you can assign individual Data Field Zones to a specific Form Identification or Page Registration Zone using activation groups. Activation groups allow you to activate only the Data Field Zones assigned to the Form Identification or Page Registration Zone that is used to match the document to an Advanced Capture form. Data Field Zones assigned to Form Identification or Page Registration Zones that are not used to match the document to a form will not be processed. Also, Data Field Zones present on pages other than the pages containing their assigned Form Identification or Page Registration Zones will not be processed, unless otherwise specified through the Page Location(s) setting or by adding a + to the front of the activation group name on the Form Identification or Page Registration Zone. This selective activation saves processing time and reduces the number of forms that need to be created for a Document Type.

Use the Activation groups field to enter or select an activation group name. Add a + to the front of a group name (e.g., +Group1) on a Form Identification or Page Registration Zone to set all Data Field Zones assigned to this group to be processed. Use commas to separate multiple group names.

  • When a Form Identification Zone or Page Registration Zone is matched to a form, all activation groups that have been configured for the zone will be activated.

  • Form Identification Zones are organized into Identification Groups (under Combined rule expressions), and only one Identification Group can be matched to a form. Once an Identification Group has been matched, any remaining Identification Groups on the document will be skipped.

  • Multiple Page Registration Zones can be matched to a form. Every Page Registration Zone on the document will be tested for a match.

  • If multiple activation groups have been configured for a Data Field Zone, the zone will be processed if any of these activation groups is activated.

  • If no activation groups have been configured for a Data Field Zone, the zone will be considered active and thus will be processed.

  • If a Data Field Zone is configured to only be searched for on certain pages (i.e., through the Page Location(s) setting), the zone can only be considered active on these pages. This overrides any conflicting settings that would otherwise activate the Data Field Zone (e.g., when a Data Field Zone is assigned to an activation group that is named with a + on the corresponding Form Identification or Page Registration Zone, or when a Data Field Zone is not assigned to any activation group).
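
The activation rules listed above can be summarized in a short Python sketch. It is a simplified model with hypothetical names, not the product's logic: it covers the "any group activated", "no groups configured", and Page Location(s) override rules, and it does not model the page scoping that the + prefix controls (the prefix is simply stripped when names are parsed).

```python
def parse_group_names(field_value):
    """Split a comma-separated Activation groups value into plain names.

    A leading + is stripped; its effect on page scoping is not modeled.
    """
    names = set()
    for raw in field_value.split(","):
        name = raw.strip().lstrip("+")
        if name:
            names.add(name)
    return names


def zone_is_processed(zone_groups, activated_groups, page, allowed_pages=None):
    """Decide whether a Data Field Zone is processed on a given page.

    zone_groups: activation groups configured for the Data Field Zone.
    activated_groups: groups activated by the matched zones.
    allowed_pages: pages permitted by Page Location(s), or None if unrestricted.
    """
    # Page Location(s) overrides any setting that would activate the zone.
    if allowed_pages is not None and page not in allowed_pages:
        return False
    # A zone with no activation groups is considered active.
    if not zone_groups:
        return True
    # Otherwise, any one activated group is enough.
    return bool(set(zone_groups) & set(activated_groups))
```

For example, a zone assigned to Group1 and Group3 is not processed when only Group2 is activated, while a zone with no activation groups is processed on any page that its Page Location(s) setting allows.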

Alternatively, you can assign a form definition group as the Data Field Zone's activation group to activate the zone for processing. Form definition groups can be used to extract only specific types of information (e.g., header data vs. detail data) during processing. In the Activation groups drop-down list, form definition groups are enclosed in brackets (e.g., [Group1]).

Context group

You can logically group Data Field Zones using context groups. For instance, these groupings can be used to identify Keywords that belong to one Multi-Instance Keyword Type Group (MIKG) but are split across multiple Data Field Zones.

Use the Context group field to enter or select a logical group name for the Data Field Zones that identify Keywords belonging to an MIKG (or other logical grouping). The Advanced Capture engine will attempt to maintain this Keyword grouping on the resulting document.