Additional Line Item Extraction Configuration Options

Advanced Capture

Platform: OnBase
Product: Advanced Capture
Release: Foundation 24.1
License: Standard, Essential, Premier

After you configure the table decomposition mode for a Line Item Extraction Data Field Zone, you can set the following options, which apply regardless of the mode selected.

Data Field Zone Line Item Extraction Option

Description

XML Node (Parent)

Note:

This field is only displayed if the Advanced Capture form is configured to create an XML rendition of documents matched to it (i.e., the Create XML data rendition option is selected for the Advanced Capture form).

Enter the name of the parent XML node (i.e., the element) that contains the Keyword Values extracted from this zone when the XML rendition of the document is created.

Note:

While the XML Node (Parent) name may include alphanumeric characters, underscores (_), hyphens (-), or periods (.), it must begin with either a letter or an underscore.
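The naming rule above can be expressed as a single regular expression. This is a minimal Python sketch that encodes only the rule as stated here, assuming "alphanumeric" means ASCII letters and digits; the XML specification itself permits a wider range of name characters, and OnBase may validate names differently. The helper name `is_valid_parent_node_name` is hypothetical.

```python
import re

# Hypothetical helper encoding only the rule stated above:
#   first character: a letter or an underscore
#   remaining characters: letters, digits, underscores, hyphens, or periods
# Assumes "alphanumeric" means ASCII letters and digits.
_NODE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_parent_node_name(name: str) -> bool:
    """Return True if name satisfies the XML Node (Parent) naming rule."""
    return _NODE_NAME.fullmatch(name) is not None


candidates = ["LineItems", "_line-items.v2", "2024Items", "-items", "line items"]
print([is_valid_parent_node_name(n) for n in candidates])
# [True, True, False, False, False]
```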

Begin table marker

Use the Begin table marker field to specify the text that indicates the beginning of the information being extracted.

The markers can be configured as literal text (i.e., the text specified in the Begin table marker field is the marker) or as regular expressions (i.e., the text in the Begin table marker field defines a regular expression, and any text matching that pattern is the marker).

Literal markers must be enclosed in quotation marks to distinguish them from regular expression markers.

Multiple begin table markers (literal, regular expression, or both) can be configured by separating them with spaces in the Begin table marker field.

Note:

To access the Regular Expression Library, click in the field and press F2. See The Regular Expression Library for more information.

By default, text used as a Begin table marker is not included in the Keyword Value(s) being extracted. To include this text in the Keyword Value, select the Inclusive check box beneath the Begin table marker field.
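The marker syntax described above (quoted text is a literal, anything else is a regular expression, multiple markers separated by spaces) can be illustrated with a small parser. This Python sketch is not OnBase's implementation: it assumes a quoted literal may itself contain spaces, uses Python's `re` dialect, which may differ from the product's regular expression engine, and assumes the earliest match among several markers wins. `Marker`, `parse_markers`, and `find_first_marker` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Marker:
    text: str
    literal: bool  # True for "quoted" literal text, False for a regular expression


def parse_markers(field_value: str) -> List[Marker]:
    """Split a marker field into literal and regular expression markers.

    Assumption: quoted runs are kept together even if they contain spaces;
    everything else is split on whitespace.
    """
    markers = []
    for token in re.findall(r'"[^"]*"|\S+', field_value):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            markers.append(Marker(token[1:-1], literal=True))
        else:
            markers.append(Marker(token, literal=False))
    return markers


def find_first_marker(text: str, markers: List[Marker]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the earliest match of any marker, or None."""
    best = None
    for marker in markers:
        # Literal markers are escaped so they match as plain text.
        pattern = re.escape(marker.text) if marker.literal else marker.text
        match = re.search(pattern, text)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), match.end())
    return best


markers = parse_markers('"Item Description" QTY\\s+PRICE')
print(find_first_marker("Invoice 1001\nQTY   PRICE   TOTAL\n", markers))  # (13, 24)
```

Here the literal "Item Description" does not occur, so the regular expression marker determines where the table begins.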

End table marker

Use the End table marker field to specify the text that indicates the end of the information being extracted.

The markers can be configured as literal text (i.e., the text specified in the End table marker field is the marker) or as regular expressions (i.e., the text in the End table marker field defines a regular expression, and any text matching that pattern is the marker).

Literal markers must be enclosed in quotation marks to distinguish them from regular expression markers.

Multiple end table markers (literal, regular expression, or both) can be configured by separating them with spaces in the End table marker field.

Note:

To access the Regular Expression Library, click in the field and press F2. See The Regular Expression Library for more information.

By default, text used as an End table marker is not included in the Keyword Value(s) being extracted. To include this text in the Keyword Value, select the Inclusive check box beneath the End table marker field.
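The interaction between the begin and end markers and their Inclusive check boxes can be sketched as slicing the text between two matches. This is a simplified Python illustration, not the product's algorithm: both markers are treated as regular expressions, the end marker is searched for only after the begin marker, and returning `None` when no end marker is found is an assumption. `extract_table` and its flag names are hypothetical stand-ins for the two Inclusive check boxes.

```python
import re
from typing import Optional


def extract_table(text: str, begin: str, end: str,
                  begin_inclusive: bool = False,
                  end_inclusive: bool = False) -> Optional[str]:
    """Return the text between a begin marker and an end marker.

    Both markers are regular expressions. By default neither marker is part
    of the result; the *_inclusive flags mimic the Inclusive check boxes.
    """
    b = re.search(begin, text)
    if b is None:
        return None
    start = b.start() if begin_inclusive else b.end()
    # Only look for the end marker after the begin marker.
    e = re.search(end, text[b.end():])
    if e is None:
        return None  # assumption: no end marker means no table is extracted
    stop = b.end() + (e.end() if end_inclusive else e.start())
    return text[start:stop]


page = "HEADER\nQTY PRICE\n2 5.00\n3 4.00\nSUBTOTAL 22.00\n"
print(repr(extract_table(page, r"QTY PRICE\n", r"SUBTOTAL")))
# '2 5.00\n3 4.00\n'
print(repr(extract_table(page, r"QTY PRICE\n", r"SUBTOTAL", end_inclusive=True)))
# '2 5.00\n3 4.00\nSUBTOTAL'
```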

Invalid Row

Use the Invalid Row field to specify the text that marks the beginning of a row that should be discarded.

The markers can be configured as literal text (i.e., the text specified in the Invalid Row field is the marker) or as regular expressions (i.e., the text in the Invalid Row field defines a regular expression, and any text matching that pattern is the marker).

Literal markers must be enclosed in quotation marks to distinguish them from regular expression markers.

Multiple invalid row markers (literal, regular expression, or both) can be configured by separating them with spaces in the Invalid Row field.

Note:

To access the Regular Expression Library, click in the field and press F2. See The Regular Expression Library for more information.
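Invalid Row markers act as a filter on rows that have already been extracted. This Python sketch assumes each marker has been parsed into a `(text, is_literal)` pair, that a row is discarded when it begins with any marker, and that leading whitespace is ignored; all three are assumptions for illustration, and `drop_invalid_rows` is a hypothetical name.

```python
import re
from typing import List, Tuple


def drop_invalid_rows(rows: List[str], markers: List[Tuple[str, bool]]) -> List[str]:
    """Discard rows that begin with any Invalid Row marker.

    Each marker is (text, is_literal). Literal markers are compared as plain
    text; the others are regular expressions anchored at the start of the row.
    """
    def is_invalid(row: str) -> bool:
        stripped = row.lstrip()  # assumption: leading whitespace is ignored
        for text, is_literal in markers:
            if is_literal and stripped.startswith(text):
                return True
            if not is_literal and re.match(text, stripped):
                return True
        return False

    return [row for row in rows if not is_invalid(row)]


rows = ["2  Widget  10.00", "Continued on next page", "3  Gadget  4.50", "----------"]
print(drop_invalid_rows(rows, [("Continued", True), (r"-{3,}", False)]))
# ['2  Widget  10.00', '3  Gadget  4.50']
```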

Page Location(s)

The Page Location(s) options control the pages that the OCR engine searches for a particular Data Field Zone.

  • Select the Absolute page radio button if the table being read in the Data Field Zone appears on only one page and always on the same page (e.g., the table is always on page 1). Enter that page number in the associated field.

  • Select the Relative page radio button if the table may be located on one or more pages identified relative to the beginning and end of the document. Select one or more of the following check boxes to indicate the page(s) on which the table may be located.

    Select the First page check box if the table is located only on the first page of the document. This option can be used in conjunction with the Interior pages and/or Last page check boxes.

    Select the First page (not only page) check box if the table is located on the first page and other pages in the document.

    Select the Interior pages check box if the table is located on every page of the document other than the first or last page. This option can be used in conjunction with the First page and/or Last page check boxes.

    Select the Last page check box if the table is located only on the last page of the document. This option can be used in conjunction with the First page and/or the Interior pages check boxes.

    Select the Last page (not only page) check box if the table is located on the last page and other pages in the document.
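The Page Location(s) settings can be read as a rule that turns a page count into a set of pages to search. This Python sketch reflects one reading of the options above and is labeled as an assumption: it interprets the "(not only page)" variants as applying only when the document has more than one page, and treats an absolute page beyond the end of the document as matching nothing. The function and parameter names are hypothetical.

```python
from typing import Optional, Set


def pages_to_search(page_count: int, absolute_page: Optional[int] = None,
                    first: bool = False, first_not_only: bool = False,
                    interior: bool = False,
                    last: bool = False, last_not_only: bool = False) -> Set[int]:
    """Return the 1-based page numbers on which the zone would be searched."""
    if absolute_page is not None:
        # Absolute page: one fixed page, if the document has that many pages.
        return {absolute_page} if 1 <= absolute_page <= page_count else set()
    pages: Set[int] = set()
    if first:
        pages.add(1)
    if first_not_only and page_count > 1:  # assumption: skip single-page documents
        pages.add(1)
    if interior:
        pages.update(range(2, page_count))  # every page except the first and last
    if last:
        pages.add(page_count)
    if last_not_only and page_count > 1:  # assumption: skip single-page documents
        pages.add(page_count)
    return pages


print(sorted(pages_to_search(5, first=True, interior=True)))  # [1, 2, 3, 4]
print(pages_to_search(1, first_not_only=True))                # set()
```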

VB script

Use the VB script drop-down list to select a VBScript to associate with the processing of this Data Field Zone.

Click the ... button to open the VB Scripts dialog box, where you can reconfigure or edit the selected script.

Suspect level

Enter the Suspect Level threshold, 1-99, in this field. By default, this value is set to the default Suspect Level set for the Advanced Capture form.

The Suspect Level is the level of confidence placed in data values captured in this zone. The default Suspect Level set for the Advanced Capture form and the actual Suspect Level detected for the selected table are displayed below the Suspect level field.

After a zone is processed, the OCR engine gives the resulting value a score between 1 and 99 that reflects how confident it is in the result. The higher the score, the lower the OCR engine's confidence in the result.

The value you enter in this field is the threshold the OCR engine uses to decide whether a returned value is acceptable or suspect. If the score is higher than the Suspect Level threshold, the value captured from the zone is marked as suspect. If the score is lower than the threshold, the OCR engine considers the captured value acceptable.

For example, setting the Suspect Level to 99 would indicate you completely trust the result returned by the OCR engine because no higher score could be returned and no result could be marked as suspect.

Setting the Suspect Level to 1 would indicate you have no trust in the result, since no lower score could be returned and no result could be determined acceptable.

Setting the Suspect Level to 0 reverts it to the default threshold of 75.

Tip:

By default, the Suspect Level threshold is set to 75 and the average score given to a processed field is 70. It is considered a best practice to set your Suspect Level to the default threshold of 75 to ensure that suspect Keyword Values are being consistently identified.
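The threshold comparison above can be summarized in a few lines. This Python sketch uses the values stated in this section (scores from 1 to 99, higher meaning less confident, 0 falling back to 75). The text does not say how a score exactly equal to the threshold is handled, so treating it as acceptable is an assumption; `is_suspect` is a hypothetical name.

```python
DEFAULT_THRESHOLD = 75  # default Suspect Level stated above


def is_suspect(score: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True if an OCR score should mark the captured value as suspect.

    Higher scores mean lower confidence. A threshold of 0 reverts to the
    default. Assumption: a score equal to the threshold is acceptable.
    """
    if threshold == 0:
        threshold = DEFAULT_THRESHOLD
    if not 1 <= threshold <= 99:
        raise ValueError("Suspect Level must be 0 or between 1 and 99")
    return score > threshold


print(is_suspect(70))      # False: the average score of 70 is under the default 75
print(is_suspect(80))      # True
print(is_suspect(80, 99))  # False: a threshold of 99 never marks values as suspect
```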

Advanced Recognition

Note:

The ICR processing options below are enabled only if your solution is licensed for Intelligent Character Recognition (ICR).

Note:

The OCR engine does not support Asian characters when reading dot matrix-printed text.

Using this drop-down list, select the type of processing you would like to perform on this Data Field Zone.

  • Machine print text/OCR. The OCR engine will perform optical character recognition to read machine-printed text.

    This option is selected by default.

  • Machine print text/detect structure. The OCR engine will attempt to determine whether the value consists of machine-printed text, dot matrix-printed text, or hand-written text, and then use the appropriate type of character recognition to read the text.

  • Machine print text/flowing text. The OCR engine will attempt to read machine-printed text within the Data Field Zone as if the text flowed like a paragraph with similar character sizes, spacing, and fonts.

    If this option is not selected, the OCR engine reads machine-printed text using the automatic zoning recognition mode by default. This mode determines character sizes, spacing, and fonts on their own (i.e., independent of their position within a paragraph or section of flowing text).

  • Machine print text/dot matrix printer. The OCR engine will perform optical character recognition to read dot matrix-printed text.

  • Handwriting/ICR - numerals/grouping punctuation (North American style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the North American style (e.g., the 7 character is not crossed).

  • Handwriting/ICR - numerals/grouping punctuation (European style). The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will only attempt to recognize numeric characters written in the European style (e.g., the 7 character is crossed).

  • Handwriting/ICR - alphanumeric/punctuation. The OCR engine will perform intelligent character recognition to read hand-written text.

    This option will attempt to recognize all alphanumeric characters.

  • Auto-detect ICR/OCR - Default to ICR/Alphanumeric. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for alphanumeric characters will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric North American. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, North American-style characters (e.g., the 7 character is not crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to ICR/Numeric European. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, intelligent character recognition for numeric, European-style characters (e.g., the 7 character is crossed) will be used by default.

  • Auto-detect ICR/OCR - Default to OCR. The OCR engine will attempt to determine whether the value consists of hand-written or machine-printed text and then use the appropriate type of character recognition to read the text. When the type of text cannot be determined, optical character recognition will be used by default.

Note:

The Auto-detect ICR/OCR options may not work properly if the Data Field Zone contains fewer than 25 characters.

Tip:

When the type of text cannot be determined, the Auto-detect ICR/OCR options that default to ICR are more likely to produce the best results than the option that defaults to OCR. This is because the OCR engine's intelligent character recognition generally reads machine-printed text more accurately than the engine's optical character recognition reads hand-written text.

Note:

The Line Item Extraction Data Field Zone cannot contain a mixture of OCR and ICR values. All values within the zone must use one type of recognition only.
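The Auto-detect ICR/OCR fallback and the single-recognition rule for Line Item Extraction zones can be summarized as a small decision function. This Python sketch is illustrative only: the labels `"handwritten"`, `"machine"`, `ICR`, and `OCR` and both function names are hypothetical, the engine's actual detection (including its reduced reliability below 25 characters) is not modeled, and reporting a mixed zone as an error is an assumption about how the constraint would surface.

```python
from typing import Iterable, Optional

# Hypothetical labels; the engine's internal names are not documented here.
ICR, OCR = "ICR", "OCR"


def resolve_auto_detect(detected: Optional[str], default_to: str) -> str:
    """Pick ICR or OCR for one value under an Auto-detect ICR/OCR option.

    detected is "handwritten", "machine", or None when the type of text could
    not be determined; default_to is ICR or OCR, per the selected option.
    """
    if detected == "handwritten":
        return ICR
    if detected == "machine":
        return OCR
    return default_to


def zone_recognition(detections: Iterable[Optional[str]], default_to: str) -> str:
    """Resolve every value in a Line Item Extraction zone and enforce one type."""
    kinds = {resolve_auto_detect(d, default_to) for d in detections}
    if len(kinds) > 1:
        raise ValueError("a Line Item Extraction zone cannot mix OCR and ICR values")
    return kinds.pop() if kinds else default_to


print(zone_recognition(["handwritten", None], ICR))  # ICR
```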

Activation groups

When you have configured multiple Form Identification Zones or Page Registration Zones for a document, you can assign individual Data Field Zones to a specific Form Identification or Page Registration Zone using activation groups. Activation groups allow you to activate only the Data Field Zones assigned to the Form Identification or Page Registration Zone that is used to match the document to an Advanced Capture form. Data Field Zones assigned to Form Identification or Page Registration Zones that are not used to match the document to a form will not be processed. Also, Data Field Zones present on pages other than the pages containing their assigned Form Identification or Page Registration Zones will not be processed, unless otherwise specified through the Page Location(s) setting or by adding a + to the front of the activation group name on the Form Identification or Page Registration Zone. This selective activation saves processing time and reduces the number of forms that need to be created for a Document Type.

Use the Activation groups field to enter or select an activation group name. Use commas to separate multiple group names. On a Form Identification or Page Registration Zone, add a + to the front of a group name (e.g., +Group1) to have all Data Field Zones assigned to that group processed, including on pages other than the page that contains the Form Identification or Page Registration Zone.
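The comma-separated group list and the leading + can be parsed as follows. This Python sketch assumes surrounding whitespace around each name is insignificant and that empty entries are ignored; `parse_group_list` is a hypothetical name, not part of OnBase.

```python
from typing import List, Tuple


def parse_group_list(field_value: str) -> List[Tuple[str, bool]]:
    """Split a comma-separated activation group list into (name, all_pages) pairs.

    A leading + (as entered on a Form Identification or Page Registration
    Zone) sets all_pages to True and is removed from the name.
    """
    groups = []
    for raw in field_value.split(","):
        name = raw.strip()
        if not name:
            continue  # assumption: empty entries are ignored
        plus = name.startswith("+")
        groups.append((name[1:].strip() if plus else name, plus))
    return groups


print(parse_group_list("+Header, Detail,  Totals"))
# [('Header', True), ('Detail', False), ('Totals', False)]
```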

  • When a Form Identification Zone or Page Registration Zone is matched to a form, all activation groups that have been configured for the zone will be activated.

  • Form Identification Zones are organized into Identification Groups (under Combined rule expressions), and only one Identification Group can be matched to a form. Once an Identification Group has been matched, any remaining Identification Groups on the document will be skipped.

  • Multiple Page Registration Zones can be matched to a form. Every Page Registration Zone on the document will be tested for a match.

  • If multiple activation groups have been configured for a Data Field Zone, the zone will be processed if any of these activation groups is activated.

  • If no activation groups have been configured for a Data Field Zone, the zone will be considered active and thus will be processed.

  • If a Data Field Zone is configured to only be searched for on certain pages (i.e., through the Page Location(s) setting), the zone can only be considered active on these pages. This overrides any conflicting settings that would otherwise activate the Data Field Zone (e.g., when a Data Field Zone is assigned to an activation group that is named with a + on the corresponding Form Identification or Page Registration Zone, or when a Data Field Zone is not assigned to any activation group).
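The rules in the list above can be combined into a single activation check for one Data Field Zone on one page. This Python sketch is one reading of those rules, not the product's implementation: `ActivatedGroup`, `DataFieldZone`, and `is_zone_active` are hypothetical, the `activated` dictionary stands in for the groups turned on by whichever Form Identification or Page Registration Zones matched, and a Page Location(s) setting is modeled as a set of allowed page numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ActivatedGroup:
    plus: bool                                    # named with a leading + on the matched zone
    pages: Set[int] = field(default_factory=set)  # pages where the matching zone was found


@dataclass
class DataFieldZone:
    groups: List[str] = field(default_factory=list)
    page_locations: Optional[Set[int]] = None     # None: no Page Location(s) restriction


def is_zone_active(zone: DataFieldZone, activated: Dict[str, ActivatedGroup], page: int) -> bool:
    # A Page Location(s) restriction overrides every other activation rule.
    if zone.page_locations is not None and page not in zone.page_locations:
        return False
    # A zone with no activation groups is always considered active.
    if not zone.groups:
        return True
    # Any one activated group is enough, subject to the page rules.
    for name in zone.groups:
        group = activated.get(name)
        if group is None:
            continue
        if group.plus or zone.page_locations is not None or page in group.pages:
            return True
    return False


activated = {"Header": ActivatedGroup(plus=False, pages={1}),
             "Detail": ActivatedGroup(plus=True, pages={1})}
print(is_zone_active(DataFieldZone(groups=["Header"]), activated, 2))  # False
print(is_zone_active(DataFieldZone(groups=["Detail"]), activated, 3))  # True
```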

Alternatively, you can assign a form definition group as the Data Field Zone's activation group to activate the zone for processing. Form definition groups can be used to extract only specific types of information (e.g., header data vs. detail data) during processing. In the Activation groups drop-down list, form definition groups are enclosed in brackets (e.g., [Group1]).
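Because form definition groups are shown in brackets, they can be told apart from ordinary activation groups by their delimiters. This Python sketch assumes a single comma-separated field may hold both kinds of names, which the text does not state, and uses hypothetical helper names and zone names to show how zones could be selected for, say, a header-only pass.

```python
from typing import Dict, List, Tuple


def split_activation_field(field_value: str) -> Tuple[List[str], List[str]]:
    """Separate plain activation groups from [bracketed] form definition groups."""
    activation, form_definition = [], []
    for raw in field_value.split(","):
        name = raw.strip()
        if len(name) >= 2 and name.startswith("[") and name.endswith("]"):
            form_definition.append(name[1:-1])
        elif name:
            activation.append(name)
    return activation, form_definition


def zones_for_definition_group(zones: Dict[str, str], wanted: str) -> List[str]:
    """Return the names of zones whose activation field lists [wanted]."""
    return [name for name, value in zones.items()
            if wanted in split_activation_field(value)[1]]


# Hypothetical zones: header data vs. detail data.
zones = {"InvoiceNumber": "[Header]", "LineItems": "[Detail]", "Total": "[Header], Totals"}
print(zones_for_definition_group(zones, "Header"))  # ['InvoiceNumber', 'Total']
```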