Identifying Keyword Values Using Tags or Regular Expressions

Advanced Capture

Platform: OnBase
Product: Advanced Capture
Release: Foundation 23.1
License: Premier, Standard, Essential
Note:

To access the Regular Expression Library, click in any text field that will receive a regular expression and press F2. See The Regular Expression Library for more information.

To identify Keyword Values using tags (literal or regular expression) or regular expressions:

  1. From the Grouped Line Item Extraction tab of the Data Field Zone dialog box, click New Tag to create the text tag or regular expression to be used to identify the Keyword Value. The Modify Line Item Tagged Value dialog box is displayed.
  2. Using the Keyword Type drop-down list, select the Keyword Type that the value you are identifying is to be assigned to.
    If you only want to capture this value as XML data (not as a Keyword Value), select <None>.
  3. Select the radio button that describes how the OCR engine will identify the Keyword Value:
    • Tag Text. The OCR engine searches the Data Field Zone for a tag (e.g., Name:, Date:, etc.) and identifies the value to the right of the tag as the Keyword Value.

      Note:

      When identifying a Keyword Value, a tag can only be used to identify the value to the right of the tag. A tag cannot be used to identify the value below the tag.

      Enter the tag text in the Tag Text field. Tags can be literal (i.e., the text specified in the field is the tag) or regular expressions (i.e., the text in the field defines a regular expression, and the text matching that pattern is the tag).

      Literal tags must be enclosed in quotation marks in order to differentiate them from regular expression tags. Regular expressions must be ECMA compliant.
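
The Tag Text behavior can be sketched in Python as follows. This is only an illustration, not the OCR engine's implementation: the function name find_tagged_value is hypothetical, the value is assumed to be everything to the right of the tag on the same line, and Python's re module stands in for an ECMA-compliant regular expression engine (the simple patterns used here behave the same in both).

```python
import re

def find_tagged_value(line, tag_spec):
    """Return the text to the right of the tag on this line, or None if no tag is found."""
    # Assumption: a spec enclosed in quotation marks is a literal tag;
    # anything else is treated as a regular expression tag.
    if len(tag_spec) >= 2 and tag_spec.startswith('"') and tag_spec.endswith('"'):
        pattern = re.escape(tag_spec[1:-1])
    else:
        pattern = tag_spec
    match = re.search(pattern, line)
    if match is None:
        return None
    # Only the value to the right of the tag is considered, never the value below it.
    return line[match.end():].strip()

literal_value = find_tagged_value("Name: Jane Doe", '"Name:"')
regex_value = find_tagged_value("Date : 01/15/2010", r"Date\s*:")
missing_value = find_tagged_value("Total 5", '"Name:"')
```

In this sketch the quoted tag "Name:" is matched character for character, while the unquoted tag Date\s*: tolerates optional whitespace before the colon.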

    • Regular Expression Match. The OCR engine searches the zone for text that matches the regular expression rule. If text matching the pattern set by the regular expression rule is found, it is identified as the Keyword Value.

      Enter the regular expression in the Regular Expression Match field. All regular expressions must be ECMA compliant.

      • Select the Extract Capture Group Only check box to use a regular expression to identify the Keyword Value but extract only part of the matched text as the Keyword Value. The portion you wish to capture must be enclosed in parentheses in the regular expression.

        For example: A document displays a student's Social Security Number (i.e., SSN: 123-45-6789), of which you only wish to capture the last four digits (6789) as a Keyword Value. To capture this value, you would select the Extract Capture Group Only check box and enter the following regular expression in the Regular Expression Match field: \d{3}-\d{2}-(\d{4}).

        Additionally, you can use the Extract Capture Group Only option to capture only one value from a row of line item data. For example: a document contains a Totals line that contains tallied data for five columns, of which you only want to capture the Total value from the third column (i.e., $310.71) as a Keyword Value.

        TOTAL $101.35 $177.01 $310.71 $285.00 $526.22

        To capture this value, you would select the Extract Capture Group Only check box and enter the following regular expression in the Regular Expression Match field, escaping each decimal point so that it matches only a literal period: TOTAL\s+\$\d{3}\.\d{2}\s+\$\d{3}\.\d{2}\s+(\$\d{3}\.\d{2})\s+\$\d{3}\.\d{2}\s+\$\d{3}\.\d{2}
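
Both Extract Capture Group Only examples can be tried outside the product with a short Python sketch. The extract helper and its capture_group_only parameter are hypothetical stand-ins for the check box, and Python's re module is used in place of an ECMA-compliant engine (these patterns behave the same in both). The decimal points in the totals pattern are escaped as \. so that they match only a literal period.

```python
import re

def extract(pattern, text, capture_group_only):
    """Return the whole match, or only the first capture group when requested."""
    match = re.search(pattern, text)
    if match is None:
        return None
    if capture_group_only and match.lastindex:
        return match.group(1)
    return match.group(0)

SSN_PATTERN = r"\d{3}-\d{2}-(\d{4})"
ssn_line = "SSN: 123-45-6789"
ssn_full = extract(SSN_PATTERN, ssn_line, capture_group_only=False)
ssn_last_four = extract(SSN_PATTERN, ssn_line, capture_group_only=True)

TOTALS_PATTERN = (
    r"TOTAL\s+\$\d{3}\.\d{2}\s+\$\d{3}\.\d{2}\s+"
    r"(\$\d{3}\.\d{2})"  # third column: the only parenthesized group
    r"\s+\$\d{3}\.\d{2}\s+\$\d{3}\.\d{2}"
)
totals_line = "TOTAL $101.35 $177.01 $310.71 $285.00 $526.22"
third_total = extract(TOTALS_PATTERN, totals_line, capture_group_only=True)
```

Without the option, the SSN pattern yields the full number; with it, only the parenthesized portion is returned.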

    • Person Name Extraction. Similar to the Tag Text option, the OCR engine searches the Data Field Zone for a literal tag or regular expression tag; however, once the tag is identified, the value following the tag is stored as a name instead of a single Keyword Value.

      When a name is stored, the extracted value is parsed into separate values (first name, middle name, and last name) and each value is stored separately as an individual Keyword Value.

      The following are valid name formats (middle name values are not required):

      • Last, First [Middle]

      • LAST, FIRST [MIDDLE]

      • First [Middle] Last

      • FIRST [MIDDLE] LAST

      To select the Keyword Types that the name values are assigned to:

      • Use the Last Name Keyword drop-down list (renamed from Keyword Type) to select the Keyword Type that the last name is assigned to.

      • Use the First Name drop-down list to select the Keyword Type that the first name is assigned to.

      • Use the Middle Name drop-down list to select the Keyword Type that the middle name is assigned to.

      Enter the tag text in the Person Name Extraction field. Tags can be literal (i.e., the text specified in the field is the tag) or regular expressions (i.e., the text in the field defines a regular expression, and the text matching that pattern is the tag).

      Literal tags must be enclosed in quotation marks in order to differentiate them from regular expression tags. Regular expressions must be ECMA compliant.
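
As a rough illustration of how the valid name formats could be split into separate values, consider the following Python sketch. The function parse_person_name and its dictionary keys are hypothetical, and the splitting rules are an assumption based only on the formats listed above: a comma means the last name comes first; otherwise the first word is the first name and the final word is the last name.

```python
def parse_person_name(value):
    """Split an extracted name into first, middle, and last name values."""
    value = value.strip()
    if "," in value:
        # "Last, First [Middle]" or "LAST, FIRST [MIDDLE]"
        last, _, rest = value.partition(",")
        parts = rest.split()
        first = parts[0] if parts else ""
        middle = " ".join(parts[1:])
        last = last.strip()
    else:
        # "First [Middle] Last" or "FIRST [MIDDLE] LAST"
        parts = value.split()
        first = parts[0] if parts else ""
        last = parts[-1] if len(parts) > 1 else ""
        middle = " ".join(parts[1:-1])
    return {"first": first, "middle": middle, "last": last}

comma_form = parse_person_name("Doe, Jane Marie")
plain_form = parse_person_name("Jane Marie Doe")
no_middle = parse_person_name("JANE DOE")
```

Each of the three returned values would then be stored under the Keyword Type selected in the corresponding drop-down list.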

    • Meta Data Value. The OCR engine identifies the Keyword Value using data obtained when processing the document. Use the Meta Data Value drop-down list to select the data to be used as the Keyword Value:

      • <Group Number>. This option refers to the order of the groups in the zone. For example, if a zone contains three groups (e.g., Fall 2009, Spring 2010, Fall 2010), then Group Number for the first group (Fall 2009) is 1, the Group Number for the second group (Spring 2010) is 2, and the Group Number for the third group (Fall 2010) is 3.

      • <Line Number>. This option refers to the line on which the data is displayed once the zone is processed. The first line at the top of the zone is 1, the second line is 2, etc.

      • <None>. This option is reserved for future use.

      • <Page Number>. This option refers to the page on which the data is displayed.

    • Line Item Totals Marker. The OCR engine identifies the Keyword Value as the last line in the group (i.e., a “totals” line where column data is tallied).

      Enter the tag text in the Line Item Totals field. Tags can be literal (i.e., the text specified in the field is the tag) or regular expressions (i.e., the text in the field defines a regular expression, and the text matching that pattern is the tag).

      Select the No tag value/always last line in group check box if the data is not denoted by a tag but should be captured because it is the last line in the group.

  4. Select the radio button that describes the location of the tag or regular expression in the group.
    • This item PRECEDES the associated line item group. Select this radio button if the Keyword Value is located before the line item data.

      • When this option is selected, the Duplicate value does not start new logical group check box is enabled. Select this check box to continue the same group when a duplicate tag value is found. If you do not select this option, a new logical group will be started when a duplicate tag value is found.
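
The effect of the Duplicate value does not start new logical group check box can be sketched in Python. This is a simplified model, not the product's grouping logic: split_into_groups and duplicate_continues_group are hypothetical names, the tag value is assumed to be the rest of the line after the tag, and a "duplicate" is assumed to mean a tag value equal to that of the current group.

```python
import re

def split_into_groups(lines, tag_pattern, duplicate_continues_group=False):
    """Group line items under the PRECEDES tag value that appears before them."""
    groups = []
    current = None
    for line in lines:
        match = re.search(tag_pattern, line)
        if match:
            value = line[match.end():].strip()
            same_value = current is not None and current["tag_value"] == value
            # A duplicate value continues the group only when the option is on.
            if not (duplicate_continues_group and same_value):
                current = {"tag_value": value, "items": []}
                groups.append(current)
        elif current is not None:
            current["items"].append(line)
    return groups

lines = [
    "Term: Fall 2009", "PSY 103 A",
    "Term: Fall 2009", "ENG 101 B",
    "Term: Spring 2010", "MAT 201 A",
]
default_groups = split_into_groups(lines, r"Term:")
merged_groups = split_into_groups(lines, r"Term:", duplicate_continues_group=True)
```

With the option off, the repeated Fall 2009 tag starts a second group; with it on, both Fall 2009 line items fall into one group.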

    • This item FOLLOWS the associated line item group. Select this radio button if the Keyword Value is located after the line item data.

      • When this option is selected, the Force logical group to end regardless of future repeating values check box is enabled. Select this check box to force the current group of data to end and ensure that a new group will be started once a new line item tag value is found for the next group. This prevents the next group from being appended to the current group, even if the first line of tagged values in the next group is identical.

        This may be desirable in explanation of benefit forms (EOBs), for instance, when a single patient has multiple claims with different claim numbers, but the claim numbers do not appear in the first line of tags for the new group. In this case, even though the first line of tags will be identical, the new group should not be appended to the current group.

      Note:

      If there is a page break in the middle of the associated line item group and the FOLLOWS tag appears on the second page, this tag does not capture any line item values from the associated group on the first page.

    • Treat this row as a line item after reading this tagged value. Select this radio button if the line item containing the Keyword Value should also be treated as a row of line item data.

    • Treat this row as a line item/tag value starts new group. Select this radio button if the line item containing the Keyword Value should be treated as a row of line item data and the Keyword Value that starts each new group is located in the leftmost position of the line item, concurrent with the line items for the first row of each group and not appearing again until the first row of the next group.

      In the example below, the highlighted Keyword Values mark the beginning of each group:

    • The position of this item is not significant. Select this radio button if the location of the Keyword Value is not significant in the zone.

      • When both this option and either the Tag Text, Regular Expression Match, or Person Name Extraction option is selected, the Carry Across Groups check box is enabled. Select this check box to apply the extracted value to the current group and all subsequent groups until a new value is read for this tag. When a new value is read, this loop starts over.

        For example, an account number may be extracted as a top-level value. This account number applies to the current group and all subsequent groups, each of which has start and end markers, until a new account number is extracted. The pattern then repeats for the new account number.
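
The Carry Across Groups behavior described above amounts to a fill-forward over the groups. A minimal Python sketch follows, assuming each group is represented as a dictionary in which a missing value is None; the function carry_across_groups and the group layout are hypothetical.

```python
def carry_across_groups(groups, key):
    """Apply the most recently read value to each group until a new value is read."""
    carried = None
    result = []
    for group in groups:
        if group.get(key) is not None:
            carried = group[key]  # a new value was read; the loop starts over
        result.append({**group, key: carried})
    return result

groups = [
    {"account": "A-100", "items": ["line 1"]},
    {"account": None, "items": ["line 2"]},
    {"account": "B-200", "items": ["line 3"]},
    {"account": None, "items": ["line 4"]},
]
filled = carry_across_groups(groups, "account")
accounts = [group["account"] for group in filled]
```

Here account A-100 applies to the first two groups, and B-200 applies from the third group onward.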

    • This entire line of data should be ignored. Select this radio button if the entire line containing the matching tag or regular expression should be ignored.

      • This option is only available if the value is configured to be captured as XML data (i.e., if <None> is selected in the Keyword Type drop-down list).

  5. Select the Data Value required check box if the Keyword Value is a required value. If this check box is selected and no value is found, then the entire group is discarded.
  6. If you wish to remove all whitespace (i.e., spaces, paragraph returns, etc.) from a Keyword Value after it is read by the OCR engine, select the Remove all whitespace from result check box.

    For example, if the Keyword Value read by the OCR engine is PSY 103 and this option is selected, the Keyword Value is modified to PSY103.

    Note:

    The Remove all whitespace from result check box is only available if you selected the Tag Text or Regular Expression Match option in step 3. Once the OCR engine finds a value following the specified tag or matching the specified regular expression, it removes all whitespace from the result.

  7. If you wish to ignore all whitespace when searching for a text tag or potential regular expression match, select the Ignore all whitespace in tag or regular expression search check box.
    Note:

    The Ignore all whitespace in tag or regular expression search check box is only available if you selected the Tag Text, Regular Expression Match, Person Name Extraction, or Line Item Totals Marker option in step 3. The OCR engine removes all whitespace from the text it is searching for before finding a match.
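
The two whitespace options in steps 6 and 7 can be illustrated with a short Python sketch. Both function names are hypothetical. For the search option, the sketch assumes a literal tag and strips whitespace from both the tag and the searched text, which is one plausible reading of the note above rather than a confirmed description of the engine.

```python
import re

def remove_all_whitespace(text):
    """Strip spaces, tabs, and line breaks, as applied to the result in step 6."""
    return re.sub(r"\s+", "", text)

def tag_found_ignoring_whitespace(line, literal_tag):
    """Return True if the literal tag occurs once whitespace is ignored (step 7)."""
    return remove_all_whitespace(literal_tag) in remove_all_whitespace(line)

cleaned_value = remove_all_whitespace("PSY 103")
spaced_tag_found = tag_found_ignoring_whitespace("S S N : 123-45-6789", "SSN:")
absent_tag_found = tag_found_ignoring_whitespace("Name: Jane Doe", "SSN:")
```

The second option is useful when OCR output inserts stray spaces inside a tag, as in the "S S N :" line above.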

  8. If you selected a Date, Date & Time, or Currency Keyword Type in step 2, the Override Default Regional Setting drop-down list is displayed above the Save button. If you have not selected one of these Keyword Types, or if you have selected the Person Name Extraction, Meta Data Value, or Line Item Totals Marker option, the Override Default Regional Setting drop-down list is disabled.

    When available, you can use the Override Default Regional Setting option to override your system's default regional settings when extracting values for the applicable Keyword Types in the Data Field Zone. To use this option, do one of the following:

    • Select a regional language from the drop-down list to parse the applicable Keyword Values using the selected region's formatting rules and language-specific names for days, months, etc.

    • Select <Runtime System Default> to maintain your system's default regional settings when parsing these values.

      Note:

      This option does not affect which languages the OCR format is configured to recognize. For information on configuring OCR formats, see the Full-Page OCR module reference guide or help files.
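
To see why a regional override matters, the Python sketch below parses the same text under two sets of formatting rules. The REGION_RULES table is a hand-written, hypothetical stand-in for real regional settings and covers only numeric dates and plain amounts; it is not how the product stores or applies its regional data.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical formatting rules for two regions (illustration only).
REGION_RULES = {
    "en-US": {"date": "%m/%d/%Y", "thousands": ",", "decimal": "."},
    "de-DE": {"date": "%d.%m.%Y", "thousands": ".", "decimal": ","},
}

def parse_amount(text, region):
    """Parse a currency amount using the region's separators."""
    rules = REGION_RULES[region]
    digits = text.strip().replace(rules["thousands"], "")
    return Decimal(digits.replace(rules["decimal"], "."))

def parse_date(text, region):
    """Parse a numeric date using the region's day/month order."""
    return datetime.strptime(text.strip(), REGION_RULES[region]["date"]).date()

us_amount = parse_amount("1,234.56", "en-US")
de_amount = parse_amount("1.234,56", "de-DE")
us_date = parse_date("03/04/2010", "en-US")
de_date = parse_date("03.04.2010", "de-DE")
```

The same digits produce March 4 under the en-US rules and April 3 under the de-DE rules, which is the kind of ambiguity the override is meant to resolve.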

  9. If you would like to assign specific colors to the Keyword Type configured for the Data Field Zone, click the Colors button. The Display Colors dialog box is displayed.

    Here you can change the colors in which any regular or suspect values for the corresponding Keyword Type are displayed in the Indexing panel once processing has taken place.

    • To change the display color for regular values, click Display Color to open your machine's color palette, select a color, and click OK.

    • To change the display color for suspect values, click Suspect Color to open your machine's color palette, select a color, and click OK.

    • To revert to the default display color for regular or suspect values, click the Automatic button that corresponds to the desired type of values (i.e., the left button for regular values, the right button for suspect values).

      Note:

      Any colors assigned here can be overridden by colors assigned through Keyword Lookup/Replace settings and/or VB scripting.

  10. If the form is configured to create an XML rendition of documents matched to it (i.e., the Create XML data rendition option is selected for the form), the XML Node drop-down list is displayed.

    Use the XML Node drop-down list to select the XML node (i.e., the element) that the Keyword Value is contained in when the XML rendition of the document is created.

  11. Click Save. The Modify Line Item Tagged Value dialog box is closed and you are returned to the Grouped Line Item Extraction tab of the Data Field Zone dialog box.
  12. If you wish to edit a configured Keyword Value identification type, double-click on its entry in the table to re-open the Modify Line Item Tagged Value dialog box, change any desired settings, and click Save.
    Note:

    If you hover over the list of configured tags/expressions in the Grouped Line Item Extraction tab of the Data Field Zone dialog box, the list will be expanded to show more rows. When you stop hovering over the list, it will return to the default size.