Example for a Supervised Learning Workflow - Designer - Foundation 23.1 - Brainware

Brainware Intelligent Capture Designer

Suppose you have an existing vendor class named Invoice that processes invoices from the supplier Company International Technology, S.A. The Supervised Learning Workflow (SLW) can create a document class with the same purpose but a different name, such as BW1234567, and this may cause conflicts.

To avoid conflicts when combining non-SLW and SLW-processed class nodes, consider the following cases.

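Criteria                                                                                                     | Case A | Case B | Case C | Case D
-------------------------------------------------------------------------------------------------------------|--------|--------|--------|-------
Existing classes conflict with potential classes                                                             | No     | Yes    | Yes    | Yes
Automatic Learning required                                                                                  | NA     | No     | Yes    | Yes
Level 1 class (derived from Base DocClass)                                                                   | NA     | No     | Yes    | No
Base class does not satisfy requirements for SLW execution, or classification field is improperly configured | NA     | NA     | NA     | Yes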
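
The criteria above can be read as a decision procedure applied to each existing document class. The following Python sketch is illustrative only and is not part of the product: the function name and parameter names are hypothetical, and the treatment of Case D (not at Level 1, or base class not satisfying the SLW constraints) follows the Case D description rather than any product API.

```python
def migration_case(conflicts_with_slw: bool,
                   needs_auto_learning: bool = False,
                   is_level_1: bool = False,
                   base_class_ok: bool = False) -> str:
    """Return the migration case ("A".."D") for one existing document class.

    All parameter names are hypothetical; they mirror the table rows:
      conflicts_with_slw  - existing class coincides with a class SLW might create
      needs_auto_learning - automatic learning is required for the class
      is_level_1          - class is derived directly from a base document class
      base_class_ok       - base class satisfies the SLW execution constraints
    """
    if not conflicts_with_slw:
        return "A"  # no overlap: run SLW alongside the existing configuration
    if not needs_auto_learning:
        return "B"  # overlap, but learning is switched off (Vendor_Type = 2)
    if is_level_1 and base_class_ok:
        return "C"  # overlap, learning required, hierarchy already suitable
    return "D"      # overlap, learning required, class must be moved or base class fixed
```

For example, `migration_case(True, needs_auto_learning=True, is_level_1=False, base_class_ok=True)` yields Case D, because the class would first have to be moved to Level 1.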

Case A

The existing set of classes does not coincide with the set of classes that the Supervised Learning Workflow might create.

This case is the easiest to handle. All you have to do is create new base classes or use existing base classes required for the Supervised Learning processing, and then run the Supervised Learning Workflow in conjunction with existing classification and extraction project configurations without any additional adjustments.

Case B

The existing set of classes does coincide with the set of classes that the Supervised Learning Workflow might create. Automatic learning for these classes is not required.

Case B occurs when you want to keep an existing document class with its learnset, scripting, and other settings, as is, and all documents should still be classified to this class. It is conceivable that Supervised Learning could create a new document class by automatic learning for this document. In that case, you would have to switch off automatic learning for this document class.

To disable “automatic class creation” for a particular class, set the vendor type field in the CSV file to 2 for the corresponding class search entry. For details, see About the Associative Search Engine.

If the Vendor_Type column is not part of the vendor pool, you must create or designate a new column in the vendor pool CSV file to be used for Associative Search Engine search database import. This field can have one of the following three values: 0, 1, or 2. By default, when the field is not available in the vendor pool, the system acts as if the Vendor_Type value is 0. A Vendor_Type value of 0 means that all documents with more than N% invalid fields are added to the learnset. For more details, see the Brainware Intelligent Capture Verifier Help.

You must re-launch the pool import in the Associative Search settings to establish the Classification Field column as the Vendor_Type field. Do not use this field as a search field.
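
As a rough illustration of preparing the vendor pool file before re-importing it, the following Python sketch adds a Vendor_Type column when it is missing and sets it to 2 for entries whose automatic class creation should be disabled. The column names `VendorID` and `Name`, the semicolon delimiter, and the function name are assumptions made for the example; use the layout of your own vendor pool CSV file.

```python
import csv
import io


def add_vendor_type(csv_text, locked_ids, id_column="VendorID",
                    type_column="Vendor_Type", delimiter=";"):
    """Return CSV text with a Vendor_Type column.

    Rows whose id_column value is in locked_ids get Vendor_Type 2
    (automatic class creation disabled). Rows without a value get 0,
    which matches the documented default. Existing values are kept.
    Column names and delimiter are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    fieldnames = list(reader.fieldnames or [])
    if type_column not in fieldnames:
        fieldnames.append(type_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[id_column] in locked_ids:
            row[type_column] = "2"
        elif not row.get(type_column):
            row[type_column] = "0"
        writer.writerow(row)
    return out.getvalue()


sample = "VendorID;Name\n1001;Company\n1002;Other\n"
result = add_vendor_type(sample, {"1001"})
```

After generating the file, the pool import still has to be re-launched in the Associative Search settings as described above.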

In the scenario, the illustration at right shows how a migrated project might look. This illustration shows a sample hierarchy consisting of the old document class Companies with its derived document classes and a new base class conceived for SLW, Invoices.

In the scenario, there is an entry Company in the vendor pool, but learning is prohibited for this entry. Although no new document class for Company will be created, you still must ensure that the system classifies the documents to the “old” document class, Company Invoices, and not to Invoices. For the example, those document classes are situated within different tree branches. To ensure correct classification, the following project configurations are possible. There are other possible configurations, depending on the specific project being migrated.

  • Move part of the document class tree to the new Invoices class. In the illustration at right, automatic learning for Company Invoices is disabled and all the configurations (such as for learnset) are retained. Because this part of the class hierarchy does not have to be trained, the derived document class does not have to be at Level 1.
  • Do not change the document class hierarchy. For the document class Invoices, do not select a classification engine, but do select one for the Companies branch. Additionally, set the default classification to Invoices. This means that if a document cannot be classified to a document class for which classification engines are selected, it is classified to Invoices.
  • Use scripts to ensure that the document is classified to the correct document class.
 

Case C

The existing set of classes does coincide with the set of classes that the Supervised Learning Workflow might create. Automatic learning for these classes is required. In addition, these classes are Level 1 classes, meaning that they are derived from a base document class.

If Supervised Learning must be enabled for an existing class that is a derived document class at Level 1, and its base class satisfies the constraints for Supervised Learning execution, the Vendor_Type value in the CSV file must be 0 for the corresponding class entry.

In addition, the existing class name must coincide with the class name that SLW generates automatically from the Classname Format specified in the ASE settings. To achieve this, do one of the following:

  • Rename the existing class so that its name coincides with the name that is generated through the Classname Format for the corresponding entry in a CSV file.
  • Change Classname Format so that the names are the same. This can only be applied if it can be done for all classes that “Case C” covers. In addition, uniqueness of all new document class names must be guaranteed.
  • Create a new (non-search) field in the CSV file and give this field a descriptive name, for example My Document Class Name. In the CSV file generation procedure, fill this field with the existing class name for every class described by Case C. For all other classes, create a name according to the field naming conventions (for example, vendor name, space, unique vendor ID). Then, in the settings for the Associative Search Engine, use [My Document Class Name] as the Classname Format.
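
The third option can be sketched as a small step in the CSV generation procedure. This Python example is a hedged illustration, not product code: the row keys `VendorID` and `Name`, the mapping of vendor IDs to existing class names, and the function name are all hypothetical.

```python
CLASS_NAME_FIELD = "My Document Class Name"  # descriptive, non-search field


def fill_class_names(rows, case_c_names):
    """Add the class name field to each vendor pool row.

    rows         - list of dicts with hypothetical keys "VendorID" and "Name"
    case_c_names - dict mapping a vendor ID to the name of its existing
                   (Case C) document class
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        vendor_id = row["VendorID"]
        if vendor_id in case_c_names:
            # Case C: reuse the existing class name so that SLW maps to it.
            new_row[CLASS_NAME_FIELD] = case_c_names[vendor_id]
        else:
            # All other classes: vendor name, space, unique vendor ID.
            new_row[CLASS_NAME_FIELD] = f"{row['Name']} {vendor_id}"
        filled.append(new_row)
    return filled


rows = [
    {"VendorID": "1001", "Name": "Company"},
    {"VendorID": "1002", "Name": "Other"},
]
filled = fill_class_names(rows, {"1001": "Company Invoices"})
```

Including the unique vendor ID in the generated names is one way to keep the new class names unique, which the Classname Format requires.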

The base document classes must satisfy the constraints for execution of the Supervised Learning Workflow: the Classification Field settings are configured correctly, the generic document class is at Level 0 of the classification hierarchy, and the Associative Search Engine is configured. For details, see About Supervised Learning.

Case D

The existing classes do coincide with the set of classes that might be created by the Supervised Learning Workflow. Automatic learning for these classes is required. These classes are not at Level 1 in the BIC project class hierarchy, or their base classes do not satisfy the constraints for Supervised Learning, meaning that the Classification Field settings are not correctly configured.

If the Supervised Learning Workflow must be enabled for a particular existing class that is not a Level 1 class, or whose base class does not satisfy the constraints for SLW execution, the Vendor_Type value in the CSV file must be 0 for the corresponding class entry.

In this case, follow the instructions covered for Case C.

In addition, the Case D class must be moved under another base class so that the document class is at Level 1. The easiest way to do this in Designer is to switch to Definition Mode, open the Classes view, and drag and drop the class onto the required base class. Alternatively, implement a scripted procedure that uses Project.MoveDocClass and then saves the project.

The following diagram illustrates the steps that the system administrator must consider for each document class when migrating a project.

Note: Classes created by the Supervised Learning Workflow cannot have the same names as existing document classes. To ensure that this never happens, you must set up the classname format in the Associative Search Engine settings in a unique way, so that existing class names and new class names never overlap. For more details about Supervised Learning Workflow constraints, see About Supervised Learning.
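
Before importing the vendor pool, it can help to check the names that the Classname Format would produce against the existing class names. The following Python sketch assumes you can export both lists yourself; the function name, the exact-match comparison, and the `intended` parameter (for Case C classes whose names are meant to coincide) are assumptions for illustration.

```python
from collections import Counter


def find_name_problems(existing, generated, intended=frozenset()):
    """Return (overlaps, duplicates) for a planned class name set.

    existing  - names of document classes already in the project
    generated - names the Classname Format would produce, one per pool entry
    intended  - names that are meant to coincide (Case C classes)
    Names are compared as exact strings, which is an assumption.
    """
    existing = set(existing)
    counts = Counter(generated)
    # Generated names used by more than one pool entry are not unique.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # Generated names that hit an existing class without being planned.
    overlaps = sorted(name for name in counts
                      if name in existing and name not in intended)
    return overlaps, duplicates


overlaps, duplicates = find_name_problems(
    existing=["Invoices", "Company Invoices"],
    generated=["Company Invoices", "Other 1002", "Other 1002", "Invoices"],
    intended={"Company Invoices"},
)
```

Here `overlaps` reports `Invoices` as an unplanned collision and `duplicates` reports `Other 1002` as a non-unique generated name; both would need a change to the Classname Format or to the pool data.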