About Data Extraction - Brainware Intelligent Capture Designer - Foundation 23.1 - 2024-01-22

Brainware Intelligent Capture Designer

To set up data extraction, complete the following procedures.

  • Create the data fields
  • Specify the analysis engine to obtain candidates
  • Specify the evaluation engine to select the correct candidate
  • Configure the selected engines
  • Apply validation settings
  • Test and optimize the data extraction

A candidate can be assigned to a field only if its weight exceeds a predefined threshold. You can adjust the threshold values and the evaluation algorithms, but in most cases the default settings are sufficient.

For now, the following background knowledge is sufficient.

  • The weight indicates the degree of similarity between the properties of a candidate and the properties of user-selected candidates.
  • Data can be extracted if the weight of the best candidate with respect to a given field exceeds a predefined threshold. By default, this threshold is 50 percent.
  • In general, there is only one successful candidate per field and document. If several candidates have a high weight with respect to the same field and document, it must be possible to distinguish between them reliably. Therefore, a certain difference in weight between the winning candidate and the second-best candidate is required as well. By default, this so-called distance is 10 percent.
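
The threshold and distance rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described here, not the product's actual implementation: the names `Candidate` and `select_candidate` are hypothetical, weights are expressed in percent, and treating a lead of exactly 10 percent as sufficient is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a field candidate found on a document."""
    text: str      # the candidate string, for example an invoice number
    weight: float  # weight in percent, 0 to 100


def select_candidate(
    candidates: Sequence[Candidate],
    threshold: float = 50.0,  # default threshold named in the text
    distance: float = 10.0,   # default distance named in the text
) -> Optional[Candidate]:
    """Return the winning candidate for one field, or None if none qualifies."""
    ranked = sorted(candidates, key=lambda c: c.weight, reverse=True)
    if not ranked:
        return None
    best = ranked[0]
    # Rule 1: the weight of the best candidate must exceed the threshold.
    if best.weight <= threshold:
        return None
    # Rule 2: the lead over the second-best candidate must reach the distance
    # (assumption: a lead equal to the distance is accepted).
    if len(ranked) > 1 and best.weight - ranked[1].weight < distance:
        return None
    return best
```

With the defaults, a best candidate at 82 percent followed by one at 60 percent is accepted, whereas candidates at 78 and 74 percent are both rejected because the lead is too small to distinguish them reliably.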

Extraction means that selected data from a document is automatically written to an extraction file. In general, classification is a precondition for extraction because the fields to be extracted usually differ from class to class. If you do not need to distinguish between document classes and only want to perform extraction, you can set up a dummy classification with a single class, which must be defined as the default class. For more information, see About Project-Level Default Classes.

For each class, identify the business processes that use the documents. Identify the data that is required by subsequent systems. Then define the set of fields that are to be filled per class.
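
As a planning aid, the per-class field sets can be written down as a simple mapping before you create the fields in the Designer. The sketch below is not a product file format. The class names, field names, and the helper `fields_to_extract` are hypothetical, and the fallback to a single default class only models the extraction-only case described above.

```python
from typing import Dict, List, Optional

# Hypothetical classes and fields, chosen only for illustration.
FIELDS_BY_CLASS: Dict[str, List[str]] = {
    "Invoices": ["InvoiceNumber", "InvoiceDate", "TotalAmount"],
    "Orders": ["OrderNumber", "CustomerNumber"],
}

# Extraction-only project: one class, defined as the default.
SINGLE_CLASS_PROJECT: Dict[str, List[str]] = {
    "Document": ["ReferenceNumber", "Date"],
}


def fields_to_extract(
    document_class: Optional[str],
    fields_by_class: Dict[str, List[str]],
    default_class: Optional[str] = None,
) -> List[str]:
    """Return the fields to fill for a document of the given class.

    If no class was determined, fall back to the default class
    (assumption: this mirrors the dummy classification with one class).
    """
    name = document_class if document_class is not None else default_class
    if name is None or name not in fields_by_class:
        raise KeyError(f"no field set defined for class {name!r}")
    return list(fields_by_class[name])
```

For example, `fields_to_extract("Orders", FIELDS_BY_CLASS)` yields the two order fields, while `fields_to_extract(None, SINGLE_CLASS_PROJECT, default_class="Document")` yields the fields of the single default class.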