About Creating Learnsets

Brainware Intelligent Capture Designer

Platform: Brainware
Product: Designer
Release: Foundation 23.1

The set of sample documents that you provide for data extraction is called the extraction learnset. Select the samples carefully, because the quality of your learnset is crucial to successful extraction.

When you select the sample documents for the learnsets, consider the following.

  • Use only documents with good OCR results for the learnset. When in doubt, highlight the OCR results to review the documents. Never use handwritten documents.
  • Use only documents that generate candidates for all relevant fields.
  • Where possible, prefer documents that generate several candidates for each field rather than just one. The system analyzes the words surrounding each candidate and learns not only positive indicators from the correct candidates but also negative indicators from the incorrect ones.
  • The number of samples needed for a good extraction learnset depends on the task. In general, extraction requires more samples than classification.
  • You can learn either all fields defined for a class at once or one field at a time. If you learn all fields at once, configure your project settings so that the first field is selected when you switch to the next document. If you learn one field at a time, configure your project settings so that the field selection does not change when you switch to the next document.
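To make the selection criteria above concrete, the following Python sketch screens a pool of candidate sample documents. It is a hypothetical illustration only: the `SampleDoc` record, its fields, the `MIN_OCR_CONFIDENCE` threshold, and the helper functions are assumptions invented for this example and are not part of Designer or any Brainware API. It models only the selection logic: exclude handwritten documents and documents with poor OCR, require candidates for every relevant field, and prefer documents with several candidates per field.

```python
from dataclasses import dataclass, field

# Hypothetical record for one candidate sample document.
# Field names and the threshold below are illustrative assumptions,
# not part of the Brainware Intelligent Capture product.
@dataclass
class SampleDoc:
    name: str
    ocr_confidence: float  # assumed 0.0-1.0 summary of OCR quality
    handwritten: bool
    candidates: dict[str, int] = field(default_factory=dict)  # field name -> candidate count

MIN_OCR_CONFIDENCE = 0.9  # assumed cut-off for "good OCR results"

def is_eligible(doc: SampleDoc, fields: list[str]) -> bool:
    """Hard rules: good OCR, not handwritten, at least one candidate for every field."""
    if doc.handwritten or doc.ocr_confidence < MIN_OCR_CONFIDENCE:
        return False
    return all(doc.candidates.get(f, 0) >= 1 for f in fields)

def select_learnset(pool: list[SampleDoc], fields: list[str]) -> list[SampleDoc]:
    """Keep eligible documents, ranking those with more candidates per field first."""
    eligible = [d for d in pool if is_eligible(d, fields)]
    # Rank by the smallest candidate count across all fields, highest first,
    # so documents offering competing candidates for every field come first.
    return sorted(eligible, key=lambda d: min(d.candidates[f] for f in fields), reverse=True)

# Usage with made-up sample data.
fields = ["InvoiceNumber", "InvoiceDate", "Total"]
pool = [
    SampleDoc("inv_001", 0.97, False, {"InvoiceNumber": 3, "InvoiceDate": 2, "Total": 4}),
    SampleDoc("inv_002", 0.95, False, {"InvoiceNumber": 1, "InvoiceDate": 1, "Total": 1}),
    SampleDoc("inv_003", 0.60, False, {"InvoiceNumber": 2, "InvoiceDate": 2, "Total": 2}),  # poor OCR
    SampleDoc("inv_004", 0.98, True, {"InvoiceNumber": 2, "InvoiceDate": 3, "Total": 2}),   # handwritten
    SampleDoc("inv_005", 0.99, False, {"InvoiceNumber": 2, "Total": 3}),                     # no InvoiceDate candidate
]
print([d.name for d in select_learnset(pool, fields)])  # prints ['inv_001', 'inv_002']
```

Ranking by the smallest candidate count across fields is one possible way to express the preference for documents with several candidates per field; the exact ordering rule is an assumption, and the sketch does not model how many samples a given task needs.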