Classification Learnsets

Brainware Intelligent Capture Designer, Foundation 23.1

The set of sample documents you provide for classification is called the classification learnset. Because the quality of your learnset is crucial to classification success, select the samples for each class carefully. Consider the following guidelines when selecting sample documents for the learnsets.

  • You need a separate learnset for each class.
  • You need sample documents that truly represent the topic you want to cover.
  • You need at least five sample documents per class. The maximum number of samples per class is limited by memory.
  • Use single-topic documents.
  • Use unique documents for each learnset.
  • Avoid using similar documents for different classes.
  • Use only documents with good OCR results for the learnset. When in doubt, highlight the OCR results to review the documents.
  • Never use handwritten documents.
  • The number of samples needed to create a good learnset depends on the classification task. The broader a class is supposed to cover, the more samples are required.
  • Increase the number of samples if classes are very similar to each other.
  • Ideally, use a similar number of samples for each class.
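
Some of the guidelines above can be checked mechanically before you build the learnsets. The following Python sketch is not part of Brainware Intelligent Capture and does not use any Designer API. It assumes you have already exported the OCR text of your samples into a hypothetical mapping of class name to document texts. It checks the minimum of five samples per class, flags identical documents that occur more than once, and warns when sample counts differ strongly between classes. The function name `check_learnsets` and the `max_imbalance` threshold are illustrative assumptions, not product settings.

```python
import hashlib
from collections import defaultdict

MIN_SAMPLES = 5  # minimum number of samples per class, taken from the guidelines


def check_learnsets(learnsets, max_imbalance=2.0):
    """Return a list of warnings for a mapping {class_name: [document_text, ...]}.

    max_imbalance is an assumed threshold: a warning is raised when the largest
    learnset holds more than max_imbalance times the samples of the smallest.
    """
    warnings = []

    # Guideline: at least five sample documents per class.
    for name, docs in learnsets.items():
        if len(docs) < MIN_SAMPLES:
            warnings.append(
                f"{name}: only {len(docs)} samples, need at least {MIN_SAMPLES}"
            )

    # Guideline: use unique documents for each learnset.
    # Only byte-identical texts are detected here, not merely similar ones.
    seen = defaultdict(list)
    for name, docs in learnsets.items():
        for text in docs:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            seen[digest].append(name)
    for names in seen.values():
        if len(names) > 1:
            classes = ", ".join(sorted(set(names)))
            warnings.append(f"duplicate document in: {classes}")

    # Guideline: ideally, a similar number of samples for each class.
    counts = [len(docs) for docs in learnsets.values() if docs]
    if counts and max(counts) > max_imbalance * min(counts):
        warnings.append(
            f"unbalanced learnsets: smallest {min(counts)}, largest {max(counts)}"
        )

    return warnings
```

An empty result means none of the three checks fired. It does not replace a manual review, because guidelines such as OCR quality, single-topic content, and similarity between classes still require inspecting the documents themselves.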

The following classification engines require learnsets:

  • Brainware Classify Engine
  • Layout Classify Engine
  • ASSA Classify Engine
  • Brainware Layout Classification