In classification, you define classes to which BIC assigns documents.
The confidence value returned by a classification engine indicates how similar the document being processed is to the documents in the learnset of a given class.
BIC can assign a document to a class only if the relevant confidence values exceed predefined thresholds. Although you can adjust the threshold values and the evaluation algorithms, the default settings are suitable for most cases.
Project Classification Properties
- Threshold: BIC can classify a document if its confidence value with respect to a given class exceeds the specified threshold. The default threshold is 70 percent.
- Distance: In general, there is only one target class per document. If several classes have a high confidence with respect to a document, BIC must be able to distinguish between them reliably. This requires a minimum difference in confidence between the winning class and the second-best competitor. By default, the distance is 20 percent.
- NO Threshold: Some classification engines let you specify this value. A document cannot belong to a specific class if the class weight is below this value.
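The interplay of Threshold and Distance can be illustrated with a minimal Python sketch. This is not BIC's actual evaluation algorithm: the function name classify, its parameters, and the sample class names are hypothetical. The sketch assumes that confidences are percentages, that the winning confidence must strictly exceed the threshold, and that a lead of exactly the distance value is sufficient. The NO Threshold is not modeled.

```python
from typing import Dict, Optional


def classify(confidences: Dict[str, float],
             threshold: float = 70.0,
             distance: float = 20.0) -> Optional[str]:
    """Return the winning class name, or None if no class can be assigned.

    Hypothetical helper for illustration only; values are percentages.
    """
    if not confidences:
        return None

    # Rank classes from highest to lowest confidence.
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    best_name, best_value = ranked[0]

    # Threshold: the winning confidence must exceed the threshold.
    if best_value <= threshold:
        return None

    # Distance: the winner must lead the second-best class by at least
    # the configured distance (assumption: an exact tie on the distance passes).
    if len(ranked) > 1 and best_value - ranked[1][1] < distance:
        return None

    return best_name


# Clear winner: above the threshold and 37 points ahead of the runner-up.
print(classify({"Invoice": 92.0, "Order": 55.0}))
# Ambiguous: both classes score high and are only 12 points apart.
print(classify({"Invoice": 92.0, "Order": 80.0}))
```

With the default values, the first call yields the class Invoice, while the second call yields no class because the two candidates are too close to distinguish reliably.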
Notes
- If you applied multiple classification methods to a given class, BIC computes a combined result.
- Ensure that all documents you want to process within the same project are scanned at the same resolution. If you want to use the Brainware Table Extraction engine, the resolution of all documents must be 300 DPI.