Standard classification uses the threshold and distance parameters defined at the project level to decide whether a document can be classified.
- Threshold defines the minimum confidence that the best class must reach.
- Distance defines the minimum difference in confidence between the best class and the second-best class.
Both requirements must be met for a document to be classified.
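The decision rule above can be sketched in Python as follows. The function name `classify` is hypothetical. The sketch assumes both limits are inclusive (a confidence equal to the threshold, or a gap equal to the distance, passes), which the text does not state explicitly. It also assumes that a document with only one candidate class has no competitor and therefore always satisfies the distance requirement. The calls at the end use the scores from the examples in this section.

```python
from typing import Mapping, Optional


def classify(scores: Mapping[str, float], threshold: float, distance: float) -> Optional[str]:
    """Return the best class, or None if standard classification is not possible.

    Assumption: both limits are inclusive (>=); the source only says "minimum".
    """
    if not scores:
        return None
    # Rank classes by confidence, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_class, best_score = ranked[0]
    # Assumption: with a single class there is no competitor, so the gap is unlimited.
    second_score = ranked[1][1] if len(ranked) > 1 else float("-inf")

    meets_threshold = best_score >= threshold
    meets_distance = best_score - second_score >= distance
    return best_class if meets_threshold and meets_distance else None


# Scores from the three examples in this section, threshold 70, distance 20.
print(classify({"Class A": 88, "Class B": 67, "Class C": 63}, 70, 20))    # Class A
print(classify({"Class A": 78, "Class B": 63.5, "Class C": 62.5}, 70, 20))  # None
print(classify({"Class A": 68, "Class B": 18, "Class C": 8}, 70, 20))     # None
```

Sorting keeps the sketch short; with many classes, `heapq.nlargest(2, ...)` would find the two best scores without a full sort. Note that a tie for first place yields a gap of 0, so the document is rejected whenever the distance is greater than 0.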
Example using the maximum method
Class | Confidence |
---|---|
Class A | 88 |
Class B | 67 |
Class C | 63 |
The threshold is set to 70 and the distance is set to 20.
Result
- Class A is the best class. Its confidence (88) is above the threshold (70).
- Class B is the second-best class. The distance to class A is 21 (88 - 67), just above the required distance of 20.
- The document is assigned to class A.
Example using the average method
Class | Confidence |
---|---|
Class A | 78 |
Class B | 63.5 |
Class C | 62.5 |
The threshold is set to 70 and the distance is set to 20.
Result
- Class A is the best class. Its confidence (78) is above the threshold (70).
- Class B is the second-best class. The distance to class A is 14.5 (78 - 63.5), which is below the required distance of 20.
- The document cannot be classified using standard classification.
Example using the weighted distance method
Class | Confidence |
---|---|
Class A | 68 |
Class B | 18 |
Class C | 8 |
The threshold is set to 70 and the distance is set to 20.
Result
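- Class A is the best class. Its confidence (68) is below the threshold (70).
- The distance to the second-best class, class B, is 50 (68 - 18), well above the required distance of 20. However, both requirements must be met, so this does not compensate for the missed threshold.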
- The document cannot be classified using standard classification.
If you use the weighted distance method, you will probably need a lower threshold to classify a satisfactory proportion of documents.
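To judge how far to lower the threshold, you can measure the share of documents that pass both checks at several candidate thresholds while keeping the distance fixed. The helper names `is_classified` and `classified_fraction` are hypothetical, the comparisons are assumed to be inclusive, and each document is assumed to have at least two candidate classes. The sample list contains only the weighted distance example from this section; in practice you would use scores from a representative set of your own documents.

```python
from typing import Iterable, Mapping


def is_classified(scores: Mapping[str, float], threshold: float, distance: float) -> bool:
    # Assumes at least two classes and inclusive (>=) limits.
    best, second = sorted(scores.values(), reverse=True)[:2]
    return best >= threshold and best - second >= distance


def classified_fraction(
    documents: Iterable[Mapping[str, float]], threshold: float, distance: float
) -> float:
    # Share of documents that pass both checks; 0.0 for an empty input.
    docs = list(documents)
    if not docs:
        return 0.0
    return sum(is_classified(d, threshold, distance) for d in docs) / len(docs)


# Only the weighted distance example from this section; replace with your own scored documents.
documents = [{"Class A": 68, "Class B": 18, "Class C": 8}]

for threshold in (70, 65, 60, 50):
    print(threshold, classified_fraction(documents, threshold, 20))
```

For this single document the fraction moves from 0.0 at a threshold of 70 to 1.0 at 65 and below, because its distance requirement is already satisfied and only the threshold blocks classification.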