Assume a set of classes, a document, and several classification methods, each of which returns a confidence level for every class. The methods are unlikely to all return the same confidence level for a class. To arrive at a single best estimate of the confidence, BIC computes a combined result using one of the following methods.
The maximum method
In this method, only the highest confidence level that any method returns for a class is taken into account. This is the default setting.
Class | Method A | Method B | Result |
---|---|---|---|
Class A | 88% | 68% | 88% |
Class B | 67% | 60% | 67% |
Class C | 62% | 63% | 63% |
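The maximum method can be sketched in a few lines of Python. This is only an illustration of the rule described above, not BIC's actual implementation; the function name `combine_maximum` and the dictionary layout are assumptions made for the example, and the values are taken from the table.

```python
# Confidence levels in percent, one value per classification method
# (Method A, Method B). The dict layout is an assumption for this sketch.
confidences = {
    "Class A": [88, 68],
    "Class B": [67, 60],
    "Class C": [62, 63],
}


def combine_maximum(confidences):
    """Return the highest confidence level reported for each class."""
    return {cls: max(levels) for cls, levels in confidences.items()}


print(combine_maximum(confidences))
# {'Class A': 88, 'Class B': 67, 'Class C': 63}
```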
The average method
For each class, the average of the confidence levels returned by all methods is taken into account.
Class | Method A | Method B | Result |
---|---|---|---|
Class A | 88% | 68% | 78% |
Class B | 67% | 60% | 63.5% |
Class C | 62% | 63% | 62.5% |
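As a minimal sketch of the average method, assuming the same per-class list of confidence levels as in the table, the combined result is the arithmetic mean of each list. The name `combine_average` is hypothetical and not part of BIC.

```python
# Confidence levels in percent, one value per classification method
# (Method A, Method B). The dict layout is an assumption for this sketch.
confidences = {
    "Class A": [88, 68],
    "Class B": [67, 60],
    "Class C": [62, 63],
}


def combine_average(confidences):
    """Return the arithmetic mean of the confidence levels for each class."""
    return {cls: sum(levels) / len(levels) for cls, levels in confidences.items()}


print(combine_average(confidences))
# {'Class A': 78.0, 'Class B': 63.5, 'Class C': 62.5}
```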
The weighted distance method
The highest result across all methods and all classes is taken as the reference (88 in the example below). For each result, BIC calculates the distance to this reference; the distances are shown in parentheses. For each class, the distances are added up, and this sum is subtracted from the highest result obtained for that class.
Class | Method A | Method B | Result |
---|---|---|---|
Class A | 88 (0) | 68 (20) | 68 = 88 –(0+20) |
Class B | 67 (21) | 60 (28) | 18 = 67 –(21+28) |
Class C | 62 (26) | 63 (25) | 12 = 63 – (26+25) |
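The steps of the weighted distance method can be traced with a short Python sketch. It follows the description above literally: find the global reference, sum each class's distances to it, and subtract that sum from the class's own maximum. The function name `combine_weighted_distance` and the data layout are assumptions for illustration. Whether BIC clamps negative results is not stated here, so the sketch leaves them unchanged.

```python
# Confidence levels in percent, one value per classification method
# (Method A, Method B). The dict layout is an assumption for this sketch.
confidences = {
    "Class A": [88, 68],
    "Class B": [67, 60],
    "Class C": [62, 63],
}


def combine_weighted_distance(confidences):
    """Subtract each class's summed distances to the global maximum
    from that class's own maximum confidence level."""
    # Reference: the highest result over all methods and all classes.
    reference = max(max(levels) for levels in confidences.values())
    combined = {}
    for cls, levels in confidences.items():
        # Sum of the distances between each result and the reference.
        distance_sum = sum(reference - level for level in levels)
        combined[cls] = max(levels) - distance_sum
    return combined


print(combine_weighted_distance(confidences))
# {'Class A': 68, 'Class B': 18, 'Class C': 12}
```

With 88 as the reference, Class C has distances 26 and 25, so its result is 63 - (26 + 25) = 12.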
In general, the percentage of documents that will be classified is as follows.
Maximum > Average > Weighted Distance
In particular, with the Weighted Distance method, a document is highly unlikely to be assigned to a class unless that class gets high confidence levels from all of the methods applied.