Problems with the learnset most likely result in low average extraction levels with empty and invalid fields. Extraction works for some documents in the test set but fails for others. The problem most likely affects all fields that use the learnset. To resolve learnset problems, complete any of the following steps.
- Check the candidates’ weights and the distances. If your weights and distances are too low, you need more samples.
- Add candidates that were previously not classified correctly because the initial learnset was insufficient.
- Use candidates from the same document that have almost identical weights. These candidates are well suited to teaching the learnset how to differentiate between them.
- Avoid candidates with a high weight. These candidates do not improve the learnset, because they would already be selected correctly.
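
The selection rules above can be sketched as a small script. This is only an illustration and not part of the product: the Candidate record, the pick_training_candidates function, the 0.0 to 1.0 weight scale, and both thresholds (high_weight, tie_margin) are assumptions made for the example. It also assumes that a reviewer has already marked which candidate in each document is the correct one.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; the field names are assumptions for this sketch.
    doc_id: str
    text: str
    weight: float      # assumed to lie between 0.0 and 1.0
    is_correct: bool   # ground truth supplied by a reviewer


def pick_training_candidates(candidates, high_weight=0.9, tie_margin=0.05):
    """Return (candidate, reason) pairs worth adding to the learnset.

    Both thresholds are illustrative defaults, not product settings.
    """
    # Group the candidates by the document they came from.
    by_doc = {}
    for cand in candidates:
        by_doc.setdefault(cand.doc_id, []).append(cand)

    picked = []
    for doc_candidates in by_doc.values():
        ranked = sorted(doc_candidates, key=lambda c: c.weight, reverse=True)
        top = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else None
        correct = next((c for c in ranked if c.is_correct), None)
        if correct is None:
            continue  # nothing usable to learn from in this document

        if top is not correct:
            # The wrong candidate won, so the correct one is a useful sample.
            picked.append((correct, "misclassified"))
        elif runner_up is not None and top.weight - runner_up.weight <= tie_margin:
            # Almost identical weights: a good sample for differentiation.
            picked.append((correct, "near tie"))
        elif correct.weight < high_weight:
            # Selected correctly, but with a low weight.
            picked.append((correct, "low weight"))
        # Otherwise the candidate already wins clearly with a high weight
        # and would not improve the learnset.
    return picked


if __name__ == "__main__":
    sample = [
        Candidate("A", "INV-1", 0.95, True),
        Candidate("A", "PO-7", 0.40, False),
        Candidate("B", "INV-2", 0.50, True),
        Candidate("B", "PO-8", 0.70, False),
    ]
    for cand, reason in pick_training_candidates(sample):
        print(cand.doc_id, cand.text, reason)
```

With the sample data, document A is skipped because its correct candidate already wins with a high weight, while the correct candidate of document B is picked because a wrong candidate outranked it. Adjust both thresholds to the weights you actually observe in your project.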