Use this feature for quick training of the Brainware Field Extraction engine without the need to define header field formats through the Format Analysis engine’s regular expressions, and without the need to configure SLW to use Verifier Train Mode for Brainware Field Extraction learning.
Note: This feature can also be used to train the former Brainware Extraction engine. However, unless specifically desired, using it with the Brainware Extraction engine is not recommended, for the following reasons:
- Subsequent extraction with the Brainware Extraction engine requires the expected extraction result to be available among the candidates pre-extracted by the Format Analysis engine. The engine only evaluates the existing candidates, assigning each of them an individual weight. These weights are then used by the general extraction subsystem to determine the final extraction result. Therefore, if the extraction result was not among the auto-extracted candidates initially and was only created at run time during learning, it is very likely that the same situation will occur for a similar document processed later. In that case, the correct extraction result will be completely unavailable.
- When a document is trained with Brainware Extraction, the engine uses the learned field’s content as the single member of the “Correct Results” class of an internal Brainware Classifier, and all other available candidates as members of the “Wrong Results” class. Subsequent extraction is essentially a smart classification between these two classes. Consequently, the overall quality of the Brainware Extraction engine’s output also depends on the quality of the “Wrong Results” class content. If no candidates were available before the document learning procedure started, this class is reduced to a single member: the first available word in the document. While such content is sufficient for training and extraction in general, it is clearly not optimal in terms of quality.
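The two reasons above can be illustrated with a simplified conceptual model. This is a minimal sketch, not the Brainware engine's actual implementation or API: the names `TrainingSet`, `build_training_set`, and `extract`, as well as the weighting function, are hypothetical, and candidates are assumed to be plain strings. The model shows that the “Wrong Results” class falls back to the first word of the document when no candidates exist, and that extraction can only ever return one of the pre-extracted candidates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrainingSet:
    # Hypothetical stand-in for the two internal classifier classes.
    correct: list[str]
    wrong: list[str]


def build_training_set(learned_value: str, candidates: list[str],
                       document_words: list[str]) -> TrainingSet:
    """Conceptual model: one "Correct Results" member, other candidates as "Wrong Results"."""
    wrong = [c for c in candidates if c != learned_value]
    if not candidates and document_words:
        # Degenerate case: no candidates existed before learning, so the
        # "Wrong Results" class holds only the first word of the document.
        wrong = [document_words[0]]
    return TrainingSet(correct=[learned_value], wrong=wrong)


def extract(candidates: list[str],
            weight: Callable[[str], float]) -> Optional[str]:
    """Conceptual model: the result is chosen only among existing candidates."""
    if not candidates:
        return None  # nothing to weight, so no result can be produced
    return max(candidates, key=weight)


if __name__ == "__main__":
    # Learning a value that was never a candidate: "Wrong Results" is minimal.
    print(build_training_set("INV-42", [], ["Invoice", "INV-42"]))
    # Later extraction cannot return "INV-42" if it is not a candidate.
    print(extract(["2024-01-01", "ACME"], {"2024-01-01": 0.7, "ACME": 0.2}.get))
```

In this model, a value created only at learning time never enters the candidate list, so `extract` can never return it, however well the weights were trained.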