The Brainware Table Extraction engine distinguishes the following main types of lines.
- Primary line
- A line that defines the table structure. The engine applies advanced and precise similarity analysis to all primary lines, so it is important that primary lines are well-structured and look similar across many of the rows to extract. The engine supports an unlimited number of primary line types for one table definition. A primary line must contain at least four words; otherwise, the engine does not learn it. In addition, the primary line must be the first line in the table row.
- Secondary line
- A line between primary lines. The engine applies smooth similarity analysis to these lines, which is possible because Brainware Table Extraction only searches the area between two neighboring primary lines. This allows the engine to extract data that varies widely, which often happens with multi-line descriptions. There is no limit to the number of words in a secondary line, and no limit to the number of secondary lines. However, each page of a document must contain at least one primary line; otherwise, the secondary lines on that page are not extracted.
- Wrong line
- A primary line that is learned as a negative line sample. In other words, all lines that the engine classifies as members of a particular “wrong” line class are not extracted. In principle, it is possible to learn an unlimited number of wrong lines, but currently they only take effect during in-document learning. Cross-document learning (that is, learning the whole document after all the fields are completely valid) may not automatically train the wrong lines.
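The relationship between the three line types can be illustrated with a minimal Python sketch. It is not the engine's implementation. The `Line`, `Row`, `can_learn_as_primary`, and `group_rows` names are hypothetical, the line types are assumed to be already known, and two simplifications are assumed: a wrong line closes the current row, and secondary lines that appear before the first primary line on a page are dropped.

```python
from dataclasses import dataclass, field

MIN_PRIMARY_WORDS = 4  # from the text: a primary line needs at least four words


@dataclass
class Line:
    page: int
    text: str
    kind: str  # "primary", "secondary", or "wrong" (hypothetical labels)


@dataclass
class Row:
    primary: str
    secondary: list = field(default_factory=list)


def can_learn_as_primary(text: str) -> bool:
    """Check the four-word minimum for a primary line sample."""
    return len(text.split()) >= MIN_PRIMARY_WORDS


def group_rows(lines):
    """Group pre-classified lines into table rows, page by page."""
    rows = []
    current = None
    current_page = None
    for line in lines:
        if line.page != current_page:
            # Assumption: a row never continues onto the next page.
            current = None
            current_page = line.page
        if line.kind == "primary":
            current = Row(primary=line.text)  # a primary line starts a row
            rows.append(current)
        elif line.kind == "wrong":
            current = None  # assumption: a wrong line closes the current row
        elif line.kind == "secondary" and current is not None:
            current.secondary.append(line.text)
    return rows


sample = [
    Line(1, "1001 Widget blue 2 19.98", "primary"),
    Line(1, "extra long description", "secondary"),
    Line(1, "Subtotal carried forward 19.98", "wrong"),
    Line(1, "orphan note", "secondary"),
    Line(1, "1002 Gadget pink 1 5.00", "primary"),
    Line(2, "continued text only", "secondary"),
]
rows = group_rows(sample)
```

In this sample, page 2 contains no primary line, so its secondary line does not end up in any row, which mirrors the page-level restriction described above.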
After it learns any type of line, the Brainware Table Extraction engine automatically creates and manages a new line class (cluster). Afterward, all lines in the document considered by the engine to be members of the line class (similar to the learned line sample) will be extracted, or not extracted in the case of “wrong” lines.
It is possible to learn an unlimited number of different line classes. However, the overall quality may suffer if too many lines are learned.
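The idea of line classes can be sketched as a toy model in Python. The engine's actual similarity analysis is not described here, so everything in this sketch is an assumption: the `shape` signature, the `LineClasses` class, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure. Each learned sample becomes one class, and a new line follows the decision of its most similar class.

```python
import re
from difflib import SequenceMatcher


def shape(text: str) -> str:
    """Reduce a line to a coarse signature: digits become 9, letters become a."""
    text = re.sub(r"\d", "9", text)
    return re.sub(r"[A-Za-z]", "a", text)


class LineClasses:
    """Toy stand-in for learned line classes (not the engine's algorithm)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.samples = []  # list of (shape, extract) pairs, one per class

    def learn(self, sample: str, extract: bool = True) -> None:
        """Learn a line sample; extract=False marks a "wrong" line class."""
        self.samples.append((shape(sample), extract))

    def should_extract(self, text: str) -> bool:
        """Return True if the best-matching class is a positive one."""
        candidate = shape(text)
        best_score, best_extract = 0.0, False
        for sample_shape, extract in self.samples:
            score = SequenceMatcher(None, candidate, sample_shape).ratio()
            if score > best_score:
                best_score, best_extract = score, extract
        return best_score >= self.threshold and best_extract


classes = LineClasses()
classes.learn("1001 Widget blue 2 19.98")                      # line to extract
classes.learn("Subtotal carried forward 19.98", extract=False)  # wrong line
```

With these two samples, a line such as `1002 Gadget pink 3 29.97` matches the positive class, while `Subtotal carried forward 45.50` matches the wrong class and is rejected. Every additional learned sample adds another comparison candidate, which gives one intuition for why learning too many lines can reduce overall quality.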
Lines are learned in lines learning (or lines highlighting) mode. Column data within the lines is mapped in column mapping learning (or columns highlighting) mode. To switch between the learning (highlighting) modes, use the Switch Table Highlighting option in the Verifier Options menu, or the Show Lines and Show Columns options in the context menu.