The engine considers the following types of lines.
- Primary line: A line that defines the table structure. The BTE engine applies precise similarity analysis to all primary lines. Primary lines must be well structured and similar to each other across many of the rows to be extracted. The engine nevertheless supports an unlimited number of different primary line types in one table definition. A primary line must be the first line in a table row and must contain at least four words.
- Secondary line: A line between primary lines. The engine applies smooth similarity analysis to these lines, which is possible because BTE searches only between two neighboring primary lines. This allows BTE to extract data that varies widely, as is often the case with multi-line descriptions. There is no limit on the number of words in a secondary line and no limit on the number of secondary lines. However, a page of a document must contain at least one primary line; otherwise the engine extracts no secondary lines from that page.
- Wrong line: A primary line that is learned as a negative line sample. No line that the engine classifies as a member of a “wrong” line class is extracted. In principle, you can learn an unlimited number of wrong lines, but they take effect only during in-document learning. Cross-document learning, that is, learning the whole document after all fields are completely valid, may not automatically train the wrong lines.
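The way these three line types combine into table rows can be illustrated with a minimal sketch. It is not the BTE implementation: the function name `group_rows` and the `(label, text)` input format are hypothetical, the labels are assumed to be given already (the similarity analysis is not modeled), and dropping a wrong line together with any secondary lines that follow it is an assumption made for illustration.

```python
from typing import List, Optional, Tuple

# From the text: a primary line must contain at least four words.
MIN_PRIMARY_WORDS = 4


def group_rows(page_lines: List[Tuple[str, str]]) -> List[List[str]]:
    """Group the labeled lines of one page into table rows.

    Each item is (label, text), with label one of "primary",
    "secondary", or "wrong". Labels are assumed to be known in advance.
    """
    rows: List[List[str]] = []
    current: Optional[List[str]] = None  # row being built, if any

    for label, text in page_lines:
        if label == "primary" and len(text.split()) >= MIN_PRIMARY_WORDS:
            # A primary line is always the first line of a new row.
            current = [text]
            rows.append(current)
        elif label == "secondary" and current is not None:
            # Secondary lines attach to the preceding primary line.
            current.append(text)
        elif label == "wrong":
            # Assumption: a wrong line is dropped and closes the open row.
            current = None
        # Everything else (secondary lines with no open row, primary
        # lines that are too short) is ignored.

    return rows


page = [
    ("secondary", "header text"),
    ("primary", "1 Widget blue 4 9.99"),
    ("secondary", "extra long description"),
    ("wrong", "Subtotal carried over 1 39.96"),
    ("secondary", "orphan"),
    ("primary", "2 Gadget red 1 5.00"),
]
rows = group_rows(page)
```

In this sketch a page without any primary line yields an empty result, which mirrors the rule that secondary lines are not extracted from such a page.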
You can learn an unlimited number of different line types. However, the overall quality of the extraction may suffer if you learn too many lines.
You can learn lines in either the lines learning mode or the lines highlighting mode.
To learn a line, the engine requires the line to contain a minimum number of words. This value sets the word count at which a line becomes eligible for learning.
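As a rough illustration of this threshold, the check below treats a line as eligible once its word count reaches the configured minimum. The function name and the `min_words` parameter are hypothetical, and counting words by splitting on whitespace is an assumption; the engine may count words differently.

```python
def is_eligible_for_learning(line: str, min_words: int) -> bool:
    """Return True if the line has at least min_words words.

    Words are counted by splitting on whitespace (an assumption).
    """
    return len(line.split()) >= min_words
```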