Use the Brainware Layout Classification Engine for content or type classification. The Brainware Layout Classification (BLC) Engine provides a more precise classification between documents with similar templates. For example, for invoices from different vendors, it can reach a better result by taking into account the positional information of the documents’ content as well as their textual content.
The document is divided into a number of zones. Each piece of text is tagged to indicate the zone in which it appears. This means that if, for the learned layout class X, the word VAT was mostly located in the top left-hand corner and, for another class Y, it was usually in the bottom right-hand corner, the system prefers one classification over the other depending on where the word appears in the document being classified. For normal Brainware classification, such a difference between X and Y would have been irrelevant.
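The zone tagging described above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the 3 x 3 grid, the function name `zone_tag`, and the `_R{row}_C{col}` suffix are all assumptions made for the example, and the engine's real zone layout and tag encoding may differ.

```python
from typing import List, Tuple

# (text, x, y): a word and the position of its top-left corner in page units.
Word = Tuple[str, float, float]


def zone_tag(words: List[Word], page_width: float, page_height: float,
             cols: int = 3, rows: int = 3) -> List[str]:
    """Suffix each word with the grid zone that contains it (hypothetical format)."""
    tagged = []
    for text, x, y in words:
        # Clamp so that a word lying exactly on the right or bottom edge
        # still falls into the last zone instead of a non-existent one.
        col = min(int(x / page_width * cols), cols - 1)
        row = min(int(y / page_height * rows), rows - 1)
        tagged.append(f"{text}_R{row}_C{col}")
    return tagged


# An A4-sized page measured in points: "VAT" near the top left,
# "Total" near the bottom right.
words = [("VAT", 40.0, 30.0), ("Total", 520.0, 800.0)]
print(zone_tag(words, page_width=595.0, page_height=842.0))
# prints ['VAT_R0_C0', 'Total_R2_C2']
```

Because the zone becomes part of the token itself, the same word in two different zones is treated as two different tokens by whatever classifier consumes the output.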
The Brainware Layout Classification engine requires more sample documents in the learnset than the Template Classification engine. To achieve the best classification weightings, it is important to have a sufficient number of documents in the learnset.
The Brainware Layout Classification engine requires at least two classes defined within a project.
Setting up Brainware Layout Classification is similar to setting up the Brainware Classify Engine. You just need to provide sample documents for learning.
If there is already a learnset for the Brainware Classify Engine, there is usually no need to create another one for Brainware Layout Classification.
To create a separate learnset for Brainware Layout Classification, see Create a Learnset. For more information about learning, see About Learning.
The Brainware Layout Classification engine performs so-called layout classification using the Brainware Classifier technology. While the Brainware Classify Engine is used for “content” or “type” classification, the purpose of the Brainware Layout Classification (BLC) engine is to provide more precise classification between documents with similar templates, for example, invoices delivered from different vendors. For such documents, it is possible to reach a more exact result by taking into account the positional information of the documents’ content and not only the textual content, as the ASSA or Brainware “content” classification engines do.
The BLC engine simply applies normal Brainware classification. However, each word of both learned and extracted documents is not used as is but is extended with a special character sequence that uniquely identifies the zone of the document to which the word belongs. Visually, this idea can be represented as a document split into several zones (the yellow area is the first page of the document and “VAT” is the word we are looking for):
The word VAT is now extended to VAT_BBBB_AAAA, which identifies that we look for the word VAT only in the {B1; A2} region of the document. This means that if, for the learned layout class X, the word VAT was mostly located in region {C1; C2} and, for the other class Y, it was usually in {B1; A2}, the system prefers Y, while for normal Brainware classification such a difference between X and Y would have been irrelevant. This very simple approach allows a content classification engine (in this case the Brainware Classifier) to be used as a more precise layout classification engine.
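To see why zone-extended words let a content classifier separate layouts, consider the toy comparison below. It is only a sketch under stated assumptions: the overlap score is a stand-in and not the Brainware Classifier, the helper names `learn` and `score` are hypothetical, and the `_C1_C2` and `_B1_A2` suffixes are illustrative rather than the engine's actual encoding.

```python
from collections import Counter
from typing import Iterable, List


def learn(samples: Iterable[List[str]]) -> Counter:
    """Count token occurrences over all sample documents of one class."""
    counts: Counter = Counter()
    for tokens in samples:
        counts.update(tokens)
    return counts


def score(tokens: List[str], class_counts: Counter) -> float:
    """Fraction of the document's tokens that occur in the class's learnset."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in class_counts) / len(tokens)


# Plain words: classes X and Y use the same vocabulary, so they tie.
plain_x = learn([["VAT", "Invoice", "Total"]])
plain_y = learn([["VAT", "Invoice", "Total"]])
doc_plain = ["VAT", "Invoice", "Total"]
print(score(doc_plain, plain_x), score(doc_plain, plain_y))  # prints 1.0 1.0

# Zone-extended words (hypothetical suffixes): VAT sits in a different
# region for each class, so the two classes no longer look identical.
tagged_x = learn([["VAT_C1_C2", "Invoice_A1_A1", "Total_C3_C3"]])
tagged_y = learn([["VAT_B1_A2", "Invoice_A1_A1", "Total_C3_C3"]])
doc_tagged = ["VAT_B1_A2", "Invoice_A1_A1", "Total_C3_C3"]
print(score(doc_tagged, tagged_y) > score(doc_tagged, tagged_x))  # prints True
```

The classifier logic is unchanged between the two runs; only the tokens differ, which is the point of the approach.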