Training documents are added to classes to provide additional examples for a learnset.
Each training document must include one or more image files and a word position file. The image file can be either a single TIFF file (containing one or more pages), or one or more PNG files. If you are using PNG files for a training document with multiple pages, the PNG file names should have the page number as a suffix. For example, invoice.pos, invoice_1.png, invoice_2.png would be the files required for a two-page document. The word position file defines the text and location of each word detected in the image file and is typically derived from the output of Optical Character Recognition (OCR) software.
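The naming convention for a multi-page PNG document can be sketched as follows. This is only an illustration: the `.pos` extension, the underscore separator, and the page suffix starting at 1 are all inferred from the invoice example above, and `expected_files` is a hypothetical helper, not part of the product.

```python
def expected_files(base_name: str, page_count: int) -> list[str]:
    """List the file names expected for a multi-page PNG training document.

    Assumptions (inferred from the invoice example, not confirmed by the product):
    the word position file uses a .pos extension, and PNG pages are named
    <base>_<page>.png with the suffix starting at 1.
    """
    if page_count < 2:
        # The source only describes the suffix rule for multi-page documents.
        raise ValueError("this sketch covers multi-page documents only")
    files = [f"{base_name}.pos"]
    files += [f"{base_name}_{page}.png" for page in range(1, page_count + 1)]
    return files


print(expected_files("invoice", 2))
# prints ['invoice.pos', 'invoice_1.png', 'invoice_2.png']
```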
The availability of the word position file is shown in the Viewer’s status bar.
You can view a training document that has no word position file, and you can manually enter and save its field values. However, you cannot click a word or use rubber banding to add words to the field values, because both actions require the word position file.
The word position file is a comma-separated text file in which each line represents a single word in the image. Each line contains the following values, in this order:
- Word id
- Page number (page numbering starts at zero for the first page)
- Top left corner – x position in pixels
- Top left corner – y position in pixels
- Width in pixels
- Height in pixels
- The word
Example: 0,0,453,345,273,89, Software
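As a rough illustration of the format above, the following sketch parses one line of a word position file. The `WordPosition` class and `parse_word_line` function are hypothetical names chosen for this example. Two further assumptions are made: whitespace around the word is not significant, and a word may itself contain commas, so only the first six commas are treated as separators.

```python
from dataclasses import dataclass


@dataclass
class WordPosition:
    word_id: int
    page: int    # zero-based page number
    x: int       # top left corner, x position in pixels
    y: int       # top left corner, y position in pixels
    width: int   # width in pixels
    height: int  # height in pixels
    text: str    # the word itself


def parse_word_line(line: str) -> WordPosition:
    # Split on the first six commas only, so that a word containing
    # a comma (an assumption about the format) stays in one piece.
    parts = line.rstrip("\r\n").split(",", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 comma-separated values, got {len(parts)}")
    word_id, page, x, y, width, height = (int(p) for p in parts[:6])
    # Assumption: leading and trailing whitespace around the word is ignored.
    return WordPosition(word_id, page, x, y, width, height, parts[6].strip())


example = parse_word_line("0,0,453,345,273,89, Software")
print(example.text, example.x, example.y)
```

Reading a whole file is then a matter of calling `parse_word_line` on each non-empty line.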
Optionally, you can provide a file that gives the values for some or all of the project's fields. This field value file is a tab-separated text file in which each line represents the information for a single field. Each line contains the field name, the field type, and the value.
Example:
Field Name | Field Type | Value |
---|---|---|
Total | Amount | 1,533.74 |
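A minimal sketch of reading one line of a field value file is shown below. It assumes that each line holds exactly three tab-separated columns in the order shown in the example table (field name, field type, value). `FieldValue` and `parse_field_line` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldValue:
    name: str        # field name, for example "Total"
    field_type: str  # field type, for example "Amount"
    value: str       # value kept as text, for example "1,533.74"


def parse_field_line(line: str) -> FieldValue:
    # Assumption: exactly three tab-separated columns per line.
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-separated values, got {len(parts)}")
    name, field_type, value = (p.strip() for p in parts)
    return FieldValue(name, field_type, value)


total = parse_field_line("Total\tAmount\t1,533.74")
print(total.name, total.field_type, total.value)
# prints Total Amount 1,533.74
```

Because the file is tab-separated, a comma inside a value such as 1,533.74 needs no special handling.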
Add training documents
To add a training document, complete the following steps.
- Select the class.
- Click the Create a training document icon. The Upload dialog box is displayed.
- Do either of the following:
  - Click anywhere in the Click to add a file, or drag and drop a file here box. The Open dialog box appears. Navigate to the required folder and double-click the file, or click the file and click Open. The file is uploaded.
  - Drag and drop a file onto the Click to add a file, or drag and drop a file here box.
- Click Create after all the required files are selected. The training document is added to the class and selected, ready for you to edit the field values.
- The Create button is enabled after the image and word position files are selected. A message at the top of the dialog box changes to indicate which files still need to be added.
- To remove a file from the Upload dialog box, click the Delete the training document icon next to that file.
Delete training documents
To delete a training document, complete the following steps.
- Select the class and navigate to the document.
- Click the Delete the training document icon in the toolbar.
- Click Yes to delete the document, or click No to cancel the deletion. If you click Yes, the training document is deleted and the next one is displayed.
Move training documents
To move a training document, complete the following steps.
- Select the class and navigate to the document.
- Click the Move the training document to another class icon in the toolbar.
- In the Move Document dialog box, select the class that is to receive the document.
- Click Move.