Processing PDFs As Images In Full-Text Search Catalogs

Platform: OnBase
Product: Full-Text Search
Release: Foundation 22.1

PDF documents can present challenges for full-text indexing because of their format. By default, Full-Text Search attempts to extract the text layer from a PDF and uses it as the rendition for full-text indexing. However, the text layer cannot always be extracted successfully. For this reason, PDF documents can optionally be processed as image documents.

PDF documents processed as images undergo an additional OCR process that creates a text rendition of the document for full-text indexing. This allows most PDFs to be indexed successfully, but indexing is typically slower because of the extra processing.

PDF documents processed as images are displayed in a paginated format, like other image files. PDF documents not processed as images are displayed as HTML output when viewed from a search results list.

To process PDF documents as images in Full-Text Search catalogs:

  1. Launch the OnBase Configuration module.
  2. Select Full-Text Search from the Utils menu. The Full-Text Configuration Module dialog box is displayed.
  3. Click the Catalogs tab. The Document Types available for indexing are listed.
    Tip:

    Filter the list of Document Types displayed by selecting the Document Type Group from the drop-down list below the list of Document Types.

  4. Select the check box in the PDF as Image column to begin indexing PDF documents as images for the catalog. Documents then go through an OCR process during indexing, which creates a text rendition of each document for use with full-text search.
    Note:

    Selecting this option does not automatically re-index documents that have already been indexed. Only PDF documents indexed after PDF as Image is configured are indexed as images.

    If the PDF as Image option is deselected, PDF documents are processed by extracting the text layer of the PDF for full-text indexing. These documents are rendered as HTML output when viewed from a search results list.

  5. Click Save at the bottom of the Catalogs tab to save your changes.
    To undo any changes made since the last save, click Clear Changes.
  6. Restart the Application Server and Hyland Full-Text Search servers for the changes to take effect.
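The effect of the PDF as Image setting can be summarized as a small model. The following Python sketch is hypothetical: `IndexedPdf`, `index_pdf`, and the rendition and viewer labels are illustrative names, not part of OnBase or any Hyland API. It only restates the documented behavior: the setting in effect when a PDF is indexed determines whether its text comes from OCR or from the PDF's text layer, and documents indexed before the change are not re-indexed automatically.

```python
from dataclasses import dataclass


@dataclass
class IndexedPdf:
    """Hypothetical record of how one PDF ended up in a catalog."""
    doc_id: int
    rendition: str  # "ocr_text" or "text_layer"
    viewer: str     # "paginated" or "html"


def index_pdf(doc_id: int, pdf_as_image: bool) -> IndexedPdf:
    """Model of the choice made when a PDF is indexed (not an OnBase API)."""
    if pdf_as_image:
        # Processed as an image: OCR creates the text rendition (typically slower).
        return IndexedPdf(doc_id, rendition="ocr_text", viewer="paginated")
    # Default path: the PDF's own text layer is extracted for indexing.
    return IndexedPdf(doc_id, rendition="text_layer", viewer="html")


catalog: dict[int, IndexedPdf] = {}

pdf_as_image = False
catalog[1] = index_pdf(1, pdf_as_image)  # indexed before the option is enabled

pdf_as_image = True                      # PDF as Image selected, saved, servers restarted
catalog[2] = index_pdf(2, pdf_as_image)  # indexed after the change

# Document 1 is not re-indexed automatically, so it keeps its text-layer rendition.
for doc in catalog.values():
    print(doc)
```

In this model, only document 2 receives an OCR rendition and a paginated view; document 1 would change only if it were indexed again after the setting was enabled.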