Textract - Hyland Automate Modeling

Platform
Content Innovation Cloud
Product
Hyland Automate Modeling
Release
Current

The Textract connector uses the EXTRACT action to run Amazon Textract and extract text and metadata from JPEG and PNG files smaller than 5 MB.
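
A caller could check these limits before invoking the connector. The following minimal Python sketch is one way to do that. The helper `is_supported_image` is hypothetical and not part of the connector; it assumes that "5 MB" means 5 MiB (5 × 1024 × 1024 bytes) and recognizes the format from the standard PNG and JPEG file signatures rather than from the file name.

```python
# Hypothetical pre-check; not part of the Textract connector.
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"  # start-of-image marker of a JPEG file


def is_supported_image(data: bytes) -> bool:
    """Return True if data looks like a PNG or JPEG file smaller than MAX_BYTES."""
    if len(data) >= MAX_BYTES:
        return False
    return data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE)


if __name__ == "__main__":
    print(is_supported_image(PNG_SIGNATURE + b"\x00" * 100))  # True
    print(is_supported_image(b"%PDF-1.7"))  # False
```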

The connector calls the following Amazon Textract APIs:

  • Detect Document Text API: the text of all LINE block objects is joined, with a line separator between them
  • Analyze Document API: performs FORMS and TABLES analysis
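
The two calls can be illustrated with the AWS SDK for Python (boto3), whose Textract client provides `detect_document_text` and `analyze_document`. This is a sketch, not the connector's implementation: the function names `extract_text` and `analyze_forms_and_tables` are hypothetical, the client is passed in so that it can be a real `boto3.client("textract")` or a test stub, and `"\n"` is assumed as the line separator.

```python
from typing import Any, Dict


def extract_text(client: Any, image_bytes: bytes, separator: str = "\n") -> str:
    """Call DetectDocumentText and join the text of all LINE blocks."""
    response: Dict[str, Any] = client.detect_document_text(
        Document={"Bytes": image_bytes}
    )
    # Keep only LINE blocks; PAGE and WORD blocks are skipped.
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return separator.join(lines)


def analyze_forms_and_tables(client: Any, image_bytes: bytes) -> Dict[str, Any]:
    """Call AnalyzeDocument with the FORMS and TABLES feature types."""
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS", "TABLES"],
    )
```

With valid AWS credentials, `boto3.client("textract")` can be passed as `client`; the IAM identity behind those credentials then needs the permissions described in this section.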

The IAM user configured to run the Textract service must have the textract:DetectDocumentText and textract:AnalyzeDocument permissions.
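
An identity-based IAM policy that grants only these two actions could be generated as follows. This is a minimal sketch: the `"Resource": "*"` scope is an assumption and may need to be narrowed or combined with conditions to match your account's security rules.

```python
import json

# Minimal identity-based policy for the IAM user that runs Textract.
# Assumption: "Resource": "*" is acceptable in your account.
TEXTRACT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "textract:DetectDocumentText",
                "textract:AnalyzeDocument",
            ],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    # Print the policy document as JSON so it can be attached to the IAM user.
    print(json.dumps(TEXTRACT_POLICY, indent=2))
```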

Authentication for the Textract connector is set with its configuration parameters. For more details, see Textract Configuration Parameters.

The input parameters of the Textract connector are:

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | Array<content>, Content | Required | A variable of type file to send for extraction. |
| outputFormat | String | Optional | The format of the output file. Possible values are JSON and TXT. The default value is JSON. |
| confidenceLevel | String | Optional | The level of confidence (0 – 10) to use in the analysis, for example: 0.75 |
| timeout | Integer | Optional | The timeout period for calling the Textract service, in milliseconds, for example: 910000 |
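
To make the defaults and value ranges above concrete, the following Python sketch models the input parameters as a dataclass. The class `TextractInput` and its `timeout_seconds` helper are hypothetical and not part of Hyland Automate. The validation only restates the table (JSON or TXT output, a confidence level between 0 and 10, a timeout in milliseconds); rejecting a non-positive timeout is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TextractInput:
    """Hypothetical model of the connector's input parameters.

    Field names mirror the parameter names in the table, so they are
    camelCase rather than the usual Python snake_case.
    """

    file: Union[bytes, List[bytes]]  # one file or an array of files
    outputFormat: str = "JSON"  # "JSON" (default) or "TXT"
    confidenceLevel: Optional[str] = None  # a string such as "0.75"
    timeout: Optional[int] = None  # milliseconds, such as 910000

    def __post_init__(self) -> None:
        # Reject values outside the ranges described in the table.
        if self.outputFormat not in ("JSON", "TXT"):
            raise ValueError("outputFormat must be JSON or TXT")
        if self.confidenceLevel is not None:
            level = float(self.confidenceLevel)
            if not 0 <= level <= 10:
                raise ValueError("confidenceLevel must be between 0 and 10")
        if self.timeout is not None and self.timeout <= 0:
            raise ValueError("timeout must be a positive number of milliseconds")

    def timeout_seconds(self) -> Optional[float]:
        """Convert the millisecond timeout to seconds, if one is set."""
        return None if self.timeout is None else self.timeout / 1000
```

For example, `TextractInput(file=data, timeout=910000).timeout_seconds()` returns `910.0`, which is about 15 minutes.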

The output parameters from the Textract analysis are:

| Parameter | Type | Required | Description |
|---|---|---|---|
| awsResult | JSON | Optional | The result of the analysis from the Textract service. |