Entities - Hyland Automate Modeling - Content Innovation Cloud

Hyland Automate Modeling

Platform: Content Innovation Cloud
Product: Hyland Automate Modeling
Release: Current

The connector uses the following Amazon Comprehend operations to detect entities:

  • DetectEntities
  • BatchDetectEntities
  • StartEntitiesDetectionJob
  • DescribeEntitiesDetectionJob

The connector processes the input differently depending on its size.

The DetectEntities operation is called if the supplied text file is smaller than a configurable limit, by default 5000 bytes. If the input file is larger than this limit, a different API operation must be used.

The BatchDetectEntities operation is used if the file is larger than the DetectEntities limit. It also has its own configurable limit, by default 125000 bytes. When the batch API call is used, the input file is split into chunks that are each smaller than a configurable limit, by default 5000 bytes.

The StartEntitiesDetectionJob and DescribeEntitiesDetectionJob operations are used if the input file is larger than the BatchDetectEntities limit, by default 125000 bytes. Similar to the batch approach, the input file is divided into a set of smaller files of a configured size. When dividing the original file, the engine ensures that each file contains only full words: it splits only at whitespace, never in the middle of a word.
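The size-based selection and the whole-word splitting described above can be sketched as follows. This is a minimal illustration in Python, not the connector's implementation: the names `select_operation`, `split_on_whitespace`, and the three constants are hypothetical, the handling of inputs that are exactly equal to a limit is an assumption, and the default values are the ones quoted in the text.

```python
DETECT_LIMIT = 5000    # default DetectEntities limit in bytes (from the text)
BATCH_LIMIT = 125000   # default BatchDetectEntities limit in bytes (from the text)
CHUNK_LIMIT = 5000     # default chunk size in bytes (from the text)


def select_operation(size_in_bytes, detect_limit=DETECT_LIMIT, batch_limit=BATCH_LIMIT):
    """Return the name of the operation used for an input of the given size."""
    # Assumption: an input exactly at a limit falls into the batch path.
    if size_in_bytes < detect_limit:
        return "DetectEntities"
    if size_in_bytes <= batch_limit:
        return "BatchDetectEntities"
    return "StartEntitiesDetectionJob"


def split_on_whitespace(text, max_bytes=CHUNK_LIMIT):
    """Split text into chunks smaller than max_bytes (UTF-8), breaking only at whitespace.

    Runs of whitespace are collapsed to a single space, and a single word that
    is itself max_bytes or longer ends up alone in an oversized chunk.
    """
    chunks, current, current_size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        # One extra byte for the joining space when the chunk already has words.
        extra = word_size + (1 if current else 0)
        if current and current_size + extra >= max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [], 0
            extra = word_size
        current.append(word)
        current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Sizes are measured in UTF-8 bytes rather than characters because the limits in the text are given in bytes.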

The divided files are then uploaded to Amazon S3 using the same key prefix for all files. When all of them have been uploaded, an asynchronous entity detection job is started. The connector then polls the status of the job until it finishes or the timeout is reached.
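The polling step can be pictured with a small loop like the one below. It is a sketch under stated assumptions rather than the connector's code: `wait_for_job`, its parameters, and the default timeout and interval are hypothetical, and the status is obtained through a caller-supplied function so that the loop does not depend on any particular SDK. The status names are the job states Amazon Comprehend reports for entity detection jobs.

```python
import time

# States after which the job status no longer changes.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(get_status, timeout_seconds=600.0, poll_interval_seconds=10.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is returned or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        sleep(poll_interval_seconds)
```

With the AWS SDK for Python, `get_status` could for example wrap `describe_entities_detection_job` and return the `JobStatus` field of the `EntitiesDetectionJobProperties` structure in the response.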

If the asynchronous job finishes successfully, Amazon Comprehend writes a compressed output file (output.tar.gz) containing the result. The file is saved to the same bucket, in a directory that uses the same key prefix. For more information, see Asynchronous Batch Processing. The output file is downloaded from Amazon S3 and parsed into a BatchDetectEntitiesResult object. At the end of the process, all resource files are cleaned up, both locally and in Amazon S3.
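Reading the downloaded archive might look roughly like this. The sketch assumes that the archive holds one or more text files in JSON Lines form, one JSON object per line, which is an assumption about the output layout rather than something stated above. The function name `read_job_output` is hypothetical, and the sketch returns plain dictionaries instead of building a BatchDetectEntitiesResult object.

```python
import io
import json
import tarfile


def read_job_output(archive_bytes):
    """Return the JSON records found in the files of an output.tar.gz archive."""
    records = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-file entries
            content = archive.extractfile(member).read().decode("utf-8")
            for line in content.splitlines():
                if line.strip():
                    records.append(json.loads(line))
    return records
```

The archive is read from memory here, so nothing has to be written to local disk and no local cleanup is needed for this step.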