The following describes an example of how the text analysis connector is set up:
As part of the BPMN definition process, any service task responsible for triggering the text analysis must have comprehend.ENTITY set as the value of its implementation attribute.
The following variables must be configured for the text analysis to function.
The input parameters of Entity detection are:
Parameter | Type | Required | Description |
---|---|---|---|
files | Array | Optional | The file to be analyzed. If multiple files are passed, then only the first one will be analyzed. |
text | String | Optional | The text to be analyzed. If the files parameter is set, leave this blank. |
maxEntities | Integer | Optional | The maximum number of entities to return. Defaults to ${aws.comprehend.defaultMaxResults}. |
confidenceLevel | Float | Optional | The minimum confidence level for an entity, expressed as a float between 0 and 1. Defaults to ${aws.comprehend.defaultConfidence}. |
timeout | Integer | Optional | The timeout for the remote call to the Comprehend service, in milliseconds. Defaults to ${aws.comprehend.asynchTimeout}. |
customRecognizerArn | String | Optional | The custom recognizer ARN endpoint. If left blank, the Comprehend service will use the value given to the AWS_COMPREHEND_CUSTOM_RECOGNIZER_ARN environment variable. |
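The constraints in the table above (files or text but not both, only the first file analyzed, confidence between 0 and 1) can be checked on the client side before a process is started. The following is a minimal Python sketch under stated assumptions: the function name `check_entity_inputs` is hypothetical, the positive-value checks on maxEntities and timeout are assumptions not stated in the table, and the connector itself may validate differently.

```python
from typing import Any, Dict, List, Optional


def check_entity_inputs(
    files: Optional[List[Dict[str, Any]]] = None,
    text: Optional[str] = None,
    max_entities: Optional[int] = None,
    confidence_level: Optional[float] = None,
    timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a variables dict, omitting unset values so the connector defaults apply."""
    if files and text:
        raise ValueError("set either 'files' or 'text', not both")
    if confidence_level is not None and not 0.0 <= confidence_level <= 1.0:
        raise ValueError("confidenceLevel must be between 0 and 1")
    # Assumption: non-positive limits and timeouts are treated as mistakes.
    if max_entities is not None and max_entities < 1:
        raise ValueError("maxEntities must be a positive integer")
    if timeout_ms is not None and timeout_ms < 1:
        raise ValueError("timeout must be a positive number of milliseconds")

    variables: Dict[str, Any] = {}
    if files:
        # Only the first file is analyzed, so extra entries are dropped here.
        variables["files"] = files[:1]
    elif text:
        variables["text"] = text
    if max_entities is not None:
        variables["maxEntities"] = max_entities
    if confidence_level is not None:
        variables["confidenceLevel"] = confidence_level
    if timeout_ms is not None:
        variables["timeout"] = timeout_ms
    return variables
```

Leaving a key out of the returned dict, rather than sending a null value, is a design choice in this sketch so that the documented defaults take effect.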
The following is an example of the POST body for the Activiti REST API http:////rb/v1/process-instances endpoint:

    {
      "processDefinitionKey": "TextAnalysisProcessTest",
      "processInstanceName": "processTextAnalysisTest_Simple",
      "businessKey": "MyBusinessKey",
      "variables": {
        "file" : [ { "nodeId":"ad844189-9afb-4afb-965b-380db73022aa" } ],
        "maxEntities" : 50,
        "confidenceLevel" : "0.95",
        "timeout" : 10000
      },
      "payloadType":"StartProcessPayload"
    }
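A body like the one above could be assembled and wrapped in a POST request as in the following Python sketch, which uses only the standard library. The function names are illustrative, `base_url` is a hypothetical placeholder for whatever precedes /rb/v1/process-instances in your deployment, and authentication is deliberately left out because it depends on the environment.

```python
import json
import urllib.request
from typing import Any, Dict


def build_start_payload(
    definition_key: str,
    instance_name: str,
    business_key: str,
    variables: Dict[str, Any],
) -> Dict[str, Any]:
    """Assemble the StartProcessPayload body with the fields shown in the example."""
    return {
        "processDefinitionKey": definition_key,
        "processInstanceName": instance_name,
        "businessKey": business_key,
        "variables": variables,
        "payloadType": "StartProcessPayload",
    }


def build_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the process-instances endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rb/v1/process-instances",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same values as the example body; nothing is sent over the network here.
example_payload = build_start_payload(
    "TextAnalysisProcessTest",
    "processTextAnalysisTest_Simple",
    "MyBusinessKey",
    {
        "file": [{"nodeId": "ad844189-9afb-4afb-965b-380db73022aa"}],
        "maxEntities": 50,
        "confidenceLevel": "0.95",
        "timeout": 10000,
    },
)
```

Passing the prepared request to `urllib.request.urlopen` would send it; in practice an authorization header would probably need to be added first.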
In the business process definition, the text analysis service task called textAnalysisTask has the implementation attribute configured to use the connector.

    <bpmn2:serviceTask id="ServiceTask_0j2v2yc" name="textAnalysisTask" implementation="comprehend.ENTITY">
      <bpmn2:incoming>SequenceFlow_0kibr65</bpmn2:incoming>
      <bpmn2:outgoing>SequenceFlow_1048knn</bpmn2:outgoing>
    </bpmn2:serviceTask>
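To confirm that a process definition actually wires a service task to the connector, the implementation attribute can be inspected programmatically. The Python sketch below is one way to do so with the standard library. It assumes the bpmn2 prefix is bound to the standard BPMN 2.0 model namespace, and the surrounding definitions and process elements in the sample are added here for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard BPMN 2.0 model namespace, assumed to be what the bpmn2 prefix maps to.
BPMN_MODEL_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def connector_tasks(bpmn_xml: str, implementation: str = "comprehend.ENTITY") -> List[str]:
    """Return the names of service tasks whose implementation attribute matches."""
    root = ET.fromstring(bpmn_xml)
    return [
        task.get("name", task.get("id", ""))
        for task in root.iter("{%s}serviceTask" % BPMN_MODEL_NS)
        if task.get("implementation") == implementation
    ]


# Illustrative wrapper around the service task from the example.
SAMPLE = """<bpmn2:definitions xmlns:bpmn2="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <bpmn2:process id="TextAnalysisProcessTest">
    <bpmn2:serviceTask id="ServiceTask_0j2v2yc" name="textAnalysisTask" implementation="comprehend.ENTITY">
      <bpmn2:incoming>SequenceFlow_0kibr65</bpmn2:incoming>
      <bpmn2:outgoing>SequenceFlow_1048knn</bpmn2:outgoing>
    </bpmn2:serviceTask>
  </bpmn2:process>
</bpmn2:definitions>"""

print(connector_tasks(SAMPLE))
```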
The input parameters of PII Entity detection are:
Parameter | Type | Required | Description |
---|---|---|---|
files | Array | Optional | The file to be analyzed. If multiple files are passed, then only the first one will be analyzed. |
text | String | Optional | The text to be analyzed. If the files parameter is set, leave this blank. |
maxEntities | Integer | Optional | The maximum number of entities to return. Defaults to ${aws.comprehend.defaultMaxResults}. |
confidenceLevel | Float | Optional | The minimum confidence level for an entity, expressed as a float between 0 and 1. Defaults to ${aws.comprehend.defaultConfidence}. |
timeout | Integer | Optional | The timeout for the remote call to the Comprehend service, in milliseconds. Defaults to ${aws.comprehend.asynchTimeout}. |
The input parameters of Document classification are:
Parameter | Type | Required | Description |
---|---|---|---|
files | Array | Optional | The file to be analyzed. If multiple files are passed, then only the first one will be analyzed. |
text | String | Optional | The text to be analyzed. If the files parameter is set, leave this blank. |
maxEntities | Integer | Optional | The maximum number of entities to return. Defaults to ${aws.comprehend.defaultMaxResults}. |
confidenceLevel | Float | Optional | The minimum confidence level for an entity, expressed as a float between 0 and 1. Defaults to ${aws.comprehend.defaultConfidence}. |
timeout | Integer | Optional | The timeout for the remote call to the Comprehend service, in milliseconds. Defaults to ${aws.comprehend.asynchTimeout}. |
customClassificationArn | String | Optional | The custom classification ARN endpoint. If left blank, the Comprehend service will use the value given to the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN environment variable. Note: AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN does not have a default value, so you must set a value for it if it is used. |
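The fallback described for customClassificationArn (explicit parameter first, then the environment variable, which has no default) can be sketched in Python as follows. This illustrates the lookup order only and is not the connector's actual code: the function name is hypothetical, and raising an error when neither value is present is a choice made for this sketch.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN"


def resolve_classification_arn(
    custom_classification_arn: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer the task parameter; otherwise fall back to the environment variable."""
    if custom_classification_arn:
        return custom_classification_arn
    fallback = env.get(ENV_VAR, "")
    if not fallback:
        # The variable has no default value, so a missing value is reported early.
        raise ValueError(
            "customClassificationArn is blank and %s is not set" % ENV_VAR
        )
    return fallback
```

The `env` argument exists only so the lookup can be exercised with a plain dict instead of the real process environment.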
The output parameters from the entity detection are:
Parameter | Type | Required | Description |
---|---|---|---|
awsResponse | JSON | Optional | The result of the analysis from the Comprehend service. |
entities | JSON | Optional | The result object containing the entities detected. |
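As an illustration of how confidenceLevel and maxEntities might shape the result, the Python sketch below filters a response by score and caps the number of entities. It rests on two assumptions: that awsResponse mirrors the Amazon Comprehend DetectEntities response (an "Entities" list whose items carry "Score", "Type" and "Text"), and that the highest-scoring entities are the ones kept. The connector may apply these parameters differently, and the sample data is made up.

```python
from typing import Any, Dict, List


def select_entities(
    aws_response: Dict[str, Any],
    confidence_level: float,
    max_entities: int,
) -> List[Dict[str, Any]]:
    """Keep entities at or above the confidence level, best first, capped at max_entities."""
    entities = aws_response.get("Entities", [])
    confident = [e for e in entities if e.get("Score", 0.0) >= confidence_level]
    confident.sort(key=lambda e: e.get("Score", 0.0), reverse=True)
    return confident[:max_entities]


# Made-up response in the assumed DetectEntities shape.
SAMPLE_RESPONSE = {
    "Entities": [
        {"Score": 0.99, "Type": "PERSON", "Text": "Alice"},
        {"Score": 0.50, "Type": "ORGANIZATION", "Text": "Acme"},
        {"Score": 0.97, "Type": "LOCATION", "Text": "Seattle"},
    ]
}

for entity in select_entities(SAMPLE_RESPONSE, confidence_level=0.95, max_entities=50):
    print(entity["Type"], entity["Text"], entity["Score"])
```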
The output parameters from the PII entity detection are:
Parameter | Type | Required | Description |
---|---|---|---|
awsResponse | JSON | Optional | The result of the analysis from the Comprehend service. |
piiEntityTypes | JSON | Optional | The result object containing the PII entities detected. |
The output parameters from the Document classification are:
Parameter | Type | Required | Description |
---|---|---|---|
awsResponse | Object | Optional | The object that contains the original result of the text analysis performed by the Comprehend service. |
documentClassificationClasses | Array | Optional | An array that contains the list of the different classes detected in the analysis. |