Overriding Metadata Extraction Request in the Repository - Alfresco Content Services 23.4

The repository's request to extract metadata goes through RenditionService2, so it uses the asynchronous Alfresco Transform Service if available, and a synchronous Local transform if not.

Normally the only transform options are timeout and sourceEncoding, so the extractor code has only the source mimetype and the content itself to work with. Customisation of the property mapping should really be done in the T-Engine as described above.

However, it is currently possible for code running in the repository (i.e. alfresco.war) to override the default mapping of metadata to content model properties by supplying an extractMapping transform option. This approach is deprecated and may be removed in a future minor Content Services 7.x release.

To do this, an AMP should supply a class that implements the MetadataExtractorPropertyMappingOverride interface and add it to the metadataExtractorPropertyMappingOverrides property of the extractor.Asynchronous Spring bean.

/**
 * Overrides the default metadata mappings for PDF documents:
 *
 * <pre>
 * author=cm:author
 * title=cm:title
 * subject=cm:description
 * created=cm:created
 * </pre>
 * with:
 * <pre>
 * author=cm:author
 * title=cm:title,cm:description
 * </pre>
 */
public class PdfMetadataExtractorOverride implements MetadataExtractorPropertyMappingOverride {
    @Override
    public boolean match(String sourceMimetype) {
        return MIMETYPE_PDF.equals(sourceMimetype);
    }

    @Override
    public Map<String, Set<String>> getExtractMapping(NodeRef nodeRef) {
        Map<String, Set<String>> mapping = new HashMap<>();
        mapping.put("author", Collections.singleton("{http://www.alfresco.org/model/content/1.0}author"));
        mapping.put("title",  Set.of("{http://www.alfresco.org/model/content/1.0}title",
                                     "{http://www.alfresco.org/model/content/1.0}description"));
        return mapping;
    }
}

This results in a request that contains the following transform options:

{"extractMapping":{
   "author":["{http://www.alfresco.org/model/content/1.0}author"],
   "title":["{http://www.alfresco.org/model/content/1.0}title",
            "{http://www.alfresco.org/model/content/1.0}description"]},
 "timeout":20000,
 "sourceEncoding":"UTF-8"}