The repository's request to extract metadata goes through RenditionService2, so it uses the asynchronous Alfresco Transform Service if available and falls back to a synchronous Local transform if not.
Normally the only transform options are timeout and sourceEncoding, so the extractor code has only the source mimetype and the content itself to work with. Customisation of the property mapping should be done in the T-Engine, as described above.
However, it is currently possible for code running in the repository (i.e. alfresco.war) to override the default mapping of metadata to content model properties, with an extractMapping transform option. This approach is deprecated and may be removed in a future minor Content Services 7.x release.
To do this, an AMP supplies a class that implements the MetadataExtractorPropertyMappingOverride interface and adds it to the metadataExtractorPropertyMappingOverrides property of the extractor.Asynchronous Spring bean. For example:
    /**
     * Overrides the default metadata mappings for PDF documents:
     *
     * <pre>
     * author=cm:author
     * title=cm:title
     * subject=cm:description
     * created=cm:created
     * </pre>
     * with:
     * <pre>
     * author=cm:author
     * title=cm:title,cm:description
     * </pre>
     */
    public class PdfMetadataExtractorOverride implements MetadataExtractorPropertyMappingOverride {

        @Override
        public boolean match(String sourceMimetype) {
            return MIMETYPE_PDF.equals(sourceMimetype);
        }

        @Override
        public Map<String, Set<String>> getExtractMapping(NodeRef nodeRef) {
            Map<String, Set<String>> mapping = new HashMap<>();
            mapping.put("author",
                    Collections.singleton("{http://www.alfresco.org/model/content/1.0}author"));
            mapping.put("title",
                    Set.of("{http://www.alfresco.org/model/content/1.0}title",
                           "{http://www.alfresco.org/model/content/1.0}description"));
            return mapping;
        }
    }
This results in a request that contains the following transform options:
    {
      "extractMapping": {
        "author": ["{http://www.alfresco.org/model/content/1.0}author"],
        "title": ["{http://www.alfresco.org/model/content/1.0}title",
                  "{http://www.alfresco.org/model/content/1.0}description"]
      },
      "timeout": 20000,
      "sourceEncoding": "UTF-8"
    }
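To make the relationship between an override and the resulting transform options more concrete, the following is a minimal, self-contained sketch rather than the repository's actual implementation. MappingOverrideSketch is a hypothetical stand-in for MetadataExtractorPropertyMappingOverride (the NodeRef parameter is dropped so that the code compiles without Alfresco on the classpath), and ExtractOptionsSketch, pdfOverride and buildOptions are illustrative names. It assumes that the first override whose match method returns true supplies the extractMapping option, and it reuses the timeout and sourceEncoding values from the sample request.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for MetadataExtractorPropertyMappingOverride.
 * The real interface takes a NodeRef in getExtractMapping; it is omitted here.
 */
interface MappingOverrideSketch {
    boolean match(String sourceMimetype);

    Map<String, Set<String>> getExtractMapping();
}

final class ExtractOptionsSketch {
    // Namespace of the cm: content model prefix, as used in the example above.
    static final String CM = "{http://www.alfresco.org/model/content/1.0}";

    /** An override equivalent to the PDF example: author and title only. */
    static MappingOverrideSketch pdfOverride() {
        return new MappingOverrideSketch() {
            @Override
            public boolean match(String sourceMimetype) {
                return "application/pdf".equals(sourceMimetype);
            }

            @Override
            public Map<String, Set<String>> getExtractMapping() {
                Map<String, Set<String>> mapping = new LinkedHashMap<>();
                mapping.put("author", new LinkedHashSet<>(List.of(CM + "author")));
                mapping.put("title",
                        new LinkedHashSet<>(List.of(CM + "title", CM + "description")));
                return mapping;
            }
        };
    }

    /**
     * Builds the transform options for a source mimetype.
     * Assumption: the first matching override wins; with no match, only the
     * default timeout and sourceEncoding options are present.
     */
    static Map<String, Object> buildOptions(String sourceMimetype,
                                            List<MappingOverrideSketch> overrides) {
        Map<String, Object> options = new LinkedHashMap<>();
        for (MappingOverrideSketch override : overrides) {
            if (override.match(sourceMimetype)) {
                options.put("extractMapping", override.getExtractMapping());
                break;
            }
        }
        options.put("timeout", 20000);          // value taken from the sample request
        options.put("sourceEncoding", "UTF-8"); // value taken from the sample request
        return options;
    }
}
```

In this sketch, calling buildOptions with "application/pdf" and the PDF override produces a map with the same three keys as the sample request, while any other mimetype yields only timeout and sourceEncoding, which corresponds to the normal case where the T-Engine's own mapping applies.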