Changing Default Property Mappings for PDF Metadata Extraction

Platform: Alfresco | Product: Alfresco Content Services | Release: 23.4

A common requirement is to change the mapping of out-of-the-box properties, for example mapping the subject property of a PDF file to cm:title instead of cm:description. This is straightforward: override the out-of-the-box mapping configuration and reconfigure the mapping. The out-of-the-box definitions for metadata extractors can be found in the locations described in Metadata Extractors Moved to T-Engines.

To map the subject property to the content model property cm:title for PDF files, redefine the PdfBoxMetadataExtractor_metadata_extract.properties configuration as follows:

#
# PdfBoxMetadataExtractor - custom mapping
#

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Mappings
author=cm:author
title=cm:title
subject=cm:title
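To make the effect of this mapping concrete, here is a minimal Python sketch that reads a mapping in the format above and resolves each source metadata name to a fully qualified content model property name in {namespace-uri}localName form. It is an illustration only, not how Alfresco or the T-Engine actually parses the file: the function name parse_mapping is hypothetical, and the sketch assumes simple key=value lines with a single target property per key.

```python
# Hypothetical sketch: resolve a metadata-extract mapping written in the
# key=value format shown above. This is NOT Alfresco's own parser.

MAPPING = """\
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Mappings
author=cm:author
title=cm:title
subject=cm:title
"""

NS_KEY = "namespace.prefix."


def parse_mapping(text: str) -> dict[str, str]:
    """Return {source metadata name: '{namespace-uri}localName'}."""
    namespaces: dict[str, str] = {}  # prefix -> namespace URI
    targets: dict[str, str] = {}     # source name -> prefixed name, e.g. 'cm:title'
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first '=' only; the URI itself contains ':' and '/'
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith(NS_KEY):
            namespaces[key[len(NS_KEY):]] = value
        else:
            # Keys are stored exactly as written, so lookups are case sensitive
            targets[key] = value

    resolved: dict[str, str] = {}
    for source, prefixed in targets.items():
        prefix, _, local = prefixed.partition(":")
        if prefix not in namespaces:
            raise ValueError(f"{source}={prefixed}: prefix '{prefix}' is not declared")
        resolved[source] = "{" + namespaces[prefix] + "}" + local
    return resolved


if __name__ == "__main__":
    for source, target in parse_mapping(MAPPING).items():
        print(f"{source} -> {target}")
```

Running the sketch shows that both title and subject resolve to {http://www.alfresco.org/model/content/1.0}title, which is exactly what the custom mapping is meant to achieve. A mapping that uses a prefix with no matching namespace.prefix line makes the sketch raise ValueError.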

Note that every namespace used by the mapped content model properties must be declared with a namespace.prefix.<prefix> entry, as the properties example above does with namespace.prefix.cm. It is also important to remember that property names are case sensitive.
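Both rules can be checked before a custom mapping is deployed. The following check_mapping helper is hypothetical and, like any quick sketch, only understands simple key=value lines. It reports every property prefix that lacks a namespace.prefix declaration and every mapping key that does not exactly (case sensitively) match a set of expected source names. The expected names are an assumption you supply yourself; the example uses the three names from the mapping above, and the sketch has no knowledge of which names the PDFBox extractor actually emits.

```python
# Hypothetical pre-deployment check for a metadata-extract mapping file.
# Only understands simple "key=value" lines; not an Alfresco tool.

NS_KEY = "namespace.prefix."


def check_mapping(text: str, expected_sources: set[str]) -> list[str]:
    """Return a list of problems found in the mapping text (empty if none)."""
    declared: set[str] = set()
    mappings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith(NS_KEY):
            declared.add(key[len(NS_KEY):])
        else:
            mappings[key] = value

    problems: list[str] = []
    for source, qname in mappings.items():
        prefix = qname.partition(":")[0]
        # Rule 1: every prefix used needs a namespace.prefix.<prefix> line
        if prefix not in declared:
            problems.append(f"{source}: namespace prefix '{prefix}' is not declared")
        # Rule 2: names are compared exactly, so 'Subject' != 'subject'
        if source not in expected_sources:
            problems.append(f"{source}: not an expected source name (names are case sensitive)")
    return problems


if __name__ == "__main__":
    # Assumed source names, taken from the example mapping above
    expected = {"author", "title", "subject"}
    bad = (
        "namespace.prefix.cm=http://www.alfresco.org/model/content/1.0\n"
        "Subject=cm:title\n"
        "title=my:title\n"
    )
    for problem in check_mapping(bad, expected):
        print(problem)
```

In this example the miscapitalized Subject key and the undeclared my prefix are each reported once, while a mapping identical to the one above produces no problems.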