Metadata Extraction and Transform Engines - Alfresco Content Services - 23.4 - 23.4 - Ready - Alfresco - external

Alfresco Content Services

Platform
Alfresco
Product
Alfresco Content Services
Release
23.4
License

The extraction of metadata in the repository is performed in T-Engines (transform engines). Prior to Content Services version 7, it was performed inside the repository. T-Engines provide improved scalability, stability, security and flexibility. New extractors may be added without the need for a new Content Services release or applying an AMP on top of the repository (i.e. alfresco.war).

The Content Services version 6 framework for creating metadata extractors that run as part of the repository still exists, so existing AMPs that add extractors will still work as long as there is not an extractor in a T-Engine that claims to do the same task. The framework is deprecated and could well be removed in a future release.

This page describes how metadata extraction and embedding works, so that it is possible to add a custom T-Engine to do other types. It also lists the various extractors that have been moved to T-Engines.

A framework for embedding metadata into a file was provided as part of the repository prior to Content Services version 7. This too still exists, but has been deprecated. Even though the content repository did not provide any out of the box implementations, the embedding framework of metadata via T-Engines exists.

In the case of an extract, the T-Engine returns a JSON file that contains name value pairs. The names are fully qualified QNames of properties on the source node. The values are the metadata values extracted from the content. The transform defines the mapping of metadata values to properties. Once returned to the repository, the properties are automatically set.

In the case of an embed, the T-Engine takes name value pairs from the transform options, maps them to metadata values which are then updated in the supplied content. The content is then returned to the content repository and the node is updated.