Content Services provides many different types of content transformations out-of-the-box. Custom transformations can also be implemented and configured.
For architecture information, see Software Architecture.
Content transformers transform one type of content into another, such as an HTML file into a PDF file. They are used to enable indexing, thumbnails, and preview of content. If the target type cannot be reached with one transformation, several transformations can be chained together, such as JSON to HTML to PDF. To implement these transformations, third-party tools such as PDFBox, LibreOffice, ImageMagick, and Apache Tika are used. A number of transformations are supported out of the box, and you should familiarize yourself with them before implementing a custom transformer. They include:
- PDF to Text, HTML, or XML
- Word, PowerPoint, Excel, and other MS Office formats to Text, HTML, or XML
- HTML to Text
- Outlook Email .msg to Text
- RFC822 Email to Text
- Text to PDF
- Office Open XML to JPEG
- Apple iWork files to PDF or JPEG
- ZIP, TAR to Text
- MediaWiki markup to HTML
These are just a few of the supported transformations. They can also be combined into so-called transformation pipelines when it is not possible to go directly from the source mimetype to the target mimetype. Renditions are related to transformations and are covered at the end of this article.
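The idea behind a transformation pipeline can be sketched as a search over the available direct transformations. The following is a minimal illustration only, not how Content Services selects transformers internally: the `directTransformations` registry and the `findPipeline` function are hypothetical names, and the registry entries are made up for the example.

```javascript
// Toy registry of direct transformations: source mimetype -> target mimetypes.
// The entries are illustrative only, not the list Content Services ships with.
var directTransformations = {
  'application/json': ['text/html'],
  'text/html': ['application/pdf', 'text/plain'],
  'application/pdf': ['text/plain']
};

// Breadth-first search for the shortest chain of mimetypes from source to target.
// Returns an array such as [source, ..., target], or null when no chain exists.
function findPipeline(registry, source, target) {
  var queue = [[source]];
  var visited = {};
  visited[source] = true;
  while (queue.length > 0) {
    var path = queue.shift();
    var last = path[path.length - 1];
    if (last === target) {
      return path;
    }
    var next = registry[last] || [];
    for (var i = 0; i < next.length; i++) {
      if (!visited[next[i]]) {
        visited[next[i]] = true;
        queue.push(path.concat(next[i]));
      }
    }
  }
  return null;
}

var pipeline = findPipeline(directTransformations, 'application/json', 'application/pdf');
// pipeline is ['application/json', 'text/html', 'application/pdf']
```

With this toy registry there is no direct JSON to PDF entry, so the search goes through HTML, which mirrors the JSON to HTML to PDF chain mentioned earlier.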
To find out what transformers are currently registered and active within a Content Services installation, you can use an admin Web Script. This is available at http://localhost:8080/alfresco/service/mimetypes. This will list all the currently registered mimetypes, and provide a details link for each one. Selecting the details link will then show which transformations are currently supported both to and from that mimetype, and by what transformer. If a transformer becomes unavailable (for example, if the JOD/LibreOffice connection fails), then refreshing the list will show the updated transformations.
When working with transformations and renditions, it is important to make sure that the mimetypes involved are known to Content Services. When you access the “mimetypes” Web Script, check that the mimetypes you intend to use in transformations and renditions are listed. If they are not, you have to register them with Content Services; see Mimetypes Extension Point for more information.
Renditions are related to transformations. Their purpose is to render a specific content item into other forms, which are themselves called renditions. Rendition items are derived from their source item, so they can be updated automatically when the source item’s content (or other properties) changes.
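The check described above can be sketched as a small helper that looks for each required mimetype in the text returned by the mimetypes Web Script. This is only a rough sketch: `findMissingMimetypes` and `samplePage` are hypothetical, fetching the page (which needs an authenticated admin session) is left out, and a plain substring match may give false positives for mimetypes that share a prefix.

```javascript
// Returns the mimetypes from `required` that do not appear in `pageText`,
// where `pageText` is the body returned by the mimetypes Web Script.
function findMissingMimetypes(pageText, required) {
  var missing = [];
  for (var i = 0; i < required.length; i++) {
    if (pageText.indexOf(required[i]) === -1) {
      missing.push(required[i]);
    }
  }
  return missing;
}

// Hypothetical excerpt standing in for the real Web Script response.
var samplePage = 'application/pdf ... text/html ... text/plain';
var missing = findMissingMimetypes(samplePage, ['application/json', 'text/html']);
// missing is ['application/json']
```

Any mimetype reported as missing would need to be registered before it is used in a transformation or rendition.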
Examples of renditions (and rendition engines) include:
- reformat - Transforms content to new format - redirects to vanilla transformers
- image - Rescales images (including thumbnails)
- freemarker - Runs a FreeMarker template against source content
- xslt - Runs XSLT on XML source content
- composite - A rendition series
Renditions can be performed synchronously or asynchronously and can be created at a specified location within the repository. By default, they are created as primary children of their source item, but they can also be created elsewhere, at a location given explicitly or as a templated path.
The following describes an example with the reformat rendering engine and the custom transformation you have previously implemented (that is .json to .html). You can invoke the rendering engine and create an HTML rendition for a JSON document via JavaScript executed from a folder rule:
    var renderingEngineName = 'reformat';
    var renditionDefinitionName = 'cm:htmlRenditionDef';
    var renditionDef = renditionService.createRenditionDefinition(renditionDefinitionName, renderingEngineName);
    renditionDef.parameters['mime-type'] = 'text/html';
    var htmlRendition = renditionService.render(document, renditionDef);

The rendition definition name is a QName that you choose yourself; use a known namespace such as cm. The mime-type parameter tells the reformat rendering engine that the new rendition should be an HTML file. By default, this rendition is stored as a hidden child of the uploaded JSON document. If you inspect the JSON content node in the Node Browser, you should see three hidden renditions as follows:
**Children**

| Child Name | Child Type | Child Reference | Primary | Association Type | Index |
| --- | --- | --- | --- | --- | --- |
| *cm:htmlRenditionDef* | *cm:content* | *workspace://SpacesStore/78448da2-fbab-4ef1-b451-6bbaa569b8c4* | *true* | *rn:rendition* | *0* |
| cm:doclib | cm:thumbnail | workspace://SpacesStore/f7f354a7-010f-4462-82a1-7cec7f36fa1d | true | rn:rendition | 1 |
| cm:pdf | cm:thumbnail | workspace://SpacesStore/8b3ec283-cd2a-4378-af2d-e72e30127210 | true | rn:rendition | 2 |
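As mentioned earlier, a rendition does not have to be stored as a hidden child of its source. The sketch below shows how the same reformat rendition might be directed to a templated path instead. It rests on assumptions that should be checked against your Content Services version: the parameter key `destination-path-template`, the `${name}` substitution variable, and the example folder path are not taken from this article, and `renderHtmlAt` is a hypothetical helper.

```javascript
// Builds and runs a 'reformat' rendition that is stored at a templated path.
// `service` and `node` stand for the renditionService and document root objects.
function renderHtmlAt(service, node, destinationTemplate) {
  var renditionDef = service.createRenditionDefinition('cm:htmlRenditionDef', 'reformat');
  renditionDef.parameters['mime-type'] = 'text/html';
  // Assumed parameter key for the destination path template.
  renditionDef.parameters['destination-path-template'] = destinationTemplate;
  return service.render(node, renditionDef);
}

// In a folder rule script this could be called as (path is an example only):
// renderHtmlAt(renditionService, document, '/Company Home/Renditions/${name}.html');
```

Passing the service and node as arguments keeps the helper independent of the script's root objects, which makes it easier to try out with stand-in objects before using it in a rule.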