Alfresco Content Services 6.2 supports Document Fingerprints for finding related documents. Document Fingerprinting is performed by algorithms that map data, such as documents and files, to shorter text strings, also known as fingerprints. This feature is exposed as part of the Alfresco Full Text Search Query Language.
Document Fingerprints can be used to find similar content in general, or content biased towards containment (where one document is largely contained within another). The language adds a new FINGERPRINT keyword:
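To make the idea of a fingerprint concrete, the following is a minimal illustrative sketch in Python. It is not Alfresco's implementation: the word-shingle size, the use of truncated SHA-1 hashes, and the function names `shingles`, `fingerprint`, `similarity` and `containment` are all assumptions chosen for the example. It shows how a document can be reduced to a set of short strings, and how two overlap measures (general similarity and containment) can differ for the same pair of documents.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts yield a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def fingerprint(text, size=3):
    """Map each shingle to a short hash string; the set of hashes is the fingerprint."""
    return {
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
        for s in shingles(text, size)
    }


def similarity(a, b):
    """General similarity (Jaccard): shared hashes divided by all hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Containment: the fraction of fingerprint a that also appears in b."""
    return len(a & b) / len(a) if a else 0.0


short_doc = fingerprint("the quick brown fox jumps")
long_doc = fingerprint("the quick brown fox jumps over the lazy dog")

print("similarity: ", similarity(short_doc, long_doc))
print("containment:", containment(short_doc, long_doc))
```

In this sketch the short text is fully contained in the longer one, so its containment is high even though the general similarity between the two texts is noticeably lower.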
FINGERPRINT:<DBID | NODEREF | UUID>
By default, this finds documents that have any overlap with the source document. The UUID option is likely to be the most useful, because the UUID is present in the public API. To specify a minimum amount of overlap, use:
FINGERPRINT:<DBID | NODEREF | UUID>_<%overlap>
FINGERPRINT:<DBID | NODEREF | UUID>_<%overlap>_<%probability>
To find documents that have at least 20% overlap with document 1234, use:
FINGERPRINT:1234_20
To execute a faster query that is 80% confident that anything returned overlaps by at least 20%, use:
FINGERPRINT:1234_20_80
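The three query forms above can be assembled programmatically. The following Python sketch is one possible way to do so; the helper names `fingerprint_query` and `search_body` are hypothetical, and the 0 to 100 range check on the percentages is an assumption rather than documented behaviour. The JSON body is assumed to follow the shape used by the Alfresco Search REST API (a `query` object with `language` and `query` fields); verify this against the API reference for your version before relying on it. The identifier is inserted as-is, so any escaping needed for a NODEREF is not handled here.

```python
import json


def fingerprint_query(doc_id, overlap=None, probability=None):
    """Build a FINGERPRINT query string from an identifier and optional percentages.

    doc_id      -- DBID, NODEREF or UUID of the source document (used unescaped)
    overlap     -- optional minimum overlap, as a percentage
    probability -- optional confidence, as a percentage (requires overlap)
    """
    if probability is not None and overlap is None:
        raise ValueError("probability can only be given together with overlap")
    for name, value in (("overlap", overlap), ("probability", probability)):
        # Assumption: percentages are expected to lie between 0 and 100.
        if value is not None and not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")

    parts = [str(doc_id)]
    if overlap is not None:
        parts.append(str(overlap))
    if probability is not None:
        parts.append(str(probability))
    return "FINGERPRINT:" + "_".join(parts)


def search_body(query):
    """Wrap an AFTS query string in a JSON search request body (assumed shape)."""
    return json.dumps({"query": {"language": "afts", "query": query}})


print(fingerprint_query(1234))
print(fingerprint_query(1234, overlap=20))
print(fingerprint_query(1234, overlap=20, probability=80))
print(search_body(fingerprint_query(1234, overlap=20)))
```

Keeping the string construction separate from the request body makes it straightforward to reuse the same query text in other clients that accept the Alfresco Full Text Search Query Language.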
To support fingerprint queries, additional information is added to the Solr 6 index using the rerank template. This makes the indexes approximately 15% larger. Document Fingerprints can only be disabled by changing the schema.