Note: Prerequisites and the Federation Wizard
These steps can be performed automatically by using the Federation Wizard, but you will still need to configure the job. If you use the wizard, skip steps 1 and 2.
To index content you will need:
- A working Authentication Connection for your source system
- An Integration Connection for your source system
- A Content Service Connection for your source system
- A working Authentication Connection for Elasticsearch
- An Integration Connection for Elasticsearch
- Create a job using your two Integration Connections (source system and Elasticsearch)
- In the Details tab, set the source repository’s Content Service Connection, directly below the job name.
- In the Details tab, make sure the start and end times cover a wide enough range to capture all the data you wish to index.
- In the Tasks tab, select the Tika Extractor Task.
- This task extracts the content from a file and sets it as a field on the document for indexing.
- In the Mappings tab, select “Basic Elasticsearch Mapping” from the Additional Mappings drop-down.
- If this is not present, simply add the field you set on the Tika Extractor Task as a field mapping. The default is content, so the mapping would be content ----Field Mapping----> content
- (optional) Add any additional mappings. The target fields will be created and mapped dynamically as part of the migration.
- In the Output Specification, select your ID attribute (or leave it as the default) and choose which collection to index into.
- (optional) If you wish to enable highlighting and your extracted content is not in the “content” field, enter the name of the content field from your Tika task in the Term Vector field.
- (optional) If you wish to use More Like This (MLT) to search on custom fields, add them to the Term Vector field.
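To illustrate what the Term Vector field setting corresponds to on the Elasticsearch side, the following is a minimal Python sketch. It only builds request bodies and does not contact a cluster. It assumes a recent Elasticsearch version with typeless mappings, and it assumes that term vectors are stored with the standard `term_vector` mapping parameter. The function names, the `summary` field, the `documents` index name, and the document ID are hypothetical, and the mapping the product actually generates may differ.

```python
import json


def build_index_mapping(term_vector_fields=("content",)):
    """Build an index-creation body that stores term vectors for each listed text field."""
    properties = {
        name: {"type": "text", "term_vector": "with_positions_offsets"}
        for name in term_vector_fields
    }
    return {"mappings": {"properties": properties}}


def build_highlight_query(text, field="content"):
    """Build a match query that asks Elasticsearch to highlight hits in `field`."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


def build_mlt_query(index, doc_id, fields=("content",)):
    """Build a More Like This query that finds documents similar to an indexed one."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        }
    }


if __name__ == "__main__":
    # "summary" stands in for a custom field added to the Term Vector field.
    print(json.dumps(build_index_mapping(["content", "summary"]), indent=2))
    print(json.dumps(build_highlight_query("annual report"), indent=2))
    print(json.dumps(build_mlt_query("documents", "doc-1", ["content", "summary"]), indent=2))
```

Each printed body could be sent to the index-creation or `_search` endpoint of a test cluster to check that highlighting and MLT behave as expected on the fields you listed.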