Pipeline

Concept

The connector crawls items from a source system and feeds them into a target system through a pipeline. To adjust or process each item before it is ingested, you can add custom stages to this pipeline. These stages are software modules provided by us that can be uploaded and configured via the UI.

How to upload and use stages

To upload a stage, open the pipeline view:

pipeline view

Then click the + symbol between Source System and Target System. The Select Pipeline Stage view opens:

select page

Here you can see stages which are already uploaded to the connector. To upload a new stage, drag and drop the respective software module or click on Drag file to upload or browse to navigate to and select the module.

Once the module is uploaded, it is listed in the available stages, where it can be selected. After selecting the stage, it is shown as a step in the pipeline, where it can be configured:

configure stage

After completing the configuration of a single stage, save it by pressing SAVE. Multiple custom stages can be set up and configured in any order. To apply the stages, the setup has to be validated by clicking VALIDATE. If there is something wrong in the configuration, the connector indicates the incorrectly configured stage and fields.

pipeline validate wrong

After a successful validation, save the setup by pressing SAVE. The connector then has to be restarted so that the pipeline stages are applied in the next traversals. This can be done by pressing Apply Changes in the message that pops up:

finish setup

Default packaged stages

The connector is packaged with pipeline stages which are delivered but not applied by default:

Metadata Regular Expression Stage

The Metadata Regular Expression Stage is used to manipulate a metadata field by replacing all occurrences of a regular expression with a configured value.

Configuration

Property Description

Metadata Key

Key of the target metadata.

Pattern

The regular expression whose matches are replaced with the Metadata Value.

Metadata Value

The value to replace the matched string with.

Prefix

A prefix to add in front of the current value.

Suffix

A suffix to add behind the current value.

Label

Exclusive name of the stage.

Example

With the configuration

Property Value

Metadata Key

text

Pattern

\s+

Metadata Value

-

Prefix

start_

Suffix

_end

the metadata
text=this is an example
becomes
text=start_this-is-an-example_end
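
The example above can be reproduced with a minimal Java sketch. It is illustrative only and not the connector's actual implementation: the class and method names are hypothetical, and the order of operations (replacement first, then prefix and suffix) is an assumption that happens to be consistent with the documented result.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: mimics the documented behaviour of the
// Metadata Regular Expression Stage, not the connector's real code.
final class MetadataRegexSketch {

    // Replaces every match of `pattern` in `value` with `replacement`,
    // then wraps the result in `prefix` and `suffix` (assumed order).
    static String apply(String value, String pattern, String replacement,
                        String prefix, String suffix) {
        String replaced = Pattern.compile(pattern).matcher(value).replaceAll(replacement);
        return prefix + replaced + suffix;
    }

    public static void main(String[] args) {
        // Inside a Java string literal the backslash of the regex has to be escaped.
        String result = apply("this is an example", "\\s+", "-", "start_", "_end");
        System.out.println("text=" + result); // prints text=start_this-is-an-example_end
    }
}
```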

Icon Assigner Stage

The Icon Assigner Stage sets the Item’s Icon URL based on the Item’s File Extension and Item Type.

Configuration

Property Description

Item Types

The Icon will only be assigned if one of the Item Types matches. If none are defined, all Item Types will be accepted.

File Extensions

The Icon will only be assigned if one of the File Extensions matches. If none are defined, all File Extensions will be accepted.

Icon URL

The Icon URL that will be added to the item if it matches both an Item Type and a File Extension.

Icon Metadata Key

The Key under which the Icon will be added to the metadata.

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which an item should be assigned an Icon’s URL.
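
The filter rules described above (an empty list accepts everything, and both filters must accept the item) can be sketched as follows. The class, the method names and the example values are hypothetical, and the exact, case-sensitive comparison is an assumption; the connector may compare values differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the Icon Assigner matching rules, not the connector's code.
final class IconAssignerSketch {

    // An empty filter list accepts every value; otherwise the value must be listed.
    // Assumption: values are compared exactly (case-sensitive).
    static boolean accepts(List<String> allowed, String actual) {
        return allowed.isEmpty() || allowed.contains(actual);
    }

    // Returns the icon URL if both filters accept the item, otherwise null.
    static String iconFor(List<String> itemTypes, List<String> fileExtensions,
                          String iconUrl, String itemType, String fileExtension) {
        boolean matches = accepts(itemTypes, itemType) && accepts(fileExtensions, fileExtension);
        return matches ? iconUrl : null;
    }

    public static void main(String[] args) {
        List<String> itemTypes = Collections.emptyList();        // no restriction on Item Type
        List<String> extensions = Arrays.asList("pdf", "docx");  // hypothetical values
        String url = "https://example.com/icons/document.png";   // hypothetical Icon URL

        System.out.println(iconFor(itemTypes, extensions, url, "file", "pdf"));
        System.out.println(iconFor(itemTypes, extensions, url, "file", "png"));
    }
}
```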

Metadata Assigner Stage

The Metadata Assigner Stage alters or assigns a metadata value.

Configuration

Property Description

Metadata Key

Key of the target metadata.

Metadata Values

The list of values which should be assigned to the metadata field using the specified assigner type strategy.

Assigner Type

The strategy used to assign the metadata values (APPEND, REPLACE or IGNORE).

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which an item’s metadata should be assigned.
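
The three assigner types are not defined in detail here. The following sketch shows one plausible reading, stated purely as an assumption: APPEND adds the configured values to any existing ones, REPLACE overwrites existing values, and IGNORE assigns the values only if the key has no value yet. All names are hypothetical; verify the actual behaviour against your connector before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch with ASSUMED semantics for the assigner types.
final class MetadataAssignerSketch {

    enum AssignerType { APPEND, REPLACE, IGNORE }

    static void assign(Map<String, List<String>> metadata, String key,
                       List<String> values, AssignerType type) {
        List<String> existing = metadata.get(key);
        boolean hasValues = existing != null && !existing.isEmpty();
        switch (type) {
            case APPEND: {
                // Assumption: keep existing values and add the configured ones.
                List<String> merged = new ArrayList<>();
                if (hasValues) {
                    merged.addAll(existing);
                }
                merged.addAll(values);
                metadata.put(key, merged);
                break;
            }
            case REPLACE:
                // Assumption: discard existing values.
                metadata.put(key, new ArrayList<>(values));
                break;
            case IGNORE:
                // Assumption: assign only if the key has no value yet.
                if (!hasValues) {
                    metadata.put(key, new ArrayList<>(values));
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown assigner type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("department", new ArrayList<>(Arrays.asList("sales")));
        assign(metadata, "department", Arrays.asList("emea"), AssignerType.APPEND);
        System.out.println(metadata);
    }
}
```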

Metadata Drop Stage

Drops metadata entries of an item with the specified key.

Configuration

Property Description

Metadata Keys

Keys of metadata entries to drop.

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which an item’s metadata should be dropped.

Metadata Mapper Stage

The Metadata Mapper Stage copies the values from the source key into the target key.

Configuration

Property Description

Metadata Source Key

Key of the source metadata.

Metadata Target Key

Key of the target metadata.

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which an item should be mapped.

Drop Item Stage

This stage drops items from further processing. A condition on a specific metadata field value can be set.

Configuration

Property Description

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which an entire item should be dropped.

Custom Stage

This stage supports custom implementations of the Custom Stage API (RaytionPipelineStage). For additional details on implementation, please refer to How to add a custom stage?.

Configuration

Property Description

Stage Class Name

The fully qualified name of the class implementing RaytionPipelineStage.

Key-Value Pairs

The keys and their respective values to be passed to the custom pipeline stage. Keys and values are delimited by '=' (e.g. key1=value1).

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which the custom stage should be applied to the item.
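
To illustrate the key=value format, the following sketch parses such entries into a map. It is not the connector's parser: the class name is hypothetical, and splitting at the first '=' (so that values may themselves contain '=') as well as trimming surrounding whitespace are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of parsing key=value entries; not the connector's code.
final class KeyValuePairsSketch {

    static Map<String, String> parse(List<String> pairs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : pairs) {
            // Assumption: only the first '=' separates key and value.
            int separator = pair.indexOf('=');
            if (separator <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            String key = pair.substring(0, separator).trim();
            String value = pair.substring(separator + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = parse(Arrays.asList("key1=value1", "query=a=b"));
        System.out.println(config.get("key1"));  // prints value1
        System.out.println(config.get("query")); // prints a=b
    }
}
```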

Apache Tika

Binary content analysis that is applied to items before they are sent to the search engine.

Configuration

Property Description

Tika Server URL

URL of the Tika Server.

Label

Exclusive name of the stage.

Enable Conditions

Conditions on which the content of an item should be processed.

Custom Headers

Custom headers to include in all requests sent to the Tika Server.

Header Name

Name of the header to append for outgoing requests.

Header Value

Value of the header to append for outgoing requests.

Advanced Client Options

If enabled, allows you to configure advanced client connection settings.

Max. Connection Pool Size

Maximum number of connections maintained by the client connection manager.

Connection Timeout

Client-side connection timeout.

Retry Interval

Interval between subsequent retries.

Max. Retry Counts

Maximum number of retries.

Mime Type Detection

This step detects the mime type of an item by parsing its binary content. The detected mime type is sent to the search engine together with the original mime type.

Enabled

If disabled, mime type detection is skipped for all items.

Only If Missing

If set to true, mime type detection is applied only for items with missing media type.

Content Extraction

This step extracts the content text of an item by parsing its binary content. If non-empty text is extracted for an item, the item is sent to the search engine as a text document together with the extracted content text.

Enabled

If disabled, content extraction is skipped for all items.

Item Type Exclusions

Content text is not extracted for items whose item type is configured in this list.

Media Type Exclusions

Content text is not extracted for items whose media type is configured in this list.

Language Detection

This step detects the language of the item content. The detected language code is supplied to the search engine as item metadata.

Enabled

If disabled, language detection is skipped for all items.

Only If Missing

If set to true, language detection is applied only for items with missing language code.

Replace Provided

If set to true, the provided language will be overwritten by the detected language. If this option is disabled, the detected language will be appended to the list of provided languages.

Metadata Extraction

This step extracts additional metadata for an item by parsing its binary content.

Enabled

If disabled, additional metadata extracted by Tika will not be sent to the search engine.

Metadata Prefix

The specified prefix will be prepended to all metadata keys extracted by Tika.

Content Manipulation

Manipulation of content text.

Regular Expression Manipulations

Regex manipulation steps applied sequentially.

Regex Pattern

The regex pattern to match and replace with the provided value.

Value

The value to replace the matched content text with.

Enable Max. Size

If disabled, entire content will be processed.

Maximum Content Size

Maximum length of the content. If this limit is exceeded, the content is truncated.
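
The Content Manipulation settings can be pictured with the sketch below: regex steps are applied one after another, and the result is truncated if the maximum size is enabled. All names are hypothetical. Two points are assumptions rather than documented behaviour: that truncation happens after the regex steps, and that the size is counted in characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of sequential regex manipulation followed by truncation.
final class ContentManipulationSketch {

    // One "Regex Pattern" / "Value" pair from the configuration.
    static final class Step {
        final Pattern pattern;
        final String value;

        Step(String regex, String value) {
            this.pattern = Pattern.compile(regex);
            this.value = value;
        }
    }

    static String manipulate(String content, List<Step> steps,
                             boolean maxSizeEnabled, int maxSize) {
        String result = content;
        // Each step works on the output of the previous step.
        for (Step step : steps) {
            result = step.pattern.matcher(result).replaceAll(step.value);
        }
        // Assumption: truncation is applied last and counts characters.
        if (maxSizeEnabled && result.length() > maxSize) {
            result = result.substring(0, maxSize);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(
                new Step("Page \\d+ of \\d+", ""), // remove page markers
                new Step("\\s+", " "));            // collapse whitespace
        String text = "Hello   World\n\nPage 1 of 3";
        System.out.println(manipulate(text, steps, true, 11)); // prints Hello World
    }
}
```

Because the steps run sequentially, their order matters: in this sketch the page marker is removed before whitespace is collapsed, so the pattern of the first step still sees the original spacing.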

Stage Conditions

You can apply a stage to items under specific conditions by clicking 'Enable Conditions' for each stage.

pipeline conditions

Property Description

Enable Conditions

Allows the configuration of conditions which an item must fulfill to be processed by this stage.

Condition Operator

Determines whether all conditions (AND) or any single condition (OR) must be fulfilled for the item to be processed.

Conditions

Condition definitions which need to be matched for the stage to process an item.

Field

The metadata field which should be checked.

Value

The value which the metadata field should contain.

Match Type

The type of matching used to check if the metadata values fulfill the condition.
Possible choices: EQUALS, NOT_EQUALS, REGEX, NOT_SET. REGEX expects a regular expression as a value which needs to be matched with the item’s actual value for the field. NOT_SET is matched if there is no value for the given field.
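
The way the condition operator and the match types combine can be sketched as follows. This is not the connector's implementation and all names are hypothetical. Several details are assumptions: metadata fields are treated as multi-valued with any single value sufficient for a match, REGEX has to match the entire value, and NOT_EQUALS is considered fulfilled when the field has no value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of evaluating stage conditions; semantics partly assumed.
final class StageConditionsSketch {

    enum MatchType { EQUALS, NOT_EQUALS, REGEX, NOT_SET }

    enum Operator { AND, OR }

    static final class Condition {
        final String field;
        final String value;
        final MatchType matchType;

        Condition(String field, String value, MatchType matchType) {
            this.field = field;
            this.value = value;
            this.matchType = matchType;
        }
    }

    static boolean matches(Condition condition, Map<String, List<String>> metadata) {
        List<String> values = metadata.get(condition.field);
        boolean unset = values == null || values.isEmpty();
        switch (condition.matchType) {
            case NOT_SET:
                return unset;
            case EQUALS:
                return !unset && values.contains(condition.value);
            case NOT_EQUALS:
                // Assumption: a missing field counts as "not equal".
                return unset || !values.contains(condition.value);
            case REGEX:
                if (unset) {
                    return false;
                }
                Pattern pattern = Pattern.compile(condition.value);
                for (String value : values) {
                    // Assumption: the whole value has to match.
                    if (pattern.matcher(value).matches()) {
                        return true;
                    }
                }
                return false;
            default:
                throw new IllegalStateException("Unknown match type: " + condition.matchType);
        }
    }

    // AND: every condition must match. OR: one matching condition is enough.
    static boolean shouldProcess(Operator operator, List<Condition> conditions,
                                 Map<String, List<String>> metadata) {
        if (conditions.isEmpty()) {
            return true;
        }
        for (Condition condition : conditions) {
            boolean matched = matches(condition, metadata);
            if (operator == Operator.OR && matched) {
                return true;
            }
            if (operator == Operator.AND && !matched) {
                return false;
            }
        }
        return operator == Operator.AND;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("mimeType", Arrays.asList("application/pdf"));
        List<Condition> conditions = Arrays.asList(
                new Condition("mimeType", "application/.*", MatchType.REGEX),
                new Condition("language", null, MatchType.NOT_SET));
        System.out.println(shouldProcess(Operator.AND, conditions, metadata)); // prints true
    }
}
```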

How to add a Custom Stage?

The connector offers the opportunity to create custom stages and integrate them into the pipeline.

Implementation

To achieve this, you must implement the provided interface RaytionPipelineStage, which is contained in the raytion-custom-stage-api JAR.

Raytion Gen9 Connectors are currently written for Java 8. Therefore, custom stages must also be developed with Java 8. For logging inside a custom stage, slf4j-api is recommended.

Setup

To use your custom stages in Gen9 Connectors, follow these steps closely:

  1. Shut down your Connector

  2. Navigate to the folder <CONNECTOR_HOME>/bin/ where <CONNECTOR_HOME> is the installation folder of your Connector

  3. Depending on the operating system on which the Connector is installed and run, edit either connector.sh or connector.bat. If you are unsure, create a backup of the file first.

    1. Add to the default JVM options:

      1. connector.sh: Add -Dloader.path=$APP_HOME/lib/

      2. connector.bat: Add -Dloader.path=%~dp0../lib/

    2. Change the execution argument from -jar to -cp.

    3. Set the main class to org.springframework.boot.loader.PropertiesLauncher by appending it to the end of the execution command.

  4. Put all the custom stage JARs that you have implemented and want to use, along with the provided raytion-custom-stage-api JAR, into the folder <CONNECTOR_HOME>/lib/ where <CONNECTOR_HOME> is the installation folder of your Connector.

  5. Start the Connector via the connector.sh or the connector.bat file.

If everything was done correctly, the connector should start without issues, as it did before.

Configuration

custom stage

  1. Navigate to the Pipeline tab in the Connector interface.

  2. Add a stage via the + button and select the Custom Stage.

  3. Configure the fully qualified class name of an implemented stage you added to the Connector in previous steps.

  4. Configure the Key-Value pairs to be passed to the stage. Keys and values are delimited by =.

  5. Once you are done with the stage configuration, make sure to hit SAVE on the right.

  6. Click on VALIDATE at the top of the page and then on SAVE. After that, restart the connector by clicking on Apply Changes.