This repository was archived by the owner on Jun 13, 2025. It is now read-only.

Transformer

Sivasurya Santhanam edited this page Jun 21, 2018 · 5 revisions

Solr ships with some standard transformers that modify the data from different data sources during the data import process (see Transformers). In addition, the KnowledgeFinder project implements the additional transformers described below.

de.dlr.knowledgefinder.dataimport.utils.transformer.ArrayToStringTransformer

Converts the array named by the "concatArrayFromSource" attribute into a comma-separated string.

data-conf.xml

<field column="keywordsText" concatArrayFromSource="keywords" />
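The conversion can be pictured with a minimal Java sketch. This is not the project's implementation: the class name ArrayToStringSketch and the exact separator (", ") are assumptions made for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of KnowledgeFinder.
final class ArrayToStringSketch {

    // Joins the string form of every element with ", " (assumed separator).
    static String join(List<?> values) {
        return values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
    }
}
```

With this sketch, a "keywords" array holding solr, search and index would end up in "keywordsText" as "solr, search, index".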

de.dlr.knowledgefinder.dataimport.utils.transformer.AuthorFormatingTransformer

Searches for authors of the form {firstname: String, lastname: String, organisation: String}, where any of firstname, lastname or organisation can be an empty string, and formats each author as a single string. Possible return values include ["firstname lastname (organisation)", "firstname lastname", "organisation", "f. lastname (organisation)", "firstname", "lastname", "lastname (organisation)", …].

data-conf.xml

<field column="authors" formatAuthor="true"
                concatArrayFromSource="authors" />
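One way to read the list of possible results is sketched below in Java. The class AuthorFormatSketch and its rules (join the non-empty name parts, put the organisation in parentheses only when a name is present) are assumptions derived from the examples above, not the project's code. Abbreviated forms such as "f. lastname (organisation)" are simply treated as a firstname that is already abbreviated.

```java
// Hypothetical helper, not part of KnowledgeFinder.
final class AuthorFormatSketch {

    // All three arguments are expected to be non-null; any of them may be empty.
    static String format(String firstname, String lastname, String organisation) {
        StringBuilder name = new StringBuilder();
        if (!firstname.isEmpty()) {
            name.append(firstname);
        }
        if (!lastname.isEmpty()) {
            if (name.length() > 0) {
                name.append(' ');
            }
            name.append(lastname);
        }
        if (organisation.isEmpty()) {
            return name.toString();          // "firstname lastname", "firstname", "lastname"
        }
        if (name.length() == 0) {
            return organisation;             // organisation only, no parentheses
        }
        return name + " (" + organisation + ")";
    }
}
```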

de.dlr.knowledgefinder.dataimport.utils.transformer.CategoriesSeparatedTransformer

Reads a JSON file that contains the category tree and splits the imported field into separate category fields, one per main category (field name: prefix + name of the main category).

data-conf.xml

...
<!-- CategoriesSeparatedTransformer-->
<field column="contentCategories" name="categories" categories="import/cs-categories.json" categories_split_prefix="category_"/>
...

de.dlr.knowledgefinder.dataimport.utils.transformer.DateIncompleteFormatTransformer

Converts an incomplete date string (e.g. 2003-12) into the earliest or latest possible Date value (parseToDate="start" or parseToDate="end"). The regular expression must define the named groups "year", "month" and "day".

data-conf.xml

...
<!-- DateIncompleteFormatTransformer-->
<field column="startTimePeriodOfDataCollection" sourceColName="startCollection"
parseToDate="start" regex="^(?&lt;year&gt;\d{4})(\D)?(?&lt;month&gt;\d{2})?(\D)?(?&lt;day&gt;\d{2})?$"/>
<field column="endTimePeriodOfDataCollection" sourceColName="endCollection"
parseToDate="end" regex="^(?&lt;year&gt;\d{4})(\D)?(?&lt;month&gt;\d{2})?(\D)?(?&lt;day&gt;\d{2})?$"/>
...
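The start/end completion can be sketched in Java as follows. The regular expression is the one from the configuration above; the class IncompleteDateSketch, the boolean flag and the use of java.time.LocalDate instead of java.util.Date are assumptions chosen to keep the example short.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of KnowledgeFinder.
final class IncompleteDateSketch {

    private static final Pattern DATE = Pattern.compile(
            "^(?<year>\\d{4})(\\D)?(?<month>\\d{2})?(\\D)?(?<day>\\d{2})?$");

    // atEnd == false mimics parseToDate="start", atEnd == true mimics parseToDate="end".
    static LocalDate parse(String raw, boolean atEnd) {
        Matcher m = DATE.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a (partial) date: " + raw);
        }
        int year = Integer.parseInt(m.group("year"));
        String month = m.group("month");
        String day = m.group("day");
        // Missing month: January for "start", December for "end".
        int mm = month != null ? Integer.parseInt(month) : (atEnd ? 12 : 1);
        YearMonth ym = YearMonth.of(year, mm);
        // Missing day: first day for "start", last day of that month for "end".
        int dd = day != null ? Integer.parseInt(day) : (atEnd ? ym.lengthOfMonth() : 1);
        return ym.atDay(dd);
    }
}
```

Under these assumptions, "2003-12" becomes 2003-12-01 for "start" and 2003-12-31 for "end", and a bare "2004" becomes 2004-01-01 or 2004-12-31.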

de.dlr.knowledgefinder.dataimport.utils.transformer.DateToStringTransformer

Converts a date into a string using the given format.

<field column="publishYear" sourceColName="publishDate" locale="en" datePattern="yyyy"/>
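A minimal Java sketch of this formatting step is shown below. It assumes that "datePattern" follows java.text.SimpleDateFormat syntax and that "locale" is a language tag such as "en"; the class DateToStringSketch is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of KnowledgeFinder.
final class DateToStringSketch {

    // Formats the date in the JVM's default time zone.
    static String format(Date date, String datePattern, String locale) {
        SimpleDateFormat formatter =
                new SimpleDateFormat(datePattern, Locale.forLanguageTag(locale));
        return formatter.format(date);
    }
}
```

With the configuration above, a "publishDate" in June 2018 would yield the "publishYear" string 2018.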

de.dlr.knowledgefinder.dataimport.utils.transformer.DictToStringTransformer

This transformer takes maps of <String, String> and tries to match them against the formatting pattern. If a key from the pattern is not contained in the map's keys, the string representation is used instead. Keys in the pattern are delimited by "{" and "}".

data-conf.xml

<field column="identifications" formatDictPattern="{id_type}: {id_value}" />
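The substitution can be sketched in Java as below. The class DictToStringSketch is hypothetical, and the fallback for a missing key (returning the map's own toString()) is only one possible reading of the description above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of KnowledgeFinder.
final class DictToStringSketch {

    // A key is whatever stands between "{" and "}" in the pattern.
    private static final Pattern KEY = Pattern.compile("\\{([^{}]+)\\}");

    static String format(Map<String, String> dict, String pattern) {
        Matcher m = KEY.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = dict.get(m.group(1));
            if (value == null) {
                // Assumed fallback: give up on the pattern and use the map's string form.
                return dict.toString();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For a map with id_type = DOI and id_value = 10.1000/182, the pattern "{id_type}: {id_value}" gives "DOI: 10.1000/182" in this sketch.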

de.dlr.knowledgefinder.dataimport.utils.transformer.ExcludeValuesTransformer

Ignores values that appear in a list defined in a file.

data-conf.xml

...
<!-- ExcludeValuesTransformer -->
<field column="spatialCoverage" exclude="import/exclude-spatial.txt" ... />
...
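The filtering idea can be sketched in Java as follows. The file layout (one excluded value per line, UTF-8, blank lines ignored) and the class ExcludeValuesSketch are assumptions; the format the transformer really expects is not described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of KnowledgeFinder.
final class ExcludeValuesSketch {

    // Assumed file layout: one excluded value per line.
    static Set<String> loadExcluded(Path file) throws IOException {
        Set<String> excluded = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String value = line.trim();
            if (!value.isEmpty()) {
                excluded.add(value);
            }
        }
        return excluded;
    }

    // Keeps every value that is not in the exclusion set, preserving order.
    static List<String> filter(List<String> values, Set<String> excluded) {
        return values.stream()
                .filter(value -> !excluded.contains(value))
                .collect(Collectors.toList());
    }
}
```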

de.dlr.knowledgefinder.dataimport.utils.transformer.FilePathTransformer

This transformer can put a prefix in front of a string and add or replace a file suffix. The field "pathPrefix" has to be present, but can be empty; it is placed in front of the input. If both "oldFileSuffix" and "fileSuffix" are present, the last occurrence of "oldFileSuffix" is replaced by "fileSuffix". If "oldFileSuffix" is absent but "fileSuffix" is present, "fileSuffix" is simply appended to the string.

data-conf.xml

<field column="filePath"
                filePrefix="/some/path"
                fileSuffix=".pdf" oldFileSuffix=".xml" srcColName="file"/>
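The rules above can be sketched in Java as follows. The class FilePathSketch is hypothetical, and the sketch assumes the prefix is concatenated as-is, without inserting a path separator.

```java
// Hypothetical helper, not part of KnowledgeFinder.
final class FilePathSketch {

    // prefix must be non-null (it may be empty); oldSuffix and newSuffix may be null.
    static String transform(String input, String prefix, String oldSuffix, String newSuffix) {
        String result = input;
        if (newSuffix != null) {
            if (oldSuffix != null) {
                // Replace only the last occurrence of the old suffix.
                int idx = result.lastIndexOf(oldSuffix);
                if (idx >= 0) {
                    result = result.substring(0, idx) + newSuffix
                            + result.substring(idx + oldSuffix.length());
                }
            } else {
                // No old suffix given: append the new one.
                result = result + newSuffix;
            }
        }
        return prefix + result;
    }
}
```

For example, transform("report.xml", "/some/path/", ".xml", ".pdf") yields "/some/path/report.pdf" in this sketch.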

de.dlr.knowledgefinder.dataimport.utils.transformer.FormatingStringTransformer

This transformer takes a formatting pattern and inserts the source value at every occurrence of "{*}".

data-conf.xml

<field column="dataFileSize" srcColName="____content.size____" formatStringPattern="{*} MB" />
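In Java, the placeholder substitution could look like the sketch below. The class FormatStringSketch is hypothetical, and the placeholder is assumed to be the literal text {*}.

```java
// Hypothetical helper, not part of KnowledgeFinder.
final class FormatStringSketch {

    // String.replace works on literal text, so "{*}" is not treated as a regex.
    static String format(String value, String pattern) {
        return pattern.replace("{*}", value);
    }
}
```

With the pattern "{*} MB" from the configuration above, a source value of 42 would become "42 MB".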

de.dlr.knowledgefinder.dataimport.utils.transformer.IntToDateTransformer

If this transformer is enabled for a field, it takes the field's value as a string. If the string represents an integer, it is converted to a date; the integer is expected to give the date in seconds since 1.1.1979.

data-conf.xml

<field column="____contentcreationdatetime____" convertIntToDate="true" />
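The conversion can be sketched in Java as shown below. The sketch assumes the count is relative to the Unix epoch (1970-01-01T00:00:00Z); the description above names 1.1.1979, so the base instant would need to be shifted if that is what the transformer really uses. The class IntToDateSketch is hypothetical.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Optional;

// Hypothetical helper, not part of KnowledgeFinder.
final class IntToDateSketch {

    // Returns the date for an integer string, or empty if the string is not an integer.
    static Optional<Date> convert(String raw) {
        try {
            long seconds = Long.parseLong(raw.trim());
            // Assumption: seconds are counted from the Unix epoch.
            return Optional.of(Date.from(Instant.ofEpochSecond(seconds)));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```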

de.dlr.knowledgefinder.dataimport.utils.transformer.SelectLatestDateTransformer

Finds the latest date from a group of dates. The dates should be Java Date objects. You should also pass the date pattern and locale in standard Java notation.

data-conf.xml

<field column="metadataContentUpdateDateTime" dataTimeSources="creationDate,modificationDate"
                datePattern="yyyy-MM-dd HH:mm" locale="en" />
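Picking the latest date can be sketched in Java as follows. The class LatestDateSketch is hypothetical and only covers the selection step for values that are already java.util.Date objects; how the transformer uses "datePattern" and "locale" is not shown.

```java
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical helper, not part of KnowledgeFinder.
final class LatestDateSketch {

    // Returns the most recent non-null date, or empty if there is none.
    static Optional<Date> latest(List<Date> candidates) {
        return candidates.stream()
                .filter(Objects::nonNull)
                .max(Comparator.naturalOrder());
    }
}
```

With the configuration above, the candidates would be the values of "creationDate" and "modificationDate".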

de.dlr.knowledgefinder.dataimport.utils.transformer.TrimTransformer

If enabled, this transformer removes leading and trailing spaces from strings and from the elements of string arrays.

data-conf.xml

<field column="authors" trim="true" />
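A minimal Java sketch of the trimming behaviour is given below. The class TrimSketch is hypothetical, and it relies on String.trim(), which strips all leading and trailing characters up to and including the space character (so tabs and line breaks are removed as well).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of KnowledgeFinder.
final class TrimSketch {

    static String trim(String value) {
        return value.trim();
    }

    // Trims every element and keeps the original order.
    static List<String> trimAll(List<String> values) {
        return values.stream()
                .map(String::trim)
                .collect(Collectors.toList());
    }
}
```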

de.dlr.knowledgefinder.dataimport.utils.transformer.URLDecodeTransformer

Decode the field content from application/x-www-form-urlencoded MIME format to a Java string.

<field column="fileURL" urlDecode="true"/>
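The decoding step corresponds to java.net.URLDecoder in the Java standard library, as the sketch below illustrates. The class UrlDecodeSketch and the choice of UTF-8 as character encoding are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of KnowledgeFinder.
final class UrlDecodeSketch {

    // Decodes application/x-www-form-urlencoded text, assuming UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

In this encoding, "+" stands for a space and "%2F" for "/", so "report+2018%2Ffinal.pdf" decodes to "report 2018/final.pdf".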
