Transformer
Solr ships with a set of standard transformers that modify the data from different data sources during the data import process (see Transformers). In addition, the KnowledgeFinder project implements the additional transformers described below.
Converts the array found in the source column named by "concatArrayFromSource" into a comma-separated string.
data-conf.xml
<field column="keywordsText" concatArrayFromSource="keywords" />
Searches for authors given in the form {firstname: String, lastname: String, organisation: String} and formats them; any of firstname, lastname or organisation can be an empty string. Possible strings in the return value are ["firstname lastname (organisation)", "firstname lastname", "organisation", "f. lastname (organisation)", "firstname", "lastname", "lastname (organisation)",…].
data-conf.xml
<field column="authors" formatAuthor="true"
concatArrayFromSource="authors" />
Reads a JSON file that contains the category tree structure and organizes the imported field into separate category fields (named with the prefix followed by the name of the main category).
data-conf.xml
...
<!-- CategoriesSeparatedTransformer-->
<field column="contentCategories" name="categories" categories="import/cs-categories.json" categories_split_prefix="category_"/>
...
Converts an incomplete date string (e.g. 2003-12) into the minimal or maximal possible Date value (parseToDate="start" or parseToDate="end", respectively). The regular expression must define the named groups "year", "month" and "day".
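The completion rule described above can be sketched as follows. This is an illustration in Python, not the project's implementation: the function name `parse_incomplete_date` is hypothetical, the result has day granularity only (the real transformer may also set a time of day), and Python writes named groups as `(?P<name>...)` where the configuration uses `(?<name>...)`.

```python
import calendar
import re
from datetime import date

# Same structure as the regex in the configuration, in Python's named-group syntax.
PATTERN = re.compile(r"^(?P<year>\d{4})(\D)?(?P<month>\d{2})?(\D)?(?P<day>\d{2})?$")


def parse_incomplete_date(text, parse_to_date):
    """Complete a partial date to its first ("start") or last ("end") possible day.

    Hypothetical helper: returns None when the text does not match the pattern.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    at_end = parse_to_date == "end"
    year = int(match.group("year"))
    # A missing month becomes January for "start" and December for "end".
    if match.group("month"):
        month = int(match.group("month"))
    else:
        month = 12 if at_end else 1
    # A missing day becomes the first or the last day of that month.
    if match.group("day"):
        day = int(match.group("day"))
    else:
        day = calendar.monthrange(year, month)[1] if at_end else 1
    return date(year, month, day)
```

Under these assumptions, "2003-12" resolves to 1 December 2003 for "start" and to 31 December 2003 for "end".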
data-conf.xml
...
<!-- DateIncompleteFormatTransformer-->
<field column="startTimePeriodOfDataCollection" sourceColName="startCollection"
parseToDate="start" regex="^(?&lt;year>\d{4})(\D)?(?&lt;month>\d{2})?(\D)?(?&lt;day>\d{2})?$"/>
<field column="endTimePeriodOfDataCollection" sourceColName="endCollection"
parseToDate="end" regex="^(?&lt;year>\d{4})(\D)?(?&lt;month>\d{2})?(\D)?(?&lt;day>\d{2})?$"/>
...
Converts a date into a string with the given format.
data-conf.xml
<field column="publishYear" sourceColName="publishDate" locale="en" datePattern="yyyy"/>
This transformer takes maps of <String, String> and tries to match them to the formatting pattern. If a key from the pattern is not contained in the map, the key's string representation is used instead. Keys in the pattern are delimited by "{" and "}".
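A minimal sketch of this substitution, written in Python for illustration only. The function name `format_dict` is hypothetical, and the fallback for a missing key (emitting the key name itself) is an assumption based on the description above.

```python
import re


def format_dict(pattern, values):
    """Fill every {key} placeholder in the pattern from the given map."""
    def substitute(match):
        key = match.group(1)
        # Assumption: a key missing from the map is replaced by its own name.
        return values.get(key, key)

    return re.sub(r"\{([^{}]*)\}", substitute, pattern)
```

With the pattern "{id_type}: {id_value}" and a map holding the hypothetical values id_type = "doi" and id_value = "10.1000/182", the sketch yields "doi: 10.1000/182".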
data-conf.xml
<field column="identifications" formatDictPattern="{id_type}: {id_value}" />
Ignores the values contained in a list defined in a file.
data-conf.xml
...
<!-- ExcludeValuesTransformer -->
<field column="spatialCoverage" exclude="import/exclude-spatial.txt" ... />
...
This transformer sets a prefix in front of a string and adds or replaces a file suffix. The field "pathPrefix" has to be present, but can be empty; it is placed in front of the input. If both "oldFileSuffix" and "fileSuffix" are present, the last occurrence of "oldFileSuffix" is replaced by "fileSuffix". If "oldFileSuffix" is not present but "fileSuffix" is, "fileSuffix" is simply appended to the string.
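The prefix and suffix rules above can be sketched like this, in Python for illustration only. The function and parameter names are hypothetical, the prefix is joined by plain concatenation, and leaving the value unchanged when "oldFileSuffix" does not occur in it is an assumption the description does not cover.

```python
def build_path(value, prefix, file_suffix=None, old_file_suffix=None):
    """Prepend a prefix and add or replace a file suffix (hypothetical helper)."""
    if file_suffix is not None:
        if old_file_suffix:
            # Replace only the last occurrence of the old suffix.
            head, found, tail = value.rpartition(old_file_suffix)
            if found:
                value = head + file_suffix + tail
            # Assumption: if the old suffix is absent, the value is left as it is.
        else:
            # No old suffix given: simply append the new one.
            value = value + file_suffix
    # The prefix may be empty; it is placed directly in front of the input.
    return prefix + value
```

For example, the input "report.xml" with the prefix "/some/path/", "fileSuffix" ".pdf" and "oldFileSuffix" ".xml" gives "/some/path/report.pdf" in this sketch.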
data-conf.xml
<field column="filePath"
filePrefix="/some/path"
fileSuffix=".pdf" oldFileSuffix=".xml" srcColName="file"/>
This transformer takes a formatting pattern and inserts the source value at every occurrence of "{*}".
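As a small illustration of the "{*}" substitution, here is a Python sketch; the function name `format_string` is hypothetical and the real transformer may treat non-string values differently.

```python
def format_string(pattern, value):
    """Insert the source value at every occurrence of "{*}" in the pattern."""
    return pattern.replace("{*}", str(value))
```

With the pattern "{*} MB" and a source value of "12.5", the sketch returns "12.5 MB".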
data-conf.xml
<field column="dataFileSize" srcColName="____content.size____" formatStringPattern="{*} MB" />
If this transformer is enabled for a field, it takes the field's value as a string. If this string represents an integer, it is converted to a date; the integer is expected to give the date in seconds since 1.1.1979.
data-conf.xml
<field column="____contentcreationdatetime____" convertIntToDate="true" />
Finds the latest date in a group of dates. The dates should be Java Date objects. You should also pass your date pattern and locale in standard Java notation.
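The selection step can be sketched as follows, in Python for illustration only. The function name `latest_date` and the row layout are hypothetical, formatting the result with the date pattern is an assumption, and the pattern uses Python's strftime syntax instead of the Java pattern from the configuration.

```python
from datetime import datetime


def latest_date(row, source_columns, date_pattern="%Y-%m-%d %H:%M"):
    """Return the latest date among the named columns, formatted as a string."""
    names = [name.strip() for name in source_columns.split(",")]
    # Skip columns that are missing or empty in this row.
    dates = [row[name] for name in names if row.get(name) is not None]
    if not dates:
        return None
    # Assumption: the result is written out using the configured pattern.
    return max(dates).strftime(date_pattern)
```

Given a row whose creationDate is 5 January 2020, 10:30 and whose modificationDate is 2 March 2021, 08:00, the sketch returns "2021-03-02 08:00".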
data-conf.xml
<field column="metadataContentUpdateDateTime" dataTimeSources="creationDate,modificationDate"
datePattern="yyyy-MM-dd HH:mm" locale="en" />
If enabled, this transformer removes leading and trailing spaces from strings and from the elements of string arrays.
data-conf.xml
<field column="authors" … trim="true" />