Index Metadata in Elastic Search #409

Closed
@maxceem

During the migration to the Topcoder V5 Standards, we have to index Metadata related data in the Elastic Search index.

Many of our Metadata objects have JSON fields which may contain quite arbitrary data. Indexing them fails in several ways when the same field holds different data types in different objects.

Data type issues

So far there are several kinds of issues when we try to index Metadata from the DEV env into the ES index:

  1. failed to parse [priceConfigs.config.buildingBlocks.ZEPLIN_APP_ADDON_CA.price] number_format_exception For input string: "600 + templateId / 10"
  2. object mapping for [projectTemplates.scope.wizard] tried to parse field [wizard] as object, but found a concrete value
  3. mapper [productTemplates.template.sections.subSections.questions.options.value] of different type, current_type [string], merged_type [long]
  4. mapper [productTemplates.template.sections.subSections.required] of different type, current_type [boolean], merged_type [string]

On PROD one more issue was found:

  5. mapper [productTemplates.template.sections.subSections.questions.required] of different type, current_type [boolean], merged_type [string]
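Conflicts like the ones above can be found before indexing by walking the JSON and recording which types appear at each field path. The following is a minimal sketch, not the project's code: the function names and sample documents are hypothetical, and the type names only loosely mirror what dynamic mapping would infer (for example, it does not detect date-like strings).

```python
from collections import defaultdict


def json_type(value):
    """Return a coarse type name for a JSON value (a simplification of dynamic mapping)."""
    if isinstance(value, bool):  # bool must be checked before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    return None  # null carries no type information


def collect_types(value, path, seen):
    """Record the types seen at each dotted path; array items share the array's path."""
    if isinstance(value, list):
        for item in value:
            collect_types(item, path, seen)
        return
    kind = json_type(value)
    if kind is None:
        return
    if path:
        seen[path].add(kind)
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, seen)


def find_conflicts(documents):
    """Return {path: sorted type names} for every path with more than one type."""
    seen = defaultdict(set)
    for doc in documents:
        collect_types(doc, "", seen)
    return {path: sorted(kinds) for path, kinds in seen.items() if len(kinds) > 1}


# Hypothetical documents modeled on issues 2 and 4.
sample_docs = [
    {"scope": {"wizard": {"enabled": True}}, "template": {"required": True}},
    {"scope": {"wizard": True}, "template": {"required": "true"}},
]
conflicts = find_conflicts(sample_docs)
print(conflicts)
```

Running such a check against the Metadata tables would list the fields that need either a predefined mapping or a data fix.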

Solutions

  1. dynamic: false - Don't index data unless we manually define the mapping.

    We can set dynamic: false on these JSON objects so that no mapping is created for them, and data in these objects is NOT indexed at all unless we define the mapping for the object manually. The data itself would still be present in the objects inside the ES index, we just cannot search by it.

    • pros Absolutely safe to add any objects to the ES index, and the operation would never fail.
    • cons We cannot search by any properties of such an object, even the ones which could be indexed. However, if we predefine mappings for some of the properties of such objects, then we can search by those properties.
  2. ignore_malformed - Index all data by default, but we can skip indexing properties with inconsistent data.

    We can set ignore_malformed on the properties of these JSON objects which have inconsistent data. As a result, the dynamic mapping would still be created for such fields, but if some objects have a value of a different type, then that value would not be indexed.

    • pros All the fields of the JSON would still be searchable, except for the values skipped as malformed in fields where we set ignore_malformed.
    • cons We have to set this property in advance for the fields with inconsistent data. If we try to index data with inconsistent fields without ignore_malformed set, then there would be an error during indexing.
  3. string mapping

    It appears that if a field has a "string" mapping and we index other simple values into it which can be converted to a string, such as "sdfdsfsd" (string), 5234 (number), 123.23 (float), or 2019-10-23 (date), then the data is indexed as a "string", while the initial untouched value is still present inside the object.
    See the POC bash script which tests this behavior https://gist.github.com/maxceem/b86b6c66fcbc8e4c59814992efb6e160.

    This solution can be used to solve the 3rd issue, where we currently have conflicts between "string" and "long" types. The 1st issue cannot be solved this way, because the keys of the object holding the inconsistent data have arbitrary names, so we cannot predefine a mapping for them.

    • pros Even fields with inconsistent numeric and string types would still be searchable.
    • cons If we try to index data with inconsistent fields without first defining a "string" mapping for them, then there would be an error when ES tries to create a dynamic mapping.
  4. fix data in DB

    We can just fix the data in the DB to have consistent types.

    This solution could be applied to the 2nd issue, as the alternative would be to use the 1st or 2nd solution and lose searchability.
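The mapping-based solutions 1 to 3 could be sketched as mapping fragments like the ones below. This is an illustration under assumptions, not the mapping actually used: the helper `nested_mapping` and the field `exampleField` are hypothetical, the choice of which paths get which setting is only for demonstration, and the "string" type name follows the wording of this issue (newer Elasticsearch versions use `text` or `keyword` instead).

```python
import json


def nested_mapping(path, leaf):
    """Wrap a leaf field mapping in nested `properties` objects for a dotted path."""
    mapping = leaf
    for name in reversed(path.split(".")):
        mapping = {"properties": {name: mapping}}
    return mapping


# Solution 1: no dynamic mappings are created under this object;
# its content stays in the stored document but is not searchable.
dynamic_false = nested_mapping(
    "projectTemplates.scope",
    {"type": "object", "dynamic": False},
)

# Solution 2: values that do not fit the field type are skipped instead of
# failing the whole document. `exampleField` is a hypothetical numeric property.
ignore_malformed = nested_mapping(
    "productTemplates.template.exampleField",
    {"type": "long", "ignore_malformed": True},
)

# Solution 3: predefine a "string" mapping so numbers and strings both index.
string_mapping = nested_mapping(
    "productTemplates.template.sections.subSections.questions.options.value",
    {"type": "string"},
)

print(json.dumps(string_mapping, indent=2))
```

In each case the fragment has to be in place before the first inconsistent document is indexed, which is the common drawback noted in the cons above.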

Summary

The best solution for the issues listed above would be:

  • Apply solution 3 for issue 3.
  • Apply solution 4 for issues 1, 2, 4 and 5.

As a result:

  • All the data would be searchable.
  • If we add new inconsistent data for which we didn't preset a mapping, or whose type cannot be converted to "string", then indexing would fail.
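For the boolean/string conflicts on `required` (issues 4 and 5), solution 4 might amount to a one-off normalization of the stored JSON. The sketch below assumes the string values are literally "true" or "false", which this issue does not confirm; the function name and sample template are hypothetical.

```python
def normalize_required(node):
    """Recursively convert string `required` flags ("true"/"false") to booleans, in place."""
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key == "required" and isinstance(value, str):
                lowered = value.strip().lower()
                if lowered in ("true", "false"):
                    node[key] = lowered == "true"
                # Any other string is left untouched for manual review.
            else:
                normalize_required(value)
    elif isinstance(node, list):
        for item in node:
            normalize_required(item)
    return node


# Hypothetical template fragment with mixed boolean and string flags.
template = {
    "sections": [
        {
            "subSections": [
                {
                    "required": "true",
                    "questions": [{"required": False}, {"required": "false"}],
                }
            ]
        }
    ]
}
normalize_required(template)
print(template)
```

Values that are neither "true" nor "false" are deliberately left as-is, so they would still need to be reviewed by hand.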
