Skip to content

Add per-index setting to limit number of nested fields #15989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 19, 2016

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jan 14, 2016

Indexing a document with 100 nested fields actually indexes 101 documents as each nested
document is indexed as a separate document. To safeguard against ill-defined mappings
the number of nested fields that can be defined per index has been limited to 50. This
default limit can be changed with the index setting index.mapping.nested_fields.limit.

Closes #14983

@ywelsch ywelsch added >enhancement review :Search Foundations/Mapping Index mappings, including merging and defining field types v5.0.0-alpha1 v2.3.0 labels Jan 14, 2016
@jpountz
Copy link
Contributor

jpountz commented Jan 14, 2016

This looks good to me. One thing I'm wondering is whether we should only apply the limit to indices created on or after 2.3 in order to not fail recoveries of existing indices if they happen to cross the limit (unless maybe it's something that we want to do so that they know they are in a bad situation). cc @clintongormley

@clintongormley
Copy link
Contributor

@jpountz i agree that this should be for new indices (>= 2.3)

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 15, 2016

Thinking a bit more about this, I noticed that the case where we update the new settings to a value lower than the current number of nested fields is not covered. After discussing with @jpountz and @s1monw, two possible ways to think about the nested fields limit came up:

  1. The setting index.mapping.nested_fields.limit guarantees that the number of nested fields in the mapping is never above that limit. This means that updating index.mapping.nested_fields.limit to a value lower than the current number of nested fields must fail the index settings update.

or

  1. The setting index.mapping.nested_fields.limit guarantees that adding new nested fields fails if the number of nested fields goes over the current limit. Updating index.mapping.nested_fields.limit to a value lower than the current number of nested fields always succeeds.

In case we go with 2), I would drop the requirement that this only applies to new indices (>= 2.3).

@jpountz @clintongormley thoughts?

@clintongormley
Copy link
Contributor

Good point - I think I prefer option 2

@jpountz
Copy link
Contributor

jpountz commented Jan 15, 2016

I think 2 is more practical too.

@ywelsch ywelsch force-pushed the feature/limit-nested-fields branch 2 times, most recently from 755d20e to 1ff8852 Compare January 15, 2016 15:40
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 15, 2016

I agree that option 2 is better.

Pushed a new set of changes, can you have a look again @jpountz ?

Instead of adding an(other) ad-hoc parameter to merge, I created a MergeOptions class (maybe we should call it ValidationOptions?)

Other options might be added there. In MetaDataMappingService.PutMappingExecutor.applyRequest I spotted for example a validation check that checks if _parent field points to an already existing type.

@jpountz
Copy link
Contributor

jpountz commented Jan 15, 2016

I am not sure I like giving an option to bypass this check. I would rather like to have a new enum parameter in the merge method that gives the context of the mapping update (type creation, mapping update or restoring an existing mapping) and apply the nested count check whenever it makes sense (type creation and mapping updates). I expect that it would also allow us to get rid of the applyDefault parameter since the defaults only need to be applied upon type creation.

@jpountz
Copy link
Contributor

jpountz commented Jan 15, 2016

If I'm not mistaken it would also work for the parent check which should only be applied for mapping updates.

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 18, 2016

The reason I chose to set the options externally is that the options for such a context enum seem too complex to me. Looking at the current code on master, I identify 9 call sites:

  • creating a new index (2 calls, case distinction on default mapping)
  • receiving cluster state update from master with new/updated mapping (1 call)
  • adding an alias (to parse the filter) (2 calls, case distinction on default mapping)
  • check compatibility during possible upgrade of index metadata (e.g. when restoring from an old snapshot) (1 call)
  • refresh of mapping triggered by node (1 call)
  • update of mapping triggered by user or by system (e.g. with dynamic mappings) (2 calls, one where we create the mapping from the state before we update and another where we merge the updated mapping)

I'm a bit hesitant on moving these case distinctions into MapperService. @jpountz how should we proceed?

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Jan 18, 2016
This would be useful in order to only perform some validations in the case of
a mapping update and in cases when a mapping is restored eg. after a restart,
such as discussed in elastic#15989.

This replaces the current `applyDefault` parameter which can be derived from
the mapping merge reason: the default mapping should be applied only in case of
a mapping update, if the mapping does not exist yet and if this is not the
default mapping.
@ywelsch ywelsch force-pushed the feature/limit-nested-fields branch from 1ff8852 to a1b8dd2 Compare January 19, 2016 09:05
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 19, 2016

@jpountz Thanks for the change in #16059. I have updated this PR accordingly. Can you have another look?

@jpountz
Copy link
Contributor

jpountz commented Jan 19, 2016

LGTM

jpountz added a commit that referenced this pull request Jan 19, 2016
This would be useful in order to only perform some validations in the case of
a mapping update and in cases when a mapping is restored eg. after a restart,
such as discussed in #15989.

This replaces the current `applyDefault` parameter which can be derived from
the mapping merge reason: the default mapping should be applied only in case of
a mapping update, if the mapping does not exist yet and if this is not the
default mapping.
ywelsch pushed a commit that referenced this pull request Jan 19, 2016
Add per-index setting to limit number of nested fields
@ywelsch ywelsch merged commit fc4e8fd into elastic:master Jan 19, 2016
@jpountz
Copy link
Contributor

jpountz commented Jan 19, 2016

For the record, this setting was also made updatable via 6ca7909

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search Foundations/Mapping Index mappings, including merging and defining field types v2.3.0 v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants