Skip to content

Add NestedPathFieldMapper to store nested path information #51100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

romseygeek
Copy link
Contributor

Currently nested documents repurpose the _type field to store their nested paths.
This commit adds a dedicated _nested_path field instead, which decouples this
information from types and will allow the removal of the _type field entirely further
down the line. To preserve backwards compatibility, references to this field are
mediated via methods that take an index settings object, and indexes created before
8x still use the _type field.

Relates to #41059
Closes #24362

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >refactoring v8.0.0 labels Jan 16, 2020
@romseygeek romseygeek self-assigned this Jan 16, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

@romseygeek
Copy link
Contributor Author

Note: this supersedes #50312, which removed the type field entirely; this defers that removal to a later commit

…ng updates due to the _type field being overloaded
@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass and added some comments. Overall it looks good to me.


@Override
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
Function<MapperService, String> typeFunction = mapperService -> mapperService.documentMapper().type();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we disallow fielddata on this field ? It's an implementation details and I am not sure that we want to expose it in aggs or sorting ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep it only for indices created before 8 to handle bwc ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it - indices created before 8 still use TypeFieldMapper, so BWC is handled that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 7ea0da8

@@ -157,6 +158,7 @@ public IndicesModule(List<MapperPlugin> mapperPlugins) {
builtInMetadataMappers.put(IndexFieldMapper.NAME, new IndexFieldMapper.TypeParser());
builtInMetadataMappers.put(SourceFieldMapper.NAME, new SourceFieldMapper.TypeParser());
builtInMetadataMappers.put(TypeFieldMapper.NAME, new TypeFieldMapper.TypeParser());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you switch to the NestedPathFieldMapper.TypeParser in this pr since you handle bwc in the new field type ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be able to in future, but I think we need to keep TypeFieldMapper around for the moment because it's still used for type queries, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I was confused because there are some logic to handle bwc in NestedPathFieldMapper.

this.nestedTypePathAsString = "__" + fullPath;
this.nestedTypePathAsBytes = new BytesRef(nestedTypePathAsString);
this.nestedTypeFilter = new TermQuery(new Term(TypeFieldMapper.NAME, nestedTypePathAsBytes));
this.nestedTypePath = "__" + fullPath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For indices created in v8 we don't need to add any prefix since we use a dedicated field ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I pushed 7ea0da8

@@ -157,6 +158,7 @@ public IndicesModule(List<MapperPlugin> mapperPlugins) {
builtInMetadataMappers.put(IndexFieldMapper.NAME, new IndexFieldMapper.TypeParser());
builtInMetadataMappers.put(SourceFieldMapper.NAME, new SourceFieldMapper.TypeParser());
builtInMetadataMappers.put(TypeFieldMapper.NAME, new TypeFieldMapper.TypeParser());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I was confused because there are some logic to handle bwc in NestedPathFieldMapper.

fields.add(new Field(fieldType().name(), MapperService.SINGLE_MAPPING_NAME, fieldType()));
if (fieldType().hasDocValues()) {
fields.add(new SortedSetDocValuesField(fieldType().name(), new BytesRef(MapperService.SINGLE_MAPPING_NAME)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to index root document in this field. We use the nested path to find nested documents but we don't use this field to detect root documents so this indexing would be redundant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I pushed 7ea0da8

@romseygeek
Copy link
Contributor Author

@elasticmachine update branch

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, LGTM otherwise.

@@ -118,7 +118,7 @@
//TODO this needs to be cleaned up: _timestamp and _ttl are not supported anymore, _field_names, _seq_no, _version and _source are
//also missing, not sure if on purpose. See IndicesModule#getMetadataMappers
private static final String[] SORTED_META_FIELDS = new String[]{
"_id", IgnoredFieldMapper.NAME, "_index", "_routing", "_size", "_timestamp", "_ttl", "_type"
"_id", IgnoredFieldMapper.NAME, "_index", "_routing", "_size", "_timestamp", "_ttl", "_type", "_nested_path"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to preserve the sort order ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, will fix. I had a look at fixing the TODO above, but it's not all that straightforward unfortunately.

@romseygeek romseygeek merged commit 1dc9dd4 into elastic:master Jan 22, 2020
@romseygeek romseygeek deleted the types-removal/nested-field-mapper branch January 22, 2020 16:31
@jpountz jpountz mentioned this pull request Mar 11, 2020
66 tasks
@pgomulka pgomulka mentioned this pull request Mar 25, 2020
66 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rethink how to index nested documents when types are gone
4 participants