Skip to content

Ideas around mappings improvements #64663

Open
@javanna

Description

@javanna

This meta issue tracks some of the improvements and cleanups that we identified while working on the runtime fields project (#59332):

  • Remove the last usage of SearchContext#mapperService: it is used for nested objects
  • Remove QueryShardContext#getObjectMapper : it is mostly used for nested objects and it indirectly exposes retrieving MappedFieldType going through FieldMappers, which are not aware of runtime fields
  • Clean up SearchLookup handling in SearchExecutionContext: a separate lookup object needs to be created for the fetch phase which is passed through to MappedFieldFype#valueFetcher, yet the (wrong) lookup can also be retrieved directly from the SearchExecutionContext
  • Revise NestedScope on SearchExecutionContext
  • Clean up special handling for allowUnmappedFields and mapUnmappedAsText in SearchExecutionContext, they can be changed from consumers which seems dangerous and are only needed for the percolator.
  • Review null checks on what SearchExecutionContext#getFieldType returns. They most likely need to be converted to use QueryShardCOntext#isFieldMapped instead which does not take the allowUnmappedField and mapUnmappedAsString flags into account
  • Review calls to FieldsVisitor#postProcess and make sure that they are all necessary, in some cases we may be able to skip the call as it does nothing depending on which fields we are processing
  • Currently, when an index is created without providing mappings it will have a null document mapper until mappings are provided for it, or a document has indexed in it.
    MapperService exposes the DocumentMapper through which the mappings can be accessed. The fact that DocumentMapper can be null leads to null checks in many places and potential NPEs in callers. We'd like to eagerly create a document mapper for a new index, as soon as it gets created. This has the implication that we would lose the semantics around the fact that when a document mapper is null for an index, it means the index is empty.
    That would also allow us to apply index sorting when the index is created instead of delaying it to the document mapper creation.
  • We have a special case to handle the case of an empty doc being indexed, which triggers a dynamic mapping update although the mappings are empty, to ensure that whenever a doc is registered in an index, it has mappings and its corresponding document mapper will not be null then. This special case can be removed if we make sure that every index always has mappings
  • Should we move MapperRegistry to index.mapper package? (Move MapperRegistry to index.mapper package #69805)
  • Mapping is visible from outside of its package but all of its methods are called from within the same package. Review usages and possibly adjust methods visibility accordingly. Also, its instance fields are all package private and accessed directly without going through getters, yet the root object mapper has also a public getter that is only used within the same package. (Remove redundant methods from DocumentMapper #69803)
  • Make ContentPath less confusing
  • Rework ObjectMapper and RootObjectMapper to not rely on clone
  • Make MappedFieldType responsible for building their own field caps objects, and add in parent/nested information
  • Reduce the number of merge methods on MapperService, and clarify when they should be called
  • Merging does not require the entire MapperService: ideally all of the merge methods exposed by MapperService can be factored out to a lightweight MappingMerger component (merge itself was already removed from DocumentMapper and replaced by calling Mapping#merge directly). This means that it will no longer be required to build a full MapperService in the master node to execute merges when the indices are not allocated to it.
  • Rework NestedDocuments to depend on MappingLookup Use a mapping snapshot for fetching nested docs #66877 (@nik9000)
  • can we remove get field mappings API? or shall we refactor it to make some sense? see also Support include_defaults in the GET-mapping API #5368
  • Can we remove getMapper and fieldMappers from MappingLookup? Why would one have to go through mappers by name? Can we consolidate how to access fields in the mappings? What needs access to mappers and objectmappers? Can we have a unified lookup structure?
  • Do we even need a DocumentMapper given that MappingLookup became almost a copy of it? We should probably use MappingLookup whenever possible and re-evaluate what we need DocumentMapper for.\
  • Rewrite tests (e.g. MapperServiceTestCase) to rely less, or not at all, on DocumentMapper. Many tests build a MapperService only to use a small portion of it, or merge mappings, or retrieve document mapper from it.

Caching cleanups

  • Remove MapperService from SearchExecutionContext entirely. We'd prefer there not be an easy way for the context to get a new snapshot of the mapping. OTOH we don't really want to merge all of the immutable mapping stuff into MappingLookup. We have to decide how to remove it.
  • Remove SearchContext's reference to MapperService. This includes exposing NestedDocuments in SearchExecutionContext so that DefaultSearchContext takes it from there. And cleaning up access to indexService.mapperService() in DefaultSearchContext
  • Replace or remove the parsing function passed to MappingLookup, which hides a reference to DocumentMapper.
  • Pipe all volatile read methods on MapperService through #mappingLookup() to make sure we perform a single volatile read. Or we could remove them entirely in favor the caller going through mappingLookup(). If the caller keeps the mappingLookup for the duration of the operation then the caller will get a consistent snapshot of the mapping during the entire process. This can potentially become a big change but it can be done in steps, for instance we can start from duplicated methods that are exposed in MapperService and DocumentMapper (e.g. simpleMatchToFullName), that could be rather accessed through retrieving MappingLookup.
  • MapperService#getObjectMapper is only used in tests, it can probably be replaced by MappingLookup#getObjectMapper
  • Make all FieldMappers "obviously" immutable.
  • Fix caching issues around reloading analyzers (Reloading analyzers doesn't bust the request cache #66722)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions