
Search 'fields' option design + implementation #55363

Closed
@jtibshirani

Description


Original issue: #49028
Feature branch: field-retrieval
Docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-fields.html

Motivation

Often a user wants to retrieve a particular set of fields during a search, and we currently don't support this usage pattern well. In short, given a list of fields, there is no easy way to load all of their values:

  • We can’t load all of them from doc values. Some fields, such as text fields, have no doc values at all, and requesting many fields can exceed the limit on how many doc value fields a single search may load (index.max_docvalue_fields_search).
  • It’s not easy to load all of them through source. For example, if the field is a field alias, it’s difficult to determine where to find its value in the source.

Better field retrieval support is becoming even more important now that we're introducing field types that don’t fit the typical pattern, such as constant_keyword and the proposed runtime fields (#48063), whose values may not appear in _source at all.

Feature Summary

We plan to add a new fields section to the search request, which users would specify instead of using source filtering to load fields from source:

POST logs-*/_search
{
  "query": { "match_all": {} },
  "fields": [
    "file.*",
    {
      "field": "event.timestamp",
      "format": "epoch_millis"
    },
    ...
  ]
}

Both full field names and wildcard patterns are accepted. Only leaf fields are returned; the API will not support fetching object values. The fields are returned as a flat list in the fields section of each hit, just as docvalue_fields and script_fields are.
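To make the flattening and matching rules concrete, here is a minimal Python sketch that works on _source alone. It is not the Elasticsearch implementation: it ignores the mappings entirely (the proposed API consults them), it uses `fnmatchcase` as a stand-in for wildcard matching (which also gives `?` and `[...]` special meaning), and the helper names `flatten_source` and `retrieve_fields` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def flatten_source(source: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Flatten a _source document into dotted leaf paths, each mapped to a list of values."""
    leaves: Dict[str, List[Any]] = {}

    def collect(value: Any, path: str) -> None:
        if isinstance(value, dict):
            # Objects are never returned themselves; descend into their leaves.
            for key, child in value.items():
                collect(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Arrays (including arrays of objects) add each element under the same path.
            for item in value:
                collect(item, path)
        else:
            # Always store values in a list, even when there is only one.
            leaves.setdefault(path, []).append(value)

    collect(source, "")
    return leaves


def retrieve_fields(source: Dict[str, Any], patterns: List[str]) -> Dict[str, List[Any]]:
    """Return the leaf fields whose full path matches any requested name or wildcard pattern."""
    leaves = flatten_source(source)
    return {
        path: values
        for path, values in leaves.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    }


if __name__ == "__main__":
    doc = {
        "file": {"name": "a.log", "size": 10},
        "event": {"timestamp": "2020-04-16T00:00:00Z"},
        "tags": ["x", "y"],
    }
    print(retrieve_fields(doc, ["file.*", "tags"]))
    # {'file.name': ['a.log'], 'file.size': [10], 'tags': ['x', 'y']}
```

Note that requesting the object name `file` alone matches nothing here, since only leaf paths are candidates.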

Overall, the API gives a friendly way to load fields from source:

  • If a non-standard field like a field alias, multi-field, or constant_keyword is specified in fields, then we’ll consult the mappings to find and return the right value.
  • The fields are returned in a flat list, as opposed to structured JSON.
  • For date and numeric field types, we'll support the same format parameter as docvalue_fields, so users can adjust how the results are formatted.
  • Each value will be returned in a 'canonical' format: for example, if a field is mapped as an integer, it will be returned as an integer even if it was specified as a string in the _source.
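The mapping-aware behavior in the list above can be sketched as a lookup over already-flattened _source values. This is a simplified illustration under stated assumptions: `MAPPINGS` is a hypothetical flat dictionary, not the Elasticsearch mapping format; `lookup`, `canonicalize`, and the `multi_field_of` key are hypothetical names; only ISO-8601 input and the `epoch_millis` format are handled; and returning the formatted date as a string is a choice of this sketch.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

# Hypothetical, simplified mappings keyed by full field path. This is not the
# Elasticsearch mapping format; it only carries what this sketch needs.
MAPPINGS: Dict[str, Dict[str, Any]] = {
    "user.id": {"type": "integer"},
    "uid": {"type": "alias", "path": "user.id"},
    "title": {"type": "text"},
    "title.raw": {"type": "keyword", "multi_field_of": "title"},
    "dataset": {"type": "constant_keyword", "value": "logs"},
    "event.timestamp": {"type": "date"},
}


def canonicalize(value: Any, field_type: str, fmt: Optional[str]) -> Any:
    """Convert a raw _source value to the type its mapping declares."""
    if field_type == "integer":
        # e.g. the string "42" in _source is returned as the integer 42.
        return int(value)
    if field_type == "date" and fmt == "epoch_millis":
        # Parse ISO-8601 (with a trailing "Z") and render milliseconds since the epoch.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        return str(round(parsed.timestamp() * 1000))
    # Everything else (including dates with no format) is returned unchanged.
    return value


def lookup(leaves: Dict[str, List[Any]], field: str, fmt: Optional[str] = None) -> List[Any]:
    """Use the mappings to find where a field's values live, then canonicalize them."""
    spec = MAPPINGS[field]
    field_type = spec["type"]
    if field_type == "alias":
        # An alias has no entry in _source; read the target field instead.
        return lookup(leaves, spec["path"], fmt)
    if field_type == "constant_keyword":
        # The value is defined in the mapping and may be absent from _source.
        return [spec["value"]]
    # A multi-field is indexed from its parent's _source value.
    source_path = spec.get("multi_field_of", field)
    return [canonicalize(v, field_type, fmt) for v in leaves.get(source_path, [])]


if __name__ == "__main__":
    # Already-flattened _source leaves (path -> list of values).
    leaves = {
        "user.id": ["42"],
        "title": ["Hello World"],
        "event.timestamp": ["2020-04-16T00:00:00Z"],
    }
    print(lookup(leaves, "uid"))  # [42]
    print(lookup(leaves, "title.raw"))  # ['Hello World']
    print(lookup(leaves, "dataset"))  # ['logs']
    print(lookup(leaves, "event.timestamp", "epoch_millis"))  # ['1586995200000']
```

Here `title.raw` returns the original source value without applying any normalizer, which is one side of the keyword question listed under Open Questions.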

Some clarifications:

  • In this first pass, the API will not attempt to load from stored fields or doc values.
  • For simplicity of parsing, values will always be returned in an array, even if there is only one value present.

Implementation Plan

Future improvements:

Open Questions

  • If a wildcard pattern matches both a parent field and one of its multi-fields, should we just return the parent to avoid returning the same value twice? A similar question holds for field aliases and their target fields.
  • Should the API return fields in _source that have been disabled in the mappings (enabled: false)?
  • For keyword fields, should we apply the normalizer or return the original value?
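The first question can be made concrete with a small sketch of the "return only the parent" option. It illustrates one possible policy, not a decision. `expand`, `DERIVED_FROM`, and the `skip_derived` flag are hypothetical names, and `fnmatchcase` stands in for wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical: each multi-field or alias, mapped to the field its value comes from.
DERIVED_FROM: Dict[str, str] = {"title.raw": "title", "uid": "user.id"}


def expand(patterns: List[str], mapped_fields: List[str], skip_derived: bool) -> List[str]:
    """Resolve patterns against mapped field names.

    With skip_derived=True, a multi-field or alias is dropped when the field
    it derives from was also matched, so the same value is not returned twice.
    """
    matched = [f for f in mapped_fields if any(fnmatchcase(f, p) for p in patterns)]
    if not skip_derived:
        return matched
    matched_set = set(matched)
    return [f for f in matched if DERIVED_FROM.get(f) not in matched_set]


if __name__ == "__main__":
    fields = ["title", "title.raw", "user.id", "uid"]
    print(expand(["title*"], fields, skip_derived=False))  # ['title', 'title.raw']
    print(expand(["title*"], fields, skip_derived=True))  # ['title']
    print(expand(["title.raw"], fields, skip_derived=True))  # ['title.raw']
```

Under this policy, an explicitly requested multi-field is still returned when its parent is not also matched.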

Metadata


Labels

:Search/Search (Search-related issues that do not fall into other categories), >feature, Meta, Team:Search (Meta label for search team)
