Description
Runtime fields
We would like to increase the flexibility of the search API by introducing support for runtime fields.
Runtime fields are not indexed and do not have doc_values, meaning Lucene is completely unaware of them, but they are consumed through the field capabilities API and the search API like any ordinary field. They can be retrieved, queried, aggregated on, and sorted on.
Runtime fields make searches slower, because their values must be computed for each document that may match the query, and the cost depends on how they are calculated. It is highly recommended to use the Async Search API for searches that use runtime fields.
One limitation of runtime fields compared to ordinary fields is that they don’t support scoring: they are not indexed, and we are not going to compute their document frequency, which scoring requires.
Runtime fields are not part of the _source, hence they are not returned by default as part of the search hits. They can be specifically requested through the field retrieval API (#55363).
A runtime field is defined by its data type and the script that computes its values. Today, each section of the search request already supports scripting, but the script contexts and the required syntax differ. We want to unify this into a single place where a script can be specified. Such a script always has access to _source, any other stored fields, and doc_values.
A runtime field can be defined in the mappings by adding its definition to a new runtime section at the same level as properties, where fields that exist in _source are defined:
PUT /my-index/_mappings
{
  "runtime" : {
    "day_of_week" : {
      "type" : "keyword",
      "script" : {
        "source" : "emit(doc['timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  }
}
The data types initially supported for runtime fields are keyword, long, double, date, ip, boolean, and geo_point. In the example above, we extract the day of the week (e.g. Monday) from another field called timestamp, which is defined as a date. The script can refer to other fields, including other runtime fields, so we need to implement a mechanism that resolves field dependencies in the correct order and prevents cyclic dependencies.
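One way to picture the dependency resolution described above is a depth-first topological sort that fails on cycles. The following is a minimal sketch, not the mechanism Elasticsearch implements: the function name `resolution_order` and the shape of the `dependencies` map (runtime field name to the set of fields its script reads) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def resolution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order runtime fields so that each one follows the runtime fields it reads.

    `dependencies` is a hypothetical map from a runtime field name to the
    fields its script refers to. Raises ValueError on a cyclic reference.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # field name -> "visiting" or "done"

    def visit(field: str, path: List[str]) -> None:
        if state.get(field) == "done":
            return
        if state.get(field) == "visiting":
            # The field is already on the current path, so we looped back to it.
            cycle = path[path.index(field):] + [field]
            raise ValueError("cyclic dependency: " + " -> ".join(cycle))
        state[field] = "visiting"
        for dep in sorted(dependencies[field]):
            # Only runtime fields need ordering; indexed fields are skipped.
            if dep in dependencies:
                visit(dep, path + [field])
        state[field] = "done"
        order.append(field)

    for name in sorted(dependencies):
        visit(name, [])
    return order


deps = {
    "day_of_week": {"timestamp"},            # reads an indexed field only
    "is_weekend": {"day_of_week"},           # reads another runtime field
    "label": {"is_weekend", "day_of_week"},
}
print(resolution_order(deps))
```

Here `is_weekend` and `label` are made-up runtime fields layered on top of `day_of_week`. A pair of fields that refer to each other would raise `ValueError` instead of producing an order.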
The defined field can then be used like any other field in the different sections of the search API:
GET /my-index/_search
{
  "aggs" : {
    "days_of_week" : {
      "terms" : {
        "field" : "day_of_week"
      }
    }
  }
}
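Because runtime fields are not part of _source, a request that wants their values back in the hits has to ask for them explicitly. The sketch below builds such a request body in Python, combining the terms aggregation above with a `fields` list. The helper name `build_search_body` is hypothetical, and the `fields` option is assumed here to be the shape exposed by the field retrieval API (#55363).

```python
import json
from typing import Any, Dict, List


def build_search_body(agg_name: str, runtime_field: str, retrieve: List[str]) -> Dict[str, Any]:
    """Build a search body that aggregates on a runtime field and retrieves it."""
    return {
        "aggs": {agg_name: {"terms": {"field": runtime_field}}},
        # Runtime fields are not in _source, so they are requested explicitly.
        # Assumption: "fields" is the option provided by the field retrieval API.
        "fields": retrieve,
    }


body = build_search_body("days_of_week", "day_of_week", ["day_of_week"])
print(json.dumps(body, indent=2))
```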
Each runtime field type will consist of a MappedFieldType that exposes a runtime fielddata implementation, which generates doc_values on the fly for the needed data type. Additionally, all the basic Lucene queries for each runtime field type need to be written to query the corresponding fielddata/doc_values implementation.
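As a conceptual model only, the idea of generating values on the fly and querying them can be sketched in a few lines of Python. Nothing here mirrors the actual Lucene or Elasticsearch classes: `RuntimeKeywordValues`, `term_query`, and the script signature `(doc, emit)` are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List

Doc = Dict[str, str]
Emit = Callable[[str], None]
Script = Callable[[Doc, Emit], None]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


class RuntimeKeywordValues:
    """Computes keyword values per document by running the script on demand."""

    def __init__(self, script: Script) -> None:
        self._script = script

    def values(self, doc: Doc) -> List[str]:
        emitted: List[str] = []
        self._script(doc, emitted.append)  # nothing is read from an index
        return sorted(emitted)


def term_query(docs: List[Doc], field: RuntimeKeywordValues, term: str) -> List[int]:
    """Return ids of documents whose computed values contain `term`.

    Every candidate document runs the script, which is why such queries are slow.
    """
    return [doc_id for doc_id, doc in enumerate(docs) if term in field.values(doc)]


def day_of_week(doc: Doc, emit: Emit) -> None:
    # Stand-in for the Painless script in the mapping example.
    emit(DAY_NAMES[datetime.fromisoformat(doc["timestamp"]).weekday()])


docs = [
    {"timestamp": "2020-08-17T10:00:00"},
    {"timestamp": "2020-08-22T09:30:00"},
    {"timestamp": "2020-08-24T00:00:00"},
]
print(term_query(docs, RuntimeKeywordValues(day_of_week), "Monday"))
```

The point of the sketch is the cost model: the query cannot consult an inverted index, so it evaluates the script for each document it considers.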
Support for runtime fields in Elasticsearch will be released under the Elastic license.
The following is a high-level list of tasks required to develop the initial support for runtime fields, which will lay the foundations for the next phases:
Mappers and field types
- Make ScriptService available when parsing field mappers (Add the ScriptService to the field parser config #60933)
- Implement runtime field mapper with dynamic mapped field type, one per supported runtime_type (Scripted keyword field #58939, Make ScriptFieldMapper a parameterized mapper #59391, Runtime fields: rework script service injection #59659, Render script params in script field xcontent #59813, Standardize script field's rejection error #60029, Replace script unit tests with integration tests #60027, Runtime script field mapper to reject copy_to and fields #60580)
- Implement fielddata and queries for keyword field type (Add term query for keyword script fields #59372, Two queries for keyword script field #59527, Add tests for keyword script field's fielddata #59523, Remaining queries for script keyword fields #59630, Scripted keyword field type: update family type and test field caps output #59672, Error on bad shape relations in runtime fields #60463)
- Implement fielddata and queries for number field types
- Implement fielddata and queries for date field type (Add runtime_script date field #60092, Format support for script doc fields #60465, Add a consistent way to parse dates #61105)
- Implement fielddata and queries for ip field type (Implement runtime script ips #60533 + Fix value method on ip scripts #61230)
- Implement fielddata and queries for boolean field type (Add boolean values script fields #60830)
API
- Search API: integrate with field retrieval API to allow retrieval of runtime fields (Add fetch fields support for runtime fields #60775)
- Search API: reject queries against runtime fields when expensive queries are disallowed (Mark all scripted field queries as expensive #59658)
- Disallow using runtime fields in index sorting (Pass a SearchLookup supplier through to fielddataBuilder #60224, Implement distance_feature for runtime dates #60851)
Scripting
- Define script syntax for emitting single value as well as arrays/collections: conversion functions VS emit value function (Static return values for scripting runtime fields #59647, Convert double script to return array #61504, Standardize runtime field emit methods #61752)
- Enable using regexes in painless by default (Painless Safety: Regexes #49873)
- Opt runtime fields scripts out of the script compilation limit. It'd be lame to fail a mapping update because you've compiled too many scripts lately. (Opt date valued script fields out of rate limit #61238 + Drop compile limit on runtime fields scripts #61297)
- Expose what fields a script refers to (Expose constant fields used by script #60001, Mute kerberos tests for JDK 8u[262,271) #60995) (done but not back-ported yet)
Infrastructure
- Protect against many values being returned from a script for the same document (Stop runtime script from emitting too many values #61938)
- Protect against cyclic runtime fields references at runtime (Prevent deep recursion in doc values #60318, Pass SearchLookup supplier through to fielddataBuilder #61430)
- Make SearchLookup available to MappedFieldType#fielddataBuilder (Pass a SearchLookup supplier through to fielddataBuilder #60224, Pass SearchLookup supplier through to fielddataBuilder #61430)
- Introduce runtime fields in our existing integration tests for more extensive test coverage (Break up a test for with runtime fields (brings #60931 to 7.x) #61114)
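The protection against a script returning too many values for one document can be pictured as a cap enforced inside the emit callback. This is a minimal sketch under stated assumptions: the cap of 100, the names `run_with_limit` and `TooManyValuesError`, and the `(doc, emit)` script signature are illustrative and not taken from the linked pull requests.

```python
from typing import Callable, Dict, List

MAX_VALUES = 100  # illustrative cap; an assumption made for this sketch


class TooManyValuesError(Exception):
    """Raised when a script emits more values than allowed for one document."""


def run_with_limit(
    script: Callable[[Dict[str, object], Callable[[object], None]], None],
    doc: Dict[str, object],
    max_values: int = MAX_VALUES,
) -> List[object]:
    """Run `script` against `doc`, collecting emitted values up to `max_values`."""
    values: List[object] = []

    def emit(value: object) -> None:
        # Fail fast instead of letting one document allocate unbounded memory.
        if len(values) >= max_values:
            raise TooManyValuesError(
                f"script emitted more than {max_values} values for a single document"
            )
        values.append(value)

    script(doc, emit)
    return values


def split_tags(doc: Dict[str, object], emit: Callable[[object], None]) -> None:
    # Hypothetical script that emits one value per comma-separated tag.
    for tag in str(doc["tags"]).split(","):
        emit(tag)


print(run_with_limit(split_tags, {"tags": "a,b,c"}))
```

Checking the count at emit time, and not after the script finishes, means a runaway loop is stopped as soon as it crosses the cap.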
Security
- Test behavior when a script is executed that accesses fields that the user is not authorized to see (Test DLS and FLS against runtime fields #61820)
Telemetry
- Add info and usage endpoints for runtime fields (Add xpack info and usage endpoints for runtime fields #65600)
Docs
- Document runtime section and corresponding field types ([DOCS] Add docs for runtime fields #62653)
  - inconsistencies caused by updating a script while queries that rely on it are running
  - existing queries / visualizations may break because runtime fields can be updated
  - queries against runtime fields are deemed expensive and rejected when expensive queries are disallowed
- Document how to define runtime fields in a search request
- Document the ability to omit the script from the definition of a runtime field
- Document the ability to shadow existing fields with a runtime field
- Document dynamic runtime mode ([DOCS] Add dynamic runtime fields to docs (#66194) #66304)