Description
Some use cases need to query data within a specific tier (or set of tiers). For example, when a data stream or alias is managed by ILM, a user may want to query only the data in the "hot" tier. (See #47881, where users have asked for ILM to support aliases so that queries can target data in a specific lifecycle phase.)
It would be nice to have a general-purpose query, usable for regular searches as well as aggregations, that lets the user specify the "tier" of data to query. This would allow a query like:
{
  "query": {
    "bool": {
      "filter": {
        "tier": "hot"
      }
    }
  },
  "aggs": {...}
}
This is especially useful once users start using searchable snapshots for their data, because a search could bypass indices in other tiers (such as "cold" and "frozen") without downloading any of their data.
One question that may come up is "why not just use a time range filter to get the most recent data?" A time range filter works when consuming a single set of data (such as a single data stream). With a first-class query for data tiers, however, a search could span multiple data streams and aliases that have differing "hot" tier definitions, without requiring the user to know the timing of each tier and to write a separate range filter for each index pattern. For example, when searching three data streams that keep data in the hot phase for 7, 14, and 21 days respectively, using tier: hot is much simpler than specifying three different range filters tied to three different data stream names.
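To make the comparison concrete, here is a rough Python sketch that builds both request bodies. The data stream names and retention values are hypothetical, and matching a data stream name through a `term` query on `_index` is an assumption about how the range-based workaround would be written. The `tier` filter is the query proposed in this issue and does not exist today.

```python
import json

# Hypothetical data streams and the number of days each keeps data in the hot phase.
HOT_DAYS = {"logs-app": 7, "logs-audit": 14, "metrics-host": 21}


def range_based_query(hot_days):
    """Emulate "hot only" with one time range clause per data stream."""
    clauses = [
        {
            "bool": {
                "filter": [
                    # Assumption: the stream can be selected by name via _index.
                    {"term": {"_index": stream}},
                    {"range": {"@timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        }
        for stream, days in hot_days.items()
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def tier_based_query(tier):
    """Build the tier filter proposed in this issue (not an existing query)."""
    return {"query": {"bool": {"filter": {"tier": tier}}}}


if __name__ == "__main__":
    print(json.dumps(range_based_query(HOT_DAYS), indent=2))
    print(json.dumps(tier_based_query("hot"), indent=2))
```

The range-based body grows by one clause per data stream and has to be updated whenever a lifecycle policy changes, while the tier-based body stays the same.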
This also helps some of the use cases in #47881 while being accessible to both data streams and aliases.
If this is of interest, we could perform the filtering before any query execution: the tier is accessible through the index metadata, so the query could be rewritten to exclude indices that aren't in the specified tier.
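The rewrite step could look roughly like the sketch below. It is not Elasticsearch code: the index names are made up, a plain dictionary stands in for the index metadata, and the function names are hypothetical. The idea is that, for each candidate index, the tier filter collapses to `match_all` or `match_none` before anything is executed, so non-matching indices can be skipped.

```python
from typing import Dict, List

# Hypothetical stand-in for index metadata: index name -> data tier.
INDEX_TIERS: Dict[str, str] = {
    "logs-000001": "frozen",
    "logs-000002": "cold",
    "logs-000003": "warm",
    "logs-000004": "hot",
    "metrics-000007": "hot",
}


def rewrite_tier_filter(tier_filter: dict, index: str, tiers: Dict[str, str]) -> dict:
    """Rewrite {"tier": X} for one index using only its metadata."""
    if tiers.get(index) == tier_filter["tier"]:
        return {"match_all": {}}
    return {"match_none": {}}


def indices_to_search(tier_filter: dict, candidates: List[str], tiers: Dict[str, str]) -> List[str]:
    """Keep only the indices whose rewritten filter can still match documents."""
    return [
        index
        for index in candidates
        if "match_all" in rewrite_tier_filter(tier_filter, index, tiers)
    ]


if __name__ == "__main__":
    print(indices_to_search({"tier": "hot"}, list(INDEX_TIERS), INDEX_TIERS))
```

Because the decision depends only on metadata, indices backed by searchable snapshots in other tiers would never need to be touched.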