Change terms aggregation default execution mode for tsdb #101619

martijnvg · 2023-10-31T15:10:10Z

A tsdb backing index will receive a lot of writes while the current time is before index.time_series.end_time. Building global ordinals for such an index doesn't make sense, because it is expensive and the next search can't reuse them.

This PR changes the default execution mode of the terms aggregation for tsdb backing indices where this is the case. The default will then be MAP instead of GLOBAL_ORDINALS.

This should improve query latency.
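To illustrate the rule described above, here is a minimal sketch in Java. The class, enum and method names are hypothetical and are not taken from this PR's diff; the sketch only assumes that the index's end time and the current time are available as epoch milliseconds.

```java
// Hypothetical sketch, not the actual Elasticsearch code.
class ExecutionModeDefault {

    enum ExecutionMode { MAP, GLOBAL_ORDINALS }

    /**
     * Picks the default execution mode when the request sets no explicit hint.
     *
     * @param isTsdbIndex   whether the backing index is a time series (tsdb) index
     * @param endTimeMillis value of index.time_series.end_time as epoch millis (assumed representation)
     * @param nowMillis     current time as epoch millis
     */
    static ExecutionMode defaultMode(boolean isTsdbIndex, long endTimeMillis, long nowMillis) {
        // The index is still accepting writes, so global ordinals built now
        // would likely be invalidated before the next search could reuse them.
        if (isTsdbIndex && nowMillis < endTimeMillis) {
            return ExecutionMode.MAP;
        }
        return ExecutionMode.GLOBAL_ORDINALS;
    }
}
```

Once the current time has passed the end time, the index no longer receives writes, so the sketch falls back to GLOBAL_ORDINALS, where the build cost can be amortized over later searches.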


elasticsearchmachine · 2023-10-31T15:10:57Z

Hi @martijnvg, I've created a changelog YAML for you.

elasticsearchmachine · 2023-11-03T06:47:49Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

jpountz · 2023-11-03T10:55:13Z

As discussed today, it would be nice to have another execution mode that maps values to bucket ordinals like the MAP execution mode, but also takes advantage of the TSDB index sort by caching the bucket ordinal of the previous document. Then if the current document has the same value ordinal as the previous document, we know it has the same bucket ordinal as well and we can skip an expensive lookup in a BytesRefHash.
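The idea can be sketched roughly as follows. This is a simplified, self-contained illustration rather than the Elasticsearch implementation: a HashMap stands in for BytesRefHash, strings stand in for BytesRef terms, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: caches the bucket ordinal of the previous document so that
// runs of documents with the same value ordinal skip the hash lookup.
class CachedBucketOrdCollector {
    // Stand-in for BytesRefHash: maps a term to its bucket ordinal.
    private final Map<String, Integer> bucketOrds = new HashMap<>();
    private long[] docCounts = new long[16];

    private int lastValueOrd = -1;
    private int lastBucketOrd = -1;

    // Counts how often the (expensive) hash lookup was actually needed.
    int hashLookups = 0;

    // Value ordinals are only comparable within one segment, so reset the cache.
    void startSegment() {
        lastValueOrd = -1;
        lastBucketOrd = -1;
    }

    // segmentTerms is the per-segment dictionary: value ordinal -> term.
    void collect(int valueOrd, String[] segmentTerms) {
        int bucketOrd;
        if (valueOrd == lastValueOrd) {
            // Same value as the previous document, hence the same bucket.
            bucketOrd = lastBucketOrd;
        } else {
            hashLookups++;
            String term = segmentTerms[valueOrd];
            Integer existing = bucketOrds.get(term);
            if (existing == null) {
                existing = bucketOrds.size();
                bucketOrds.put(term, existing);
            }
            bucketOrd = existing;
            lastValueOrd = valueOrd;
            lastBucketOrd = bucketOrd;
        }
        if (bucketOrd >= docCounts.length) {
            docCounts = Arrays.copyOf(docCounts, docCounts.length * 2);
        }
        docCounts[bucketOrd]++;
    }

    long docCount(String term) {
        Integer ord = bucketOrds.get(term);
        return ord == null ? 0 : docCounts[ord];
    }
}
```

Because of the TSDB index sort, documents that share a dimension value tend to be adjacent within a segment, so the cached path should be taken often. If nearly every document carries a distinct value, the cache never hits and the sketch degrades to one hash lookup per document.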

martijnvg · 2023-11-06T08:46:13Z

I updated the PR with the suggested change, but implemented it as a CollectorSource so that much of the MapStringTermsAggregator can be reused. Unfortunately I'm seeing mixed results in the tsdb k8s rally track. I think there are several reasons for this:

There is a bug in the data set. The kubernetes.pod.uid field, which is a dimension field, contains a unique value per document. In reality there should be only 9932 unique pods in this data set (the cardinality of the pod name field is 9932). As a result, the suggested optimisation doesn't help for this field.
The query results are noisy. Indexing and searching happen at the same time (which is realistic for many use cases and is the focus of that track). Because of that, query latency can fluctuate, which makes it difficult to see certain improvements.

The trend appears to be that all the last 15 minutes searches benefit from this change, while the last 24 hours searches don't benefit and actually report worse query latency. I suspect that a heuristic based on an estimate of the percentage of documents a query matches (X percent) would be helpful here.

I will rerun the rally track when the track's data set is regenerated.

elasticsearchmachine · 2024-02-14T18:14:12Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

martijnvg added >enhancement :Analytics/Aggregations Aggregations labels Oct 31, 2023

elasticsearchmachine added the v8.12.0 label Oct 31, 2023

Update docs/changelog/101619.yaml

b4039a8

martijnvg added 4 commits October 31, 2023 16:59

iter

bc73a43

iter

3b9cbb5

Merge remote-tracking branch 'es/main' into terms_agg_tsdb

ede3870

iter

dc815ef

martijnvg marked this pull request as ready for review November 3, 2023 06:47

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 3, 2023

martijnvg added 2 commits November 4, 2023 20:15

added SegmentOrdinalsCollectorSource for tsdb dimension fields.

798809f

added specialized impls

d4fd6a0

brianseeders added v8.13.0 and removed v8.12.0 labels Dec 6, 2023

martijnvg mentioned this pull request Feb 12, 2024

Improve metric query performance #95776

Closed

7 tasks

elasticsearchmachine added v8.14.0 and removed v8.13.0 labels Feb 14, 2024

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024
