Speed up rounding in auto_date_histogram #56384

Merged
merged 1 commit into from
May 9, 2020

nik9000
Member

@nik9000 nik9000 commented May 7, 2020

This wires `auto_date_histogram` into the rounding optimization that I
built in #55559. This should significantly speed up any
`auto_date_histogram`s with `time_zone`s on them.
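
For readers less familiar with the aggregation, here is a minimal sketch of the kind of request this change targets: an `auto_date_histogram` with a `time_zone` set. The helper name, the `@timestamp` field, the `over_time` aggregation name, the bucket count, and the zone are all illustrative assumptions, not taken from the PR or its benchmark.

```python
import json


def auto_date_histogram_request(field, buckets, time_zone=None):
    """Build a search body with one auto_date_histogram aggregation.

    Hypothetical helper for illustration only. `field`, `buckets` and
    `time_zone` are the aggregation's documented parameters.
    """
    agg = {"field": field, "buckets": buckets}
    if time_zone is not None:
        # Setting a time zone is the case this PR is meant to speed up.
        agg["time_zone"] = time_zone
    # size=0 skips returning hits so only the aggregation is computed.
    return {"size": 0, "aggs": {"over_time": {"auto_date_histogram": agg}}}


# Assumed field name and zone, chosen only for the example.
body = auto_date_histogram_request("@timestamp", 20, time_zone="America/New_York")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Leaving `time_zone` out gives the plain variant that the benchmark reports as `auto_date_histogram`.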

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 7, 2020
@nik9000
Member Author

nik9000 commented May 7, 2020

I'm not actually sure how to make a proper unit test that this is "plugged in". I will add some benchmark results with it eventually though. My desktop is currently busy benchmarking #56371.

Member

@not-napoleon not-napoleon left a comment

LGTM!

@nik9000
Member Author

nik9000 commented May 9, 2020

I've finally finished the benchmarks. When there is a time zone and the index contains a daylight saving time transition, this cuts the runtime of auto_date_histogram by 65%. My benchmarks don't cover the case where there isn't a transition, but it's probably another ~30% faster again, if this is anything like date_histogram.

Before:

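|                                                         Metric |                                     Task |       Value |   Unit |
|---------------------------------------------------------------:|-----------------------------------------:|------------:|-------:|
|                                                 Min Throughput |                      auto_date_histogram |         0.1 |  ops/s |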
|                                              Median Throughput |                      auto_date_histogram |         0.1 |  ops/s |
|                                                 Max Throughput |                      auto_date_histogram |         0.1 |  ops/s |
|                                        50th percentile latency |                      auto_date_histogram |     11464.8 |     ms |
|                                        90th percentile latency |                      auto_date_histogram |       13392 |     ms |
|                                       100th percentile latency |                      auto_date_histogram |     13507.1 |     ms |
|                                   50th percentile service time |                      auto_date_histogram |     10032.3 |     ms |
|                                   90th percentile service time |                      auto_date_histogram |     10167.2 |     ms |
|                                  100th percentile service time |                      auto_date_histogram |     10735.9 |     ms |
|                                                     error rate |                      auto_date_histogram |           0 |      % |
|                                                 Min Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                              Median Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                                 Max Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                        50th percentile latency |              auto_date_histogram_with_tz |     34068.5 |     ms |
|                                        90th percentile latency |              auto_date_histogram_with_tz |     34415.6 |     ms |
|                                       100th percentile latency |              auto_date_histogram_with_tz |     34798.8 |     ms |
|                                   50th percentile service time |              auto_date_histogram_with_tz |     34067.6 |     ms |
|                                   90th percentile service time |              auto_date_histogram_with_tz |     34414.5 |     ms |
|                                  100th percentile service time |              auto_date_histogram_with_tz |     34798.2 |     ms |
|                                                     error rate |              auto_date_histogram_with_tz |           0 |      % |

After:

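|                                                         Metric |                                     Task |       Value |   Unit |
|---------------------------------------------------------------:|-----------------------------------------:|------------:|-------:|
|                                                 Min Throughput |                      auto_date_histogram |         0.1 |  ops/s |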
|                                              Median Throughput |                      auto_date_histogram |         0.1 |  ops/s |
|                                                 Max Throughput |                      auto_date_histogram |         0.1 |  ops/s |
|                                        50th percentile latency |                      auto_date_histogram |     24352.4 |     ms |
|                                        90th percentile latency |                      auto_date_histogram |     33612.9 |     ms |
|                                       100th percentile latency |                      auto_date_histogram |     35688.8 |     ms |
|                                   50th percentile service time |                      auto_date_histogram |     10414.5 |     ms |
|                                   90th percentile service time |                      auto_date_histogram |     10539.9 |     ms |
|                                  100th percentile service time |                      auto_date_histogram |     10809.3 |     ms |
|                                                     error rate |                      auto_date_histogram |           0 |      % |
|                                                 Min Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                              Median Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                                 Max Throughput |              auto_date_histogram_with_tz |        0.03 |  ops/s |
|                                        50th percentile latency |              auto_date_histogram_with_tz |     14550.4 |     ms |
|                                        90th percentile latency |              auto_date_histogram_with_tz |     14969.7 |     ms |
|                                       100th percentile latency |              auto_date_histogram_with_tz |       15375 |     ms |
|                                   50th percentile service time |              auto_date_histogram_with_tz |     14531.7 |     ms |
|                                   90th percentile service time |              auto_date_histogram_with_tz |     14949.2 |     ms |
|                                  100th percentile service time |              auto_date_histogram_with_tz |     15354.9 |     ms |
|                                                     error rate |              auto_date_histogram_with_tz |           0 |      % |

Note: The "after" numbers for auto_date_histogram without a time zone are gnarly because I'm still working on dialing in the throughput to target.

@nik9000
Member Author

nik9000 commented May 9, 2020

> Note: The "after" numbers for auto_date_histogram without a time zone are gnarly because I'm still working on dialing in the throughput to target.

Actually, neither of them is quite right. But they can give you a sense that this particular auto_date_histogram takes about 10 seconds without a time zone. When there is a time zone it takes about 35 seconds before this PR and about 15 after it. And it'd probably take about 10 seconds if there isn't a daylight saving time transition across the index.
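
As a rough cross-check of those round figures, the sketch below recomputes the change from the 50th percentile service times in the tables above. The values are copied from the tables. The function and variable names are illustrative, and a single pair of medians is only a coarse estimate given the caveats about these runs.

```python
# 50th percentile service times in milliseconds, copied from the
# "Before" and "After" tables above.
SERVICE_TIME_MS = {
    "before": {
        "auto_date_histogram": 10032.3,
        "auto_date_histogram_with_tz": 34067.6,
    },
    "after": {
        "auto_date_histogram": 10414.5,
        "auto_date_histogram_with_tz": 14531.7,
    },
}


def percent_reduction(before_ms, after_ms):
    """Percentage by which the runtime dropped (negative means slower)."""
    return 100.0 * (before_ms - after_ms) / before_ms


def speedup(before_ms, after_ms):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


for task, before_ms in SERVICE_TIME_MS["before"].items():
    after_ms = SERVICE_TIME_MS["after"][task]
    print(
        f"{task}: {percent_reduction(before_ms, after_ms):.1f}% less time, "
        f"{speedup(before_ms, after_ms):.2f}x"
    )
```

By these particular medians the time zone case drops by a little over 57%, or roughly 2.3x, which lines up with the "35 seconds before, 15 after" summary. The run without a time zone moves by only a few percent, which is within the noise described in the note.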

@nik9000 nik9000 merged commit 12e9218 into elastic:master May 9, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 9, 2020
nik9000 added a commit that referenced this pull request May 9, 2020
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1
4 participants