-
Notifications
You must be signed in to change notification settings - Fork 25.4k
[Transform] optmize histogam group_by change detection #74031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Transform] optmize histogam group_by change detection #74031
Conversation
Pinging @elastic/ml-core (Team:ML) |
a519b38
to
38a31b9
Compare
...stTest/java/org/elasticsearch/xpack/transform/integration/continuous/HistogramGroupByIT.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/transform/transforms/pivot/CompositeBucketsChangeCollector.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/transform/transforms/pivot/CompositeBucketsChangeCollector.java
Show resolved
Hide resolved
minAggregationOutputName = COMPOSITE_AGGREGATION_NAME + "." + targetFieldName + ".min"; | ||
maxAggregationOutputName = COMPOSITE_AGGREGATION_NAME + "." + targetFieldName + ".max"; | ||
|
||
// the time field for the date histogram is different than for sync |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand how does this comment relate to the surrounding code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 copy paste error ;-)
...java/org/elasticsearch/xpack/transform/transforms/pivot/CompositeBucketsChangeCollector.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
implement a simple change optimization for histograms using min and max aggregations. The optimization is not applied if the range cutoff would be too small compared to the overall range from previous checkpoints. At least 20% must be cut compared to former checkpoints.
fixes #63801
The histogram case is tough, depending on the meaning of the value. If the value is a metric value it might not make sense to optimize this case as the range might be spread across the full range for all checkpoints. If the number is a case or issue number, I think terms should be used.
The most common scenario I am aware of is a monotonic increasing sequence number generated externally. For this case, the optimization should yield good results. I kept it simple, the change collector only remembers values in memory, so the transform has to run some checkpoints until it kicks in.