ES|QL: Improve aggregation over constants handling #112392

astefan · 2024-08-30T11:48:31Z

This change consists of:

Add separate rule for dealing with nulls in aggregations
Duplicate SubstituteSurrogate in "Operator Optimization" batch
Many more tests
Add test for median_absolute_deviation function
Add mv handling to top function
Allows PropagateEvalFoldables rule to also deal with aggregate functions

Addresses part of #100634. Missing bits:

AwaitsFixes from LogicalPlanOptimizerTests cannot be removed yet. This is likely also related to the substitutions() batch -> NormalizeAggregate() rule (which may need to be added) in LogicalPlanOptimizer
median_absolute_deviation still depends on having its own mv_ function
st_centroid_agg's mv_* function still needs addressing
recent update: the mv_values sister function is pending addition

Fixes #110257
Fixes #104430
Fixes #100170

Needs more tests for the new rule and the existing ones in LogicalPlanOptimizerTests.


astefan · 2024-08-30T11:49:33Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec

+null             |null                     |null
+;
+
+########### failing :-( with InvalidArgumentException: Does not support yet aggregations over constants


This will be removed once we have mv_ function for st_centroid_agg.

astefan · 2024-08-30T11:52:04Z

...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Top.java

@@ -197,11 +202,18 @@ public AggregatorFunctionSupplier supplier(List<Integer> inputChannels) {
    public Expression surrogate() {


As it stands now, the surrogate method is more of a "surrogate-expression-for-the-foldable-scenario" method. This means the behavior that existed below before this change is no longer possible.

Right - we should probably look into introducing a different interface altogether: surrogate was initially used for expressions that knew they'd be transformed.
However, it evolved into a mechanism for "folding", not into a value but into another expression (which itself may or may not be foldable).

I'll leave this one for a follow up I think.

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

astefan · 2024-08-30T12:00:44Z

...ql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/ReplaceAggregatesWithNull.java

+ * All aggregate functions that are also nullable (COUNT_DISTINCT and COUNT are exceptions), will get a NULL
+ * field replacement by the FoldNull rule, COUNT_DISTINCT will benefit from PropagateEvalFoldables.
+ */
+public final class ReplaceAggregatesWithNull extends OptimizerRules.OptimizerRule<Aggregate> {


This rule is a simplified variant of SubstituteSurrogates.
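A minimal sketch of the core split such a rule performs. The model is heavily simplified, and the `Agg`/`Split` records and `split` method are hypothetical stand-ins rather than the real ES|QL `Aggregate`/`Eval` classes. Nullable aggregates whose argument has folded to the NULL literal are pulled out, so that an eval can set them to null. Non-nullable ones (COUNT, COUNT_DISTINCT) and aggregates over real fields stay in the aggregate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "replace nullable aggregates over NULL"; not Elasticsearch code.
public class NullAggSplitSketch {

    // name: output alias; function: e.g. "AVG"; arg == null means the argument folded to the NULL literal
    record Agg(String name, String function, String arg, boolean nullable) {}

    // remaining: aggregates that still have to be computed; nullAliases: become "alias = null" in an eval
    record Split(List<Agg> remaining, List<String> nullAliases) {}

    static Split split(List<Agg> aggs) {
        List<Agg> remaining = new ArrayList<>();
        List<String> nullAliases = new ArrayList<>();
        for (Agg agg : aggs) {
            if (agg.arg() == null && agg.nullable()) {
                // AVG(null), MIN(null), ... can only produce null, so there is nothing to aggregate
                nullAliases.add(agg.name());
            } else {
                // COUNT(null) is not nullable (it returns 0), and aggregates over fields must still run
                remaining.add(agg);
            }
        }
        return new Split(remaining, nullAliases);
    }

    public static void main(String[] args) {
        Split s = split(List.of(
            new Agg("a", "AVG", null, true),
            new Agg("c", "COUNT", null, false),
            new Agg("m", "MAX", "salary", true)));
        System.out.println(s.remaining().size() + " remaining, null aliases: " + s.nullAliases());
        // prints: 2 remaining, null aliases: [a]
    }
}
```

When `remaining` ends up empty, the diff further down replaces the whole STATS with a one-row LocalRelation followed by an eval. That step is not modeled here.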

astefan · 2024-08-30T12:01:25Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/SubstituteSurrogates.java

        boolean changed = false;
+        boolean hasSurrogates = false;


I've done this to short-circuit the rule earlier in its execution.

astefan · 2024-08-30T12:02:10Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

@@ -5091,7 +5091,7 @@ public void testNullFoldingDoesNotApplyOnAggregate() throws Exception {
        CountDistinct countd = new CountDistinct(EMPTY, getFieldAttribute("a"), getFieldAttribute("a"));
        assertEquals(countd, rule.rule(countd));
        countd = new CountDistinct(EMPTY, NULL, NULL);
-        assertEquals(new Literal(EMPTY, null, LONG), rule.rule(countd));
+        assertEquals(countd, rule.rule(countd));


This is the consequence of CountDistinct not being nullable anymore.
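A toy illustration of why the expectation changed. It assumes a FoldNull-style rule that replaces a function with the NULL literal whenever the function is nullable and its argument is null. The `Nullability` name mirrors the diff below; `Fn` and `foldNull` are hypothetical.

```java
// Hypothetical model of FoldNull's decision; not the real optimizer rule.
public class FoldNullSketch {
    enum Nullability { TRUE, FALSE, UNKNOWN }

    // A function node with its nullability and a single (already folded) argument; null = NULL literal
    record Fn(String name, Nullability nullable, Object arg) {}

    // Returns "NULL" (stand-in for a null Literal) if the function folds away, otherwise the function itself
    static Object foldNull(Fn fn) {
        if (fn.nullable() == Nullability.TRUE && fn.arg() == null) {
            return "NULL";
        }
        return fn; // e.g. COUNT_DISTINCT, once it reports Nullability.FALSE, is left untouched
    }

    public static void main(String[] args) {
        Fn avg = new Fn("AVG", Nullability.TRUE, null);
        Fn countDistinct = new Fn("COUNT_DISTINCT", Nullability.FALSE, null);
        System.out.println(foldNull(avg));                           // NULL
        System.out.println(foldNull(countDistinct) == countDistinct); // true
    }
}
```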

costin

LGTM - great tests and comments!

costin · 2024-09-02T16:02:48Z

.../src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/CountDistinct.java

@@ -151,6 +152,11 @@ public DataType dataType() {
        return DataType.LONG;
    }

+    @Override
+    public Nullability nullable() {
+        return Nullability.FALSE;


costin · 2024-09-02T16:05:37Z


ivancea · 2024-09-02T16:44:25Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Values.java

+
+    @Override
+    public Expression surrogate() {
+        return field().foldable() ? field() : null;


Values not only merges values, but also removes duplicates (if no test caught this, we should add some!)

ROW x = [1, 1, 2] | STATS a = VALUES(x) -> [1, 2]
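A minimal sketch of the difference, with plain Java lists standing in for the multivalued constant. Both method names are hypothetical. Returning the foldable field unchanged keeps the duplicates, while VALUES collapses them. `LinkedHashSet` is used only to make this sketch's output deterministic. Nothing here claims that ES|QL's VALUES preserves order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ValuesSketch {
    // What a surrogate returning field() would yield for ROW x = [1, 1, 2] | STATS VALUES(x)
    static List<Integer> naiveSurrogate(List<Integer> constant) {
        return constant;
    }

    // VALUES semantics: distinct values only (insertion order kept here purely for determinism)
    static List<Integer> values(List<Integer> constant) {
        return new ArrayList<>(new LinkedHashSet<>(constant));
    }

    public static void main(String[] args) {
        List<Integer> x = List.of(1, 1, 2);
        System.out.println(naiveSurrogate(x)); // [1, 1, 2]
        System.out.println(values(x));         // [1, 2]
    }
}
```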

Good catch, BUT I think we have a problem with the documentation: it doesn't mention this aspect. There were other gaps in our function docs (which are fixed in this PR), so I think we need to review our function documentation and double-check its correctness and completeness. I will create an issue.

@ivancea thank you for pro-actively checking this PR 🙏, that was very helpful.
I've created two issues:

improving our documentation: ES|QL: review, double check and add missing bits to functions documentation #112437

this PR also now depends on pending mv_values addition: ES|QL: add mv_values function #112445

alex-spies

This is great @astefan ! I think this change is sound and added mostly minor remarks.

My only major remark is: I think we need LogicalPlanOptimizerTests cases that prove that the foldable propagation actually takes place. The csv tests are great, but they do not prove that foldable propagation actually takes place, only that the result is correct.

But you already mentioned more optimizer tests as one of the tasks to un-draft :)

x-pack/plugin/esql/qa/testFixtures/src/main/resources/meta.csv-spec

alex-spies · 2024-09-03T12:27:08Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/row.csv-spec

Out of scope: stats.csv-spec has a bunch of ...OfConst tests that overlap a lot with the tests here, except that they normally start with from employees. Because these test stats more than row, maybe we should move test cases like row ... | stats ... from here to stats.csv-spec in a follow-up PR.

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

alex-spies · 2024-09-03T14:53:50Z

.../esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/PropagateEvalFoldables.java

+            } else if (p instanceof Aggregate agg) {
+                List<NamedExpression> newAggs = new ArrayList<>(agg.aggregates().size());
+                agg.aggregates().forEach(e -> {
+                    if (Alias.unwrap(e) instanceof AggregateFunction) {


This looks like it cannot propagate into the groups, as in

... | eval x = [1,2,3] | stats sum(field) by x

right? Maybe it's worth adding a comment.

That's another thing we could optimize though if needed, as I think STATS ... BY const is the same as STATS ... | eval x = mv_values(const) | mv_expand x. Not sure that's worth maintaining an optimization rule for, though.
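A small simulation of that equivalence, under stated assumptions:

- Rows are plain Java integers.
- The multivalued constant is a `List<Integer>`.
- A row with a multivalued grouping key contributes to every distinct group.
- The "mv_values" step (a function that did not exist yet, see #112445) is modeled as "distinct values".

All method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class StatsByConstSketch {

    // STATS s = SUM(field) BY x, where x is the same multivalued constant on every row.
    // Assumption: each row lands in every distinct group of the multivalued key.
    static Map<Integer, Long> statsByConst(List<Integer> field, List<Integer> constKey) {
        Map<Integer, Long> groups = new LinkedHashMap<>();
        for (int value : field) {
            for (int key : new LinkedHashSet<>(constKey)) {
                groups.merge(key, (long) value, Long::sum);
            }
        }
        return groups;
    }

    // STATS s = SUM(field) | EVAL x = mv_values(const) | MV_EXPAND x:
    // one aggregate row, then one output row per distinct value of the constant.
    static Map<Integer, Long> statsThenExpand(List<Integer> field, List<Integer> constKey) {
        long sum = 0;
        for (int value : field) {
            sum += value;
        }
        Map<Integer, Long> rows = new LinkedHashMap<>();
        for (int key : new LinkedHashSet<>(constKey)) { // hypothetical mv_values: distinct values
            rows.put(key, sum);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> field = List.of(10, 20, 30);
        List<Integer> x = List.of(1, 2, 3);
        System.out.println(statsByConst(field, x));    // {1=60, 2=60, 3=60}
        System.out.println(statsThenExpand(field, x)); // {1=60, 2=60, 3=60}
    }
}
```

The sketch also makes one caveat visible. With zero input rows, the grouped form produces no rows, while a STATS without BY still produces one. Any such rewrite would have to account for that.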

Yeah, it's not covered. That's unintentional; I just didn't think about this use case.
I'll leave it for a follow-up, though. There are many things going on in this PR.

...ql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/ReplaceAggregatesWithNull.java

alex-spies · 2024-09-03T15:28:47Z

...ql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/ReplaceAggregatesWithNull.java

+            } else {
+                // All aggs actually have been optimized away
+                // \_Aggregate[[],[AVG([NULL][NULL]) AS s]]
+                // Replace by a local relation with one row, followed by an eval, e.g.
+                // \_Eval[[MVAVG([NULL][NULL]) AS s]]
+                // \_LocalRelation[[{e}#21],[ConstantNullBlock[positions=1]]]
+                plan = new LocalRelation(
+                    source,
+                    List.of(new EmptyAttribute(source)),
+                    LocalSupplier.of(new Block[] { BlockUtils.constantBlock(PlannerUtils.NON_BREAKING_BLOCK_FACTORY, null, 1) })
+                );
+            }


The code in lines 86-106 also happens in SubstituteSurrogates, and kinda-sorta also in ReplaceStatsAggExpressionWithEval. I opened #110345 but maybe, instead of reducing the number of opt. rules, we should just refactor the code path that moves expressions out of aggregates and into evals. We could start here and make sure the code is the same as in SubstituteSurrogates.


ivancea

LGTM!

ivancea · 2024-10-08T10:25:24Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec

+FROM airports 
+| eval z = TO_GEOPOINT(null) 
+| STATS centroidNull = ST_CENTROID_AGG(null), 
+        centroidExpNull = ST_CENTROID_AGG(TO_GEOPOINT(null::string)), 


This one is converted to null, which I suppose is not what a user would expect. Is this something to be fixed later? Is it related to #108215?
Maybe it should have a comment or something here.

++ The whole result here looks inconsistent.

I expected null since this is consistent with e.g. SUM and AVG over 0 rows/rows with only null values.

Currently, running the same aggregation on an empty index returns POINT (NaN NaN), which itself is inconsistent with the fact that we shouldn't have NaN values in our results - but maybe this is consistent with geospatial standards?

In any case, the 3 results should be the same. But it's fine to fix this in a follow-up issue. Let's figure out what the result should be and throw this inconsistency into an issue (new or existing).

@craigtaverner , is POINT(NaN NaN) really the result we need to return?

Per #106025, returning null is correct! Thanks @craigtaverner for pointing me to this issue!
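For context, one plausible source of POINT (NaN NaN) is the following. This is an assumption about the implementation and has not been verified. A centroid computed as a sum divided by a count evaluates 0.0 / 0 in Java when no points were seen, and that gives NaN. The minimal sketch below contrasts that with returning "no value" for empty input, in line with the null result referenced above. `Point`, `naiveCentroid` and `centroid` are hypothetical and are not the ST_CENTROID_AGG implementation.

```java
import java.util.List;
import java.util.Optional;

public class CentroidSketch {
    record Point(double x, double y) {}

    // Naive centroid: sum / count. With no points this is 0.0 / 0, i.e. NaN for both coordinates.
    static Point naiveCentroid(List<Point> points) {
        double sx = 0, sy = 0;
        for (Point p : points) {
            sx += p.x();
            sy += p.y();
        }
        return new Point(sx / points.size(), sy / points.size());
    }

    // Null-aware variant: no input points (e.g. only nulls, filtered out beforehand) means no value.
    static Optional<Point> centroid(List<Point> points) {
        if (points.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(naiveCentroid(points));
    }

    public static void main(String[] args) {
        System.out.println(naiveCentroid(List.of())); // Point[x=NaN, y=NaN]
        System.out.println(centroid(List.of()));      // Optional.empty
        System.out.println(centroid(List.of(new Point(0, 0), new Point(2, 4)))); // Optional[Point[x=1.0, y=2.0]]
    }
}
```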

ivancea · 2024-10-08T10:40:20Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_mad.csv-spec

There's a median_absolute_deviation.csv-spec

alex-spies · 2024-10-14T13:11:30Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_percentile.csv-spec

+       p50 = 40+10 
+| stats percentile(value200, p90), 
+        percentile(value200, p100), 
+        percentile(value200, p3_null), 


Shouldn't percentile(x, null) be invalid? Similarly for count_distinct(x, null).

That's a good question. Here's the paradox: the percentile value (the second argument) must be a number between 0 and 100. If it's 1000, we issue a warning and treat the result of the agg as null; by the principle that we issue warnings and return nulls when things "don't work", I think we should treat percentile as null if the second parameter is null.

Yes, it's not a value between 0 and 100, and it's not a missing value; it's an "unknown" value, a null, which is a bit different from both a missing value and a number between 0 and 100. On main, this query fails with an ugly exception.

There are other problems or, better said, inconsistencies: weighted_avg specifically forbids null at data type validation time (weighted_avg(x, null)), but it doesn't do so when the value it receives is the result of an evaluation (eval n = null + 2 | stats weighted_avg(x, n)). And while weighted_avg at least checks for null and forbids it explicitly, percentile doesn't even do that.

I don't have an answer for you to this question right now. There should be a generic approach to dealing with aggregations' arguments that accept only constants: top, percentile, weighted_avg.

The behavior is inconsistent and needs to be fixed, though not in this PR.
If the arguments have to be literals, then throwing an exception (invalid query) would be the way to go.
If not, then folding+ warning is the way to go.

For the former, the function implementation would have to be improved to rely on the Validatable interface. Another way would be to mark the field as foldable/not-null and do this in the planner, so that it works across all functions.

We don't have to hash this out now, but maybe it'd be safer to start with a validation exception now - we can still go back and return null later, while the other way around could be considered a breaking change, albeit in a very edge case scenario.
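A hedged sketch of the two options, applied as a check on percentile's second argument after it has been folded to a constant. Several pieces are hypothetical:

- the `Policy` enum
- `checkPercentile`
- the warning list
- the message texts

The real mechanism would go through the Validatable interface or the planner, as discussed above. Under `REJECT`, a null percentile becomes an invalid query. Under `WARN_AND_NULL`, the aggregate evaluates to null and a warning is recorded. Out-of-range values warn and return null in both modes, as described above for 1000.

```java
import java.util.ArrayList;
import java.util.List;

public class PercentileArgSketch {
    enum Policy { REJECT, WARN_AND_NULL }

    // Hypothetical post-folding check. Returns the percentile to use, or null if the aggregate should be null.
    static Double checkPercentile(Double p, Policy policy, List<String> warnings) {
        if (p == null) {
            if (policy == Policy.REJECT) {
                // invalid query: can later be relaxed to "null" without a breaking change
                throw new IllegalArgumentException("second argument of [percentile] cannot be null");
            }
            warnings.add("percentile is null, returning null");
            return null;
        }
        if (p < 0 || p > 100) {
            // existing behavior for e.g. 1000: warn and treat the result of the agg as null
            warnings.add("percentile [" + p + "] is not in [0, 100], returning null");
            return null;
        }
        return p;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(checkPercentile(90.0, Policy.WARN_AND_NULL, warnings));   // 90.0
        System.out.println(checkPercentile(1000.0, Policy.WARN_AND_NULL, warnings)); // null
        System.out.println(checkPercentile(null, Policy.WARN_AND_NULL, warnings));   // null
        System.out.println(warnings.size());                                         // 2
        try {
            checkPercentile(null, Policy.REJECT, warnings);
        } catch (IllegalArgumentException e) {
            System.out.println("invalid query: " + e.getMessage());
        }
    }
}
```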

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_count_distinct.csv-spec

alex-spies · 2024-10-14T13:18:45Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_top.csv-spec

How about some tests where the limit and/or the "asc"/"desc" is propagated? And, similarly, tests for weighted_avg where the weights are propagated?

I've tried that, but the situation is problematic because of the abuse of "surrogate" expressions. Right now the validation of these constant arguments in percentile and weighted_avg is done in resolveType, which is not completely correct. It should be done in validate, after any constant values are propagated, folded and replaced.

If we do this in validate, then in some cases the validation doesn't happen at all, because weighted_avg disappears, replaced by one of the formulas in surrogate. For example, in ROW values=[1,2,3,4,5] | STATS weighted_avg(values, 0), if we do the validation properly in validate, that method won't even be called: by the time it would be, there is no WeightedAvg instance anymore (it has been replaced with MvAvg(s, field)). Unless we do something about aggregation foldability and surrogate expressions, we are in a deadlock: we fix something but break something else.
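The ordering problem can be reduced to a few lines. In this hypothetical model, the `Expr` types, `substituteSurrogate` and `validate` are illustrative and are not the ES|QL optimizer API. For illustration, assume validation must reject a zero weight. Once the surrogate has rewritten WeightedAvg into an MvAvg-like node, a later validation pass has nothing left to reject.

```java
// Hypothetical model of "surrogate rewrite runs before validation"; not Elasticsearch code.
public class SurrogateOrderSketch {
    interface Expr {}
    record Field(String name, boolean foldable) implements Expr {}
    record WeightedAvg(Field value, int weight) implements Expr {}
    record MvAvg(Field value) implements Expr {}

    // Surrogate: weighted_avg over a foldable value is replaced; the weight disappears from the plan.
    static Expr substituteSurrogate(Expr e) {
        if (e instanceof WeightedAvg wa && wa.value().foldable()) {
            return new MvAvg(wa.value());
        }
        return e;
    }

    // Validation that (by assumption) should reject a zero weight.
    static void validate(Expr e) {
        if (e instanceof WeightedAvg wa && wa.weight() == 0) {
            throw new IllegalArgumentException("weight cannot be zero");
        }
    }

    public static void main(String[] args) {
        Expr query = new WeightedAvg(new Field("values", true), 0);
        Expr rewritten = substituteSurrogate(query);
        validate(rewritten);            // passes silently: the invalid weight is gone
        System.out.println(rewritten);  // MvAvg[value=Field[name=values, foldable=true]]
        try {
            validate(query);            // validating before the rewrite catches it
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```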

alex-spies · 2024-10-14T13:23:11Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_percentile.csv-spec

+;
+
+percentileOfNullsOnRealIndex
+from employees | eval x = null::integer, y = null, z = 123 - null | stats percIntNull = percentile(y, 90), percNull = percentile(y, 90), percentile(null, 90), percentile(null+2, 90), percentile(z, 90) by languages | sort languages desc;


I think the test fix got lost :) It's only reformatted in the current version, but the definitions of percIntNull and percNull are still the same.

alex-spies

I have a small round of comments - plan to finalize the re-review tomorrow :)

alex-spies

I think this is a very good PR that we should go forward with, after resolving whatever conflicts may arise from the STATS ... WHERE ... support that was merged today.

The only thing I think this is really missing is optimizer tests that demonstrate that foldable propagation (of non-null expressions) actually takes place. And we should double check if we really want percentile(x, null) and count_distinct(x, null) to just be null instead of an invalid query, as that's a decision we won't be able to take back without becoming a breaking change.

Other than that, my remarks are mostly minor.

alex-spies · 2024-10-15T06:49:31Z


x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_count_distinct.csv-spec

alex-spies · 2024-10-15T09:11:53Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_count_distinct.csv-spec

+from employees | eval x = 82+null | stats count_distinct(salary, x*100), count_distinct(salary, null), count_distinct(salary, null + 1);
+
+count_distinct(salary, x*100):long|count_distinct(salary, null):long|count_distinct(salary, null + 1):long
+100                               |100                              |100


Are we sure we want null precision to be interpreted as default precision? That could hide mistakes in the query.

I am not so sure :-), now that you mention this aspect. Maybe we want null here, but at the same time it DOES make sense for count_distinct and count to be non-nullable: counting something should always yield something concrete.

If null is not the default precision and we cannot return null as the return value of count_distinct, should we error out then? (like weighted_avg and percentile)

counting something should always be something concrete.

COUNT(null) and COUNT_DISTINCT(null, ...) do return 0 now.
Maybe not really comparable, but OTOH MV_COUNT(null) returns null.

However, to cast a vote, COUNT_DISTINCT(..., null) feels like it should return null for the same reason why 1 + null (or some random SUBSTRING(..., 1, null)) should: the operation is applied on a missing / unknown value.
Edit: ... i.e. different from COUNT_DISTINCT(null, 5), where it's known there's nothing to count and a concrete 0 already makes sense. Not a strong argument, tho.
But then also COUNT_DISTINCT(null, null) could return 0.

Counts are special in that they are not nullable - they always return a value. If anything, MV_COUNT should be aligned with them.
It's up to the function to decide if null is treated as "default value" or invalid. Since there's no COUNT_DISTINCT(field), I'd opt for null meaning default.

I'd be in favor of COUNT_DISTINCT(field, null) resulting in an invalid query exception via the Validatable interface - but I'm fine with default precision, as long as we document this clearly :)
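A minimal sketch of the two behaviors under discussion, with plain Java collections standing in for ES|QL values. COUNT-style aggregates skip nulls, so they return 0 rather than null over all-null input. The hypothetical `effectivePrecision` helper shows the "null means default" reading of COUNT_DISTINCT's second argument. The default of 3000 is an assumption for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class CountNullSketch {
    static final int ASSUMED_DEFAULT_PRECISION = 3000; // assumption, for illustration only

    // COUNT(x): counts non-null values, so all-null input gives 0, never null
    static long count(List<Integer> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    // "null means default" interpretation of COUNT_DISTINCT(field, precision) (hypothetical helper)
    static int effectivePrecision(Integer precision) {
        return precision == null ? ASSUMED_DEFAULT_PRECISION : precision;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList((Integer) null, (Integer) null))); // 0
        System.out.println(effectivePrecision(null));                             // 3000
        System.out.println(effectivePrecision(100));                              // 100
    }
}
```

The alternative raised above would replace the `precision == null` branch with an invalid-query error raised through the Validatable interface.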

...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceAggregatesWithNull.java

alex-spies · 2024-10-15T09:40:52Z

...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceAggregatesWithNull.java

-            if (newAggs.isEmpty() == false) {
-                plan = new Aggregate(source, aggregate.child(), aggregate.aggregateType(), aggregate.groupings(), newAggs);
+            if (remainingAggregates.isEmpty() == false) {// build the new Aggregate with the rest (non-null) aggregates
+                plan = new Aggregate(source, aggregate.child(), aggregate.aggregateType(), aggregate.groupings(), remainingAggregates);
            } else {
                // All aggs actually have been optimized away


For the LocalRelation substitution here to be correct, we mustn't have a BY clause - otherwise, the number of rows still depends on the number of groups.

Normally, if there's a BY clause, this means there'll also be a corresponding aggregate, in which case remainingAggregates will not be empty.

But! Other optimization rules can (and do!) optimize away the grouping from the aggregates if it's not used downstream.

This rule seems unaffected because I tried

FROM test | stats x = avg(null) by b | drop b
FROM test | eval i = null | stats x = avg(i) by b | drop b

and it produced correct results.

But I think we should

throw IAE if we end up here and the aggregate has a grouping, and

add a test for good measure, e.g. the queries I wrote above.

I am not following and I need more details. remainingAggregates being empty means exactly that - the BY part is empty; it was folded to null by this rule. If BY is reduced to null, there is only one row in the results.

Actually, I found a reproducer on your branch:

ROW field = null::integer, otherfield = [1,2] | STATS min(field) by otherfield | DROP otherfield

min(field)
---------------
null

This should instead be

min(field)
---------------
null
null

(as it is on main), because there are 2 values for otherfield and thus 2 groups, resp. 2 output rows from the STATS.

It's insufficient to look at the aggregates. We need to check aggregate.groupings() instead, as dropping the groupings after the stats lets us optimize them away from the aggregates.

The removal of the groupings from the aggregates happens in CombineProjections, like in the example above:

[2024-10-17T11:25:35,813][INFO ][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.CombineProjections applied
Limit[1000[INTEGER]] = Limit[1000[INTEGER]]
\_EsqlProject[[min(field){r}#7]] ! \_Aggregate[STANDARD,[otherfield{r}#4],[MIN(field{r}#2,true[BOOLEAN]) AS min(field)]]
\_Aggregate[STANDARD,[otherfield{r}#4],[MIN(field{r}#2,true[BOOLEAN]) AS min(field), otherfield{r}#4]] ! \_Row[[TOINTEGER(null[NULL]) AS field, [1, 2][INTEGER] AS otherfield]]
\_Row[[TOINTEGER(null[NULL]) AS field, [1, 2][INTEGER] AS otherfield]] !
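A hedged sketch of the proposed guard, using a hypothetical model rather than the real `Aggregate` class. Collapsing the STATS into a single-row relation is only safe when no groupings are left. It does not matter whether the grouping attribute still appears among the aggregates. In the reproducer, `otherfield` is still a grouping with two values, so the result must keep two rows.

```java
import java.util.List;

// Hypothetical model of the "can we replace the STATS with one row?" decision.
public class SingleRowGuardSketch {
    // groupings: the BY clause; remainingAggregates: what is left after null-replacement and pruning
    record Agg(List<String> groupings, List<String> remainingAggregates) {}

    // Unsafe: only looks at the aggregates, which CombineProjections may have pruned
    static boolean canCollapseByAggregates(Agg agg) {
        return agg.remainingAggregates().isEmpty();
    }

    // Safer: also require that there is no BY clause, since each group produces its own row
    static boolean canCollapseToSingleRow(Agg agg) {
        return agg.remainingAggregates().isEmpty() && agg.groupings().isEmpty();
    }

    public static void main(String[] args) {
        // ROW field = null::integer, otherfield = [1,2] | STATS min(field) BY otherfield | DROP otherfield,
        // after min(field) became null and otherfield was pruned from the aggregates:
        Agg afterPruning = new Agg(List.of("otherfield"), List.of());
        System.out.println(canCollapseByAggregates(afterPruning)); // true  (would wrongly give 1 row)
        System.out.println(canCollapseToSingleRow(afterPruning));  // false (keeps one row per group)
    }
}
```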

I don't think CombineProjections is to blame, it is doing the right thing. This can be seen with queries that have their projections correctly pruned away.

The "workaround" of calling SubstituteSurrogates again in the "Operator Optimization" batch is to blame here, and this shows the vicious cycle I mentioned before: fixing something breaks something else. Foldability and surrogate expressions rest on a faulty principle, and when things start to not make sense and the code "struggles" to do the right thing (by using workarounds like calling SubstituteSurrogates twice), it is clear that something more fundamental is not right.

At this point there are two options available:

create an issue for this and merge the PR for the benefit of the tests it adds. This, though, introduces a bug in an edge case (the one we know of; there may be others we don't know of)

try to make SubstituteSurrogates not do its thing for aggregations that have a grouping which is no longer kept as a projection. This is clearly a workaround that tries to ignore what CombineProjections did. I am now trying this second idea.

alex-spies · 2024-10-15T11:07:22Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+            from test
+            | stats x = percentile(languages, languages) by emp_no


suggestion: we could also test

from test | eval l = languages | stats x = percentile(l, l) by emp_no

And the same with rename instead of eval.

I just checked manually and that seems to work fine :)

alex-spies · 2024-10-15T11:31:38Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+            from test
+            | eval x = null + 1, y = salary / 1000
+            | limit 5
+            | stats s = avg(y) by x


I think we could add optimizer tests with varying (propagatable/literal) nulls:

multiple agg functions, only some of which receive null (after propagation or literally) the others receive non-foldables

multiple agg functions, some of which receive null, the others receive foldable consts (no agg functions with non-foldables) - with and without a BY clause

multiple agg functions, all of which receive null (after propagation or literally) and there is no BY clause

That's because we hit a different code path than before - e.g. it's not SubstituteSurrogates that takes care of a STATS that's been optimized away, but ReplaceAggregatesWithNull. And it's also good to see that SubstituteSurrogates and ReplaceAggregatesWithNull work well together. And that ReplaceAggregatesWithNull correctly leaves agg functions in place that don't receive null (although that's already kinda tested by using a non-foldable BY clause)

alex-spies · 2024-10-15T11:44:33Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+     *   \_Eval[[$$COUNT$$$SUM$s$0$0{r$}#24 * 10[INTEGER] AS $$SUM$s$0, $$SUM$s$0{r$}#22 / $$COUNT$s$1{r$}#23 AS s]]
+     *     \_Aggregate[STANDARD,[y{r}#6],[COUNT([*][KEYWORD]) AS $$COUNT$$$SUM$s$0$0, COUNT(10[INTEGER]) AS $$COUNT$s$1, y{r}#6]]


We have a problem here: we attempt the surrogate substitution before propagating foldables - this replaces the avg by sum/count instead of simply mv_avg(9+1). If we propagate first, the plan should be identical to that in testAvg_Of_Foldable_NonNull below.

I think this corroborates the observation that substitutions like avg -> sum/count should be separated from the const substitutions avg(const) -> mv_avg(const).

Let's add a comment that we want/could improve this - because the test doesn't document the "should be" state, but the current "is" state.
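The arithmetic behind the two plans can be checked directly. Over n rows in a group, SUM of a constant c equals COUNT(*) * c, which is what `$$COUNT$$$SUM$s$0$0 * 10` computes. AVG of c through the sum/count surrogate is then (n * c) / n = c, the same as MV_AVG(c) for a single value. That is why propagating first should give the simpler plan. The method names below are hypothetical. The sketch assumes at least one row, since with zero rows SUM and AVG are null rather than 0.

```java
public class ConstantAggSketch {
    // SUM(c) over n rows, computed the way the surrogate plan does: COUNT(*) * c
    static long sumOfConstant(long rows, long c) {
        return rows * c;
    }

    // AVG(c) over n rows via the sum/count surrogate: (COUNT(*) * c) / COUNT(c)
    static double avgViaSumCount(long rows, long c) {
        long count = rows; // COUNT(c) sees one non-null value per row
        return (double) sumOfConstant(rows, c) / count;
    }

    // What propagating first would compute directly: MV_AVG of a single value is the value itself
    static double mvAvgOfSingleValue(long c) {
        return c;
    }

    public static void main(String[] args) {
        long rows = 5;      // rows in one group
        long c = 9 + 1;     // the propagated constant x
        System.out.println(sumOfConstant(rows, c));  // 50
        System.out.println(avgViaSumCount(rows, c)); // 10.0
        System.out.println(mvAvgOfSingleValue(c));   // 10.0
    }
}
```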

alex-spies · 2024-10-15T11:48:20Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+            from test
+            | eval x = 9 + 1, y = salary / 1000
+            | limit 5
+            | stats s = avg(x) by y


I think we should add a test where foldable propagation actually succeeds. E.g. with

stats s = sum(x) by y

it should work - and another case each for propagation into the second argument of percentile and count_distinct is also important.

alex-spies · 2024-10-15T12:03:59Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

weighted_avg and top are special in that they can take 2(+) arguments which, in case of weighted_avg, can even be non-foldable. I think we should add tests that show foldable propagation and propagation of nulls into weighted_avg and top as well.


alex-spies

Heya, just wanted to pick this up again and summarize what I think we need to do:

Per the discussion with Costin, percentile(field, null) is still not well defined (null vs invalid query) - okay to hash this out in a follow-up but maybe invalidating for now is safer w.r.t. bwc.
ST_CENTROID_AGG(null) should return null instead of Point(NaN NaN).
Some additional test cases won't hurt.
Fixing this edge case where all agg functions are optimized away

Consider this unblocked from my side as my main reason for requesting changes was the discussion about percentile(field, null) and similar cases. We could solve some problems in follow-up PRs as well, as I think the general approach here works :)

alex-spies · 2024-11-18T15:12:47Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats_percentile.csv-spec


Add separate rule for dealing with nulls in aggregations (5159109)

Duplicate SubstituteSurrogate in "Operator Optimization" batch Many more tests Add tests for mad Add mv handling to top function

astefan requested review from costin and alex-spies August 30, 2024 11:48

elasticsearchmachine added the v8.16.0 label Aug 30, 2024

astefan commented Aug 30, 2024


...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

astefan commented Aug 30, 2024


One more test from one of the reported bugs (74129a0)

costin approved these changes Sep 2, 2024


ivancea reviewed Sep 2, 2024


This was referenced Sep 2, 2024

ES|QL: review, double check and add missing bits to functions documentation #112437 (Open)

ES|QL: add mv_values function #112445 (Closed)

alex-spies reviewed Sep 3, 2024


mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

astefan added 7 commits September 18, 2024 19:18

Addressing reviews (ef3960e)

Make count_distinct deal with Validateable interface (c2db11c)

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (d66004b)

More count_distinct fixes, more tests for percentile's foldable second argument (aedad55)

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (47db145)

Add more tests (4c03ce2)

Introduce isConstantFoldable() for aggregate functions

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (0e8d484)

astefan marked this pull request as ready for review October 3, 2024 09:53

elasticsearchmachine added the needs:triage Requires assignment of a team area label label Oct 3, 2024

astefan added >bug auto-backport-and-merge and removed needs:triage Requires assignment of a team area label labels Oct 3, 2024

ivancea approved these changes Oct 8, 2024


astefan mentioned this pull request Oct 9, 2024

Support aggregations across constants #100634 (Open, 4 tasks)

alex-spies reviewed Oct 14, 2024


alex-spies self-requested a review October 14, 2024 15:08

alex-spies requested changes Oct 15, 2024


astefan added 5 commits October 15, 2024 16:36

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (d6ee3be)

Cleanup after merging "main" (020bc9f)

Skip one more test (178c383)

Change one test according to reviews (29dd29f)

More reviews (8ab89dc)

brianseeders added v8.17.0 and removed v8.16.0 labels Oct 16, 2024

Bug fix related to the source() used when creating the Validations error message (db2259b)

alex-spies requested review from alex-spies and removed request for alex-spies November 8, 2024 14:13

alex-spies reviewed Nov 18, 2024


elasticsearchmachine added v8.18.0 and removed v8.17.0 labels Nov 20, 2024

astefan added 3 commits November 27, 2024 18:23

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (db006d6)

Fix some things after pull from main (1ef3b87)

Merge branch 'main' of https://github.com/elastic/elasticsearch into …aggregations_over_constants (968dcfb)

This was referenced Dec 6, 2024

ESQL: Rework isNull #118101 (Merged)

Add nullable to Categorize #118176 (Closed)

Mechanism for foldable aggregations in ES|QL #118292 (Open)

elasticsearchmachine added v8.19.0 v9.1.0 and removed v8.18.0 v9.0.0 labels Jan 30, 2025
