ESQL: Consider inlinestats when having field_caps check for field names #127564

astefan · 2025-04-30T12:53:18Z

The aggregate inside an inlinestats was interfering with the way field names are collected for field_caps requests. As a result, simple queries like from test | inlinestats max(whatever) by group did not return all fields from test; the resulting columns were limited to whatever and group. The purpose of inlinestats is to add columns to an existing set of columns, so the command has to be "transparent" to any wider collection of field names.
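
As a rough illustration of the problem described above, the following sketch models a query as a chain of hypothetical, heavily simplified plan nodes (these records are assumptions made for this example, not the real ESQL classes). It assumes the field-name collection asks "does any command narrow the set of columns?" and contrasts a naive check, which counts the Aggregate owned by an InlineStats, with a check that treats that Aggregate as transparent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class InlineStatsFieldNamesSketch {
    // Hypothetical, heavily simplified plan nodes (not the real ESQL classes).
    interface Plan { Plan child(); }

    record Source(String index) implements Plan {
        public Plan child() { return null; }
    }
    record Project(Plan child) implements Plan {}
    record Aggregate(Plan child) implements Plan {}
    // Mirrors the parsed shape described in this PR: the Aggregate is the child.
    record InlineStats(Aggregate aggregate) implements Plan {
        public Plan child() { return aggregate; }
    }

    // Flattens the single-child chain from the last command down to the source.
    static List<Plan> nodes(Plan root) {
        List<Plan> out = new ArrayList<>();
        for (Plan p = root; p != null; p = p.child()) {
            out.add(p);
        }
        return out;
    }

    // Naive check: any Aggregate counts as narrowing, even one owned by an InlineStats.
    static boolean naiveNarrows(Plan root) {
        return nodes(root).stream().anyMatch(p -> p instanceof Project || p instanceof Aggregate);
    }

    // Transparent check: Aggregates owned by an InlineStats are ignored.
    static boolean transparentNarrows(Plan root) {
        // Identity-based set, because these toy records compare structurally.
        Set<Plan> inlinestatsAggs = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Plan p : nodes(root)) {
            if (p instanceof InlineStats is) {
                inlinestatsAggs.add(is.aggregate());
            }
        }
        return nodes(root).stream()
            .anyMatch(p -> p instanceof Project || (p instanceof Aggregate agg && inlinestatsAggs.contains(agg) == false));
    }

    // from test | inlinestats max(whatever) by group
    static Plan inlinestatsOnly() {
        return new InlineStats(new Aggregate(new Source("test")));
    }

    // from test | inlinestats max(whatever) by group | stats count(*)
    static Plan inlinestatsThenStats() {
        return new Aggregate(inlinestatsOnly());
    }

    public static void main(String[] args) {
        System.out.println(naiveNarrows(inlinestatsOnly()));            // true: columns wrongly limited
        System.out.println(transparentNarrows(inlinestatsOnly()));      // false: all fields requested
        System.out.println(transparentNarrows(inlinestatsThenStats())); // true: a real STATS still narrows
    }
}
```

In this toy model, a false result stands for "request all fields from the index", which is the behavior the description expects for a query that only contains inlinestats.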

Fixes #127236

elasticsearchmachine · 2025-04-30T12:53:41Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-04-30T12:53:42Z

Hi @astefan, I've created a changelog YAML for you.

astefan · 2025-04-30T12:55:38Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec

@@ -360,14 +362,14 @@ FROM airports
 | LIMIT 3
 ;

-abbrev:keyword | city:keyword |       region:text | "COUNT(*)":long 


This change is unrelated to this PR; it relates to previous work.

astefan · 2025-04-30T12:56:41Z

...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java

@@ -127,7 +127,7 @@ protected void shouldSkipTest(String testName) throws IOException {
        assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS.capabilityName()));
        assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V2.capabilityName()));
        assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(JOIN_PLANNING_V1.capabilityName()));
-        assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V5.capabilityName()));
+        assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V7.capabilityName()));


~~V7 because I am trying to work on multiple separate issues. V6 should come from #127383~~

elasticsearchmachine · 2025-04-30T12:57:58Z

Hi @astefan, I've updated the changelog YAML for you.

elasticsearchmachine · 2025-05-05T12:43:38Z

Hi @astefan, I've updated the changelog YAML for you.

alex-spies

Thanks @astefan ! The fix works and the added tests are nice. I found 2 buggy queries, but they are likely unrelated to this PR's work.

I think this solution is okay, but I'd prefer to avoid adding more complexity to the fieldNames method by special-casing INLINESTATS. This PR is required because we parse INLINESTATS as an InlineStats node containing an Aggregate child (which, in turn, has the previous commands as its descendants). Therefore, I'd like to suggest another approach that changes how we represent a parsed INLINESTATS; see below.

alex-spies · 2025-05-06T10:04:19Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec

I've added this test to the suite. The data types look fine in my tests; other things are wrong with that query. I've added details about the failure to the csv test suite.

alex-spies · 2025-05-06T12:24:58Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+        List<LogicalPlan> inlinestats = parsed.collect(InlineStats.class::isInstance);
+        Set<Aggregate> inlinestatsAggs = new HashSet<>();
+        for (var i : inlinestats) {
+            inlinestatsAggs.add(((InlineStats) i).aggregate());
+        }


The solution here looks correct but is confusing. That is because we parse INLINESTATS as an InlineStats node with an Aggregate node as its child, so for any given Aggregate we don't know whether it belongs to a STATS or an INLINESTATS, and the two have very different semantics.

I think we should instead parse INLINESTATS as a single plan node, which would avoid this complexity.

Maybe consider refactoring the InlineStats node to avoid adding complexity here, as the fieldNames method is already hard to work with. A low-effort fix would be to still have InlineStats wrap an Aggregate, but not as its child; the actual child would be the preceding command.
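
To make the low-effort option above concrete, here is a minimal sketch using hypothetical, simplified node types (the record names InlineStatsOverAggregate and InlineStatsWrappingAggregate are invented for this example and are not the real ESQL classes). It only shows that, when the Aggregate is carried as data rather than as a tree child, a plain tree traversal no longer encounters it.

```java
import java.util.List;

final class InlineStatsShapeSketch {
    // Hypothetical, simplified plan node: only the tree children matter here.
    interface Plan { List<Plan> children(); }

    record Source(String index) implements Plan {
        public List<Plan> children() { return List.of(); }
    }

    record Aggregate(Plan child, String aggregates) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    // Shape described in the review as the current one: the Aggregate is the tree child.
    record InlineStatsOverAggregate(Aggregate aggregate) implements Plan {
        public List<Plan> children() { return List.of(aggregate); }
    }

    // Suggested low-effort shape: the Aggregate is carried as data,
    // and the tree child is the preceding command.
    record InlineStatsWrappingAggregate(Plan child, Aggregate aggregate) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    // Walks tree children only, the way a generic plan traversal would.
    static boolean treeContainsAggregate(Plan node) {
        if (node instanceof Aggregate) {
            return true;
        }
        return node.children().stream().anyMatch(InlineStatsShapeSketch::treeContainsAggregate);
    }

    // from test | inlinestats max(whatever) by group, in the two shapes
    static Plan currentShape() {
        Source src = new Source("test");
        return new InlineStatsOverAggregate(new Aggregate(src, "max(whatever) by group"));
    }

    static Plan suggestedShape() {
        Source src = new Source("test");
        return new InlineStatsWrappingAggregate(src, new Aggregate(src, "max(whatever) by group"));
    }

    public static void main(String[] args) {
        System.out.println(treeContainsAggregate(currentShape()));   // true
        System.out.println(treeContainsAggregate(suggestedShape())); // false
    }
}
```

With the second shape, a check such as "is there an Aggregate in the plan?" would not need to know about INLINESTATS at all, at the cost of the refactoring discussed in this thread.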

More generally, I wonder if there's an abstraction just around the corner that would do away with more special-casing inside this method.

In terms of the sets of attributes before and after INLINESTATS, it behaves similarly to EVAL, DISSECT, GROK, ENRICH and COMPLETION: some attributes are required because they are being referred to, some attributes are newly added and they shadow previous attributes. In the optimizer, we leverage this fact in the push down rules; for this, the plan nodes just need to implement the GeneratingPlan interface.
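
The required/added/shadowed pattern described above can be sketched as simple set arithmetic. This is a hypothetical helper for illustration only; it does not use or mirror the real GeneratingPlan interface, and the names requiredBefore, generated and referenced are assumptions made for this example.

```java
import java.util.Set;
import java.util.TreeSet;

final class GeneratingCommandSketch {
    // Field names needed before a command that adds columns:
    // whatever is needed after it, minus the columns it generates (those shadow
    // earlier ones of the same name), plus the columns its expressions refer to.
    static Set<String> requiredBefore(Set<String> requiredAfter, Set<String> generated, Set<String> referenced) {
        Set<String> result = new TreeSet<>(requiredAfter); // sorted for stable output
        result.removeAll(generated);
        result.addAll(referenced);
        return result;
    }

    public static void main(String[] args) {
        // ... | eval x = salary + 1 | keep x, l
        // After the EVAL, x and l are needed; the EVAL generates x and refers to salary.
        System.out.println(requiredBefore(Set.of("x", "l"), Set.of("x"), Set.of("salary"))); // prints [l, salary]
    }
}
```

Under this model, INLINESTATS would be handled like any other column-adding command: its new columns are removed from the requested set and the fields used in its aggregates and groupings are added.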

I think it'd be nice to move this method in a direction that would rely more on this general pattern.

That's out of scope for this PR, of course, but it'd also benefit from parsing INLINESTATS simply as 1 node rather than a combination of 2 nodes.

Those are good points (the use of GeneratingPlan and refactoring InlineStats), but I need more time to dig into them and confirm they are valid changes to make. IMHO, simplifying what fieldNames does (looking at the aggregate inside an inlinestats) is not a strong enough argument to warrant the refactoring. The change needs to be conceptually sound in its own right, setting aside the EsqlSession code.

Meaning, a conceptually sound argument needs to drive the refactoring, not the fact that fieldNames becomes more complex.

alex-spies · 2025-05-06T12:43:31Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+            | inlinestats max(salary) by l
+            | stats min = min(salary) by l
+            | eval x = min + 1
+            | stats ca = count(*), cx = count(x) by l


I think the same behavior is expected when this stats is replaced by a keep x, l (no wildcard), right?

Maybe let's add such tests, and also some where the STATS or KEEP (no wildcard) comes before the INLINESTATS, for good measure.

Added more tests

bpintea

I agree with Alex's observation in general, but I think the fix as it stands is fine and contained. We can consider redesigning INLINESTATS in a follow-up (maybe treating it as the join it actually is).

bpintea · 2025-05-21T14:00:00Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+            plan -> plan instanceof Project
+                || (plan instanceof Aggregate agg && (inlinestatsAggs.isEmpty() || inlinestatsAggs.contains(agg) == false))


Suggested change

-            plan -> plan instanceof Project
-                || (plan instanceof Aggregate agg && (inlinestatsAggs.isEmpty() || inlinestatsAggs.contains(agg) == false))
+            plan -> plan instanceof Project
+                || plan instanceof Aggregate agg && inlinestatsAggs.contains(agg) == false
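
The suggestion relies on two facts: an empty set contains nothing, so the isEmpty() guard is redundant, and in Java && binds more tightly than ||, so the outer parentheses can be dropped. A small sketch that checks the first point on a few samples, with plain strings standing in for Aggregate nodes (an assumption made to keep the example self-contained):

```java
import java.util.List;
import java.util.Set;

final class SimplifiedPredicateSketch {
    // Condition as originally written in the PR.
    static boolean original(Set<String> inlinestatsAggs, String agg) {
        return inlinestatsAggs.isEmpty() || inlinestatsAggs.contains(agg) == false;
    }

    // Condition after the suggested simplification.
    static boolean simplified(Set<String> inlinestatsAggs, String agg) {
        return inlinestatsAggs.contains(agg) == false;
    }

    // Compares both forms over empty, single-element and two-element sets.
    static boolean agreeOnSamples() {
        List<Set<String>> sets = List.of(Set.of(), Set.of("a"), Set.of("a", "b"));
        for (Set<String> set : sets) {
            for (String agg : List.of("a", "b", "c")) {
                if (original(set, agg) != simplified(set, agg)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnSamples()); // true
    }
}
```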

astefan · 2025-05-22T10:54:06Z

@elasticmachine run elasticsearch-ci/part-3

astefan · 2025-05-22T11:47:49Z

@elasticmachine run elasticsearch-ci/part-4

astefan · 2025-05-22T11:48:41Z

@elasticmachine run elasticsearch-ci/bwc-snapshots

astefan · 2025-05-22T13:41:38Z

@elasticmachine run elasticsearch-ci/part-4

elasticsearchmachine · 2025-05-22T14:54:19Z

💔 Backport failed

Status	Branch	Result
❌	9.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127564

…es (elastic#127564) * Make inlinestats "transparent" to EsqlSession.fieldNames (cherry picked from commit 28b10c3)

…es (#127564) (#128345) * Make inlinestats "transparent" to EsqlSession.fieldNames (cherry picked from commit 28b10c3)

astefan added 2 commits April 30, 2025 15:47

Make inlinestats "transparent" to EsqlSession.fieldNames

c07d276

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

b6b4f52

…field_names_for_inlinestats_fix

astefan added >enhancement :Analytics/ES|QL AKA ESQL v9.1.0 v9.0.2 labels Apr 30, 2025

astefan requested a review from alex-spies April 30, 2025 12:53

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 30, 2025

Update docs/changelog/127564.yaml

b2d6299

astefan commented Apr 30, 2025

astefan added auto-backport Automatically create backport pull requests when merged >bug and removed >enhancement labels Apr 30, 2025

Update docs/changelog/127564.yaml

7e26ed7

elasticsearchmachine and others added 6 commits April 30, 2025 13:05

[CI] Auto commit changes from spotless

bee4222

Fix import

32438c9

Merge branch 'field_names_for_inlinestats_fix' of https://github.com/…

de0aa5c

…astefan/elasticsearch into field_names_for_inlinestats_fix

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

5e998ba

…field_names_for_inlinestats_fix

One more test

abeb667

Update docs/changelog/127564.yaml

762e05e

astefan added 2 commits May 5, 2025 16:39

Fix capability

ae8436f

Merge branch 'field_names_for_inlinestats_fix' of https://github.com/…

37f6d06

…astefan/elasticsearch into field_names_for_inlinestats_fix

alex-spies reviewed May 6, 2025

astefan added 2 commits May 20, 2025 17:36

More tests

481b91e

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

ab7dae7

…field_names_for_inlinestats_fix

astefan requested a review from costin May 20, 2025 14:40

bpintea approved these changes May 21, 2025

astefan added 2 commits May 21, 2025 17:43

Review

20652af

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

56f4513

…field_names_for_inlinestats_fix

elasticsearchmachine added v9.0.3 and removed v9.0.2 labels May 22, 2025

astefan merged commit 28b10c3 into elastic:main May 22, 2025
18 checks passed

elasticsearchmachine added the backport pending label May 22, 2025

astefan mentioned this pull request May 23, 2025

ESQL: Consider inlinestats when having field_caps check for field names (#127564) #128345

Merged

elasticsearchmachine pushed a commit that referenced this pull request May 23, 2025

ESQL: Consider inlinestats when having field_caps check for field nam…

90ef328

…es (#127564) (#128345) * Make inlinestats "transparent" to EsqlSession.fieldNames (cherry picked from commit 28b10c3)

astefan removed the backport pending label May 23, 2025
