ESQL: Consider inlinestats when having field_caps check for field names #127564
Base branch: main
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @astefan, I've created a changelog YAML for you.
@@ -360,14 +362,14 @@ FROM airports
| LIMIT 3
;
abbrev:keyword | city:keyword | region:text | "COUNT(*)":long
Unrelated to this PR, but to previous work.
@@ -127,7 +127,7 @@ protected void shouldSkipTest(String testName) throws IOException {
assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS.capabilityName()));
assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V2.capabilityName()));
assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(JOIN_PLANNING_V1.capabilityName()));
assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V5.capabilityName()));
assumeFalse("INLINESTATS not yet supported in CCS", testCase.requiredCapabilities.contains(INLINESTATS_V7.capabilityName()));
I used V7 because I am working on several separate issues at once; V6 should come from #127383.
Hi @astefan, I've updated the changelog YAML for you.
Thanks @astefan! The fix works and the added tests are nice. I found two buggy queries, but they are likely unrelated to this PR.
I think this solution is okay, but I'd prefer to avoid adding more complexity to the fieldNames method by special-casing INLINESTATS. This PR is needed because we parse INLINESTATS as an InlineStats node containing an Aggregate child (which, in turn, contains the previous commands as its descendants). Therefore, I'd like to suggest another approach that changes how we represent a parsed INLINESTATS; see below.
Heya, I tried some queries, trying to break things. I noticed two bugs which may or may not be related to this PR:
1. FROM hosts METADATA _index | eval x = ip1 | INLINESTATS ip1 = COUNT(*) BY host_group, card | SORT ip1 | LIMIT 1
   gives an empty result, but removing the eval x = ip1 makes it work.
2. FROM hosts METADATA _index | INLINESTATS card = COUNT(*) BY card | SORT card | LIMIT 1
   returns:
   description    |host           |host_group     |ip0            |ip1            |_index         |card
   ---------------+---------------+---------------+---------------+---------------+---------------+---------------
   alpha db server|alpha          |DB servers     |127.0.0.1      |127.0.0.1      |hosts          |eth0
   The card column has the wrong type: it should be a long, but it seems we get the original index field here instead.
List<LogicalPlan> inlinestats = parsed.collect(InlineStats.class::isInstance);
Set<Aggregate> inlinestatsAggs = new HashSet<>();
for (var i : inlinestats) {
    inlinestatsAggs.add(((InlineStats) i).aggregate());
}
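A toy model may help show why the snippet above is needed. This is not the Elasticsearch code: Plan, Source, Aggregate, InlineStats and FieldNames are simplified, hypothetical stand-ins, and the rule "any plain STATS limits the field list, otherwise request *" is a deliberate oversimplification of fieldNames. It models INLINESTATS the way it is currently parsed (an InlineStats whose child is an Aggregate), so the walker has to remember which Aggregate instances are wrapped by an InlineStats before it can decide whether an Aggregate limits the columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified plan model; these are NOT the Elasticsearch ESQL classes.
interface Plan {
    List<Plan> children();
}

// FROM <index>
record Source(String index) implements Plan {
    public List<Plan> children() { return List.of(); }
}

// STATS ... BY ...: `referenced` holds the aggregate arguments and grouping keys.
record Aggregate(Plan child, Set<String> referenced) implements Plan {
    public List<Plan> children() { return List.of(child); }
}

// INLINESTATS as currently parsed: a wrapper whose child is an Aggregate.
record InlineStats(Aggregate aggregate) implements Plan {
    public List<Plan> children() { return List.of(aggregate); }
}

final class FieldNames {
    // Pre-order walk collecting every node of the plan.
    static void collect(Plan p, List<Plan> out) {
        out.add(p);
        for (Plan c : p.children()) {
            collect(c, out);
        }
    }

    // Field names to request from field_caps; "*" means all fields.
    // Toy rule: any plain STATS limits the list, otherwise everything is needed.
    static Set<String> fieldNames(Plan parsed) {
        List<Plan> all = new ArrayList<>();
        collect(parsed, all);

        // Remember which Aggregate instances belong to an INLINESTATS (by identity).
        Set<Plan> inlinestatsAggs = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Plan p : all) {
            if (p instanceof InlineStats i) {
                inlinestatsAggs.add(i.aggregate());
            }
        }

        Set<String> names = new TreeSet<>();
        boolean limited = false;
        for (Plan p : all) {
            if (p instanceof Aggregate a) {
                names.addAll(a.referenced());
                // A plain STATS drops all other columns; an INLINESTATS aggregate keeps them.
                if (!inlinestatsAggs.contains(a)) {
                    limited = true;
                }
            }
        }
        return limited ? names : Set.of("*");
    }
}
```

With this rule, from test | inlinestats max(salary) by l requests all fields, while from test | stats max(salary) by l requests only l and salary. The sketch keys the set by identity, so it asks "is this exact node wrapped?" rather than comparing plans structurally.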
The required solution here looks correct but confusing. This is because we parse INLINESTATS as an InlineStats node containing an Aggregate node as its child, so for any given Aggregate we don't know whether it's a STATS or an INLINESTATS, and the two have very different semantics.
I think we should rather parse INLINESTATS as a single plan node; this would avoid this complexity.
Maybe consider refactoring the InlineStats node to avoid adding complexity here, as the fieldNames method is already hard to work with. A low-effort fix would be to still have the InlineStats wrap an Aggregate, but not as its child: the actual child would be the preceding command.
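Under a toy model (hypothetical classes, not the real InlineStats or Aggregate), the low-effort reshaping could look like the sketch below: InlineStats keeps its Aggregate as a field, but its only child is the preceding command. A plain walk over children() then never reaches an INLINESTATS aggregate, so every Aggregate it finds is a real STATS and no set of wrapped aggregates is needed. The "any STATS limits the field list" rule is again a simplification of fieldNames.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified plan model; NOT the Elasticsearch ESQL classes.
interface Plan {
    List<Plan> children();
}

// FROM <index>
record Source(String index) implements Plan {
    public List<Plan> children() { return List.of(); }
}

// STATS ... BY ...: `referenced` holds the aggregate arguments and grouping keys.
record Aggregate(Plan child, Set<String> referenced) implements Plan {
    public List<Plan> children() { return List.of(child); }
}

// Suggested shape: the aggregation is held as a field, and the only child is
// the command that precedes INLINESTATS.
record InlineStats(Plan child, Aggregate aggregate) implements Plan {
    public List<Plan> children() { return List.of(child); }
}

final class FieldNames {
    // Field names to request from field_caps; "*" means all fields.
    static Set<String> fieldNames(Plan parsed) {
        Set<String> names = new TreeSet<>();
        boolean limited = false;
        Deque<Plan> stack = new ArrayDeque<>();
        stack.push(parsed);
        while (!stack.isEmpty()) {
            Plan p = stack.pop();
            if (p instanceof Aggregate a) {
                // Only a real STATS is reachable through children(): it limits the columns.
                names.addAll(a.referenced());
                limited = true;
            } else if (p instanceof InlineStats i) {
                // Its referenced fields are needed, but all incoming columns pass through.
                names.addAll(i.aggregate().referenced());
            }
            p.children().forEach(stack::push);
        }
        return limited ? names : Set.of("*");
    }
}
```

The InlineStats branch still exists, but it is a local decision about one node rather than bookkeeping about which other nodes it wraps.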
More generally, I wonder if there's an abstraction just around the corner that would do away with more of the special-casing inside this method.
In terms of the sets of attributes before and after INLINESTATS, it behaves similarly to EVAL, DISSECT, GROK, ENRICH and COMPLETION: some attributes are required because they are referred to, and some attributes are newly added and shadow previous attributes. In the optimizer, we leverage this fact in the push-down rules; for this, the plan nodes just need to implement the GeneratingPlan interface.
I think it'd be nice to move this method in a direction that relies more on this general pattern.
That's out of scope for this PR, of course, but it would also benefit from parsing INLINESTATS as a single node rather than a combination of two nodes.
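One way to read this suggestion, sketched as a standalone toy: walk the pipeline from the last command back to FROM, tracking which columns are still needed. STATS and KEEP (no wildcard) reset the requirement; EVAL-like commands, INLINESTATS included, drop the names they generate and add the names they reference; if no STATS or KEEP has been seen, every index field is needed. Command, Generating, Stats, Keep and FieldNames are hypothetical and do not reproduce the Elasticsearch GeneratingPlan interface.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy pipeline model; NOT Elasticsearch's GeneratingPlan API.
interface Command {}

// EVAL, DISSECT, GROK, ENRICH, COMPLETION and INLINESTATS: all incoming columns
// pass through, `referenced` columns are read, `generated` columns are added
// (shadowing any incoming column with the same name).
record Generating(Set<String> referenced, Set<String> generated) implements Command {}

// STATS: only aggregates and groupings come out, so only `referenced` is read from below.
record Stats(Set<String> referenced) implements Command {}

// KEEP without wildcards: exactly `kept` comes out.
record Keep(Set<String> kept) implements Command {}

final class FieldNames {
    // Walks from the last command back to FROM, tracking the columns still needed.
    // Returns the field names for field_caps; "*" means all fields.
    static Set<String> fieldNames(List<Command> pipeline) {
        boolean all = true; // no STATS/KEEP seen yet: every column reaches the output
        Set<String> required = new TreeSet<>();
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            Command c = pipeline.get(i);
            if (c instanceof Stats s) {
                all = false;
                required = new TreeSet<>(s.referenced());
            } else if (c instanceof Keep k) {
                all = false;
                required = new TreeSet<>(k.kept());
            } else if (c instanceof Generating g && !all) {
                required.removeAll(g.generated()); // produced here, not read from the index
                required.addAll(g.referenced());   // read here, so needed from below
            }
            // A Generating command seen while `all` is true leaves it true.
        }
        return all ? Set.of("*") : required;
    }
}
```

On the test query from this PR (inlinestats, stats min, eval x, stats ca/cx), the walk yields l and salary, and it yields the same pair when the last stats is replaced by keep x, l. A KEEP placed before the INLINESTATS limits the fields in the same way.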
| inlinestats max(salary) by l
| stats min = min(salary) by l
| eval x = min + 1
| stats ca = count(*), cx = count(x) by l
I think the same behavior is expected when this stats is replaced by a keep x, l (no wildcard), right?
Maybe let's add such tests, and also some where the STATS or KEEP (no wildcard) comes before the INLINESTATS, for good measure.
The aggregate inside an inlinestats is "interfering" with the way field names are collected for field_caps requests. This caused simple queries like
from test | inlinestats max(whatever) by group
not to return all fields from test, but to limit the resulting columns to whatever and group.
The purpose of inlinestats is to add columns to an already existing set of columns, which implies that this command has to be "transparent" to any wider collection of field names.
Fixes #127236
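The "transparent" requirement can be illustrated with a toy schema computation. This assumes ESQL-like semantics and is not the Elasticsearch code; OutputColumns is hypothetical, and the column order (aggregates before groupings for STATS, new columns appended for INLINESTATS, shadowed columns dropped from their old position) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy schema computation with ESQL-like semantics; NOT the Elasticsearch code.
// Column order is an assumption of this sketch.
final class OutputColumns {
    // STATS aggs BY groups: only the aggregate outputs and the grouping keys remain.
    static List<String> stats(List<String> aggs, List<String> groups) {
        List<String> out = new ArrayList<>(aggs);
        out.addAll(groups);
        return out;
    }

    // INLINESTATS aggs BY groups: every incoming column remains (grouping keys included)
    // and the aggregate outputs are appended; an incoming column with the same name as a
    // new one is shadowed, i.e. dropped from its old position.
    static List<String> inlineStats(List<String> input, List<String> aggs) {
        List<String> out = new ArrayList<>(input);
        out.removeAll(aggs);
        out.addAll(aggs);
        return out;
    }
}
```

With input columns a, group and whatever, STATS max(whatever) BY group keeps only two columns, while INLINESTATS keeps all three and appends max(whatever); that is why field_caps still has to be asked for every field.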