-
Notifications
You must be signed in to change notification settings - Fork 25.3k
[ES|QL] Date nanos implicit casting in union types option #2 #127797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Date nanos implicit casting in union types option #2 #127797
Conversation
A date in the valid date-nanos range should be acceptable to parse. This issue was showing up in a difference in behaviour between Elasticsearch ingest and the CsvTests. This fix makes CsvTests behave the same as Elasticearch itself.
Thank you for reviewing @alex-spies @costin ! The changes to |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @fang-xing-esql ! I have only a final set of remarks, but overall I think this is a very fine PR and let's get this in!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
// Data type is different between implicit(date_nanos) and explicit casting, if the conversion is supported, create a | ||
// new MultiTypeEsField with explicit casting type, and add it to unionFieldAttributes. | ||
Set<DataType> supportedTypes = convert.supportedTypes(); | ||
if (supportedTypes.contains(fa.dataType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this doesn't look fully correct. fa.dataType()
for a field attribute containing a MultiTypeEsField is the type we (implicitly) cast to - but the conversion function needs to be compatible with all of the original types.
I realize that "all of the original types" in this case is limited to date + nano in practice and we shouldn't have conversion functions that take nanos but not date.
However, I think it'd be a tad nicer to actually check the original types so we're not building leaky assumptions into this code, which otherwise is agnostic of this detail and would still work if we expanded implicit casting to other type combinations.
Below, we already retrieve the original field from the map returned by MultiTypeEsField#getIndexToConversionExpressions
. The original field in each entry should have the original type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I included this extra layer of checking, verifying whether the original types could be converted to the explicitly cast types. I later decided to simplify the logic and removed that step.
The main reason was based on a logical assumption: if A can be converted to B, and B can be converted to C, then A should be convertible to C—which reflects the kind of type conversions we're dealing with here.
A second reason, as you’ve noted, is that we're currently only supporting millis and nanos, so the simplified check is sufficient for now.
That said, it's debatable whether our current conversion functions reliably guarantee this transitive property in all cases. So perhaps it's worth reconsidering and reinstating that extra check to make the logic more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the transitivity assumption makes sense - I was thinking about cases where B just cannot be cast to C, though, but A may still be able to be cast to C.
Very theoretical as I don't have an example, and this doesn't apply to the current state.
As a nit: I'd slightly prefer to just fully stick with the union type logic, which is
- "if all the original types can be cast to the explicitly required type, we'll treat this as a regular union type field and skip the implicit cast; otherwise, we cast implicitly during block loading and explicitly at runtime".
The current solution performs both checks, that is: "can both A be cast to C AND B be cast to C"? Which might have a veeeery small chance of doing an unintended double cast failing some otherwise valid queries some time in the future, or at least it's less obviously the same logic as for union types.
Absolutely not a blocker, of course.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
Thanks for the thorough review @alex-spies ! The suggested tests are added and the validation of whether the original data types can be converted to the explicit types are added as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the iterations and dealing with the very subtle edge cases @fang-xing-esql ! This looks very good to me, let's get it in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Pinging @elastic/kibana-esql (ES|QL-ui) |
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
#129004) * implicit casting for union typed fields mixed with datetime and date_nanos (cherry picked from commit 79e600a) # Conflicts: # x-pack/plugin/esql/qa/server/single-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/RestEsqlIT.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
This is an alternative approach to resolve #110009.
#123678 Does the implicit casting for union typed fields with
date
anddate_nanos
by adding anEval
to convert them to adate_nanos
attribute.This PR does the implicit casting for union typed fields with
date
anddate_nanos
by converting them toMultiTypeEsField
in place in theEsRelation