-
Notifications
You must be signed in to change notification settings - Fork 10
Upgrade DataFusion 48.0.0 #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .set_bool("datafusion.execution.parquet.pushdown_filters", true) | ||
| .set_bool("datafusion.explain.logical_plan_only", true), | ||
| .set_bool("datafusion.explain.logical_plan_only", true) | ||
| .set_bool("datafusion.sql_parser.map_varchar_to_utf8view", false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DF48 made the config true by default, so it'll break the test_rewrite test:
TestCase {
name: "range filter + equality predicate",
base:
"SELECT column1, column2 FROM t1 WHERE column1 = column3 AND column1 >= '2022'",
query:
// Since column1 = column3 in the original view,
// we are allowed to substitute column1 for column3 and vice versa.
"SELECT column2, column3 FROM t1 WHERE column1 = column3 AND column3 >= '2023'",
},
->
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: t1.column1, t1.column2 |
| | Filter: t1.column1 = CAST(t1.column3 AS Utf8View) AND t1.column1 >= Utf8View("2022") |
| | TableScan: t1 projection=[column1, column2, column3], partial_filters=[t1.column1 = CAST(t1.column3 AS Utf8View), t1.column1 >= Utf8View("2022")] |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
Currently, the insert_binary_expr method in Predicate can't infer eq class from t1.column1 = CAST(t1.column3 AS Utf8View).
t1 is from here:
CREATE EXTERNAL TABLE t1 (
column1 VARCHAR,
column2 BIGINT,
column3 CHAR
)
So I'm wondering if CHAR also should be mapped to utf8view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue in upstream: apache/datafusion#16277
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate, it may be possible to generalize the equivalence class logic to handle certain types of casts (particularly invertible ones). It would probably be somewhat complex to implement though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can add this to our backlist.
| self.eq_classes[idx].columns.insert(c2.clone()); | ||
| } | ||
| (Some(&i), Some(&j)) => { | ||
| if i == j { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DF48 will generate the following plan:
+--------------+---------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+--------------+---------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: t1.column1, t1.column2 |
| | Filter: t1.column3 = t1.column1 AND t1.column1 >= Utf8("2022") |
| | TableScan: t1 projection=[column1, column2, column3], partial_filters=[t1.column3 = t1.column1, t1.column1 >= Utf8("2022")] |
+--------------+---------------------------------------------------------------------------------------------------------------------------------+
That is, t1.column3 = t1.column1 will be processed twice, and the second time they have the same index -> 0. If we don't add the fast return, it'll make column2's index 0, which will break the test (I guess we didn't consider the i==j case before)
suremarc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your hard work as always
| .set_bool("datafusion.execution.parquet.pushdown_filters", true) | ||
| .set_bool("datafusion.explain.logical_plan_only", true), | ||
| .set_bool("datafusion.explain.logical_plan_only", true) | ||
| .set_bool("datafusion.sql_parser.map_varchar_to_utf8view", false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate, it may be possible to generalize the equivalence class logic to handle certain types of casts (particularly invertible ones). It would probably be somewhat complex to implement though.
Related to apache/datafusion#15771