Skip to content

Conversation

@xudong963
Copy link
Member

.set_bool("datafusion.execution.parquet.pushdown_filters", true)
.set_bool("datafusion.explain.logical_plan_only", true),
.set_bool("datafusion.explain.logical_plan_only", true)
.set_bool("datafusion.sql_parser.map_varchar_to_utf8view", false),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DF48 made the config true by default, so it'll break the test_rewrite test:

TestCase {
    name: "range filter + equality predicate",
    base:
        "SELECT column1, column2 FROM t1 WHERE column1 = column3 AND column1 >= '2022'",
    query:
    // Since column1 = column3 in the original view,
    // we are allowed to substitute column1 for column3 and vice versa.
        "SELECT column2, column3 FROM t1 WHERE column1 = column3 AND column3 >= '2023'",
},

->

+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type    | plan                                                                                                                                                  |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: t1.column1, t1.column2                                                                                                                    |
|              |   Filter: t1.column1 = CAST(t1.column3 AS Utf8View) AND t1.column1 >= Utf8View("2022")                                                                |
|              |     TableScan: t1 projection=[column1, column2, column3], partial_filters=[t1.column1 = CAST(t1.column3 AS Utf8View), t1.column1 >= Utf8View("2022")] |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+

Currently, the insert_binary_expr method in Predicate can't infer eq class from t1.column1 = CAST(t1.column3 AS Utf8View).

t1 is from here:

CREATE EXTERNAL TABLE t1 (
    column1 VARCHAR, 
    column2 BIGINT, 
    column3 CHAR
)

So I'm wondering if CHAR also should be mapped to utf8view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue in upstream: apache/datafusion#16277

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unfortunate, it may be possible to generalize the equivalence class logic to handle certain types of casts (particularly invertible ones). It would probably be somewhat complex to implement though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can add this to our backlist.

self.eq_classes[idx].columns.insert(c2.clone());
}
(Some(&i), Some(&j)) => {
if i == j {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DF48 will generate the following plan:

+--------------+---------------------------------------------------------------------------------------------------------------------------------+
| plan_type    | plan                                                                                                                            |
+--------------+---------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: t1.column1, t1.column2                                                                                              |
|              |   Filter: t1.column3 = t1.column1 AND t1.column1 >= Utf8("2022")                                                                |
|              |     TableScan: t1 projection=[column1, column2, column3], partial_filters=[t1.column3 = t1.column1, t1.column1 >= Utf8("2022")] |
+--------------+---------------------------------------------------------------------------------------------------------------------------------+

That is, t1.column3 = t1.column1 will be processed twice, and the second time they have the same index -> 0. If we don't add the fast return, it'll make column2's index 0, which will break the test (I guess we didn't consider the i==j case before)

@xudong963 xudong963 marked this pull request as draft June 12, 2025 14:47
@xudong963 xudong963 marked this pull request as ready for review June 20, 2025 04:57
@xudong963 xudong963 changed the title Prepare for DataFusion 48.0.0 Upgrade DataFusion 48.0.0 Jun 20, 2025
Copy link
Collaborator

@suremarc suremarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your hard work as always

.set_bool("datafusion.execution.parquet.pushdown_filters", true)
.set_bool("datafusion.explain.logical_plan_only", true),
.set_bool("datafusion.explain.logical_plan_only", true)
.set_bool("datafusion.sql_parser.map_varchar_to_utf8view", false),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unfortunate, it may be possible to generalize the equivalence class logic to handle certain types of casts (particularly invertible ones). It would probably be somewhat complex to implement though.

@xudong963 xudong963 merged commit d4cc10f into datafusion-contrib:main Jun 26, 2025
16 checks passed
@github-actions github-actions bot mentioned this pull request Jun 19, 2025
@github-actions github-actions bot mentioned this pull request Oct 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants