Move wildcard expansions to the analyzer#11681
Conversation
bdf474e to
59cf620
Compare
|
We performed many optimizations and validations based on the expanded expression when planning SQL. Ideally, we should move these behind the ExpandWildcardRule to avoid additional expansions for the schema fields. |
| fn apply_required_rule(logical_plan: LogicalPlan) -> Result<LogicalPlan> { | ||
| let options = ConfigOptions::default(); | ||
| Analyzer::with_rules(vec![Arc::new(ExpandWildcardRule::new())]).execute_and_check( | ||
| logical_plan, | ||
| &options, | ||
| |_, _| {}, | ||
| ) | ||
| } |
There was a problem hiding this comment.
To generate the correct schema, we need to apply ExpandWildcardRule for the plan of the view first
| if let Some(replace) = options.opt_replace { | ||
| let replace_expr = replace | ||
| .items | ||
| .iter() | ||
| .map(|item| { | ||
| Ok(self.sql_select_to_rex( |
There was a problem hiding this comment.
We need to plan the expression of replace first and then expand them in the ExpandWildcardRule.
|
Hi @jayzhan211 @alamb |
jayzhan211
left a comment
There was a problem hiding this comment.
Overall looks good to me!
datafusion/expr/src/expr.rs
Outdated
|
|
||
| #[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | ||
| pub struct PlannedReplaceSelectItem { | ||
| pub items: Vec<Box<ReplaceSelectElement>>, |
There was a problem hiding this comment.
I don't think we need Box here. Vec<ReplaceSelectElement>
datafusion/expr/src/utils.rs
Outdated
| .map(|c| c.flat_name()) | ||
| .collect(); | ||
| Ok::<_, DataFusionError>( | ||
| (0..wildcard_schema.fields().len()) |
There was a problem hiding this comment.
I think you can use wildcard_schema.field_names()
| .zip(replace.expressions().iter()) | ||
| .find(|(item, _)| item.column_name.value == *name) | ||
| { | ||
| *expr = Expr::Alias(Alias { |
There was a problem hiding this comment.
You can try new_expr.alias(name)
datafusion/sql/src/select.rs
Outdated
| items: replace.items, | ||
| planned_expressions: replace_expr, | ||
| }; | ||
| Ok(WildcardOptions { |
There was a problem hiding this comment.
Maybe it is worth to have a with_replace function, so we just need options.with_replace(planned_option)
| Ok(view) | ||
| } | ||
|
|
||
| fn apply_required_rule(logical_plan: LogicalPlan) -> Result<LogicalPlan> { |
datafusion/expr/src/expr.rs
Outdated
|
|
||
| #[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | ||
| pub struct WildcardOptions { | ||
| pub opt_ilike: Option<IlikeSelectItem>, |
There was a problem hiding this comment.
nit: I think opt_xxx is redundant
alamb
left a comment
There was a problem hiding this comment.
This looks pretty epic -- thank you @goldmedal and @jayzhan211
| } | ||
|
|
||
| #[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | ||
| pub struct WildcardOptions { |
There was a problem hiding this comment.
as a follow on PR, can we add some doc comments to this struct explaining why this structure is needed and how to interpret it?
It seems like the core rationale is that wildcards have different semantics depending on what part of the query they appear in?
There was a problem hiding this comment.
Sure, I added descriptions for the purpose and source of each option.
datafusion/expr/src/expr_fn.rs
Outdated
| /// use datafusion_common::TableReference; | ||
| /// use datafusion_expr::{qualified_wildcard}; |
There was a problem hiding this comment.
If you put a # in front of lines in an example they are not shown but still compiled
| /// use datafusion_common::TableReference; | |
| /// use datafusion_expr::{qualified_wildcard}; | |
| /// # use datafusion_common::TableReference; | |
| /// # use datafusion_expr::{qualified_wildcard}; |
|
🚀 Thanks again @goldmedal and @jayzhan211 |
|
Thanks @alamb @jayzhan211 ❤️ |
|
Man I could spend all day reviewing DataFusion PRs these days. There is so much good stuff going on |
|
This PR appears to have introduced a regression found by @DDtKey in |
Which issue does this PR close?
Closes #11639 .
Rationale for this change
As discussed in #11639 (comment), I tried to solve the to-do issue for moving wildcard expansion to the analyzer.
datafusion/datafusion/expr/src/logical_plan/builder.rs
Lines 1443 to 1458 in 7db4213
I keep the wildcard expression when planning the SQL and implementing
ExpandWildcardRule.What changes are included in this PR?
This change impacts many parts. I did the required modifications for them.
Expanding expressions to fields for the schema of the Projection plan
When planning the SQL, we base the schema on a plan to perform multiple validations and optimizations. Even if we move the expansion of
Expr::Wildcardto the analyzer, the schema should contain the actual column information instead of the wildcard field.We should be careful with
calc_func_dependencies_for_project.functional_dependenciesis related to the implementation of constraints (#7040) and also affects the implicit grouping keys optimization (#11903 (comment)).Handle
WildacrdAddiotionsOptionfor replace, exclude, or expect, ...Datafusion supports options for wildcard expressions. Due to moving the expansion to the analyzer, we should handle
WildcardOptionswithinExpr::Wildcard. We need to consider these options when expanding the wildcard in the following:ExpandWildcardRuleutils::exprlist_to_fieldsutils::exprlist_lenType_coercion for union
Previously, we will do the type coercion when planning the union. We should do it again after expanding the wildcard expression.
Unparsing Expr::Wildcard
I also implement the unparsing for the
Expr::Wildcardto pass the tests. However, I leave a to-do issue for unparsingWildcardOptionsAre these changes tested?
yes
Are there any user-facing changes?
no.