-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Robustness of Unparser Testing and Implementation #9623
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -151,6 +151,36 @@ impl Unparser<'_> { | |||
order_by: vec![], | |||
})) | |||
} | |||
Expr::ScalarSubquery(subq) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have found that all of the information required to preserve a subquery is encoded in the subquery expression. This means we can actually completely ignore subquery LogicalPlan nodes and not lose anything...
At least this is true in every example I have come up with so far 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what you mean by "ignore LogicalPlan
" as the Expr has a LogicaPlan
in it, doesn't it (sub.subquery
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at Unparser::select_to_sql_recursively
, we have not implemented a match condition for LogicalPlan::Subquery
. This node is never actually encountered as I have not found a case where you can get to it via the "input" of another node. We do encounter the Expr::ScalarSubquery expression and we encode the information from the LogicalPlan
found there instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wonder if LogicalPlan:Subquery
is ever actually used 🤔 I took a quick look around and didn't see an obvious creation in the planner.
@@ -169,7 +199,7 @@ impl Unparser<'_> { | |||
pub(super) fn new_ident(&self, str: String) -> ast::Ident { | |||
ast::Ident { | |||
value: str, | |||
quote_style: self.dialect.identifier_quote_style(), | |||
quote_style: Some(self.dialect.identifier_quote_style().unwrap_or('"')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identifiers are now always quoted. This makes unparser work for columns with spaces or other strangeness without any other special handling required.
/// into an actual aggregate expression COUNT(*) as identified in the aggregate node. | ||
pub(crate) fn unproject_agg_exprs(expr: &Expr, agg: &Aggregate) -> Result<Expr> { | ||
expr.clone() | ||
.transform(&|sub_expr| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are borrowing the machinery of logical planning / optimization to unwind logical planning. 😆
for (query, expected) in tests { | ||
let actual = roundtrip(query).unwrap(); | ||
assert_eq!(actual, expected); | ||
assert_eq!(plan, plan_roundtrip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the description, the key condition tested is that the logical plan derived from the original AST and recovered AST are the same logical plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense
Can you please add a comment to this test that explains the rationale (that you so eloquently explained in the PR' description)? I think it will help future readers understand the choice which may not be obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty great to me -- thank you @devinjdangelo
@@ -151,6 +151,36 @@ impl Unparser<'_> { | |||
order_by: vec![], | |||
})) | |||
} | |||
Expr::ScalarSubquery(subq) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what you mean by "ignore LogicalPlan
" as the Expr has a LogicaPlan
in it, doesn't it (sub.subquery
)?
), | ||
]; | ||
"select id, sum(age), first_name from person group by first_name, id", | ||
"select id, count(*), first_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯
for (query, expected) in tests { | ||
let actual = roundtrip(query).unwrap(); | ||
assert_eq!(actual, expected); | ||
assert_eq!(plan, plan_roundtrip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense
Can you please add a comment to this test that explains the rationale (that you so eloquently explained in the PR' description)? I think it will help future readers understand the choice which may not be obvious
I think your change to the roundtrip test is good as it is testing only the single concern of |
Thanks again @devinjdangelo |
Which issue does this PR close?
Follow on to #9606
Rationale for this change
Based on review comment from @seddonm1 I wanted to see if I could make examples like the following pass in unparser:
While working on this, I found that the implementation in #9606 was not doing what I thought it was doing. I was misled due to the string representation of two different AST's being the same. Namely the literal column expression
"COUNT(*)"
vs the aggregate expressionCOUNT(*)
were represented by the same strings. This led me to believe the implementation was correct, when in fact is was not and would fail on more complex examples (such as aggregate expressions nested in the expression tree or with quoted identifiers).I have modified the round-trip testing strategy to avoid similar confusion in the future. Now, instead of looking at the round trip SQL string and comparing, we logically plan the round tripped AST and ensure it generates the same logical plan as the original sql string. E.g.
The new test condition ensures not that the sql string is literally the same post round trip, but they are logically equivalent. I believe this is the more meaningful condition to test.
I have also spent time trying to come up with more complex queries to break the implementation. The most complete/complex test in the suite is:
This seed sql string results in the following plan:
The plan is then transformed back into an
ast::Statement
and then that statement is planned and compared to be identical to the above plan. This verifies we have preserved all logical query information from the original SQL string in our round trip. Differences in syntax/quoting which do not affect the logic of the query are ignored.What changes are included in this PR?
Aggregate
node that applies to a given Query under constructionAre these changes tested?
Yes
Are there any user-facing changes?
More complex SQL works with unparser now