-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add catalog::resolve_table_references #10876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add catalog::resolve_table_references #10876
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f467b41 to
a15c149
Compare
|
Thank you for the careful review, I have addressed all comments. The CTE case had not occurred to me so thanks for pointing it out. I have separated the CTE list in the return value, returning a |
Thank you @leoyvens . But I prefer the previous implementation and take alamb's approach to remind users in the documentation. So that if users find a table reference that does not exist in their catalog, they do not immediately judge it as an error because it could be a CTE name. Implementing a separated CTE list could be error-prone, especially considering the scope of CTEs. For example, in |
|
Thanks @jonahgao for pointing out that shadowing case, I didn't know that was possible and the change really would have introduced a bug. I've added a sql test to make sure we don't regress there. But I did give another shot at correctly separating ordinary table refs from CTE refs. Let me know if the approach still has problems and there are other edges cases. |
80ae34f to
8a3cfc5
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Hi @leoyvens , for the following query, the table reference in the subquery on the right side of the UNION will still be shadowed by the cte. (with t as (select 1) select * from t) union (select * from t);The situation becomes more complicated when working with nested CTEs and recursive CTEs. I think there might be three possible solutions.
For the latter two solutions, I think it need to be implemented as a standalone function, meaning |
|
@jonahgao You're a SQL grammar wizard, I didn't know nested CTEs were even a thing, thanks for pointing out all those edge cases. I've handled and tested them all to the best of my understanding, by using |
jonahgao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thank you @leoyvens. I think this is an improvement because it no longer searches for CTEs from the catalog.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Name shadowing: | ||
| # The first `t` refers to the table, the second to the CTE. | ||
| query I | ||
| WITH t AS (SELECT * FROM t where t.a < 2) SELECT * FROM t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this query fails on main:
DataFusion CLI v39.0.0
> WITH t AS (SELECT * FROM t where t.a < 2) SELECT * FROM t;
Error during planning: table 'datafusion.public.t' not found* resolve information_schema references only when necessary * add `catalog::resolve_table_references` as a public utility * collect CTEs separately in resolve_table_references * test CTE name shadowing * handle CTE name shadowing in resolve_table_references * handle unions, recursive and nested CTEs in resolve_table_references
Which issue does this PR close?
Closes #10875.
Rationale for this change
This logic is extracted from
SessionState::resolve_table_references, but can be used without a context making it more flexible as a helper function for catalog management.What changes are included in this PR?
The
resolve_table_referencesfunction will only return the tables from the information schema if it actually implicitly requires them due to aSHOWstatement.A new public function
catalog::resolve_table_referencesis added, containing the same logic asSessionState::resolve_table_referencesbut in a more flexible interface.Are these changes tested?
Yes there is a doctest and it's well covered by sqllogictests as it's executed as part of SQL planning.
Are there any user-facing changes?
A new public function is added.
The behaviour of
SessionState::resolve_table_referencesaround the information schema is changed, which could theoretically break users relying on that directly, but it's quite an edge case.