-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate partitions columns in CREATE EXTERNAL TABLE
if table already exists.
#9912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @MohamedAbdeen21 -- this looks really nice and will be a better user experience.
I think there are a few more tests to write but otherwise this is looking quite good.
cc @devinjdangelo and @metesynnada FYI
} | ||
|
||
/// Infer the partitioning at the given path on the provided object store. | ||
/// For performance reasons, it doesn't read all the files on disk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Marking this as a draft as I think we are just waiting on some more tests. @MohamedAbdeen21 please let me know if that was not correct or if you need some help finishing this PR up. |
Hi @alamb, I've implemented the check we discussed. Apart from adding unit tests for the two new functions, I believe the PR is now adequately covered. I'm curious to see if anyone can come up with additional queries that should be tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @MohamedAbdeen21 -- this looks awesome
.rev() | ||
.skip(1) // get parents only; skip the file itself | ||
.rev() | ||
.map(|s| s.split('=').take(1).collect()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the partition column name containing a =
a cheeky edge case we need to consider here?
Edit: related #9269 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need to consider this. Now that you mention it, I realize that I never actually thought about how these column names are written to disk. I'll have to take a look on how other systems do it before suggesting anything.
After looking at the linked issue, I think this should be addressed in a follow-up since it may require changing other parts of DF first.
Thanks for the review @Jefffrey -- @MohamedAbdeen21 please let me know if you would like to address @Jefffrey 's comments i this PR or a follow on one. |
I'm not 100% sure. I've never thought about the cases mentioned in #9269 before to be honest. Maybe do follow-ups since it looks like we may need other changes around sqlparser and DF first. |
Yes this sounds good 👍 |
Thanks again @MohamedAbdeen21 and @Jefffrey -- each PR gets us that much better 🙏 |
Which issue does this PR close?
Closes #9785 .
Rationale for this change
Prevent specifying wrong partitions when creating external tables that already exist on disk. This stops a whole suite of problems with other statements (check the issue for examples) that expect the table partitions to match the path partitions.
What changes are included in this PR?
CREATE EXTERNAL TABLE
now infers the partition columns provided inLOCATION
and compares it with the columns provided inPARTITIONED BY
clause. Partial partitions are allowed, and does nothing if path is empty.Are these changes tested?
Only through slt tests, no unit tests.
Are there any user-facing changes?
Better errors and safeguards when dealing with partitioned external tables.