-
Notifications
You must be signed in to change notification settings - Fork 7.4k
Add support for hive partition style reads and writes #76802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add support for hive partition style reads and writes #76802
Conversation
I'll add tests soon |
This is good, and it could replace a problematic feature: #23051 Let's make sure names and values are URL-encoded. One potential problem is memory usage when writing to many partitions at the same time. Let's define some limit, so we will create only up to this limit number of buffers for writing to S3 at the same time. Do we control where exactly the path fragment with partition goes in the URL? |
Done
Ack |
As of now, it us up to the user to choose where in the path the partition goes by using the Maybe we can make it simpler: user is responsible for defining the table root, that's all. The rest (partition key location, filename and file extension) clickhouse will generate.
If the user specifies the partition_id placeholder and use_hive=1, we throw exception. What do you think? @alexey-milovidov |
I suppose that could be implemented, but perhaps we should leave it for a follow up PR? |
Yes, this is a great idea! This PR is good, but what I'd like to see in addition, before merging it, is - fixing the memory consumption problem with PARTITION BY. It's an old flaw of the current mechanism. Having this new feature will make it more frequently used, and the users will bump into this problem more frequently. |
Is there an issue that describes this issue in depth? I could look into that |
src/Storages/PartitionedSink.cpp
Outdated
@@ -42,13 +42,29 @@ Names extractPartitionRequiredColumns(const ASTPtr & partition_by, const Block & | |||
return exp_analyzer->getRequiredColumns(); | |||
} | |||
|
|||
static std::string formatToFileExtension(const std::string & format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be completed
Messed up, need to re-think a couple of things |
The shitty part about this is keeping backwards compatibility with |
@alexey-milovidov couple of questions:
|
Once I am done with this PR, I'll look into the max threads/streams thing |
@kssenii I have resolved most of the From a feature perspective, I think there are a few things pending:
Could you please re-review and shared your thoughts? |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add support for hive partition style reads and writes
Documentation entry for user-facing changes