Add {_snowflake_id} wildcard support to object storage #789
Cheap alternative to #697 The |
src/Core/Settings.cpp
@@ -5951,6 +5951,9 @@ This only affects operations performed on the client side, in particular parsing
Normally this setting should be set in user profile (users.xml or queries like `ALTER USER`), not through the client (client command line arguments, `SET` query, or `SETTINGS` section of `SELECT` query). Through the client it can be changed to false, but can't be changed to true (because the server won't send the settings if user profile has `apply_settings_from_server = false`).

Note that initially (24.12) there was a server setting (`send_settings_to_client`), but latter it got replaced with this client setting, for better usability.
)", 0) \
DECLARE(Bool, object_storage_treat_key_wildcard_as_star, false, R"(
Three options here:
- Off by default
- On by default
- No setting at all, default behavior
A couple of minor comments.
@@ -373,21 +374,35 @@ void StorageObjectStorage::read(
if (update_configuration_on_read)
configuration->update(object_storage, local_context);

if (partition_by && configuration->withPartitionWildcard())
auto config_clone = configuration->clone();
We clone the configuration on every read, but only modify it in specific cases.
Maybe hold a smart pointer to the original config here and clone only when required?
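The suggestion above could look roughly like the following. This is only a sketch under assumptions: `ConfigSketch` and `configForRead` are hypothetical names invented for illustration and are not the actual `StorageObjectStorage::Configuration` API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the storage configuration (not the real class).
struct ConfigSketch
{
    std::string path;

    std::shared_ptr<ConfigSketch> clone() const { return std::make_shared<ConfigSketch>(*this); }
};

// Return the original configuration untouched unless the path has to be rewritten;
// only in that case pay for a clone and modify the copy.
std::shared_ptr<const ConfigSketch> configForRead(
    const std::shared_ptr<const ConfigSketch> & original,
    bool needs_rewrite,
    const std::string & rewritten_path)
{
    assert(original);
    if (!needs_rewrite)
        return original; // shares ownership, no copy is made

    auto copy = original->clone();
    copy->path = rewritten_path;
    return copy;
}
```

With this shape the common read path only bumps a reference count, and the original configuration is never mutated.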
@@ -659,13 +674,29 @@ bool StorageObjectStorage::Configuration::withPartitionWildcard() const
|| getNamespace().find(PARTITION_ID_WILDCARD) != String::npos;
}

bool StorageObjectStorage::Configuration::withSnowflakeIdWildcard() const
{
static const String PARTITION_ID_WILDCARD = "{_snowflake_id}";
This constant should be named SNOWFLAKE_ID_WILDCARD.
@@ -5951,6 +5951,9 @@ This only affects operations performed on the client side, in particular parsing
Normally this setting should be set in user profile (users.xml or queries like `ALTER USER`), not through the client (client command line arguments, `SET` query, or `SETTINGS` section of `SELECT` query). Through the client it can be changed to false, but can't be changed to true (because the server won't send the settings if user profile has `apply_settings_from_server = false`).

Note that initially (24.12) there was a server setting (`send_settings_to_client`), but latter it got replaced with this client setting, for better usability.
)", 0) \
DECLARE(Bool, object_storage_treat_key_related_wildcards_as_star, false, R"(
Three options here:
- make it the default behavior, not behind a setting
- make the setting on by default
- keep it off by default
From a usability point of view, ChatGPT suggests considering using

That sounds like a terrible idea
Add {_snowflake_id} wildcard support to object storage paths. Upon writing, ClickHouse will generate a snowflake ID on the fly and replace the wildcard with it. This will help us with parallel and concurrent writes to object storage.
Also introduce a new setting, object_storage_treat_key_related_wildcards_as_star, to allow symmetrical reads and writes using a single table.
Why is it needed? Consider the following:
CREATE TABLE ... s3('path_to_table_root/**.parquet')
OK, we can select from it, but how do we write? How do we name the files? In which directory?
Therefore, we introduced the snowflake ID:
CREATE TABLE ... s3('path_to_table_root/{_snowflake_id}.parquet')
We can now write to it, because we know the file location and how to name the file.
But how do we read now? The path is no longer a glob. That's what the setting is for: when it is enabled, the wildcard is treated as a star on reads.
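The read/write symmetry described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and `generateSnowflakeIdSketch` is a simplified stand-in rather than ClickHouse's actual snowflake ID generator. Only the wildcard string itself comes from the PR.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Wildcard spelled as in the PR description.
static const std::string SNOWFLAKE_ID_WILDCARD = "{_snowflake_id}";

// Hypothetical, simplified ID generator: milliseconds since the Unix epoch in the
// high bits, a per-process counter in the low 22 bits. Not thread-safe.
uint64_t generateSnowflakeIdSketch()
{
    static uint64_t counter = 0;
    const auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
    const auto ms = static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::milliseconds>(since_epoch).count());
    return (ms << 22) | (counter++ & 0x3FFFFF);
}

// Replace every occurrence of `wildcard` in `path` with `replacement`.
std::string replaceWildcard(std::string path, const std::string & wildcard, const std::string & replacement)
{
    assert(!wildcard.empty());
    size_t pos = 0;
    while ((pos = path.find(wildcard, pos)) != std::string::npos)
    {
        path.replace(pos, wildcard.size(), replacement);
        pos += replacement.size();
    }
    return path;
}

// On write: substitute a freshly generated ID, so each insert gets its own file name.
std::string resolvePathForWrite(const std::string & path)
{
    return replaceWildcard(path, SNOWFLAKE_ID_WILDCARD, std::to_string(generateSnowflakeIdSketch()));
}

// On read: with the setting enabled, the wildcard becomes a glob star;
// otherwise the path is left as written.
std::string resolvePathForRead(const std::string & path, bool treat_wildcards_as_star)
{
    return treat_wildcards_as_star ? replaceWildcard(path, SNOWFLAKE_ID_WILDCARD, "*") : path;
}
```

In this sketch, the same table path yields a unique file name per write and a glob such as `path_to_table_root/*.parquet` per read when the setting is on.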
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add {_snowflake_id} wildcard support to object storage paths. Also add a new setting, object_storage_treat_key_related_wildcards_as_star, to allow symmetrical reads and writes using a single table.
Documentation entry for user-facing changes