forked from ClickHouse/ClickHouse
ClickHouse 25.8 LTS #22
Draft
tilman-aiven wants to merge 34 commits into v25.8.12.129-lts-aiven from v25.8.12.129-lts-aiven-core-replication
Conversation
In containerized/cloud environments, the system hostname (e.g., pod-abc123) is not the network-accessible address, which causes replica communication to fail. Use the explicitly configured interserver_http_host from the config instead of the system hostname. The hostname is part of the Host ID (hostname:port:database_uuid), and the replicated database uses the Host ID for ZooKeeper registration, replica discovery, and DDL coordination.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
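A minimal sketch of the idea described above, not the actual DatabaseReplicated code: prefer the configured interserver_http_host and fall back to the system hostname only when nothing is configured. Function and parameter names are illustrative.
    #include <string>
    #include <unistd.h>

    std::string getInterserverHost(const std::string & configured_host)
    {
        if (!configured_host.empty())
            return configured_host;            // network-accessible address from config
        char buf[256]{};
        gethostname(buf, sizeof(buf) - 1);     // e.g. "pod-abc123", often not routable
        return buf;
    }

    std::string makeHostId(const std::string & configured_host, unsigned port, const std::string & database_uuid)
    {
        // Host ID format from the commit message: hostname:port:database_uuid
        return getInterserverHost(configured_host) + ":" + std::to_string(port) + ":" + database_uuid;
    }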
Enable internal_replication=true for clusters created programmatically by DatabaseReplicated to improve the performance and consistency of Distributed tables over Replicated databases. Unlike statically configured clusters in config.xml, DatabaseReplicated creates clusters dynamically at runtime. This change ensures these programmatically created clusters also benefit from internal replication.
Changes:
- Add internal_replication field to the ClusterConnectionParameters struct
- Set internal_replication=true when creating the cluster in DatabaseReplicated
- Update both Cluster constructors to pass internal_replication to addShard()
This ensures that Distributed tables over Replicated databases write to one replica per shard, allowing ReplicatedMergeTree to handle replication asynchronously instead of writing to all replicas directly.
Benefits:
- Reduces network traffic (1× instead of N× writes per shard)
- Improves performance (one replica processes the INSERT, others replicate asynchronously)
- Better consistency (uses the ReplicatedMergeTree replication mechanism)
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
Add support for altering DatabaseReplicated settings at runtime without
requiring database recreation. This is particularly important for
rotating cluster_secret, which previously required dropping and
recreating the entire database.
Changes:
- Override applySettingsChanges() in DatabaseReplicated to handle
MODIFY SETTING commands
- Add applyChange() and has() methods to DatabaseReplicatedSettings
wrapper to expose BaseSettings functionality
- Invalidate cached cluster when cluster_secret is changed to ensure
new authentication credentials are used
Implementation details:
- Thread-safe: protected by DatabaseReplicated::mutex
- Validates setting existence before applying changes
- Automatically resets cluster cache when cluster_secret changes
- Supports all DatabaseReplicatedSettings (max_broken_tables_ratio,
max_replication_lag_to_enqueue, collection_name, etc.)
Usage:
ALTER DATABASE MODIFY SETTING cluster_secret='new_secret'
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
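A minimal sketch of the pattern this commit describes, assuming hypothetical member names (the real implementation lives in DatabaseReplicated and its settings wrapper): apply each change under the lock, validate that the setting exists, and drop the cached cluster when cluster_secret changes.
    #include <map>
    #include <mutex>
    #include <stdexcept>
    #include <string>

    struct ReplicatedDatabaseSketch
    {
        std::mutex mutex;                               // protects settings and the cached cluster
        std::map<std::string, std::string> settings;    // stand-in for DatabaseReplicatedSettings
        bool cluster_cache_valid = true;

        void applySettingsChanges(const std::map<std::string, std::string> & changes)
        {
            std::lock_guard lock(mutex);
            for (const auto & [name, value] : changes)
            {
                if (!settings.count(name))              // validate setting existence first
                    throw std::invalid_argument("Unknown setting: " + name);
                settings[name] = value;
                if (name == "cluster_secret")           // new credentials must be picked up
                    cluster_cache_valid = false;        // invalidate the cached cluster
            }
        }
    };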
Automatically rewrite non-replicated MergeTree engine names to their Replicated equivalents when creating tables in DatabaseReplicated databases. This ensures all tables in a replicated database are properly replicated, even if users specify ENGINE = MergeTree.
Changes:
- Add rewriteUnreplicatedMergeTreeEngines() method to StorageFactory
- Call the rewrite method before engine name extraction in StorageFactory::get()
- Rewrite applies only on secondary queries (replica execution)
- Supports all MergeTree variants (ReplacingMergeTree, SummingMergeTree, etc.)
Implementation details:
- Checks if the engine name ends with MergeTree (catches all variants)
- Verifies the engine is not already replicated (starts with Replicated)
- Only rewrites when query_kind is SECONDARY_QUERY and the database engine is Replicated
- Modifies the AST in-place before storage engine instantiation
This allows users to write ENGINE = MergeTree in replicated databases and automatically get ReplicatedMergeTree behavior without manually specifying the engine name.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
Co-authored-by: Joe Lynch <joe.lynch@aiven.io>
Co-authored-by: Dmitry Potepalov <dmitry.potepalov@aiven.io>
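A standalone sketch of the rewrite condition described above (the real code operates on the CREATE TABLE AST inside StorageFactory); ends_with/starts_with require C++20.
    #include <string>

    bool shouldRewriteToReplicated(const std::string & engine_name, bool is_secondary_query, bool is_replicated_database)
    {
        return is_secondary_query                          // replica-side execution only
            && is_replicated_database                      // database engine is Replicated
            && engine_name.ends_with("MergeTree")          // catches all MergeTree variants
            && !engine_name.starts_with("Replicated");     // not already replicated
    }

    std::string rewriteEngineName(const std::string & engine_name)
    {
        return "Replicated" + engine_name;                 // e.g. SummingMergeTree -> ReplicatedSummingMergeTree
    }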
When ZooKeeper restarts (especially during version upgrades), it can be unavailable for approximately 6 seconds. ClickHouse previously failed queries if ZooKeeper was not available within ~3 seconds, leading to an inconsistent database state because DDL operations are not fully atomic.
This change improves ZooKeeper connection resilience by:
- Increasing minimum retry attempts from 3 to 6
- Adding exponential backoff between retry attempts (100ms, 200ms, 400ms...)
- Capping maximum backoff at 10 seconds to prevent excessive delays
The total retry window now covers typical ZooKeeper restart times (~6 seconds), allowing ClickHouse to successfully reconnect after ZooKeeper restarts without manual intervention. This is particularly important during version upgrades when ZooKeeper nodes restart sequentially, as it prevents DDL operations from failing mid-execution and leaving the database in an inconsistent state.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
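A sketch of retry-with-exponential-backoff under the parameters described above (6 attempts, 100 ms initial delay, 10 s cap); the connect() callable stands in for the actual ZooKeeper connection attempt.
    #include <algorithm>
    #include <chrono>
    #include <functional>
    #include <thread>

    bool connectWithBackoff(const std::function<bool()> & connect,
                            int max_attempts = 6,
                            std::chrono::milliseconds initial_backoff = std::chrono::milliseconds(100),
                            std::chrono::milliseconds max_backoff = std::chrono::milliseconds(10000))
    {
        auto backoff = initial_backoff;
        for (int attempt = 0; attempt < max_attempts; ++attempt)
        {
            if (connect())
                return true;                        // connected, e.g. after a ZooKeeper restart
            std::this_thread::sleep_for(backoff);   // 100ms, 200ms, 400ms, ...
            backoff = std::min(backoff * 2, max_backoff);
        }
        return false;
    }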
When ClickHouse restarts, tables in DatabaseReplicated databases are loaded
from metadata files using ATTACH operations. If a table's ZooKeeper path
contains the {shard} macro (e.g., /clickhouse/tables/{uuid}/{shard}),
the macro expansion fails during server startup because shard and replica
information is not populated in the MacroExpansionInfo structure.
The issue occurs because is_replicated_database is only true for
SECONDARY_QUERY operations (queries executed from the replicated queries
log in ZooKeeper). During server restart, ATTACH operations are not
SECONDARY_QUERY operations, so is_replicated_database is false, and the
shard/replica information is not populated, causing the error:
Code: 62. DB::Exception: No macro 'shard' in config while processing
substitutions in '/clickhouse/tables/{uuid}/{shard}'
This fix adds query.attach as an additional condition to populate shard
and replica information during ATTACH operations, ensuring that macro
expansion succeeds during server startup while maintaining the existing
behavior for replication queries.
The change is safe because:
- query.attach is only true during server startup when loading tables
from disk metadata
- getReplicatedDatabaseShardName() and getReplicatedDatabaseReplicaName()
use assert_cast, so they will fail fast if called on a non-replicated
database
- This pattern is already used elsewhere in the codebase (e.g., line 28
for allow_uuid_macro)
This restores functionality that previously worked in ClickHouse 21.x with
the custom {shard_name} macro, which was removed in later versions.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
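An illustrative condition only (the real check happens in the table engine's macro expansion setup); the struct and field names below are hypothetical stand-ins for the query flags described above.
    struct MacroContextSketch
    {
        bool is_replicated_database = false;  // true only for SECONDARY_QUERY execution
        bool attach = false;                  // true when loading tables from disk metadata at startup
    };

    bool shouldPopulateShardAndReplica(const MacroContextSketch & q)
    {
        // Previously only is_replicated_database; adding attach lets {shard}/{replica}
        // expand during server-startup ATTACH as well.
        return q.is_replicated_database || q.attach;
    }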
When adding a new replica to an existing DatabaseReplicated cluster, the recoverLostReplica() function reads table metadata from ZooKeeper and recreates tables. However, the metadata stored in ZooKeeper contains only the CREATE TABLE statement and table settings, not the global settings that were active during original table creation. If a table was created with the DEFLATE_QPL codec (which requires the enable_deflate_qpl_codec global setting), a new replica would fail to create the table during recovery because this setting is not enabled. This fix explicitly enables enable_deflate_qpl_codec in the recovery query context, ensuring that tables using this codec can be successfully recreated on new replicas. Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
If a replica is struggling to execute all tasks in its replication queue, the queue can grow to a huge size and becomes even slower to manage as it grows (it is an std::list). This queue is also present in ZooKeeper, for each replica.
There is also a shared queue, called the log, used to communicate changes between replicas. One replica pushes an item onto the log as part of executing a query, then all other replicas read that log item and re-push it into their own queue. Each replica monitors the log size, and replicas are deemed lost when they are too late to consume log items. However, replicas do not monitor the size of the other replicas' queues. If one replica pushes lots of items onto the log and another replica is able to copy them into its own queue but too slow to consume them, we have a problem: if the imbalance persists, ZooKeeper will eventually crash because it does not have enough memory to hold all the queues. Recovery is very hard, since ZooKeeper often corrupts its state when OOMing, and manually cleaning hundreds of thousands of queue items to determine which ones can be removed and which are necessary to keep the replicated data consistent is almost impossible.
To fix that, we reuse the infrastructure that delays insert queries when there are too many parts. The same logic is applied when any replica's queue size is too large for the table, or when the grand total of all replicas' queue sizes, over all tables, is too large. We likewise fail queries if either counter reaches a second threshold, exactly as is done with the parts count. In each replica, a thread updates a local copy of the maximum queue size across all replicas, and a map is added to the context to keep track of these per-storage maxima. These values are then used to gate inserts. We do it this way to avoid adding many ZooKeeper queries on the hot path; we do not need a very accurate queue size to get the feedback loop we need.
Changes:
- Add ReplicatedMergeTreeQueueSizeThread to periodically monitor queue sizes
- Add settings: queue_size_to_delay_insert, queue_size_to_throw_insert, queues_total_size_to_delay_insert, queues_total_size_to_throw_insert
- Add metric: ReplicatedQueuesTotalSize to track total queue size across all tables
- Extend delayInsertOrThrowIfNeeded() to check queue sizes in addition to the parts count
- Add Context methods to track per-storage maximum queue sizes
- Integrate queue size monitoring into the StorageReplicatedMergeTree startup and shutdown lifecycle
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
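A sketch of the gating decision described above, assuming the four thresholds introduced by this change; the real logic extends delayInsertOrThrowIfNeeded() inside MergeTreeData.
    #include <cstdint>

    enum class InsertGate { Allow, Delay, Throw };

    InsertGate gateInsertByQueueSize(uint64_t max_replica_queue_size,     // largest queue among this table's replicas
                                     uint64_t total_queue_size,           // sum over all tables and replicas
                                     uint64_t queue_size_to_delay_insert,
                                     uint64_t queue_size_to_throw_insert,
                                     uint64_t queues_total_size_to_delay_insert,
                                     uint64_t queues_total_size_to_throw_insert)
    {
        if (max_replica_queue_size >= queue_size_to_throw_insert
            || total_queue_size >= queues_total_size_to_throw_insert)
            return InsertGate::Throw;   // fail the INSERT, like parts_to_throw_insert
        if (max_replica_queue_size >= queue_size_to_delay_insert
            || total_queue_size >= queues_total_size_to_delay_insert)
            return InsertGate::Delay;   // slow down the INSERT, like parts_to_delay_insert
        return InsertGate::Allow;
    }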
Previously, MOVE PARTITION commands were not replicated through the DatabaseReplicated DDL log, causing data inconsistency when moving partitions between tables across replicas in a DatabaseReplicated cluster. The issue occurred because shouldReplicateQuery() only checked for AlterCommand (metadata changes) but not for PartitionCommand::MOVE_PARTITION. Other partition commands like DROP PARTITION and ATTACH PARTITION are handled by ReplicatedMergeTree's own replication mechanism, but MOVE PARTITION TO TABLE requires database-level coordination to ensure both source and destination tables are updated consistently across all replicas.
Changes:
- Add PartitionCommands.h include
- Extend shouldReplicateQuery() to detect MOVE_PARTITION commands
- Return true for MOVE_PARTITION to enable database-level replication
This ensures that when a partition is moved between tables, all replicas execute the operation atomically, maintaining data consistency across the cluster.
Co-authored-by: Joe Lynch <joe.lynch@aiven.io>
Reduce the default value of the setting from 1000 to 300 for DatabaseReplicated databases. This reduces ZooKeeper resource consumption by ~70% while maintaining a 6x safety margin over max_replication_lag_to_enqueue (50).
Context: Previously, we implemented a server-level setting (replicated_database_logs_to_keep) to centralize control of this value. However, after analysis, we determined that:
1. Customers do not have ALTER_DATABASE_SETTINGS permission, so they cannot modify database settings via ALTER DATABASE MODIFY SETTING
2. The simpler approach of changing the database-level default is sufficient
3. No additional readonly checks are needed since access control already prevents customer modifications
This change affects only newly created databases. Existing databases retain their current logs_to_keep value stored in ZooKeeper. The default value of 300 provides an adequate recovery buffer while significantly reducing ZooKeeper memory usage in multi-database managed provider environments.
Co-authored-by: Khatskevich <khatskevich@aiven.io>
Main service users are granted SHOW DATABASES access because it is necessary for their operation; it is implicitly granted by ClickHouse when giving access to a database. However, we do not want to give them access to SHOW CREATE DATABASE: this query shows the entire create statement, unredacted. That is a useful feature for superusers, but it can leak credentials to other users. Also, only show the create query in system.tables to users that were able to create that table.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This change enables S3 object storage to use custom CA certificates for HTTPS connections, which is useful for testing environments or private deployments where certificates are not signed by public CAs. The implementation adds a configuration option that can be specified in S3 disk settings. When provided, this path is used to create a custom SSL context for all HTTPS connections to that S3 endpoint.
Key changes:
- Added field to the S3Settings struct with config loading, serialization, and deserialization support
- Extended PocoHTTPClientConfiguration to store and pass ca_path
- Updated ClientFactory::createClientConfiguration to accept ca_path
- Modified PocoHTTPClient to create an SSL context from ca_path when provided
- Extended the HTTP layer (HTTPCommon, HTTPConnectionPool) to accept and use custom SSL contexts
- Updated all makeHTTPSession call sites to match the new signature
- Preserved ca_path through the credentials provider chain
Backwards compatibility:
- ca_path is optional and defaults to std::nullopt
- Serialization includes backwards compatibility handling for old data formats without ca_path
- All existing call sites continue to work with the default (empty) context parameter
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This change enables Azure Blob Storage to use custom CA certificates for HTTPS connections, which is useful for testing environments or private deployments where certificates are not signed by public CAs. The implementation adds a configuration option that can be specified in Azure Blob Storage disk settings. When provided, this path is used to configure Curl's CAInfo option (the Azure SDK uses Curl underneath) for all HTTPS connections to that Azure endpoint.
Key changes:
- Added field to the RequestSettings struct
- Updated getRequestSettings() to read ca_path from the config
- Modified getClientOptions() to set curl_options.CAInfo when curl_ca_path is provided
- Endpoint-specific settings automatically supported through the existing getRequestSettings() call
Implementation details:
- Uses Curl's built-in CAInfo option via the Azure SDK's CurlTransportOptions
- Simpler than the S3 implementation as no HTTP layer modifications are needed
- The Azure SDK handles the HTTP/HTTPS layer, so we only configure Curl options
This commit is based on the original patch: 0049-Custom_Azure_certificate_authority.patch
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This change prevents users from bypassing default profile restrictions by creating new users/roles that use less restrictive profiles (like the admin profile).
The security issue: Normally, ClickHouse prevents switching to settings profiles that are less restrictive than your current profile by verifying the new profiles against the constraints of the current profile. However, if a user has permission to create new users, they can create a user that directly uses the admin profile, then connect directly with this user. This way they can skip the verification that happens while switching profiles and escape the constraints of the default profile.
The solution: Enforce that all created profiles must inherit from the default profile (or one of its descendants), unless the current user has the `allow_non_default_profile` setting enabled. This works because the admin profile is pre-created using users.xml and not via SQL, so it can already exist and be configured with `allow_non_default_profile` set to true, while the pre-created default profile has `allow_non_default_profile` set to false.
Key changes:
- Added `allow_non_default_profile` setting to control profile creation
- Added methods to AccessControl and SettingsProfilesCache to check the profile inheritance hierarchy
- Added circular dependency detection when creating/altering profiles
- Auto-inject the default profile as parent if no parent is specified and the user does not have the `allow_non_default_profile` permission
- Added validation in SettingsConstraints to ensure parent profiles inherit from default
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This commit adds support for delegating AWS S3 signature generation to
an external HTTP service. This enables use cases where signature
generation needs to be controlled by a proxy or external service for
security, compliance, or monitoring purposes.
The process is accessed over HTTP and the URL to access it can be
configured with the signature_delegation_url parameter in the S3 disk
configuration.
ClickHouse will make a POST request with a JSON body:
{
"canonicalRequest": "PUT\n..."
}
And expects a JSON response with status 200 and this structure:
{
"signature": "01234567890abcdef..."
}
The canonical request matches the format defined by AWS for signatures
and contains all the required information (path, host, operation...) to
let the proxy decide if the request is allowed and can be signed.
Changes:
- Added AWSAuthV4DelegatedSigner class that delegates signature
generation to an external HTTP service
- Updated S3Client to use AWSAuthSignerProvider-based constructor
(compatible with newer AWS SDK)
- Added signature_delegation_url configuration parameter to S3 disk
settings
- Updated AWS SDK submodule to aiven/clickhouse-v25.8.12.129 branch
which includes the necessary SDK changes for delegated signatures
- Updated all S3 client creation sites to pass the new parameter
This commit was adapted from the original patch to work with the newer
AWS SDK version (1.7.321) used in ClickHouse v25.8.12.129, which
requires using AWSAuthSignerProvider instead of the older constructor
signature.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
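A hedged sketch of the request/response exchange described above, using Poco (which ClickHouse bundles); this is not the actual AWSAuthV4DelegatedSigner implementation, and the error handling is illustrative. Only the JSON field names ("canonicalRequest", "signature") come from the commit message.
    #include <Poco/JSON/Object.h>
    #include <Poco/JSON/Parser.h>
    #include <Poco/Net/HTTPClientSession.h>
    #include <Poco/Net/HTTPRequest.h>
    #include <Poco/Net/HTTPResponse.h>
    #include <Poco/URI.h>
    #include <iterator>
    #include <sstream>
    #include <stdexcept>
    #include <string>

    std::string requestDelegatedSignature(const std::string & signature_delegation_url, const std::string & canonical_request)
    {
        Poco::URI uri(signature_delegation_url);

        // Build the JSON body: { "canonicalRequest": "PUT\n..." }
        Poco::JSON::Object body;
        body.set("canonicalRequest", canonical_request);
        std::ostringstream body_stream;
        body.stringify(body_stream);
        const std::string payload = body_stream.str();

        // Plain HTTP session for brevity; an https:// URL would need Poco::Net::HTTPSClientSession.
        Poco::Net::HTTPClientSession session(uri.getHost(), uri.getPort());
        Poco::Net::HTTPRequest request(Poco::Net::HTTPRequest::HTTP_POST, uri.getPathAndQuery(), Poco::Net::HTTPMessage::HTTP_1_1);
        request.setContentType("application/json");
        request.setContentLength(payload.size());
        session.sendRequest(request) << payload;

        Poco::Net::HTTPResponse response;
        std::istream & response_stream = session.receiveResponse(response);
        if (response.getStatus() != Poco::Net::HTTPResponse::HTTP_OK)
            throw std::runtime_error("Signature delegation service returned a non-200 status");

        // Expected response: { "signature": "01234567890abcdef..." }
        const std::string response_body((std::istreambuf_iterator<char>(response_stream)), std::istreambuf_iterator<char>());
        Poco::JSON::Parser parser;
        auto parsed = parser.parse(response_body).extract<Poco::JSON::Object::Ptr>();
        return parsed->getValue<std::string>("signature");
    }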
The process is accessed over HTTP and the URL to access it can be
configured with the signature_delegation_url parameter in the Azure
disk configuration. ClickHouse will make a POST request with a JSON body:
{
"stringToSign": "..."
}
And expects a JSON response with status 200 and this structure:
{
"signature": "01234567890abcdefg..."
}
The string to sign matches the format defined by Azure for signatures.
The canonical request also contains all the required information
(path, operation...) to let the proxy decide if the request is
allowed and can be signed. It's also enough information to know
the cost of the request.
This commit was applied from the patch file 0067-Delegated_signature_azure.patch
and adapted to work with ClickHouse v25.8.12.129 codebase structure.
Changes include:
- Updated Azure SDK submodule to aiven/azure-sdk-for-cpp fork
with branch aiven/clickhouse-v25.8.12.129 containing the SDK
modifications (virtual GetSignature method in SharedKeyPolicy)
- Added AzureDelegatedKeyPolicy class that extends SharedKeyPolicy
to delegate signature generation via HTTP POST requests
- Added account_name and signature_delegation_url fields to
RequestSettings for configuration
- Added delegated_signature flag to ConnectionParams to control
client creation behavior
- Updated all ConnectionParams creation sites to include the
delegated_signature flag
- Modified getClientOptions to inject AzureDelegatedKeyPolicy into
PerRetryPolicies when signature delegation is enabled
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This commit enforces SSL/TLS encryption for all MySQL protocol connections to ClickHouse. Previously, clients could connect without SSL and communicate in plaintext. Now, any connection attempt without SSL is rejected with an error message.
When a client attempts to connect without SSL:
- The connection is immediately rejected
- An error packet (MySQL error code 3159) is sent with a clear message
- The connection is closed
Error message: "SSL support for MySQL TCP protocol is required. If using the MySQL CLI client, please connect with --ssl-mode=REQUIRED."
This is a security hardening change that ensures all MySQL protocol traffic is encrypted, protecting credentials and data in transit. It helps meet compliance requirements and prevents man-in-the-middle attacks.
This commit was applied from the patch file 0071-MySQL_enforce_SSL.patch
Co-authored-by: Joe Lynch <joe.lynch@aiven.io>
This commit enforces HTTPS/TLS encryption for all HTTP/HTTPS connections in two ClickHouse features:
1. HTTP Dictionary Sources - dictionaries loaded from remote URLs
2. URL Storage - tables that read/write data from remote URLs
Previously, both features allowed unencrypted HTTP connections, which exposed credentials, queries, and data to potential interception. Now, any attempt to use HTTP (non-HTTPS) URLs is rejected with a clear error message.
Changes include:
- Changed HTTPDictionarySource::Configuration::url from std::string to Poco::URI for better type safety and scheme validation
- Added HTTPS validation in HTTPDictionarySource registration: throws UNSUPPORTED_METHOD error if the scheme is not "https"
- Added HTTPS validation in the StorageURL constructor: throws BAD_ARGUMENTS error if the scheme is not "https"
- Simplified code by removing redundant Poco::URI object creations and using configuration.url directly throughout
Security impact:
- Prevents unencrypted transmission of dictionary data, table data, credentials, and query parameters
- Protects against man-in-the-middle attacks
- Helps meet compliance requirements (PCI-DSS, HIPAA, GDPR)
- Fails fast with clear error messages if HTTP is used
Breaking change: Existing configurations using http:// URLs will fail and must be updated to use https:// URLs.
This commit was applied from the patch file 0091-HTTP-dictionary-source.patch
Co-authored-by: Joe Lynch <joe.lynch@aiven.io>
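A short sketch of the scheme check described above; the exception type and message are illustrative (the real code throws ClickHouse's UNSUPPORTED_METHOD / BAD_ARGUMENTS errors).
    #include <Poco/URI.h>
    #include <stdexcept>
    #include <string>

    void validateHttpsUrl(const std::string & url)
    {
        Poco::URI uri(url);
        if (uri.getScheme() != "https")
            throw std::invalid_argument("Only https:// URLs are allowed, got scheme: " + uri.getScheme());
    }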
This commit enables a special user (configured via
user_with_indirect_database_creation setting) to create and drop
Replicated databases through SQL with automatic parameter setup and
privilege management.
Main changes:
1. Allow the special user to call CREATE DATABASE and DROP DATABASE; the
query is overridden during the execution phase, ensuring correct parameters
are passed and running it with elevated privileges.
2. Introduce GRANT DEFAULT REPLICATED DATABASE PRIVILEGES statement,
which grants a default set of privileges for a database. This
maintains a common source of truth while creating databases from
different environments.
3. Add server settings:
- reserved_replicated_database_prefixes: Prohibits database names
with certain prefixes
- user_with_indirect_database_creation: User allowed simplified
database creation
- cluster_database: Reference database for cluster operations
4. Enhance DatabaseReplicated to store shard_macros for reuse when
creating new databases.
5. Enforce ON CLUSTER requirement for non-admin users when dropping
databases to ensure complete removal across the cluster.
Technical details:
- Added createReplicatedDatabaseByClient() to handle automatic database
creation with proper cluster configuration
- Added checkDatabaseNameAllowed() to validate database names
- Modified executeDDLQueryOnCluster() to support skipping distributed
checks for internal operations
- Added setGlobalContext() to Context for executing queries with
elevated privileges
[DDB-1615] [DDB-1839] [DDB-1968]
CHECK TABLE requires an explicit grant since e96e0ae. This commit adds CHECK to the default privileges granted when using GRANT DEFAULT REPLICATED DATABASE PRIVILEGES, ensuring users can run CHECK TABLE on tables in replicated databases without requiring an additional grant. The change adds "CHECK, " to the privilege list in InterpreterGrantQuery::execute() when handling default replicated database privileges. Co-authored-by: Aliaksei Khatskevich <alex.khatskevich@aiven.io>
This commit adds comprehensive SSL/TLS configuration capabilities for
PostgreSQL and MySQL database connections in ClickHouse, along with a
security fix for the MariaDB connector.
Changes:
1. MariaDB Connector/C Security Fix:
- Updated submodule to aiven/mariadb-connector-c fork
- Fixed X509_check_host call to include hostname length parameter
- Prevents potential certificate validation bypass vulnerabilities
2. PostgreSQL SSL Configuration:
- Added SSLMode enum (DISABLE, ALLOW, PREFER, REQUIRE, VERIFY_CA, VERIFY_FULL)
- Added server settings:
* postgresql_connection_pool_ssl_mode (default: PREFER)
* postgresql_connection_pool_ssl_root_cert (default: empty)
- Updated PoolWithFailover to accept SSL mode and CA certificate path
- Modified formatConnectionString to include sslmode and sslrootcert parameters
- Integrated SSL settings across all PostgreSQL integration points:
* DatabasePostgreSQL
* DatabaseMaterializedPostgreSQL
* StoragePostgreSQL
* StorageMaterializedPostgreSQL
* TableFunctionPostgreSQL
* PostgreSQLDictionarySource
3. MySQL SSL Configuration:
- Added MySQLSSLMode enum (DISABLE, PREFER, VERIFY_FULL)
- Updated Connection, Pool, and PoolWithFailover classes to accept SSL mode
- Added ssl_mode and ssl_root_cert to StorageMySQL::Configuration
- Enhanced MySQL dictionary source to support ssl_mode in named collections
- Integrated SSL settings in MySQLHelpers and StorageMySQL
Security Benefits:
- Enables encrypted connections to prevent data interception
- Supports certificate validation to prevent man-in-the-middle attacks
- Provides flexible SSL mode selection for different security requirements
- Fixes critical certificate hostname validation bug in MariaDB connector
The changes maintain backward compatibility with default SSL mode set to
PREFER, which attempts SSL but falls back gracefully if unavailable.
Co-authored-by: Joe Lynch <joe.lynch@aiven.io>
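A small sketch of what appending SSL parameters to a libpq-style connection string can look like (the actual change is in formatConnectionString); the helper name is illustrative, but sslmode and sslrootcert are standard libpq keywords.
    #include <string>

    // ssl_mode is expected to be one of the libpq values:
    // disable, allow, prefer, require, verify-ca, verify-full
    std::string appendSslOptions(std::string connection_string, const std::string & ssl_mode, const std::string & ssl_root_cert)
    {
        if (!ssl_mode.empty())
            connection_string += " sslmode=" + ssl_mode;          // e.g. "prefer" (the default)
        if (!ssl_root_cert.empty())
            connection_string += " sslrootcert=" + ssl_root_cert; // CA certificate for verify-ca / verify-full
        return connection_string;
    }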
Remote calls (like `azureBlobStorageCluster`) format the `AST` into a `String` using `formatWithSecretsOneLine`. This function wipes sensitive data, which can lead to incorrect remote calls. This commit makes the wiping optional and disables it during remote calls.
Add an extra flag to each user, called "Protected". Users with the "Protected" flag can only be created, altered, removed, or have privileges granted/revoked by users with the extra `PROTECTED_ACCESS_MANAGEMENT` privilege. This privilege is only meant to be granted to our internal admin user, and the "Protected" flag will be given to the main service user, allowing it to exist as an SQL user (which is necessary to have fewer privileges than hardcoded XML users) but not be removable or alterable by cluster users.
Changes:
- Added `protected_flag` field to the User entity and `isProtected()` method to the IAccessEntity interface
- Added `PROTECTED_ACCESS_MANAGEMENT` access type for controlling operations on protected users
- Added `PROTECTED` keyword to the SQL parser for CREATE USER statements
- Updated all access storage implementations (Disk, Memory, Multiple, Replicated, AccessControl) to support a CheckFunc parameter for validation during insert/remove operations
- Updated interpreters to enforce the PROTECTED_ACCESS_MANAGEMENT privilege:
  - InterpreterCreateUserQuery: check when creating/altering protected users
  - InterpreterDropAccessEntityQuery: check when dropping protected users
  - InterpreterMoveAccessEntityQuery: check when moving protected users
  - InterpreterGrantQuery: check when granting/revoking to protected users
- Updated User equality comparison to include protected_flag
- Updated SHOW CREATE USER to display the PROTECTED keyword when applicable
API adaptations:
- Vector-based insert/insertOrReplace methods don't support CheckFunc in this ClickHouse version, so manual checks were added before calling these methods
- ContextAccess::checkGranteeIsAllowed signature differs from the original patch (it doesn't accept ContextPtr), so the protected check is enforced at the interpreter level instead
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
The AWS SDK does not fully support IPv6 in hostnames, mainly because the escaping brackets are not parsed and removed at the right time. Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
The Azure SDK does not fully support IPv6 in hostnames, mainly because the escaping brackets are not parsed and removed at the right time. The second fix is to detect IPv6 in hosts and validate them by parsing them instead of rejecting them because they don't look like DNS name segments. This change updates the URL validation regex pattern in validateStorageAccountUrl() to accept IPv6 addresses in brackets format (e.g., [2001:db8::1]) and also adds support for optional port numbers in the URL. The azure submodule has been updated to include the corresponding fixes in the Azure SDK for C++. Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
Azure storage does not support having a prefix before the stored object key. This makes it unusable for us when we use shared buckets with prefixes for projects and backup sites. This change adds support for a `storage_prefix` configuration option that can be used with `storage_account_url` or `connection_string` configuration methods. The prefix is prepended to all blob operations, allowing multiple ClickHouse instances or projects to share the same Azure container by using different path prefixes. Previously, prefix could only be specified via the `endpoint` configuration method by including it in the endpoint URL path. This change makes prefix configuration more flexible and readable, especially for shared container scenarios. Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
This adds a new disk type that wraps an object storage disk and enables coordination with our backup system. When this disk type is used as a wrapper on top of an object storage disk, the files in object storage are not really deleted. Instead, a local file is created for each file in object storage that should be deleted. Our backup system is then in charge of garbage collection and only deletes files that are not referenced by any backup.
The file names of the deletion markers look like this:
CLICKHOUSE_DIR/disks/remote_backup/backup/ESCAPED_OBJECT_PATH
The object path is escaped to simplify directory management and uses the same escape mechanism that ClickHouse uses in many places to reference files (essentially escape encoding). If the Backup disk is wrapping a Cache disk, it lets the remove calls pass through.
Changes:
- Added BackupObjectStorage wrapper class that intercepts delete operations and creates local deletion marker files
- Added registerDiskBackup factory function to register the backup disk type
- Extended the IObjectStorage interface with getLayerName() and getWrappedObjectStorage() methods for generic layer traversal
- Renamed getCacheLayersNames() to getLayersNames() to support both cache and backup layers
- Added supportsLayers() method to the IDisk interface
- Updated DiskObjectStorage to support wrapping with a backup layer
- Updated MergeTreeData to track backup layers in addition to cache layers
Adaptations from the original patch:
- Removed cloneObjectStorage() method (not in the current IObjectStorage interface)
- Fixed include order to prevent Poco namespace resolution issues
- Updated all wrapper classes (ReadOnlyDiskWrapper, DiskEncrypted) to support the new layer interface
This commit was applied from the patch file 0059-Backup_disk.patch
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
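A standalone sketch of the deletion-marker idea, assuming a simple percent-style escaping of the object path (ClickHouse's actual escaping helper differs) and an illustrative marker directory.
    #include <cctype>
    #include <cstdio>
    #include <fstream>
    #include <string>

    // Hex-escape characters that are unsafe in file names; a stand-in for the
    // escape mechanism the commit refers to, not the real ClickHouse helper.
    std::string escapeObjectPath(const std::string & object_path)
    {
        std::string escaped;
        for (unsigned char c : object_path)
        {
            if (std::isalnum(c) || c == '-' || c == '_' || c == '.')
                escaped += static_cast<char>(c);
            else
            {
                char buf[4];
                std::snprintf(buf, sizeof(buf), "%%%02X", c);
                escaped += buf;
            }
        }
        return escaped;
    }

    // Instead of deleting the object, record a deletion marker for the backup
    // system's garbage collector, e.g. under .../disks/remote_backup/backup/.
    void markObjectForDeletion(const std::string & marker_dir, const std::string & object_path)
    {
        std::ofstream marker(marker_dir + "/" + escapeObjectPath(object_path));
        // An empty file is enough: its name identifies the object to collect later.
    }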
If an exception happened while copying data, the finalize call was not done explicitly. The finalize was then done while the out object was being destructed, and that led to a fatal error, killing the entire server. To fix that, catch any exception that happens while copyData is running, run finalize there, and rethrow. The rethrow is very important: finalize after a failing copyData can also not throw, in which case the initial error would be forgotten and ClickHouse would believe the copy succeeded.
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
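A sketch of the pattern described above; `copy` stands in for copyData() and WriteSink for ClickHouse's WriteBuffer-like object with a finalize() method.
    #include <functional>

    template <typename WriteSink>
    void copyAndFinalize(const std::function<void()> & copy, WriteSink & out)
    {
        try
        {
            copy();              // may throw partway through the copy
        }
        catch (...)
        {
            out.finalize();      // finalize explicitly, never from the destructor
            throw;               // rethrow so the original error is not swallowed
        }
        out.finalize();          // normal path
    }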
When using Azure Blob Storage as a storage (table functions, external tables), the container is expected to already exist and is managed externally. Setting container_already_exists = true in StorageAzureConfiguration::createObjectStorage() optimizes initialization by skipping unnecessary container existence checks and creation attempts, eliminating 1-2 Azure API calls per storage initialization. This improves performance and avoids handling "container already exists" exceptions during initialization. The pattern matches the approach used in BackupIO_AzureBlobStorage. Co-authored-by: Tilman Moeller <tilman.moeller@aiven.io>
This commit introduces new enums for Kafka SASL mechanisms, security protocols, and SSL endpoint identification algorithms. It also implements case-insensitive parsing for these settings to ensure backward compatibility. The KafkaConfigLoader is updated to handle these new configurations, allowing for more flexible and secure Kafka connections.
Changes include:
- Added `KafkaSASLMechanism`, `KafkaSecurityProtocol`, and `KafkaSSLEndpointIdentificationAlgorithm` enums
- Implemented `toString` and `fromString` methods for these enums with case-insensitive support for backwards compatibility
- Updated KafkaConfigLoader to apply security and SSL settings from the configuration parameters
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
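A minimal sketch of case-insensitive enum parsing in the spirit of the `fromString` methods described above; the enum values shown are a subset (librdkafka's sasl.mechanism accepts e.g. PLAIN, SCRAM-SHA-256, SCRAM-SHA-512), and the exception type is illustrative.
    #include <algorithm>
    #include <cctype>
    #include <stdexcept>
    #include <string>

    enum class KafkaSASLMechanism { PLAIN, SCRAM_SHA_256, SCRAM_SHA_512 };

    KafkaSASLMechanism saslMechanismFromString(std::string value)
    {
        // Case-insensitive comparison keeps older configs like "plain" working.
        std::transform(value.begin(), value.end(), value.begin(),
                       [](unsigned char c) { return std::toupper(c); });
        if (value == "PLAIN")
            return KafkaSASLMechanism::PLAIN;
        if (value == "SCRAM-SHA-256")
            return KafkaSASLMechanism::SCRAM_SHA_256;
        if (value == "SCRAM-SHA-512")
            return KafkaSASLMechanism::SCRAM_SHA_512;
        throw std::invalid_argument("Unknown Kafka SASL mechanism: " + value);
    }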
Decrease the lower bound on the number of Kafka consumers. This allows setting it to zero, effectively disabling Kafka consumers globally without relying on SQL statement execution, which would allow users to interfere adversely with management operations.
Changes:
- Remove the validation in StorageKafkaUtils.cpp that prevented num_consumers < 1, allowing kafka_num_consumers to be set to 0
- Fix division by zero in StorageKafka::getMaxBlockSize() when num_consumers is 0 by using a guard value of 1
- Fix division by zero in StorageKafka2::getMaxBlockSize() with the same guard (this fix was missing from the original patch)
When num_consumers is set to 0:
- Consumer loops safely skip execution (for i < 0 doesn't execute)
- Container operations (resize, reserve) handle 0 correctly
- Block size calculation uses the guard value to prevent division by zero
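A sketch of the division guard described above; the parameter names are illustrative of the values getMaxBlockSize() divides by.
    #include <algorithm>
    #include <cstddef>

    size_t maxBlockSizePerConsumer(size_t max_insert_block_size, size_t num_consumers)
    {
        const size_t divisor = std::max<size_t>(num_consumers, 1);  // guard: avoid dividing by 0
        return max_insert_block_size / divisor;
    }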
Add an extra option for tables with the Kafka engine to pass the URL of a
schema registry. Also add support for schema registry basic authentication
via username:password in the URL.
Changes:
- Add kafka_format_avro_schema_registry_url setting to KafkaSettings
- Store format_avro_schema_registry_url in StorageKafka class
- Propagate schema registry URL to format settings via
createSettingsAdjustments() (works for both StorageKafka and StorageKafka2)
- Add basic authentication parsing in AvroRowInputFormat::SchemaRegistry
to extract username:password from URL and authenticate HTTP requests
This allows per-table configuration of Avro Schema Registry URLs with
authentication, enabling different Kafka tables to use different schema
registries or credentials.
Usage example:
CREATE TABLE kafka_table (...) ENGINE = Kafka(...) SETTINGS
kafka_format_avro_schema_registry_url = 'http://user:pass@registry:8081';
Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
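A sketch of extracting username:password from the registry URL, as described above; the real parsing lives in AvroRowInputFormat::SchemaRegistry, and the struct here is just for illustration.
    #include <Poco/URI.h>
    #include <string>

    struct RegistryCredentials
    {
        std::string username;
        std::string password;
    };

    RegistryCredentials credentialsFromUrl(const std::string & url)
    {
        Poco::URI uri(url);                                // e.g. http://user:pass@registry:8081
        const std::string user_info = uri.getUserInfo();   // "user:pass" or ""
        const auto colon = user_info.find(':');
        if (colon == std::string::npos)
            return {user_info, ""};
        return {user_info.substr(0, colon), user_info.substr(colon + 1)};
    }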
kafka_auto_offset_reset was previously hardcoded to "earliest" in the Kafka consumer configuration. This commit makes it configurable per table, allowing users to choose from: smallest, earliest, beginning, largest, latest, or end. kafka_date_time_input_format was only configurable through the global config file or the context (but context settings affecting CREATE query do not work well with replication/restoration). This commit makes it configurable per table as a Kafka table setting. The default value for kafka_auto_offset_reset is EARLIEST, which maintains backward compatibility with the previous hardcoded behavior. Co-authored-by: Kevin Michel <kevin.michel@aiven.io>
These settings can be tweaked to help improve throughput when writing to Kafka using the Kafka table engine. The settings are configurable per table, allowing fine-grained control over producer behavior.
New producer settings added:
- kafka_producer_batch_size: Maximum size of a batch in bytes
- kafka_producer_batch_num_messages: Maximum number of messages in a batch
- kafka_producer_compression_codec: Compression codec (none, gzip, snappy, lz4, zstd)
- kafka_producer_compression_level: Compression level (-1 for default)
- kafka_producer_linger_ms: Delay to wait for messages to form batches
- kafka_producer_queue_buffering_max_messages: Max messages in the producer queue
- kafka_producer_queue_buffering_max_kbytes: Max KB in the producer queue
- kafka_producer_request_required_acks: Required acknowledgments (-1, 0, or 1)
All settings default to 0 or -1, which means librdkafka will use its default values. This maintains backward compatibility with existing configurations.
Changes:
- Added KafkaCompressionCodec enum with case-preserving string conversion
- Extended ProducerConfigParams with all producer settings
- Applied producer settings to the cppkafka configuration in getProducerConfiguration()
- Updated both StorageKafka and StorageKafka2 to pass settings to ProducerConfigParams
Patches
Core replication (patch group 1)
(1) Advertise host from config for replicated databases
(2) Enable internal replication for DatabaseReplicated clusters
(5) Enable ALTER DATABASE MODIFY SETTING for Replicated databases
(8) Replace MergeTree with ReplicatedMergeTree in Replicated databases
(14) Tolerate ZooKeeper restart with increased retries and exponential backoff
(15) Fix ClickHouse restart with replicated tables containing {shard} macro
(16) Add missing settings to recoverLostReplica
(23) Fix unbounded replication queue growth
(37) Replicate ALTER TABLE MOVE PARTITION queries through DatabaseReplicated
(55) Change default logs_to_keep from 1000 to 300 for DatabaseReplicated
Security features (patch group 2)
(6) Added support for protected users
(13) Restrict SHOW CREATE DATABASE access
(20) Allow custom CA certificate path for S3 connections
(22) Allow custom CA certificate path for Azure Blob Storage connections
(26) Fix default profile escape vulnerability
(32) Allow delegating S3 signature to a separate process
(33) Allow delegating Azure signature to a separate process
(35) Enforce SSL in the MySQL handler
(45) Enforce HTTPS for URL storage and HTTPDictionarySource
(54) Allow avnadmin to create databases using SQL
(57) Add CHECK TABLE to default privileges
(58) Add SSL/TLS configuration support for PostgreSQL and MySQL connections
(61) Stop wiping secrets from remote calls
S3/Azure
(19) Fix IPv6 S3 object storage host
(21) IPv6 Azure object storage host
(24) Add support for Azure object storage path prefix
(28) Add Backup disk type
(34) Fix uncaught exception if S3 storage fails
(51) Skip attempt to create a container in azure blob storage
(53) Fix use after move in Azure new settings application
Kafka integration
(12) Add Kafka configuration support for SASL and SSL settings
(25) Allow decreasing number of Kafka consumers to zero
(29) Support per-table schema registry with authentication
(30) Add kafka_auto_offset_reset and kafka_date_time_input_format settings
(56) Add extra settings to Kafka Table Engine
Appendix