
Conversation

@tomas-quix
Collaborator

This commit introduces a new destination connector that writes time-series data from Kafka to S3 as Hive-partitioned Parquet files.

Key features:

- Supports Hive partitioning by any column, including time-based partitioning derived from timestamp columns (a minimal sketch follows this list).
- Offers optional integration with a REST Catalog for table registration.
- Includes configurable batch sizes and parallel uploads for tuning throughput.
- Validates partition strategies against existing tables to prevent data corruption.
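
For illustration, here is a minimal sketch of the general pattern behind such a sink: deriving time-based partition columns from a timestamp and writing a batch as Hive-partitioned Parquet to S3 with pyarrow. All names here (bucket, table path, the `ts` column) are placeholders, and this is not the connector's actual implementation:

```python
# Minimal sketch of Hive-partitioned Parquet writes to S3 with pyarrow.
# Bucket, table path, and column names are illustrative placeholders,
# not the connector's actual configuration or API.
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset as ds
import pyarrow.fs


def write_batch(records: list[dict]) -> None:
    table = pa.Table.from_pylist(records)
    # Derive time-based partition columns from a timestamp column so the
    # S3 keys follow the Hive layout, e.g.
    #   my-bucket/my-table/year=2024/month=5/day=17/part-0.parquet
    ts = table.column("ts").cast(pa.timestamp("ms"))
    table = table.append_column("year", pc.year(ts))
    table = table.append_column("month", pc.month(ts))
    table = table.append_column("day", pc.day(ts))
    ds.write_dataset(
        table,
        base_dir="my-bucket/my-table",
        filesystem=pyarrow.fs.S3FileSystem(),
        format="parquet",
        partitioning=["year", "month", "day"],
        partitioning_flavor="hive",
        existing_data_behavior="overwrite_or_ignore",
    )
```

A production sink would additionally buffer Kafka messages up to the configured batch size, commit offsets only after a successful upload, and spread uploads across parallel workers.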

Updates the library item ID to be more descriptive of the destination.
Refactors test configurations for Quixlake Timeseries and S3 File destinations.

Updates test parameters such as batch sizes, commit intervals, worker counts, and message counts to reduce test execution time and improve reliability (illustrated below).
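
For context, such a test configuration typically tunes parameters along these lines; every name and value below is hypothetical, not the actual configuration from this PR:

```python
# Hypothetical test parameters; names and values are illustrative only.
SINK_TEST_CONFIG = {
    "batch_size": 1_000,      # rows buffered before a Parquet file is flushed
    "commit_interval": 5.0,   # seconds between Kafka offset commits
    "max_workers": 4,         # parallel S3 upload workers
    "message_count": 10_000,  # messages produced by the test harness
}
```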

Adds the `mypy-boto3-s3` dependency (type stubs for the boto3 S3 client) to the s3-file destination.

Renames "Quix TS Datalake Sink" to "Quix DataLake Timeseries Sink" for clarity.

## How to run

Create a [Quix](https://portal.platform.quix.io/signup?xlink=github) account or log in and visit the `Connectors` tab to use this connector.
Contributor
update please

