[data] add ClickHouse sink #50377

Open: jecsand838 wants to merge 10 commits into master

Conversation

jecsand838
Contributor

Why are these changes needed?

Greetings again from ElastiFlow!

This PR introduces a fully-featured ClickHouse Datasink for Ray, enabling distributed writes from Ray Datasets into ClickHouse. The implementation aligns with Ray’s modern Datasink lifecycle, incorporating mode-based table management (create, append, overwrite), automatic schema handling, parallel block insertion, and optional chunking for large inserts. Additionally, it enhances compatibility with newer Ray versions by handling WriteResult objects in on_write_complete(), ensuring robust write tracking. These changes are essential for supporting high-performance, scalable data ingestion into ClickHouse, making Ray more versatile for real-time analytics and ETL workflows.
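
As a rough illustration of the mode-based table management and optional chunking described above, here is a minimal, dependency-free sketch. It is not the code in this PR: the names `SinkMode`, `plan_table_statements`, and `iter_chunks`, the `MergeTree` engine with `ORDER BY tuple()`, and the exact behavior assumed for each mode are all hypothetical and chosen only to make the idea concrete.

```python
from enum import Enum
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class SinkMode(Enum):
    """Hypothetical write modes (create, append, overwrite)."""

    CREATE = "create"
    APPEND = "append"
    OVERWRITE = "overwrite"


def plan_table_statements(
    mode: SinkMode,
    table: str,
    columns_ddl: str,
    table_exists: bool,
    order_by: str = "tuple()",
) -> List[str]:
    """Return the DDL statements to run before any rows are inserted.

    Assumed semantics (not confirmed by the PR text):
      CREATE    -> create the table, fail if it already exists
      APPEND    -> reuse the table if present, otherwise create it
      OVERWRITE -> drop any existing table, then create it again
    """
    create = (
        f"CREATE TABLE {table} ({columns_ddl}) "
        f"ENGINE = MergeTree() ORDER BY {order_by}"
    )
    if mode is SinkMode.CREATE:
        if table_exists:
            raise ValueError(f"table {table!r} already exists")
        return [create]
    if mode is SinkMode.APPEND:
        return [] if table_exists else [create]
    if mode is SinkMode.OVERWRITE:
        return [f"DROP TABLE IF EXISTS {table}", create]
    raise ValueError(f"unsupported mode: {mode!r}")


def iter_chunks(rows: Sequence[T], max_rows: int) -> Iterator[Sequence[T]]:
    """Split one block of rows into slices of at most max_rows rows each."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield rows[start : start + max_rows]


if __name__ == "__main__":
    statements = plan_table_statements(
        SinkMode.OVERWRITE, "events", "id Int64, name String", table_exists=True
    )
    for statement in statements:
        print(statement)
    # Each chunk would be sent to ClickHouse as a separate insert.
    print(list(iter_chunks([1, 2, 3, 4, 5], max_rows=2)))
```

Keeping the DDL planning separate from the insert path means the table setup can run once before the parallel write tasks start, while each task only has to chunk and insert its own block.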

Evidence

[Image: Screenshot 2025-02-10 at 3 26 59 AM]

Related issue number

Follow-up on #49526
@alexeykudinkin

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Connor Sanders <connor@elastiflow.com>
… were building

@jecsand838 requested a review from a team as a code owner on February 10, 2025 09:52
@jecsand838 changed the title from "[data] add Clickouse sink" to "[data] add ClickHouse sink" on Feb 10, 2025
@jcotant1 added the "data" label (Ray Data-related issues) on Feb 11, 2025
@alexeykudinkin added the "go" label (add ONLY when ready to merge, run all tests) on Feb 12, 2025
@alexeykudinkin self-requested a review on February 12, 2025 19:41
Labels: data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)