refactor: Improved Hilbert Clustering with Range Partition #17424

Merged
78 commits merged into databendlabs:main on Mar 20, 2025

Conversation

zhyass
Member

@zhyass zhyass commented Feb 7, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces an improved implementation of Hilbert clustering, replacing the original global sorting approach with a more efficient range partition strategy. The new approach significantly reduces computational overhead and improves performance, especially for large-scale datasets.
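To illustrate the general idea behind the summary, here is a minimal Python sketch of range partitioning over Hilbert indices. It is not the code from this PR: the 2D grid, the sampling scheme, and the helper names `hilbert_index` and `range_partition` are assumptions made for illustration only. It shows that once rows are bucketed into ordered key ranges, each bucket can be sorted independently and the buckets still line up in global Hilbert order, so no single global sort is needed.

```python
import bisect
import random


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def range_partition(keys, num_partitions, sample_size=64, seed=0):
    """Hypothetical helper: derive boundaries from a sample, then bucket keys by range."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample quantiles become the partition boundaries.
    boundaries = [
        sample[len(sample) * i // num_partitions] for i in range(1, num_partitions)
    ]
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        # bisect_right is monotone in key, so partition i never holds a key
        # larger than any key in partition i + 1.
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return boundaries, partitions


n = 16
data_rng = random.Random(42)
points = [(data_rng.randrange(n), data_rng.randrange(n)) for _ in range(1000)]
keys = [hilbert_index(n, x, y) for x, y in points]

boundaries, partitions = range_partition(keys, num_partitions=4)
# Each partition is sorted on its own; no global sort is run.
locally_sorted = [sorted(p) for p in partitions]
merged = [k for p in locally_sorted for k in p]
```

In this sketch the boundaries come from a small random sample, so the partitions are only approximately balanced; the ordering guarantee across partitions holds regardless of how the boundaries are chosen.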

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Feb 7, 2025
@zhyass zhyass marked this pull request as draft February 7, 2025 14:31
@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Feb 7, 2025
@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Feb 10, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Feb 10, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Feb 10, 2025
@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Feb 10, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Feb 11, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Feb 11, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Feb 11, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Mar 10, 2025
@databendlabs databendlabs deleted a comment from github-actions bot Mar 10, 2025
@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Mar 13, 2025
Contributor

Docker Image for PR

  • tag: pr-17424-a8b9159-1741831967

note: this image tag is only available for internal use,
please check the internal doc for more details.

@zhyass zhyass changed the title refactor: [DO NOT MERGE] hilbert clustering refactor: hilbert clustering Mar 17, 2025
@zhyass zhyass changed the title refactor: hilbert clustering refactor: Improved Hilbert Clustering with Range Partition Mar 17, 2025
@zhyass zhyass marked this pull request as ready for review March 17, 2025 03:04
@dantengsky
Member

dantengsky commented Mar 20, 2025

@zhyass I pushed a commit with some extra code comments that I hope are helpful. Please check them for any incorrect descriptions and revise as needed.

@dantengsky dantengsky merged commit 08c4f54 into databendlabs:main Mar 20, 2025
211 of 221 checks passed
loloxwg pushed a commit to loloxwg/databend that referenced this pull request Apr 3, 2025
…abs#17424)

* fix

* fix

* fix

* fix

* fix

* fix

* update

* for test

* for test

* for test

* for test

* fix

* fix

* fix

* remove m_cte

* fix

* fix

* fix

* fix

* fix

* restore m cte

* fix

* fix

* fix

* remove m_cte

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* for test

* fix

* fix

* fix

* fix

* for test

* fix

* fix memory size

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* recover

* fix

* fix

* fix

* fix

* fix

fix

fix

* fix

* fix test

* fix test

* fix test

* fix test

* add hilbert_range_index

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* chore: add some extra code comments

---------

Co-authored-by: dantengsky <dantengsky@gmail.com>
Labels
  • ci-cloud: Build docker image for cloud test
  • pr-refactor: this PR changes the code base without new features or bugfix
2 participants