-
Notifications
You must be signed in to change notification settings - Fork 491
feat: support GEO RTree index #5034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
101e660 to
5352673
Compare
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Kontinuation
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very interested in your work. After a preliminary review of the code, I have a few suggestions/comments
135714d to
75c7b08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
c00e492 to
e1ffeb1
Compare
|
The python example is currently blocked at #5144 |
fbdd59f to
49cfef2
Compare
5b53354 to
0b29e09
Compare
d15373f to
3966395
Compare
| "rust/lance-encoding", | ||
| "rust/lance-file", | ||
| "rust/lance-index", | ||
| "rust/lance-geo", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty big change, I would suggest we have a PR to get the lance-geo module in first, with the tests for geo type and functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK,I separate to #4678, please help review it.
Cargo.toml
Outdated
| ] } | ||
| crossbeam-queue = "0.3" | ||
| datafusion = { version = "50.0.0", default-features = false, features = [ | ||
| datafusion = { version = "50.3.0", default-features = false, features = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think let's bump datafusion in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This upgrade may be unnecessary, I just wanted to keep it consistent with the lock file. I noticed that bump datafusion 51 is in progress, so I've rolled back this bump.
rust/lance-geo/src/bbox.rs
Outdated
|
|
||
| /// Convert a length-1 GeoArrowArray to a geo::Geometry scalar. | ||
| pub fn bounding_box_single_scalar(arr: &dyn GeoArrowArray) -> ArrowResult<BoundingBox> { | ||
| assert_eq!(arr.len(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer error not panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is unnecessary, removed it.
| let hilbert_max = ((1 << 16) - 1) as f64; | ||
| let len = rect_array.len(); | ||
| let width = self.bbox.maxx() - self.bbox.minx(); | ||
| let height = self.bbox.maxy() - self.bbox.miny(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the points are in the same x or y axis? seems like we need a minimum for width and height that is not zero, since I remember hilbert curve does not divide by 0 (if it does then never mind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. In fact, division-by-zero will not occur. If all x are 0, in the subsequent expression, (bbox.minx() + bbox.maxx()) / 2. - self.bbox.minx() will yield 0, resulting in the entire expression being 0.
let x = (hilbert_max * ((bbox.minx() + bbox.maxx()) / 2. - self.bbox.minx()) / width)
.floor() as u32;
To be safe, I've also updated the logic to guard against division-by-zero by setting the width/height to 1.0 if max equals min. This approach aligns the one used in flatbush (lines 158-171) which effectively ensures the normalization accuracy.
918adb1 to
9d0ab4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And one other thing is that we should raise a separated PR + vote to add RTree to the table index spec, treat it as a spec change. This follows the new community governance rule, and also allows a pre-review of the general design. We did not do that for zone map and it caused quite a few issues, so I think let's follow the new process here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I’ve raised a PR #5360 outlining the RTree index spec.
Change-Id: Ic057263d3ff4e9dc7d5874e0b92687b551cfb836
No description provided.