-
Notifications
You must be signed in to change notification settings - Fork 0
245: feat(sql): Implement ST_StartPoint() and ST_EndPoint() #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
WalkthroughThis PR introduces ST_StartPoint and ST_EndPoint functionality across Python and Rust implementations. It adds test cases in Python benchmarks and testing modules, implements the core UDFs in Rust, registers them in the function registry, and extends WKB coordinate writing utilities in the Rust geometry library. Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
benchmarks/test_functions.py (1)
225-241: Bench test for ST_EndPoint addedLooks good. About Ruff S608: table comes from a closed param set, so risk is negligible in tests. Consider inline noqa or a safer builder if you want to silence CI linters.
rust/sedona-functions/src/st_start_point.rs (2)
151-153: Avoid underflow when selecting last coord (clarity)num_coords() - 1 underflows when empty (relies on coord() returning None). Prefer an explicit check.
- (geo_traits::GeometryType::LineString(line_string), false) => { - line_string.coord(line_string.num_coords() - 1) - } + (geo_traits::GeometryType::LineString(line_string), false) => { + let n = line_string.num_coords(); + if n == 0 { None } else { line_string.coord(n - 1) } + }
134-141: Minor: error message specificityMessage says “point header” even when writing coords can fail. Consider a generic “Failed to write WKB point”.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
benchmarks/test_functions.py(1 hunks)python/sedonadb/tests/functions/test_functions.py(1 hunks)rust/sedona-functions/benches/native-functions.rs(1 hunks)rust/sedona-functions/src/lib.rs(1 hunks)rust/sedona-functions/src/register.rs(1 hunks)rust/sedona-functions/src/st_start_point.rs(1 hunks)rust/sedona-geometry/src/wkb_factory.rs(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
rust/sedona-functions/benches/native-functions.rs (1)
rust/sedona-testing/src/benchmark_util.rs (1)
scalar(66-89)
rust/sedona-functions/src/st_start_point.rs (4)
rust/sedona-geometry/src/wkb_factory.rs (2)
write_wkb_coord_trait(397-424)write_wkb_point_header(75-83)rust/sedona-schema/src/matchers.rs (2)
matchers(431-484)is_geometry(167-169)rust/sedona-testing/src/compare.rs (1)
assert_array_equal(63-100)rust/sedona-testing/src/create.rs (1)
create_array(50-52)
rust/sedona-functions/src/register.rs (1)
rust/sedona-functions/src/st_start_point.rs (2)
st_end_point_udf(66-73)st_start_point_udf(43-50)
benchmarks/test_functions.py (3)
python/sedonadb/python/sedonadb/testing.py (7)
SedonaDBSingleThread(347-353)PostGISSingleThread(618-624)DuckDBSingleThread(407-412)execute_and_collect(156-168)execute_and_collect(342-344)execute_and_collect(403-404)execute_and_collect(533-537)python/sedonadb/tests/functions/test_functions.py (2)
test_st_start_point(1039-1044)test_st_end_point(1060-1065)benchmarks/test_bench_base.py (1)
_get_eng(147-161)
🪛 Ruff (0.14.2)
benchmarks/test_functions.py
221-221: Possible SQL injection vector through string-based query construction
(S608)
239-239: Possible SQL injection vector through string-based query construction
(S608)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (8)
rust/sedona-functions/src/lib.rs (1)
52-53: Module wiring looks goodPrivate module added and used by register.rs; no export needed.
rust/sedona-functions/src/register.rs (1)
95-97: UDFs correctly registeredNames match ("st_startpoint", "st_endpoint") and integrate with existing macro.
rust/sedona-functions/benches/native-functions.rs (1)
129-143: Bench additions are consistentNew benches reference registered UDFs and valid arg shapes.
benchmarks/test_functions.py (1)
207-224: Bench test for ST_StartPoint addedPattern matches existing tests. No issues.
python/sedonadb/tests/functions/test_functions.py (2)
1019-1045: ST_StartPoint test coverage is solidCovers Z/M/ZM, multi- and nested collections; aligns with implementation.
1047-1066: ST_EndPoint test semantics match implementationReturns NULL for non-LineString as expected.
rust/sedona-geometry/src/wkb_factory.rs (2)
395-424: Generic CoordTrait -> WKB helper is correctDimension handling and tuple mapping are sound; good reuse of existing writers.
571-594: Unit test coverage for CoordTrait pathsNice table-driven cases across XY/XYZ/XYM/XYZM.
| fn st_start_point_doc() -> Documentation { | ||
| Documentation::builder( | ||
| DOC_SECTION_OTHER, | ||
| "Returns the start point of a LINESTRING geometry. Returns NULL if the geometry is not a LINESTRING.", | ||
| "ST_StartPoint (geom: Geometry)", | ||
| ) | ||
| .with_argument("geom", "geometry: Input geometry") | ||
| .with_sql_example("SELECT ST_StartPoint(ST_GeomFromWKT('LINESTRING(0 1, 2 3, 4 5)'))") | ||
| .build() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix ST_StartPoint documentation to match behavior
Implementation returns the first coordinate as a POINT for many geometry types, not only LineString. Update doc string accordingly.
Apply this diff:
- Documentation::builder(
- DOC_SECTION_OTHER,
- "Returns the start point of a LINESTRING geometry. Returns NULL if the geometry is not a LINESTRING.",
- "ST_StartPoint (geom: Geometry)",
- )
+ Documentation::builder(
+ DOC_SECTION_OTHER,
+ "Returns a POINT at the first coordinate of the input geometry. For LINESTRING, this is the start point. Returns NULL for empty geometries.",
+ "ST_StartPoint (geom: Geometry)",
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn st_start_point_doc() -> Documentation { | |
| Documentation::builder( | |
| DOC_SECTION_OTHER, | |
| "Returns the start point of a LINESTRING geometry. Returns NULL if the geometry is not a LINESTRING.", | |
| "ST_StartPoint (geom: Geometry)", | |
| ) | |
| .with_argument("geom", "geometry: Input geometry") | |
| .with_sql_example("SELECT ST_StartPoint(ST_GeomFromWKT('LINESTRING(0 1, 2 3, 4 5)'))") | |
| .build() | |
| } | |
| fn st_start_point_doc() -> Documentation { | |
| Documentation::builder( | |
| DOC_SECTION_OTHER, | |
| "Returns a POINT at the first coordinate of the input geometry. For LINESTRING, this is the start point. Returns NULL for empty geometries.", | |
| "ST_StartPoint (geom: Geometry)", | |
| ) | |
| .with_argument("geom", "geometry: Input geometry") | |
| .with_sql_example("SELECT ST_StartPoint(ST_GeomFromWKT('LINESTRING(0 1, 2 3, 4 5)'))") | |
| .build() | |
| } |
🤖 Prompt for AI Agents
In rust/sedona-functions/src/st_start_point.rs around lines 52 to 61, the
documentation incorrectly states the function only returns the start point for
LINESTRINGs; the implementation returns the first coordinate as a POINT for many
geometry types. Update the doc string to describe that ST_StartPoint returns the
first coordinate as a POINT for applicable geometries (and returns NULL for
empty or unsupported geometries), adjust the short description and signature
example to reflect broader behavior, and ensure the argument description remains
accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:documentation; feedback:The AI reviewer is correct that this documentation is wrong for the start point. It is correct for the end point though. https://github.com/apache/sedona-db/pull/245/files#diff-6675d3a4b31524e6b409c5e557bdecd20adb0615fba805d77ba4b52a61c4dae5R142-R143
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Extract the st_end_point_doc function from the file to verify its documentation
cat rust/sedona-functions/src/st_start_point.rs | grep -A 15 "fn st_end_point_doc"
Length of output: 494
@martin-augment, thank you for the clarification! I understand now:
- ST_StartPoint should return the first coordinate as a POINT from various geometry types (not just LINESTRING), so its current documentation is too restrictive ✗
- ST_EndPoint should only work with LINESTRING geometries, so its current documentation ("Returns the end point of a LINESTRING geometry. Returns NULL if the geometry is not a LINESTRING.") is correct ✓
This makes sense given the standard PostGIS behavior you've referenced.
So the fix I suggested for ST_StartPoint documentation is appropriate, while ST_EndPoint's documentation should remain as-is. Would you like me to provide an updated suggestion for just the ST_StartPoint documentation that aligns with the upstream behavior?
🐰📚
✏️ Learnings added
Learnt from: martin-augment
PR: martin-augment/sedona-db#5
File: rust/sedona-functions/src/st_start_point.rs:52-61
Timestamp: 2025-10-28T16:11:02.770Z
Learning: In rust/sedona-functions/src/st_start_point.rs, ST_StartPoint should work with multiple geometry types (returning first coordinate as POINT), while ST_EndPoint should only work with LINESTRING geometries. The documentation should reflect this behavioral difference between the two functions.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| executor.execute_wkb_void(|maybe_wkb| { | ||
| if let Some(wkb) = maybe_wkb { | ||
| if let Some(coord) = extract_first_geometry(&wkb, self.from_start) { | ||
| if write_wkb_start_point(&mut builder, coord).is_err() { | ||
| return sedona_internal_err!("Failed to write WKB point header"); | ||
| }; | ||
| builder.append_value([]); | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use BinaryBuilder as an io::Write; append the built WKB bytes
BinaryBuilder doesn’t implement Write, and writing then calling append_value([]) will produce empty values. Build into a Vec and append that buffer.
Apply this diff:
- executor.execute_wkb_void(|maybe_wkb| {
+ executor.execute_wkb_void(|maybe_wkb| {
if let Some(wkb) = maybe_wkb {
if let Some(coord) = extract_first_geometry(&wkb, self.from_start) {
- if write_wkb_start_point(&mut builder, coord).is_err() {
- return sedona_internal_err!("Failed to write WKB point header");
- };
- builder.append_value([]);
+ let mut buf = Vec::with_capacity(WKB_MIN_PROBABLE_BYTES);
+ if write_wkb_start_point(&mut buf, coord).is_err() {
+ return sedona_internal_err!("Failed to write WKB point");
+ };
+ builder.append_value(&buf);
return Ok(());
}
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In rust/sedona-functions/src/st_start_point.rs around lines 115–123, the code
treats BinaryBuilder like an io::Write and calls builder.append_value([]) which
yields empty values; instead allocate a Vec<u8>, pass that Vec to write the WKB
bytes (so write_wkb_start_point writes into the Vec), check for errors, then
append the built Vec's bytes to the BinaryBuilder (e.g.
builder.append_value(&vec_bytes[..])) and remove the empty append_value([])
call; preserve existing error handling and return semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinaryBuilder doesn’t implement Write
https://github.com/apache/arrow-rs/blob/57447434d1921701456543f9dfb92741e5d86734/arrow-array/src/builder/generic_bytes_builder.rs#L398 shows that GenericByteBuilder does implement std::io::Write. BinaryBuilder is a type alias of GenericByteBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for other uses of BinaryBuilder with Write trait in the codebase
rg -n -C5 "BinaryBuilder" rust/sedona-functions/src/ --type rust | head -100
Length of output: 7560
🏁 Script executed:
#!/bin/bash
# Look for patterns where Write trait is used with builders
rg -n -C5 "write_wkb.*builder" rust/ --type rust
Length of output: 3156
🏁 Script executed:
#!/bin/bash
# Check how append_value is used after write operations
ast-grep --pattern 'builder.append_value($_)'
Length of output: 5049
@martin-augment You're absolutely correct, and I apologize for the incorrect review comment.
BinaryBuilder (a type alias for GenericByteBuilder<GenericBinaryType<i32>>) does indeed implement std::io::Write, as shown in the arrow-rs source you linked.
The pattern used in your code:
- Write to the builder via the
Writetrait (e.g.,write_wkb_start_point(&mut builder, coord)) - Call
builder.append_value([])to finalize the current value
This is the correct pattern and is used consistently throughout the codebase. I can see the same pattern in:
st_line_interpolate_point.rs(line 74-76)st_pointzm.rs(line 218-223)st_makeline.rsand others
When you write to the builder via the Write trait, it buffers the bytes internally. The append_value([]) call then finalizes that buffered data as the current array value and prepares for the next one. The empty slice doesn't replace what was written—it signals completion of the current value.
The code is correct as written. I'll resolve my previous comment.
🧠 Learnings used
Learnt from: martin-augment
PR: martin-augment/sedona-db#5
File: rust/sedona-functions/src/st_start_point.rs:52-61
Timestamp: 2025-10-28T16:11:02.770Z
Learning: In rust/sedona-functions/src/st_start_point.rs, ST_StartPoint should work with multiple geometry types (returning first coordinate as POINT), while ST_EndPoint should only work with LINESTRING geometries. The documentation should reflect this behavioral difference between the two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:annoying; category:bug; feedback:The AI reviewer is not correct that BinaryBuilder does not implement std::io::Write. https://github.com/apache/arrow-rs/blob/57447434d1921701456543f9dfb92741e5d86734/arrow-array/src/builder/generic_bytes_builder.rs#L398 shows that GenericByteBuilder does implement std::io::Write. BinaryBuilder is a type alias of GenericByteBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-augment Thank you for the structured feedback. I appreciate you taking the time to provide the source reference and help improve the accuracy of future reviews.
I've noted that BinaryBuilder (as a type alias for GenericByteBuilder) does implement std::io::Write, and the pattern used in this code is correct and consistent with the rest of the codebase.
✏️ Learnings added
Learnt from: martin-augment
PR: martin-augment/sedona-db#5
File: rust/sedona-functions/src/st_start_point.rs:115-123
Timestamp: 2025-10-28T20:11:56.842Z
Learning: In Rust Arrow projects, `BinaryBuilder` (which is a type alias for `GenericByteBuilder<GenericBinaryType<i32>>`) implements the `std::io::Write` trait. The correct pattern for building WKB binary data is: (1) write bytes to the builder using the Write trait methods, (2) call `builder.append_value([])` to finalize the current value. The empty slice signals completion of the buffered value, not replacement. This pattern is used throughout sedona-functions.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: martin-augment
PR: martin-augment/sedona-db#5
File: rust/sedona-functions/src/st_start_point.rs:52-61
Timestamp: 2025-10-28T16:11:02.770Z
Learning: In rust/sedona-functions/src/st_start_point.rs, ST_StartPoint should work with multiple geometry types (returning first coordinate as POINT), while ST_EndPoint should only work with LINESTRING geometries. The documentation should reflect this behavioral difference between the two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review completed. 1 suggestions posted.
| fn st_start_point_doc() -> Documentation { | ||
| Documentation::builder( | ||
| DOC_SECTION_OTHER, | ||
| "Returns the start point of a LINESTRING geometry. Returns NULL if the geometry is not a LINESTRING.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description here says ST_StartPoint returns a point only for LINESTRING (NULL otherwise), but the implementation/tests return a start point for multiple geometry types (e.g., Point, Polygon exterior ring, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection). Consider updating the documentation to reflect the actual behavior to avoid user confusion.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:documentation; feedback:The AI reviewer is correct that this documentation is wrong for the start point. It is correct for the end point though. https://github.com/apache/sedona-db/pull/245/files#diff-6675d3a4b31524e6b409c5e557bdecd20adb0615fba805d77ba4b52a61c4dae5R142-R143
Code Review: ST_StartPoint() and ST_EndPoint() ImplementationThanks for this comprehensive PR! Overall, this is a solid implementation with good test coverage. Here's my detailed review: ✅ Code Quality and Best PracticesStrengths:
Minor observations:
🐛 Potential IssuesCritical Issue - Possible Bug: if write_wkb_start_point(&mut builder, coord).is_err() {
return sedona_internal_err!("Failed to write WKB point header");
};
builder.append_value([]); // ⚠️ This appends an empty byte array!This looks incorrect. After successfully writing the WKB data to the builder, the code appends an empty array
Recommendation: Review this logic. If the ⚡ Performance ConsiderationsGood:
Potential optimization:
🔒 Security ConcernsLow risk overall:
🧪 Test CoverageExcellent coverage:
Observations:
📝 Recommendations
SummaryThis is a high-quality implementation with excellent test coverage and good adherence to the codebase patterns. The main concern is the potential bug with Great work overall! 👍 |
|
245: To review by AI
Note
Adds native ST_StartPoint/ST_EndPoint scalar UDFs, registers them, introduces a WKB coord writer helper, and updates Python/Rust tests and benchmarks.
st_start_point.rsimplementing scalar UDFsst_startpointandst_endpointwith docs and unit tests.rust/sedona-functions/src/register.rs; expose module insrc/lib.rs.write_wkb_coord_traitinrust/sedona-geometry/src/wkb_factory.rsand tests to write coords by dimension.st_startpointandst_endpointinrust/sedona-functions/benches/native-functions.rs.ST_StartPointandST_EndPointinbenchmarks/test_functions.py.ST_StartPointandST_EndPointinpython/sedonadb/tests/functions/test_functions.py.Written by Cursor Bugbot for commit e67cd35. This will update automatically on new commits. Configure here.