@james-willis
Collaborator

@james-willis james-willis commented Jul 9, 2025

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [GH-2074] my subject. Closes #2074

What changes were proposed in this PR?

This PR adds ExpandAddress and ParseAddress functions backed by libpostal, including support for code generation.

How was this patch tested?

Unit Tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@james-willis
Collaborator Author

james-willis commented Jul 10, 2025

@Imbruced Any idea what might be causing the test failure in this new version I added to the test matrix?

spark 3.5.4, java 17, scala 2.12.8

Edit: I see the cause: we only set Spark home when the Spark version is exactly 3.5.0 or 4.0.0. I will update this.
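
A rough sketch of the kind of check that avoids this class of failure: compare on the major.minor release line rather than on an exact version string, so a new patch release such as 3.5.4 is still covered. The function name and the list of supported lines are hypothetical and are not taken from the workflow file; the fix that actually landed in the CI configuration may look different.

```python
def needs_spark_home(spark_version, supported_lines=("3.5", "4.0")):
    """Return True when the version belongs to a supported major.minor line.

    Hypothetical helper: `supported_lines` is an assumption for illustration,
    not the condition used by the real CI workflow.
    """
    # Keep only the first two components, e.g. "3.5.4" -> "3.5".
    major_minor = ".".join(spark_version.split(".")[:2])
    return major_minor in supported_lines


if __name__ == "__main__":
    for version in ("3.5.0", "3.5.4", "4.0.0", "3.4.1"):
        print(version, needs_spark_home(version))
```

An exact-match check would have returned False for 3.5.4, which matches the symptom described above.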

@james-willis james-willis marked this pull request as ready for review July 10, 2025 21:19
@james-willis james-willis requested a review from jiayuasu as a code owner July 10, 2025 21:19
@jiayuasu
Member

Jpostal requires Java 17? Sedona is compiled against Java 11. Will this even pass our Java tests in CI?

@jiayuasu
Member

@james-willis Since the jpostal under Wherobots is already a fork, why not just compile it against Java 11? What specific Java 16+ features were used in the JNI binding, and can we drop them?

@james-willis
Collaborator Author

It actually turned out to be really easy to support Java 11: wherobots/jpostal#13

@github-actions github-actions bot added the root label Jul 11, 2025
@github-actions github-actions bot removed the root label Jul 11, 2025
@jiayuasu
Member

Please also remove all show and print commands.

@james-willis james-willis requested a review from jiayuasu July 11, 2025 23:25
Contributor

Copilot AI left a comment

Pull Request Overview

This PR integrates libpostal to provide address normalization and parsing via two new functions, ExpandAddress and ParseAddress, complete with codegen support, configuration, and documentation.

  • Introduce new SQL and DataFrame API functions backed by libpostal (ExpandAddress, ParseAddress).
  • Add helper LibPostalUtils and extend SedonaConf to configure data directory and useSenzing options.
  • Provide comprehensive Scala and Python tests, update documentation, add jpostal dependency, and adjust CI matrix.
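
For readers unfamiliar with libpostal, the following is a toy sketch in plain Python of the two operations named above: expansion returns a list of normalized variants of an address, and parsing returns labeled components. It does not call libpostal, jpostal, or Sedona. The abbreviation table, the parsing rules, the function names, and the (label, value) output shape are all assumptions made for illustration; libpostal itself relies on trained models and large dictionaries rather than rules like these.

```python
from itertools import product

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "st": ["street", "saint"],
    "ave": ["avenue"],
    "ny": ["new york"],
}


def _tokens(address):
    """Lowercase the address and split it on whitespace and commas."""
    return address.lower().replace(",", " ").split()


def expand_address(address):
    """Return every variant produced by expanding known abbreviations."""
    if address is None:
        return None  # null in, null out
    options = [ABBREVIATIONS.get(tok, [tok]) for tok in _tokens(address)]
    return [" ".join(choice) for choice in product(*options)]


def parse_address(address):
    """Return (label, value) pairs using a few naive positional rules."""
    if address is None:
        return None  # null in, null out
    tokens = _tokens(address)
    parts = []
    # Rule 1: a leading all-digit token is treated as the house number.
    if tokens and tokens[0].isdigit():
        parts.append(("house_number", tokens.pop(0)))
    # Rule 2: a trailing five-digit token is treated as the postcode.
    postcode = None
    if tokens and tokens[-1].isdigit() and len(tokens[-1]) == 5:
        postcode = tokens.pop()
    # Rule 3: whatever remains is treated as the road.
    if tokens:
        parts.append(("road", " ".join(tokens)))
    if postcode is not None:
        parts.append(("postcode", postcode))
    return parts


if __name__ == "__main__":
    print(expand_address("12 Main St"))
    print(parse_address("12 Main St 10001"))
```

With this toy table, "12 Main St" expands to two variants because "st" is ambiguous between "street" and "saint", which is the kind of ambiguity an expansion function has to surface rather than resolve.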

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| spark/common/src/test/scala/org/apache/sedona/sql/GeoStatsSuite.scala | Disable adaptive SQL execution in tests |
| spark/common/src/test/scala/org/apache/sedona/sql/AddressProcessingFunctionsTest.scala | Add unit tests for ExpandAddress and ParseAddress |
| spark/common/src/test/java/org/apache/sedona/core/utils/SedonaConfTest.java | Cover null input for bytesFromString |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_functions.scala | Register ExpandAddress and ParseAddress in SQL functions |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/LibPostalUtils.scala | Implement utility methods for jpostal config |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala | Define expression classes with eval and codegen for the new functions |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Add new functions to the UDF catalog |
| spark/common/src/main/java/org/apache/sedona/core/utils/SedonaConf.java | Extend configuration to include libpostal settings |
| spark/common/pom.xml | Add com.wherobots:jpostal dependency |
| python/tests/sql/test_dataframe_api.py | Add DataFrame API plan tests for the new functions |
| python/sedona/spark/sql/st_functions.py | Add Python wrappers for ExpandAddress and ParseAddress |
| docs/api/sql/Function.md | Document the new SQL functions |
| .github/workflows/java.yml | Expand CI Spark version matrix and simplify Python setup |
james-willis and others added 5 commits July 14, 2025 13:42
…ssingFunctionsTest.scala

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ConfTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ssingFunctionsTest.scala

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jiayuasu jiayuasu added this to the sedona-1.8.0 milestone Jul 15, 2025
@jiayuasu jiayuasu merged commit 178d0f4 into apache:master Jul 15, 2025
35 checks passed
Subham-KRLX pushed a commit to Subham-KRLX/sedona that referenced this pull request Jul 17, 2025
* ExpandAddress and ParseAddress support via libpostal

* PR comments; edge case in SedonaConf changes

* Update spark/common/src/test/scala/org/apache/sedona/sql/AddressProcessingFunctionsTest.scala

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update python/sedona/spark/sql/st_functions.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update spark/common/src/test/java/org/apache/sedona/core/utils/SedonaConfTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update spark/common/src/test/scala/org/apache/sedona/sql/AddressProcessingFunctionsTest.scala

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* align null case logic between eval and codegen cases

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Feature Request: Integrated with libpostal to provide ExpandAddress and ParseAddress functions
