Conversation

@rahil-c rahil-c commented Feb 2, 2026

Summary

  • Added an initial regression test for the Polaris Hudi integration, following the existing pattern set by the Delta regression test.
  • Updated run.sh and setup.sh so that the Spark session is started correctly depending on the table format.
  • Ran both the Delta and Hudi regression tests locally to ensure they pass.

cc @gh-yzou @singhpk234 @flyrain

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@dimas-b dimas-b left a comment
Thanks for your contribution, @rahil-c! I'm just wondering if using shell might be overkill for this test. Specific comment thread below.

"http://${POLARIS_HOST:-localhost}:8181/api/catalog/v1/config?warehouse=${CATALOG_NAME}"
echo
echo "Catalog created"
cat << EOF | ${SPARK_HOME}/bin/spark-sql -S \
Contributor

Would it be possible to run this test as an integration test under JUnit inside the Gradle build?

Contributor Author

@rahil-c rahil-c Feb 9, 2026

Hi @dimas-b, we currently have the following integration test for Polaris Hudi here: #3194

In terms of the regression test, I was following the shell pattern that @gh-yzou established for Delta, now applied to Hudi.

Contributor

Hi @dimas-b, the purpose of this regression test is to validate the end-to-end user experience when using Spark with both --packages and --jars. This is an important scenario that cannot be fully covered by integration tests.
While this test is relatively expensive to run, that is exactly why it includes only very basic test cases. More complex scenarios and edge cases are covered by integration tests, which are a more cost-effective place for them.

# Define test suites to run
# Each suite specifies: test_file:table_format:test_shortname
declare -a TEST_SUITES=(
"spark_sql.sh:delta:spark_sql"
Contributor

How about enforcing a test file name format like xxx_<table_format>.sh, and keeping all test source files and reference files in a separate folder? Then we just need to list the folder to get all test files, and extract the table format by parsing each file name. The benefits: it's easy to onboard new tests, and developers don't have to type a long string when running a single test (just the file name).
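The convention suggested above could be sketched roughly as follows; the directory name and test file names here are made-up examples for illustration, not files from this PR:

```shell
#!/usr/bin/env bash
# Sketch of the proposed convention: test files are named
# <test_shortname>_<table_format>.sh and live in one folder, so the
# runner lists the folder and parses the table format from each name.
# TEST_DIR and the touched file names are hypothetical.
set -euo pipefail

TEST_DIR="$(mktemp -d)"
touch "${TEST_DIR}/spark_sql_delta.sh" "${TEST_DIR}/spark_sql_hudi.sh"

for test_file in "${TEST_DIR}"/*.sh; do
  base="$(basename "${test_file}" .sh)"  # e.g. spark_sql_delta
  table_format="${base##*_}"             # text after the last '_' -> delta
  test_shortname="${base%_*}"            # text before the last '_' -> spark_sql
  echo "${test_shortname}:${table_format}"
done

rm -rf "${TEST_DIR}"
```

With this layout, running a single test would only require the file name, and the format-specific setup in run.sh could key off the parsed table format instead of a hand-maintained suite list.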

# this is mostly useful for building the Docker image with all needed dependencies
${SPARK_HOME}/bin/spark-sql -e "SELECT 1"
if [[ "$TABLE_FORMAT" == "hudi" ]]; then
# For Hudi: Pass --packages on command line to match official Hudi docs approach
Contributor

I don't think we need the if/else here anymore.

rm -rf /tmp/spark_hudi_catalog/

curl -i -X DELETE -H "Authorization: Bearer ${SPARK_BEARER_TOKEN}" -H 'Accept: application/json' -H 'Content-Type: application/json' \
http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs/${CATALOG_NAME} > /dev/stderr
\ No newline at end of file
Contributor

new line

