feat: intial hudi reg test #3641
base: main
Conversation
dimas-b
left a comment
Thanks for your contribution, @rahil-c ! I'm just wondering if using shell might be overkill for this test. Specific comment thread below.
"http://${POLARIS_HOST:-localhost}:8181/api/catalog/v1/config?warehouse=${CATALOG_NAME}"
echo
echo "Catalog created"
cat << EOF | ${SPARK_HOME}/bin/spark-sql -S \
Would it be possible to run this test as an integration test under JUnit inside the Gradle build?
Hi @dimas-b. The purpose of this regression test is to validate the end-to-end user experience when using Spark with both --packages and --jars. This is an important scenario that cannot be fully covered by integration tests.
While this test is relatively expensive, that is exactly why it includes only very basic test cases. More complex scenarios and edge cases are covered by integration tests, which provide a more cost-effective approach.
# Define test suites to run
# Each suite specifies: test_file:table_format:test_shortname
declare -a TEST_SUITES=(
  "spark_sql.sh:delta:spark_sql"
How about we enforce a test file naming format like xxx_<table_format>.sh, and keep all test source files and reference files in a separate folder? Then we only need to list the folder to get all test files, and we can extract the table format by parsing each file name. The benefits: it's easy to onboard new tests, and developers don't have to type a long string when running a single test (just the file name).
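A minimal sketch of that suggestion, assuming a naming pattern of <test_shortname>_<table_format>.sh (the file names below are hypothetical, not taken from this PR; a real run would glob a tests/ folder instead of a hard-coded list):

```shell
#!/usr/bin/env bash
# Discover test files and derive the table format from the file name.
# In practice this loop would be: for test_file in tests/*.sh; do ...
for test_file in spark_sql_delta.sh spark_sql_hudi.sh; do
  base="${test_file%.sh}"        # strip the .sh extension
  table_format="${base##*_}"     # text after the last underscore
  test_shortname="${base%_*}"    # text before the last underscore
  echo "${test_shortname}:${table_format}"
done
```

With this layout, running a single test only requires the file name; the table format no longer has to be spelled out in a long suite string.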
# this is mostly useful for building the Docker image with all needed dependencies
${SPARK_HOME}/bin/spark-sql -e "SELECT 1"
if [[ "$TABLE_FORMAT" == "hudi" ]]; then
  # For Hudi: Pass --packages on command line to match official Hudi docs approach
I don't think we need the if/else here anymore.
rm -rf /tmp/spark_hudi_catalog/

curl -i -X DELETE -H "Authorization: Bearer ${SPARK_BEARER_TOKEN}" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs/${CATALOG_NAME} > /dev/stderr
(No newline at end of file)
nit: add a newline at the end of the file.
Summary
cc @gh-yzou @singhpk234 @flyrain
Checklist
CHANGELOG.md (if needed)
site/content/in-dev/unreleased (if needed)