-
Notifications
You must be signed in to change notification settings - Fork 0
2686: feat: Add bash script to build and run fuzz testing #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe change adds a new Bash script that orchestrates building and running fuzz tests for Apache Comet with Spark. It handles Maven-based version and artifact retrieval, locates required JARs, sets environment defaults, and executes build steps followed by three separate Spark jobs. Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| make -f "${PARENT_DIR}/Makefile" | ||
| else | ||
| echo "Building Fuzz testing jar..." | ||
| mvn package -DskipTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Fuzz Testing Builds Too Much
The mvn package -DskipTests command runs from the current working directory instead of the fuzz-testing directory. When users invoke the script from the parent directory (e.g., ./fuzz-testing/run.sh), this builds all Maven modules rather than only the fuzz testing jar, contradicting both the else-branch logic and the "Building Fuzz testing jar..." message. The command needs -f "${DIR}/pom.xml" to build only the fuzz module regardless of invocation location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct that the mvn command needs -f ... argument to specify which Maven module to build. Otherwise it will build whatever the current module is. Also the reviewer suggests to wrap $DIR in double quotes - this is also a good suggestion because $DIR may contain space characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review completed. 3 suggestions posted. Focused on ensuring the script reliably builds required artifacts and behaves consistently regardless of the caller’s working directory.
Comment augment review to trigger a new review at any time.
| NUM_ROWS="${NUM_ROWS:-200}" | ||
| NUM_QUERIES="${NUM_QUERIES:-500}" | ||
|
|
||
| if [ ! -f "${COMET_SPARK_JAR}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conditional only builds the fuzz-testing jar when the Comet Spark jar already exists; on a clean run, COMET_FUZZ_JAR may not be built before it’s used below, causing spark-submit to fail. Consider building the fuzz jar regardless (e.g., always after ensuring the Spark jar is built).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Augment AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes.
| DIR="$(cd "$(dirname "$0")" && pwd)" | ||
| PARENT_DIR="${DIR}/.." | ||
| SPARK_MASTER="${SPARK_MASTER:-local[*]}" | ||
| PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROJECT_VERSION is obtained via mvn without specifying a POM (-f), unlike the other Maven evaluations; this makes the result depend on the current working directory. Consider pinning the POM path so the script is robust when run from any location.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Augment AI reviewer is correct that the mvn command may read the version of some other Maven module when the script is executed from a wrong folder. Prevents calculating wrong version and a failure to find the fuzz testing jar later when including it in the classpath.
| make -f "${PARENT_DIR}/Makefile" | ||
| else | ||
| echo "Building Fuzz testing jar..." | ||
| mvn package -DskipTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, mvn package is invoked without -f, so it depends on the caller’s CWD and may not build the intended fuzz module. Consider specifying the module POM explicitly to avoid accidental builds or failures.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Augment AI reviewer is correct that without using the -f ... argument the mvn command may build the wrong jar when the script is executed from a different folder than fuzz-testing. It would be better to use -f absolutePath/pom.xml to prevent that.
Code ReviewThank you for contributing this automation script for fuzz testing! This is a valuable addition that simplifies the workflow. Here's my detailed review: ✅ Strengths
🐛 Potential Bugs & Issues1. Build Logic Error (line 34-40): The logic is inverted. If the Comet Spark jar does not exist, it builds it but then skips building the fuzz testing jar (which is always needed). If the Comet jar exists, it only builds the fuzz jar but does not check if it exists. Recommendation: Always build the fuzz testing jar. The condition should only determine whether to build the Comet jar first. 2. Missing SPARK_HOME Validation: The script uses SPARK_HOME without checking if it is set. If undefined, the error message will be cryptic. Recommendation: Add validation at the top to check if SPARK_HOME is set and exit with a clear error message if not. 3. Maven Command Scope Issue (line 26): Maven runs from the current directory but references a different POM. If dependencies are not resolved, this could fail. 🔒 Security Concerns
⚡ Performance Considerations
🧪 Test Coverage
📝 Code Quality & Best Practices
🎯 Recommendations SummaryCritical (should fix before merge):
High Priority: Nice to Have: Overall, this is a useful addition to the project. The main concern is the build logic bug which should be fixed before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
fuzz-testing/run.sh (2)
44-44: Add explicit validation forSPARK_HOMEenvironment variable.While
set -uwill cause the script to exit ifSPARK_HOMEis undefined, the resulting error message will be cryptic (e.g., "SPARK_HOME: unbound variable"). For better user experience, add an explicit check early in the script to provide a clear error message guiding users to set this environment variable.Example:
if [ -z "${SPARK_HOME:-}" ]; then echo "Error: SPARK_HOME environment variable is not set. Please set it to your Spark installation directory." >&2 exit 1 fiBased on learnings: In this project, explicit validation of
SPARK_HOMEis optional (per prior review feedback), but adding it would improve UX without downside.Also applies to: 53-53, 60-60
21-21: Document the purpose ofset -euxif not self-evident to readers.The
set -euxenables strict error handling (-e: exit on error,-u: error on undefined variables) and debug output (-x: print commands). While robust, the-xflag produces verbose output. Consider documenting whether this is intentional for debugging or if it should be conditional (e.g., only enabled whenDEBUG=1).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fuzz-testing/run.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
📚 Learning: 2025-11-05T14:57:33.434Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
Applied to files:
fuzz-testing/run.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: codex
| PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)" | ||
| COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | ||
| COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar" | ||
| COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | ||
| COMET_FUZZ_JAR="${DIR}/target/${COMET_FUZZ_ARTIFACT_ID}-${PROJECT_VERSION}-jar-with-dependencies.jar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate that Maven commands succeed before using their outputs.
Lines 26–29 execute Maven commands to retrieve version and artifact IDs but don't validate success. If Maven fails, variables will contain empty strings, causing confusing failures downstream (e.g., spark-submit called with malformed JAR paths). Add validation after each Maven invocation or consolidate into a function with error handling.
Example with explicit error checking:
PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)" || {
echo "Error: Failed to retrieve project version from Maven" >&2
exit 1
}🤖 Prompt for AI Agents
In fuzz-testing/run.sh around lines 26 to 30, the script runs several mvn
commands to populate PROJECT_VERSION, COMET_SPARK_ARTIFACT_ID and
COMET_FUZZ_ARTIFACT_ID but never checks for failure; update the script to
validate each mvn invocation (or create a small helper function) so that if mvn
returns a non‑zero status or produces an empty output you print a clear stderr
message identifying which value failed and exit with a non‑zero code before
using the variables; ensure the check covers both command failure and empty
result and use the same pattern for all three mvn calls so downstream jar path
construction never proceeds with empty values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:valid-but-wont-fix; category:bug; feedback: The CodeRabbit AI reviewer is correct that the Maven command may fail for any reason. But set -e will lead to script execution failure and the error message by Maven will be better than the one in the suggestion because it will give more details why the version cannot be read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-augment, thank you for the clarification! You're absolutely right—with set -e enabled as part of set -eux, the script will automatically exit if any Maven command fails, and Maven's native error output will provide detailed diagnostics about why the command failed. This is more informative than a generic custom error message would be. Your approach makes sense for this use case.
I've noted this preference for future reviews.
✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
| COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | ||
| COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar" | ||
| COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote Maven file path arguments for robustness.
Lines 27 and 29 pass unquoted ${PARENT_DIR} and ${DIR} variables to Maven's -f flag. If paths contain spaces or special characters, the command will break. Wrap file paths in quotes to ensure proper argument parsing.
Apply this diff:
-COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
+COMET_SPARK_ARTIFACT_ID="$(mvn -f "${PARENT_DIR}/spark/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
-COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
+COMET_FUZZ_ARTIFACT_ID="$(mvn -f "${DIR}/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | |
| COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar" | |
| COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | |
| COMET_SPARK_ARTIFACT_ID="$(mvn -f "${PARENT_DIR}/spark/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" | |
| COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar" | |
| COMET_FUZZ_ARTIFACT_ID="$(mvn -f "${DIR}/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)" |
🤖 Prompt for AI Agents
In fuzz-testing/run.sh around lines 27 to 29, the mvn -f arguments use unquoted
path variables (${PARENT_DIR} and ${DIR}) which will break if the paths contain
spaces or special characters; update the commands to wrap the file paths in
quotes (e.g., -f "${PARENT_DIR}/spark/pom.xml" and -f "${DIR}/pom.xml") so the
full path is passed as a single argument to Maven.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that directory names could contain space characters. Wrapping the variables in quotes would prevent failed resolutions of absolute paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-augment, thank you for confirming! I'm glad the suggestion helps prevent potential path resolution issues. The quoted variables will ensure the script works correctly even when directory paths contain spaces or special characters.
🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
| if [ ! -f "${COMET_SPARK_JAR}" ]; then | ||
| echo "Building Comet Spark jar..." | ||
| make -f "${PARENT_DIR}/Makefile" | ||
| else | ||
| echo "Building Fuzz testing jar..." | ||
| mvn package -DskipTests | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate build artifacts exist after compilation.
Line 35 checks if COMET_SPARK_JAR exists before deciding which build to run; however, after invoking make (line 37) or mvn package (line 40), the script doesn't verify that the expected JAR files were actually produced. A failed build (e.g., compilation errors, missing dependencies) may not be caught, leading to misleading failures in subsequent spark-submit invocations.
Add validation after each build step:
make -f "${PARENT_DIR}/Makefile"
if [ ! -f "${COMET_SPARK_JAR}" ]; then
echo "Error: Failed to build Comet Spark JAR at ${COMET_SPARK_JAR}" >&2
exit 1
fi🤖 Prompt for AI Agents
fuzz-testing/run.sh around lines 35 to 41: after running the build commands you
must validate that the expected JAR artifact(s) were actually produced; add a
post-build check after the make invocation to verify COMET_SPARK_JAR exists and
if not print an error to stderr and exit 1, and likewise add a post-build check
after the mvn package invocation to verify the fuzz testing JAR (the artifact
path/variable used later in the script) exists and if not print an error to
stderr and exit 1 so subsequent spark-submit calls never run against missing
artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-augment, thank you for confirming the issue! You make an excellent point about stale artifacts—even when the JAR exists, it could be outdated from a previous build. Always rebuilding the fuzz testing JAR ensures you're running tests against the current codebase, which is especially important in a development workflow where code changes frequently.
The validation checks after each build command will catch any compilation failures early, and the consistent rebuild strategy avoids the subtle bugs that come from running outdated test code. This makes the script more reliable and predictable.
✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:35-41
Timestamp: 2025-11-10T09:21:54.836Z
Learning: In the DataFusion Comet project's fuzz-testing/run.sh script, the fuzz testing JAR should always be built (not conditionally checked for existence) because even if it exists, it might be an outdated version from a previous build. This prevents runtime errors from executing stale test code.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
|
value:useful; category:bug; feedback: The Claude AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes. |
value:incorrect-but-reasonable; category:bug; feedback: The Claude AI reviewer is not correct. The script uses |
value:useful; category:bug; feedback: The Claude AI reviewer is correct that without using the -f ... argument the mvn command may build the wrong jar when the script is executed from a different folder than fuzz-testing. It would be better to use -f absolutePath/pom.xml to prevent that. |
value:useful; category:bug; feedback: The Codex AI reviewer is correct that the current working directory has to be changed before executing the Make command because its commands will be executed in the wrong folder. Prevents building the wrong artifacts when the current working directory is wrong. |
value:useful; category:bug; feedback: The Codex AI reviewer is correct that the run.sh script now requires haveing Maven installed in the system. Using |
value:annoying; category:bug; feedback:The Codex AI reviewer is wrong! These files are not removed in this Pull Request! |
2686: To review by AI
Note
Add a bash script to build required jars, generate data/queries, and run fuzz tests via spark-submit with Comet configs.
fuzz-testing/run.shorg.apache.comet.fuzz.Mainusingspark-submit.SPARK_MASTER,NUM_FILES,NUM_ROWS,NUM_QUERIES.Written by Cursor Bugbot for commit a20c848. This will update automatically on new commits. Configure here.