Skip to content

Conversation

@martin-augment
Copy link
Owner

@martin-augment martin-augment commented Nov 10, 2025

2686: To review by AI


Note

Add a bash script to build required jars, generate data/queries, and run fuzz tests via spark-submit with Comet configs.

  • New script: fuzz-testing/run.sh
    • Builds Comet Spark jar if missing; packages fuzz testing jar.
    • Generates test data and queries via org.apache.comet.fuzz.Main using spark-submit.
    • Executes fuzz test run with Comet plugin and shuffle configs, adding the built jar to driver/executor classpaths.
    • Supports overrides via env vars: SPARK_MASTER, NUM_FILES, NUM_ROWS, NUM_QUERIES.

Written by Cursor Bugbot for commit a20c848. This will update automatically on new commits. Configure here.

@coderabbitai
Copy link

coderabbitai bot commented Nov 10, 2025

Walkthrough

The change adds a new Bash script that orchestrates building and running fuzz tests for Apache Comet with Spark. It handles Maven-based version and artifact retrieval, locates required JARs, sets environment defaults, and executes build steps followed by three separate Spark jobs.

Changes

Cohort / File(s) Summary
Fuzz Testing Script
fuzz-testing/run.sh
New Bash script orchestrating Apache Comet fuzz test builds and execution. Determines project version and artifact IDs via Maven, locates Comet Spark and fuzz JARs, sets defaults for NUM_FILES, NUM_ROWS, and NUM_QUERIES, builds JARs conditionally, and runs three Spark jobs for data generation, query generation, and fuzz testing with off-heap memory and Comet plugin configuration. Enables strict shell options for error handling.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-2686-2025-11-10-07-35-17

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

make -f "${PARENT_DIR}/Makefile"
else
echo "Building Fuzz testing jar..."
mvn package -DskipTests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Fuzz Testing Builds Too Much

The mvn package -DskipTests command runs from the current working directory instead of the fuzz-testing directory. When users invoke the script from the parent directory (e.g., ./fuzz-testing/run.sh), this builds all Maven modules rather than only the fuzz testing jar, contradicting both the else-branch logic and the "Building Fuzz testing jar..." message. The command needs -f "${DIR}/pom.xml" to build only the fuzz module regardless of invocation location.

Fix in Cursor Fix in Web

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Bugbot AI reviewer is correct that the mvn command needs -f ... argument to specify which Maven module to build. Otherwise it will build whatever the current module is. Also the reviewer suggests to wrap $DIR in double quotes - this is also a good suggestion because $DIR may contain space characters.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted. Focused on ensuring the script reliably builds required artifacts and behaves consistently regardless of the caller’s working directory.

Comment augment review to trigger a new review at any time.

NUM_ROWS="${NUM_ROWS:-200}"
NUM_QUERIES="${NUM_QUERIES:-500}"

if [ ! -f "${COMET_SPARK_JAR}" ]; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional only builds the fuzz-testing jar when the Comet Spark jar already exists; on a clean run, COMET_FUZZ_JAR may not be built before it’s used below, causing spark-submit to fail. Consider building the fuzz jar regardless (e.g., always after ensuring the Spark jar is built).

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Augment AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes.

DIR="$(cd "$(dirname "$0")" && pwd)"
PARENT_DIR="${DIR}/.."
SPARK_MASTER="${SPARK_MASTER:-local[*]}"
PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROJECT_VERSION is obtained via mvn without specifying a POM (-f), unlike the other Maven evaluations; this makes the result depend on the current working directory. Consider pinning the POM path so the script is robust when run from any location.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Augment AI reviewer is correct that the mvn command may read the version of some other Maven module when the script is executed from a wrong folder. Prevents calculating wrong version and a failure to find the fuzz testing jar later when including it in the classpath.

make -f "${PARENT_DIR}/Makefile"
else
echo "Building Fuzz testing jar..."
mvn package -DskipTests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, mvn package is invoked without -f, so it depends on the caller’s CWD and may not build the intended fuzz module. Consider specifying the module POM explicitly to avoid accidental builds or failures.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Augment AI reviewer is correct that without using the -f ... argument the mvn command may build the wrong jar when the script is executed from a different folder than fuzz-testing. It would be better to use -f absolutePath/pom.xml to prevent that.

@claude
Copy link

claude bot commented Nov 10, 2025

Code Review

Thank you for contributing this automation script for fuzz testing! This is a valuable addition that simplifies the workflow. Here's my detailed review:

✅ Strengths

  1. Good Licensing: Proper Apache 2.0 license header is included, consistent with the repository standards
  2. Useful Abstraction: Consolidates the three-step process (data generation, query generation, execution) from the README into a single script
  3. Environment Variable Support: Allows customization via SPARK_MASTER, NUM_FILES, NUM_ROWS, and NUM_QUERIES
  4. Executable Permissions: Correctly set as executable

🐛 Potential Bugs & Issues

1. Build Logic Error (line 34-40):

The logic is inverted. If the Comet Spark jar does not exist, it builds it but then skips building the fuzz testing jar (which is always needed). If the Comet jar exists, it only builds the fuzz jar but does not check if it exists.

Recommendation: Always build the fuzz testing jar. The condition should only determine whether to build the Comet jar first.

2. Missing SPARK_HOME Validation:

The script uses SPARK_HOME without checking if it is set. If undefined, the error message will be cryptic.

Recommendation: Add validation at the top to check if SPARK_HOME is set and exit with a clear error message if not.

3. Maven Command Scope Issue (line 26):

Maven runs from the current directory but references a different POM. If dependencies are not resolved, this could fail.

🔒 Security Concerns

  1. Path Quoting: Variable expansions in Maven commands (lines 26-29) should be quoted to prevent word splitting and glob expansion issues
  2. set -u Missing: Consider adding set -u or set -o pipefail to catch undefined variables and pipeline failures

⚡ Performance Considerations

  1. Sequential Execution: The three spark-submit commands run sequentially, which is correct for this workflow. No changes needed.
  2. Redundant Maven Execution: Lines 26-29 call Maven 4 times just to get metadata. Consider optimizing this.
  3. Build Optimization: The make command might build more than necessary.

🧪 Test Coverage

  1. No Error Handling for Maven Build: The mvn package command could fail. An explicit error message would be helpful.
  2. No Verification of Generated Files: After data/query generation, consider checking if the expected files exist.

📝 Code Quality & Best Practices

  1. Consistency with Existing Scripts: Good use of set -e and DIR pattern. Consider whether -x flag is needed for production use.
  2. Documentation: Consider adding usage instructions as comments at the top of the script.
  3. Magic Numbers: Consider making the memory configuration (16G) configurable via environment variable.

🎯 Recommendations Summary

Critical (should fix before merge):

  1. Fix the inverted build logic (lines 34-40)
  2. Add SPARK_HOME validation

High Priority:
3. Add path quoting for Maven commands
4. Consider adding set -o pipefail

Nice to Have:
5. Add usage documentation in comments
6. Optimize Maven metadata extraction
7. Add verification for generated files
8. Make off-heap memory size configurable
9. Consider removing -x from set -eux for less verbose output

Overall, this is a useful addition to the project. The main concern is the build logic bug which should be fixed before merging.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
fuzz-testing/run.sh (2)

44-44: Add explicit validation for SPARK_HOME environment variable.

While set -u will cause the script to exit if SPARK_HOME is undefined, the resulting error message will be cryptic (e.g., "SPARK_HOME: unbound variable"). For better user experience, add an explicit check early in the script to provide a clear error message guiding users to set this environment variable.

Example:

if [ -z "${SPARK_HOME:-}" ]; then
  echo "Error: SPARK_HOME environment variable is not set. Please set it to your Spark installation directory." >&2
  exit 1
fi

Based on learnings: In this project, explicit validation of SPARK_HOME is optional (per prior review feedback), but adding it would improve UX without downside.

Also applies to: 53-53, 60-60


21-21: Document the purpose of set -eux if not self-evident to readers.

The set -eux enables strict error handling (-e: exit on error, -u: error on undefined variables) and debug output (-x: print commands). While robust, the -x flag produces verbose output. Consider documenting whether this is intentional for debugging or if it should be conditional (e.g., only enabled when DEBUG=1).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 074b676 and a20c848.

📒 Files selected for processing (1)
  • fuzz-testing/run.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.
📚 Learning: 2025-11-05T14:57:33.434Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.

Applied to files:

  • fuzz-testing/run.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: codex

Comment on lines +26 to +30
PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)"
COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar"
COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_FUZZ_JAR="${DIR}/target/${COMET_FUZZ_ARTIFACT_ID}-${PROJECT_VERSION}-jar-with-dependencies.jar"
Copy link

@coderabbitai coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate that Maven commands succeed before using their outputs.

Lines 26–29 execute Maven commands to retrieve version and artifact IDs but don't validate success. If Maven fails, variables will contain empty strings, causing confusing failures downstream (e.g., spark-submit called with malformed JAR paths). Add validation after each Maven invocation or consolidate into a function with error handling.

Example with explicit error checking:

PROJECT_VERSION="$(mvn -q help:evaluate -Dexpression=project.version -DforceStdout)" || {
  echo "Error: Failed to retrieve project version from Maven" >&2
  exit 1
}
🤖 Prompt for AI Agents
In fuzz-testing/run.sh around lines 26 to 30, the script runs several mvn
commands to populate PROJECT_VERSION, COMET_SPARK_ARTIFACT_ID and
COMET_FUZZ_ARTIFACT_ID but never checks for failure; update the script to
validate each mvn invocation (or create a small helper function) so that if mvn
returns a non‑zero status or produces an empty output you print a clear stderr
message identifying which value failed and exit with a non‑zero code before
using the variables; ensure the check covers both command failure and empty
result and use the same pattern for all three mvn calls so downstream jar path
construction never proceeds with empty values.

Copy link
Owner Author

@martin-augment martin-augment Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:valid-but-wont-fix; category:bug; feedback: The CodeRabbit AI reviewer is correct that the Maven command may fail for any reason. But set -e will lead to script execution failure and the error message by Maven will be better than the one in the suggestion because it will give more details why the version cannot be read.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-augment, thank you for the clarification! You're absolutely right—with set -e enabled as part of set -eux, the script will automatically exit if any Maven command fails, and Maven's native error output will provide detailed diagnostics about why the command failed. This is more informative than a generic custom error message would be. Your approach makes sense for this use case.

I've noted this preference for future reviews.


✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.

Comment on lines +27 to +29
COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar"
COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
Copy link

@coderabbitai coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Quote Maven file path arguments for robustness.

Lines 27 and 29 pass unquoted ${PARENT_DIR} and ${DIR} variables to Maven's -f flag. If paths contain spaces or special characters, the command will break. Wrap file paths in quotes to ensure proper argument parsing.

Apply this diff:

-COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
+COMET_SPARK_ARTIFACT_ID="$(mvn -f "${PARENT_DIR}/spark/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"

-COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
+COMET_FUZZ_ARTIFACT_ID="$(mvn -f "${DIR}/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
COMET_SPARK_ARTIFACT_ID="$(mvn -f ${PARENT_DIR}/spark/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar"
COMET_FUZZ_ARTIFACT_ID="$(mvn -f ${DIR}/pom.xml -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_SPARK_ARTIFACT_ID="$(mvn -f "${PARENT_DIR}/spark/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
COMET_SPARK_JAR="${PARENT_DIR}/spark/target/${COMET_SPARK_ARTIFACT_ID}-${PROJECT_VERSION}.jar"
COMET_FUZZ_ARTIFACT_ID="$(mvn -f "${DIR}/pom.xml" -q help:evaluate -Dexpression=project.artifactId -DforceStdout)"
🤖 Prompt for AI Agents
In fuzz-testing/run.sh around lines 27 to 29, the mvn -f arguments use unquoted
path variables (${PARENT_DIR} and ${DIR}) which will break if the paths contain
spaces or special characters; update the commands to wrap the file paths in
quotes (e.g., -f "${PARENT_DIR}/spark/pom.xml" and -f "${DIR}/pom.xml") so the
full path is passed as a single argument to Maven.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that directory names could contain space characters. Wrapping the variables in quotes would prevent failed resolutions of absolute paths.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-augment, thank you for confirming! I'm glad the suggestion helps prevent potential path resolution issues. The quoted variables will ensure the script works correctly even when directory paths contain spaces or special characters.


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.

Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.

Comment on lines +35 to +41
if [ ! -f "${COMET_SPARK_JAR}" ]; then
echo "Building Comet Spark jar..."
make -f "${PARENT_DIR}/Makefile"
else
echo "Building Fuzz testing jar..."
mvn package -DskipTests
fi
Copy link

@coderabbitai coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate build artifacts exist after compilation.

Line 35 checks if COMET_SPARK_JAR exists before deciding which build to run; however, after invoking make (line 37) or mvn package (line 40), the script doesn't verify that the expected JAR files were actually produced. A failed build (e.g., compilation errors, missing dependencies) may not be caught, leading to misleading failures in subsequent spark-submit invocations.

Add validation after each build step:

make -f "${PARENT_DIR}/Makefile"
if [ ! -f "${COMET_SPARK_JAR}" ]; then
  echo "Error: Failed to build Comet Spark JAR at ${COMET_SPARK_JAR}" >&2
  exit 1
fi
🤖 Prompt for AI Agents
fuzz-testing/run.sh around lines 35 to 41: after running the build commands you
must validate that the expected JAR artifact(s) were actually produced; add a
post-build check after the make invocation to verify COMET_SPARK_JAR exists and
if not print an error to stderr and exit 1, and likewise add a post-build check
after the mvn package invocation to verify the fuzz testing JAR (the artifact
path/variable used later in the script) exists and if not print an error to
stderr and exit 1 so subsequent spark-submit calls never run against missing
artifacts.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-augment, thank you for confirming the issue! You make an excellent point about stale artifacts—even when the JAR exists, it could be outdated from a previous build. Always rebuilding the fuzz testing JAR ensures you're running tests against the current codebase, which is especially important in a development workflow where code changes frequently.

The validation checks after each build command will catch any compilation failures early, and the consistent rebuild strategy avoids the subtle bugs that come from running outdated test code. This makes the script more reliable and predictable.


✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:35-41
Timestamp: 2025-11-10T09:21:54.836Z
Learning: In the DataFusion Comet project's fuzz-testing/run.sh script, the fuzz testing JAR should always be built (not conditionally checked for existence) because even if it exists, it might be an outdated version from a previous build. This prevents runtime errors from executing stale test code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 15
File: fuzz-testing/run.sh:26-30
Timestamp: 2025-11-10T09:11:39.343Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -e` is enabled, explicit validation of Maven command outputs (e.g., `mvn help:evaluate`) is not required. The script will automatically fail if Maven returns a non-zero exit code, and Maven's detailed error messages are preferred over custom error handling messages.

Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 9
File: fuzz-testing/run.sh:42-71
Timestamp: 2025-11-05T14:57:33.434Z
Learning: In bash scripts for the DataFusion Comet project (fuzz-testing/run.sh), when `set -u` is enabled, explicit validation of environment variables like `SPARK_HOME` before use is optional. The script will automatically fail if the variable is undefined due to `set -u`, though explicit checks can provide more user-friendly error messages.

@github-actions
Copy link

  • fuzz-testing/run.sh:34 – When the script is executed from fuzz-testing (e.g. cd fuzz-testing && ./run.sh), the make -f "${PARENT_DIR}/Makefile" call runs Make with the current working directory still set to fuzz-testing. All of the recipes inside the top-level Makefile use relative paths like ./mvnw and cd native, so they now resolve relative to fuzz-testing/ and immediately fail (e.g. ./mvnw is not found). Please either cd "${PARENT_DIR}" before invoking make or use make -C "${PARENT_DIR}" … so the recipes execute from the repository root.

  • fuzz-testing/run.sh:23 – Every Maven call in this script (help:evaluate, package, etc.) invokes the system mvn binary. The repo already ships the Maven wrapper, and the Makefile depends on it. Requiring a globally installed Maven (and the right version) will break on environments that only have the wrapper available. Could you switch these invocations to ${PARENT_DIR}/mvnw … (or similar) so the script stays self-contained?

  • .github/workflows/claude-code-review.yml – This PR is described as “Add bash script to build and run fuzz testing”, but it also deletes our existing Claude/Codex GitHub workflows. That’s a sizeable unrelated change; if it was accidental we should restore those files, otherwise it would be good to split the workflow removal into a dedicated change with context.

@martin-augment
Copy link
Owner Author

1. Build Logic Error (line 34-40):

The logic is inverted. If the Comet Spark jar does not exist, it builds it but then skips building the fuzz testing jar (which is always needed). If the Comet jar exists, it only builds the fuzz jar but does not check if it exists.

Recommendation: Always build the fuzz testing jar. The condition should only determine whether to build the Comet jar first.

value:useful; category:bug; feedback: The Claude AI reviewer is correct that the check for existence of jars may lead to a case where the fuzz jar is not built. The suggestion to always build the fuzz jar is good because even if it is built it might be an old version. Prevents runtime errors due to a missing jar in the classpath or execution of out of date fuzz testing classes.

@martin-augment
Copy link
Owner Author

2. Missing SPARK_HOME Validation:

The script uses SPARK_HOME without checking if it is set. If undefined, the error message will be cryptic.

Recommendation: Add validation at the top to check if SPARK_HOME is set and exit with a clear error message if not.

value:incorrect-but-reasonable; category:bug; feedback: The Claude AI reviewer is not correct. The script uses set -u (https://github.com/apache/datafusion-comet/pull/2686/files#diff-b962d9d2dba118e0eb0fe84f474de330bbed5ccdb1ca0d3566589d3d49a460f7R21) which will lead to an error like run.sh: line 44: SPARK_HOME: unbound variable that is not cryptic at all.

@martin-augment
Copy link
Owner Author

3. Maven Command Scope Issue (line 26):

Maven runs from the current directory but references a different POM. If dependencies are not resolved, this could fail.

value:useful; category:bug; feedback: The Claude AI reviewer is correct that without using the -f ... argument the mvn command may build the wrong jar when the script is executed from a different folder than fuzz-testing. It would be better to use -f absolutePath/pom.xml to prevent that.

@martin-augment
Copy link
Owner Author

  • fuzz-testing/run.sh:34 – When the script is executed from fuzz-testing (e.g. cd fuzz-testing && ./run.sh), the make -f "${PARENT_DIR}/Makefile" call runs Make with the current working directory still set to fuzz-testing. All of the recipes inside the top-level Makefile use relative paths like ./mvnw and cd native, so they now resolve relative to fuzz-testing/ and immediately fail (e.g. ./mvnw is not found). Please either cd "${PARENT_DIR}" before invoking make or use make -C "${PARENT_DIR}" … so the recipes execute from the repository root.

value:useful; category:bug; feedback: The Codex AI reviewer is correct that the current working directory has to be changed before executing the Make command because its commands will be executed in the wrong folder. Prevents building the wrong artifacts when the current working directory is wrong.

@martin-augment
Copy link
Owner Author

  • fuzz-testing/run.sh:23 – Every Maven call in this script (help:evaluate, package, etc.) invokes the system mvn binary. The repo already ships the Maven wrapper, and the Makefile depends on it. Requiring a globally installed Maven (and the right version) will break on environments that only have the wrapper available. Could you switch these invocations to ${PARENT_DIR}/mvnw … (or similar) so the script stays self-contained?

value:useful; category:bug; feedback: The Codex AI reviewer is correct that the run.sh script now requires haveing Maven installed in the system. Using $PARENT_DIR/mvnw (the wrapper) will avoid that and it will use the preferred version of Maven too.

@martin-augment
Copy link
Owner Author

  • .github/workflows/claude-code-review.yml – This PR is described as “Add bash script to build and run fuzz testing”, but it also deletes our existing Claude/Codex GitHub workflows. That’s a sizeable unrelated change; if it was accidental we should restore those files, otherwise it would be good to split the workflow removal into a dedicated change with context.

value:annoying; category:bug; feedback:The Codex AI reviewer is wrong! These files are not removed in this Pull Request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants