Defects4j scripts #70
…to defects4j-scripts
📝 Walkthrough
Adds defect-detection support and shared shell utilities to the GRT testing framework. New driver scripts:
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/experiment-scripts/mutation-fig8-9.sh (1)
39-134: LGTM! Clean refactoring to GRT_TESTING_ROOT.
The variable rename from MUTATION_DIR to GRT_TESTING_ROOT is consistently applied, and the portable CPU detection logic is well-implemented with appropriate fallbacks.
The CPU core detection logic (lines 81-89) is duplicated in `mutation-fig6-table3.sh` (lines 74-82). Consider extracting this into a shared utility function or sourced script to reduce duplication:
```sh
# In scripts/common-utils.sh
get_num_cores() {
  if command -v nproc > /dev/null 2>&1; then
    NPROC=$(nproc)
  elif command -v getconf > /dev/null 2>&1; then
    NPROC=$(getconf _NPROCESSORS_ONLN)
  else
    NPROC=1
  fi
  NUM_CORES=$((NPROC - 4))
  if [ "$NUM_CORES" -lt 1 ]; then NUM_CORES=1; fi
  echo "$NUM_CORES"
}
```
Then source and use it:
```sh
. "$SCRIPT_DIR/common-utils.sh"
NUM_CORES=$(get_num_cores)
```
scripts/experiment-scripts/mutation-fig7.sh (1)
38-42: Python availability check is correct but could be more explicit.
The logic correctly checks for python3 first, then python. However, the error message and exit code (1) differ from other preflight checks in related scripts that use exit code 2. Consider standardizing exit codes for consistency.
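One way to align the Python check with the other preflight checks might look like the sketch below. The helper name `first_available` and the error wording are hypothetical; only the python3-then-python order and exit code 2 come from the review.

```sh
# Hypothetical helper: print the first command in the argument list that
# exists on PATH; return 1 if none of them is available.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" > /dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

# Possible use in a preflight section (error text is illustrative):
#   PYTHON=$(first_available python3 python) || {
#     echo "Error: Python not found (tried python3, python)." >&2
#     exit 2
#   }
```

Keeping the lookup in a function leaves the exit code decision at the call site, so every script can use the same value.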
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
README.md (1 hunks)
scripts/Makefile (2 hunks)
scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-prerequisites.md (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/experiment-scripts/README.md (2 hunks)
scripts/experiment-scripts/defects4j-table4.sh (1 hunks)
scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
scripts/experiment-scripts/mutation-fig6-table3.sh (6 hunks)
scripts/experiment-scripts/mutation-fig7.sh (6 hunks)
scripts/experiment-scripts/mutation-fig8-9.sh (6 hunks)
scripts/mutation-evosuite.sh (9 hunks)
scripts/mutation-prerequisites.md (1 hunks)
scripts/mutation-randoop.sh (8 hunks)
scripts/usejdk.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
📚 Learning: 2025-10-14T03:30:11.765Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
Applied to files:
scripts/experiment-scripts/mutation-fig6-table3.sh
scripts/experiment-scripts/README.md
scripts/defects4j-evosuite.sh
scripts/Makefile
scripts/experiment-scripts/generate-grt-figures.py
scripts/defects4j-prerequisites.md
scripts/experiment-scripts/mutation-fig7.sh
scripts/mutation-evosuite.sh
scripts/experiment-scripts/defects4j-table4.sh
scripts/defects4j-randoop.sh
scripts/mutation-randoop.sh
README.md
scripts/experiment-scripts/mutation-fig8-9.sh
📚 Learning: 2025-10-13T23:36:38.701Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.
Applied to files:
scripts/defects4j-evosuite.sh
scripts/Makefile
scripts/defects4j-prerequisites.md
scripts/defects4j-randoop.sh
🧬 Code graph analysis (5)
scripts/defects4j-evosuite.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
scripts/mutation-evosuite.sh (1)
scripts/usejdk.sh (1)
usejdk8(6-14)
scripts/experiment-scripts/defects4j-table4.sh (3)
scripts/experiment-scripts/mutation-fig6-table3.sh (1)
run_task(103-120)
scripts/experiment-scripts/mutation-fig7.sh (1)
run_task(107-120)
scripts/experiment-scripts/mutation-fig8-9.sh (1)
run_task(110-123)
scripts/defects4j-randoop.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
scripts/mutation-randoop.sh (1)
scripts/usejdk.sh (1)
usejdk8(6-14)
🔇 Additional comments (33)
scripts/experiment-scripts/README.md (1)
8-38: LGTM! Clear documentation structure.
The documentation properly distinguishes between mutation analysis and defect detection workflows, with appropriate references to the new prerequisite files and clear examples for both evaluation methods.
scripts/mutation-prerequisites.md (1)
3-31: LGTM! Well-structured prerequisites.
The Java version setup section properly documents the environment variables needed for the usejdk.sh script, and the step renumbering is consistent throughout.
scripts/experiment-scripts/generate-grt-figures.py (1)
263-317: LGTM! Defect detection logic is sound.
The function correctly implements the workflow:
- Marks a bug as detected if any test fails for it
- Counts total faults detected per tool and time budget
- Generates a tabular visualization
The aggregation logic using `groupby(...).any()` followed by `sum()` appropriately handles the two-level detection criteria.
Consider adding validation for expected data format:
```python
def generate_table_4(df: pd.DataFrame) -> mpl.figure.Figure:
    """..."""
    # Validate expected columns
    required_cols = {'ProjectId', 'Version', 'TimeLimit', 'TestSuiteSource', 'TestClassification'}
    if not required_cols.issubset(df.columns):
        raise ValueError(f"Missing required columns. Expected: {required_cols}, Got: {set(df.columns)}")
    df.columns = [col.strip() for col in df.columns]
    # ... rest of function
```
This would catch data format issues early rather than failing with cryptic pandas errors.
scripts/experiment-scripts/mutation-fig6-table3.sh (1)
35-131: LGTM! Consistent refactoring to GRT_TESTING_ROOT.
The changes mirror those in `mutation-fig8-9.sh`, consistently replacing MUTATION_DIR with GRT_TESTING_ROOT and adding portable CPU detection.
Note: The CPU detection logic duplication between this file and `mutation-fig8-9.sh` has already been flagged for potential refactoring in that file's review.
scripts/experiment-scripts/mutation-fig7.sh (2)
36-36: Use GRT_TESTING_ROOT consistently throughout script.
The variable is correctly initialized but should be exported or passed consistently. Line 97 correctly passes it to run_task, and line 131 correctly uses it. Looks good.
78-87: Robust CPU core detection with sensible defaults.
The multi-stage approach (nproc → getconf → default 1) with max(NPROC-4, 1) lower bound is appropriate for parallel execution safety. Line 87 correctly ensures NUM_CORES never goes below 1.
README.md (2)
12-19: Clear evaluation methods documentation.
The distinction between mutation analysis and defect detection is well articulated. This provides good context for users on what each evaluation method measures.
25-40: Setup and script references are logically organized.
The split between mutation prerequisites and defect detection prerequisites improves navigation. References to scripts align with the files introduced in this PR.
scripts/defects4j-evosuite.sh (5)
72-78: Java 11 validation is correct.
The script correctly sources usejdk.sh, calls usejdk11, and validates the version. The version extraction (line 74) handles both Java 8 and 11+ formats correctly.
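The two version formats mentioned above can be illustrated with a small sketch. The function name `java_major_version` is hypothetical, and the awk logic is a simplified stand-in for the script's own extraction, not a copy of it.

```sh
# Hypothetical helper: reduce a Java version string to its major version.
# Legacy scheme: "1.8.0_292" -> 8. Modern scheme: "11.0.2" -> 11.
java_major_version() {
  printf '%s\n' "$1" | awk -F '.' '{ if ($1 == "1") print $2; else print $1 }'
}

# The version string itself could be obtained with something like:
#   java -version 2>&1 | awk -F '"' '/version/ {print $2}'
```

A script could then compare the result against the required major version and exit with code 2 on a mismatch.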
210-218: CSV file creation with proper locking pattern.
The flock usage prevents concurrent header writes to the results CSV. The check for empty file (`-s`) ensures header is only written once. Pattern is consistent with mutation scripts.
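A minimal sketch of the header-once pattern described above, assuming the util-linux `flock` command is installed. The function name `write_header_once`, the `.lock` file path, and the choice of file descriptor 9 are illustrative assumptions, not details taken from the scripts.

```sh
# Hypothetical helper: write a CSV header only if the file is still empty.
# Assumes util-linux `flock`; the lock file name is an illustrative choice.
write_header_once() {
  csv=$1
  header=$2
  mkdir -p "$(dirname "$csv")"
  (
    flock -x 9                   # block until this process holds the lock
    if [ ! -s "$csv" ]; then     # -s is true only for a non-empty file
      printf '%s\n' "$header" > "$csv"
    fi
  ) 9> "$csv.lock"
}
```

Because the emptiness check and the write happen while the lock is held, two concurrent invocations cannot both see an empty file and both write the header.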
244-251: Time budget allocation logic is sound.
If per-class time is provided, it's used as-is. If total time is specified, it's divided by number of classes with a 1-second minimum. This matches the mutation-evosuite.sh pattern.
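The total-time case can be sketched as integer division with a floor of one second. The function name `per_class_budget` and its argument order are hypothetical.

```sh
# Hypothetical helper: split a total time budget (seconds) evenly across
# classes, never returning less than 1 second per class.
per_class_budget() {
  total=$1
  num_classes=$2
  budget=$((total / num_classes))   # integer division truncates toward zero
  if [ "$budget" -lt 1 ]; then
    budget=1
  fi
  echo "$budget"
}
```

For example, 120 seconds over 4 classes gives 30 seconds each, while 3 seconds over 10 classes is raised from 0 to the 1-second floor.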
289-297: Test cleanup after tar packaging is appropriate.
Per the retrieved learning, removing RegressionTest.java and ErrorTest.java (driver files) after test generation is intentional; only numbered test files are packaged. The tar-with-error-handling pattern correctly ignores tar code 1.
312-317: Results appending with file locking is correct.
Using flock on a dedicated fd ensures atomic writes. The `tr -d '\r'` removes any Windows line endings, `tail -n +2` skips header, and awk appends the time limit. Pattern is identical to mutation scripts.
scripts/mutation-evosuite.sh (5)
64-75: Comprehensive preflight checks with consistent exit codes.
All dependency checks use exit code 2, consistent with defects4j scripts. The order (directories then files then executables) is logical.
77-83: Java 8 enforcement is correct and aligned with mutation workflow.
Sourcing usejdk.sh and calling usejdk8 ensures Java 8 is used. The version extraction handles both formats. Error message correctly indicates setting JAVA8_HOME.
142-153: Strict validation for output CSV parameter.
Requires filename (no paths), must end with .csv, and -o is mandatory. These constraints prevent accidental path traversal and ensure results stay in results/ directory. Good security posture.
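The three constraints on `-o` could be expressed roughly as follows. The function name `validate_results_csv` and the first two error messages are hypothetical; the script's actual checks may be written differently.

```sh
# Hypothetical helper: accept only a bare file name ending in .csv.
# Returns 0 when valid, 2 otherwise (matching the preflight exit code).
validate_results_csv() {
  case "$1" in
    "")
      echo "Error: -o is required" >&2
      return 2 ;;
    */*)
      echo "Error: -o must be a file name, not a path" >&2
      return 2 ;;
    *.csv)
      return 0 ;;
    *)
      echo "Error: -o must end with .csv" >&2
      return 2 ;;
  esac
}
```

Rejecting any value that contains a slash is what keeps a caller from writing outside the results/ directory with an argument such as `../out.csv`.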
428-437: CSV header creation with proper concurrency protection.
The mkdir, flock, and `-s` check pattern is consistent with defects4j scripts. Ensures header is written exactly once even with concurrent invocations.
479-486: Per-iteration EVOSUITE_COMMAND construction is clean.
Building the command from EVOSUITE_BASE_COMMAND and adding per-iteration parameters (-Dtest_dir, -Dreport_dir) avoids repetition and makes maintenance easier.
scripts/defects4j-randoop.sh (5)
85-91: Java 11 requirement is correct for Defects4J.
Defects4J requires Java 11; this enforcement via usejdk.sh is appropriate. Error message is clear.
191-214: Feature flag mapping and expansion is robust.
The associative array (lines 191-200) maps features to Randoop flags. The expansion loop (202-214) validates each feature exists before using it. Unknown features cause exit with helpful error message listing valid options.
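The validate-then-expand idea can be sketched as below. To stay portable this uses a `case` lookup rather than the bash associative array the script uses, and the flag strings are placeholders, not real Randoop options. The function names and the argument format (one feature per argument) are also assumptions.

```sh
# Placeholder lookup: these flag strings are NOT real Randoop options.
flag_for_feature() {
  case "$1" in
    BASELINE)     echo "" ;;                          # no extra flags
    BLOODHOUND)   echo "--placeholder-bloodhound" ;;
    ORIENTEERING) echo "--placeholder-orienteering" ;;
    *)            return 1 ;;                         # unknown feature
  esac
}

# Hypothetical helper: validate every feature, then print the joined flags.
expand_features() {
  expanded=""
  for feature in "$@"; do
    flag=$(flag_for_feature "$feature") || {
      echo "Error: unknown feature '$feature'." \
        "Valid: BASELINE BLOODHOUND ORIENTEERING" >&2
      return 2
    }
    if [ -n "$flag" ]; then
      expanded="${expanded:+$expanded }$flag"
    fi
  done
  echo "$expanded"
}
```

Validating inside the loop means a misspelled feature stops the run before any test generation starts, instead of silently running with fewer flags.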
294-298: Time budget calculation correctly uses total time as-is.
When -t is specified, TIME_LIMIT is set to TOTAL_TIME directly (not divided by NUM_CLASSES like EvoSuite). This is correct because Randoop's --time-limit is global, not per-class.
332-334: Test cleanup follows intentional pattern from prior PR.
Removing RegressionTest.java and ErrorTest.java (driver files) aligns with the retrieved learning; only numbered test files are packaged for defect detection.
340-345: Tarball naming logic distinguishes BASELINE from feature combinations.
When features are provided (non-BASELINE), tar suffix is "grt"; otherwise "randoop". This aids in results interpretation and file organization.
scripts/experiment-scripts/defects4j-table4.sh (3)
85-92: get_bug_ids() function provides flexible bug ID resolution.
If BUG_IDS array has an entry for the project, uses it; otherwise queries Defects4J. This allows both hardcoded (for testing) and dynamic (for production) bug ID sources.
111-129: run_task() function correctly dispatches to appropriate test generators.
The three cases (EVOSUITE, GRT, BASELINE) invoke the correct scripts with appropriate feature flags. GRT case correctly expands all features. Error case is clear.
94-105: Task generation loop structure produces comprehensive test matrix.
Iterates over time budgets, projects, bug IDs, test generators, and iterations. Each combination produces one task argument set. This ensures all combinations are tested.
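The task matrix can be pictured as nested loops that print one line per combination. This is a simplified sketch: the function name `generate_tasks` is hypothetical, and it assumes one shared bug ID list for all projects, whereas the script resolves bug IDs per project.

```sh
# Hypothetical helper: print one task line per combination.
# Arguments are space-separated lists, except the last (iteration count).
generate_tasks() {
  for budget in $1; do
    for project in $2; do
      for bug_id in $3; do
        for generator in $4; do
          i=1
          while [ "$i" -le "$5" ]; do
            echo "$budget $project $bug_id $generator $i"
            i=$((i + 1))
          done
        done
      done
    done
  done
}

# Example: 2 budgets x 1 project x 2 bugs x 3 generators x 2 iterations
#   generate_tasks "60 120" "Lang" "1 2" "EVOSUITE GRT BASELINE" 2
```

Each output line can then be handed to a dispatcher such as run_task, one task per line.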
scripts/mutation-randoop.sh (7)
69-88: Comprehensive preflight checks with consistent exit codes.
All dependency checks use exit code 2. Order is logical: directories first, then files. This ensures all required tools are available before script proceeds.
171-200: Feature flag expansion is robust and well-validated.
The associative array (lines 177-186) maps feature names to Randoop flags. The expansion loop (lines 188-200) validates each feature and builds EXPANDED_FEATURE_FLAGS. Unknown features produce helpful error message with all valid options.
534-543: CSV header creation follows concurrency-safe pattern.
The mkdir, flock, and `-s` check ensures the header is written exactly once despite concurrent invocations. Consistent with mutation-evosuite.sh and defects4j scripts.
553-557: Per-iteration file suffix includes feature set for clarity.
Using FEATURE_SUFFIX (e.g., "BASELINE+BLOODHOUND") instead of generic identifier makes result directories self-documenting and aids debugging. Good practice.
663-670: Special case handling for hamcrest-core-1.3 test runner conversion.
Comment (lines 656-662) explains why: Randoop-generated tests may not work with Major's mutation instrumentation. Python script converts them to EvoSuite format for proper isolation. This is a known limitation being worked around appropriately.
697-697: Results row format includes feature set instead of tool name.
Line 697 uses FEATURE_SUFFIX (e.g., "BASELINE" or "BLOODHOUND+ORIENTEERING") as first column, which differs from mutation-evosuite.sh that uses "EVOSUITE". The CSV header (line 540) says "Version", so this appears intentional: "Version" now represents the feature configuration rather than tool type.
Verify this format matches expectations from generate-grt-figures.py; if figures consume these CSVs and expect different format, this could cause issues.
Can you verify that the CSV output format (first column = FEATURE_SUFFIX) is compatible with generate-grt-figures.py expectations?
Also applies to: 699-704
715-715: Result directory cleanup via cd back to SCRIPT_DIR.
Line 715 changes back to SCRIPT_DIR after per-iteration work. This prevents relative path issues and ensures clean state for next iteration.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/mutation-evosuite.sh (1)
95-95: Add validation check for uuidgen availability.
The script generates a UUID at line 95 without checking if `uuidgen` is available. Add an explicit check:
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # Generate a unique identifier per instance
```
♻️ Duplicate comments (7)
scripts/defects4j-evosuite.sh (2)
87-87: Add validation check for uuidgen availability.
The script generates a UUID at line 87 without checking if `uuidgen` is available. Add an explicit validation check:
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
```
158-161: Add stderr redirection to error message.
Error messages should be redirected to stderr for consistency. Line 159 is missing the `>&2` redirection:
```diff
 [[ "$RESULTS_CSV" == *.csv ]] || {
-  echo "Error: -o must end with .csv"
+  echo "Error: -o must end with .csv" >&2
   exit 2
 }
```
scripts/mutation-evosuite.sh (2)
586-586: Use the `$Generator` variable instead of hardcoding "EVOSUITE" in CSV output.
Line 586 hardcodes `"EVOSUITE"` in the CSV row instead of using the `$Generator` variable defined at line 59. This breaks the multi-generator abstraction:
```diff
-row="EVOSUITE,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
+row="$Generator,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
```
80-86: Add defensive checks for usejdk.sh sourcing and usejdk8 call.
Line 80 sources `usejdk.sh` without verifying existence. Line 82 calls `usejdk8` without checking its return code, so if it fails (e.g., `JAVA8_HOME` unset), the subsequent Java version check may use an incorrect installation. Add file check and error handling:
```diff
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
```
scripts/defects4j-randoop.sh (1)
100-100: Add validation check for uuidgen availability.
The script uses `uuidgen` without checking if it's installed. Although `set -e` will catch command failures, add an explicit validation check matching the pattern used for `defects4j` and JAR files (lines 64–83):
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
```
scripts/mutation-randoop.sh (2)
87-93: Add defensive checks for usejdk.sh sourcing and usejdk8 call.
Line 87 sources `usejdk.sh` without verifying existence. Line 88 calls `usejdk8` without checking its return code, so if it fails (e.g., `JAVA8_HOME` unset), the Java version check may use an incorrect installation. Add file existence check and error handling:
```diff
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
```
604-676: Replace all instances of `$1` with `$SUBJECT_PROGRAM` in ant buildfile paths (lines 610–676).
Verification confirms all 12 instances of `"$1"` in buildfile paths remain unfixed at lines 610, 612, 615, 616, 622, 624, 627, 628, 634, 637, 673, and 676. Update all occurrences:
```diff
-"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
```
Apply to both direct ant invocations and verbose echo commands throughout the coverage/mutation analysis section.
📜 Review details
📒 Files selected for processing (4)
scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/mutation-evosuite.sh (9 hunks)
scripts/mutation-randoop.sh (5 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-14T03:30:11.765Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
Applied to files:
scripts/mutation-evosuite.sh
scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh
scripts/mutation-randoop.sh
📚 Learning: 2025-10-13T23:36:38.701Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.
Applied to files:
scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh
🧬 Code graph analysis (4)
scripts/mutation-evosuite.sh (2)
scripts/mutation-randoop.sh (1)
require_file(76-81)
scripts/usejdk.sh (1)
usejdk8(6-14)
scripts/defects4j-randoop.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
scripts/defects4j-evosuite.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
scripts/mutation-randoop.sh (2)
scripts/mutation-evosuite.sh (1)
require_file(71-76)
scripts/usejdk.sh (1)
usejdk8(6-14)
🔇 Additional comments (5)
scripts/defects4j-randoop.sh (2)
232-241: Approve feature handling and directory structure.
The feature flag mapping (lines 191–200), feature validation (lines 202–214), and directory setup with per-iteration UUID-based isolation (lines 232–241) are well-structured. The use of unique `FILE_SUFFIX` across per-iteration directories prevents concurrent execution issues.
332-334: Confirm: RegressionTest.java and ErrorTest.java removal is intentional.
Lines 332–333 intentionally remove the driver test files (`RegressionTest.java` and `ErrorTest.java`). Per the retrieved learning, `run_bug_detection.pl` does not expect these files; the numbered test files (e.g., `RegressionTest0.java`) remain and are packaged correctly.
scripts/defects4j-evosuite.sh (1)
244-251: Approve time budget allocation with sensible minimum cap.
The time budget calculation (lines 244–251) correctly handles `-c` (per-class) budgeting with a minimum 1-second floor when `-t` (total time) is specified. This prevents excessively small per-class budgets on projects with many classes.
scripts/mutation-randoop.sh (2)
168-197: Approve feature flag handling and validation.
The feature parsing (lines 168–172), feature-to-flag mapping (lines 174–183), and validation with informative error messages (lines 185–197) are well-designed. The BASELINE default is correctly handled with an empty flag string.
531-540: Approve CSV header initialization and file locking.
The CSV header initialization (lines 531–540) correctly uses exclusive file locking via `flock` on a dedicated file descriptor to prevent race conditions across concurrent iterations. This pattern ensures atomic writes.
Actionable comments posted: 1
♻️ Duplicate comments (11)
scripts/prerequisites.md (1)
63-78: Document how to invoke the JDK switcher.
The environment variables `JAVA8_HOME` and `JAVA11_HOME` are documented, but users need clear guidance on how to source `usejdk.sh` and call `usejdk11()` or `usejdk8()` before running scripts. Add a usage example:
````diff
 ```sh
 export JAVA8_HOME=/path/to/your/java8
 export JAVA11_HOME=/path/to/your/java11
 ```
+Then, before running scripts, source the JDK switcher and select the appropriate version:
+
+```sh
+source scripts/usejdk.sh
+usejdk11 # for defect detection scripts
+usejdk8 # for mutation analysis scripts
+```
````
scripts/mutation-evosuite.sh (2)
79-85: Add defensive checks for usejdk.sh sourcing and usejdk8 call.
The script sources `usejdk.sh` without verifying its existence and calls `usejdk8` without checking the return code. If `usejdk.sh` is missing or `JAVA8_HOME` is not set, the script will fail with unclear errors. Add checks before and after:
```diff
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
```
585-585: Use the `$Generator` variable instead of hardcoding "EVOSUITE" in CSV output.
The script defines `Generator=EvoSuite` at line 58 for abstraction, but line 585 hardcodes "EVOSUITE". This breaks consistency and makes the script harder to maintain.
```diff
-row="EVOSUITE,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
+row="$Generator,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
```
scripts/defects4j-evosuite.sh (3)
71-77: Add defensive checks for usejdk.sh sourcing and usejdk11 call.
The script sources `usejdk.sh` without verifying its existence (line 71) and calls `usejdk11` without checking the return code (line 72). Add validation:
```diff
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk11
+usejdk11 || {
+  echo "Error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
```
86-86: Add validation check for uuidgen availability.
The script calls `uuidgen` without checking if it's installed. Add an explicit check:
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
```
157-159: Add stderr redirection to error message.
Line 158 should redirect the error message to stderr for consistency with other error handling in the script:
```diff
 [[ "$RESULTS_CSV" == *.csv ]] || {
-  echo "Error: -o must end with .csv"
+  echo "Error: -o must end with .csv" >&2
   exit 2
 }
```
scripts/defects4j-randoop.sh (2)
84-90: Add defensive checks for usejdk.sh sourcing and usejdk11 call.
The script sources `usejdk.sh` (line 84) without verifying existence and calls `usejdk11` (line 85) without checking the return code. Add validation:
```diff
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk11
+usejdk11 || {
+  echo "Error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
```
99-99: Add validation check for uuidgen availability.
The script calls `uuidgen` without checking if it's installed. Add an explicit validation check:
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
```
scripts/mutation-randoop.sh (3)
86-92: Add error check after usejdk8 function call.
The `usejdk8` function can return 1 if `JAVA8_HOME` is not set, but the script doesn't check this return code. Add validation:
```diff
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
```
101-101: Add validation check for uuidgen availability.
The script calls `uuidgen` without verifying it's installed. Add an explicit check:
```diff
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # Generate a unique identifier per instance
```
609-636: Use `$SUBJECT_PROGRAM` instead of `$1` in ant command paths.
The ant buildfile paths reference `"$1"` to access the subject program name, but `$SUBJECT_PROGRAM` was already captured at line 210. Using `$1` is fragile: if the script is refactored and `$1` is shifted, these commands will silently reference the wrong program.
Replace all instances of `"$1"` with `"$SUBJECT_PROGRAM"` in ant command paths:
```diff
-"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
```
This affects lines: 609, 611, 614, 615, 621, 623, 626, 627, 633, 636, 672, 675.
Also applies to: 672-676
📜 Review details
📒 Files selected for processing (7)
scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/experiment-scripts/README.md (1 hunks)
scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
scripts/mutation-evosuite.sh (6 hunks)
scripts/mutation-randoop.sh (5 hunks)
scripts/prerequisites.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-14T03:30:11.765Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
Applied to files:
scripts/defects4j-randoop.sh
scripts/experiment-scripts/generate-grt-figures.py
scripts/mutation-evosuite.sh
scripts/experiment-scripts/README.md
scripts/mutation-randoop.sh
scripts/defects4j-evosuite.sh
📚 Learning: 2025-10-13T23:36:38.701Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.
Applied to files:
- scripts/defects4j-randoop.sh
- scripts/defects4j-evosuite.sh
🧬 Code graph analysis (4)
scripts/defects4j-randoop.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
scripts/mutation-evosuite.sh (1)
scripts/usejdk.sh (1)
usejdk8(6-14)
scripts/mutation-randoop.sh (2)
scripts/mutation-evosuite.sh (1)
require_file(70-75)
scripts/usejdk.sh (1)
usejdk8(6-14)
scripts/defects4j-evosuite.sh (1)
scripts/usejdk.sh (1)
usejdk11(17-25)
🔇 Additional comments (17)
scripts/prerequisites.md (1)
30-30: LGTM: Standardized shell syntax highlighting.
The change from `bash` to `sh` for code fence identifiers is appropriate since these are simple POSIX-compatible commands that work in any shell.
Also applies to: 36-36
scripts/experiment-scripts/generate-grt-figures.py (4)
7-7: LGTM: Clear documentation of Table IV generation.
The documentation correctly describes the new `defect-table4.sh` script and Table IV generation workflow. The clarification that this evaluates "four Defects4J subject programs" is consistent with the past review discussion.
Also applies to: 16-17, 20-20
40-50: LGTM: Correct handling of raw data for defect detection.
The conditional logic appropriately bypasses averaging for `table4`, since defect detection results should preserve per-iteration test outcomes rather than being averaged. This is the correct approach for fault detection experiments.
263-315: LGTM: Well-structured fault detection aggregation.
The `generate_table_4` function correctly implements fault detection logic:
- Marks a fault as detected if ANY test case fails (line 278)
- Aggregates at the bug level using `.any()` (lines 279-283)
- Counts detected faults per configuration (lines 286-291)
- Pivots for tabular display (lines 294-299)
The table rendering follows standard matplotlib patterns and produces a readable output.
349-352: LGTM: Consistent integration of table4 into PDF generation.
The table4 case is correctly integrated into the `save_to_pdf` function, following the same pattern as other figure types.
scripts/mutation-evosuite.sh (3)
65-77: LGTM: Robust prerequisite validation.
The addition of defensive checks for `MAJOR_HOME`, `EVOSUITE_JAR`, and `JACOCO_CLI_JAR` improves script reliability. The `require_file` helper function provides consistent validation with clear error messages.
148-165: LGTM: Thorough validation of -o argument.
The validation ensures that:
- The CSV filename contains no path separators (preventing directory traversal)
- The filename ends with `.csv` (enforcing correct extension)
- Clear error messages guide users to correct usage
430-439: LGTM: Robust file locking for concurrent CSV writes.
The implementation uses `flock` with dedicated file descriptors to prevent interleaved writes when multiple script instances run concurrently. The header is correctly initialized only when the file is empty. This is a solid approach for preventing race conditions.
Also applies to: 586-592
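A minimal sketch of the locking pattern described in this comment, assuming bash and the `flock` utility from util-linux. The helper name `append_row_locked` and its arguments are hypothetical and are not taken from the scripts under review, which may use a separate lock file or different descriptor numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the project's code): append one row to a CSV file,
# writing the header first if the file is still empty.
append_row_locked() {
  local csv_file="$1" header="$2" row="$3"
  (
    flock -x 9                      # block until this process holds the exclusive lock
    if [ ! -s "$csv_file" ]; then   # -s is true only for a non-empty file
      printf '%s\n' "$header" >&9
    fi
    printf '%s\n' "$row" >&9
  ) 9>> "$csv_file"                 # fd 9 is opened in append mode; closing it releases the lock
}
```

Because the emptiness check and both writes happen while the lock is held, concurrent callers should not be able to write a second header or interleave partial rows. Locking the CSV file itself is a simplification chosen for this sketch.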
scripts/experiment-scripts/README.md (1)
48-58: LGTM: Clear examples and helpful documentation notes.
The addition of the Table 4 (defect detection) example and the note about per-script documentation improve the README's usability. Users now have clear guidance for both mutation analysis and defect detection workflows.
scripts/defects4j-evosuite.sh (2)
58-69: LGTM: Comprehensive Defects4J prerequisite validation.
The validation checks ensure that:
- `defects4j` is available on PATH
- Required JAR files exist
- Defects4J's `run_bug_detection.pl` script is executable
Error messages provide clear guidance, including suggesting `make build/defects4j` if components are missing.
210-217: LGTM: Thread-safe CSV writes with flock.
The CSV handling correctly uses `flock` with dedicated file descriptors to prevent race conditions during concurrent writes. The header is appropriately initialized only when the file is empty.
Also applies to: 311-316
scripts/defects4j-randoop.sh (2)
63-82: LGTM: Thorough validation of Randoop and Defects4J prerequisites.
The script validates all required components:
- `defects4j` availability on PATH
- Randoop JAR, Jacoco agents, and ReplaceCall JAR existence
- Defects4J's `run_bug_detection.pl` script executability
Error messages are clear and guide users to resolve issues.
260-267: LGTM: Correct implementation of concurrent-safe CSV writes.
The CSV handling uses `flock` with file descriptors to ensure thread-safe writes when multiple instances run in parallel. The pattern is consistent with other scripts in this PR.
Also applies to: 370-375
scripts/mutation-randoop.sh (4)
70-85: LGTM: Comprehensive validation of Major and JAR prerequisites.
The validation checks for `MAJOR_HOME` directory and required JAR files (Randoop, Jacoco, ReplaceCall) with the `require_file` helper function. Error messages are clear and consistent.
158-165: LGTM: Robust argument validation.
The validation ensures:
- CSV filename contains no path separators
- Filename ends with `.csv`
- Required `SUBJECT_PROGRAM` argument is provided
Error messages are clear and redirect to stderr appropriately.
Also applies to: 212-216
167-196: LGTM: Clean feature flags refactor.The new feature selection mechanism is well-structured:
- Uses an associative array for feature-to-flag mapping
- Defaults to `BASELINE` when no features specified
- Expands features into command-line flags cleanly
This is a significant improvement over the previous implementation.
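The feature-to-flag mapping described in this comment can be sketched as follows, assuming bash 4 or later for associative arrays. Only the `BASELINE` default comes from the review; `FEATURE_A`, `FEATURE_B`, the flag strings, and the function name `expand_features` are made up for illustration and are not the script's real features or Randoop options.

```shell
#!/usr/bin/env bash
# Hypothetical feature names and flags, for illustration only.
declare -A FEATURE_FLAGS=(
  [BASELINE]=""
  [FEATURE_A]="--example-flag-a=true"
  [FEATURE_B]="--example-flag-b=true"
)

# Print the flags for the requested features, one per line.
# With no arguments, fall back to BASELINE (which contributes no flags here).
expand_features() {
  local features=("$@")
  local feature
  if [ "${#features[@]}" -eq 0 ]; then
    features=(BASELINE)
  fi
  for feature in "${features[@]}"; do
    # ${array[key]+set} expands to "set" only when the key exists,
    # even if its value is the empty string.
    if [ -z "${FEATURE_FLAGS[$feature]+set}" ]; then
      echo "error: unknown feature '$feature'. Valid options: ${!FEATURE_FLAGS[*]}" >&2
      return 2
    fi
    if [ -n "${FEATURE_FLAGS[$feature]}" ]; then
      printf '%s\n' "${FEATURE_FLAGS[$feature]}"
    fi
  done
}
```

Keeping the mapping in one table means that adding a feature is a one-line change, and the error message listing valid options stays in sync with the table automatically.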
530-539: LGTM: Robust per-run isolation and thread-safe CSV writes.
The implementation provides:
- Per-run directories (test, coverage, result) isolated by unique FILE_SUFFIX
- Feature-based suffix construction for clear identification
- Thread-safe CSV writes using `flock` with file descriptors
This prevents race conditions when multiple instances run concurrently.
Also applies to: 548-568, 694-700
Actionable comments posted: 8
♻️ Duplicate comments (6)
scripts/defects4j-randoop.sh (1)
91-91: Add validation check for uuidgen availability and document as prerequisite.
The concern is valid. Line 91 calls `uuidgen` without validation, and uuidgen is not universally available: it requires util-linux/uuid-runtime on Linux systems (may need installation) and is built-in on macOS. A validation pattern already exists in the same file at line 64 for `defects4j`, establishing a clear precedent.
Additionally, `uuidgen` is used in four scripts without validation:
- defects4j-evosuite.sh
- defects4j-randoop.sh (line 91)
- mutation-evosuite.sh
- mutation-randoop.sh
Add the suggested validation check to all four scripts and update `scripts/prerequisites.md` to list `uuidgen` (or `uuid-runtime` on Linux) as a required tool.

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found on PATH." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # A unique identifier per instance

scripts/defects4j-evosuite.sh (2)
72-78: Add defensive checks for defs.sh sourcing and usejdk11 call.
These defensive checks were flagged in the previous review. Ensure `scripts/defs.sh` exists before sourcing and verify `usejdk11` succeeds before proceeding.
Apply this diff:

    +[ -f "$SCRIPT_DIR/defs.sh" ] || {
    +  echo "${SCRIPT_NAME}: error: Missing $SCRIPT_DIR/defs.sh." >&2
    +  exit 2
    +}
     . "$SCRIPT_DIR/defs.sh" # Define shell functions.
    -usejdk11
    +usejdk11 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')

87-87: Add explicit uuidgen availability check.
This check was flagged in the previous review. Add an explicit validation before UUID generation.
Apply this diff:

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # A unique identifier per instance

scripts/experiment-scripts/defects4j-table4.sh (1)
58-63: Remove hardcoded temporary test parameters before merge (blocker).
This issue was flagged in the previous review as a blocker. Lines 58–63 override production parameters with incomplete temporary values for testing. Shipping with these hardcoded parameters will silently run incorrect experiments.
Choose one of the following solutions:
Option 1 (preferred): Remove lines 58–65 entirely and use production defaults.

    -# Temporary parameters for testing that override the defaults (GRT has not been finished yet)
    -NUM_LOOP=1
    -TOTAL_SECONDS=(10)
    -PROJECT_IDS=("Lang")
    -TEST_GENERATORS=(BASELINE EVOSUITE)
    -BUG_IDS["Lang"]="1 3"

Option 2: Make configurable via environment variables with fallback to production defaults.

    NUM_LOOP="${GRT_NUM_LOOP:-10}"
    TOTAL_SECONDS=(${GRT_TOTAL_SECONDS:-120 300 600})
    PROJECT_IDS=(${GRT_PROJECT_IDS:-Chart Math Time Lang})
    TEST_GENERATORS=(${GRT_TEST_GENERATORS:-BASELINE GRT EVOSUITE})

scripts/mutation-randoop.sh (2)
74-79: Add error check after `usejdk8` function call.
The `usejdk8` function can fail and return 1 if `JAVA8_HOME` is not set, but the script doesn't check this return code. If `usejdk8` fails, the subsequent Java version detection may use an incorrect installation or fail entirely. This was flagged in the previous review.
Apply this diff:

     . "$SCRIPT_DIR/defs.sh" # Define shell functions.
    -usejdk8
    +usejdk8 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
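The return-versus-exit behavior this comment relies on can be sketched in isolation, assuming bash. The helper `switch_jdk`, the wrapper `run_with_jdk`, and the variable `DEMO_JDK_HOME` are hypothetical stand-ins; the actual `usejdk8` and `usejdk11` definitions in `defs.sh` are not shown in this review and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the project's usejdk8): point JAVA_HOME at the JDK
# named by the given environment variable, or return 1 if that variable is unset.
switch_jdk() {
  local var_name="$1"
  local jdk_home="${!var_name:-}"   # bash indirect expansion
  if [ -z "$jdk_home" ]; then
    echo "switch_jdk: error: $var_name is not set" >&2
    return 1   # return, not exit: the caller decides what happens next
  fi
  export JAVA_HOME="$jdk_home"
  export PATH="$JAVA_HOME/bin:$PATH"
}

# Caller pattern from the review: stop as soon as the switch fails.
run_with_jdk() {
  switch_jdk "$1" || {
    echo "error: failed to switch JDK; set $1 to a JDK installation." >&2
    return 2   # a top-level script would use `exit 2` here
  }
  echo "JAVA_HOME=$JAVA_HOME"
}
```

Without the `|| { ... }` guard, a failed switch would be ignored and the next command would run against whatever `java` happens to be on PATH, which is the failure mode the comment warns about.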
596-596: Use `$SUBJECT_PROGRAM` instead of `$1` in ant command paths.
This issue was flagged in the previous review. The ant buildfile paths reference `"$1"` instead of the already-captured `$SUBJECT_PROGRAM` variable (line 196). Using `$1` is fragile: if the script is refactored later and `$1` is shifted again, these commands will silently reference the wrong program name.
Replace all instances of `"$1"` with `"$SUBJECT_PROGRAM"` in ant command paths and their associated verbose echo lines.
Apply this diff to all affected lines:

    -echo "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
    +echo "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
    -"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
    +"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...

Affected lines: 596, 598, 610, 614, 620, 623, 659, 662.
Also applies to: 598-598, 610-610, 614-614, 620-620, 623-623, 659-659, 662-662
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
- scripts/clean-script-dir.sh (1 hunks)
- scripts/defects4j-evosuite.sh (1 hunks)
- scripts/defects4j-randoop.sh (1 hunks)
- scripts/defs.sh (1 hunks)
- scripts/experiment-scripts/README.md (1 hunks)
- scripts/experiment-scripts/defects4j-table4.sh (1 hunks)
- scripts/experiment-scripts/mutation-fig6-table3.sh (3 hunks)
- scripts/experiment-scripts/mutation-fig7.sh (2 hunks)
- scripts/experiment-scripts/mutation-fig8-9.sh (2 hunks)
- scripts/get-all-subject-src.sh (1 hunks)
- scripts/mutation-evosuite.sh (6 hunks)
- scripts/mutation-randoop.sh (5 hunks)
- scripts/prerequisites.md (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-14T03:30:11.765Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
Applied to files:
- scripts/mutation-evosuite.sh
- scripts/defects4j-randoop.sh
- scripts/prerequisites.md
- scripts/defects4j-evosuite.sh
- scripts/mutation-randoop.sh
- scripts/experiment-scripts/defects4j-table4.sh
- scripts/experiment-scripts/README.md
📚 Learning: 2025-10-13T23:36:38.701Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.
Applied to files:
- scripts/defects4j-randoop.sh
- scripts/defects4j-evosuite.sh
🧬 Code graph analysis (5)
scripts/mutation-evosuite.sh (1)
scripts/defs.sh (4)
require_directory(12-17)
require_file(5-10)
usejdk8(31-39)
require_csv_basename(19-28)
scripts/defects4j-randoop.sh (1)
scripts/defs.sh (3)
require_file(5-10)
usejdk11(42-50)
require_csv_basename(19-28)
scripts/defects4j-evosuite.sh (1)
scripts/defs.sh (2)
usejdk11(42-50)
require_csv_basename(19-28)
scripts/mutation-randoop.sh (1)
scripts/defs.sh (4)
require_directory(12-17)
require_file(5-10)
usejdk8(31-39)
require_csv_basename(19-28)
scripts/experiment-scripts/defects4j-table4.sh (3)
scripts/experiment-scripts/mutation-fig7.sh (1)
run_task(85-98)
scripts/experiment-scripts/mutation-fig8-9.sh (1)
run_task(90-103)
scripts/experiment-scripts/mutation-fig6-table3.sh (1)
run_task(82-100)
🔇 Additional comments (17)
scripts/prerequisites.md (3)
30-30: Code fence language improved for portability.
Changing code fence language from `bash` to `sh` is a good improvement. The `sh` selector is more portable and POSIX-compatible, and the commands shown (export statements, curl, pip) work fine with POSIX shells. This aligns with shell script best practices.
Also applies to: 36-36
76-76: Bullet-point indentation formatting is correct.
The continuation lines now properly include 2-space indentation for list nesting. This ensures markdown renderers correctly display these lines as part of their preceding bullet points rather than as separate paragraphs.
Also applies to: 78-78
63-78: Add guidance on using the usejdk.sh helper script.
The Java Versions Setup section documents environment variables but omits how to actually invoke the JDK switcher. Users who set `JAVA8_HOME` and `JAVA11_HOME` have no clear instruction on how to use these variables in practice. Expand the section to show how to source the `usejdk.sh` helper and call the switcher functions before running scripts:

     This is needed because different scripts require different Java versions:
     * **Defect scripts** (via [Defects4J](https://github.com/rjust/defects4j)) require **Java 11**.
     * **Mutation scripts** (via [Major](https://github.com/rjust/major)) require **Java 8**.
    +
    +Before running any scripts, source the JDK switcher and select the appropriate version:
    +
    +```sh
    +source scripts/usejdk.sh
    +usejdk11 # for defect detection scripts
    +usejdk8 # for mutation analysis scripts
    +```

scripts/experiment-scripts/README.md (4)
9-11: Overview accurately reflects expanded evaluation scope.
The updated text now correctly conveys that the experiment scripts support both mutation analysis and defect detection evaluation methods, aligning with the PR's integration of Defects4J workflows.
13-15: Setup section properly consolidates prerequisites reference.
The section now correctly references `scripts/prerequisites.md` as the single source of truth for environment setup, eliminating duplication. This addresses prior feedback about consolidating prerequisites documentation and reduces maintenance burden.
17-30: Running Scripts section effectively documents both evaluation methods.
The section provides clear examples for both mutation analysis (`./mutation-fig7.sh`) and defect detection (`./defects4j-table4.sh`). The examples assume invocation from the `scripts/experiment-scripts/` directory and follow consistent documentation practices.
38-49: Output section comprehensively describes results structure.
The section properly documents CSV/PDF output locations, result naming conventions, and the important caveat about experiment isolation and preservation. It correctly references both mutation analysis and defect detection workflows.
scripts/clean-script-dir.sh (1)
9-10: ✓ Consistent naming convention applied.
The variable rename from `script_dir` to `SCRIPT_DIR` aligns the script with shell conventions and the broader pattern introduced across the PR.
scripts/get-all-subject-src.sh (1)
8-8: ✓ Improved error message identifiability.
The addition of `SCRIPT_NAME` variable and its use in error messages provides better identification when multiple scripts run concurrently or in logs.
Also applies to: 12-12
scripts/experiment-scripts/mutation-fig6-table3.sh (1)
36-36: ✓ Consistent logging identifier pattern.
The addition of `SCRIPT_NAME` for logging is consistent with the broader refactoring across experiment scripts. Error handling maintains proper exit codes.
Also applies to: 62-62, 97-98
scripts/experiment-scripts/mutation-fig8-9.sh (1)
40-40: ✓ Consistent with related scripts.
The logging identifier pattern matches `mutation-fig6-table3.sh` and `mutation-fig7.sh`. Minor documentation improvement noted.
Also applies to: 45-45, 70-70
scripts/experiment-scripts/mutation-fig7.sh (1)
36-36: ✓ Consistent logging pattern maintained.
The addition of `SCRIPT_NAME` aligns with the refactoring pattern across mutation experiment scripts.
Also applies to: 40-40, 65-65
scripts/mutation-evosuite.sh (1)
415-424: ✓ CSV locking mechanism correctly implemented for concurrent safety.
The flock-based approach for header initialization (lines 415–424) and row appending (lines 572–577) correctly prevents interleaving writes across concurrent processes. The exclusive lock is acquired before reading/writing and released immediately after via file descriptor management.
Also applies to: 572-577
scripts/defs.sh (1)
31-50: Note: usejdk functions use return, not exit.
The `usejdk8` and `usejdk11` functions return 1 on error rather than exiting. This is appropriate for functions (allowing callers to handle the error), but all callers must check the return code with `|| { ... }` patterns. Verify that calling scripts properly handle errors from these functions.
scripts/defects4j-randoop.sh (2)
318-318: ✓ Correct removal of driver test files per Defects4J workflow.
Line 318 correctly removes `RegressionTest.java` and `ErrorTest.java` (the driver/suite files) after test generation. Per the learned behavior, `run_bug_detection.pl` does not expect these files; only the numbered test files (RegressionTest0.java, etc.) remain and are packaged for defect detection. Based on learnings.
Also applies to: 318-318
246-253: ✓ CSV locking mechanism correctly implemented for concurrent safety.
The flock-based approach for header initialization (lines 246–253) and row appending (lines 356–361) prevents concurrent write interleaving. The pattern mirrors the robust approach in `mutation-evosuite.sh`.
Also applies to: 356-361
scripts/mutation-randoop.sh (1)
75-79: Verify Java version check logic.
Line 75 extracts the Java major version; ensure the regex and fallback (`sub("^$", "0", $2)`) correctly handle all JDK 8 version formats (e.g., `1.8.0_392`). The logic appears sound, but confirm the expected behavior matches your testing.
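One way to do the verification this comment asks for is to feed the parsing pipeline sample version strings instead of a live JDK. The sketch below wraps the same two awk stages in a hypothetical function, `java_major_version`, that reads `java -version`-style text on stdin. The ternary is wrapped in an extra pair of parentheses, a defensive change made for this sketch rather than something present in the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the version-parsing pipeline quoted in the review.
# Reads `java -version`-style text on stdin and prints the major version.
java_major_version() {
  # Stage 1: take the quoted version string from the line containing "version".
  # Stage 2: split on "."; legacy "1.x" versions report x, newer ones report
  # the first component. The sub() call fills in "0" when the second component
  # is missing, e.g. for a bare version like "21".
  awk -F '"' '/version/ {print $2}' |
    awk -F '.' '{sub("^$", "0", $2); print (($1 == "1") ? $2 : $1)}'
}
```

For example, `printf '%s\n' 'openjdk version "1.8.0_392"' | java_major_version` exercises the legacy `1.x` branch. The input strings used for such checks are illustrative samples, not captured logs, so vendor-specific formats would still need to be tried against the JDKs actually installed.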
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/mutation-evosuite.sh (1)
86-86: Add validation check for uuidgen availability.
The script uses `uuidgen` without checking if it's installed. Add a validation check consistent with other dependency checks.
Apply this diff:

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # Generate a unique identifier per instance
♻️ Duplicate comments (10)
scripts/mutation-randoop.sh (3)
93-93: Add validation check for uuidgen availability.
The script uses `uuidgen` without checking if it's installed. Add a validation check consistent with other dependency checks.
Apply this diff:

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # Generate a unique identifier per instance

74-79: Add error handling for usejdk8 call.
The `usejdk8` function returns 1 if `JAVA8_HOME` is unset, but line 74 doesn't check the return code. Add explicit error checking to fail early.
Apply this diff:

    -usejdk8
    +usejdk8 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')

586-658: Use `$SUBJECT_PROGRAM` instead of `$1` in ant command paths.
Multiple ant commands reference `"$1"` to access the subject program name, but `$SUBJECT_PROGRAM` was captured at line 196 and should be used consistently throughout the script. Using `$1` is fragile: if the script is refactored and positional parameters change, these references will silently break.
Replace all instances of `"$1"` in ant command paths with `"$SUBJECT_PROGRAM"`. This affects lines: 592, 594, 597-598, 604, 606, 609-610, 616, 619, 623, 655, 658.
Example for line 597:

    -"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
    +"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...

scripts/defects4j-evosuite.sh (2)
74-79: Add error handling for usejdk11 call.
The `usejdk11` function returns 1 if `JAVA11_HOME` is unset, but the script doesn't check the return code. Add explicit error checking.
Apply this diff:

    -usejdk11
    +usejdk11 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')

88-88: Add validation check for uuidgen availability.
The script uses `uuidgen` without checking if it's installed. Add a validation check consistent with other dependency checks.
Apply this diff:

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # A unique identifier per instance

scripts/defects4j-randoop.sh (3)
92-92: Add validation check for uuidgen availability.
The script uses `uuidgen` without checking if it's installed. Add an explicit validation check consistent with other dependency checks in the script.
Apply this diff:

    +command -v uuidgen > /dev/null 2>&1 || {
    +  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
    +  exit 2
    +}
     UUID=$(uuidgen) # A unique identifier per instance

78-83: Add error handling for usejdk11 call.
The `usejdk11` function returns 1 if `JAVA11_HOME` is unset, but the script doesn't check the return code. If the function fails, the subsequent Java version check may use the wrong Java installation or fail unexpectedly.
Apply this diff:

    -usejdk11
    +usejdk11 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')

64-64: Add defensive check before sourcing defs.sh.
The script sources `defs.sh` without verifying it exists. Add a pre-check to fail early with a clear error if the file is missing.
Apply this diff:

    +[ -f "$SCRIPT_DIR/defs.sh" ] || {
    +  echo "$(basename "$0"): error: Missing $SCRIPT_DIR/defs.sh" >&2
    +  exit 2
    +}
     . "$SCRIPT_DIR/defs.sh" # Define shell functions.

scripts/mutation-evosuite.sh (2)
67-72: Add error handling for usejdk8 call.
The `usejdk8` function returns 1 if `JAVA8_HOME` is unset, but the script doesn't check the return code. Add explicit error checking.
Apply this diff:

    -usejdk8
    +usejdk8 || {
    +  echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
    +  exit 2
    +}
     JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')

61-61: Add defensive check before sourcing defs.sh.
The script sources `defs.sh` without verifying it exists. Add a pre-check for consistency with other validations in the script.
Apply this diff:

    +[ -f "$SCRIPT_DIR/defs.sh" ] || {
    +  echo "$(basename "$0"): error: Missing $SCRIPT_DIR/defs.sh" >&2
    +  exit 2
    +}
     . "$SCRIPT_DIR/defs.sh" # Define shell functions.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- scripts/defects4j-evosuite.sh (1 hunks)
- scripts/defects4j-randoop.sh (1 hunks)
- scripts/defs.sh (1 hunks)
- scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
- scripts/mutation-evosuite.sh (6 hunks)
- scripts/mutation-randoop.sh (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-14T03:30:11.765Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.
Applied to files:
- scripts/mutation-evosuite.sh
- scripts/defects4j-evosuite.sh
- scripts/defects4j-randoop.sh
- scripts/mutation-randoop.sh
- scripts/experiment-scripts/generate-grt-figures.py
📚 Learning: 2025-10-13T23:36:38.701Z
Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.
Applied to files:
- scripts/defects4j-evosuite.sh
- scripts/defects4j-randoop.sh
🧬 Code graph analysis (4)
scripts/mutation-evosuite.sh (1)
scripts/defs.sh (5)
require_directory(33-38)
require_file(26-31)
usejdk8(52-60)
require_csv_basename(40-49)
append_csv(5-24)
scripts/defects4j-evosuite.sh (1)
scripts/defs.sh (3)
usejdk11(63-71)
require_csv_basename(40-49)
append_csv(5-24)
scripts/defects4j-randoop.sh (1)
scripts/defs.sh (4)
require_file(26-31)
usejdk11(63-71)
require_csv_basename(40-49)
append_csv(5-24)
scripts/mutation-randoop.sh (1)
scripts/defs.sh (5)
require_directory(33-38)
require_file(26-31)
usejdk8(52-60)
require_csv_basename(40-49)
append_csv(5-24)
🔇 Additional comments (1)
scripts/defs.sh (1)
40-49: LGTM! CSV basename validation is now correct.
The logic inversion issue flagged in previous reviews has been fixed. The function now correctly validates that the filename ends with `.csv` and contains no path separators.
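The two conditions named here can be expressed with a single `case` statement. This is a sketch only: `is_csv_basename` is a hypothetical stand-in, not the `require_csv_basename` function in `defs.sh`, and rejecting a bare `.csv` with no stem is a choice made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the check described above (not the code in defs.sh).
# Succeeds only if the argument is a bare filename ending in ".csv".
is_csv_basename() {
  local name="$1"
  case "$name" in
    */*) return 1 ;;      # contains a path separator
    ?*.csv) return 0 ;;   # non-empty stem followed by .csv
    *) return 1 ;;        # wrong or missing extension
  esac
}
```

A caller would typically wrap this as `is_csv_basename "$OUTPUT_CSV" || { echo "error: ..." >&2; exit 2; }`, where `OUTPUT_CSV` is likewise an assumed variable name.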
@mernst Ready for re-review.
mernst
left a comment
Thanks!
Can I merge this into main?
This PR adds functionality for integrating Defects4J with Randoop and EvoSuite for assessing test efficacy as measured by defect detection.