Defects4j scripts #70

varuniy · 2025-11-03T23:15:24Z

This PR adds functionality for integrating Defects4J with Randoop and EvoSuite for assessing test efficacy as measured by defect detection.

…to defects4j-scripts

coderabbitai · 2025-11-03T23:19:08Z

📝 Walkthrough

Adds defect-detection support and shared shell utilities to the GRT testing framework. New driver scripts: scripts/defects4j-randoop.sh and scripts/defects4j-evosuite.sh. Adds scripts/defs.sh with CSV-append locking, file/directory validators, and JDK switch helpers. Introduces Table IV workflow: scripts/experiment-scripts/defects4j-table4.sh and scripts/experiment-scripts/generate-grt-figures.py changes (new generate_table_4). Makefile gains build/defects4j. README, scripts/prerequisites.md, experiment README, and mutation driver scripts were updated to use the new utilities and to separate mutation-analysis vs defect-detection workflows.

Possibly related PRs

Set and use SCRIPT_NAME variable #85: Introduces the same SCRIPT_NAME variable pattern into multiple experiment scripts (e.g., mutation-fig7.sh, mutation-fig8-9.sh) for consistent logging identifiers.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 64.71%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 813f965 and 5c4ee83.

📒 Files selected for processing (2)

scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-evosuite.sh

🧬 Code graph analysis (2)

scripts/defects4j-randoop.sh (1)

scripts/defs.sh (4)

require_file (27-32)

usejdk11 (64-72)

require_csv_basename (41-50)

append_csv (5-25)

scripts/defects4j-evosuite.sh (1)

scripts/defs.sh (3)

usejdk11 (64-72)

require_csv_basename (41-50)

append_csv (5-25)

🔇 Additional comments (2)

scripts/defects4j-randoop.sh (1)

317-318: Confirm intentional removal of driver test files.

Lines 317–318 remove RegressionTest.java, ErrorTest.java, and the relevant-classes file after test generation. Per learnings from PR #64, this is intentional because run_bug_detection.pl expects only the numbered test files (e.g., RegressionTest0.java). The numbered files remain in TEST_DIR for packaging.

scripts/defects4j-evosuite.sh (1)

236-243: Per-class time budget logic is sound.

Lines 236–243 implement defensive time budget allocation: if SECONDS_PER_CLASS is set, use it directly; otherwise, divide total time by the number of classes with a minimum floor of 1 second. This prevents division-by-zero and ensures every class receives at least 1 second, which is reasonable.
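
The allocation described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the function name `per_class_budget`, its argument order, and the way a zero class count is handled are assumptions.

```sh
# Hypothetical helper: prints the per-class time budget in seconds.
#   $1 = total time budget in seconds
#   $2 = number of classes
#   $3 = optional explicit per-class budget (the role SECONDS_PER_CLASS plays)
per_class_budget() (
  total=$1
  num_classes=$2
  explicit=${3:-}

  if [ -n "$explicit" ]; then
    # An explicit per-class budget is used as-is.
    budget=$explicit
  else
    # Assumption: treat "no classes" as one class to avoid division by zero.
    [ "$num_classes" -ge 1 ] || num_classes=1
    budget=$((total / num_classes))
    # Integer division can round down to 0; enforce the 1-second floor.
    [ "$budget" -ge 1 ] || budget=1
  fi
  echo "$budget"
)
```

For example, `per_class_budget 3 10` yields the 1-second floor, because 3 / 10 rounds down to 0 under integer division.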

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

scripts/experiment-scripts/mutation-fig8-9.sh (1)
39-134: LGTM! Clean refactoring to GRT_TESTING_ROOT.

The variable rename from MUTATION_DIR to GRT_TESTING_ROOT is consistently applied, and the portable CPU detection logic is well-implemented with appropriate fallbacks.

The CPU core detection logic (lines 81-89) is duplicated in mutation-fig6-table3.sh (lines 74-82). Consider extracting this into a shared utility function or sourced script to reduce duplication:

# In scripts/common-utils.sh
# Print the number of cores to use: all available cores minus 4, but at least 1.
get_num_cores() {
  # Keep the working variables local so sourcing this file does not leak globals.
  local nproc_count num_cores
  if command -v nproc > /dev/null 2>&1; then
    nproc_count=$(nproc)
  elif command -v getconf > /dev/null 2>&1; then
    nproc_count=$(getconf _NPROCESSORS_ONLN)
  else
    nproc_count=1
  fi
  num_cores=$((nproc_count - 4))
  if [ "$num_cores" -lt 1 ]; then num_cores=1; fi
  echo "$num_cores"
}

Then source and use it:

. "$SCRIPT_DIR/common-utils.sh"
NUM_CORES=$(get_num_cores)

scripts/experiment-scripts/mutation-fig7.sh (1)

38-42: Python availability check is correct but could be more explicit.

The logic correctly checks for python3 first, then python. However, the error message and exit code (1) differ from other preflight checks in related scripts that use exit code 2. Consider standardizing exit codes for consistency.
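
A preflight check of this shape, with its failure code aligned to the value 2 used by the other checks, might look like the sketch below. The function name `find_python` and the error wording are hypothetical.

```sh
# Hypothetical helper: prints the name of the first available Python interpreter.
# Returns 2 when neither is found, matching the exit code the other
# preflight checks are described as using.
find_python() {
  if command -v python3 > /dev/null 2>&1; then
    echo python3
  elif command -v python > /dev/null 2>&1; then
    echo python
  else
    echo "Error: neither python3 nor python found on PATH." >&2
    return 2
  fi
}
```

A caller could then write `PYTHON=$(find_python) || exit 2`, so that every preflight failure leaves the script with the same code.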

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6acde6f and eaf81e8.

📒 Files selected for processing (15)

README.md (1 hunks)
scripts/Makefile (2 hunks)
scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-prerequisites.md (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/experiment-scripts/README.md (2 hunks)
scripts/experiment-scripts/defects4j-table4.sh (1 hunks)
scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
scripts/experiment-scripts/mutation-fig6-table3.sh (6 hunks)
scripts/experiment-scripts/mutation-fig7.sh (6 hunks)
scripts/experiment-scripts/mutation-fig8-9.sh (6 hunks)
scripts/mutation-evosuite.sh (9 hunks)
scripts/mutation-prerequisites.md (1 hunks)
scripts/mutation-randoop.sh (8 hunks)
scripts/usejdk.sh (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/experiment-scripts/mutation-fig6-table3.sh
scripts/experiment-scripts/README.md
scripts/defects4j-evosuite.sh
scripts/Makefile
scripts/experiment-scripts/generate-grt-figures.py
scripts/defects4j-prerequisites.md
scripts/experiment-scripts/mutation-fig7.sh
scripts/mutation-evosuite.sh
scripts/experiment-scripts/defects4j-table4.sh
scripts/defects4j-randoop.sh
scripts/mutation-randoop.sh
README.md
scripts/experiment-scripts/mutation-fig8-9.sh

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-evosuite.sh
scripts/Makefile
scripts/defects4j-prerequisites.md
scripts/defects4j-randoop.sh

🧬 Code graph analysis (5)

scripts/defects4j-evosuite.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

scripts/mutation-evosuite.sh (1)

scripts/usejdk.sh (1)

usejdk8 (6-14)

scripts/experiment-scripts/defects4j-table4.sh (3)

scripts/experiment-scripts/mutation-fig6-table3.sh (1)

run_task (103-120)

scripts/experiment-scripts/mutation-fig7.sh (1)

run_task (107-120)

scripts/experiment-scripts/mutation-fig8-9.sh (1)

run_task (110-123)

scripts/defects4j-randoop.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

scripts/mutation-randoop.sh (1)

scripts/usejdk.sh (1)

usejdk8 (6-14)

🔇 Additional comments (33)

scripts/experiment-scripts/README.md (1)

8-38: LGTM! Clear documentation structure.

The documentation properly distinguishes between mutation analysis and defect detection workflows, with appropriate references to the new prerequisite files and clear examples for both evaluation methods.

scripts/mutation-prerequisites.md (1)

3-31: LGTM! Well-structured prerequisites.

The Java version setup section properly documents the environment variables needed for the usejdk.sh script, and the step renumbering is consistent throughout.
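
For readers unfamiliar with the pattern, a JDK switcher driven by `JAVA8_HOME` and `JAVA11_HOME` could be sketched as below. This is not the contents of usejdk.sh: the `_sketch` function names and the error text are hypothetical, and only the environment variable names come from the prerequisites.

```sh
# Hypothetical helper: point JAVA_HOME and PATH at the given JDK directory.
#   $1 = JDK directory, $2 = name of the variable it came from (for the message)
# Returns 1 when the directory argument is empty, i.e. the variable was unset.
_use_jdk_sketch() {
  if [ -z "$1" ]; then
    echo "Error: $2 is not set." >&2
    return 1
  fi
  JAVA_HOME=$1
  PATH="$JAVA_HOME/bin:$PATH"
  export JAVA_HOME PATH
}

usejdk8_sketch() { _use_jdk_sketch "${JAVA8_HOME:-}" JAVA8_HOME; }
usejdk11_sketch() { _use_jdk_sketch "${JAVA11_HOME:-}" JAVA11_HOME; }
```

Because the functions change the caller's environment, the file has to be sourced rather than executed, and a caller that wants to stop on a missing variable needs to check the return code.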
scripts/experiment-scripts/generate-grt-figures.py (1)
263-317: LGTM! Defect detection logic is sound.

The function correctly implements the workflow:

Marks a bug as detected if any test fails for it

Counts total faults detected per tool and time budget

Generates a tabular visualization

The aggregation logic using groupby(...).any() followed by sum() appropriately handles the two-level detection criteria.

Consider adding validation for the expected data format:
def generate_table_4(df: pd.DataFrame) -> mpl.figure.Figure:
    """..."""
    # Normalize column names first so stray whitespace does not fail validation
    df.columns = [col.strip() for col in df.columns]

    # Validate expected columns
    required_cols = {'ProjectId', 'Version', 'TimeLimit', 'TestSuiteSource', 'TestClassification'}
    missing = required_cols - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}. Got: {sorted(df.columns)}")
    # ... rest of function
This would catch data format issues early rather than failing with cryptic pandas errors.
scripts/experiment-scripts/mutation-fig6-table3.sh (1)

35-131: LGTM! Consistent refactoring to GRT_TESTING_ROOT.

The changes mirror those in mutation-fig8-9.sh, consistently replacing MUTATION_DIR with GRT_TESTING_ROOT and adding portable CPU detection.

Note: The CPU detection logic duplication between this file and mutation-fig8-9.sh has already been flagged for potential refactoring in that file's review.

scripts/experiment-scripts/mutation-fig7.sh (2)

36-36: Use GRT_TESTING_ROOT consistently throughout script.

The variable is correctly initialized but should be exported or passed consistently. Line 97 correctly passes it to run_task, and line 131 correctly uses it. Looks good.

78-87: Robust CPU core detection with sensible defaults.

The multi-stage approach (nproc → getconf → default 1) with max(NPROC-4, 1) lower bound is appropriate for parallel execution safety. Line 87 correctly ensures NUM_CORES never goes below 1.

README.md (2)

12-19: Clear evaluation methods documentation.

The distinction between mutation analysis and defect detection is well articulated. This provides good context for users on what each evaluation method measures.

25-40: Setup and script references are logically organized.

The split between mutation prerequisites and defect detection prerequisites improves navigation. References to scripts align with the files introduced in this PR.

scripts/defects4j-evosuite.sh (5)

72-78: Java 11 validation is correct.

The script correctly sources usejdk.sh, calls usejdk11, and validates the version. The version extraction (line 74) handles both Java 8 and 11+ formats correctly.
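
To make the two version formats concrete, here is a small sketch of the extraction as a filter. The wrapper name `java_major_version` is hypothetical. The awk pipeline follows the one the scripts use, with extra parentheses around the conditional expression for portability across awk implementations.

```sh
# Hypothetical helper: reads the text of `java -version` on stdin and prints
# the major version, e.g. 8 for "1.8.0_292" and 11 for "11.0.2".
java_major_version() {
  # First awk: take the quoted version string from the line containing "version".
  # Second awk: legacy "1.x" versions report x; newer versions report the first field.
  awk -F '"' '/version/ {print $2}' |
    awk -F '.' '{print (($1 == "1") ? $2 : $1)}'
}
```

In a script this would be used as `JAVA_VER=$(java -version 2>&1 | java_major_version)`, since `java -version` writes to stderr.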

210-218: CSV file creation with proper locking pattern.

The flock usage prevents concurrent header writes to the results CSV. The check for empty file (-s) ensures header is only written once. Pattern is consistent with mutation scripts.
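
The locking pattern described here can be illustrated with the sketch below. It is not the script's code: the function name `write_csv_header_once`, the choice of file descriptor 9, locking the CSV itself rather than a separate lock file, and the fallback when `flock` is not installed are all assumptions.

```sh
# Hypothetical helper: write a CSV header exactly once.
#   $1 = CSV path, $2 = header line
write_csv_header_once() (
  csv=$1
  header=$2
  mkdir -p "$(dirname "$csv")"

  # Open the CSV on file descriptor 9 in append mode; the lock is tied to this fd.
  exec 9>> "$csv"

  # Assumption: fall back to an unlocked write when flock is unavailable.
  if command -v flock > /dev/null 2>&1; then
    flock 9
  fi

  # -s is true for a non-empty file, so only the first writer adds the header.
  if [ ! -s "$csv" ]; then
    echo "$header" >&9
  fi
  # Leaving the subshell closes fd 9, which releases the lock.
)
```

Checking `-s` only after the lock is held is what matters: a process that tested the file before acquiring the lock could still see it as empty and write a second header.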

244-251: Time budget allocation logic is sound.

If per-class time is provided, it's used as-is. If total time is specified, it's divided by number of classes with a 1-second minimum. This matches the mutation-evosuite.sh pattern.

289-297: Test cleanup after tar packaging is appropriate.

Per the retrieved learning, removing RegressionTest.java and ErrorTest.java (driver files) after test generation is intentional; only numbered test files are packaged. The tar-with-error-handling pattern correctly ignores tar exit code 1.

312-317: Results appending with file locking is correct.

Using flock on a dedicated fd ensures atomic writes. The tr -d '\r' removes any Windows line endings, tail -n +2 skips header, and awk appends the time limit. Pattern is identical to mutation scripts.
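
The transformation itself, leaving the locking aside, can be sketched as a three-stage pipeline. The function name `append_results` and its argument order are hypothetical.

```sh
# Hypothetical helper: normalize a tool's CSV output and append it to the results.
#   $1 = input CSV (with header), $2 = results CSV, $3 = time limit in seconds
append_results() {
  # Stage 1: drop Windows carriage returns.
  # Stage 2: skip the header row.
  # Stage 3: append the time limit as a trailing column.
  tr -d '\r' < "$1" |
    tail -n +2 |
    awk -v t="$3" '{print $0 "," t}' >> "$2"
}
```

Stripping `\r` before the awk stage keeps the appended column on the same visual line. Otherwise a carriage return would sit between the last original field and the new one.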

scripts/mutation-evosuite.sh (5)

64-75: Comprehensive preflight checks with consistent exit codes.

All dependency checks use exit code 2, consistent with defects4j scripts. The order (directories then files then executables) is logical.

77-83: Java 8 enforcement is correct and aligned with mutation workflow.

Sourcing usejdk.sh and calling usejdk8 ensures Java 8 is used. The version extraction handles both formats. Error message correctly indicates setting JAVA8_HOME.

142-153: Strict validation for output CSV parameter.

Requires filename (no paths), must end with .csv, and -o is mandatory. These constraints prevent accidental path traversal and ensure results stay in results/ directory. Good security posture.
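
A validator enforcing these three constraints might look like the following sketch. The function name `check_csv_basename` and the first two error messages are hypothetical, so this should not be read as the script's own implementation.

```sh
# Hypothetical validator: accept only a bare filename ending in .csv.
# Returns 2 on failure, matching the exit code used by the preflight checks.
check_csv_basename() {
  case "$1" in
    "")
      echo "Error: -o is required." >&2
      return 2 ;;
    */*)
      # Any slash means a path was given, which could escape results/.
      echo "Error: -o must be a filename, not a path." >&2
      return 2 ;;
    *.csv)
      return 0 ;;
    *)
      echo "Error: -o must end with .csv" >&2
      return 2 ;;
  esac
}
```

The order of the patterns matters: `../out.csv` also matches `*.csv`, so the slash check has to come first.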

428-437: CSV header creation with proper concurrency protection.

The mkdir, flock, and -s check pattern is consistent with defects4j scripts. Ensures header is written exactly once even with concurrent invocations.

479-486: Per-iteration EVOSUITE_COMMAND construction is clean.

Building the command from EVOSUITE_BASE_COMMAND and adding per-iteration parameters (-Dtest_dir, -Dreport_dir) avoids repetition and makes maintenance easier.

scripts/defects4j-randoop.sh (5)

85-91: Java 11 requirement is correct for Defects4J.

Defects4J requires Java 11; this enforcement via usejdk.sh is appropriate. Error message is clear.

191-214: Feature flag mapping and expansion is robust.

The associative array (lines 191-200) maps features to Randoop flags. The expansion loop (202-214) validates each feature exists before using it. Unknown features cause exit with helpful error message listing valid options.
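
The validate-then-expand flow can be sketched as below. To stay runnable under plain POSIX sh, this variant uses a `case` statement in place of the associative array the script uses. The flag strings are placeholders rather than real Randoop options, and the `+`-separated input format is an assumption.

```sh
# Hypothetical mapping from feature names to generator flags.
# The flag strings are placeholders, not real Randoop options.
feature_to_flag() {
  case "$1" in
    BASELINE) printf '%s\n' "" ;;
    BLOODHOUND) printf '%s\n' "--placeholder-bloodhound" ;;
    ORIENTEERING) printf '%s\n' "--placeholder-orienteering" ;;
    *) return 1 ;;
  esac
}

# Expand a '+'-separated feature list such as "BLOODHOUND+ORIENTEERING".
# Prints the combined flags; exits the subshell with 2 on an unknown feature.
expand_features() (
  set -f   # disable globbing while splitting the list
  IFS=+
  flags=""
  for feature in $1; do
    if ! flag=$(feature_to_flag "$feature"); then
      echo "Error: unknown feature '$feature'. Valid: BASELINE, BLOODHOUND, ORIENTEERING." >&2
      exit 2
    fi
    if [ -n "$flag" ]; then
      flags="${flags:+$flags }$flag"
    fi
  done
  echo "$flags"
)
```

Validating every feature before any flag is used means a typo fails fast, instead of silently running the baseline configuration.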

294-298: Time budget calculation correctly uses total time as-is.

When -t is specified, TIME_LIMIT is set to TOTAL_TIME directly (not divided by NUM_CLASSES like EvoSuite). This is correct because Randoop's --time-limit is global, not per-class.

332-334: Test cleanup follows intentional pattern from prior PR.

Removing RegressionTest.java and ErrorTest.java (driver files) aligns with the retrieved learning; only numbered test files are packaged for defect detection.

340-345: Tarball naming logic distinguishes BASELINE from feature combinations.

When features are provided (non-BASELINE), tar suffix is "grt"; otherwise "randoop". This aids in results interpretation and file organization.

scripts/experiment-scripts/defects4j-table4.sh (3)

85-92: get_bug_ids() function provides flexible bug ID resolution.

If BUG_IDS array has an entry for the project, uses it; otherwise queries Defects4J. This allows both hardcoded (for testing) and dynamic (for production) bug ID sources.

111-129: run_task() function correctly dispatches to appropriate test generators.

The three cases (EVOSUITE, GRT, BASELINE) invoke the correct scripts with appropriate feature flags. GRT case correctly expands all features. Error case is clear.

94-105: Task generation loop structure produces comprehensive test matrix.

Iterates over time budgets, projects, bug IDs, test generators, and iterations. Each combination produces one task argument set. This ensures all combinations are tested.
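
The shape of that matrix can be illustrated with nested loops. Everything concrete in this sketch is hypothetical: the budgets, the sample projects, the iteration count, and the `bug_ids_for` stand-in that replaces the real bug ID lookup.

```sh
# Hypothetical experiment parameters; the real script defines its own values.
TIME_BUDGETS="120 300"
PROJECTS="Lang Math"
GENERATORS="EVOSUITE GRT BASELINE"
NUM_ITERATIONS=2

# Stand-in for the bug ID lookup: a fixed list instead of a Defects4J query.
bug_ids_for() {
  echo "1 2"
}

# Print one task per line: time budget, project, bug ID, generator, iteration.
generate_tasks() {
  for budget in $TIME_BUDGETS; do
    for project in $PROJECTS; do
      for bug in $(bug_ids_for "$project"); do
        for generator in $GENERATORS; do
          iteration=1
          while [ "$iteration" -le "$NUM_ITERATIONS" ]; do
            echo "$budget $project $bug $generator $iteration"
            iteration=$((iteration + 1))
          done
        done
      done
    done
  done
}
```

With these sample values the matrix has 2 × 2 × 2 × 3 × 2 = 48 tasks, which shows how quickly the task count grows as any one dimension is extended.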

scripts/mutation-randoop.sh (7)

69-88: Comprehensive preflight checks with consistent exit codes.

All dependency checks use exit code 2. Order is logical: directories first, then files. This ensures all required tools are available before script proceeds.

171-200: Feature flag expansion is robust and well-validated.

The associative array (lines 177-186) maps feature names to Randoop flags. The expansion loop (lines 188-200) validates each feature and builds EXPANDED_FEATURE_FLAGS. Unknown features produce helpful error message with all valid options.

534-543: CSV header creation follows concurrency-safe pattern.

The mkdir, flock, and -s check ensures the header is written exactly once despite concurrent invocations. Consistent with mutation-evosuite.sh and defects4j scripts.

553-557: Per-iteration file suffix includes feature set for clarity.

Using FEATURE_SUFFIX (e.g., "BASELINE+BLOODHOUND") instead of generic identifier makes result directories self-documenting and aids debugging. Good practice.

663-670: Special case handling for hamcrest-core-1.3 test runner conversion.

Comment (lines 656-662) explains why: Randoop-generated tests may not work with Major's mutation instrumentation. Python script converts them to EvoSuite format for proper isolation. This is a known limitation being worked around appropriately.

697-697: Results row format includes feature set instead of tool name.

Line 697 uses FEATURE_SUFFIX (e.g., "BASELINE" or "BLOODHOUND+ORIENTEERING") as first column, which differs from mutation-evosuite.sh that uses "EVOSUITE". The CSV header (line 540) says "Version", so this appears intentional: "Version" now represents the feature configuration rather than tool type.

Verify this format matches expectations from generate-grt-figures.py; if figures consume these CSVs and expect different format, this could cause issues.

Can you verify that the CSV output format (first column = FEATURE_SUFFIX) is compatible with generate-grt-figures.py expectations?

Also applies to: 699-704

715-715: Result directory cleanup via cd back to SCRIPT_DIR.

Line 715 changes back to SCRIPT_DIR after per-iteration work. This prevents relative path issues and ensures clean state for next iteration.

scripts/defects4j-prerequisites.md

scripts/Makefile

scripts/usejdk.sh

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/mutation-evosuite.sh (1)
95-95: Add validation check for uuidgen availability.

The script generates a UUID at line 95 without checking if uuidgen is available. Add an explicit check:
+ command -v uuidgen > /dev/null 2>&1 || {
+   echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+   exit 2
+ }
  UUID=$(uuidgen) # Generate a unique identifier per instance

♻️ Duplicate comments (7)

scripts/defects4j-evosuite.sh (2)
87-87: Add validation check for uuidgen availability.

The script generates a UUID at line 87 without checking if uuidgen is available. Add an explicit validation check:
+ command -v uuidgen > /dev/null 2>&1 || {
+   echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+   exit 2
+ }
  UUID=$(uuidgen) # A unique identifier per instance
158-161: Add stderr redirection to error message.

Error messages should be redirected to stderr for consistency. Line 159 is missing the >&2 redirection:
  [[ "$RESULTS_CSV" == *.csv ]] || {
-   echo "Error: -o must end with .csv"
+   echo "Error: -o must end with .csv" >&2
    exit 2
  }
scripts/mutation-evosuite.sh (2)
586-586: Use the $Generator variable instead of hardcoding "EVOSUITE" in CSV output.

Line 586 hardcodes "EVOSUITE" in the CSV row instead of using the $Generator variable defined at line 59. This breaks the multi-generator abstraction:
- row="EVOSUITE,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
+ row="$Generator,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
80-86: Add defensive checks for usejdk.sh sourcing and usejdk8 call.

Line 80 sources usejdk.sh without verifying existence. Line 82 calls usejdk8 without checking its return code, so if it fails (e.g., JAVA8_HOME unset), the subsequent Java version check may use an incorrect installation. Add file check and error handling:
+ [ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+   echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+   exit 2
+ }
  . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
- usejdk8
+ usejdk8 || {
+   echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+   exit 2
+ }
scripts/defects4j-randoop.sh (1)
100-100: Add validation check for uuidgen availability.

The script uses uuidgen without checking if it's installed. Although set -e will catch command failures, add an explicit validation check matching the pattern used for defects4j and JAR files (lines 64–83):
+ command -v uuidgen > /dev/null 2>&1 || {
+   echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+   exit 2
+ }
  UUID=$(uuidgen) # A unique identifier per instance
scripts/mutation-randoop.sh (2)
87-93: Add defensive checks for usejdk.sh sourcing and usejdk8 call.

Line 87 sources usejdk.sh without verifying existence. Line 88 calls usejdk8 without checking its return code, so if it fails (e.g., JAVA8_HOME unset), the Java version check may use an incorrect installation. Add file existence check and error handling:
+ [ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+   echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+   exit 2
+ }
  . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
- usejdk8
+ usejdk8 || {
+   echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+   exit 2
+ }
604-676: Replace all instances of $1 with $SUBJECT_PROGRAM in ant buildfile paths (lines 610–676).

Verification confirms all 12 instances of "$1" in buildfile paths remain unfixed at lines 610, 612, 615, 616, 622, 624, 627, 628, 634, 637, 673, and 676. Update all occurrences:
- "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+ "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
Apply to both direct ant invocations and verbose echo commands throughout the coverage/mutation analysis section.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b443322 and 75a4aa6.

📒 Files selected for processing (4)

scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/mutation-evosuite.sh (9 hunks)
scripts/mutation-randoop.sh (5 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/mutation-evosuite.sh
scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh
scripts/mutation-randoop.sh

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh

🧬 Code graph analysis (4)

scripts/mutation-evosuite.sh (2)

scripts/mutation-randoop.sh (1)

require_file (76-81)

scripts/usejdk.sh (1)

usejdk8 (6-14)

scripts/defects4j-randoop.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

scripts/defects4j-evosuite.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

scripts/mutation-randoop.sh (2)

scripts/mutation-evosuite.sh (1)

require_file (71-76)

scripts/usejdk.sh (1)

usejdk8 (6-14)

🔇 Additional comments (5)

scripts/defects4j-randoop.sh (2)

232-241: Approve feature handling and directory structure.

The feature flag mapping (lines 191–200), feature validation (lines 202–214), and directory setup with per-iteration UUID-based isolation (lines 232–241) are well-structured. The use of unique FILE_SUFFIX across per-iteration directories prevents concurrent execution issues.

332-334: Confirm: RegressionTest.java and ErrorTest.java removal is intentional.

Lines 332–333 intentionally remove the driver test files (RegressionTest.java and ErrorTest.java). Per the retrieved learning, run_bug_detection.pl does not expect these files; the numbered test files (e.g., RegressionTest0.java) remain and are packaged correctly.

scripts/defects4j-evosuite.sh (1)

244-251: Approve time budget allocation with sensible minimum cap.

The time budget calculation (lines 244–251) correctly handles -c (per-class) budgeting with a minimum 1-second floor when -t (total time) is specified. This prevents excessively small per-class budgets on projects with many classes.

scripts/mutation-randoop.sh (2)

168-197: Approve feature flag handling and validation.

The feature parsing (lines 168–172), feature-to-flag mapping (lines 174–183), and validation with informative error messages (lines 185–197) are well-designed. The BASELINE default is correctly handled with an empty flag string.

531-540: Approve CSV header initialization and file locking.

The CSV header initialization (lines 531–540) correctly uses exclusive file locking via flock on a dedicated file descriptor to prevent race conditions across concurrent iterations. This pattern ensures atomic writes.

scripts/defects4j-evosuite.sh

scripts/defects4j-randoop.sh

scripts/mutation-randoop.sh

scripts/experiment-scripts/README.md

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (11)

scripts/prerequisites.md (1)
63-78: Document how to invoke the JDK switcher.

The environment variables JAVA8_HOME and JAVA11_HOME are documented, but users need clear guidance on how to source usejdk.sh and call usejdk11() or usejdk8() before running scripts. Add a usage example:
 ```sh
 export JAVA8_HOME=/path/to/your/java8
 export JAVA11_HOME=/path/to/your/java11
 ```
+Then, before running scripts, source the JDK switcher and select the appropriate version:
+
+```sh
+source scripts/usejdk.sh
+usejdk11 # for defect detection scripts
+usejdk8 # for mutation analysis scripts
+```
scripts/mutation-evosuite.sh (2)
79-85: Add defensive checks for usejdk.sh sourcing and usejdk8 call.

The script sources usejdk.sh without verifying its existence and calls usejdk8 without checking the return code. If usejdk.sh is missing or JAVA8_HOME is not set, the script will fail with unclear errors.

Add checks before and after:
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
585-585: Use the $Generator variable instead of hardcoding "EVOSUITE" in CSV output.

The script defines Generator=EvoSuite at line 58 for abstraction, but line 585 hardcodes "EVOSUITE". This breaks consistency and makes the script harder to maintain.
-row="EVOSUITE,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
+row="$Generator,$(basename "$SRC_JAR"),$LOGGED_TIME,0,$instruction_coverage,$branch_coverage,$mutation_score"
scripts/defects4j-evosuite.sh (3)
71-77: Add defensive checks for usejdk.sh sourcing and usejdk11 call.

The script sources usejdk.sh without verifying its existence (line 71) and calls usejdk11 without checking the return code (line 72). Add validation:
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk11
+usejdk11 || {
+  echo "Error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
86-86: Add validation check for uuidgen availability.

The script calls uuidgen without checking if it's installed. Add an explicit check:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
157-159: Add stderr redirection to error message.

Line 158 should redirect the error message to stderr for consistency with other error handling in the script:
 [[ "$RESULTS_CSV" == *.csv ]] || {
-  echo "Error: -o must end with .csv"
+  echo "Error: -o must end with .csv" >&2
   exit 2
 }
scripts/defects4j-randoop.sh (2)
84-90: Add defensive checks for usejdk.sh sourcing and usejdk11 call.

The script sources usejdk.sh (line 84) without verifying existence and calls usejdk11 (line 85) without checking the return code. Add validation:
+[ -f "$SCRIPT_DIR/usejdk.sh" ] || {
+  echo "Error: Missing $SCRIPT_DIR/usejdk.sh." >&2
+  exit 2
+}
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk11
+usejdk11 || {
+  echo "Error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
99-99: Add validation check for uuidgen availability.

The script calls uuidgen without checking if it's installed. Add an explicit validation check:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
scripts/mutation-randoop.sh (3)
86-92: Add error check after usejdk8 function call.

The usejdk8 function can return 1 if JAVA8_HOME is not set, but the script doesn't check this return code. Add validation:
 . "$SCRIPT_DIR/usejdk.sh" # Source the usejdk.sh script to enable JDK switching
-usejdk8
+usejdk8 || {
+  echo "Error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
101-101: Add validation check for uuidgen availability.

The script calls uuidgen without verifying it's installed. Add an explicit check:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "Error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # Generate a unique identifier per instance
609-636: Use $SUBJECT_PROGRAM instead of $1 in ant command paths.

The ant buildfile paths reference "$1" to access the subject program name, but $SUBJECT_PROGRAM was already captured at line 210. Using $1 is fragile—if the script is refactored and $1 is shifted, these commands will silently reference the wrong program.

Replace all instances of "$1" with "$SUBJECT_PROGRAM" in ant command paths:
-  "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+  "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
This affects lines: 609, 611, 614, 615, 621, 623, 626, 627, 633, 636, 672, 675.

Also applies to: 672-676

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75a4aa6 and a9ce322.

📒 Files selected for processing (7)

scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/experiment-scripts/README.md (1 hunks)
scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
scripts/mutation-evosuite.sh (6 hunks)
scripts/mutation-randoop.sh (5 hunks)
scripts/prerequisites.md (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/defects4j-randoop.sh
scripts/experiment-scripts/generate-grt-figures.py
scripts/mutation-evosuite.sh
scripts/experiment-scripts/README.md
scripts/mutation-randoop.sh
scripts/defects4j-evosuite.sh

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh

🧬 Code graph analysis (4)

scripts/defects4j-randoop.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

scripts/mutation-evosuite.sh (1)

scripts/usejdk.sh (1)

usejdk8 (6-14)

scripts/mutation-randoop.sh (2)

scripts/mutation-evosuite.sh (1)

require_file (70-75)

scripts/usejdk.sh (1)

usejdk8 (6-14)

scripts/defects4j-evosuite.sh (1)

scripts/usejdk.sh (1)

usejdk11 (17-25)

🔇 Additional comments (17)

scripts/prerequisites.md (1)

30-30: LGTM: Standardized shell syntax highlighting.

The change from bash to sh for code fence identifiers is appropriate since these are simple POSIX-compatible commands that work in any shell.

Also applies to: 36-36

scripts/experiment-scripts/generate-grt-figures.py (4)

7-7: LGTM: Clear documentation of Table IV generation.

The documentation correctly describes the new defect-table4.sh script and Table IV generation workflow. The clarification that this evaluates "four Defects4J subject programs" is consistent with the past review discussion.

Also applies to: 16-17, 20-20

40-50: LGTM: Correct handling of raw data for defect detection.

The conditional logic appropriately bypasses averaging for table4, since defect detection results should preserve per-iteration test outcomes rather than being averaged. This is the correct approach for fault detection experiments.

263-315: LGTM: Well-structured fault detection aggregation.

The generate_table_4 function correctly implements fault detection logic:

Marks a fault as detected if ANY test case fails (line 278)

Aggregates at the bug level using .any() (lines 279-283)

Counts detected faults per configuration (lines 286-291)

Pivots for tabular display (lines 294-299)

The table rendering follows standard matplotlib patterns and produces a readable output.
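
The any-failure aggregation described above can also be sketched outside pandas. This is a minimal awk illustration, not the code in generate-grt-figures.py. The function name `count_detected_faults`, the column layout (`Config,ProjectId,BugId,Outcome`), and the outcome value `fail` are assumptions made for the example.

```sh
# Hypothetical sketch: count detected faults per configuration.
# Assumed CSV layout: Config,ProjectId,BugId,Outcome (one row per test run).
count_detected_faults() {
  awk -F ',' '
    NR == 1 { next }   # skip the header row
    # A bug counts as detected if ANY row for it reports a failure.
    $4 == "fail" { detected[$1 SUBSEP $2 SUBSEP $3] = 1 }
    END {
      # Collapse to one count per configuration.
      for (key in detected) {
        split(key, part, SUBSEP)
        count[part[1]]++
      }
      for (config in count) print config "," count[config]
    }
  ' "$1" | sort
}
```

In this sketch, a configuration that detects no faults produces no output row, so a caller that needs explicit zeros would have to add them.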

349-352: LGTM: Consistent integration of table4 into PDF generation.

The table4 case is correctly integrated into the save_to_pdf function, following the same pattern as other figure types.

scripts/mutation-evosuite.sh (3)

65-77: LGTM: Robust prerequisite validation.

The addition of defensive checks for MAJOR_HOME, EVOSUITE_JAR, and JACOCO_CLI_JAR improves script reliability. The require_file helper function provides consistent validation with clear error messages.

148-165: LGTM: Thorough validation of -o argument.

The validation ensures that:

The CSV filename contains no path separators (preventing directory traversal)

The filename ends with .csv (enforcing correct extension)

Clear error messages guide users to correct usage
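
The two checks listed above can be sketched as a single `case` statement. This is an illustrative helper only: the name `check_csv_basename` and the exact messages are assumptions, and the script's own validation may be written differently.

```sh
# Hypothetical sketch: accept a bare filename ending in .csv, reject anything else.
check_csv_basename() {
  case "$1" in
    */*)
      # Any slash means the caller passed a path rather than a filename.
      echo "error: -o expects a filename, not a path: $1" >&2
      return 2
      ;;
    *.csv)
      return 0
      ;;
    *)
      echo "error: -o must end with .csv: $1" >&2
      return 2
      ;;
  esac
}
```

Returning a status instead of exiting leaves the decision to the caller, for example `check_csv_basename "$RESULTS_CSV" || exit 2`.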

430-439: LGTM: Robust file locking for concurrent CSV writes.

The implementation uses flock with dedicated file descriptors to prevent interleaved writes when multiple script instances run concurrently. The header is correctly initialized only when the file is empty. This is a solid approach for preventing race conditions.

Also applies to: 586-592
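
For readers unfamiliar with the pattern, a locked append of this kind might look roughly like the sketch below. It assumes `flock(1)` from util-linux is installed. The function name `append_row_locked` and the choice of file descriptor 9 are illustrative and not taken from the scripts.

```sh
# Hypothetical sketch: append one CSV row under an exclusive lock.
append_row_locked() {
  csv_file=$1
  header=$2
  row=$3
  (
    # Block until this process holds the lock on descriptor 9.
    flock 9 || exit 1
    # Write the header only if the file is still empty.
    [ -s "$csv_file" ] || printf '%s\n' "$header" >&9
    printf '%s\n' "$row" >&9
  ) 9>> "$csv_file"   # the lock is released when the subshell closes fd 9
}
```

Checking for an empty file after the lock is acquired is what keeps two concurrent first writers from both emitting a header.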

scripts/experiment-scripts/README.md (1)

48-58: LGTM: Clear examples and helpful documentation notes.

The addition of the Table 4 (defect detection) example and the note about per-script documentation improve the README's usability. Users now have clear guidance for both mutation analysis and defect detection workflows.

scripts/defects4j-evosuite.sh (2)

58-69: LGTM: Comprehensive Defects4J prerequisite validation.

The validation checks ensure that:

defects4j is available on PATH

Required JAR files exist

Defects4J's run_bug_detection.pl script is executable

Error messages provide clear guidance, including suggesting make build/defects4j if components are missing.

210-217: LGTM: Concurrency-safe CSV writes with flock.

The CSV handling correctly uses flock with dedicated file descriptors to prevent race conditions during concurrent writes. The header is appropriately initialized only when the file is empty.

Also applies to: 311-316

scripts/defects4j-randoop.sh (2)

63-82: LGTM: Thorough validation of Randoop and Defects4J prerequisites.

The script validates all required components:

defects4j availability on PATH

Randoop JAR, Jacoco agents, and ReplaceCall JAR existence

Defects4J's run_bug_detection.pl script executability

Error messages are clear and guide users to resolve issues.

260-267: LGTM: Correct implementation of concurrent-safe CSV writes.

The CSV handling uses flock with file descriptors to serialize writes when multiple script instances run in parallel. The pattern is consistent with other scripts in this PR.

Also applies to: 370-375

scripts/mutation-randoop.sh (4)

70-85: LGTM: Comprehensive validation of Major and JAR prerequisites.

The validation checks for MAJOR_HOME directory and required JAR files (Randoop, Jacoco, ReplaceCall) with the require_file helper function. Error messages are clear and consistent.

158-165: LGTM: Robust argument validation.

The validation ensures:

CSV filename contains no path separators

Filename ends with .csv

Required SUBJECT_PROGRAM argument is provided

Error messages are clear and redirect to stderr appropriately.

Also applies to: 212-216

167-196: LGTM: Clean feature flags refactor.

The new feature selection mechanism is well-structured:

Uses an associative array for feature-to-flag mapping

Defaults to BASELINE when no features specified

Validates features and provides helpful error messages listing valid options

Expands features into command-line flags cleanly

This is a significant improvement over the previous implementation.
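
A feature-to-flag mapping of the kind described above can be sketched as follows. This requires bash 4 or later for associative arrays. The feature names `FEATURE_A` and `FEATURE_B`, the flag strings, and the function name `expand_features` are placeholders invented for the example; they are not real Randoop options or names from the script.

```bash
# Hypothetical sketch: map feature names to command-line flags (bash 4+).
declare -A FEATURE_FLAGS=(
  [BASELINE]=""                              # no extra flags
  [FEATURE_A]="--hypothetical-flag-a=true"   # placeholder flag
  [FEATURE_B]="--hypothetical-flag-b=true"   # placeholder flag
)

expand_features() {
  local -a flags=()
  local feature
  # Default to BASELINE when no features are given.
  [ "$#" -gt 0 ] || set -- BASELINE
  for feature in "$@"; do
    # ${array[key]+set} distinguishes a missing key from an empty value.
    if [ -z "${FEATURE_FLAGS[$feature]+set}" ]; then
      echo "error: unknown feature '$feature'; valid: ${!FEATURE_FLAGS[*]}" >&2
      return 2
    fi
    if [ -n "${FEATURE_FLAGS[$feature]}" ]; then
      flags+=("${FEATURE_FLAGS[$feature]}")
    fi
  done
  printf '%s\n' "${flags[*]}"
}
```

Keeping the mapping in one table means that adding a feature is a one-line change, and the list of valid names in the error message stays in sync automatically.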

530-539: LGTM: Robust per-run isolation and concurrency-safe CSV writes.

The implementation provides:

Per-run directories (test, coverage, result) isolated by unique FILE_SUFFIX

Feature-based suffix construction for clear identification

Concurrency-safe CSV writes using flock with file descriptors

This prevents race conditions when multiple instances run concurrently.

Also applies to: 548-568, 694-700


coderabbitai

Actionable comments posted: 8

♻️ Duplicate comments (6)

scripts/defects4j-randoop.sh (1)
91-91: Add validation check for uuidgen availability and document as prerequisite.

The concern is valid. Line 91 calls uuidgen without validation, and uuidgen is not universally available—it requires util-linux/uuid-runtime on Linux systems (may need installation) and is built-in on macOS. A validation pattern already exists in the same file at line 64 for defects4j, establishing a clear precedent.

Additionally, uuidgen is used in four scripts without validation:

defects4j-evosuite.sh

defects4j-randoop.sh (line 91)

mutation-evosuite.sh

mutation-randoop.sh

Add the suggested validation check to all four scripts and update scripts/prerequisites.md to list uuidgen (or uuid-runtime on Linux) as a required tool.
+ command -v uuidgen > /dev/null 2>&1 || {
+   echo "${SCRIPT_NAME}: error: uuidgen not found on PATH." >&2
+   exit 2
+ }
  UUID=$(uuidgen) # A unique identifier per instance
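
Because the same check is requested in four scripts, one option is a shared helper. The sketch below is only a suggestion: the function name `require_command` is hypothetical, and placing it in a common file sourced by all four scripts is an assumption rather than something the PR already does.

```sh
# Hypothetical sketch: fail with a clear message if a command is not on PATH.
require_command() {
  command -v "$1" > /dev/null 2>&1 || {
    echo "${SCRIPT_NAME:-script}: error: $1 not found on PATH." >&2
    return 2
  }
}
```

Each script could then call `require_command uuidgen || exit 2` before generating the UUID.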
scripts/defects4j-evosuite.sh (2)
72-78: Add defensive checks for defs.sh sourcing and usejdk11 call.

These defensive checks were flagged in the previous review. Ensure scripts/defs.sh exists before sourcing and verify usejdk11 succeeds before proceeding.

Apply this diff:
+ [ -f "$SCRIPT_DIR/defs.sh" ] || {
+   echo "${SCRIPT_NAME}: error: Missing $SCRIPT_DIR/defs.sh." >&2
+   exit 2
+ }
  . "$SCRIPT_DIR/defs.sh" # Define shell functions.
- usejdk11
+ usejdk11 || {
+   echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+   exit 2
+ }
  JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
87-87: Add explicit uuidgen availability check.

This check was flagged in the previous review. Add an explicit validation before UUID generation.

Apply this diff:
+ command -v uuidgen > /dev/null 2>&1 || {
+   echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
+   exit 2
+ }
  UUID=$(uuidgen) # A unique identifier per instance
scripts/experiment-scripts/defects4j-table4.sh (1)
58-63: Remove hardcoded temporary test parameters before merge (blocker).

This issue was flagged in the previous review as a blocker. Lines 58–63 override production parameters with incomplete temporary values for testing. Shipping with these hardcoded parameters will silently run incorrect experiments.

Choose one of the following solutions:

Option 1 (preferred): Remove lines 58–63 entirely and use production defaults.
- # Temporary parameters for testing that override the defaults (GRT has not been finished yet)
- NUM_LOOP=1
- TOTAL_SECONDS=(10)
- PROJECT_IDS=("Lang")
- TEST_GENERATORS=(BASELINE EVOSUITE)
- BUG_IDS["Lang"]="1 3"
Option 2: Make configurable via environment variables with fallback to production defaults.
NUM_LOOP="${GRT_NUM_LOOP:-10}"
TOTAL_SECONDS=(${GRT_TOTAL_SECONDS:-120 300 600})
PROJECT_IDS=(${GRT_PROJECT_IDS:-Chart Math Time Lang})
TEST_GENERATORS=(${GRT_TEST_GENERATORS:-BASELINE GRT EVOSUITE})
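
A variant of Option 2 that avoids unquoted expansion is sketched below, using the same `GRT_*` variable names and default values as the suggestion above. The wrapper function `load_experiment_config` is hypothetical and exists only to make the example self-contained. It assumes bash, since arrays and here-strings are not POSIX sh.

```bash
# Hypothetical sketch: read space-separated overrides into arrays (bash only).
load_experiment_config() {
  NUM_LOOP="${GRT_NUM_LOOP:-10}"
  # read -a splits on whitespace but, unlike an unquoted expansion,
  # does not perform filename globbing on the values.
  read -r -a TOTAL_SECONDS <<< "${GRT_TOTAL_SECONDS:-120 300 600}"
  read -r -a PROJECT_IDS <<< "${GRT_PROJECT_IDS:-Chart Math Time Lang}"
  read -r -a TEST_GENERATORS <<< "${GRT_TEST_GENERATORS:-BASELINE GRT EVOSUITE}"
}
```

For a quick smoke test, a run such as `GRT_NUM_LOOP=1 GRT_TOTAL_SECONDS=10 GRT_PROJECT_IDS=Lang ./defects4j-table4.sh` would then override the defaults without editing the script.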
scripts/mutation-randoop.sh (2)
74-79: Add error check after usejdk8 function call.

The usejdk8 function can fail and return 1 if JAVA8_HOME is not set, but the script doesn't check this return code. If usejdk8 fails, the subsequent Java version detection may use an incorrect installation or fail entirely. This was flagged in the previous review.

Apply this diff:
  . "$SCRIPT_DIR/defs.sh" # Define shell functions.
- usejdk8
+ usejdk8 || {
+   echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+   exit 2
+ }
  JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
596-596: Use $SUBJECT_PROGRAM instead of $1 in ant command paths.

This issue was flagged in the previous review. The ant buildfile paths reference "$1" instead of the already-captured $SUBJECT_PROGRAM variable (line 196). Using $1 is fragile: if the script is refactored later and $1 is shifted again, these commands will silently reference the wrong program name.

Replace all instances of "$1" with "$SUBJECT_PROGRAM" in ant command paths and their associated verbose echo lines.

Apply this diff to all affected lines:
- echo "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+ echo "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...

- "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+ "$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
Affected lines: 596, 598, 610, 614, 620, 623, 659, 662.

Also applies to: 598-598, 610-610, 614-614, 620-620, 623-623, 659-659, 662-662

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9ce322 and cea4449.

📒 Files selected for processing (13)

scripts/clean-script-dir.sh (1 hunks)
scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/defs.sh (1 hunks)
scripts/experiment-scripts/README.md (1 hunks)
scripts/experiment-scripts/defects4j-table4.sh (1 hunks)
scripts/experiment-scripts/mutation-fig6-table3.sh (3 hunks)
scripts/experiment-scripts/mutation-fig7.sh (2 hunks)
scripts/experiment-scripts/mutation-fig8-9.sh (2 hunks)
scripts/get-all-subject-src.sh (1 hunks)
scripts/mutation-evosuite.sh (6 hunks)
scripts/mutation-randoop.sh (5 hunks)
scripts/prerequisites.md (2 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/mutation-evosuite.sh
scripts/defects4j-randoop.sh
scripts/prerequisites.md
scripts/defects4j-evosuite.sh
scripts/mutation-randoop.sh
scripts/experiment-scripts/defects4j-table4.sh
scripts/experiment-scripts/README.md

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-randoop.sh
scripts/defects4j-evosuite.sh

🧬 Code graph analysis (5)

scripts/mutation-evosuite.sh (1)

scripts/defs.sh (4)

require_directory (12-17)

require_file (5-10)

usejdk8 (31-39)

require_csv_basename (19-28)

scripts/defects4j-randoop.sh (1)

scripts/defs.sh (3)

require_file (5-10)

usejdk11 (42-50)

require_csv_basename (19-28)

scripts/defects4j-evosuite.sh (1)

scripts/defs.sh (2)

usejdk11 (42-50)

require_csv_basename (19-28)

scripts/mutation-randoop.sh (1)

scripts/defs.sh (4)

require_directory (12-17)

require_file (5-10)

usejdk8 (31-39)

require_csv_basename (19-28)

scripts/experiment-scripts/defects4j-table4.sh (3)

scripts/experiment-scripts/mutation-fig7.sh (1)

run_task (85-98)

scripts/experiment-scripts/mutation-fig8-9.sh (1)

run_task (90-103)

scripts/experiment-scripts/mutation-fig6-table3.sh (1)

run_task (82-100)

🔇 Additional comments (17)

scripts/prerequisites.md (3)
30-30: Code fence language improved for portability.

Changing code fence language from bash to sh is a good improvement. The sh selector is more portable and POSIX-compatible, and the commands shown (export statements, curl, pip) work fine with POSIX shells. This aligns with shell script best practices.

Also applies to: 36-36

76-76: Bullet-point indentation formatting is correct.

The continuation lines now properly include 2-space indentation for list nesting. This ensures markdown renderers correctly display these lines as part of their preceding bullet points rather than as separate paragraphs.

Also applies to: 78-78

63-78: Add guidance on using the usejdk.sh helper script.

The Java Versions Setup section documents environment variables but omits how to actually invoke the JDK switcher. Users who set JAVA8_HOME and JAVA11_HOME have no clear instruction on how to use these variables in practice. Expand the section to show how to source the usejdk.sh helper and call the switcher functions before running scripts:
 This is needed because different scripts require different Java versions:
 
 * **Defect scripts** (via [Defects4J](https://github.com/rjust/defects4j))
   require **Java 11**.
 * **Mutation scripts** (via [Major](https://github.com/rjust/major))
   require **Java 8**.
+
+Before running any scripts, source the JDK switcher and select the appropriate version:
+
+```sh
+source scripts/usejdk.sh
+usejdk11  # for defect detection scripts
+usejdk8   # for mutation analysis scripts
+```
scripts/experiment-scripts/README.md (4)

9-11: Overview accurately reflects expanded evaluation scope.

The updated text now correctly conveys that the experiment scripts support both mutation analysis and defect detection evaluation methods, aligning with the PR's integration of Defects4J workflows.

13-15: Setup section properly consolidates prerequisites reference.

The section now correctly references scripts/prerequisites.md as the single source of truth for environment setup, eliminating duplication. This addresses prior feedback about consolidating prerequisites documentation and reduces maintenance burden.

17-30: Running Scripts section effectively documents both evaluation methods.

The section provides clear examples for both mutation analysis (./mutation-fig7.sh) and defect detection (./defects4j-table4.sh). The examples assume invocation from the scripts/experiment-scripts/ directory and follow consistent documentation practices.

38-49: Output section comprehensively describes results structure.

The section properly documents CSV/PDF output locations, result naming conventions, and the important caveat about experiment isolation and preservation. It correctly references both mutation analysis and defect detection workflows.

scripts/clean-script-dir.sh (1)

9-10: ✓ Consistent naming convention applied.

The variable rename from script_dir to SCRIPT_DIR aligns the script with shell conventions and the broader pattern introduced across the PR.

scripts/get-all-subject-src.sh (1)

8-8: ✓ Improved error message identifiability.

The addition of SCRIPT_NAME variable and its use in error messages provides better identification when multiple scripts run concurrently or in logs.

Also applies to: 12-12

scripts/experiment-scripts/mutation-fig6-table3.sh (1)

36-36: ✓ Consistent logging identifier pattern.

The addition of SCRIPT_NAME for logging is consistent with the broader refactoring across experiment scripts. Error handling maintains proper exit codes.

Also applies to: 62-62, 97-98

scripts/experiment-scripts/mutation-fig8-9.sh (1)

40-40: ✓ Consistent with related scripts.

The logging identifier pattern matches mutation-fig6-table3.sh and mutation-fig7.sh. Minor documentation improvement noted.

Also applies to: 45-45, 70-70

scripts/experiment-scripts/mutation-fig7.sh (1)

36-36: ✓ Consistent logging pattern maintained.

The addition of SCRIPT_NAME aligns with the refactoring pattern across mutation experiment scripts.

Also applies to: 40-40, 65-65

scripts/mutation-evosuite.sh (1)

415-424: ✓ CSV locking mechanism correctly implemented for concurrent safety.

The flock-based approach for header initialization (lines 415–424) and row appending (lines 572–577) correctly prevents interleaving writes across concurrent processes. The exclusive lock is acquired before reading/writing and released immediately after via file descriptor management.

Also applies to: 572-577

scripts/defs.sh (1)

31-50: Note: usejdk functions use return, not exit.

The usejdk8 and usejdk11 functions return 1 on error rather than exiting. This is appropriate for functions (allowing callers to handle the error), but all callers must check the return code with || { ... } patterns. Verify that calling scripts properly handle errors from these functions.
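
The return-versus-exit distinction can be illustrated with a small stand-in. The function `switch_jdk` below is hypothetical and is not the implementation in defs.sh. It only shows the shape of a helper that reports failure through its return status and leaves the exit decision to the caller.

```sh
# Hypothetical sketch: a JDK switcher that returns non-zero instead of exiting.
switch_jdk() {
  jdk_home=$1
  # Fail if the location is unset or does not contain an executable java.
  if [ -z "$jdk_home" ] || [ ! -x "$jdk_home/bin/java" ]; then
    return 1
  fi
  JAVA_HOME=$jdk_home
  PATH="$JAVA_HOME/bin:$PATH"
  export JAVA_HOME PATH
}

# A caller must check the status itself, for example:
#   switch_jdk "${JAVA11_HOME:-}" || { echo "error: cannot switch JDK" >&2; exit 2; }
```

Without the `||` handler, a failed switch goes unnoticed and the script continues with whatever `java` is already on PATH.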

scripts/defects4j-randoop.sh (2)

318-318: ✓ Correct removal of driver test files per Defects4J workflow.

Line 318 correctly removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation. Per the learned behavior, run_bug_detection.pl does not expect these files; only the numbered test files (RegressionTest0.java, etc.) remain and are packaged for defect detection. Based on learnings


246-253: ✓ CSV locking mechanism correctly implemented for concurrent safety.

The flock-based approach for header initialization (lines 246–253) and row appending (lines 356–361) prevents concurrent write interleaving. The pattern mirrors the robust approach in mutation-evosuite.sh.

Also applies to: 356-361

scripts/mutation-randoop.sh (1)

75-79: Verify Java version check logic.

Line 75 extracts the Java major version; ensure the regex and fallback (sub("^$", "0", $2)) correctly handle all JDK 8 version formats (e.g., 1.8.0_392). The logic appears sound, but confirm the expected behavior matches your testing.
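
One way to confirm the behavior is to feed sample version banners through the same two-stage pipeline. The helper below is a sketch for testing only: the name `parse_java_major` is hypothetical, the sample strings are illustrative, and the ternary is wrapped in an extra pair of parentheses, which is a small deviation from the script's expression.

```sh
# Hypothetical sketch: extract the Java major version from `java -version` output on stdin.
parse_java_major() {
  # Stage 1: take the quoted version string from the line containing "version".
  # Stage 2: legacy "1.x" schemes report x; newer schemes report the first field.
  awk -F '"' '/version/ { print $2 }' |
    awk -F '.' '{ print (($1 == "1") ? $2 : $1) }'
}

# Example checks against sample banners:
#   printf 'openjdk version "1.8.0_392"\n' | parse_java_major
#   printf 'openjdk version "11.0.21" 2023-10-17\n' | parse_java_major
```

Running the helper on banners from each JDK installed locally is a cheap way to verify the parsing before relying on it in the version check.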


coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/mutation-evosuite.sh (1)
86-86: Add validation check for uuidgen availability.

The script uses uuidgen without checking if it's installed. Add a validation check consistent with other dependency checks.

Apply this diff:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # Generate a unique identifier per instance

♻️ Duplicate comments (10)

scripts/mutation-randoop.sh (3)
93-93: Add validation check for uuidgen availability.

The script uses uuidgen without checking if it's installed. Add a validation check consistent with other dependency checks.

Apply this diff:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # Generate a unique identifier per instance
74-79: Add error handling for usejdk8 call.

The usejdk8 function returns 1 if JAVA8_HOME is unset, but line 74 doesn't check the return code. Add explicit error checking to fail early.

Apply this diff:
-usejdk8
+usejdk8 || {
+  echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
586-658: Use $SUBJECT_PROGRAM instead of $1 in ant command paths.

Multiple ant commands reference "$1" to access the subject program name, but $SUBJECT_PROGRAM was captured at line 196 and should be used consistently throughout the script. Using $1 is fragile—if the script is refactored and positional parameters change, these references will silently break.

Replace all instances of "$1" in ant command paths with "$SUBJECT_PROGRAM". This affects lines: 592, 594, 597-598, 604, 606, 609-610, 616, 619, 623, 655, 658.

Example for line 597:
-"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$1"/${buildfile} ...
+"$MAJOR_HOME"/bin/ant -f "$SCRIPT_DIR"/program-config/"$SUBJECT_PROGRAM"/${buildfile} ...
scripts/defects4j-evosuite.sh (2)
74-79: Add error handling for usejdk11 call.

The usejdk11 function returns 1 if JAVA11_HOME is unset, but the script doesn't check the return code. Add explicit error checking.

Apply this diff:
-usejdk11
+usejdk11 || {
+  echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
88-88: Add validation check for uuidgen availability.

The script uses uuidgen without checking if it's installed. Add a validation check consistent with other dependency checks.

Apply this diff:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
scripts/defects4j-randoop.sh (3)
92-92: Add validation check for uuidgen availability.

The script uses uuidgen without checking if it's installed. Add an explicit validation check consistent with other dependency checks in the script.

Apply this diff:
+command -v uuidgen > /dev/null 2>&1 || {
+  echo "${SCRIPT_NAME}: error: uuidgen not found. Install util-linux or equivalent package." >&2
+  exit 2
+}
 UUID=$(uuidgen) # A unique identifier per instance
78-83: Add error handling for usejdk11 call.

The usejdk11 function returns 1 if JAVA11_HOME is unset, but the script doesn't check the return code. If the function fails, the subsequent Java version check may use the wrong Java installation or fail unexpectedly.

Apply this diff:
-usejdk11
+usejdk11 || {
+  echo "${SCRIPT_NAME}: error: Failed to switch to Java 11. Set JAVA11_HOME to a JDK 11 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print ($1=="1")?$2:$1}')
64-64: Add defensive check before sourcing defs.sh.

The script sources defs.sh without verifying it exists. Add a pre-check to fail early with a clear error if the file is missing.

Apply this diff:
+[ -f "$SCRIPT_DIR/defs.sh" ] || {
+  echo "$(basename "$0"): error: Missing $SCRIPT_DIR/defs.sh" >&2
+  exit 2
+}
 . "$SCRIPT_DIR/defs.sh" # Define shell functions.
scripts/mutation-evosuite.sh (2)
67-72: Add error handling for usejdk8 call.

The usejdk8 function returns 1 if JAVA8_HOME is unset, but the script doesn't check the return code. Add explicit error checking.

Apply this diff:
-usejdk8
+usejdk8 || {
+  echo "${SCRIPT_NAME}: error: Failed to switch to Java 8. Set JAVA8_HOME to a JDK 8 installation." >&2
+  exit 2
+}
 JAVA_VER=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{sub("^$", "0", $2); print ($1=="1")?$2:$1}')
61-61: Add defensive check before sourcing defs.sh.

The script sources defs.sh without verifying it exists. Add a pre-check for consistency with other validations in the script.

Apply this diff:
+[ -f "$SCRIPT_DIR/defs.sh" ] || {
+  echo "$(basename "$0"): error: Missing $SCRIPT_DIR/defs.sh" >&2
+  exit 2
+}
 . "$SCRIPT_DIR/defs.sh" # Define shell functions.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cea4449 and 702a829.

📒 Files selected for processing (6)

scripts/defects4j-evosuite.sh (1 hunks)
scripts/defects4j-randoop.sh (1 hunks)
scripts/defs.sh (1 hunks)
scripts/experiment-scripts/generate-grt-figures.py (5 hunks)
scripts/mutation-evosuite.sh (6 hunks)
scripts/mutation-randoop.sh (6 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-10-14T03:30:11.765Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defects4j-randoop.sh:332-335
Timestamp: 2025-10-14T03:30:11.765Z
Learning: In defects4j-randoop.sh, the script intentionally removes RegressionTest.java and ErrorTest.java (the driver/suite files) after test generation because run_bug_detection.pl does not expect these files. The numbered test files (RegressionTest0.java, RegressionTest1.java, ErrorTest0.java, etc.) remain in TEST_DIR and are packaged for defect detection.

Applied to files:

scripts/mutation-evosuite.sh
scripts/defects4j-evosuite.sh
scripts/defects4j-randoop.sh
scripts/mutation-randoop.sh
scripts/experiment-scripts/generate-grt-figures.py

📚 Learning: 2025-10-13T23:36:38.701Z

Learnt from: varuniy
Repo: randoop/grt-testing PR: 64
File: scripts/defect-randoop.sh:236-243
Timestamp: 2025-10-13T23:36:38.701Z
Learning: In Defects4J, the command `defects4j export -p cp.compile` automatically compiles the project before returning the classpath, so an explicit `defects4j compile` step is not needed when using `export -p cp.compile`.

Applied to files:

scripts/defects4j-evosuite.sh
scripts/defects4j-randoop.sh
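
A rough sketch of how a script might rely on this learning. The wrapper name `compile_classpath` is hypothetical and does not appear in the PR; it assumes the `defects4j` command is on PATH and that the argument is a checked-out Defects4J working directory.

```shell
# Hypothetical wrapper: print the compile classpath for a Defects4J checkout.
# Per the learning above, no separate `defects4j compile` step is run first.
compile_classpath() {
  local work_dir="$1"
  if ! command -v defects4j > /dev/null 2>&1; then
    echo "defects4j not found on PATH" >&2
    return 127
  fi
  defects4j export -p cp.compile -w "$work_dir"
}
```

A caller could capture the result with `cp="$(compile_classpath "$work_dir")"` and pass it to the test generator.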

🧬 Code graph analysis (4)

scripts/mutation-evosuite.sh (1)
scripts/defs.sh (5): require_directory (33-38), require_file (26-31), usejdk8 (52-60), require_csv_basename (40-49), append_csv (5-24)

scripts/defects4j-evosuite.sh (1)
scripts/defs.sh (3): usejdk11 (63-71), require_csv_basename (40-49), append_csv (5-24)

scripts/defects4j-randoop.sh (1)
scripts/defs.sh (4): require_file (26-31), usejdk11 (63-71), require_csv_basename (40-49), append_csv (5-24)

scripts/mutation-randoop.sh (1)
scripts/defs.sh (5): require_directory (33-38), require_file (26-31), usejdk8 (52-60), require_csv_basename (40-49), append_csv (5-24)

🔇 Additional comments (1)

scripts/defs.sh (1)

40-49: LGTM! CSV basename validation is now correct.

The logic inversion issue flagged in previous reviews has been fixed. The function now correctly validates that the filename ends with .csv and contains no path separators.
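
For illustration only, the check described here could look like the sketch below. It is not the code in scripts/defs.sh: the function name `is_csv_basename` is hypothetical, and it returns a status rather than exiting, which may differ from how `require_csv_basename` reports failure.

```shell
# Hypothetical sketch: succeed only for a bare file name ending in .csv.
is_csv_basename() {
  local name="$1"
  case "$name" in
    */*) return 1 ;;   # contains a path separator, so not a basename
    *.csv) return 0 ;; # bare name with the required extension
    *) return 1 ;;     # anything else is rejected
  esac
}
```

With this sketch, `is_csv_basename results.csv` succeeds, while `is_csv_basename out/results.csv` and `is_csv_basename results.txt` both fail.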

varuniy · 2025-11-14T22:38:37Z

@mernst Ready for re-review.

mernst

Thanks!

varuniy · 2025-11-14T23:02:12Z

Can I merge this into main?

Varun Iyengar and others added 27 commits September 3, 2025 16:01

Finished initial implementation of Defects4J scripts

6cfa079

Merge branch 'randoop:main' into defects4j-scripts

aee12f7

Pass linter feedback

d0936af

Merge branch 'defects4j-scripts' of github.com:varuniy/grt-testing in…

bfae310

…to defects4j-scripts

Fixed more linter suggestions

9c8b186

Use NUM LOOP

ece6c82

Finish addressing linter issues

b4a0fce

Code review edits

09dbc8b

Merge branch 'randoop:main' into defects4j-scripts

43aa763

Merge branch 'randoop:main' into defects4j-scripts

9646948

Addressed Mike's feedback and CodeRabbit comments

03c960f

Fix linter suggestions

f829d04

Attempt to eliminate linter suggestions

612fd62

Fix last of linter suggestions hopefully

68b4649

Address more coderabbit suggestions

cbf55ea

Made testing command arrays instead of strings

815f56c

Fix linter suggestions

23e4354

Fix minor linter suggestion

d1ef86d

Fixed feature logic in mutation randoop

5b3d32d

Fix indentation

050c70e

Finished Defects4j scripts

aaf98c3

Use consistent spacing

9d009d1

Merge ../grt-testing-branch-main into defects4j-scripts

4bd9ab6

Markdown style

3047d8f

Merge ../grt-testing-branch-main into defects4j-scripts

4daa174

Merge ../grt-testing-branch-main into defects4j-scripts

1b38f4d

Reduce line length

eaf81e8

varuniy assigned mernst Nov 3, 2025

coderabbitai bot reviewed Nov 3, 2025

scripts/defects4j-prerequisites.md (outdated, resolved)

scripts/Makefile (resolved)

scripts/usejdk.sh (outdated, resolved)

coderabbitai bot reviewed Nov 12, 2025

scripts/defects4j-evosuite.sh (outdated, resolved)

scripts/defects4j-randoop.sh (outdated, resolved)

scripts/mutation-randoop.sh (resolved)

mernst added 2 commits November 11, 2025 16:26

Merge ../grt-testing-branch-main into defects4j-scripts

36c76bc

Merge ../grt-testing-branch-main into defects4j-scripts

a5cb935

mernst requested changes Nov 12, 2025

scripts/experiment-scripts/README.md (outdated, resolved)

mernst added 4 commits November 12, 2025 16:24

Merge ../grt-testing-branch-main into defects4j-scripts

6678d07

Documentation and code fence improvements

2908c35

Improve command-line options documentation

40c4085

Merge ../grt-testing-branch-main into defects4j-scripts

a9ce322

coderabbitai bot reviewed Nov 13, 2025

scripts/experiment-scripts/README.md (outdated, resolved)

mernst added 2 commits November 12, 2025 18:06

Set and use SCRIPT_NAME variable

d2fd0c6

Abstract shell functions

640715b

mernst requested changes Nov 13, 2025

scripts/defects4j-evosuite.sh (outdated, resolved)

mernst assigned varuniy and unassigned mernst Nov 13, 2025

mernst added 2 commits November 12, 2025 20:58

Reduce duplication

cea4449

Merge ../grt-testing-branch-main into defects4j-scripts

10aefad

coderabbitai bot reviewed Nov 13, 2025

Addressed Mike and Code Rabbit feedback

702a829

coderabbitai bot reviewed Nov 14, 2025

Varun Iyengar added 2 commits November 14, 2025 14:31

Fix linter

7e160b2

Fix last linter suggestion, also fix some bugs

813f965

varuniy assigned mernst and unassigned varuniy Nov 14, 2025

Reduce diffs

5c4ee83

mernst approved these changes Nov 14, 2025

varuniy merged commit fd3a4d4 into randoop:main Nov 15, 2025
2 checks passed

mernst deleted the defects4j-scripts branch November 15, 2025 00:40

coderabbitai bot mentioned this pull request Nov 26, 2025

Tweak description of experimental script prerequisites #91

Open
