Skip to content

Conversation

@robfrank
Copy link
Collaborator

No description provided.

@gemini-code-assist
Copy link
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review

Summary

This PR makes two changes to the GitHub Actions workflow:

  1. Changes the unit test command from mvn verify to mvn test
  2. Reorders test reporter parameters (list-tests before list-suites)

Analysis

Change 1: verifytest (line 96)

✅ This change is technically correct for a job named unit-tests:

  • mvn verify runs unit tests + integration tests + coverage
  • mvn test runs ONLY unit tests + coverage
  • Since integration tests run in a separate integration-tests job, using test here avoids redundant execution

⚠️ Issue: The comment on line 95 is now outdated and should be updated:

- name: Run Unit Tests with Coverage
  # test (not verify) to run only unit tests, excluding integration tests which run in separate job
  run: ./mvnw test -Pcoverage --batch-mode --errors --fail-never --show-version -pl !e2e,!e2e-perf -DexcludedGroups=slow -Dtest=!OpenCypherTCKTest

Change 2: Parameter reordering

The reordering of list-tests and list-suites is cosmetic and doesn't affect functionality.

Recommendations

  1. Update the outdated comment (line 95) to reflect the new command
  2. Add a PR description explaining what issue this fixes - it would help reviewers understand the motivation

Questions

  • Does this fix a specific CI failure or coverage reporting issue?
  • Have you verified that coverage reports are still generated correctly with test instead of verify?

Verdict

The code change is correct, but the comment needs updating. Otherwise, this looks like a good cleanup that makes the unit test job more focused on its actual purpose.


Review generated per CLAUDE.md guidelines

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Summary

This PR addresses CI test execution issues by properly segregating benchmark tests from regular unit tests, preventing them from running unnecessarily in the main test suite. The changes introduce a new dedicated job for benchmark tests and apply proper JUnit 5 tagging.


✅ Positive Aspects

  1. Proper Test Segregation: The addition of @Tag("benchmark") to all benchmark test classes follows JUnit 5 best practices and aligns with the existing pattern for slow tests.

  2. Improved CI Efficiency: Separating benchmark tests into their own job (benchmark-unit-tests) prevents long-running benchmarks from slowing down regular unit test execution.

  3. Consistent Architecture: The new benchmark-unit-tests job mirrors the structure of slow-unit-tests, maintaining consistency in the workflow design.

  4. Coverage Completeness: Benchmark test coverage is properly collected and integrated into the overall coverage reporting pipeline.


🔍 Code Quality Observations

GitHub Workflow Changes (.github/workflows/mvn-test.yml)

Line 96: Changed from verify to test goal

-run: ./mvnw verify -Pcoverage ... -DexcludedGroups=slow -Dtest=!OpenCypherTCKTest
+run: ./mvnw test -Pcoverage ... -DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

Concern: Changing from verify to test phase may skip important checks. The verify phase runs additional validation beyond just tests. Was this intentional?

Recommendation: Verify that no critical validations are being skipped. If the change is purely to avoid running integration tests (which should already be skipped via other mechanisms), consider documenting this reasoning in the commit message or PR description.

Test Tagging

All 6 benchmark test classes now have @Tag("benchmark"):

  • ✅ Consistent application across all benchmarks
  • ✅ Follows existing pattern (@Tag("slow") precedent)
  • ✅ Enables flexible test execution via Maven/Gradle

⚠️ Potential Issues

  1. Job Scope Limitation (Line 182)

    run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

    The benchmark-unit-tests job only runs on the engine module (-pl engine), but benchmark tests exist in both:

    • engine/src/test/java/performance/ (3 files)
    • gremlin/src/test/java/performance/ (2 files)

    Issue: Benchmarks in the gremlin module won't be executed or covered.

    Recommendation: Either:

    • Change to -pl engine,gremlin to include both modules
    • Or document why gremlin benchmarks are excluded
    • Or remove the -pl restriction entirely to catch any future benchmark tests in other modules
  2. Minor: Order Swap in test-reporter (Lines 106-107, 149-150)

    -list-suites: "failed"
     list-tests: "failed"
    +list-tests: "failed"
    +list-suites: "failed"

    This appears to be a cosmetic change with no functional impact. While it improves consistency, it's worth noting that this swap occurs in multiple places but not in the new benchmark-unit-tests job (lines 192-193), where the original order is maintained.

  3. Whitespace Change (Line 544)

    -files:  {{ coverage-files-output }}
    +files: {{ coverage-files-output }}

    Trivial formatting fix (extra space removal). Good catch.


🎯 Recommendations

  1. Critical: Update line 182 to include the gremlin module:

    run: ./mvnw verify -Pcoverage --batch-mode --errors --fail-never --show-version -pl engine,gremlin -Dgroups=benchmark
  2. Verify: Confirm that changing from verify to test on line 96 doesn't skip important validation steps.

  3. Consider: Adding a comment in the workflow explaining why benchmarks are separated:

    # Benchmark tests are isolated due to their long execution time
    # and to avoid impacting regular unit test performance metrics
    benchmark-unit-tests:
  4. Nice-to-have: Apply the list-tests/list-suites ordering consistently across all test reporter uses (including the new benchmark job).


📊 Test Coverage

  • ✅ All identified benchmark classes are properly tagged
  • ✅ Coverage collection configured for benchmark tests
  • ✅ Integration with codecov/codacy maintained
  • ⚠️ Missing gremlin module benchmarks in execution

🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ Performance improvement expected (faster CI runs)
  • ✅ No new dependencies introduced
  • ✅ Follows principle of test isolation

✨ Overall Assessment

This is a solid improvement to the CI pipeline that properly addresses test segregation. The main issue is the incomplete scope of the benchmark job (missing gremlin module). Once that's addressed, this PR will significantly improve CI efficiency.

Recommendation: Approve with minor changes requested (add gremlin module to benchmark job scope).


Review conducted using CLAUDE.md guidelines and ArcadeDB project conventions.

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-10.81%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5d1b7fb) 82996 57777 69.61%
Head commit (45ef40e) 112579 (+29583) 66201 (+8424) 58.80% (-10.81%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

Comprehensive Pull Request Review

Overview

This PR addresses CI test execution by properly segregating benchmark tests from regular unit tests through JUnit 5 tagging and a dedicated CI job. The changes improve CI efficiency by preventing long-running benchmarks from slowing down standard test execution.


✅ Strengths

  1. Proper Test Classification: Adding @Tag("benchmark") to all benchmark test classes follows JUnit 5 best practices and maintains consistency with the existing @Tag("slow") pattern.

  2. Clean Separation of Concerns: The new benchmark-unit-tests job mirrors the structure of slow-unit-tests, maintaining architectural consistency.

  3. Coverage Integration: Benchmark test coverage is properly collected and integrated into the codecov/codacy pipeline.

  4. Minor Cleanup: Whitespace fix on line 544 (files: double-space → single-space).


⚠️ Issues Identified

1. CRITICAL: Incomplete Module Coverage in Benchmark Job

Location: .github/workflows/mvn-test.yml:182

run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

Problem: The benchmark job only runs on the engine module, but benchmarks exist in:

  • engine/src/test/java/performance/ (3 tests)
  • gremlin/src/test/java/performance/ (2 tests) — NOT EXECUTED

Impact: Two benchmark tests (CypherEngineComparisonBenchmark, QueryLanguageWithGremlinAndCypherBenchmark) are tagged but never run, and their coverage is not collected.

Fix:

run: ./mvnw verify -Pcoverage --batch-mode --errors --fail-never --show-version -pl engine,gremlin -Dgroups=benchmark

Or remove -pl entirely to catch future benchmarks in any module.


2. POTENTIAL ISSUE: OpenCypherTCKTest Tagging Change

Location: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

Change: @Tag("slow")@Tag("benchmark")

Analysis:

  • OpenCypherTCKTest is a comprehensive TCK (Technology Compatibility Kit) test suite, not a benchmark
  • TCK tests verify compliance with the OpenCypher specification
  • The previous @Tag("slow") classification seems more semantically correct

Questions:

  1. Is this test actually a benchmark measuring performance, or is it a functional compliance test that happens to be slow?
  2. If it's truly a compliance test, should it remain tagged as slow and run in the slow-unit-tests job instead?

The distinction matters because:

  • Benchmarks measure performance characteristics
  • Slow tests verify correctness but take a long time

Recommendation: Clarify the intent. If this is a compliance test (TCK), consider reverting to @Tag("slow").


3. QUESTIONABLE: SelectClasspathResource → SelectPackages

Location: OpenCypherTCKTest.java:33

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Problem: These are different annotations with different semantics:

  • @SelectClasspathResource: Selects resources from the classpath (e.g., .feature files in src/test/resources)
  • @SelectPackages: Selects test classes from Java packages

Verification needed:

# The features are in resources, not packages:
engine/src/test/resources/opencypher/tck/features/

Impact: This change may break the test discovery mechanism. Cucumber tests typically use @SelectClasspathResource to find .feature files.

Recommendation:

  1. Verify this test still works correctly after the change
  2. If it doesn't work, revert to @SelectClasspathResource
  3. Document why the change was necessary if it does work

4. MINOR: Maven Goal Change Lacks Justification

Location: .github/workflows/mvn-test.yml:96

-run: ./mvnw verify -Pcoverage ...
+run: ./mvnw test -Pcoverage ...

Analysis:

  • verify phase: runs tests + integration tests + additional verification plugins
  • test phase: runs only unit tests

Questions:

  1. Does this skip any important Maven plugins that run in the verify phase?
  2. Is this purely to avoid running ITs (which should already be skipped)?

Recommendation: The change is likely fine (ITs run in a separate job), but consider:

  1. Updating the now-outdated comment on line 95
  2. Documenting the rationale in the PR description

5. INCONSISTENCY: Test Reporter Parameter Ordering

Locations: Multiple test reporter blocks

The PR reorders list-tests/list-suites in some places but not the new benchmark-unit-tests job (lines 192-193). This is cosmetic but creates inconsistency.

Recommendation: Either apply the ordering consistently everywhere or revert the changes (they have no functional impact).


🔍 Detailed Code Review

GitHub Workflow Changes

Line Change Assessment
96 verifytest, add benchmark to exclusions ✅ Correct (with comment update needed)
106-107, 149-150 Reorder test reporter params ⚠️ Cosmetic, creates inconsistency
159-201 New benchmark-unit-tests job ✅ Good structure, ❌ missing gremlin module
511-518 Download benchmark coverage ✅ Correct
544 Whitespace cleanup ✅ Good

Test Tagging Changes

File Change Assessment
LSMVectorIndexStorageBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
DuanSSSPBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
ParserBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
QueryLanguageBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
CypherEngineComparisonBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
QueryLanguageWithGremlinAndCypherBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
OpenCypherTCKTest slowbenchmark ⚠️ Questionable (TCK ≠ benchmark)

🎯 Required Actions

Critical

  1. Add gremlin module to benchmark job scope (line 182)

High Priority

  1. Verify OpenCypherTCKTest change — confirm it's truly a benchmark or revert to @Tag("slow")
  2. Test SelectPackages change — verify Cucumber still finds feature files

Medium Priority

  1. Update comment on line 95 to reflect test vs verify
  2. Add PR description explaining what problem this solves

Nice-to-Have

  1. Apply parameter ordering consistently across all test reporter blocks

🔒 Security & Performance

  • ✅ No security concerns
  • ✅ Improves CI performance (faster test execution)
  • ✅ No new dependencies
  • ✅ Follows Apache 2.0 licensing

📊 Test Coverage

Per CLAUDE.md requirements:

  • ✅ All modified files are test-related (no production code changes)
  • ❌ Missing verification that the changes actually work (especially SelectPackages change)
  • ⚠️ Gremlin benchmarks excluded from execution

🎬 Recommendation

Status: Changes Requested

The concept is excellent, but there are critical issues:

  1. Gremlin module benchmarks are not executed
  2. OpenCypherTCKTest tag change needs justification/verification
  3. SelectPackages change may break test discovery

Please address the critical and high-priority items before merging.


Review conducted per CLAUDE.md guidelines and ArcadeDB conventions.

@codecov
Copy link

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.30%. Comparing base (5144fcc) to head (247afe9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3398      +/-   ##
==========================================
- Coverage   60.67%   60.30%   -0.37%     
==========================================
  Files        1249     1249              
  Lines       84354    84354              
  Branches    17312    17312              
==========================================
- Hits        51182    50870     -312     
- Misses      25881    26229     +348     
+ Partials     7291     7255      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mergify
Copy link
Contributor

mergify bot commented Feb 10, 2026

🧪 CI Insights

Here's what we observed from your CI run for 247afe9.

🟢 All jobs passed!

But CI Insights is watching 👀

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: fix tentative unit test execution

Summary

This PR reorganizes the CI test execution strategy by introducing a new benchmark tag for JUnit tests and creating a separate CI job to run benchmark tests independently. The main goal appears to be preventing benchmark tests from running during normal unit test execution while ensuring they still get coverage tracking.

Positive Changes ✅

  1. Clear Test Categorization: The introduction of the @Tag("benchmark") annotation provides a clean separation between regular unit tests and performance benchmarks, which is a best practice.

  2. Consistent Tagging: All benchmark tests across the codebase are now consistently tagged:

    • LSMVectorIndexStorageBenchmark
    • TriggerBenchmark
    • DuanSSSPBenchmark
    • ParserBenchmark
    • QueryLanguageBenchmark
    • CypherEngineComparisonBenchmark
    • QueryLanguageWithGremlinAndCypherBenchmark
  3. Dedicated CI Job: The new benchmark-unit-tests job runs benchmarks separately with their own coverage reporting, which prevents them from slowing down the main test suite.

  4. Coverage Preservation: Benchmark tests still contribute to code coverage via the separate job and artifact upload.

  5. Code Cleanup: The pom.xml changes include minor formatting improvements and better property organization.

Issues & Concerns ⚠️

1. Critical Issue: OpenCypherTCKTest Tag Change

-@Tag("slow")
+@Tag("benchmark")

Problem: The OpenCypherTCKTest was changed from @Tag("slow") to @Tag("benchmark"). This is semantically incorrect because:

  • TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks
  • They verify OpenCypher specification compliance, not performance
  • The test class doesn't measure timing or performance metrics
  • This will exclude TCK tests from the slow-unit-tests job where they belong

Impact: TCK tests will now run in the benchmark job instead of the slow tests job, which may not be the intended behavior.

Recommendation:

  • If TCK tests are too slow for regular unit tests, keep @Tag("slow")
  • If they're also benchmarks, use multiple tags: @Tag("slow") @Tag("benchmark")
  • Or create a separate @Tag("tck") category if they need special handling

2. SelectClasspathResource → SelectPackages Change

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Problem: This changes how Cucumber features are discovered:

  • SelectClasspathResource looks for a directory path with / separators
  • SelectPackages looks for Java package names with . separators

Risk: This could cause the TCK tests to not find the feature files at all, depending on how they're organized in the resources directory.

Verification Needed:

  • Check if feature files exist at src/test/resources/opencypher/tck/features/
  • Verify the benchmark job successfully runs these tests
  • Ensure no TCK test scenarios are silently skipped

3. Incomplete Benchmark Job Configuration

The new benchmark-unit-tests job only runs tests from the engine module:

run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

Problem: The gremlin module also has benchmark tests (CypherEngineComparisonBenchmark, QueryLanguageWithGremlinAndCypherBenchmark) that won't be executed.

Recommendation: Either:

  • Expand to -pl engine,gremlin
  • Or document why gremlin benchmarks are intentionally excluded
  • Or create a separate gremlin benchmark job

4. Minor: Property Duplication in pom.xml

The pom.xml has duplicate property definitions:

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>
<!-- appears twice at lines 75 and 81 in the diff -->

Impact: Low - Maven uses the last definition, but it's confusing for maintainers.

5. Test Report Naming

All three jobs (unit, slow, benchmark) use the same report name:

name: Unit Tests Report

Problem: This makes it harder to distinguish which job produced which report in the GitHub Actions UI.

Recommendation: Use descriptive names like:

  • "Unit Tests Report" (for regular tests)
  • "Slow Unit Tests Report" (for slow tests)
  • "Benchmark Tests Report" (for benchmarks)

Questions for Author

  1. Was the OpenCypherTCKTest tag change intentional? Should it really be classified as a benchmark?
  2. Have you verified that the TCK tests still run successfully with the SelectPackages change?
  3. Should gremlin module benchmarks be included in the benchmark job?
  4. What's the expected CI time savings from this reorganization?

Testing Recommendations

Before merging, please verify:

  1. Run the benchmark job locally/in CI and confirm all expected benchmarks execute
  2. Verify TCK tests still discover and run all feature files
  3. Check that gremlin benchmarks are handled appropriately
  4. Confirm coverage reporting includes all benchmark tests
  5. Ensure no tests were accidentally excluded from all jobs

Code Style & Conventions

✅ Code adheres to project standards from CLAUDE.md:

  • No new dependencies added
  • Proper formatting maintained
  • Test-focused changes

Security & Performance

✅ No security concerns identified
✅ Performance impact: This change should improve CI performance by isolating slow benchmarks

Overall Assessment

Status: ⚠️ Needs Revision

The core idea is sound and follows best practices, but there are several issues that need attention:

  1. The OpenCypherTCKTest categorization appears incorrect
  2. The Cucumber test discovery change needs verification
  3. Gremlin benchmarks are not included in the new job

Recommendation: Address the concerns above before merging, particularly verifying the TCK test behavior and deciding on the correct categorization.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Overview

This PR addresses test organization and execution issues by separating benchmark tests from regular unit tests and attempting to resolve OpenCypher TCK test execution problems.


Positive Changes

1. Benchmark Test Isolation

  • Good practice: Adding @Tag("benchmark") to performance/benchmark tests (LSMVectorIndexStorageBenchmark, TriggerBenchmark, DuanSSSPBenchmark, ParserBenchmark, QueryLanguageBenchmark, etc.)
  • Rationale: Separating slow-running benchmark tests from regular unit tests improves CI/CD pipeline efficiency
  • Implementation: New benchmark-unit-tests job in GitHub Actions runs benchmarks separately

2. Improved Console.Output Test Coverage

  • Enhancement: The consoleLogStatement() test now properly validates actual output instead of just checking that it doesn't throw
  • Implementation: Captures System.out and asserts the expected output matches
  • Quality: This follows the project's preference for proper assertions (using AssertJ)

3. CI/CD Workflow Improvements

  • Excludes benchmark group from regular unit tests via -DexcludedGroups=slow,benchmark
  • Separate job for benchmark tests with dedicated coverage reporting
  • Consistent test reporter configuration

⚠️ Critical Issues

1. Commented-Out Cucumber Dependencies Break OpenCypherTCKTest

Location: engine/pom.xml lines 210-232

Problem: The PR comments out all Cucumber dependencies, but OpenCypherTCKTest.java still imports and uses Cucumber classes:

import static io.cucumber.junit.platform.engine.Constants.GLUE_PROPERTY_NAME;
import static io.cucumber.junit.platform.engine.Constants.PLUGIN_PROPERTY_NAME;

@Suite
@IncludeEngines("cucumber")
@SelectPackages("opencypher.tck.features")
@ConfigurationParameter(key = GLUE_PROPERTY_NAME, value = "com.arcadedb.query.opencypher.tck")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "pretty, html:target/tck-report.html, com.arcadedb.query.opencypher.tck.TCKReportPlugin")
@Tag("benchmark")
public class OpenCypherTCKTest {
}

Files affected:

  • engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java
  • engine/src/test/java/com/arcadedb/query/opencypher/tck/TCKReportPlugin.java
  • engine/src/test/java/com/arcadedb/query/opencypher/tck/TCKStepDefinitions.java

Impact: The engine module will fail to compile in test scope. This violates the project's principle: "after every change in the backend (Java), compile the project and fix all the issues until the compilation passes"

Recommendation: Either:

  1. Keep Cucumber dependencies with <scope>test</scope> (preferred if TCK tests are still needed)
  2. OR remove/move the OpenCypher TCK test files entirely if they're being deprecated
  3. The @Tag("benchmark") annotation on OpenCypherTCKTest should probably be @Tag("slow") instead

2. Inconsistent Tag Usage

Location: OpenCypherTCKTest.java line 36

The PR changes the tag from @Tag("slow") to @Tag("benchmark"). However:

  • OpenCypher TCK is a compliance test suite, not a benchmark
  • Benchmarks measure performance; TCK tests verify specification compliance
  • This semantic mismatch could cause confusion

Recommendation: Keep @Tag("slow") or use a new @Tag("tck") tag instead

3. Missing SelectClasspathResource → SelectPackages Migration

Location: OpenCypherTCKTest.java line 25

Changed from:

@SelectClasspathResource("opencypher/tck/features")

to:

@SelectPackages("opencypher.tck.features")

Problem: @SelectPackages expects Java package names, but opencypher.tck.features is a resource path. This change may prevent Cucumber from finding feature files.

Recommendation: Either keep @SelectClasspathResource or verify that the feature files are accessible via the package name


📋 Minor Issues

1. Code Formatting Inconsistencies

Location: engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java lines 40-43

Changed from:

private static Database database;
private static final int WARMUP_ITERATIONS = 10000;
private static final int BENCHMARK_ITERATIONS = 100000;

to:

private static       Database database;
private static final int      WARMUP_ITERATIONS    = 10000;
private static final int      BENCHMARK_ITERATIONS = 100000;

Issue: Excessive whitespace for alignment is not standard Java style and doesn't match the existing codebase conventions

Recommendation: Revert to standard formatting without extra alignment spaces

2. Property Ordering in pom.xml

Location: Root pom.xml lines 50-98

Properties are reorganized but some duplicates remain:

  • maven-install-plugin.version appears twice (lines 72 and 77)
  • license-maven-plugin.version appears twice (lines 75 and 88 in original)

Recommendation: Clean up duplicate property definitions

3. Whitespace-Only Changes

Location: Multiple files in pom.xml

Many changes are purely whitespace (indentation corrections). While consistency is good, mixing functional changes with formatting changes makes reviews harder.

Recommendation: Consider separating formatting changes into a dedicated commit


🔍 Testing & Verification Concerns

1. No Verification of Changes

According to CLAUDE.md guidelines:

"Before writing any code: State how you will verify this change works"

Missing:

  • No evidence that the engine module compiles after commenting out Cucumber
  • No test execution results showing benchmarks run separately
  • No verification that OpenCypherTCKTest still works (or is intentionally broken)

2. Incomplete Cucumber Removal

If the intent is to remove Cucumber support, the following should also be addressed:

  • Remove TCK-related test files
  • Remove cucumber.version property from engine/pom.xml (line 48)
  • Update documentation if TCK compliance testing is being dropped

🎯 Recommendations

Before merging, please:

  1. Fix compilation: Either restore Cucumber dependencies or remove TCK test files
  2. Run verification: Execute mvn clean install to ensure everything compiles
  3. Run the tests: Verify that:
    • Regular unit tests exclude benchmarks: mvn test -DexcludedGroups=benchmark
    • Benchmark tests run separately: mvn test -Dgroups=benchmark
    • CI workflow completes successfully
  4. Clarify intent: Add a clear description to the PR explaining:
    • Why Cucumber is being removed (if that's the goal)
    • What happens to OpenCypher TCK compliance testing
  5. Fix semantic issues: Use appropriate tags (@Tag("slow") or @Tag("tck") instead of @Tag("benchmark") for OpenCypherTCKTest)
  6. Clean up formatting: Remove unnecessary whitespace alignment changes

📊 Security & Performance

  • ✅ No security concerns identified
  • ✅ No performance regressions expected (benchmark isolation should improve CI performance)
  • ✅ Test coverage improvements with better console output validation

📝 Summary

The benchmark test isolation strategy is excellent and aligns with best practices. However, the Cucumber dependency removal is incomplete and will break compilation. The PR needs additional work before it can be merged successfully.

Verdict: ❌ Changes requested - compilation issues must be resolved first

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-30.25%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5d1b7fb) 82996 57777 69.61%
Head commit (20707b7) 112605 (+29609) 44324 (-13453) 39.36% (-30.25%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Summary

This PR introduces a new test execution strategy by separating benchmark tests into their own CI job and making configuration adjustments to test dependencies. The changes improve test organization and CI pipeline clarity.


✅ Positive Aspects

  1. Better Test Organization: Separating benchmark tests from regular unit tests is a good practice. Benchmarks typically take longer and have different performance characteristics.

  2. Improved CI Parallelization: The new benchmark-unit-tests job runs in parallel with other test jobs, which can improve overall CI runtime.

  3. Consistent Tagging: Adding @Tag("benchmark") to benchmark test classes provides clear semantic meaning and enables selective test execution.

  4. Enhanced Test Coverage Tracking: The new job properly collects and uploads coverage reports for benchmark tests separately.


⚠️ Issues and Concerns

1. Breaking Change in OpenCypherTCKTest ⚠️

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")
-@Tag("slow")
+@Tag("benchmark")

Problem: Changing from @SelectClasspathResource to @SelectPackages fundamentally changes how Cucumber discovers test features:

  • @SelectClasspathResource looks for files in the classpath (e.g., .feature files in src/test/resources/opencypher/tck/features/)
  • @SelectPackages looks for Java packages (e.g., classes in com.arcadedb.query.opencypher.tck.features)

Since TCK tests are typically Cucumber .feature files (not Java classes), this change will likely break the OpenCypherTCKTest.

Recommendation: Revert to @SelectClasspathResource("opencypher/tck/features") and only change the tag from slow to benchmark.

2. Commented Out Dependencies Without Explanation ⚠️

<!--        <dependency>-->
<!--            <groupId>io.cucumber</groupId>-->
<!--            <artifactId>cucumber-junit-platform-engine</artifactId>-->
...
<!--        <dependency>-->
<!--            <groupId>org.junit.platform</groupId>-->
<!--            <artifactId>junit-platform-suite</artifactId>-->

Problems:

  • These dependencies are needed for the OpenCypherTCKTest to run (it uses @Suite annotation from junit-platform-suite)
  • Commenting them out will cause compilation failures or runtime errors for the TCK test
  • No explanation in the PR description or commit message about why these were removed

Recommendation: Either:

  1. Keep these dependencies if the TCK test is still needed, OR
  2. If removing the TCK test is intentional, remove the entire test class and explain why in the PR description

3. Tag Classification Inconsistency

The PR changes OpenCypherTCKTest from @Tag("slow") to @Tag("benchmark"). However:

  • TCK (Technology Compatibility Kit) tests are compliance/conformance tests, not benchmarks
  • They verify correctness against the OpenCypher specification, not measure performance
  • Tagging them as "benchmark" is semantically incorrect

Recommendation: Keep the TCK test tagged as slow since it's a long-running conformance test, not a performance benchmark.

4. Incomplete Test Execution Strategy

Looking at the workflow changes:

-DexcludedGroups=slow,benchmark

The regular unit tests now exclude BOTH slow and benchmark groups. However:

  • If OpenCypherTCKTest is tagged benchmark, it will run in the benchmark-unit-tests job
  • But that job only runs on the engine module: -pl engine -Dgroups=benchmark
  • This seems intentional, but needs verification that this is the desired behavior

🔍 Minor Issues

  1. Whitespace Changes in pom.xml: The PR includes formatting changes (tabs/spaces) in the pom.xml that are unrelated to the core functionality. While improving consistency is good, mixing formatting with functional changes makes review harder.

  2. Property Organization in pom.xml: Properties are reorganized but no functional change. Consider separating pure refactoring from functional changes in future PRs.

  3. Missing PR Description: The PR body is empty, making it hard to understand the motivation and expected behavior changes.


🧪 Test Coverage Assessment

Positive:

  • The SQLScriptAdditionalCoverageTest.consoleLogStatement() improvement is excellent - it now actually verifies the output instead of just checking that it doesn't throw
  • All benchmark tests are properly tagged

Concern:

  • No new tests added to verify the CI changes work as expected
  • The commented-out dependencies may break existing tests

🔒 Security Considerations

No security concerns identified. The changes are limited to test execution configuration.


🚀 Performance Considerations

Positive: Separating benchmarks into their own job prevents them from slowing down regular test feedback loops.


📋 Recommendations

Critical (Must Fix):

  1. ✅ Revert OpenCypherTCKTest to use @SelectClasspathResource instead of @SelectPackages
  2. ✅ Un-comment the Cucumber and JUnit Platform Suite dependencies in engine/pom.xml
  3. ✅ Keep OpenCypherTCKTest tagged as @Tag("slow") not benchmark

Suggested:

  1. Add a PR description explaining:
    • Why benchmark tests are being separated
    • Expected impact on CI runtime
    • Whether the TCK test should continue running
  2. Verify that the workflow correctly runs all tests (run a test build)
  3. Consider separating formatting changes from functional changes

✅ After Fixes

Once the critical issues are addressed, this PR will provide a good improvement to the test execution strategy. The core idea of separating benchmarks is sound and follows best practices.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

I've reviewed PR #3398 and have the following feedback:

Positive Changes

  1. Improved Test Categorization: Excellent separation of benchmark tests from regular unit tests using the @Tag("benchmark") annotation. This allows for more granular test execution control.

  2. Dedicated Benchmark Job: The new benchmark-unit-tests job in the CI workflow properly isolates performance-intensive tests, preventing them from slowing down the main test suite.

  3. Better Test Coverage Organization: Separating benchmark coverage reports ensures they can be tracked independently without skewing regular test metrics.

  4. Proper Console Output Testing: The improvement to SQLScriptAdditionalCoverageTest.consoleLogStatement() now actually verifies the output instead of just checking that it doesn't throw. This is a meaningful test quality improvement.

🔍 Issues & Concerns

1. Commented-Out Code (Critical)

<!--        <dependency>-->
<!--            <groupId>io.cucumber</groupId>-->
<!--            <artifactId>cucumber-junit-platform-engine</artifactId>-->

Location: engine/pom.xml:216-225

Issue: Commented-out code should be removed entirely, not left in the codebase. This violates the project's guidelines ("Avoid backwards-compatibility hacks like... adding // removed comments for removed code... If something is unused, delete it completely").

Recommendation: Either remove this dependency entirely or document WHY it's commented out if there's a temporary reason.

2. Inconsistent Tagging Strategy

Location: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

-@Tag("slow")
+@Tag("benchmark")

Issue: The OpenCypherTCKTest was changed from @Tag("slow") to @Tag("benchmark"). However, the CI workflow still excludes slow tests. If there are other tests tagged with "slow", they will now run in the regular test suite, which may not be intended.

Questions:

  • Are there other tests with @Tag("slow")?
  • Should the workflow exclude both "slow" and "benchmark" tests, or are you replacing "slow" with "benchmark" globally?

3. Cucumber Configuration Change

Location: OpenCypherTCKTest.java:33

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Issue: This is a significant behavioral change. @SelectClasspathResource scans for files in the classpath, while @SelectPackages scans for packages. This might not find the Cucumber feature files if they're not structured as Java packages.

Recommendation: Verify that the TCK tests still execute correctly with this change. Run the benchmark tests locally to confirm.

4. Whitespace-Only Changes

Location: Multiple files (pom.xml, TriggerBenchmark.java)

Issue: The PR includes several whitespace-only changes (e.g., reordering property definitions, reformatting). While not harmful, these make the diff harder to review and don't add value.

Recommendation: Consider separating formatting changes into a dedicated PR or avoiding them when making functional changes.

5. Missing Verification

Issue: The PR description is empty, and there's no evidence that the benchmark tests were actually run to verify they work with the new tagging system.

Recommendation: Add a description explaining:

  • What problem this PR solves
  • How the changes were tested
  • Confirmation that benchmark tests run successfully in isolation

🎯 Testing Concerns

  1. New Job Configuration: The benchmark-unit-tests job uses:

    -Dgroups=benchmark -Dtest=!OpenCypherTCKTest

    But it only runs on the engine module (-pl engine). The gremlin module also has benchmark tests. Should those be included?

  2. Coverage Gaps: The workflow downloads benchmark-unit-coverage-reports but benchmark tests might not contribute meaningfully to coverage metrics since they focus on performance, not code paths.

📝 Minor Issues

  1. Typo in Commit Message: "tenttive" should be "tentative" in commit 706c0dd.

  2. Order Swap: The list-tests and list-suites parameters were swapped in several places. While this likely doesn't affect functionality, consistency would be better if there's no reason for the change.

🛠️ Recommendations

  1. Remove commented code in engine/pom.xml
  2. Verify Cucumber test execution with the new @SelectPackages annotation
  3. Document the tagging strategy: Add a comment or documentation explaining when to use @Tag("benchmark") vs @Tag("slow")
  4. Consider including gremlin benchmarks in the benchmark job or document why they're excluded
  5. Add PR description explaining the changes and verification performed
  6. Run the benchmark tests to ensure they work correctly with the new configuration

⚖️ Overall Assessment

This PR addresses a real issue with test organization and CI performance. The core idea of separating benchmark tests is sound and follows good practices. However, there are several implementation details that need attention:

  • Code Quality: Remove commented code
  • Testing: Verify the Cucumber configuration change
  • Documentation: Explain the changes and tagging strategy

Recommendation: Request changes to address the commented code and verify the Cucumber test configuration before merging.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Summary

This PR separates benchmark tests from regular unit tests by introducing a new @Tag("benchmark") tag and creating a dedicated CI job to run them separately. This improves CI performance by excluding long-running benchmarks from the main test suite while still maintaining coverage.

Code Quality & Best Practices ✅

Positive aspects:

  • Clean separation of concerns: benchmark tests are now properly tagged and isolated
  • Follows existing test patterns and JUnit 5 tagging conventions
  • Consistent application of the @Tag("benchmark") annotation across all benchmark test classes
  • Good use of JUnit Platform's group filtering capabilities

Areas for improvement:

  1. Inconsistent indentation in engine/pom.xml (lines 216-227):

    • The cucumber-junit-platform-engine dependency has incorrect indentation (extra spaces)
    • The commented-out exclusion block should either be removed entirely or properly documented
    • Recommendation: Fix the indentation to match surrounding code and either remove the commented XML or add a clear comment explaining why it's kept
  2. Commented-out code without explanation (engine/pom.xml):

    • Lines 221-226 contain commented XML for exclusions
    • Per CLAUDE.md: "remove any System.out you used for debug when you have finished"
    • Recommendation: Either remove the commented code or add a clear comment explaining why the exclusion was removed and might need to be reconsidered
  3. OpenCypherTCKTest tag change (OpenCypherTCKTest.java:36):

    • Changed from @Tag("slow") to @Tag("benchmark")
    • This is a semantic change - is this test truly a benchmark, or just slow?
    • Also changed from @SelectClasspathResource to @SelectPackages - this could affect test discovery
    • Recommendation: Clarify if this is the intended behavior, as TCK tests are typically compliance tests, not benchmarks

Test Coverage ✅

Improvements:

  1. Enhanced test assertion in SQLScriptAdditionalCoverageTest.java (lines 287-301):
    • The consoleLogStatement() test now properly captures and verifies console output
    • Uses proper stream redirection with try-finally to ensure cleanup
    • Adds actual assertion instead of just checking it doesn't throw
    • Excellent improvement! This follows the guideline: "Prefer this syntax: assertThat(property.isMandatory()).isTrue();"

Coverage considerations:

  • The new benchmark-unit-tests job runs only on the engine module with -Dgroups=benchmark
  • Benchmark tests in the gremlin module (CypherEngineComparisonBenchmark.java, QueryLanguageWithGremlinAndCypherBenchmark.java) are tagged but won't run in the new CI job
  • Recommendation: Either extend the benchmark job to include gremlin module or document why only engine benchmarks are run in CI

Performance Considerations ✅

Benefits:

  • Main unit test suite will run faster by excluding benchmarks: -DexcludedGroups=slow,benchmark
  • Benchmarks run in parallel with other test jobs, maximizing CI efficiency
  • Proper separation allows different timeout/resource settings for benchmark tests

Potential issue:

  • Line 182: The benchmark job runs ./mvnw verify -Pcoverage ... -pl engine
  • This runs only the engine module, but benchmark tests exist in gremlin module too
  • Recommendation: Consider running -pl engine,gremlin or documenting why gremlin benchmarks are excluded

Security Concerns ✅

No security issues identified. Changes are limited to test organization and CI configuration.

Potential Bugs ⚠️

  1. Duplicate property definitions in root pom.xml (lines 72-77):

    • maven-install-plugin.version is defined twice (lines 72 and 77)
    • license-maven-plugin.version is defined twice (lines 75 and 88)
    • Impact: Low - Maven will use the last definition, but this is confusing
    • Recommendation: Remove the duplicate definitions
  2. Minor formatting inconsistencies in root pom.xml:

    • Lines 309-310: Multi-line license merge text might cause readability issues
    • Lines 126-138: Module list formatting is now consistent (good!)
    • Line 544: Extra space removed before files: (good cleanup)
  3. OpenCypherTCKTest package selection change:

    • Changed from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features")
    • This changes how Cucumber discovers feature files
    • Risk: Medium - this could cause test discovery failures if the package doesn't exist or feature files aren't found
    • Recommendation: Verify this change was tested and feature files are still discovered correctly

CI/CD Workflow Changes 🔍

GitHub Actions modifications:

  1. New benchmark-unit-tests job (lines 162-203):

    • Properly depends on build-and-package
    • Uses correct JDK 21 setup
    • Runs with coverage profile
    • Issue: The job name in the test reporter is "Unit Tests Report" - should be "Benchmark Unit Tests Report" to avoid confusion
    • Issue: The benchmark job only runs -pl engine, missing gremlin benchmarks
  2. Updated coverage aggregation (lines 508-516):

    • Properly downloads benchmark coverage reports
    • Includes them in the codecov upload
    • Uses continue-on-error: true for artifact downloads (good practice)
  3. Minor reordering (lines 106-107, 146-147):

    • Swapped list-tests and list-suites order - no functional impact
    • Appears to be just formatting/style preference

Adherence to CLAUDE.md Guidelines

Followed:

  • Uses existing test patterns and frameworks
  • Maintains existing code style
  • Improved test assertions (SQLScriptAdditionalCoverageTest)
  • No new dependencies added

⚠️ Needs attention:

  • Commented-out code should be removed (engine/pom.xml lines 221-226)
  • Indentation issues in engine/pom.xml (line 216)
  • Duplicate property definitions in root pom.xml should be cleaned up

Recommendations Summary

High Priority:

  1. ✅ Remove or document the commented-out exclusions in engine/pom.xml
  2. ✅ Fix indentation in engine/pom.xml (line 216)
  3. ✅ Remove duplicate property definitions in root pom.xml
  4. ⚠️ Verify OpenCypherTCKTest still discovers feature files correctly after the @SelectPackages change

Medium Priority:
5. 📝 Update the test reporter name in the benchmark job to "Benchmark Unit Tests Report"
6. 🔍 Consider including gremlin module in the benchmark CI job or document why it's excluded
7. 📝 Clarify if OpenCypherTCKTest should be tagged as "benchmark" vs "slow"

Low Priority:
8. 🎨 Fix multi-line license merge formatting in root pom.xml (line 309-310)

Verification Needed

Per CLAUDE.md guidelines, please verify:

  1. Run the full test suite locally to ensure no tests are accidentally excluded
  2. Verify OpenCypherTCKTest still runs and discovers all feature files
  3. Check that benchmark tests in the gremlin module are intentionally excluded from CI or should be included
  4. Confirm the CI pipeline completes successfully with the new job

Overall Assessment: This is a solid improvement that will speed up CI by properly separating benchmark tests. The main concerns are the commented-out code, duplicate properties, and ensuring the OpenCypherTCKTest change doesn't break test discovery. With minor cleanup, this PR will be ready to merge.

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Overview

This PR improves the CI/CD test execution strategy by separating benchmark tests into their own workflow job, ensuring better organization and more accurate test coverage reporting.


✅ Positive Changes

1. Improved Test Organization

  • Excellent separation of benchmark tests from regular unit tests using JUnit 5's @Tag("benchmark") annotation
  • New dedicated benchmark-unit-tests job in CI workflow allows for better isolation and potentially different resource allocation
  • Proper coverage aggregation in the coverage-report job now includes benchmark test coverage

2. Test Quality Enhancement

  • SQLScriptAdditionalCoverageTest.java:287-301: Great improvement to the consoleLogStatement() test - now actually verifies output instead of just checking it doesn't throw. This is proper test-driven development.
  • Proper cleanup with finally block ensures System.out is restored even if test fails

3. Workflow Improvements

  • Consistent test reporter configuration across all jobs
  • Proper dependency chain: coverage-report now depends on benchmark-unit-tests job
  • Good use of continue-on-error: true for artifact downloads to handle missing artifacts gracefully

4. Dependency Management

  • Centralized JMH version in engine/pom.xml properties (good practice)
  • Consistent formatting improvements in pom.xml

⚠️ Issues & Concerns

1. CRITICAL: Duplicate Property Definition (pom.xml:77-78)

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>
<maven-enforcer-plugin.version>3.4.1</maven-enforcer-plugin.version>

Issue: maven-install-plugin.version is defined twice (lines 73 and 77)
Impact: While Maven takes the last definition, this is confusing and violates DRY principles
Recommendation: Remove the duplicate at line 77

2. CRITICAL: Duplicate Property Definition (pom.xml:92)

<mockito-core.version>5.21.0</mockito-core.version>

Issue: mockito-core.version is defined twice (lines 88 and 92) with different values (5.16.0 vs 5.21.0)
Impact: This creates ambiguity - version 5.21.0 will be used, but version 5.16.0 might be expected elsewhere
Recommendation: Consolidate to a single definition and verify which version is intended

3. Questionable Change: OpenCypherTCKTest Tag (engine/.../OpenCypherTCKTest.java:36)

-@Tag("slow")
+@Tag("benchmark")

Issue: Changed from slow to benchmark tag, but this is a TCK (Technology Compatibility Kit) compliance test, not a performance benchmark
Impact: May cause confusion about test purpose; TCK tests verify correctness, not performance
Recommendation: Consider whether slow tag was more semantically accurate, or if this test should have both tags

4. Code Style: Whitespace Changes (TriggerBenchmark.java:41-43)

-  private static Database database;
-  private static final int WARMUP_ITERATIONS = 10000;
+  private static       Database database;
+  private static final int      WARMUP_ITERATIONS    = 10000;

Issue: Added column alignment whitespace that doesn't match ArcadeDB's existing code style
Per CLAUDE.md: "adhere to the existing code"
Recommendation: Revert these whitespace changes - they're not necessary for the PR's purpose and violate project style guidelines

5. Commented-Out Code (engine/pom.xml:221-226)

<!--                    <exclusions>-->
<!--                        <exclusion>-->
<!--                            <groupId>org.junit.platform</groupId>-->
<!--                            <artifactId>junit-platform-engine</artifactId>-->
<!--                        </exclusion>-->
<!--                    </exclusions>-->

Issue: Commented-out exclusion block with no explanation
Per CLAUDE.md: Clean code practices suggest removing unused code rather than commenting it
Recommendation: Either remove entirely or add a comment explaining why it's disabled and might be needed later

6. Minor: Indentation Inconsistency (engine/pom.xml:216)

                <dependency>
                    <groupId>io.cucumber</groupId>

Issue: Inconsistent indentation (extra spaces before <dependency>)
Recommendation: Fix indentation to match surrounding code

7. Workflow: Missing Test Exclusion Clarification
The new benchmark job runs:

-Dgroups=benchmark -Dtest=!OpenCypherTCKTest

Question: Why exclude OpenCypherTCKTest if it's now tagged as benchmark? This seems contradictory.
Recommendation: Clarify the intent - either OpenCypherTCKTest should be benchmark or it should remain slow


🔍 Additional Observations

Missing Context

  • The PR description is empty - would benefit from explaining:
    • What issue was this fixing?
    • Why were benchmark tests failing or causing problems?
    • What was the "tentative" behavior being fixed?

Test Execution Strategy

  • Current approach: exclude benchmarks from main tests, run them separately
  • Consider: Should benchmarks run on every commit, or only on schedule/manual trigger? They may be time-intensive.

Coverage Considerations

  • Benchmark tests are now included in coverage, which is good
  • However, benchmark tests typically focus on performance, not coverage - ensure this doesn't skew coverage metrics

📋 Recommendations Summary

Must Fix:

  1. Remove duplicate maven-install-plugin.version property (pom.xml:77)
  2. Resolve conflicting mockito-core.version definitions (lines 88 vs 92)
  3. Revert unnecessary whitespace changes in TriggerBenchmark.java

Should Consider:
4. Re-evaluate OpenCypherTCKTest tag change (slow vs benchmark)
5. Remove or document commented-out exclusion in engine/pom.xml
6. Add PR description explaining what's being fixed
7. Clarify why OpenCypherTCKTest is excluded in benchmark job if it's tagged benchmark

Nice to Have:
8. Fix indentation in engine/pom.xml:216
9. Consider if benchmarks should run on every commit or on schedule


✅ Testing & Verification

The PR correctly:

  • ✅ Adds proper @Tag("benchmark") annotations to benchmark tests
  • ✅ Updates CI workflow to handle benchmark tests separately
  • ✅ Improves test assertion quality (console output test)
  • ✅ Maintains coverage reporting for all test types

Overall, this is a useful improvement to the testing infrastructure, but requires cleanup of duplicate properties and style violations before merging.


Review generated following CLAUDE.md guidelines for code quality, testing practices, and project conventions.

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Summary

This PR reorganizes test execution to separate benchmark/performance tests from regular unit tests and improve CI efficiency. The changes include:

  • Adding a new benchmark JUnit tag for performance tests
  • Creating a dedicated CI job for benchmark tests
  • Changing Maven lifecycle phase from verify to package for unit tests
  • Improving test coverage for console output functionality
  • Minor POM cleanup and formatting improvements

✅ Strengths

1. Better Test Organization

  • Clear separation of concerns: benchmark tests now run in their own CI job
  • Excludes benchmark tag from regular unit tests (-DexcludedGroups=slow,benchmark)
  • Consistent tagging applied across all benchmark test classes

2. Improved CI Efficiency

  • Using package phase instead of verify prevents unnecessary integration test setup
  • Allows parallel execution of different test categories
  • Better isolation of slow/benchmark tests from regular unit tests

3. Enhanced Test Coverage

  • SQLScriptAdditionalCoverageTest.consoleLogStatement() now properly verifies console output instead of just checking it does not throw
  • Good use of ByteArrayOutputStream to capture and assert System.out content

⚠️ Issues & Concerns

1. Commented-out Code in POM (engine/pom.xml:195-201)

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-junit-platform-engine</artifactId>
    <version>${cucumber.version}</version>
    <scope>test</scope>
<!--                    <exclusions>-->
<!--                        <exclusion>-->
<!--                            <groupId>org.junit.platform</groupId>-->
<!--                            <artifactId>junit-platform-engine</artifactId>-->
<!--                        </exclusion>-->
<!--                    </exclusions>-->
</dependency>
  • Issue: Commented-out exclusions left in the code
  • Recommendation: Either remove the comments entirely or document WHY the exclusion was removed (commit message says "just remove junit bindings" but rationale unclear)
  • Risk: The exclusion might have been there to prevent dependency conflicts. Removing it could cause classpath issues.

2. OpenCypherTCKTest Tag Change (Line 246)

-@Tag("slow")
+@Tag("benchmark")
  • Issue: This changes the semantic meaning. TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks
  • Recommendation: Consider using both tags if applicable, or keep it as "slow" since TCK tests verify correctness, not measure performance
  • Impact: If someone runs tests with -Dgroups=slow, they will no longer run TCK tests

3. SelectClasspathResource → SelectPackages Change

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")
  • Issue: This fundamentally changes how Cucumber discovers feature files
    • SelectClasspathResource looks for files in the classpath (e.g., .feature files)
    • SelectPackages looks for Java packages
  • Risk: This could break TCK test discovery if feature files are not in a package named opencypher.tck.features
  • Recommendation: Verify this actually works. The commit message says "fix load of resources" but no explanation of why this was needed

4. Indentation-only Changes
Multiple files have whitespace/alignment changes (e.g., TriggerBenchmark.java:268-273, QueryLanguageBenchmark.java):

-  private static Database database;
-  private static final int WARMUP_ITERATIONS = 10000;
+  private static       Database database;
+  private static final int      WARMUP_ITERATIONS    = 10000;
  • Issue: These formatting changes make the diff harder to review and are not related to the PR's stated purpose
  • Recommendation: Per CLAUDE.md guidelines, avoid unrelated formatting changes. Consider reverting these or submitting them in a separate formatting-only PR

5. Property Organization (pom.xml)
The PR reorganizes property definitions in the root pom.xml. While this improves organization, ensure there are no duplicate definitions.


🔍 Performance Considerations

Positive:

  • Separating benchmark tests allows them to run with different resource constraints if needed
  • package phase is faster than verify for unit tests since it skips integration test setup

Consideration:

  • Benchmark tests now run in a separate job that depends on build-and-package, adding some overhead
  • Total CI time might increase if benchmark tests are slow, even though unit tests complete faster

🔒 Security Concerns

No security issues identified. Changes are primarily test organization and CI configuration.


🧪 Test Coverage

Positive:

  • New assertion in consoleLogStatement() test actually validates output
  • All benchmark tests properly tagged for categorization

Missing:

  • No evidence that the OpenCypherTCKTest changes were verified (does it still find and run TCK features?)
  • No mention of whether benchmark tests were actually run in the new CI job

📋 Recommendations

  1. Remove or explain commented code in engine/pom.xml
  2. Reconsider OpenCypherTCKTest tagging - should it be "slow" or "benchmark"?
  3. Verify SelectPackages change - ensure TCK tests still run correctly
  4. Revert formatting-only changes or move to separate PR
  5. Add PR description - currently blank, should explain:
    • Why the change from verify to package phase
    • Rationale for removing cucumber-junit-platform-engine exclusion
    • Expected impact on CI execution time
  6. Run full test suite to ensure nothing broke, especially TCK tests

Code Quality: 7/10

The core idea is solid (better test organization), but execution has some rough edges:

  • Mixing functional changes with formatting changes
  • Unclear rationale for some changes (commented code, SelectPackages)
  • Missing PR description to explain the "why"

Per CLAUDE.md project guidelines:

  • ✅ Tests are being improved
  • ✅ Build commands updated appropriately
  • ⚠️ Some changes seem incomplete (commented code)
  • ⚠️ Formatting changes should be avoided in functional PRs

Next Steps

Please address the issues above, particularly:

  1. The OpenCypherTCKTest resource loading change
  2. Commented-out exclusions in engine/pom.xml
  3. Add a PR description explaining the changes

Happy to re-review once these are addressed!

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.41%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5144fcc) 84354 58499 69.35%
Head commit (247afe9) 84354 (+0) 58153 (-346) 68.94% (-0.41%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix Tentative Unit Test Execution

I've reviewed PR #3398 and have the following feedback:

Positive Changes

  1. Improved Test Categorization: The introduction of the benchmark tag is excellent for separating performance tests from regular unit tests. This aligns with the project's goal of faster CI pipelines.

  2. Dedicated Benchmark Job: The new benchmark-unit-tests job properly isolates benchmark tests, preventing them from slowing down regular test runs while still collecting coverage data.

  3. Phase Change Rationale: Switching from verify to package phase prevents unintended integration test execution during unit test runs. This is a smart fix.

  4. Better Test Coverage Verification: The consoleLogStatement() test now properly validates output instead of just checking for no exceptions - good improvement.

  5. Consistent Formatting: The pom.xml formatting cleanup (indentation, whitespace) improves readability.

⚠️ Issues and Concerns

1. OpenCypherTCKTest Tag Change (Critical)

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

-@Tag("slow")
+@Tag("benchmark")

Issue: This changes the semantic meaning of the test. The OpenCypher TCK is a compliance test suite, not a benchmark. It's slow because it runs hundreds of feature scenarios, but it's fundamentally different from performance benchmarks.

Recommendation: Keep it as @Tag("slow"). If you want to exclude it from regular runs, add it to the -DexcludedGroups parameter instead:

-DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

2. Maven Surefire Configuration Mismatch

File: pom.xml:239

The failsafe plugin now excludes Cucumber engine:

<excludeJUnit5Engines>
    <excludeJUnit5Engine>cucumber</excludeJUnit5Engine>
</excludeJUnit5Engines>

Potential Issue: This might prevent the OpenCypherTCKTest from running in integration tests if that's where it was intended to run. The test uses @IncludeEngines("cucumber"), so excluding it globally could cause the test to be silently skipped.

Recommendation: Verify that OpenCypherTCKTest runs in the intended job (benchmark-unit-tests). If it should run separately, consider creating a dedicated TCK test job instead of mixing it with benchmarks.

3. System.out Capture Pattern

File: engine/src/test/java/com/arcadedb/query/sql/executor/SQLScriptAdditionalCoverageTest.java:287-301

The test captures System.out which is good, but:

Minor Issue: The pattern doesn't handle potential concurrent test execution well. If JUnit runs tests in parallel (which surefire can do), this could capture output from other tests.

Recommendation: Consider using a logging framework capture or ensuring this test class runs in single-threaded mode. Not critical for now, but worth noting.

4. Duplicate Property Definitions

File: pom.xml:76,78

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 72 -->
...
<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 76 -->

Issue: The maven-install-plugin.version property is defined twice in the pom.xml properties section.

Recommendation: Remove one of the duplicate definitions.

📊 Test Coverage Considerations

The PR correctly collects coverage from all test categories (unit, slow, benchmark, integration) and merges them in the coverage-report job. This ensures no coverage data is lost.

However, verify that:

  • Benchmark tests actually contribute meaningful coverage (they might exercise different code paths)
  • The --fail-never flag doesn't hide real test failures that should block CI

🔍 Security & Performance

No security concerns identified. The changes are test infrastructure improvements.

Performance impact should be positive:

  • Regular CI runs will be faster (excludes slow and benchmark tests)
  • Parallel job execution maintains overall coverage quality
  • Benchmark tests run separately without blocking other checks

📋 Suggested Actions

  1. Revert OpenCypherTCKTest tag change - Keep it as slow rather than benchmark
  2. Remove duplicate maven-install-plugin.version property
  3. Verify OpenCypherTCKTest execution - Ensure it runs in the intended job and isn't silently excluded
  4. Consider test isolation for System.out capture test (low priority)

Overall Assessment

This is a solid improvement to the test execution strategy. The separation of benchmark tests is well-designed and follows best practices for CI optimization. The main concerns are semantic (TCK vs benchmark classification) and a potential duplicate property definition.

Recommendation: Address the critical issues (TCK tag, duplicate property) before merging. The other items are minor suggestions for future consideration.


Review conducted by Claude Code following CLAUDE.md guidelines

@claude
Copy link

claude bot commented Feb 10, 2026

Code Review - PR #3398: Fix Tentative Unit Test Execution

Summary

This PR refactors the CI test execution strategy by:

  1. Changing Maven lifecycle from verify to package for unit tests
  2. Introducing a new @Tag("benchmark") to separate benchmark tests
  3. Adding a dedicated benchmark-unit-tests job in the CI pipeline
  4. Improving test coverage for CONSOLE.OUTPUT statement
  5. Code formatting and dependency organization improvements

✅ Strengths

1. Improved Test Organization

  • Benchmark Separation: Properly tags benchmark tests with @Tag("benchmark") and runs them separately, preventing them from slowing down standard unit test runs
  • Clear Intent: The new CI job structure makes it explicit when benchmarks run vs regular tests

2. Correct Maven Lifecycle Usage

  • Phase Change: Moving from verifypackage prevents unintended integration test execution
  • Accurate Comment: The updated comment correctly explains that package phase runs surefire (test) and JaCoCo report (prepare-package) without reaching integration-test phase

3. Enhanced Test Coverage

  • SQLScriptAdditionalCoverageTest.java (line 263-283): The consoleLogStatement() test now properly verifies output instead of just ensuring no exceptions
    • Captures System.out and validates the actual output
    • Follows proper cleanup pattern with try-finally
    • Adheres to project's preferred assertion style: assertThat(output).isEqualTo("test log message")

4. Dependency Management

  • engine/pom.xml: Extracted JMH version to property ${jmh.version} (1.37) - good practice for version consistency
  • integration/pom.xml: Added proper exclusions for Cucumber dependencies in engine test-jar to prevent unwanted transitive dependencies

⚠️ Issues & Concerns

1. 🔴 Critical: OpenCypherTCKTest Tag Change

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:216

-@Tag("slow")
+@Tag("benchmark")

Issue: This changes the semantic categorization of the OpenCypher TCK test suite from "slow" to "benchmark". TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks.

Impact:

  • TCK tests verify specification compliance and should run in the "slow" test category
  • Benchmark tests measure performance characteristics
  • This misclassifies the test's purpose

Recommendation: Revert this change. If TCK tests need to be excluded from regular runs, use multiple tags:

@Tag("slow")
@Tag("tck")
public class OpenCypherTCKTest {

2. 🟡 Medium: Inconsistent Annotation Import

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:204

-import org.junit.platform.suite.api.SelectClasspathResource;
+import org.junit.platform.suite.api.SelectPackages;

Issue: The annotation change from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features") may cause the test to fail finding feature files.

Concerns:

  • SelectClasspathResource looks for files/directories on the classpath (e.g., .feature files)
  • SelectPackages looks for Java packages with test classes
  • TCK feature files are resources, not Java packages

Recommendation: Verify this change doesn't break TCK test discovery. If feature files are in src/test/resources/opencypher/tck/features/, the original SelectClasspathResource is likely correct.

3. 🟡 Medium: Whitespace-Only Changes

Files:

  • engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java:237-243
  • engine/src/test/java/performance/QueryLanguageBenchmark.java:332-350,419-429

Issue: Multiple lines have alignment-only whitespace changes (e.g., variable declarations aligned to columns)

-  private static Database database;
+  private static       Database database;

Concerns:

  • Per CLAUDE.md: "adhere to the existing code" style
  • These changes add noise to the diff without functional benefit
  • Not consistently applied across the codebase

Recommendation: Remove cosmetic whitespace changes unless they're part of a dedicated formatting effort. Focus on functional changes only.

4. 🟡 Medium: Wildcard Import Replaced

File: engine/src/test/java/performance/QueryLanguageBenchmark.java:339-350

-import java.util.*;
+import java.util.ArrayList;
+import java.util.Arrays;
+// ... 8 more explicit imports

Issue: While explicit imports are generally preferred, this change is cosmetic and not mentioned in the PR description.

Observation: CLAUDE.md states "don't use fully qualified names if possible, always import the class" but doesn't mandate explicit vs wildcard imports.

Recommendation: This is fine if it's project standard, but consider applying consistently or deferring to a separate formatting PR.

5. 🟢 Minor: Duplicate Property Definition

File: pom.xml:522-523

<license-maven-plugin.version>5.0.0</license-maven-plugin.version>
<maven-install-plugin.version>3.1.4</maven-install-plugin.version>

Observation: These properties already exist earlier in the file (lines 519, 522 in the diff). The diff shows they're being reorganized/deduplicated, which is good. Verify no duplicates remain.

6. 🟢 Minor: CI YAML Formatting

File: .github/workflows/mvn-test.yml:20-22, 40-41

-          list-suites: "failed"
           list-tests: "failed"
+          list-suites: "failed"

Issue: Parameter order changed (cosmetic)

Impact: None - YAML doesn't care about parameter order


🔍 Testing Verification

Required Testing:

  1. Verify CI pipeline runs successfully with the new job structure
  2. ⚠️ Verify OpenCypherTCKTest still discovers and runs feature files after annotation change
  3. Verify benchmark tests are properly excluded from standard unit test runs with -DexcludedGroups=benchmark
  4. Verify consoleLogStatement() test passes with the new assertion

Commands to Verify Locally:

# Test the package phase runs unit tests but not ITs
mvn package -Pcoverage -DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

# Test slow tests run separately
mvn package -Pcoverage -pl engine -Dgroups=slow -Dtest=!OpenCypherTCKTest

# Test benchmark tests run separately
mvn package -Pcoverage -Dgroups=benchmark -Dtest=!OpenCypherTCKTest

# Verify OpenCypherTCKTest can still run
cd engine && mvn test -Dtest=OpenCypherTCKTest

🛡️ Security & Performance

Security: ✅ No security concerns identified

Performance:

  • Positive: Separating benchmarks will speed up regular CI runs
  • Positive: Using package instead of verify prevents accidental IT execution

📋 Recommendations

Must Fix:

  1. Revert OpenCypherTCKTest tag from benchmark back to slow (or use both tags appropriately)
  2. Verify SelectPackages change doesn't break TCK test discovery - test locally before merging

Should Consider:

  1. Remove cosmetic whitespace changes (variable alignment) to reduce diff noise
  2. Verify no duplicate properties remain in pom.xml

Nice to Have:

  1. Document the test categorization strategy (slow, benchmark) in CLAUDE.md for future contributors

📊 Adherence to CLAUDE.md

Followed:

  • Test coverage improvements (consoleLogStatement test)
  • Used assertThat(...).isTrue() style assertions
  • Proper dependency management
  • No System.out left in production code

⚠️ Violated:

  • "adhere to the existing code" - cosmetic whitespace changes deviate from surrounding code
  • Possibly semantic change to TCK test categorization without clear justification

Final Verdict

Approve with Changes Required 🟡

This PR makes valuable improvements to test organization and CI efficiency. However, the OpenCypherTCKTest categorization change requires verification/correction before merging. The TCK suite is a conformance test, not a benchmark, and changing its tag may cause it to be skipped inappropriately.

Once the TCK test issue is addressed, this PR will be ready to merge.


Reviewed per CLAUDE.md guidelines

@robfrank robfrank force-pushed the test-and-coverage-review branch from aa51fae to 247afe9 Compare February 11, 2026 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant