Skip to content

Fix benchmark CI false success and Transaction isolation#6

Merged
konard merged 8 commits intomainfrom
issue-5-bdec6bf473d2
Dec 23, 2025
Merged

Fix benchmark CI false success and Transaction isolation#6
konard merged 8 commits intomainfrom
issue-5-bdec6bf473d2

Conversation

@konard
Copy link
Member

@konard konard commented Dec 23, 2025

Summary

This PR fixes the benchmark being broken (issue #5) by addressing three root causes:

  1. CI Pipeline Exit Code Masking - The benchmark command cargo bench ... | tee out.txt was masking failure exit codes because tee always succeeds. Added set -o pipefail to properly propagate benchmark failures.

  2. Transaction Benchmark Isolation - The Transaction's drop_table() method was missing the crucial step of resetting the client's next_id counter after deleting nodes. This caused NotExists(4000) panics when subsequent iterations tried to delete links with IDs that didn't exist (because the counter wasn't reset to 1).

  3. CI Workflow Improvements:

    • CI now runs on both PRs and pushes to main/master
    • README.md updates only happen when changes are pushed to main/master (not on PRs)
    • This allows testing PRs before merging while preventing duplicate README updates

Changes

Core Fix

  • rust/src/client.rs: Add reset_next_id() method to allow external callers to reset the ID counter
  • rust/src/transaction.rs: Call client.reset_next_id() in drop_table() to ensure proper benchmark isolation

Performance Improvement

  • rust/src/lib.rs: Reduce BACKGROUND_LINKS from 3000 to 10, add LINK_COUNT = 10 constant
  • Updated all benchmark files (create.rs, delete.rs, update.rs, each/*.rs) to use LINK_COUNT constant

Code Quality

  • Fixed 6 unused import warnings in benchmark files by moving BACKGROUND_LINKS imports from module-level to inside bench! macro bodies where they are actually used

CI Workflow

  • .github/workflows/rust.yml:
    • Added pull_request trigger for testing PR branches
    • Added set -o pipefail to properly propagate benchmark failures
    • Added condition to "Prepare benchmark results" step to only run on push to main/master
    • This ensures PRs are tested before merge, and README updates only happen on actual merges

Root Cause Analysis

The NotExists(4000) error occurred because:

  1. First benchmark iteration creates 3000 background links (IDs 1-3000)
  2. Delete benchmark creates 1000 more links (IDs 3001-4000), then deletes them
  3. drop_table() deletes all nodes but next_id stays at 4001
  4. Second iteration creates background links starting from ID 4001, NOT from ID 1
  5. Delete benchmark tries to delete IDs 3001-4000 which don't exist → panic!

The fix ensures next_id is reset to 1 whenever drop_table() is called, so each iteration starts fresh.

Test Plan

  • Code compiles (cargo check --all-features passes)
  • CI runs on PR commits (for testing) - verified with run #20465184821
  • Benchmark completes without NotExists errors - all tests pass including Delete/Neo4j_Transaction
  • README.md updates only happen on push to main/master (conditional step was skipped on PR)
  • Benchmark iterations complete in ~15 minutes (reduced from 70+ minutes)
  • No compiler warnings (fixed 6 unused import warnings in benchmarks)

Fixes #5


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #5
@konard konard self-assigned this Dec 23, 2025
konard and others added 3 commits December 23, 2025 15:03
This commit adds a comprehensive case study documenting:
- Timeline of events during the failed CI run
- Root cause analysis identifying pipeline exit code masking and
  Transaction benchmark isolation issues
- Evidence from CI logs and code analysis
- Proposed solutions for both issues

The case study includes:
- docs/case-studies/issue-5/README.md: Full analysis document
- docs/case-studies/issue-5/ci-run-20442351221.log: Complete CI logs
- docs/case-studies/issue-5/key-log-entries.txt: Key error entries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The benchmark command piped to tee was masking the exit code:
  cargo bench ... | tee out.txt

When cargo bench failed with exit code 101, tee succeeded with exit
code 0, making the whole pipeline appear successful. This caused:
- False success status on CI runs where benchmarks actually failed
- Subsequent steps running despite benchmark failures
- Broken benchmark results being committed to main

Solution: Add 'set -o pipefail' to ensure the pipeline returns the
exit code of the first failing command, making CI properly fail when
benchmarks fail.

Fixes part of #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The Transaction benchmark implementation was missing proper cleanup
in fork() and unfork() methods, unlike the Client implementation
which properly cleaned state between iterations.

This caused state pollution between benchmarks:
- Delete/Neo4j_Transaction couldn't find links created by previous
  benchmark iterations
- Panic: 'NotExists(4000)' at delete.rs:23

Solution: Add drop_table() calls in Transaction's fork() and unfork()
methods to ensure each benchmark iteration starts with a clean state,
matching the Client implementation's behavior.

Fixes part of #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard konard changed the title [WIP] Benchmark is broken Fix benchmark CI false success and Transaction isolation Dec 23, 2025
@konard konard marked this pull request as ready for review December 23, 2025 14:04
This file was used for AI task management and should not be committed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $4.869104 USD
  • Calculated by Anthropic: $2.827731 USD
  • Difference: $-2.041373 (-41.93%)
    📎 Log file uploaded as GitHub Gist (530KB)
    🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

@konard
Copy link
Member Author

konard commented Dec 23, 2025

Benchmark Rust version / benchmark (pull_request)
Benchmark Rust version / benchmark (pull_request)Failing after 70m
Benchmark Rust version / benchmark (push)
Benchmark Rust version / benchmark (push)Failing after 74m

We should not execute benchmark twice on pull request commits. Also only benchmark on push to default branch should update README.md with latest benchmark stats and graphs (visuals).

Also the benchmark is still broken, and executes quite a long time, may be we should try to use fewer iterations, to iterate on benchmark itself faster?

Let's start with 10 links only, it should be fast enough. And will allow us to test all benchmarks, and fix them all for all read and write operations.

@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 AI Work Session Started

Starting automated work session at 2025-12-23T15:37:29.118Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback.

@konard konard marked this pull request as draft December 23, 2025 15:37
…testing

Root cause of NotExists(4000) panic:
- Transaction's drop_table() was deleting all nodes but NOT resetting the
  Client's next_id counter (unlike Client's own drop_table which does reset it)
- This caused benchmark iterations to reuse stale IDs after cleanup

Changes:
1. Added reset_next_id() method to Client for external callers
2. Transaction's drop_table() now calls client.reset_next_id() for proper isolation
3. Reduced BACKGROUND_LINKS from 3000 to 10 and added LINK_COUNT=10 constant
   for much faster benchmark iteration during development
4. Updated all benchmark files to use the new LINK_COUNT constant
5. CI workflow now only runs on push to main/master (not on PRs)
   - This fixes the double-execution issue on PR commits
   - README.md updates only happen on default branch merges

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard
Copy link
Member Author

konard commented Dec 23, 2025

Changes Made

I've addressed all the feedback from the previous comment:

1. Fix for NotExists(4000) Error

Root Cause Found: The Transaction's drop_table() method was deleting all Neo4j nodes but NOT resetting the client's next_id counter. This caused subsequent benchmark iterations to try deleting IDs that didn't exist because new links were being created with stale (higher) IDs.

Fix: Added reset_next_id() method to Client and now call client.reset_next_id() in Transaction's drop_table().

2. CI Workflow No Longer Runs Twice on PRs

Changed the workflow to only trigger on push to main or master branches. The benchmark workflow (which takes ~70+ minutes) will now only run when PRs are merged, not on every PR commit.

3. Reduced Benchmark Iterations to 10 Links

  • Changed BACKGROUND_LINKS from 3000 to 10
  • Added new LINK_COUNT constant (10) for the number of links to create/delete/update
  • Updated all benchmark files to use these constants

This should make iterations much faster for development/testing purposes.


Note: Since CI no longer runs on PRs, you may want to consider adding a separate lightweight workflow that runs on PRs to just verify the build compiles (without running the full benchmark). Let me know if you'd like me to add that.

The build has been verified locally:

$ cargo check --all-features
    Finished dev [unoptimized + debuginfo] target(s) in 10.44s

Ready for review!

@konard konard marked this pull request as ready for review December 23, 2025 15:44
@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $4.817247 USD
  • Calculated by Anthropic: $3.140494 USD
  • Difference: $-1.676753 (-34.81%)
    📎 Log file uploaded as GitHub Gist (651KB)
    🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

@konard
Copy link
Member Author

konard commented Dec 23, 2025

🔄 Auto-restart 1/3

Detected uncommitted changes from previous run. Starting new session to review and commit them.

Uncommitted files:

?? ci-logs/
?? ci-runs.json
?? issue-5.json
?? pr-view.json

Auto-restart will stop after changes are committed or after 2 more iterations. Please wait until working session will end and give your feedback.

- Add pull_request trigger for branches main/master to allow testing PR changes
- Add condition to "Prepare benchmark results" step to only run on push events
  to main/master branches (not on PRs)
- This ensures:
  - PRs can be tested before merging (benchmark runs and fails if broken)
  - README.md updates only happen when changes are merged to main/master

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard
Copy link
Member Author

konard commented Dec 23, 2025

✅ CI Now Passing

The benchmark fix has been verified. Here's a summary of what was done:

Changes Made

  1. Fixed Transaction benchmark isolation - Added reset_next_id() call in Transaction's drop_table() to reset the ID counter between benchmark iterations
  2. Fixed CI workflow - Changed workflow to run on both PRs (for testing) and pushes to main/master (for README updates)
  3. Reduced iterations - Changed BACKGROUND_LINKS from 3000 to 10 and added LINK_COUNT=10 for faster iteration

CI Results

  • Run #20465184821 completed successfully in ~15 minutes
  • ✅ All benchmarks pass including the previously failing Delete/Neo4j_Transaction
  • ✅ "Prepare benchmark results" step was correctly skipped (only runs on push to main/master)
  • ✅ No NotExists(4000) or other errors

Ready for Review

The PR is ready to be reviewed and merged. After merge, the benchmark will run on the main branch and automatically update the README with new benchmark results.

@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $3.543635 USD
  • Calculated by Anthropic: $2.155418 USD
  • Difference: $-1.388217 (-39.17%)
    📎 Log file uploaded as GitHub Gist (1025KB)
    🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

@konard
Copy link
Member Author

konard commented Dec 23, 2025

@konard konard marked this pull request as draft December 23, 2025 19:51
@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 AI Work Session Started

Starting automated work session at 2025-12-23T19:51:32.489Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback.

Moved BACKGROUND_LINKS imports from module-level to inside bench! macro
bodies where they are actually used. This eliminates 6 compiler warnings
that appeared in CI run #20465184821.

The bench! macro already imports BACKGROUND_LINKS internally (lib.rs:18),
so module-level imports were redundant. Now each benchmark that uses
BACKGROUND_LINKS explicitly imports it within the macro body.

Fixes warnings in:
- benches/benchmarks/delete.rs
- benches/benchmarks/update.rs
- benches/benchmarks/each/concrete.rs
- benches/benchmarks/each/identity.rs
- benches/benchmarks/each/incoming.rs
- benches/benchmarks/each/outgoing.rs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard
Copy link
Member Author

konard commented Dec 23, 2025

Fixed Compiler Warnings

All 6 unused import warnings from CI run #20465184821 have been fixed.

Changes Made

  • Moved BACKGROUND_LINKS imports from module-level to inside bench! macro bodies in 6 benchmark files
  • The bench! macro already imports BACKGROUND_LINKS internally (lib.rs:18), so module-level imports were redundant
  • Verified locally with cargo check --benches - no warnings

Verification

  • ✅ Local build: cargo check --all-features passes with no warnings
  • ✅ Benchmarks: cargo check --benches passes with no warnings
  • 🔄 CI run #20470267171 in progress to verify fix in CI environment

The warnings were:

warning: unused import: `BACKGROUND_LINKS`
  --> benches/benchmarks/delete.rs:10:81
  --> benches/benchmarks/update.rs:10:81
  --> benches/benchmarks/each/concrete.rs:11:81
  --> benches/benchmarks/each/identity.rs:11:81
  --> benches/benchmarks/each/incoming.rs:11:81
  --> benches/benchmarks/each/outgoing.rs:11:81

@konard konard marked this pull request as ready for review December 23, 2025 19:58
@konard
Copy link
Member Author

konard commented Dec 23, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $1.474991 USD
  • Calculated by Anthropic: unknown
  • Difference: unknown
    📎 Log file uploaded as GitHub Gist (366KB)
    🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

@konard konard merged commit b066d59 into main Dec 23, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark is broken

1 participant