Fix benchmark CI false success and Transaction isolation#6
Conversation
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #5
This commit adds a comprehensive case study documenting: - Timeline of events during the failed CI run - Root cause analysis identifying pipeline exit code masking and Transaction benchmark isolation issues - Evidence from CI logs and code analysis - Proposed solutions for both issues The case study includes: - docs/case-studies/issue-5/README.md: Full analysis document - docs/case-studies/issue-5/ci-run-20442351221.log: Complete CI logs - docs/case-studies/issue-5/key-log-entries.txt: Key error entries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
The benchmark command piped to tee was masking the exit code: cargo bench ... | tee out.txt When cargo bench failed with exit code 101, tee succeeded with exit code 0, making the whole pipeline appear successful. This caused: - False success status on CI runs where benchmarks actually failed - Subsequent steps running despite benchmark failures - Broken benchmark results being committed to main Solution: Add 'set -o pipefail' to ensure the pipeline returns the exit code of the first failing command, making CI properly fail when benchmarks fail. Fixes part of #5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
The Transaction benchmark implementation was missing proper cleanup in fork() and unfork() methods, unlike the Client implementation which properly cleaned state between iterations. This caused state pollution between benchmarks: - Delete/Neo4j_Transaction couldn't find links created by previous benchmark iterations - Panic: 'NotExists(4000)' at delete.rs:23 Solution: Add drop_table() calls in Transaction's fork() and unfork() methods to ensure each benchmark iteration starts with a clean state, matching the Client implementation's behavior. Fixes part of #5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This file was used for AI task management and should not be committed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
Now working session is ended, feel free to review and add any feedback on the solution draft. |
|
Benchmark Rust version / benchmark (pull_request) We should not execute benchmark twice on pull request commits. Also only benchmark on push to default branch should update README.md with latest benchmark stats and graphs (visuals). Also the benchmark is still broken, and executes quite a long time, may be we should try to use fewer iterations, to iterate on benchmark itself faster? Let's start with 10 links only, it should be fast enough. And will allow us to test all benchmarks, and fix them all for all read and write operations. |
|
🤖 AI Work Session Started Starting automated work session at 2025-12-23T15:37:29.118Z The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback. |
…testing Root cause of NotExists(4000) panic: - Transaction's drop_table() was deleting all nodes but NOT resetting the Client's next_id counter (unlike Client's own drop_table which does reset it) - This caused benchmark iterations to reuse stale IDs after cleanup Changes: 1. Added reset_next_id() method to Client for external callers 2. Transaction's drop_table() now calls client.reset_next_id() for proper isolation 3. Reduced BACKGROUND_LINKS from 3000 to 10 and added LINK_COUNT=10 constant for much faster benchmark iteration during development 4. Updated all benchmark files to use the new LINK_COUNT constant 5. CI workflow now only runs on push to main/master (not on PRs) - This fixes the double-execution issue on PR commits - README.md updates only happen on default branch merges 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Changes MadeI've addressed all the feedback from the previous comment: 1. Fix for NotExists(4000) ErrorRoot Cause Found: The Transaction's Fix: Added 2. CI Workflow No Longer Runs Twice on PRsChanged the workflow to only trigger on push to 3. Reduced Benchmark Iterations to 10 Links
This should make iterations much faster for development/testing purposes. Note: Since CI no longer runs on PRs, you may want to consider adding a separate lightweight workflow that runs on PRs to just verify the build compiles (without running the full benchmark). Let me know if you'd like me to add that. The build has been verified locally: Ready for review! |
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
Now working session is ended, feel free to review and add any feedback on the solution draft. |
🔄 Auto-restart 1/3Detected uncommitted changes from previous run. Starting new session to review and commit them. Uncommitted files: Auto-restart will stop after changes are committed or after 2 more iterations. Please wait until working session will end and give your feedback. |
- Add pull_request trigger for branches main/master to allow testing PR changes - Add condition to "Prepare benchmark results" step to only run on push events to main/master branches (not on PRs) - This ensures: - PRs can be tested before merging (benchmark runs and fails if broken) - README.md updates only happen when changes are merged to main/master 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
✅ CI Now PassingThe benchmark fix has been verified. Here's a summary of what was done: Changes Made
CI Results
Ready for ReviewThe PR is ready to be reviewed and merged. After merge, the benchmark will run on the main branch and automatically update the README with new benchmark results. |
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
Now working session is ended, feel free to review and add any feedback on the solution draft. |
|
We should also fix all warnings in https://github.com/linksplatform/Comparisons.Neo4jVSDoublets/actions/runs/20465184821/job/58807054020?pr=6 |
|
🤖 AI Work Session Started Starting automated work session at 2025-12-23T19:51:32.489Z The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback. |
Moved BACKGROUND_LINKS imports from module-level to inside bench! macro bodies where they are actually used. This eliminates 6 compiler warnings that appeared in CI run #20465184821. The bench! macro already imports BACKGROUND_LINKS internally (lib.rs:18), so module-level imports were redundant. Now each benchmark that uses BACKGROUND_LINKS explicitly imports it within the macro body. Fixes warnings in: - benches/benchmarks/delete.rs - benches/benchmarks/update.rs - benches/benchmarks/each/concrete.rs - benches/benchmarks/each/identity.rs - benches/benchmarks/each/incoming.rs - benches/benchmarks/each/outgoing.rs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed Compiler WarningsAll 6 unused import warnings from CI run #20465184821 have been fixed. Changes Made
Verification
The warnings were: |
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
Now working session is ended, feel free to review and add any feedback on the solution draft. |
Summary
This PR fixes the benchmark being broken (issue #5) by addressing three root causes:
CI Pipeline Exit Code Masking - The benchmark command
cargo bench ... | tee out.txtwas masking failure exit codes becauseteealways succeeds. Addedset -o pipefailto properly propagate benchmark failures.Transaction Benchmark Isolation - The Transaction's
drop_table()method was missing the crucial step of resetting the client'snext_idcounter after deleting nodes. This causedNotExists(4000)panics when subsequent iterations tried to delete links with IDs that didn't exist (because the counter wasn't reset to 1).CI Workflow Improvements:
Changes
Core Fix
rust/src/client.rs: Addreset_next_id()method to allow external callers to reset the ID counterrust/src/transaction.rs: Callclient.reset_next_id()indrop_table()to ensure proper benchmark isolationPerformance Improvement
rust/src/lib.rs: ReduceBACKGROUND_LINKSfrom 3000 to 10, addLINK_COUNT = 10constantcreate.rs,delete.rs,update.rs,each/*.rs) to useLINK_COUNTconstantCode Quality
BACKGROUND_LINKSimports from module-level to insidebench!macro bodies where they are actually usedCI Workflow
.github/workflows/rust.yml:pull_requesttrigger for testing PR branchesset -o pipefailto properly propagate benchmark failuresRoot Cause Analysis
The
NotExists(4000)error occurred because:drop_table()deletes all nodes butnext_idstays at 4001The fix ensures
next_idis reset to 1 wheneverdrop_table()is called, so each iteration starts fresh.Test Plan
cargo check --all-featurespasses)Fixes #5
🤖 Generated with Claude Code