Fix benchmark CI false success and Transaction isolation by konard · Pull Request #6 · linksplatform/Comparisons.Neo4jVSDoublets

konard · 2025-12-23T13:57:51Z

Summary

This PR fixes the benchmark being broken (issue #5) by addressing three root causes:

CI Pipeline Exit Code Masking - The benchmark command cargo bench ... | tee out.txt was masking failure exit codes because tee always succeeds. Added set -o pipefail to properly propagate benchmark failures.
Transaction Benchmark Isolation - The Transaction's drop_table() method was missing the crucial step of resetting the client's next_id counter after deleting nodes. This caused NotExists(4000) panics when subsequent iterations tried to delete links with IDs that didn't exist (because the counter wasn't reset to 1).
CI Workflow Improvements:
- CI now runs on both PRs and pushes to main/master
- README.md updates only happen when changes are pushed to main/master (not on PRs)
- This allows testing PRs before merging while preventing duplicate README updates

Changes

Core Fix

rust/src/client.rs: Add reset_next_id() method to allow external callers to reset the ID counter
rust/src/transaction.rs: Call client.reset_next_id() in drop_table() to ensure proper benchmark isolation

Performance Improvement

rust/src/lib.rs: Reduce BACKGROUND_LINKS from 3000 to 10, add LINK_COUNT = 10 constant
Updated all benchmark files (create.rs, delete.rs, update.rs, each/*.rs) to use LINK_COUNT constant

Code Quality

Fixed 6 unused import warnings in benchmark files by moving BACKGROUND_LINKS imports from module-level to inside bench! macro bodies where they are actually used

CI Workflow

.github/workflows/rust.yml:
- Added pull_request trigger for testing PR branches
- Added set -o pipefail to properly propagate benchmark failures
- Added condition to "Prepare benchmark results" step to only run on push to main/master
- This ensures PRs are tested before merge, and README updates only happen on actual merges

Root Cause Analysis

The NotExists(4000) error occurred because:

First benchmark iteration creates 3000 background links (IDs 1-3000)
Delete benchmark creates 1000 more links (IDs 3001-4000), then deletes them
drop_table() deletes all nodes but next_id stays at 4001
Second iteration creates background links starting from ID 4001, NOT from ID 1
Delete benchmark tries to delete IDs 3001-4000 which don't exist → panic!

The fix ensures next_id is reset to 1 whenever drop_table() is called, so each iteration starts fresh.

Test Plan

Code compiles (cargo check --all-features passes)
CI runs on PR commits (for testing) - verified with run #20465184821
Benchmark completes without NotExists errors - all tests pass including Delete/Neo4j_Transaction
README.md updates only happen on push to main/master (conditional step was skipped on PR)
Benchmark iterations complete in ~15 minutes (reduced from 70+ minutes)
No compiler warnings (fixed 6 unused import warnings in benchmarks)

Fixes #5

🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #5

This commit adds a comprehensive case study documenting: - Timeline of events during the failed CI run - Root cause analysis identifying pipeline exit code masking and Transaction benchmark isolation issues - Evidence from CI logs and code analysis - Proposed solutions for both issues The case study includes: - docs/case-studies/issue-5/README.md: Full analysis document - docs/case-studies/issue-5/ci-run-20442351221.log: Complete CI logs - docs/case-studies/issue-5/key-log-entries.txt: Key error entries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The benchmark command piped to tee was masking the exit code: cargo bench ... | tee out.txt When cargo bench failed with exit code 101, tee succeeded with exit code 0, making the whole pipeline appear successful. This caused: - False success status on CI runs where benchmarks actually failed - Subsequent steps running despite benchmark failures - Broken benchmark results being committed to main Solution: Add 'set -o pipefail' to ensure the pipeline returns the exit code of the first failing command, making CI properly fail when benchmarks fail. Fixes part of #5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The Transaction benchmark implementation was missing proper cleanup in fork() and unfork() methods, unlike the Client implementation which properly cleaned state between iterations. This caused state pollution between benchmarks: - Delete/Neo4j_Transaction couldn't find links created by previous benchmark iterations - Panic: 'NotExists(4000)' at delete.rs:23 Solution: Add drop_table() calls in Transaction's fork() and unfork() methods to ensure each benchmark iteration starts with a clean state, matching the Client implementation's behavior. Fixes part of #5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

This file was used for AI task management and should not be committed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-12-23T14:05:24Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $4.869104 USD
Calculated by Anthropic: $2.827731 USD
Difference: $-2.041373 (-41.93%)
📎 Log file uploaded as GitHub Gist (530KB)
🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

konard · 2025-12-23T15:36:47Z

Benchmark Rust version / benchmark (pull_request)
Benchmark Rust version / benchmark (pull_request)Failing after 70m
Benchmark Rust version / benchmark (push)
Benchmark Rust version / benchmark (push)Failing after 74m

We should not execute benchmark twice on pull request commits. Also only benchmark on push to default branch should update README.md with latest benchmark stats and graphs (visuals).

Also the benchmark is still broken, and executes quite a long time, may be we should try to use fewer iterations, to iterate on benchmark itself faster?

Let's start with 10 links only, it should be fast enough. And will allow us to test all benchmarks, and fix them all for all read and write operations.

konard · 2025-12-23T15:37:30Z

🤖 AI Work Session Started

Starting automated work session at 2025-12-23T15:37:29.118Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback.

…testing Root cause of NotExists(4000) panic: - Transaction's drop_table() was deleting all nodes but NOT resetting the Client's next_id counter (unlike Client's own drop_table which does reset it) - This caused benchmark iterations to reuse stale IDs after cleanup Changes: 1. Added reset_next_id() method to Client for external callers 2. Transaction's drop_table() now calls client.reset_next_id() for proper isolation 3. Reduced BACKGROUND_LINKS from 3000 to 10 and added LINK_COUNT=10 constant for much faster benchmark iteration during development 4. Updated all benchmark files to use the new LINK_COUNT constant 5. CI workflow now only runs on push to main/master (not on PRs) - This fixes the double-execution issue on PR commits - README.md updates only happen on default branch merges 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-12-23T15:43:56Z

Changes Made

I've addressed all the feedback from the previous comment:

1. Fix for NotExists(4000) Error

Root Cause Found: The Transaction's drop_table() method was deleting all Neo4j nodes but NOT resetting the client's next_id counter. This caused subsequent benchmark iterations to try deleting IDs that didn't exist because new links were being created with stale (higher) IDs.

Fix: Added reset_next_id() method to Client and now call client.reset_next_id() in Transaction's drop_table().

2. CI Workflow No Longer Runs Twice on PRs

Changed the workflow to only trigger on push to main or master branches. The benchmark workflow (which takes ~70+ minutes) will now only run when PRs are merged, not on every PR commit.

3. Reduced Benchmark Iterations to 10 Links

Changed BACKGROUND_LINKS from 3000 to 10
Added new LINK_COUNT constant (10) for the number of links to create/delete/update
Updated all benchmark files to use these constants

This should make iterations much faster for development/testing purposes.

Note: Since CI no longer runs on PRs, you may want to consider adding a separate lightweight workflow that runs on PRs to just verify the build compiles (without running the full benchmark). Let me know if you'd like me to add that.

The build has been verified locally:

$ cargo check --all-features
    Finished dev [unoptimized + debuginfo] target(s) in 10.44s

Ready for review!

konard · 2025-12-23T15:44:19Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $4.817247 USD
Calculated by Anthropic: $3.140494 USD
Difference: $-1.676753 (-34.81%)
📎 Log file uploaded as GitHub Gist (651KB)
🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

konard · 2025-12-23T15:44:25Z

🔄 Auto-restart 1/3

Detected uncommitted changes from previous run. Starting new session to review and commit them.

Uncommitted files:

?? ci-logs/
?? ci-runs.json
?? issue-5.json
?? pr-view.json

Auto-restart will stop after changes are committed or after 2 more iterations. Please wait until working session will end and give your feedback.

- Add pull_request trigger for branches main/master to allow testing PR changes - Add condition to "Prepare benchmark results" step to only run on push events to main/master branches (not on PRs) - This ensures: - PRs can be tested before merging (benchmark runs and fails if broken) - README.md updates only happen when changes are merged to main/master 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-12-23T16:05:04Z

✅ CI Now Passing

The benchmark fix has been verified. Here's a summary of what was done:

Changes Made

Fixed Transaction benchmark isolation - Added reset_next_id() call in Transaction's drop_table() to reset the ID counter between benchmark iterations
Fixed CI workflow - Changed workflow to run on both PRs (for testing) and pushes to main/master (for README updates)
Reduced iterations - Changed BACKGROUND_LINKS from 3000 to 10 and added LINK_COUNT=10 for faster iteration

CI Results

✅ Run #20465184821 completed successfully in ~15 minutes
✅ All benchmarks pass including the previously failing Delete/Neo4j_Transaction
✅ "Prepare benchmark results" step was correctly skipped (only runs on push to main/master)
✅ No NotExists(4000) or other errors

Ready for Review

The PR is ready to be reviewed and merged. After merge, the benchmark will run on the main branch and automatically update the README with new benchmark results.

konard · 2025-12-23T16:05:21Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $3.543635 USD
Calculated by Anthropic: $2.155418 USD
Difference: $-1.388217 (-39.17%)
📎 Log file uploaded as GitHub Gist (1025KB)
🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

konard · 2025-12-23T19:51:01Z

We should also fix all warnings in https://github.com/linksplatform/Comparisons.Neo4jVSDoublets/actions/runs/20465184821/job/58807054020?pr=6

konard · 2025-12-23T19:51:34Z

🤖 AI Work Session Started

Starting automated work session at 2025-12-23T19:51:32.489Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait working session to finish, and provide your feedback.

Moved BACKGROUND_LINKS imports from module-level to inside bench! macro bodies where they are actually used. This eliminates 6 compiler warnings that appeared in CI run #20465184821. The bench! macro already imports BACKGROUND_LINKS internally (lib.rs:18), so module-level imports were redundant. Now each benchmark that uses BACKGROUND_LINKS explicitly imports it within the macro body. Fixes warnings in: - benches/benchmarks/delete.rs - benches/benchmarks/update.rs - benches/benchmarks/each/concrete.rs - benches/benchmarks/each/identity.rs - benches/benchmarks/each/incoming.rs - benches/benchmarks/each/outgoing.rs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-12-23T19:57:55Z

Fixed Compiler Warnings

All 6 unused import warnings from CI run #20465184821 have been fixed.

Changes Made

Moved BACKGROUND_LINKS imports from module-level to inside bench! macro bodies in 6 benchmark files
The bench! macro already imports BACKGROUND_LINKS internally (lib.rs:18), so module-level imports were redundant
Verified locally with cargo check --benches - no warnings

Verification

✅ Local build: cargo check --all-features passes with no warnings
✅ Benchmarks: cargo check --benches passes with no warnings
🔄 CI run #20470267171 in progress to verify fix in CI environment

The warnings were:

warning: unused import: `BACKGROUND_LINKS`
  --> benches/benchmarks/delete.rs:10:81
  --> benches/benchmarks/update.rs:10:81
  --> benches/benchmarks/each/concrete.rs:11:81
  --> benches/benchmarks/each/identity.rs:11:81
  --> benches/benchmarks/each/incoming.rs:11:81
  --> benches/benchmarks/each/outgoing.rs:11:81

konard · 2025-12-23T19:58:27Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $1.474991 USD
Calculated by Anthropic: unknown
Difference: unknown
📎 Log file uploaded as GitHub Gist (366KB)
🔗 View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

Initial commit with task details

7bc6ac6

Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #5

konard self-assigned this Dec 23, 2025

konard and others added 3 commits December 23, 2025 15:03

konard changed the title ~~[WIP] Benchmark is broken~~ Fix benchmark CI false success and Transaction isolation Dec 23, 2025

konard marked this pull request as ready for review December 23, 2025 14:04

Remove CLAUDE.md task file

5f3255f

This file was used for AI task management and should not be committed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

konard marked this pull request as draft December 23, 2025 15:37

konard marked this pull request as ready for review December 23, 2025 15:44

konard marked this pull request as draft December 23, 2025 19:51

konard marked this pull request as ready for review December 23, 2025 19:58

konard merged commit b066d59 into main Dec 23, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix benchmark CI false success and Transaction isolation#6

Fix benchmark CI false success and Transaction isolation#6
konard merged 8 commits intomainfrom
issue-5-bdec6bf473d2

konard commented Dec 23, 2025 •

edited

Loading

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025 •

edited

Loading

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

konard commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Core Fix

Performance Improvement

Code Quality

CI Workflow

Root Cause Analysis

Test Plan

Uh oh!

konard commented Dec 23, 2025

🤖 Solution Draft Log

Uh oh!

konard commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Changes Made

1. Fix for NotExists(4000) Error

2. CI Workflow No Longer Runs Twice on PRs

3. Reduced Benchmark Iterations to 10 Links

Uh oh!

konard commented Dec 23, 2025

🤖 Solution Draft Log

Uh oh!

konard commented Dec 23, 2025

🔄 Auto-restart 1/3

Uh oh!

konard commented Dec 23, 2025

✅ CI Now Passing

Changes Made

CI Results

Ready for Review

Uh oh!

konard commented Dec 23, 2025

🤖 Solution Draft Log

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Uh oh!

konard commented Dec 23, 2025

Fixed Compiler Warnings

Changes Made

Verification

Uh oh!

konard commented Dec 23, 2025

🤖 Solution Draft Log

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

konard commented Dec 23, 2025 •

edited

Loading

konard commented Dec 23, 2025 •

edited

Loading