Skip to content

fix(keyval): make FilesystemStore._set_raw durable and leak-free#188

Merged
tomasvotava merged 2 commits into
masterfrom
fix/keyval-filesystem-durable-flush
May 19, 2026
Merged

fix(keyval): make FilesystemStore._set_raw durable and leak-free#188
tomasvotava merged 2 commits into
masterfrom
fix/keyval-filesystem-durable-flush

Conversation

@tomasvotava
Copy link
Copy Markdown
Owner

Follow-up to #185.

FilesystemStore._set_raw is the atomic-write pattern that #158/#185 was explicitly told to mirror. The gemini-code-assist review on #185 identified durability/leak gaps in the FileQueue copy; the keyval sibling had the same two:

  • Durability: no flush()/fsync() before os.replace — a crash right after the rename could leave an empty/truncated value file. → now flush() + os.fsync().
  • Temp-file leak: delete=False with no cleanup — a write failure stranded the temp file in root_path. → now removed via Path(...).unlink(missing_ok=True) on error, then re-raise.

The third FileQueue gap (text encoding) does not apply — this writer is binary ("wb").

Out of scope: LocalFileCache's write path has separate, larger safety issues already tracked in #156 — deliberately not folded in here.

Tests: new test_set_cleans_up_temp_file_on_error + existing suite (19 passed). ruff / ruff-format / mypy-strict / import-linter all green.

🤖 Generated with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.64%. Comparing base (3226d80) to head (96ac1a8).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
+ Coverage   87.61%   87.64%   +0.02%     
==========================================
  Files          57       57              
  Lines        2932     2938       +6     
==========================================
+ Hits         2569     2575       +6     
  Misses        363      363              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the durability of the filesystem key-value store by implementing flush and fsync during writes and ensuring temporary files are deleted upon failure. A new test case validates the cleanup logic. Feedback suggests using a finally block instead of except Exception to more robustly handle BaseException types, such as KeyboardInterrupt, preventing potential file leaks.

Comment thread hyperion/adapters/keyval/filesystem.py Outdated
tomasvotava added a commit that referenced this pull request May 19, 2026
…ally

Addresses gemini-code-assist review on PR #188: except Exception missed BaseException (e.g. KeyboardInterrupt during write), which would still strand the temp file. try/finally always cleans up; on success os.replace already moved the file so unlink(missing_ok=True) is a no-op. Keeps the explicit tmp_path: str | None annotation (mypy-strict rejects the unannotated form the bot suggested).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tomasvotava and others added 2 commits May 19, 2026 20:22
Follow-up to PR #185 (gemini-code-assist review on the sibling FileQueue): the explicitly-mirrored FilesystemStore._set_raw had the same two gaps — no flush()/fsync() before the atomic rename (a crash just after os.replace could leave an empty/truncated value file) and, with delete=False, a write failure stranded the temp file in root_path. Add flush()+fsync() for crash durability and remove the temp file if anything before os.replace fails. (The third FileQueue gap, text encoding, does not apply here — this writer is binary.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ally

Addresses gemini-code-assist review on PR #188: except Exception missed BaseException (e.g. KeyboardInterrupt during write), which would still strand the temp file. try/finally always cleans up; on success os.replace already moved the file so unlink(missing_ok=True) is a no-op. Keeps the explicit tmp_path: str | None annotation (mypy-strict rejects the unannotated form the bot suggested).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tomasvotava tomasvotava force-pushed the fix/keyval-filesystem-durable-flush branch from 6decca5 to 96ac1a8 Compare May 19, 2026 18:22
@tomasvotava tomasvotava merged commit f54e758 into master May 19, 2026
20 checks passed
@tomasvotava tomasvotava deleted the fix/keyval-filesystem-durable-flush branch May 19, 2026 18:25
tomasvotava added a commit that referenced this pull request May 19, 2026
…try/finally

Follow-up to the gemini-code-assist review on PR #188 (same atomic-write pattern, applied to #188's keyval sibling). PR #185 had already merged with the except Exception form, which misses BaseException (e.g. KeyboardInterrupt mid-flush) and leaks the temp file beside the queue file. try/finally always cleans up; the success path is a missing_ok no-op since os.replace already renamed it. Keeps FileQueue and FilesystemStore identical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tomasvotava added a commit that referenced this pull request May 19, 2026
…try/finally

Follow-up to the gemini-code-assist review on PR #188 (same atomic-write pattern, applied to #188's keyval sibling). PR #185 had already merged with the except Exception form, which misses BaseException (e.g. KeyboardInterrupt mid-flush) and leaks the temp file beside the queue file. try/finally always cleans up; the success path is a missing_ok no-op since os.replace already renamed it. Keeps FileQueue and FilesystemStore identical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tomasvotava added a commit that referenced this pull request May 19, 2026
…ss path

Follow-up to merged #188 / sibling of open #189. The gemini-code-assist review on #189 noted the try/finally cleanup does an unnecessary unlink syscall on every successful write; #188 had already merged with the plain try/finally, so master's FilesystemStore._set_raw needs the same tmp_path=None-after-os.replace optimization to stay identical to FileQueue (#189). Cleanup still runs when an exception occurs before/at the rename.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tomasvotava added a commit that referenced this pull request May 19, 2026
…ss path

Follow-up to merged #188 / sibling of open #189. The gemini-code-assist review on #189 noted the try/finally cleanup does an unnecessary unlink syscall on every successful write; #188 had already merged with the plain try/finally, so master's FilesystemStore._set_raw needs the same tmp_path=None-after-os.replace optimization to stay identical to FileQueue (#189). Cleanup still runs when an exception occurs before/at the rename.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant