
fix(core): use atomic writes in storage to prevent data corruption on crash #13745

Open
vaab wants to merge 1 commit into anomalyco:dev from vaab:fix/atomic-storage-writes

vaab commented Feb 15, 2026

What does this PR do?

Bun.write(path, data) truncates the target file before writing. If the process crashes between truncate and write, the file ends up filled with null bytes — valid size, zero content. This corrupts session/message JSON files and can make entire sessions unloadable.

Fixes #7607, #9673, #10904.

Related PRs: #7734 and #11646 address the same root cause with a broader scope (CLI repair/restore, retry/backoff, fsync, quarantine). This PR is a minimal fix focused only on preventing the corruption.

See also #13032 which adds graceful degradation on corrupted reads — complementary to this fix.

The fix is the standard write-to-temp-then-rename pattern:

  • Write data to a dotfile sibling (.basename.random.tmp) in the same directory
  • Call fs.rename() to move the temp file onto the target (rename is atomic within a single filesystem, per POSIX)
  • Clean up the temp file if rename fails
  • On startup, remove any orphaned .tmp files left by a previous crash
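
The four steps above can be sketched roughly as follows. This is an illustrative approximation, not the code from this PR: the names `atomicWriteSync` and `removeOrphanedTempFiles` are hypothetical, and it uses synchronous `node:fs` calls for brevity where the PR refers to `fs.rename()`. The temp-file naming follows the `.basename.random.tmp` shape described above.

```typescript
import { mkdtempSync, readFileSync, readdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { randomBytes } from "node:crypto";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Hypothetical helper (not the PR's actual code): write to a dotfile
// sibling of the target, then rename it into place.
function atomicWriteSync(target: string, data: string): void {
  const suffix = randomBytes(6).toString("hex");
  // Same directory as the target, so the rename stays on one filesystem.
  const temp = join(dirname(target), `.${basename(target)}.${suffix}.tmp`);
  try {
    writeFileSync(temp, data);
    // Readers see either the old file or the new one, never a truncated one.
    renameSync(temp, target);
  } catch (err) {
    // Best-effort cleanup of the temp file if the write or rename failed.
    rmSync(temp, { force: true });
    throw err;
  }
}

// Hypothetical startup sweep: recursively delete dot-prefixed *.tmp files
// left behind by a crash. Returns the number of files removed.
function removeOrphanedTempFiles(dir: string): number {
  let removed = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      removed += removeOrphanedTempFiles(path);
    } else if (entry.name.startsWith(".") && entry.name.endsWith(".tmp")) {
      rmSync(path, { force: true });
      removed += 1;
    }
  }
  return removed;
}

// Usage: overwrite a file twice, then sweep a simulated crash leftover.
const demoDir = mkdtempSync(join(tmpdir(), "atomic-demo-"));
const demoFile = join(demoDir, "session.json");
atomicWriteSync(demoFile, JSON.stringify({ id: 1 }));
atomicWriteSync(demoFile, JSON.stringify({ id: 2 }));
writeFileSync(join(demoDir, ".session.json.deadbeef.tmp"), "partial");
const swept = removeOrphanedTempFiles(demoDir);
const finalContent = readFileSync(demoFile, "utf8");
```

The sketch does not call fsync, so it only protects against a process crash mid-write, not necessarily against power loss; the PR description leaves fsync to the broader related PRs.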

Dotfile prefix ensures Bun.Glob("**/*") in Storage.list() never picks up temp files (it skips dotfiles by default).

All writes in storage.ts are covered: Storage.write(), Storage.update(), migrations, and the migration index.

How did you verify your code works?

  • Confirmed the root cause by hex-dumping real corrupted files from a crashed session — all null bytes, matching the O_TRUNC + crash pattern
  • 108 existing session tests pass, 0 fail
  • Verified Bun.Glob ignores dotfiles with a manual test

… crash

Bun.write() truncates the target file before writing. If the process
crashes between truncate and write, the file ends up filled with null
bytes. This corrupts session/message JSON and can make entire sessions
unloadable.

Replace all Bun.write() calls in storage.ts with an atomic helper that
writes to a dotfile sibling then renames into place. Clean up orphaned
temp files on startup.
@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on my search, I found one potentially related PR:

Related PR:

Why it's related:
The PR description explicitly mentions #7734 as addressing "the same root cause with a broader scope (CLI repair/restore, retry/backoff, fsync, quarantine)". PR #13745 is a minimal fix focused specifically on the atomic writes pattern, while #7734 takes a wider approach. These are complementary approaches to the same underlying issue of preventing data corruption from truncate-and-crash scenarios.

@rekram1-node
Collaborator

Are you sure you are still facing issues? We switched to SQLite, so a lot of this code doesn't run anymore.

Development

Successfully merging this pull request may close these issues.

If disk space runs out your active session becomes corrupted

2 participants