Skip to content

Conversation

@adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Feb 3, 2026

Summary

Adds blockdev --flushbufs operation on the guest side after unmounting the sticky disk to ensure data durability before Ceph RBD snapshots are taken. This is the client-side portion of BLA-3202 for the setup-docker-builder action.

Changes:

  • Add getDeviceFromMount() to extract device path from mount point (uses findmnt with fallback to parsing mount output)
  • Add flushBlockDevice() that runs sudo blockdev --flushbufs with 30-second timeout
  • Log I/O stats from /sys/block/{device}/stat before and after flush for observability
  • Add ENABLE_DURABILITY_FLUSH env var feature flag (defaults to enabled, set to "false" to disable)
  • Errors are logged as warnings but don't fail the action to avoid breaking existing workflows

Review & Testing Checklist for Human

  • Verify device path extraction: The getDeviceFromMount function uses findmnt -n -o SOURCE with fallback to regex parsing of mount output. Confirm this works correctly in the actual Blacksmith VM environment (device format may vary, e.g., /dev/rbd0 vs /dev/vdb).
  • Verify flush timing: The flush is called after unmount but while the device is still mapped. Confirm this is the correct ordering for Ceph RBD durability guarantees.
  • Test flush execution: Deploy to staging and verify guest flush duration: log entries appear in workflow output with before/after stats.
  • Verify feature flag: Test with ENABLE_DURABILITY_FLUSH=false to confirm flush is skipped.
  • Check no regression: Ensure existing unmount and commit flow still works correctly.

Recommended test plan:

  1. Run a workflow using setup-docker-builder on a Blacksmith runner
  2. Check workflow logs for guest flush duration: entries
  3. Verify the sticky disk commit still succeeds
  4. Test with ENABLE_DURABILITY_FLUSH=false env var to confirm feature flag works

Notes

This is part of BLA-3202 which spans multiple repositories. Companion PRs exist for:

  • Host-side flush in fa/agent
  • stickydisk client-side flush
  • checkout client-side flush

Link to Devin run: https://app.devin.ai/sessions/611301f918674712b016558ddc2fba0e
Requested by: @adityamaru


Note

Medium Risk
Touches post-action cleanup around unmounting and adds privileged blockdev execution; while best-effort and time-bounded, mis-detected devices or unexpected environments could cause skipped flushes or noisy warnings.

Overview
Improves sticky-disk durability by determining the underlying device for /var/lib/buildkit during post-cleanup and issuing a best-effort sudo blockdev --flushbufs immediately after umount (while the device remains mapped).

Adds getDeviceFromMount() (prefers findmnt, falls back to parsing mount) and flushBlockDevice() with a short timeout plus before/after /sys/block/.../stat logging for observability; flush failures/timeouts are logged as warnings and do not fail the action.

Written by Cursor Bugbot for commit 576eaa6. This will update automatically on new commits. Configure here.

Cursor Bugbot reviewed your changes and found no issues for commit 576eaa6

Add blockdev --flushbufs operation on guest side after unmounting the sticky
disk to ensure data durability before Ceph RBD snapshots are taken.

Changes:
- Add getDeviceFromMount() to extract device path from mount point
- Add flushBlockDevice() that runs blockdev --flushbufs with stats logging
- Log I/O stats from /sys/block/{device}/stat before and after flush
- Add ENABLE_DURABILITY_FLUSH env var for feature flag (defaults to enabled)
- Handle errors gracefully - log warnings but don't fail the cleanup flow

Co-Authored-By: maru@blacksmith.sh <adityamaru@gmail.com>
@linear
Copy link

linear bot commented Feb 3, 2026

@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration bot and others added 2 commits February 3, 2026 20:13
Co-Authored-By: maru@blacksmith.sh <adityamaru@gmail.com>
Co-Authored-By: maru@blacksmith.sh <adityamaru@gmail.com>
@adityamaru
Copy link
Contributor Author

@devin fix the verify build output CI check

@devin-ai-integration
Copy link
Contributor

The verify-build CI check is failing due to non-deterministic source map generation by ncc. The dist/index.js.map file differs between local builds and CI builds, even with identical source code. This is a known issue with ncc's source map generation.

Options to fix this:

  1. Remove source maps from the build (--no-source-map flag in ncc)
  2. Exclude dist/index.js.map from the verify-build diff check
  3. Remove source maps from version control (add to .gitignore)

Would you like me to implement one of these solutions?

adityamaru and others added 2 commits February 4, 2026 16:24
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed all flush failure cases from core.info to core.warning
- Added warning for stderr output even when exit code is 0
- This makes it easier to identify when the flush command fails

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Logs SOURCE, FSTYPE, and OPTIONS from findmnt (or full mount line)
to help debug mount-related issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@adityamaru adityamaru requested a review from eltonkl February 4, 2026 22:02
@adityamaru adityamaru merged commit ef12d5b into main Feb 4, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants