Virtio block flush (VIRTIO_BLK_T_FLUSH) not syncing data to host #2172
Comments
Thanks for logging this @pclesr. We'll take a deeper look and let you know our findings.
metrics showing flush received:
It seems calling …
Thanks @bzld. I am not sure we need to call fsync() at all in the current block device emulation: since the file is opened with O_SYNC, each write returns only after the data has been sent to the underlying hardware, just like a write() + fsync().
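For reference, a minimal sketch of what an O_SYNC-backed open would look like in Rust (illustrative only: it assumes the libc crate and a made-up image path, and is not Firecracker's actual open call in device.rs):

use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;

fn main() -> std::io::Result<()> {
    // With O_SYNC, every write(2) waits until the data has reached the
    // storage device, so no separate fsync(2) would be needed per request.
    let _disk = OpenOptions::new()
        .read(true)
        .write(true)
        .custom_flags(libc::O_SYNC) // drop this flag and writes stop at the host page cache
        .open("/tmp/drive.img")?;   // hypothetical path
    Ok(())
}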
@sandreim Yes, see firecracker/src/devices/src/virtio/block/device.rs, lines 42 to 45 at commit d9d00ec.
(Also, using …
Using the 'big hammer' approach does force the data out -- using …
Using …
Thanks everyone for following up on this.
@pclesr, would you be interested in submitting a PR to mainline with the solution?
Hello @alindima, yes I can submit a PR. I am working on a slightly cleaner implementation that would keep the existing flush in place so that any buffered data would be sent to the page cache before the sync. I don't know if that is really necessary, though. My knowledge of how the Rust library buffers data is non-existent. I will have something ready in a day or two. Thank you very much for looking at this!
@pclesr I'm confident that your proposed patch is the complete and correct solution (in terms of …). The data is already being sent to the page cache as part of previous writes. The correct function to call is, as you mentioned, …
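To make the distinction concrete, a minimal sketch (the handle_flush name is made up; this is not the actual Firecracker request handling):

use std::fs::File;
use std::io::Write;

fn handle_flush(disk: &mut File) -> std::io::Result<()> {
    // Write::flush() on a plain File is effectively a no-op: File keeps no
    // userspace buffer, so nothing is pushed toward the device here.
    disk.flush()?;
    // File::sync_all() issues fsync(2), forcing the host page cache out to
    // the underlying storage, which is what VIRTIO_BLK_T_FLUSH expects.
    disk.sync_all()
}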
We would prefer adding the fix mentioned above sooner rather than later; could you prioritize this, or should we open a PR with it?
@acatangiu -- let me run through the steps for contributing in tests/README and I will put together the pull request. The reason it takes time is that I have to compile and test for my arm target in a QEMU cross environment (an arm64 VM running on x86), because my current employer doesn't provide me access to a spiffy AWS Annapurna VM. :-(
I don't see how/why this change would be in any way arch-specific. IMHO it would be fine to just post the PR and take advantage of our automated CI that also runs on aarch64. You can add an integration test (see our …
Also, thanks for the find and the fix! Good catch!
Created pull request #2185
The virtio device advertises the flush feature (VIRTIO_BLK_F_FLUSH). Removing …
To be strictly correct, the WCE bit should be enabled (since the data is cached on the underlying host), but then the guest has the option of changing the cache type, which is not supported. In addition, supporting the WCE bit requires supporting more than the first 8 bytes of the virtio_blk_config (see section 5.2.4 in https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html).

As @sandreim suggested in the PR, I would propose removing the flush feature. I would then propose offering a separate mode (maybe via a config option) that would set the WCE bit.
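For context, a small sketch of the feature bits involved (bit numbers as given in the virtio 1.1 spec linked above; the flush_negotiated helper is hypothetical):

// Virtio block feature bit numbers from the virtio 1.1 specification.
const VIRTIO_BLK_F_FLUSH: u64 = 9;       // device can service VIRTIO_BLK_T_FLUSH requests
const VIRTIO_BLK_F_CONFIG_WCE: u64 = 11; // writeback toggle lives in config space, past the 8-byte capacity field

// A guest only sends flush requests if this bit was offered and negotiated.
fn flush_negotiated(acked_features: u64) -> bool {
    acked_features & (1u64 << VIRTIO_BLK_F_FLUSH) != 0
}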
Removing the …
Also, offering …
So the simple, correct fix would be to just not offer the flush feature. The more flexible fix would be to have a configurable option to enable it.
I dug a little more to understand why the …
Now, because the …
The …
References: …
I have created pull request #2223, which keeps the existing behaviour (no flush) and adds a new drive option 'want_flush' for drives that need the coherency and are willing to take the performance hit.
Hi @pclesr,

We decided to implement two caching mechanisms: unsafe (no flushing mechanism) and writeback (on-demand flushing), which is very similar to your PR.

The unsafe option will still advertise the … The writeback option will add the actual flush through … This should be configurable at the API level when installing a block device, through a …
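A rough sketch of how the two modes could behave on a guest flush request (the CacheType and execute_flush names are made up for illustration, not the actual implementation):

use std::fs::File;

enum CacheType {
    Unsafe,    // acknowledge flush requests without doing anything
    Writeback, // translate flush requests into an fsync(2) on the backing file
}

fn execute_flush(cache_type: &CacheType, disk: &File) -> std::io::Result<()> {
    match cache_type {
        // Data may still sit in the host page cache after this returns.
        CacheType::Unsafe => Ok(()),
        // sync_all() forces the page cache out to the storage device.
        CacheType::Writeback => disk.sync_all(),
    }
}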
Describe the bug
It appears that a sync on an ext4 file in the guest does not result in the data on the host also being sync'd out to disk.
To Reproduce
Using the file tester fio with the following config file:

…

When running this on the host to establish a baseline and running the Linux perf utility (with call graphs enabled), one sees on an ancient (Android) 4.14 kernel many calls to sync. perf is run with

perf record -a --call-graph dwarf,16384 -F 200 -o /tmp/perf.data /tmp/fio /tmp/fio.job

and the results are then examined with perf report and perf script.

When running the exact same fio and config in the VM, the bandwidth reported by fio is 4x that of the host baseline, indicating that the data is never actually sync'd to the medium; the page cache on the host is functioning as a huge disk cache. Running perf on the host during the execution of the fio test in the VM shows no calls to sync.

The test is run against a virtio disk (/dev/vdb) that is formatted as ext4 on the host, then specified in the list of devices in the VM's config.json and mounted in the guest.
If perf is run in the VM (with a 5.7 kernel), then perf shows calls both to sync and the ext4 write call in the VM kernel:
so I do not believe it is a problem with fio.
It does not seem to matter what the VM config is. Firecracker is master as of 31 August 2020.
Expected behaviour
I would have expected that synchronous IO in the VM would be worse than on the host.
Environment
Firecracker v0.21.0-473-gf2f6d8f8
Host: Android 4.14
Guest: 5.7 kernel.org
rootfs: Alpine
Arch: arm64
perf version 4.14.133.g2e3484a
fio-3.16-32-g8c302
Running on an Android system with a 2-core A57 and a 14 GB eMMC.
Additional context
This is an embedded application, and because of power loss issues, data that is synchronously written in the guest needs to be resident on media.
I am running baseline perf numbers for both disk and network so that my masters can see how good Firecracker is, as opposed to all the other VM alternatives. 👍
I am a C programmer and hardly know Rust. However, I wonder if there is any issue in src/devices/src/virtio/block/request.rs, because the write is against a memory object, yet the flush is against a disk object?

In the guest:

…
Checks