
fix IovDeque for non 4K pages #5222


Draft
ShadowCurse wants to merge 2 commits into main from net_16k_fix

Conversation

@ShadowCurse (Contributor) commented May 22, 2025

Changes

The `L` const generic was determining the maximum number of `iov`
elements in the `IovDeque`. This causes an issue when the host kernel
uses pages which can contain more entries than `L`. For example, usual
4K pages can contain 256 `iov`s, while 16K pages can contain 1024 `iov`s.
The current implementation on 16K pages (or any page size bigger than 4K)
will still wrap the `IovDeque` when it reaches the 256th element. This
breaks the implementation, since elements written past the 256th index will
not be 'duplicated' at the beginning of the queue.
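
For reference, the per-page counts above follow directly from the size of a `libc::iovec` (16 bytes on 64-bit Linux: a pointer plus a length). A minimal standalone sketch of that arithmetic (not code from this PR; it assumes the `libc` crate is available):

```rust
use std::mem::size_of;

// Number of `iovec` entries that fit into one host page.
fn iovs_per_page(page_size: usize) -> usize {
    page_size / size_of::<libc::iovec>()
}

fn main() {
    assert_eq!(iovs_per_page(4096), 256);   // 4K pages
    assert_eq!(iovs_per_page(16384), 1024); // 16K pages
}
```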

The current implementation expects this behavior:

 page 1 page 2
|ABCD|#|ABCD|
      ^ will wrap here

With bigger page sizes the current implementation will do this:

 page 1              page 2
|ABCD|EFGD________|#|ABCDEFGD________|
     ^ still wraps here
                   ^ but should wrap here

The solution is to calculate the maximum capacity the `IovDeque` can
hold and use that capacity for wrapping. This capacity is allowed to be
bigger than `L`. The actual number of used entries in the queue is
still guarded by the `L` parameter used in the `is_full` method.
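
Roughly, the idea looks like the sketch below. This is illustrative only, not the actual `iov_deque.rs` change; the struct fields and method names are assumptions used to show the two different bounds.

```rust
// Sketch only: the real IovDeque also manages the double-mapped memory
// region; this models just the index math.
struct IovDequeSketch<const L: u16> {
    start: u16,    // index of the first live entry
    len: u16,      // number of live entries, never exceeds L
    capacity: u16, // page_size / size_of::<iovec>(), may be bigger than L
}

impl<const L: u16> IovDequeSketch<L> {
    /// Fullness is still bounded by the `L` const generic.
    fn is_full(&self) -> bool {
        self.len == L
    }

    /// New entries wrap at the page-derived `capacity`, not at `L`,
    /// so writes stay consistent with the mirrored second mapping.
    fn next_write_index(&self) -> u16 {
        (self.start + self.len) % self.capacity
    }
}
```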

Reason

Fixes #5217

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.93%. Comparing base (331ffec) to head (9b52af9).

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/iov_deque.rs 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5222      +/-   ##
==========================================
+ Coverage   82.88%   82.93%   +0.05%     
==========================================
  Files         250      250              
  Lines       26936    26942       +6     
==========================================
+ Hits        22325    22344      +19     
+ Misses       4611     4598      -13     
Flag Coverage Δ
5.10-c5n.metal 83.37% <87.50%> (+<0.01%) ⬆️
5.10-m5n.metal 83.37% <87.50%> (+<0.01%) ⬆️
5.10-m6a.metal 82.58% <87.50%> (-0.01%) ⬇️
5.10-m6g.metal 79.20% <87.50%> (+<0.01%) ⬆️
5.10-m6i.metal 83.36% <87.50%> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.57% <87.50%> (?)
5.10-m7g.metal 79.20% <87.50%> (+<0.01%) ⬆️
5.10-m7i.metal-24xl 83.32% <87.50%> (?)
5.10-m7i.metal-48xl 83.33% <87.50%> (?)
5.10-m8g.metal-24xl 79.19% <87.50%> (?)
5.10-m8g.metal-48xl 79.19% <87.50%> (?)
6.1-c5n.metal 83.42% <87.50%> (+<0.01%) ⬆️
6.1-m5n.metal 83.41% <87.50%> (-0.01%) ⬇️
6.1-m6a.metal 82.64% <87.50%> (+<0.01%) ⬆️
6.1-m6g.metal 79.20% <87.50%> (+<0.01%) ⬆️
6.1-m6i.metal 83.40% <87.50%> (-0.01%) ⬇️
6.1-m7a.metal-48xl 82.62% <87.50%> (?)
6.1-m7g.metal 79.20% <87.50%> (+<0.01%) ⬆️
6.1-m7i.metal-24xl 83.42% <87.50%> (?)
6.1-m7i.metal-48xl 83.43% <87.50%> (?)
6.1-m8g.metal-24xl 79.19% <87.50%> (?)
6.1-m8g.metal-48xl 79.19% <87.50%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@ShadowCurse force-pushed the net_16k_fix branch 3 times, most recently from 037f7dc to 62393e7 on May 22, 2025 at 14:56
Add note about `IovDeque` fix for non 4K pages.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Contributor

Could you add a unit test to verify the behaviour when `L` < capacity? For example, using `L = 64` and testing a scenario that was failing before this change.
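
Such a test could look roughly like the sketch below. The exact `IovDeque` API calls (`new`, `push_back`, `pop_front`, `is_full`) are assumptions for illustration; the tag is stored in `iov_len` only so the test can tell entries apart.

```rust
#[test]
fn test_wrap_when_l_is_smaller_than_page_capacity() {
    // With L = 64 the deque wraps well before a 4K (or larger) page is
    // exhausted, which is the case that used to corrupt entries.
    let mut deque = IovDeque::<64>::new().unwrap();
    for round in 0..8usize {
        for i in 0..64usize {
            deque.push_back(libc::iovec {
                iov_base: std::ptr::null_mut(),
                iov_len: round * 64 + i,
            });
        }
        assert!(deque.is_full());
        for i in 0..64usize {
            let iov = deque.pop_front().unwrap();
            assert_eq!(iov.iov_len, round * 64 + i);
        }
    }
}
```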

@louwers commented May 22, 2025

Confirmed that this fix seems to solve the problem I reported.

The issue is no longer reproducible.

Development

Successfully merging this pull request may close these issues.

[Bug] Regression v1.10.0 tap device unreliable and unresponsive
3 participants