Skip to content

virtio-queue: Error out on invalid available ring index #3156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 4, 2022

Conversation

dianpopa
Copy link
Contributor

@dianpopa dianpopa commented Oct 3, 2022

The number of descriptor chain heads to process should always be smaller or equal to the queue size, as the driver should never ask the VMM to process a available ring entry more than once. Checking and reporting such incorrect driver behavior can prevent potential hanging and Denial-of-Service from happening on the VMM side.

Issue reported in rust-vmm/vm-virtio and fixed in
rust-vmm/vm-virtio#196.

Signed-off-by: Diana Popa dpopa@amazon.com
Co-authored-by: Bo Chen chen.bo@intel.com

Fixes #3149.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

  • All commits in this PR are signed (git commit -s).
  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • New unsafe code is documented.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

  • This functionality can be added in rust-vmm.

@dianpopa dianpopa self-assigned this Oct 3, 2022
@dianpopa dianpopa requested a review from a team October 3, 2022 10:27
@dianpopa dianpopa added Type: Bug Indicates an unexpected problem or unintended behavior NextRelease labels Oct 3, 2022
@dianpopa dianpopa force-pushed the fix_virtio_dos branch 2 times, most recently from 62cbafc to 9c71485 Compare October 3, 2022 13:15
Comment on lines 310 to 315
if len > self.actual_size() {
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
panic!("The number of available virtio descriptors is greater than queue size!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len > self.actual_size() {
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
panic!("The number of available virtio descriptors is greater than queue size!");
}
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
assert!(len > self.actual_size(), "The number of available virtio descriptors is greater than queue size!");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use panic over assert in production code.
At least, this is what we have targeted to do until now so that it is easier to track the panic branches we have in production code.
Also, we have a panic handler installed when starting Firecracker (for flushing metrics and whatnot) which I am not sure would get called if we use asssert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, this is what we have targeted to do until now so that it is easier to track the panic branches we have in production code.

Could you expand, how do we currently track panic branches?

Also, we have a panic handler installed when starting Firecracker (for flushing metrics and whatnot) which I am not sure would get called if we use asssert.

The docs for std::assert specify:

This will invoke the panic! macro if the provided expression cannot be evaluated to true at runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, this is what we have targeted to do until now so that it is easier to track the panic branches we have in production code.

Could you expand, how do we currently track panic branches?

We do not have an automated process or something of the kind. But if we were to use asserts in production code (which are extensively used throughout unit tests) is going to be quite challenging to uncover those if the need arises. That would be one of the practical aspects of it.

Also, we have a panic handler installed when starting Firecracker (for flushing metrics and whatnot) which I am not sure would get called if we use asssert.

The docs for std::assert specify:

This will invoke the panic! macro if the provided expression cannot be evaluated to true at runtime.

Nice, thanks for looking this up!
Can you tell me is there is a strong reason for using assert over panic here?

I prefer to use panic over assert in production code. At least, this is what we have targeted to do until now so that it is easier to track the panic branches we have in production code. Also, we have a panic handler installed when starting Firecracker (for flushing metrics and whatnot) which I am not sure would get called if we use asssert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is simpler, but beyond that, no strong reasons.

Comment on lines 482 to 488
if len > self.actual_size() {
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
panic!("The number of available virtio descriptors is greater than queue size!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len > self.actual_size() {
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
panic!("The number of available virtio descriptors is greater than queue size!");
}
// We are choosing to interrupt execution since this could be a potential malicious driver
// scenario. This way we also eliminate the risk of repeatedly logging and potentially
// clogging the microVM through the log system.
assert!(len > self.actual_size(), "The number of available virtio descriptors is greater than queue size!");

Copy link
Contributor Author

@dianpopa dianpopa Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above (#3156 (comment)).

Comment on lines +784 to +790
for j in 0..4 {
vq.dtable[j].set(
0x1000 * (j + 1) as u64,
0x1000,
VIRTQ_DESC_F_NEXT,
(j + 1) as u16,
);
}
Copy link
Contributor

@JonathanWoollett-Light JonathanWoollett-Light Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It is always safe to cast a u64 into a usize but it is not always safe to do the reverse (with stricter linting this would currently raise a warning).

Suggested change
for j in 0..4 {
vq.dtable[j].set(
0x1000 * (j + 1) as u64,
0x1000,
VIRTQ_DESC_F_NEXT,
(j + 1) as u16,
);
}
for j in 0u64..4u64 {
vq.dtable[usize::from(j)].set(
0x1000 * (j + 1),
0x1000,
VIRTQ_DESC_F_NEXT,
(u16::from(j) + 1),
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point @JonathanWoollett-Light! However, this applies to all queue tests in this file. Can you please open an issue for this? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a wider context, maybe an issue should relate to adding #![warn(clippy::as_conversions)] (https://rust-lang.github.io/rust-clippy/master/)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe modify add this restriction in the cargo clippy command directly if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created basic issue #3161

Comment on lines +836 to +843
for j in 0..4 {
vq.dtable[j].set(
0x1000 * (j + 1) as u64,
0x1000,
VIRTQ_DESC_F_NEXT,
(j + 1) as u16,
);
}
Copy link
Contributor

@JonathanWoollett-Light JonathanWoollett-Light Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It is always safe to cast a u64 into a usize but it is not always safe to do the reverse (with stricter linting this would currently raise a warning).

Suggested change
for j in 0..4 {
vq.dtable[j].set(
0x1000 * (j + 1) as u64,
0x1000,
VIRTQ_DESC_F_NEXT,
(j + 1) as u16,
);
}
for j in 0u64..4u64 {
vq.dtable[usize::from(j)].set(
0x1000 * (j + 1),
0x1000,
VIRTQ_DESC_F_NEXT,
(u16::from(j) + 1),
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applies to all virtio queue unit tests. We need to open an issue for this.

dianpopa and others added 2 commits October 4, 2022 13:39
We would also take into account newlines
when checking that lines following co-author
are only additional co-authors.

Signed-off-by: Diana Popa <dpopa@amazon.com>
The number of descriptor chain heads to process should always be smaller
or equal to the queue size, as the driver should never ask the VMM to
process a available ring entry more than once. Checking and reporting
such incorrect driver behavior can prevent potential hanging and
Denial-of-Service from happening on the VMM side.

Issue reported in rust-vmm/vm-virtio and fixed in
rust-vmm/vm-virtio#196.

Signed-off-by: Diana Popa <dpopa@amazon.com>
Co-authored-by: Bo Chen <chen.bo@intel.com>
@dianpopa dianpopa added Status: Awaiting review Indicates that a pull request is ready to be reviewed and removed Status: Author labels Oct 4, 2022
@dianpopa dianpopa merged commit 8a7152a into firecracker-microvm:main Oct 4, 2022
@dianpopa dianpopa deleted the fix_virtio_dos branch October 17, 2022 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Backport fix for a self-DOS scenario from rust-vmm's vm-virtio
4 participants