Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add net device support #49

Merged
merged 2 commits into from
Mar 5, 2021
Merged

Add net device support #49

merged 2 commits into from
Mar 5, 2021

Conversation

alexandruag
Copy link
Collaborator

This PR refactors some of the common virtio device code to reduce the amount of required boilerplate, and then implements virtio net device support. The tap functionality and associated logic is taken from Firecracker for now, until we'll have a similar component available in rust-vmm or someplace else. More tests have to be added, and we're going to circle back to that after the initial version of the vsock support PR is ready as well.

Helpful steps for testing net device support:

  • A tap device is needed on the host. One can be created and configured with something like (please note the example might conflict with the existing host configuration if there's overlap in terms of IP addresses/subnets):
sudo ip tuntap add dev vmtap100 mode tap
sudo ip addr add 192.168.241.1/24 dev vmtap100
sudo ip link set vmtap100 up
  • Start the reference vmm with the --net parameter, i.e. cargo run ... -- ... --net tap=vmtap100

  • A network interface should be present inside the guest (usually eth0, but the exact naming may vary). We need to configure this interface as well, for example by running (inside the guest):

ip a add 192.168.241.2/24 dev eth0
ip link set eth0 up

Please note the interface in the guest and the tap device from the host must be configured as part of the same subnet ( 192.168.241.0/24 in the previous examples) to communicate directly.

pub fn cmdline_string(&self) -> String {
let mut s = String::new();
if self.root_device {
s.push_str("root=/dev/vda");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't vda be configurable? We should keep track somewhere of the number of block devices inserted, and compose the vd* part, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also nit - for key=val params you can use Cmdline::insert)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.

src/devices/src/virtio/block/mod.rs Outdated Show resolved Hide resolved
src/devices/src/virtio/mod.rs Outdated Show resolved Hide resolved
src/devices/src/virtio/net/bindings.rs Show resolved Hide resolved
src/devices/src/virtio/net/bindings.rs Show resolved Hide resolved
src/devices/src/virtio/net/device.rs Show resolved Hide resolved

let mut chain = match self.rxq.iter()?.next() {
Some(c) => c,
_ => return Ok(false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would better be expressed as an error variant, since we've exhausted the descriptors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily an error, because it simply means there's no more room to write frames, which can appear frequently during normal operation. Most errors are treated as unrecoverable in this call chain for now. This could definitely use more polish/clarity; was just waiting for some of the ongoing vm-virtio things to settle down before changing things here as well.

let rxq = self.cfg.virtio.queues[0].clone();
let txq = self.cfg.virtio.queues[1].clone();

let tap = Tap::open_named(self.tap_name.as_str()).map_err(Error::Tap)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this also creates the device if it isn't already present? The PR description mentions how to create tap devices in order for the tests to run, but shouldn't they be created anyway assuming that the user has CAP_NET_ADMIN?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the API yes, but the vmm should be run as deprivileged as possible in general.

ops.remove(Events::empty(&self.rx_ioevent))
.expect("Failed to remove rx ioevent");
ops.remove(Events::empty(&self.tx_ioevent))
.expect("Failed to remove rx ioevent");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.expect("Failed to remove rx ioevent");
.expect("Failed to remove tx ioevent");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to ops.remove them in a row, keep track of the results, and panic after all the removes have been attempted so that events don't incorrectly remain in the event manager in case of a quick death.

That being said, I'm not sure about panicking in library code. There's not much we can do about it yet but if you go ahead with the expects, please add an issue to track removing them once we have metrics.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is something to think about. After this PR is merged we want to consolidate common/related device logic even more, so at least we can focus on one place (there's something similar for block right now as well).

src/devices/src/virtio/net/tap.rs Show resolved Hide resolved
Copy link
Collaborator

@lauralt lauralt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, left a few comments, most of them related to rx/tx processing.

bindings::TUN_F_CSUM
| bindings::TUN_F_UFO
| bindings::TUN_F_TSO4
| bindings::TUN_F_TSO6,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary/required to add TUN_F_TSO6 if VIRTIO_NET_F_GUEST_TSO6 wasn't negotiated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it was missing. Added!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you recently pointed out, I forgot to add VIRTIO_NET_F_GUEST_TSO6 to the feature list as well. I've done so, and will update the next time I push changes to the PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve once this comment is addressed. Thanks!

Tap(io::Error),
}

impl From<vm_virtio::Error> for Error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we implement From for the other variants as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't because it was easier/cleaner to use map_err where the other variants are used. We can add more implementations of From if we get to a point where map_err is unwieldy. Otherwise, I like using it because it's a lot more explicit in terms of what actual error gets returned from a method.

pub fn process_tap(&mut self) -> result::Result<(), Error> {
loop {
if self.rxbuf_current == 0 {
match self.tap.read(&mut self.rxbuf) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what happens if we read some data from tap here, but don't have a descriptor chain available for writing to its buffer. Is that data lost? Should we do this read only if we're sure there is at least one descriptor chain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.

If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).

break;
}

let len = cmp::min(left, desc.len() as usize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if len field from the descriptor is always 0 and we don't write any data to memory (count = 0 at the end of this loop)? Does it still make sense to add an entry in the used ring when no data was actually transferred?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, because even if we didn't write anything, we still consumed a descriptor chain from the available ring, and now we have to return it to the used ring. The driver is expected to handle stranger cases like the one you mentioned (and also the standard does not prohibit this kind of situation).

while let Some(mut chain) = self.txq.iter()?.next() {
let len = self.send_frame_from_chain(&mut chain)?;

self.txq.add_used(chain.head_index(), len)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len should be replaced with 0 I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right!

count += len;
}

self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we treat the case when less than count bytes could've been written to the tap device? I guess we can add at least a (TODO) comment about this :-?.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.

// Adding the virtio devices. We'll come up with a cleaner abstraction for `Env`.

let mem = Arc::new(vmm.guest_memory.clone());

// We transiently define a mut `Cmdline` object here to pass for device creation
// (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio
// functionality), until we figure out how this fits with the rest of the vmm, which
// apparently does not explicitly use `Cmdline` structs. Will discuss and fix this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could use Cmdline everywhere (hopefully, we will avoid this way adding spaces in strings like here, but didn't look very much into how is Cmdline implemented).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's def a thread we should continue. Created #98 to track it.

Copy link
Collaborator Author

@alexandruag alexandruag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:

  • The unit tests for the Tap struct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd like Tap to be a separate abstraction (Tap abstractions #94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)?

  • Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?

src/devices/src/virtio/block/mod.rs Outdated Show resolved Hide resolved
pub fn cmdline_string(&self) -> String {
let mut s = String::new();
if self.root_device {
s.push_str("root=/dev/vda");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.

pub fn cmdline_string(&self) -> String {
let mut s = String::new();
if self.root_device {
s.push_str("root=/dev/vda");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.

src/devices/src/virtio/mod.rs Outdated Show resolved Hide resolved
src/devices/src/virtio/net/bindings.rs Show resolved Hide resolved
pub fn process_tap(&mut self) -> result::Result<(), Error> {
loop {
if self.rxbuf_current == 0 {
match self.tap.read(&mut self.rxbuf) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.

If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).

count += len;
}

self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.

while let Some(mut chain) = self.txq.iter()?.next() {
let len = self.send_frame_from_chain(&mut chain)?;

self.txq.add_used(chain.head_index(), len)?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right!

src/devices/src/virtio/net/tap.rs Show resolved Hide resolved
// Adding the virtio devices. We'll come up with a cleaner abstraction for `Env`.

let mem = Arc::new(vmm.guest_memory.clone());

// We transiently define a mut `Cmdline` object here to pass for device creation
// (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio
// functionality), until we figure out how this fits with the rest of the vmm, which
// apparently does not explicitly use `Cmdline` structs. Will discuss and fix this
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's def a thread we should continue. Created #98 to track it.

@alexandruag alexandruag force-pushed the net_pr branch 4 times, most recently from 457a08f to ee66a26 Compare March 3, 2021 17:12
Signed-off-by: Alexandru Agache <aagch@amazon.com>
@alexandruag alexandruag force-pushed the net_pr branch 3 times, most recently from 85659d8 to d0258ce Compare March 3, 2021 17:35
@alexandruag
Copy link
Collaborator Author

Looks like the tap tests run ok on the CI system; in that case the current disadvantage is that ppl running locally have to explicitly provide the required privileges so they can pass. I lowered the coverage due to not having sufficient unit tests as part of the net functionality (the queue handlers in particular). Are you ok with tracking that as a separate issue while things start falling into place more within vm-virtio (and some of the device logic gets consolidated)?

@andreeaflorescu
Copy link
Member

Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:

  • The unit tests for the Tap struct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd like Tap to be a separate abstraction (Tap abstractions #94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)?

I think we are more likely to write tests for uncover lines than to come back and re-write the tests such that they do not require sudo (at least this seems to be the case for past experiences). I would prefer us to be in a place where tests can run without polluting the environment.

  • Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?

We should add an issue for this. Ideally tests will not leave tap interfaces even when the run fails.

alxiord
alxiord previously approved these changes Mar 5, 2021
Copy link
Member

@alxiord alxiord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The tests pass on CI because they already run privileged there; I think we shouldn't invest more in fixing the privilege problem until #94 is done, and I prefer missing tests to overly complex ones - at least until some of the complexity goes away.

Signed-off-by: Alexandru Agache <aagch@amazon.com>
@alexandruag
Copy link
Collaborator Author

I removed the Tap tests that required special permissions, and I'll add any missing issues for outstanding work items later today. Please have another look, and thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants