-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add net device support #49
Conversation
pub fn cmdline_string(&self) -> String { | ||
let mut s = String::new(); | ||
if self.root_device { | ||
s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't vda
be configurable? We should keep track somewhere of the number of block devices inserted, and compose the vd*
part, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also nit - for key=val
params you can use Cmdline::insert
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats good to know; for now this is used as a String
here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline
everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.
|
||
let mut chain = match self.rxq.iter()?.next() { | ||
Some(c) => c, | ||
_ => return Ok(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would better be expressed as an error variant, since we've exhausted the descriptors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessarily an error, because it simply means there's no more room to write frames, which can appear frequently during normal operation. Most errors are treated as unrecoverable in this call chain for now. This could definitely use more polish/clarity; was just waiting for some of the ongoing vm-virtio things to settle down before changing things here as well.
let rxq = self.cfg.virtio.queues[0].clone(); | ||
let txq = self.cfg.virtio.queues[1].clone(); | ||
|
||
let tap = Tap::open_named(self.tap_name.as_str()).map_err(Error::Tap)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this also creates the device if it isn't already present? The PR description mentions how to create tap devices in order for the tests to run, but shouldn't they be created anyway assuming that the user has CAP_NET_ADMIN
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the API yes, but the vmm should be run as deprivileged as possible in general.
ops.remove(Events::empty(&self.rx_ioevent)) | ||
.expect("Failed to remove rx ioevent"); | ||
ops.remove(Events::empty(&self.tx_ioevent)) | ||
.expect("Failed to remove rx ioevent"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.expect("Failed to remove rx ioevent"); | |
.expect("Failed to remove tx ioevent"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to ops.remove
them in a row, keep track of the results, and panic after all the remove
s have been attempted so that events don't incorrectly remain in the event manager in case of a quick death.
That being said, I'm not sure about panicking in library code. There's not much we can do about it yet but if you go ahead with the expect
s, please add an issue to track removing them once we have metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is something to think about. After this PR is merged we want to consolidate common/related device logic even more, so at least we can focus on one place (there's something similar for block right now as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice, left a few comments, most of them related to rx/tx processing.
bindings::TUN_F_CSUM | ||
| bindings::TUN_F_UFO | ||
| bindings::TUN_F_TSO4 | ||
| bindings::TUN_F_TSO6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary/required to add TUN_F_TSO6
if VIRTIO_NET_F_GUEST_TSO6
wasn't negotiated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it was missing. Added!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you recently pointed out, I forgot to add VIRTIO_NET_F_GUEST_TSO6
to the feature list as well. I've done so, and will update the next time I push changes to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve once this comment is addressed. Thanks!
Tap(io::Error), | ||
} | ||
|
||
impl From<vm_virtio::Error> for Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we implement From
for the other variants as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't because it was easier/cleaner to use map_err
where the other variants are used. We can add more implementations of From
if we get to a point where map_err
is unwieldy. Otherwise, I like using it because it's a lot more explicit in terms of what actual error gets returned from a method.
pub fn process_tap(&mut self) -> result::Result<(), Error> { | ||
loop { | ||
if self.rxbuf_current == 0 { | ||
match self.tap.read(&mut self.rxbuf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what happens if we read some data from tap here, but don't have a descriptor chain available for writing to its buffer. Is that data lost? Should we do this read
only if we're sure there is at least one descriptor chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tap
event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf
(if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf
to the guest instead of reading from the tap.
If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf
. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).
break; | ||
} | ||
|
||
let len = cmp::min(left, desc.len() as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if len
field from the descriptor is always 0 and we don't write any data to memory (count = 0 at the end of this loop)? Does it still make sense to add an entry in the used ring when no data was actually transferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, because even if we didn't write anything, we still consumed a descriptor chain from the available ring, and now we have to return it to the used ring. The driver is expected to handle stranger cases like the one you mentioned (and also the standard does not prohibit this kind of situation).
while let Some(mut chain) = self.txq.iter()?.next() { | ||
let len = self.send_frame_from_chain(&mut chain)?; | ||
|
||
self.txq.add_used(chain.head_index(), len)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len
should be replaced with 0 I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right!
count += len; | ||
} | ||
|
||
self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we treat the case when less than count bytes could've been written to the tap device? I guess we can add at least a (TODO) comment about this :-?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write
is equivalent to a write_all
. This is something to keep in mind when we start defining a more generic abstraction.
// Adding the virtio devices. We'll come up with a cleaner abstraction for `Env`. | ||
|
||
let mem = Arc::new(vmm.guest_memory.clone()); | ||
|
||
// We transiently define a mut `Cmdline` object here to pass for device creation | ||
// (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio | ||
// functionality), until we figure out how this fits with the rest of the vmm, which | ||
// apparently does not explicitly use `Cmdline` structs. Will discuss and fix this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could use Cmdline
everywhere (hopefully, we will avoid this way adding spaces in strings like here, but didn't look very much into how is Cmdline
implemented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's def a thread we should continue. Created #98 to track it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:
-
The unit tests for the
Tap
struct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd likeTap
to be a separate abstraction (Tap abstractions #94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)? -
Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?
pub fn cmdline_string(&self) -> String { | ||
let mut s = String::new(); | ||
if self.root_device { | ||
s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats good to know; for now this is used as a String
here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline
everywhere.
pub fn cmdline_string(&self) -> String { | ||
let mut s = String::new(); | ||
if self.root_device { | ||
s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.
pub fn process_tap(&mut self) -> result::Result<(), Error> { | ||
loop { | ||
if self.rxbuf_current == 0 { | ||
match self.tap.read(&mut self.rxbuf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tap
event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf
(if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf
to the guest instead of reading from the tap.
If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf
. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).
count += len; | ||
} | ||
|
||
self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write
is equivalent to a write_all
. This is something to keep in mind when we start defining a more generic abstraction.
while let Some(mut chain) = self.txq.iter()?.next() { | ||
let len = self.send_frame_from_chain(&mut chain)?; | ||
|
||
self.txq.add_used(chain.head_index(), len)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right!
// Adding the virtio devices. We'll come up with a cleaner abstraction for `Env`. | ||
|
||
let mem = Arc::new(vmm.guest_memory.clone()); | ||
|
||
// We transiently define a mut `Cmdline` object here to pass for device creation | ||
// (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio | ||
// functionality), until we figure out how this fits with the rest of the vmm, which | ||
// apparently does not explicitly use `Cmdline` structs. Will discuss and fix this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's def a thread we should continue. Created #98 to track it.
457a08f
to
ee66a26
Compare
Signed-off-by: Alexandru Agache <aagch@amazon.com>
85659d8
to
d0258ce
Compare
Looks like the tap tests run ok on the CI system; in that case the current disadvantage is that ppl running locally have to explicitly provide the required privileges so they can pass. I lowered the coverage due to not having sufficient unit tests as part of the net functionality (the queue handlers in particular). Are you ok with tracking that as a separate issue while things start falling into place more within |
I think we are more likely to write tests for uncover lines than to come back and re-write the tests such that they do not require sudo (at least this seems to be the case for past experiences). I would prefer us to be in a place where tests can run without polluting the environment.
We should add an issue for this. Ideally tests will not leave tap interfaces even when the run fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The tests pass on CI because they already run privileged there; I think we shouldn't invest more in fixing the privilege problem until #94 is done, and I prefer missing tests to overly complex ones - at least until some of the complexity goes away.
aae0019
to
cae0937
Compare
Signed-off-by: Alexandru Agache <aagch@amazon.com>
I removed the |
This PR refactors some of the common virtio device code to reduce the amount of required boilerplate, and then implements virtio net device support. The
tap
functionality and associated logic is taken from Firecracker for now, until we'll have a similar component available in rust-vmm or someplace else. More tests have to be added, and we're going to circle back to that after the initial version of the vsock support PR is ready as well.Helpful steps for testing net device support:
tap
device is needed on the host. One can be created and configured with something like (please note the example might conflict with the existing host configuration if there's overlap in terms of IP addresses/subnets):Start the reference vmm with the
--net
parameter, i.e.cargo run ... -- ... --net tap=vmtap100
A network interface should be present inside the guest (usually
eth0
, but the exact naming may vary). We need to configure this interface as well, for example by running (inside the guest):Please note the interface in the guest and the
tap
device from the host must be configured as part of the same subnet (192.168.241.0/24
in the previous examples) to communicate directly.