
machine/linux: Switch to virtiofs by default #22920

Merged
merged 3 commits into from
Jun 24, 2024

Conversation

@cgwalters (Contributor)

@cgwalters cgwalters commented Jun 5, 2024

machine/linux: Use memory-backend-memfd by default

This is prep for using virtiofsd; it has no real
impact otherwise.

Signed-off-by: Colin Walters walters@verbum.org
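
The memfd switch matters because vhost-user devices (which is how virtiofsd attaches to QEMU) need guest RAM backed by shareable memory so the daemon can map it. A minimal sketch of the relevant QEMU flags, built the way stubber-style code might; the helper name and the "mem" object id are illustrative, not podman's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// memfdBackendArgs returns QEMU flags that back guest RAM with a memfd.
// share=on is the important part: vhost-user daemons such as virtiofsd
// must be able to mmap guest memory from their own process.
func memfdBackendArgs(memMiB int) []string {
	return []string{
		"-object", fmt.Sprintf("memory-backend-memfd,id=mem,size=%dM,share=on", memMiB),
		"-machine", "memory-backend=mem",
	}
}

func main() {
	fmt.Println(strings.Join(memfdBackendArgs(2048), " "))
}
```

Without a shared memory backend, adding a vhost-user-fs device later fails, which is why this lands as a standalone prep commit.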


machine: Also initialize virtiofs mounts

I'm hitting a bug with 9p when trying to transfer large files.
In RHEL at least 9p isn't supported because it's known to have a
lot of design flaws; virtiofsd is the supported and recommended
way to share files between a host and guest.

This patch somewhat hackily sets up parallel virtiofsd mounts
alongside the 9p mounts (just on the host side, but one
can then easily mount in the guest).

This is allowing me to do some side-by-side comparisons
with the exact same machine.

I think what we can do in the short term is make
this a hidden option to enable (both "alongside" and also
fully replacing the 9p code).

Signed-off-by: Colin Walters walters@verbum.org


On Linux, podman machine has switched to using virtiofs (instead of 9p) for mounting host filesystems. The change is applied transparently when a machine is stopped and restarted (e.g. "podman machine stop; podman machine start") or when the machine is reinitialized. This change is intended to improve performance and reliability.
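
For orientation, the host-side wiring for one virtiofs share is a virtiofsd process plus a matching QEMU chardev/device pair; inside the guest the share is then mounted by tag (e.g. `mount -t virtiofs vol0 /mnt/vol0`). A sketch in the spirit of the stubber code; the tag, socket path, and helper name are illustrative, not podman's actual values:

```go
package main

import (
	"fmt"
	"strings"
)

// virtiofsArgs sketches one shared directory: the virtiofsd invocation
// on the host, and the QEMU flags that connect to its vhost-user socket
// and expose the share to the guest under the given tag.
func virtiofsArgs(sharedDir, socket, tag string) (daemon, qemu []string) {
	daemon = []string{
		"/usr/libexec/virtiofsd",
		"--socket-path=" + socket,
		"--shared-dir=" + sharedDir,
	}
	qemu = []string{
		"-chardev", fmt.Sprintf("socket,id=char-%s,path=%s", tag, socket),
		"-device", fmt.Sprintf("vhost-user-fs-pci,chardev=char-%s,tag=%s", tag, tag),
	}
	return daemon, qemu
}

func main() {
	d, q := virtiofsArgs("/home/user", "/tmp/vfs.sock", "vol0")
	fmt.Println(strings.Join(d, " "))
	fmt.Println(strings.Join(q, " "))
}
```

This also explains why no migration is needed: the guest-side mount is re-established on every machine start, so stop/restart is enough to pick up the new transport.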

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Jun 5, 2024
@cgwalters (Contributor Author)

https://gist.github.com/cgwalters/3d22459bae05a89d86b9f0d501d82feb for my notes debugging a 9p issue that spawned this

@cgwalters (Contributor Author)

One architectural issue we have now is that the qemu process from podman machine start just happens to run in the context of the user's shell. Yet when using virtiofsd we suddenly have new processes that need to be lifecycle-bound to qemu (i.e. they should die if qemu dies). That kind of happens implicitly with this change, but IMO it'd be better to consider having podman machine start actually wrap everything underneath a systemd-run --user -u podman-machine.
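
The suggestion above can be sketched as prefixing the launch command so the whole process tree lives in one transient systemd user unit; the unit name and helper function are illustrative, not podman's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// wrapInUserUnit prefixes a command with systemd-run so that qemu and
// any helper daemons (virtiofsd, gvproxy, ...) run inside one transient
// user unit and share a lifecycle: stopping the unit kills them all.
// --collect garbage-collects the unit even if it exits with failure.
func wrapInUserUnit(unit string, cmd []string) []string {
	prefix := []string{"systemd-run", "--user", "--unit=" + unit, "--collect"}
	return append(prefix, cmd...)
}

func main() {
	full := wrapInUserUnit("podman-machine", []string{"qemu-system-x86_64", "-m", "2048"})
	fmt.Println(strings.Join(full, " "))
}
```

The payoff is that "they should die if qemu dies" becomes a property enforced by systemd rather than an accident of process parentage.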

@Luap99 (Member) left a comment

Practically speaking is there any reason to support both 9p and virtiofs at the same time? Supporting both just increases maintenance burden. I guess the only downside is that we depend on virtiofsd.

pkg/machine/qemu/stubber.go (outdated review threads, resolved)
@rhatdan (Member)

rhatdan commented Jun 6, 2024

We should just move to virtiofsd and drop 9p

@cgwalters (Contributor Author)

> Practically speaking is there any reason to support both 9p and virtiofs at the same time? Supporting both just increases maintenance burden.

It'd definitely simplify this patch to just hard switch. Basically I was just trying to be conservative; on the flip side it wasn't really hard either to do both.

I am seeing bits about FreeBSD mentioned, which is not supported by virtiofsd (it uses lots of Linux specific namespacing bits, openat2, etc). Digging around slightly I see https://wiki.freebsd.org/VirtFS which looks like a not-yet-upstream implementation, although that may be mostly relevant when acting as a guest, not a host. It looks like... @dfr may be a contact here?

@dfr (Contributor)

dfr commented Jun 6, 2024

> I am seeing bits about FreeBSD mentioned, which is not supported by virtiofsd (it uses lots of Linux specific namespacing bits, openat2, etc). Digging around slightly I see https://wiki.freebsd.org/VirtFS which looks like a not-yet-upstream implementation, although that may be mostly relevant when acting as a guest, not a host. It looks like... @dfr may be a contact here?

There will be support for 9p in FreeBSD 15.0 - the implementation is derived from the VirtFS code you reference. It is likely that I will also merge that to FreeBSD 14.2 when that is released later this year.

Last week at BSDCan, I heard about a FreeBSD implementation of virtiofs which I believe is compatible with the Linux virtiofs but I don't know the status of that and can't predict when it will make it to a FreeBSD release.

@Luap99 (Member)

Luap99 commented Jun 6, 2024

> There will be support for 9p in FreeBSD 15.0 - the implementation is derived from the VirtFS code you reference. It is likely that I will also merge that to FreeBSD 14.2 when that is released later this year.

We don't support FreeBSD guests, so the question is: can virtiofsd run on a FreeBSD host? If not, hard-forcing virtiofs with qemu will break the FreeBSD (host) support for podman machine, but tbh I have no idea if there is anyone using this in the first place.

@dfr (Contributor)

dfr commented Jun 6, 2024

> We don't support FreeBSD guests, so the question is: can virtiofsd run on a FreeBSD host?

I have not tried to run virtiofsd on a FreeBSD host, and currently 'podman machine' would need some work to support the native hypervisor; that is far down my priority list. The only FreeBSD-related thing which is close is #19939, but it doesn't seem likely that will land any time soon, and that is related to the guest, not the host.

It is clear that virtiofs is the way forward and that 9p will be phased out at some point. Personally I would prefer that happen later rather than sooner, given that 9p works well on FreeBSD and will be part of an upcoming release, whereas virtiofs seems further off. I don't know how to estimate the negative impact of maintaining both in the meantime, though.

@cgwalters (Contributor Author)

> I have not tried to run virtiofsd on a FreeBSD host and currently 'podman machine' would need some work to support the native hypervisor and that is far down my priority list.

Does that mean podman machine doesn't work at all today on FreeBSD, and that the support that exists (sorry, I am just discovering this stuff now) is about running containers directly on the FreeBSD host? Does that actually work? Ah, I see https://www.freshports.org/sysutils/ocijail/

> It is clear that virtiofs is the way forward and that 9p will be phased out at some point. Personally I would prefer that happen later rather than sooner given that 9p works well on FreeBSD and will be part of an upcoming release whereas virtiofs seems further off.

To be clear this is only about "podman machine" which is currently obscure even on Linux (that said, we're forcing its usage as part of some bootc workflows). And I suspect podman on FreeBSD is already obscure, so we're talking about an obscure corner of an obscure corner...

Here's my proposal: I'll structure things as two patches:

  • Add virtiofsd support (as it is now)
  • A followup patch which deletes 9p

Then anyone who cares could try to argue for a revert of the second patch later, or at least use it as a starting point.

@Luap99 (Member)

Luap99 commented Jun 6, 2024

> Then anyone who cares could try to argue for a revert of the second patch later, or at least use it as a starting point.

SGTM

@dfr (Contributor)

dfr commented Jun 6, 2024

>> Then anyone who cares could try to argue for a revert of the second patch later, or at least use it as a starting point.
>
> SGTM

That works for me. My only use of 'podman machine' is plain vanilla Linux guest on macos host. For anything else, I have FreeBSD hosts and VMs as well as Linux VMs.

@cgwalters (Contributor Author)

OK I cleaned this up a bit more, and split it into the promised two separate patches, one to add virtiofs in parallel, and one to switch to it by default and drop 9p.

In digging through this, a few other notes:

  • We were serializing the Type of the mount into the machine JSON; I changed the code to just ignore that. IMO we should serialize less stuff into this JSON and make it dynamic (xref e.g. machine/applehv: Sync vfkit cpu/memory on start, #21192).
  • There's this "client9p" stuff that I think is only used on Hyper-V/Windows, is that right? I didn't touch it, and I think it won't be affected by this; it had parallel 9p logic.

@cgwalters cgwalters changed the title machine: Also initialize virtiofs mounts machine/linux: Switch to virtiofs by default Jun 6, 2024
@cgwalters cgwalters marked this pull request as ready for review June 6, 2024 15:58
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@baude (Member)

baude commented Jun 6, 2024

If I am using podman machine on Linux now and you switch to virtiofs, what happens to my machine? The update process should allow you to update a client and NOT have to recreate the machine. I don't see how you can remove 9p without that consideration, and I also don't think migration is worth it. As for switching to virtiofs as a default, we usually do not add the function and then immediately make it the default; we usually have at least one release of bake time. I could be convinced otherwise.

@cgwalters (Contributor Author)

> if I am using podman machine on Linux now and you switch to virtiofs, what happens to my machine? the update process would allow you to update a client and NOT have to recreate the machine.

The semantics I believe are easy to describe: When podman is updated to a binary that includes this PR, existing machines stay unchanged until you stop/restart them.

> I don't see how you can remove 9p without that consideration and I also don't think migration is worth it.

There's not much of a "migration" per the above; it's just a stop/restart, because the mounts are done dynamically over SSH.

> As for switching to virtiofs as a default, we usually do not add the function and then immediately make it the default. we usually have at least one release off bake time. I could be convinced otherwise.

Well, that's a debate between you and Dan and Paul, right? I originally did the PR making it opt-in, and it's easy to just drop the top commit here. I could also do a small variant where we make virtiofs the default with an env-var fallback to 9p.

But the argument here for a default switch IMO is:

  • virtiofs is the production-supported path for this in RHEL, and while 9p is enabled in Fedora, people mostly use it out of inertia IMO.
  • virtiofs is already used on macOS.
  • The set of podman machine users on Linux is relatively small anyway (although, notably, bootc use cases are pushing that up), and I think we can just do some extra real-world, manual testing on this before the next release, right? I can say it works well for me, whereas 9p was buggy.

@baude (Member)

baude commented Jun 6, 2024

Well, I did say I could be convinced, but it is a departure from the norm, and that is notable. How this applies to RHEL should probably be discussed in further detail, because there are subtleties around podman machine support there.

@cgwalters (Contributor Author)

cgwalters commented Jun 6, 2024

Updates:

  • I realized we were still providing 9p mounts to the guest even when using virtiofs: fixed, and the "switch to virtiofs" patch drops that code
  • I discovered and learned how to run make localmachine, and that found an important bug: :Z didn't work
  • Then I noticed that the macOS+virtiofs code had hardcoded logic to set the security context (and has a totally different flow to set up the mounts via Ignition?); anyway, I fixed both to use a shared context

But then suddenly make localmachine started failing for me. EDIT: Figured it out; I had a stray override environment variable.

@rhatdan (Member)

rhatdan commented Jun 6, 2024

Since almost no one ever changes the default, it is better to default to virtiofs, falling back to plan9 (9p).
virtiofsd performs better and is better supported (the engineers who wrote it work for Red Hat). And we use it on the Mac.

@@ -87,7 +87,7 @@ func GenerateSystemDFilesForVirtiofsMounts(mounts []machine.VirtIoFs) ([]ignitio
 	mountUnit.Add("Mount", "What", "%s")
 	mountUnit.Add("Mount", "Where", "%s")
 	mountUnit.Add("Mount", "Type", "virtiofs")
-	mountUnit.Add("Mount", "Options", "context=\"system_u:object_r:nfs_t:s0\"")
+	mountUnit.Add("Mount", "Options", fmt.Sprintf("context=\"%s\"", machine.NFSSELinuxContext))
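
For context, the unit text this generator emits would look roughly like the following; the What tag and Where path are illustrative, and the Options line is what the diff changes to use the shared constant:

```ini
# Sketch of a generated virtiofs mount unit (illustrative values)
[Mount]
What=vol0
Where=/mnt/vol0
Type=virtiofs
Options=context="system_u:object_r:nfs_t:s0"
```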
Member

Nit: use %q

Contributor Author

Sure, will do; just waiting to see if anyone else has other comments before doing another update and round of CI.

@rhatdan (Member)

rhatdan commented Jun 10, 2024

LGTM
@Luap99 PTAL

@@ -410,7 +410,12 @@ case "$TEST_FLAVOR" in
         install_test_configs
         ;;
     machine-linux)
-        showrun dnf install -y podman-gvproxy*
+        showrun dnf install -y podman-gvproxy* virtiofsd
+        # Bootstrap this link if it isn't yet in the package; xref
Contributor Author

I was thinking about this a bit more, and one thing that will probably show up is that static package analyzers will warn about the broken symlink (presumably we won't take a hard dep on virtiofsd).

I don't know how annoying that will be, though.

Member

Yeah, I do wonder if we should add a new podman-machine package that just contains the necessary deps as Requires and could add the symlink there. That way users don't get broken every time we add a new dep; they can just install the one mostly empty package to pull in the rest.

Contributor Author

Yeah, makes sense.

@Luap99 (Member)

Luap99 commented Jun 19, 2024

Also note that I still consider @baude to have the final say on this.

@cgwalters cgwalters requested a review from baude June 19, 2024 10:05
@germag

germag commented Jun 21, 2024

Moving to virtiofsd, while running it inside a user namespace (inside podman unshare, or using its own sandboxing via --uid-map/--gid-map), will allow sharing the user's image storage and mounting it inside the VM at /usr/lib/containers/storage. This allows root to run containers using the user's images without copying them, and without requiring the images to be stored in the (VM) root's storage.

This is important for podman-bootc, since we need to run as root to create disk images, and currently we force Linux users to store the images in the (VM) root user's storage.
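
The first option mentioned above (running virtiofsd under podman unshare so it sees the rootless storage with mapped ownership) can be sketched as a simple command wrapper; the paths and helper name are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// unshareVirtiofsd builds a command that runs virtiofsd inside
// `podman unshare`, i.e. inside the user's rootless user namespace,
// so the shared directory is seen with the user's mapped uids/gids.
func unshareVirtiofsd(sharedDir, socket string) []string {
	return []string{
		"podman", "unshare",
		"/usr/libexec/virtiofsd",
		"--socket-path=" + socket,
		"--shared-dir=" + sharedDir,
	}
}

func main() {
	cmd := unshareVirtiofsd("/home/user/.local/share/containers/storage", "/tmp/vfs.sock")
	fmt.Println(strings.Join(cmd, " "))
}
```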

@germag

germag commented Jun 21, 2024

Btw, regarding FreeBSD: we plan to support virtiofsd on it in the near future; @stefano-garzarella is doing all the required modifications to rust-vmm and qemu.

@baude (Member)

baude commented Jun 24, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2024
@baude (Member)

baude commented Jun 24, 2024

/approve


openshift-ci bot commented Jun 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: baude, cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2024
@cgwalters (Contributor Author)

Thanks for the lgtm, Brent!

Two things I'd re-emphasize here:

  • Currently "podman machine" is critical-path for my day-to-day work on Linux too
  • You know I'm going to be around to help with any fallout from this PR, as well as with ongoing maintenance

@openshift-merge-bot openshift-merge-bot bot merged commit 0563fb4 into containers:main Jun 24, 2024
90 checks passed
baude added a commit to baude/podman that referenced this pull request Jun 24, 2024
With the merge of containers#22920, we now require virtiofsd on Linux for machine
mounts.

Signed-off-by: Brent Baude <bbaude@redhat.com>
@cgwalters (Contributor Author)

Can we queue this for a 5.1 release? Getting more reports, e.g. osbuild/bootc-image-builder#540

@Luap99 (Member)

Luap99 commented Jul 11, 2024

This isn't the kind of stuff that should go into a patch release, given that it requires packaging changes IMO.
Regardless, 5.2 will be released in the next few weeks anyway.

@cgwalters (Contributor Author)

> Regardless 5.2 will be released in the next few weeks anyway

Is that planned to be released to Fedora 40, for example?

@Luap99 (Member)

Luap99 commented Jul 12, 2024

Yes

@rhatdan (Member)

rhatdan commented Jul 12, 2024

All minor releases are released to supported Fedoras.

@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 11, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 11, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. machine release-note