Description
This is a verbatim conversation from the mailing list:
On Wed, Oct 13, 2021 at 12:16 AM Gregory Burd gr...@burd.me wrote:
Hello OSv-ers,
I'm a huge fan of ZFS; it's an amazing bit of work and I'm thrilled it's a core component in OSv. That said, it's not a great choice in all cases: the overhead of ZFS can outweigh the benefits. I've heard many references to "adding another filesystem" into the mix in different contexts, most recently in the (amazing) talk given at p99conf by Waldek.
So, how about ext2 pulled straight from the BSD tree?
https://github.com/freebsd/freebsd-src/tree/main/sys/fs/ext2fs
Why ext2 and not some other filesystem? Well, it's not my favorite filesystem either, but it is popular and well known. It's easy for Linux users to get comfortable with, and the tools are generally installed by default on most distros. I would imagine that the BSD code is fairly complete and supported, and I believe it supports ext2, 3, and 4 (https://wiki.freebsd.org/Ext2fs).
anyone have thoughts?
On Wednesday, October 13, 2021 at 9:32:18 AM UTC-4 Nadav Har'El wrote:
I think it makes sense, but only if it's something that you personally care about for some reason - e.g., that it's important for you for OSv to be smaller and you believe that replacing ZFS will make it smaller. Or that some other advantage of ext2 over zfs is interesting for you.
Something worth keeping in mind is that one of the claimed advantages of OSv over, say, Linux, is that OSv does not need to support a gazillion different drivers and filesystems. It's not like anyone will ever plug an ext2-formatted disk or ntfs-formatted or whatever into OSv - so we don't need to support any of these filesystems. If we do want to support them, it should be out of some expected benefit - not out of necessity. So let's just spell out in advance what this benefit might be over the filesystems we already have (zfs, ramfs and rofs).
Waldek responded:
If we go ahead with implementing ext2 support, we should define a minimal subset of it we want to implement (do we need extents, large files, etc?). We should probably also avoid the mistake we made with ZFS and NOT implement tools equivalent to zpool.so, mkfs.so, etc. Instead, let us delegate all admin functionality to the toolset on the host OS.
Also, I don't remember if Waldek did this or only partially (?), but if you're adding ext2 to reduce the kernel size, we first need to compile a kernel without zfs. We could add a build-time feature of removing zfs (see #1110) or build it into a shared library that doesn't need to be loaded (#1009). This would be similar to the Linux build system - which allows keeping some parts of the kernel out of the build, but also keeping some parts of the kernel in the build but as separate modules (sort of shared libraries).
anyone have suggestions on where to start?
I would start with making (at least to myself) the case of what the benefit of adding ext2 would be.
If you think it is the code size, I would start by trying to estimate how much smaller the kernel would be without ZFS. For that I would start adding to our build system an option to compile without ZFS - or compile ZFS into a shared library.
I think in my presentation (slides 9/10) I claimed that I was able to trim the size of the kernel by ~0.7MB (after enabling GC). Also, adding an option to build OSv without ZFS is something that I was planning to prepare proper patches for. In the presentation this is the 3rd step, but it does not have to come in that order: we can work on the ability to compile ZFS out independently. I think I might have something ready for steps 1 and 2 (hide C++ std, enable GC) in the next 2-4 weeks. Meanwhile, my WIP branch - https://github.com/wkozaczuk/osv/commits/minimize_kernel_size - has all code changes I made for the presentation, including commenting out ZFS in the right places, so you can start experimenting with it. This particular commit deals with ZFS - wkozaczuk@df98287.
Then of course you can start implementing ext2. I agree you should try to find existing code in freebsd. You can see the other examples in the fs/ subdirectory (ramfs, rofs, devfs, nfs) on how to plug that code into OSv.
Yeah, ideally, as Nadav points out, we would want to make the ext2 driver a pluggable shared library (a module). A good example of how to do it is how nfs was changed to become a shared library with this patch - 4ffb0fa. Hopefully, once you read the comments and the code changes it will all make sense. For example, these fragments - 4ffb0fa#diff-4dcb4336d0285de24fc7f3ebdb6805eb0c10ea645d9848dcfd38daa7742b363c and 4ffb0fa#diff-df0d94aa12dd9f4772529c9060f40a3492dc1b034f8bc2ae79bd2f978231f8ed - are key to seeing how OSv would automatically try to find an ext2 shared library under /usr/lib/fs and call its INIT functions to let it register its vfsops structure.
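To make the registration idea concrete, here is a minimal, self-contained C++ sketch of the pattern: a module's init hook registers a table of operations under a filesystem name, and the mount path looks the name up. This is not OSv's actual API. The names `fs_ops`, `fs_registry`, `register_fs`, `ext2_module_init` and `mount_fs` are all hypothetical, and the real vfsops structure and loading logic in the nfs patch differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a vfsops table; the real OSv structure has
// more entries and a different shape.
struct fs_ops {
    std::function<int(const std::string& dev, const std::string& path)> mount;
};

// Hypothetical registry mapping a filesystem name to its operations.
std::map<std::string, fs_ops>& fs_registry() {
    static std::map<std::string, fs_ops> registry;
    return registry;
}

// Returns false if a filesystem with this name is already registered.
bool register_fs(const std::string& name, fs_ops ops) {
    assert(ops.mount && "a filesystem must provide a mount callback");
    return fs_registry().emplace(name, std::move(ops)).second;
}

// --- code that would live inside the (hypothetical) ext2 module ---

// Placeholder mount: only validates its arguments.
static int ext2_mount_stub(const std::string& dev, const std::string& path) {
    return (dev.empty() || path.empty()) ? -1 : 0;
}

// Init hook. In the scheme described above it would run when the shared
// library is loaded; in this sketch the caller invokes it explicitly.
extern "C" void ext2_module_init() {
    register_fs("ext2", fs_ops{ext2_mount_stub});
}

// --- code that would live in the kernel's mount path ---

// Returns -1 when the type is unknown. A real implementation would first
// try to load the matching library and then retry the lookup.
int mount_fs(const std::string& type, const std::string& dev,
             const std::string& path) {
    auto it = fs_registry().find(type);
    if (it == fs_registry().end()) {
        return -1;
    }
    return it->second.mount(dev, path);
}
```

Before `ext2_module_init()` runs, `mount_fs("ext2", ...)` fails because nothing is registered; afterwards it dispatches to the module's callback. Keeping the registry as the only coupling point is what lets the driver stay outside the kernel image.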
Now with nfs it is easier to achieve because we have a thin adapter layer under modules/nfs which delegates to https://github.com/sahlberg/libnfs, which is built separately as libnfs.so.4.0.0. It would be ideal to do a similar thing with ext2, but can you build https://github.com/freebsd/freebsd-src/tree/main/sys/fs/ext2fs as a library, or at least not have to copy it under the OSv tree and then be stuck with that version of the code forever? My concern is that if we fork the code or copy it, it will be a burden to maintain, for example when bringing in bug fixes from upstream. Is that possible to achieve?
One of the problems you'll encounter will be the cache. I have to admit I don't remember everything we did there (Waldek might have a fresher memory, as he did rofs more recently), but because ZFS has such an elaborate caching mechanism, and OSv used ZFS, we avoided having yet another page-cache layer. That means that if ext2 doesn't come with its own page cache (because freebsd assumes a different layer handles the caching) your ext2 will not do any caching, which isn't great. Waldek's rofs dealt a bit with caching, so maybe you can copy it or be inspired by it, or also copy some caching code from freebsd.
ROFS comes with its own simple cache layer (https://github.com/cloudius-systems/osv/blob/master/fs/rofs/rofs_cache.cc and the commit to integrate it with pagecache - 54b3071) and here is the commit that finally integrated it into page cache - 4c0bdbc. Now, the changes I made are enough for read-only functionality. If we want full read-write ext2 support we would also need to figure out the "write" part.
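As a rough illustration of what the "write" part could add on top of a read-only cache, here is a small C++ sketch of an LRU block cache that reads through to the device on a miss and writes through on every update. It is not the rofs cache and does not use any OSv or FreeBSD interfaces. The `block_cache` class and its `reader`/`writer` callbacks are hypothetical, and write-through is only one possible policy (write-back with explicit flushing is another).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using block = std::vector<std::uint8_t>;

// Hypothetical LRU block cache: read-through on a miss, write-through on
// every write. The device is abstracted behind two callbacks.
class block_cache {
public:
    using reader = std::function<block(std::uint64_t)>;
    using writer = std::function<void(std::uint64_t, const block&)>;

    block_cache(std::size_t capacity, reader r, writer w)
        : _capacity(capacity), _read(std::move(r)), _write(std::move(w)) {
        assert(_capacity > 0);
    }

    // Returns a copy of the block, fetching it from the device on a miss.
    block read(std::uint64_t blkno) {
        auto it = _index.find(blkno);
        if (it != _index.end()) {
            // Hit: mark the entry as most recently used.
            _lru.splice(_lru.begin(), _lru, it->second);
            return it->second->second;
        }
        insert(blkno, _read(blkno));
        return _lru.front().second;
    }

    // Writes to the device first, then updates or creates the cached copy.
    void write(std::uint64_t blkno, const block& data) {
        _write(blkno, data);
        auto it = _index.find(blkno);
        if (it != _index.end()) {
            it->second->second = data;
            _lru.splice(_lru.begin(), _lru, it->second);
        } else {
            insert(blkno, data);
        }
    }

private:
    using entry = std::pair<std::uint64_t, block>;

    // Evicts the least recently used entry when the cache is full.
    void insert(std::uint64_t blkno, block data) {
        if (_lru.size() >= _capacity) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
        _lru.emplace_front(blkno, std::move(data));
        _index[blkno] = _lru.begin();
    }

    std::size_t _capacity;
    reader _read;
    writer _write;
    std::list<entry> _lru;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<entry>::iterator> _index;
};
```

Write-through keeps the device and the cache consistent at the cost of one device write per update, which makes it a simple starting point before considering dirty-block tracking.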
Now, having made all these comments about ext2, I think we should consider virtiofs as well. We already have a read-only implementation of virtiofs in OSv thanks to Fotis Xenakis (see this wiki page - https://github.com/cloudius-systems/osv/wiki/virtio-fs - it has many good references as well). Given that, we could consider adding write support to it and then delegate to the host to provide whatever sophisticated filesystem it comes with. That is the beauty of virtiofs. But the downside is that virtiofs is only supported by some VMMs like QEMU and Intel's Cloud Hypervisor. So I still think it would be nice to have simple, pluggable, and reasonably fast read-write filesystem support in the form of ext2.
-greg