-
Notifications
You must be signed in to change notification settings - Fork 3.4k
WasmFS: Add NODERAWFS support #19400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
// Special backends that want to install themselves as the root use this hook. | ||
// Otherwise, we use the default backends. | ||
__attribute__((weak)) extern backend_t wasmfs_create_root_dir(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's declare this in the wasmfs.h hooks section to make it available to users who want to e.g. use OPFS as their root directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be public? I'm not sure myself. I feel like if a user tries to use it at the same time as a backend there could be a weird failure, whereas in-tree, we control all the uses (so we can error on a flag that conflicts with NODERAWFS for example). But maybe I'm overly worried?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to make available to users. Backends in general should not be injecting themselves as the root directories, since that's not composable at all (since only one of them can win). I would view this instead as the application choosing what the root directory should be. The NODERAWFS linker flag is one mechanism the application can use to control its root directory, but I don't see any harm in giving it more general control via this hook.
Maybe we should rename noderawfs_backend.cpp to something else to not pretend it's a proper backend. Maybe noderawfs_root.cpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Yeah, it's not really a backend. I think it only was a backend in the old FS for technical reasons. Yes, renaming to _root
sgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also moved the hook to the public API.
#ifdef WASMFS_CASE_INSENSITIVE | ||
auto rootBackend = | ||
createIgnoreCaseBackend([]() { return createMemoryBackend(); }); | ||
return createIgnoreCaseBackend([]() { return createMemoryBackend(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to handle the CASE_INSENSITIVE
option the same way, perhaps as a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was thinking about that. Do we want it to compose, that is, whatever the root backend is, in case-insensitive mode we'd wrap it? But I'm not sure if wrapping node makes sense (the underlying FS has its own case sensitivity).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think it needs to compose with NODERAWFS.
The |
Is NODERAWFS really the same as NODEFS at the root? With just NODEFS at the root I can mount other filesystems at non-root location.. is that true of NODERAWFS? I had previously thought that NODERAWFS did not compose with other filesystems.. am I wrong about that? |
@sbc100 NODERAWFS might not compose in the JS FS due to technical limitations, I'm not sure. But in principle it should be just "NODEFS automatically starting at the root." (That might have implications, though. If we disallow mounting anything other than NodeFS under a NodeFS directory, then it would mean the entire tree must be NodeFS. I think we've considered that for WasmFS but don't remember if it's a current limitation.) |
I think NODERAWFS is maybe a bit more different. IIUC it maps syscall more directly down to node and bypasses the VFS and path lookup system completely. |
(I'm not sure if maintain that difference is important or not though). |
@sbc100 hmm, maybe I'm forgetting something then. I hope at least that in WasmFS it can be simpler. But I could be wrong. This PR at least gets some of the tests passing though. |
@sbc100 are the new |
For auto-generated sigs you would see errors if you added them like this.. so I guess we are not yet auto-generating sigs for these symbols. In other words, adding them manually is still needed/useful here. |
namespace wasmfs { | ||
|
||
backend_t wasmfs_create_root_dir(void) { | ||
return wasmfs_create_node_backend("."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that NODERAWFS exposed /
as the root.. i.e. you can open /etc/passwd
if you want.
This looks like its exposing the current working directory..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible you can't travel up the tree to get to the true outside root, I'm not sure (also not sure if the old FS supported that).
But this does make us behave the same as the old FS in the common case of just accessing files in the current folder, or beneath it. That is useful for configure
tests like freetype needs, for example. And I think it's likely the common case? But we may need more eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC NODERAWFS is explicitly about granting the generated program complete raw access to the entire underlying filesystem (basically making of VFS layer invisible). I think a lot of folks depend on that so we probably want to keep that behaviour around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, perhaps NODERAWFS
can/should be completely independent of wasmfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the code can be independent though? We have a NODERAWFS impl for the old FS and now the beginnings of one for the new FS.
I think this PR might be enough, if we also have support for looking up the directory tree. I checked that now and indeed it works in the JS FS but not in WasmFS. So we would need to set the root to be the actual system root I think. Just changing this PR to do /
instead of .
for the root dir doesn't work though, so that needs more investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with this:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
struct dirent **namelist;
int n;
// Scan the parent dir
n = scandir("..", &namelist, NULL, alphasort);
printf("n: %d\n", n);
if (n < 0)
return 1;
while (n--) {
printf("name: %s\n", namelist[n]->d_name);
free(namelist[n]);
}
free(namelist);
return 0;
}
A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way NODERAWFS works IIUC is to completely replace/override all the old FS code.. basically taking over completely. I think the equivalent for the new version of NODERAWFS
might be to simply to provide an alternative implementation of the musl syscalls and avoid include wasmfs at all.
See
emscripten/src/library_noderawfs.js
Lines 23 to 26 in 6b179e3
var VFS = Object.assign({}, FS); | |
for (var _key in NODERAWFS) { | |
FS[_key] = _wrapNodeError(NODERAWFS[_key]); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.
Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS? I would expect/hope that at least one of them would fail with your currently "mount CWD at root" approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'm not sure I can think of a good way to do that in WasmFS actually. Yeah, maybe a reimplementation at the syscall layer is worth exploring, as you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS?
I haven't gotten to them all yet but all those I've tried have passed so far.
def get_files(self): | ||
return files_in_path( | ||
path='system/lib/wasmfs/backends', | ||
filenames=['noderawfs_root.cpp']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to add another native system library just for this purpose.
Can we find another way (other than weak linking) to express the mount point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have asked you for review probably before landing.
I'm not sure there is a better mechanism. We don't want to bloat code size by having some kind of global dispatch "if X then root backend = .. else if Y then root backend = ..". The only slim (and extensible) way to do it that I know is to have a weak symbol that is overridden like this. I can't think of another way at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative might be to use EM_JS
to check whether the NODERAWFS
setting is set, for example:
https://github.com/kleisauke/wasm-vips/blob/cff9f510dcfba6cc5bb8cb219e1ce140959ff4f8/src/vips-emscripten.cpp#L45-L53
However, in terms of code size, this mechanism is probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, perhaps the get_default_variation()
mechanism would be better?:
Details
--- a/tools/system_libs.py
+++ b/tools/system_libs.py
@@ -1821,6 +1821,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
def __init__(self, **kwargs):
self.ignore_case = kwargs.pop('ignore_case')
+ self.node_rawfs = kwargs.pop('node_rawfs')
super().__init__(**kwargs)
def get_cflags(self):
@@ -1833,6 +1834,8 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
name = super().get_base_name()
if self.ignore_case:
name += '-icase'
+ if self.node_rawfs:
+ name += '-noderawfs'
return name
@classmethod
@@ -1841,7 +1844,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
@classmethod
def get_default_variation(cls, **kwargs):
- return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, **kwargs)
+ return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, node_rawfs=settings.NODERAWFS, **kwargs)
def get_files(self):
backends = files_in_path(
@@ -1852,6 +1855,10 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
'memory_backend.cpp',
'node_backend.cpp',
'opfs_backend.cpp'])
+ if self.node_rawfs:
+ backends += files_in_path(
+ path='system/lib/wasmfs/backends',
+ filenames=['noderawfs_root.cpp'])
return backends + files_in_path(
path='system/lib/wasmfs',
filenames=['file.cpp',
@@ -1867,22 +1874,6 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
return settings.WASMFS
-class libwasmfs_noderawfs(Library):
- name = 'libwasmfs_noderawfs'
-
- cflags = ['-fno-exceptions', '-std=c++17']
-
- includes = ['system/lib/wasmfs']
-
- def get_files(self):
- return files_in_path(
- path='system/lib/wasmfs/backends',
- filenames=['noderawfs_root.cpp'])
-
- def can_use(self):
- return settings.WASMFS and settings.NODERAWFS
-
-
class libhtml5(Library):
name = 'libhtml5'
(untested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative might be to use EM_JS [..] However, in terms of code size, this mechanism is probably better.
Yeah, exactly - EM_JS can work, and linking is simpler, but it would be more code that can't be removed by the linker, since it won't be able to tell what the EM_JS call will return. Using weak linking, that information is utilized at link time basically.
I don't have a strong opinion on using get_default_variation()
, but the current code seems easier to understand to me at least...
NODERAWFS means NODEFS + the root is NODEFS, so in practice the
program just uses Node to access files. All this PR does is create the
root using the Node backend, which accomplishes that.
To do that, add a hook that backends can use to override creation of
the root directory. Then NODERAWFS support basically is just to link
in a tiny library that overrides that hook.
This adds a new test,
test_noderawfs_wasmfs
which verifies we writefiles out to the normal filesystem. Also, enable
test_freetype
, whichnow passes; previously it failed during a
configure
step (specifically,configure
tried to write the size of an integer to a file... before thisPR, the file was empty and it defined integer size as the empty string).
So
test_freetype
both verifies NODERAWFS writes, and also it usesfiles internally (to read font files etc.) which seems useful to enable.
Also, this is tested in
test_fs_writeFile_rawfs
. That test doesFS.writeFile
withNODERAWFS
enabled. It passed before, becauseenabling
NODERAWFS
did nothing... for WasmFS the test really justworked on the MemoryFile backend, since it doesn't create a Node
backend. With this PR, the test properly uses Node files, and passes
(thanks to #19397 and #19396).
Note:
other.test_unistd_fstatfs_wasmfs
was removed from CI becauseit was unnecessary - all
other
tests run anyhow. We only need to addselect
wasmfs
tests because that mode does not run in full already.