Skip to content

WasmFS: Add NODERAWFS support #19400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
May 22, 2023
Merged

WasmFS: Add NODERAWFS support #19400

merged 14 commits into from
May 22, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented May 19, 2023

NODERAWFS means NODEFS + the root is NODEFS, so in practice the
program just uses Node to access files. All this PR does is create the
root using the Node backend, which accomplishes that.

To do that, add a hook that backends can use to override creation of
the root directory. Then NODERAWFS support basically is just to link
in a tiny library that overrides that hook.

This adds a new test, test_noderawfs_wasmfs which verifies we write
files out to the normal filesystem. Also, enable test_freetype, which
now passes; previously it failed during a configure step (specifically,
configure tried to write the size of an integer to a file... before this
PR, the file was empty and it defined integer size as the empty string).
So test_freetype both verifies NODERAWFS writes, and also it uses
files internally (to read font files etc.) which seems useful to enable.

Also, this is tested in test_fs_writeFile_rawfs. That test does
FS.writeFile with NODERAWFS enabled. It passed before, because
enabling NODERAWFS did nothing... for WasmFS the test really just
worked on the MemoryFile backend, since it doesn't create a Node
backend. With this PR, the test properly uses Node files, and passes
(thanks to #19397 and #19396).

Note: other.test_unistd_fstatfs_wasmfs was removed from CI because
it was unnecessary - all other tests run anyhow. We only need to add
select wasmfs tests because that mode does not run in full already.

@kripken kripken requested a review from tlively May 19, 2023 18:30
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


// Special backends that want to install themselves as the root use this hook.
// Otherwise, we use the default backends.
__attribute__((weak)) extern backend_t wasmfs_create_root_dir(void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's declare this in the wasmfs.h hooks section to make it available to users who want to e.g. use OPFS as their root directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be public? I'm not sure myself. I feel like if a user tries to use it at the same time as a backend there could be a weird failure, whereas in-tree, we control all the uses (so we can error on a flag that conflicts with NODERAWFS for example). But maybe I'm overly worried?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to make available to users. Backends in general should not be injecting themselves as the root directories, since that's not composable at all (since only one of them can win). I would view this instead as the application choosing what the root directory should be. The NODERAWFS linker flag is one mechanism the application can use to control its root directory, but I don't see any harm in giving it more general control via this hook.

Maybe we should rename noderawfs_backend.cpp to something else to not pretend it's a proper backend. Maybe noderawfs_root.cpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Yeah, it's not really a backend. I think it only was a backend in the old FS for technical reasons. Yes, renaming to _root sgtm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also moved the hook to the public API.

#ifdef WASMFS_CASE_INSENSITIVE
auto rootBackend =
createIgnoreCaseBackend([]() { return createMemoryBackend(); });
return createIgnoreCaseBackend([]() { return createMemoryBackend(); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to handle the CASE_INSENSITIVE option the same way, perhaps as a follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was thinking about that. Do we want it to compose, that is, whatever the root backend is, in case-insensitive mode we'd wrap it? But I'm not sure if wrapping node makes sense (the underlying FS has its own case sensitivity).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it needs to compose with NODERAWFS.

@kripken
Copy link
Member Author

kripken commented May 19, 2023

The wasm64l version of that writeFile test failed, so I added the p-containing signatures to fix it. You don't need to read those too carefully, as there is a better automatic way to do it, which I'll ask @sbc100 about later.

@sbc100
Copy link
Collaborator

sbc100 commented May 22, 2023

Is NODERAWFS really the same as NODEFS at the root?

With just NODEFS at the root I can mount other filesystems at non-root location.. is that true of NODERAWFS? I had previously thought that NODERAWFS did not compose with other filesystems.. am I wrong about that?

@kripken
Copy link
Member Author

kripken commented May 22, 2023

@sbc100 NODERAWFS might not compose in the JS FS due to technical limitations, I'm not sure. But in principle it should be just "NODEFS automatically starting at the root."

(That might have implications, though. If we disallow mounting anything other than NodeFS under a NodeFS directory, then it would mean the entire tree must be NodeFS. I think we've considered that for WasmFS but don't remember if it's a current limitation.)

@sbc100
Copy link
Collaborator

sbc100 commented May 22, 2023

@sbc100 NODERAWFS might not compose in the JS FS due to technical limitations, I'm not sure. But in principle it should be just "NODEFS automatically starting at the root."

(That might have implications, though. If we disallow mounting anything other than NodeFS under a NodeFS directory, then it would mean the entire tree must be NodeFS. I think we've considered that for WasmFS but don't remember if it's a current limitation.)

I think NODERAWFS is maybe a bit more different. IIUC it maps syscall more directly down to node and bypasses the VFS and path lookup system completely.

@sbc100
Copy link
Collaborator

sbc100 commented May 22, 2023

@sbc100 NODERAWFS might not compose in the JS FS due to technical limitations, I'm not sure. But in principle it should be just "NODEFS automatically starting at the root."
(That might have implications, though. If we disallow mounting anything other than NodeFS under a NodeFS directory, then it would mean the entire tree must be NodeFS. I think we've considered that for WasmFS but don't remember if it's a current limitation.)

I think NODERAWFS is maybe a bit more different. IIUC it maps syscall more directly down to node and bypasses the VFS and path lookup system completely.

(I'm not sure if maintain that difference is important or not though).

@kripken
Copy link
Member Author

kripken commented May 22, 2023

@sbc100 hmm, maybe I'm forgetting something then. I hope at least that in WasmFS it can be simpler. But I could be wrong. This PR at least gets some of the tests passing though.

@kripken
Copy link
Member Author

kripken commented May 22, 2023

@sbc100 are the new __sigs here ok, or should I use the new mechanism for that? (if so, how?)

@sbc100
Copy link
Collaborator

sbc100 commented May 22, 2023

@sbc100 are the new __sigs here ok, or should I use the new mechanism for that? (if so, how?)

For auto-generated sigs you would see errors if you added them like this.. so I guess we are not yet auto-generating sigs for these symbols.

In other words, adding them manually is still needed/useful here.

@kripken kripken merged commit 1d5b070 into main May 22, 2023
@kripken kripken deleted the wasmfs.noderawfs branch May 22, 2023 18:14
namespace wasmfs {

backend_t wasmfs_create_root_dir(void) {
return wasmfs_create_node_backend(".");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that NODERAWFS exposed / as the root.. i.e. you can open /etc/passwd if you want.

This looks like its exposing the current working directory..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible you can't travel up the tree to get to the true outside root, I'm not sure (also not sure if the old FS supported that).

But this does make us behave the same as the old FS in the common case of just accessing files in the current folder, or beneath it. That is useful for configure tests like freetype needs, for example. And I think it's likely the common case? But we may need more eventually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC NODERAWFS is explicitly about granting the generated program complete raw access to the entire underlying filesystem (basically making of VFS layer invisible). I think a lot of folks depend on that so we probably want to keep that behaviour around.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, perhaps NODERAWFS can/should be completely independent of wasmfs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the code can be independent though? We have a NODERAWFS impl for the old FS and now the beginnings of one for the new FS.

I think this PR might be enough, if we also have support for looking up the directory tree. I checked that now and indeed it works in the JS FS but not in WasmFS. So we would need to set the root to be the actual system root I think. Just changing this PR to do / instead of . for the root dir doesn't work though, so that needs more investigation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with this:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  struct dirent **namelist;
  int n;

  // Scan the parent dir
  n = scandir("..", &namelist, NULL, alphasort);
  printf("n: %d\n", n);
  if (n < 0)
    return 1;

  while (n--) {
    printf("name: %s\n", namelist[n]->d_name);
    free(namelist[n]);
  }
  free(namelist);

  return 0;
}

A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way NODERAWFS works IIUC is to completely replace/override all the old FS code.. basically taking over completely. I think the equivalent for the new version of NODERAWFS might be to simply to provide an alternative implementation of the musl syscalls and avoid include wasmfs at all.

See

var VFS = Object.assign({}, FS);
for (var _key in NODERAWFS) {
FS[_key] = _wrapNodeError(NODERAWFS[_key]);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.

Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS? I would expect/hope that at least one of them would fail with your currently "mount CWD at root" approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm not sure I can think of a good way to do that in WasmFS actually. Yeah, maybe a reimplementation at the syscall layer is worth exploring, as you said.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS?

I haven't gotten to them all yet but all those I've tried have passed so far.

def get_files(self):
return files_in_path(
path='system/lib/wasmfs/backends',
filenames=['noderawfs_root.cpp'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to add another native system library just for this purpose.

Can we find another way (other than weak linking) to express the mount point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have asked you for review probably before landing.

I'm not sure there is a better mechanism. We don't want to bloat code size by having some kind of global dispatch "if X then root backend = .. else if Y then root backend = ..". The only slim (and extensible) way to do it that I know is to have a weak symbol that is overridden like this. I can't think of another way at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative might be to use EM_JS to check whether the NODERAWFS setting is set, for example:
https://github.com/kleisauke/wasm-vips/blob/cff9f510dcfba6cc5bb8cb219e1ce140959ff4f8/src/vips-emscripten.cpp#L45-L53

However, in terms of code size, this mechanism is probably better.

Copy link
Collaborator

@kleisauke kleisauke May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, perhaps the get_default_variation() mechanism would be better?:

Details
--- a/tools/system_libs.py
+++ b/tools/system_libs.py
@@ -1821,6 +1821,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
 
   def __init__(self, **kwargs):
     self.ignore_case = kwargs.pop('ignore_case')
+    self.node_rawfs = kwargs.pop('node_rawfs')
     super().__init__(**kwargs)
 
   def get_cflags(self):
@@ -1833,6 +1834,8 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
     name = super().get_base_name()
     if self.ignore_case:
       name += '-icase'
+    if self.node_rawfs:
+      name += '-noderawfs'
     return name
 
   @classmethod
@@ -1841,7 +1844,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
 
   @classmethod
   def get_default_variation(cls, **kwargs):
-    return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, **kwargs)
+    return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, node_rawfs=settings.NODERAWFS, **kwargs)
 
   def get_files(self):
     backends = files_in_path(
@@ -1852,6 +1855,10 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
                    'memory_backend.cpp',
                    'node_backend.cpp',
                    'opfs_backend.cpp'])
+    if self.node_rawfs:
+        backends += files_in_path(
+            path='system/lib/wasmfs/backends',
+            filenames=['noderawfs_root.cpp'])
     return backends + files_in_path(
         path='system/lib/wasmfs',
         filenames=['file.cpp',
@@ -1867,22 +1874,6 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
     return settings.WASMFS
 
 
-class libwasmfs_noderawfs(Library):
-  name = 'libwasmfs_noderawfs'
-
-  cflags = ['-fno-exceptions', '-std=c++17']
-
-  includes = ['system/lib/wasmfs']
-
-  def get_files(self):
-    return files_in_path(
-        path='system/lib/wasmfs/backends',
-        filenames=['noderawfs_root.cpp'])
-
-  def can_use(self):
-    return settings.WASMFS and settings.NODERAWFS
-
-
 class libhtml5(Library):
   name = 'libhtml5'
 

(untested)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative might be to use EM_JS [..] However, in terms of code size, this mechanism is probably better.

Yeah, exactly - EM_JS can work, and linking is simpler, but it would be more code that can't be removed by the linker, since it won't be able to tell what the EM_JS call will return. Using weak linking, that information is utilized at link time basically.

I don't have a strong opinion on using get_default_variation(), but the current code seems easier to understand to me at least...

@kripken kripken mentioned this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants