Skip to content

WasmFS: Add NODERAWFS support #19400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
May 22, 2023
Merged
4 changes: 2 additions & 2 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -500,7 +500,6 @@ jobs:
core2ss.test_pthread_thread_local_storage
core2s.test_dylink_syslibs_missing_assertions
wasm2js3.test_memorygrowth_2
other.test_unistd_fstatfs_wasmfs
wasmfs.test_hello_world
wasmfs.test_hello_world_standalone
wasmfs.test_unistd_links*
Expand All @@ -524,7 +523,8 @@ jobs:
wasmfs.test_stat
wasmfs.test_fstatat
wasmfs.test_unistd_links_memfs
wasmfs.test_fcntl_open"
wasmfs.test_fcntl_open
wasmfs.test_freetype"
test-wasm2js1:
executor: bionic
steps:
Expand Down
2 changes: 2 additions & 0 deletions emcc.py
Original file line number Diff line number Diff line change
Expand Up @@ -2293,6 +2293,8 @@ def phase_linker_setup(options, state, newargs):

if settings.WASMFS:
state.forced_stdlibs.append('libwasmfs')
if settings.NODERAWFS:
state.forced_stdlibs.append('libwasmfs_noderawfs')
settings.FILESYSTEM = 1
settings.SYSCALLS_REQUIRE_FILESYSTEM = 0
settings.JS_LIBRARIES.append((0, 'library_wasmfs.js'))
Expand Down
11 changes: 11 additions & 0 deletions src/library_wasmfs_node.js
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ mergeInto(LibraryManager.library, {
return wasmfsNodeFixStat(stat);
},

_wasmfs_node_readdir__sig: 'ipp',
_wasmfs_node_readdir__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_readdir: function(path_p, vec) {
let path = UTF8ToString(path_p);
Expand Down Expand Up @@ -80,6 +81,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_get_mode__sig: 'ipp',
_wasmfs_node_get_mode__deps: ['$wasmfsNodeLstat'],
_wasmfs_node_get_mode: function(path_p, mode_p) {
let stat = wasmfsNodeLstat(UTF8ToString(path_p));
Expand All @@ -90,6 +92,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_stat_size__sig: 'ipp',
_wasmfs_node_stat_size__deps: ['$wasmfsNodeLstat'],
_wasmfs_node_stat_size: function(path_p, size_p) {
let stat = wasmfsNodeLstat(UTF8ToString(path_p));
Expand All @@ -100,6 +103,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_fstat_size__sig: 'iip',
_wasmfs_node_fstat_size__deps: ['$wasmfsNodeFstat'],
_wasmfs_node_fstat_size: function(fd, size_p) {
let stat = wasmfsNodeFstat(fd);
Expand All @@ -110,6 +114,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_insert_file__sig: 'ipi',
_wasmfs_node_insert_file__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_insert_file: function(path_p, mode) {
try {
Expand All @@ -121,6 +126,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_insert_directory__sig: 'ipi',
_wasmfs_node_insert_directory__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_insert_directory: function(path_p, mode) {
try {
Expand All @@ -132,6 +138,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_unlink__sig: 'ip',
_wasmfs_node_unlink__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_unlink: function(path_p) {
try {
Expand All @@ -143,6 +150,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_rmdir__sig: 'ip',
_wasmfs_node_rmdir__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_rmdir: function(path_p) {
try {
Expand All @@ -154,6 +162,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_open__sig: 'ipp',
_wasmfs_node_open__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_open: function(path_p, mode_p) {
try {
Expand All @@ -174,6 +183,7 @@ mergeInto(LibraryManager.library, {
}
},

_wasmfs_node_read__sig: 'iipiip',
_wasmfs_node_read__deps: ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_read: function(fd, buf_p, len, pos, nread_p) {
try {
Expand All @@ -188,6 +198,7 @@ mergeInto(LibraryManager.library, {
// implicitly return 0
},

_wasmfs_node_write__sig: 'iipiip',
_wasmfs_node_write__deps : ['$wasmfsNodeConvertNodeCode'],
_wasmfs_node_write : function(fd, buf_p, len, pos, nwritten_p) {
try {
Expand Down
5 changes: 5 additions & 0 deletions system/include/emscripten/wasmfs.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,11 @@ backend_t wasmfs_create_icase_backend(backend_constructor_t create_backend,

// Hooks

// A hook users can do to create the root directory. Overriding this allows the
// user to set a particular backend as the root. If this is not set then the
// default backend is used.
backend_t wasmfs_create_root_dir(void);

// A hook users can do to run code during WasmFS startup. This hook happens
// before file preloading, so user code could create backends and mount them,
// which would then affect in which backend the preloaded files are loaded (the
Expand Down
14 changes: 14 additions & 0 deletions system/lib/wasmfs/backends/noderawfs_root.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
// Copyright 2023 The Emscripten Authors. All rights reserved.
// Emscripten is available under two separate licenses, the MIT license and the
// University of Illinois/NCSA Open Source License. Both these licenses can be
// found in the LICENSE file.

#include "emscripten/wasmfs.h"

namespace wasmfs {

backend_t wasmfs_create_root_dir(void) {
return wasmfs_create_node_backend(".");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that NODERAWFS exposed / as the root.. i.e. you can open /etc/passwd if you want.

This looks like its exposing the current working directory..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible you can't travel up the tree to get to the true outside root, I'm not sure (also not sure if the old FS supported that).

But this does make us behave the same as the old FS in the common case of just accessing files in the current folder, or beneath it. That is useful for configure tests like freetype needs, for example. And I think it's likely the common case? But we may need more eventually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC NODERAWFS is explicitly about granting the generated program complete raw access to the entire underlying filesystem (basically making of VFS layer invisible). I think a lot of folks depend on that so we probably want to keep that behaviour around.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, perhaps NODERAWFS can/should be completely independent of wasmfs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the code can be independent though? We have a NODERAWFS impl for the old FS and now the beginnings of one for the new FS.

I think this PR might be enough, if we also have support for looking up the directory tree. I checked that now and indeed it works in the JS FS but not in WasmFS. So we would need to set the root to be the actual system root I think. Just changing this PR to do / instead of . for the root dir doesn't work though, so that needs more investigation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with this:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  struct dirent **namelist;
  int n;

  // Scan the parent dir
  n = scandir("..", &namelist, NULL, alphasort);
  printf("n: %d\n", n);
  if (n < 0)
    return 1;

  while (n--) {
    printf("name: %s\n", namelist[n]->d_name);
    free(namelist[n]);
  }
  free(namelist);

  return 0;
}

A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way NODERAWFS works IIUC is to completely replace/override all the old FS code.. basically taking over completely. I think the equivalent for the new version of NODERAWFS might be to simply to provide an alternative implementation of the musl syscalls and avoid include wasmfs at all.

See

var VFS = Object.assign({}, FS);
for (var _key in NODERAWFS) {
FS[_key] = _wrapNodeError(NODERAWFS[_key]);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test with that might be useful to add as I don't think we have any more NODERAWFS tests atm.

Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS? I would expect/hope that at least one of them would fail with your currently "mount CWD at root" approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm not sure I can think of a good way to do that in WasmFS actually. Yeah, maybe a reimplementation at the syscall layer is worth exploring, as you said.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at test_core.py and test_other.py I see a fair few NODERAWFS tests.. have you tried all them with WASMFS?

I haven't gotten to them all yet but all those I've tried have passed so far.

}

} // namespace wasmfs
14 changes: 9 additions & 5 deletions system/lib/wasmfs/wasmfs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,14 +70,18 @@ WasmFS::~WasmFS() {
rootDirectory->locked().setParent(nullptr);
}

std::shared_ptr<Directory> WasmFS::initRootDirectory() {

// Special backends that want to install themselves as the root use this hook.
// Otherwise, we use the default backends.
__attribute__((weak)) extern backend_t wasmfs_create_root_dir(void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's declare this in the wasmfs.h hooks section to make it available to users who want to e.g. use OPFS as their root directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be public? I'm not sure myself. I feel like if a user tries to use it at the same time as a backend there could be a weird failure, whereas in-tree, we control all the uses (so we can error on a flag that conflicts with NODERAWFS for example). But maybe I'm overly worried?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to make available to users. Backends in general should not be injecting themselves as the root directories, since that's not composable at all (since only one of them can win). I would view this instead as the application choosing what the root directory should be. The NODERAWFS linker flag is one mechanism the application can use to control its root directory, but I don't see any harm in giving it more general control via this hook.

Maybe we should rename noderawfs_backend.cpp to something else to not pretend it's a proper backend. Maybe noderawfs_root.cpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Yeah, it's not really a backend. I think it only was a backend in the old FS for technical reasons. Yes, renaming to _root sgtm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also moved the hook to the public API.

#ifdef WASMFS_CASE_INSENSITIVE
auto rootBackend =
createIgnoreCaseBackend([]() { return createMemoryBackend(); });
return createIgnoreCaseBackend([]() { return createMemoryBackend(); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to handle the CASE_INSENSITIVE option the same way, perhaps as a follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was thinking about that. Do we want it to compose, that is, whatever the root backend is, in case-insensitive mode we'd wrap it? But I'm not sure if wrapping node makes sense (the underlying FS has its own case sensitivity).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it needs to compose with NODERAWFS.

#else
auto rootBackend = createMemoryBackend();
return createMemoryBackend();
#endif
}

std::shared_ptr<Directory> WasmFS::initRootDirectory() {
auto rootBackend = wasmfs_create_root_dir();
auto rootDirectory =
rootBackend->createDirectory(S_IRUGO | S_IXUGO | S_IWUGO);
auto lockedRoot = rootDirectory->locked();
Expand Down
3 changes: 2 additions & 1 deletion test/test_other.py
Original file line number Diff line number Diff line change
Expand Up @@ -8881,8 +8881,9 @@ def test_toolchain_profiler_stderr(self):
self.assertContained('start block "main"', stderr)
self.assertContained('block "main" took', stderr)

@also_with_wasmfs
def test_noderawfs(self):
self.run_process([EMXX, test_file('fs/test_fopen_write.cpp'), '-sNODERAWFS'])
self.run_process([EMXX, test_file('fs/test_fopen_write.cpp'), '-sNODERAWFS'] + self.get_emcc_args())
self.assertContained("read 11 bytes. Result: Hello data!", self.run_js('a.out.js'))

# NODERAWFS should directly write on OS file system
Expand Down
16 changes: 16 additions & 0 deletions tools/system_libs.py
Original file line number Diff line number Diff line change
Expand Up @@ -1867,6 +1867,22 @@ def can_use(self):
return settings.WASMFS


class libwasmfs_noderawfs(Library):
name = 'libwasmfs_noderawfs'

cflags = ['-fno-exceptions', '-std=c++17']

includes = ['system/lib/wasmfs']

def get_files(self):
return files_in_path(
path='system/lib/wasmfs/backends',
filenames=['noderawfs_root.cpp'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to add another native system library just for this purpose.

Can we find another way (other than weak linking) to express the mount point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have asked you for review probably before landing.

I'm not sure there is a better mechanism. We don't want to bloat code size by having some kind of global dispatch "if X then root backend = .. else if Y then root backend = ..". The only slim (and extensible) way to do it that I know is to have a weak symbol that is overridden like this. I can't think of another way at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative might be to use EM_JS to check whether the NODERAWFS setting is set, for example:
https://github.com/kleisauke/wasm-vips/blob/cff9f510dcfba6cc5bb8cb219e1ce140959ff4f8/src/vips-emscripten.cpp#L45-L53

However, in terms of code size, this mechanism is probably better.

Copy link
Collaborator

@kleisauke kleisauke May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, perhaps the get_default_variation() mechanism would be better?:

Details
--- a/tools/system_libs.py
+++ b/tools/system_libs.py
@@ -1821,6 +1821,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
 
   def __init__(self, **kwargs):
     self.ignore_case = kwargs.pop('ignore_case')
+    self.node_rawfs = kwargs.pop('node_rawfs')
     super().__init__(**kwargs)
 
   def get_cflags(self):
@@ -1833,6 +1834,8 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
     name = super().get_base_name()
     if self.ignore_case:
       name += '-icase'
+    if self.node_rawfs:
+      name += '-noderawfs'
     return name
 
   @classmethod
@@ -1841,7 +1844,7 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
 
   @classmethod
   def get_default_variation(cls, **kwargs):
-    return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, **kwargs)
+    return super().get_default_variation(ignore_case=settings.CASE_INSENSITIVE_FS, node_rawfs=settings.NODERAWFS, **kwargs)
 
   def get_files(self):
     backends = files_in_path(
@@ -1852,6 +1855,10 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
                    'memory_backend.cpp',
                    'node_backend.cpp',
                    'opfs_backend.cpp'])
+    if self.node_rawfs:
+        backends += files_in_path(
+            path='system/lib/wasmfs/backends',
+            filenames=['noderawfs_root.cpp'])
     return backends + files_in_path(
         path='system/lib/wasmfs',
         filenames=['file.cpp',
@@ -1867,22 +1874,6 @@ class libwasmfs(DebugLibrary, AsanInstrumentedLibrary, MTLibrary):
     return settings.WASMFS
 
 
-class libwasmfs_noderawfs(Library):
-  name = 'libwasmfs_noderawfs'
-
-  cflags = ['-fno-exceptions', '-std=c++17']
-
-  includes = ['system/lib/wasmfs']
-
-  def get_files(self):
-    return files_in_path(
-        path='system/lib/wasmfs/backends',
-        filenames=['noderawfs_root.cpp'])
-
-  def can_use(self):
-    return settings.WASMFS and settings.NODERAWFS
-
-
 class libhtml5(Library):
   name = 'libhtml5'
 

(untested)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative might be to use EM_JS [..] However, in terms of code size, this mechanism is probably better.

Yeah, exactly - EM_JS can work, and linking is simpler, but it would be more code that can't be removed by the linker, since it won't be able to tell what the EM_JS call will return. Using weak linking, that information is utilized at link time basically.

I don't have a strong opinion on using get_default_variation(), but the current code seems easier to understand to me at least...


def can_use(self):
return settings.WASMFS and settings.NODERAWFS


class libhtml5(Library):
name = 'libhtml5'

Expand Down