Doc request: How does emscripten MEMFS persist link information in IDBFS? #15789

Open
hcldan opened this issue Dec 15, 2021 · 12 comments

Comments

@hcldan

hcldan commented Dec 15, 2021


I asked this question on SO and was suggested to ask for an update to the docs.
https://stackoverflow.com/questions/70368027/how-does-emscripten-memfs-persist-link-information-in-idbfs

@sbc100
Collaborator

sbc100 commented Dec 17, 2021

I believe you need to use IDBFS (not MEMFS) if you want to persist the data. According to the docs here, it looks like the way to do that as of today is to call FS.syncfs(): https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-idbfs

@hcldan
Author

hcldan commented Sep 7, 2022

When you use IDBFS, all data is held in memory... maybe not in MEMFS exactly, though it would seem wasteful not to reuse it.
And you are correct, all the data is synced with FS.syncfs() when you want to persist it.

My question was how it persists the link information when it does that operation. Like... looking through the IDBFS database, I wasn't easily able to find what was a link and what wasn't.

@DavidGOrtega

This question on SO is related.
I have the same issue. Even though I use syncfs, my files are not present when I reload the browser. I can't make the data persistent.
@hcldan @sbc100

@sbc100
Collaborator

sbc100 commented Apr 7, 2023

@DavidGOrtega are you sure you are using IDBFS? Can you share the code you are using to mount the filesystem and then persist it with syncfs?

@DavidGOrtega

DavidGOrtega commented Apr 8, 2023

Hey @sbc100, yes, I'm using IDBFS.
I finally got it working (sometimes), but I still have some hiccups.

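Module.onRuntimeInitialized = () => {
      const IDBFS = FS.filesystems.IDBFS;
      const path = '/IDBFS';
      FS.mkdir(path);
      FS.mount(IDBFS, {}, path);
      FS.chdir(path);
      // populate = true: load the persisted IndexedDB contents into memory
      FS.syncfs(true, (err) => {
        // err is null on success; the files are only safe to read from here on
        console.log(err);
      });
};
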
  • I need to wait for FS.syncfs to finish. In other words, I must use the FS.syncfs callback.
  • This operation may take a while for large files.
  • Not sure why, but sometimes the contents of the IndexedDB are not shown, even though they seem to be present since they're loaded!!
  • For some reason (not sure if it's related to the size), it sometimes simply won't work.

image

image

This is an example of a filesystem that already has content but displays nothing in the inspector 🤷‍♂️

I would like to initialize the filesystem within my C++ code and wait there for it to be ready, but I'm not sure how to accomplish that.

EM_ASM({
      const IDBFS = FS.filesystems.IDBFS;
      const path = $0;
      FS.mkdir(path);
      FS.mount(IDBFS, {}, path);
      FS.chdir(path);
      FS.syncfs(true, (err) => {
        // now we can continue with the C++ class constructor or factory
      });
}, path);

@hcldan
Author

hcldan commented Apr 10, 2023

> Not sure why, but sometimes the contents of the IndexedDB are not shown, even though they seem to be present since they're loaded!!

This is what I was puzzled about.

@jhirschberg70

@DavidGOrtega and @hcldan, do you see this same behavior of the IndexedDB files not being listed, despite being present, in Chrome or other browsers?

I've seen the behavior you describe with my own code if I use Firefox with URL GET parameters. Inspector doesn't show the IndexedDB entries when I specify a URL with GET parameters. If I leave off the GET parameters, the entries are shown. Chrome doesn't behave this way.

So far, I've seen no issues related to file size, though the files I've used haven't been particularly large. The single largest file I've worked with was about 35 megs.

As for using IDBFS, I'll share the methodology that's worked for me. When I start my program, I want to make sure that files are immediately available for use, so I've created a main.js where I call FS.syncfs() from JavaScript and wait until the sync is complete before calling main() in C++. As @DavidGOrtega mentions, FS.syncfs() is asynchronous, so if you want to make sure the sync is complete, you have to wait some unknown amount of time. One option for this is to use the callback parameter for FS.syncfs(), as @DavidGOrtega suggests. What I chose to do instead was to wrap my call to FS.syncfs() in a Promise. Then, I can await the resolution of the Promise before calling main(). I do a similar thing when my code exits to initiate a final FS.syncfs() to store my data back to IndexedDB. I've made a FILE_IO module with these wrapper functions. Here's that code:

const FILE_IO = (() => {
  "use strict";

  function closeFS() {
    return new Promise((resolve, reject) => {
      FS.syncfs(false, (err) => {
        if (err) {
          reject(err);
        }
        else {
          resolve(true);
        }
      });
    });
  }

  function initFS(path) {
    FS.mkdir(path);
    FS.mount(IDBFS, {}, path);

    return new Promise((resolve, reject) => {
      FS.syncfs(true, (err) => {
        if (err) {
          reject(err);
        }
        else {
          resolve(true);
        }
      });
    });
  }

  return {
    closeFS: closeFS,
    initFS: initFS
  }
})();

Here's my main.js:

Module["onExit"] = async () => {
  await FILE_IO.closeFS().catch(err => {
      alert("Error closing files: " + err);
    });
  console.log("Exiting ...");
}

Module["onRuntimeInitialized"] = async () => {
  const root = "game/";
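  // split() always returns an array, so "|| []" never applies; drop empty strings instead
  const args = window.location.search.substring(1).trim().split("%20").filter(Boolean);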

  await FILE_IO.initFS(root).catch(err => {
      alert("Error initializing files: " + err);
    });
  IO.initEventHandlers();
  callMain(args);
}
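
The query-string handling in main.js can be isolated into a small helper. This is only a sketch: `parseQueryArgs` is a hypothetical name, not part of Emscripten, and it assumes the arguments are separated by encoded spaces as in the code above. It exists because `"".split("%20")` yields `[""]`, so a plain split passes one empty argument to main() when the URL has no query string.

```javascript
// Hypothetical helper: turn a location.search string such as "?foo%20bar"
// into an argument list ["foo", "bar"] suitable for callMain().
function parseQueryArgs(search) {
  // Strip the leading "?" if present.
  const raw = search.startsWith("?") ? search.slice(1) : search;
  // Decode percent-escapes (throws URIError on malformed input),
  // split on whitespace, and drop empty strings.
  return decodeURIComponent(raw)
    .split(/\s+/)
    .filter((arg) => arg.length > 0);
}
```

With this, `callMain(parseQueryArgs(window.location.search))` would receive an empty array when no GET parameters are given.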

@DavidGOrtega mentions wanting to initiate the FS.syncfs() from the C/C++ code. You should be able to do that by using EM_ASYNC_JS. While I haven't used EM_ASYNC_JS with the file system, I have used it with another piece of code that is a callback wrapped in a Promise similar to the wrapper functions in FILE_IO, and it worked perfectly.

Lastly, if you have any interest in allowing a user to upload local files into IndexedDB, I recently posted something about that in #18306.

If you have any questions, I'll try to help as best I can.

Jeff

@DavidGOrtega

@jhirschberg70 thanks a lot for the reply 🙏
I need to control the sync in the C++ constructor because it depends on the filesystem. Having to do it in JS before using it is bad for the UX, to my mind. I don't want my users to suffer 😋

> The single largest file I've worked with was about 35 megs.

Mine can be many, many GB.

> You should be able to do that by using EM_ASYNC_JS.

Awesome suggestion! I did not know about it. Could you please point me to the docs? I'm struggling a bit with the emscripten docs. E.g., I don't yet know how to properly create a Worker to run in another thread like I would do with Napi.
I'm confused by Asyncify, pthreads, and macros.

@hcldan
Author

hcldan commented Apr 12, 2023

My main issue was that I could see the files listed but I had no idea how to inspect the file contents (or if indeed there were any file contents) from the chrome tool inspector.

Which is why I was asking how it was stored... maybe it's just a dev tool issue? I haven't looked in a while since support for OPFS high performance handles landed.

@jhirschberg70

> @jhirschberg70 thanks a lot for the reply 🙏 I need to control the sync in the C++ constructor because it depends on the filesystem. Having to do it in JS before using it is bad for the UX, to my mind. I don't want my users to suffer 😋

You're welcome, @DavidGOrtega. Are you saying that your constructor uses the filesystem? If so, then why does it matter whether you sync in JavaScript or C++? Either way you're going to have to wait for the sync to complete. Am I misunderstanding?

> Mine can be many, many GB.

If you have to sync files that are multiple GBs, then I can understand how that could be bad for the user, but again, I don't see how where you initiate the sync (JS or C++) affects things. The problem is that you have to sync and that sync is going to have to copy many GBs to memory.

> Awesome suggestion! I did not know about it. Could you please point me to the docs? I'm struggling a bit with the emscripten docs. E.g., I don't yet know how to properly create a Worker to run in another thread like I would do with Napi. I'm confused by Asyncify, pthreads, and macros.

Here's a link to the section of the docs that talks about EM_ASYNC_JS: https://emscripten.org/docs/porting/asyncify.html?highlight=em_async_js#making-async-web-apis-behave-as-if-they-were-synchronous. There's not a lot there.

If you don't want to use a Promise wrapped around FS.syncfs(), you could have your callback set some kind of flag to indicate that the sync is finished and just poll that from the C++ side using Asyncify and a busy loop using emscripten_sleep(). Not elegant, but it would work.
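
The JavaScript half of that flag-polling idea could look roughly like the sketch below. The names `syncState`, `startSync`, and `isSyncDone` are hypothetical, and the Emscripten FS object is passed in as `fs` only so the snippet can be exercised outside a real build. The C++ half (a loop that calls `isSyncDone()` and then `emscripten_sleep()` until it returns 1) is not shown.

```javascript
// Hypothetical shared state that the C++ side would poll.
const syncState = { done: false, error: null };

// Kick off the sync. The callback does nothing except record the outcome.
// `fs` stands in for Emscripten's FS object; `populate` has the same
// meaning as the first argument of FS.syncfs().
function startSync(fs, populate) {
  syncState.done = false;
  syncState.error = null;
  fs.syncfs(populate, (err) => {
    syncState.error = err || null;
    syncState.done = true;
  });
}

// Cheap check to call between emscripten_sleep() calls.
// Returns 1 once the sync callback has fired, 0 while it is still pending.
function isSyncDone() {
  return syncState.done ? 1 : 0;
}
```

Returning an integer rather than a boolean keeps the value easy to consume from C++. Whether the sync actually succeeded still has to be read from `syncState.error`.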

Apparently, there's a more elegant solution for waiting on JS from C++ without using a Promise. It's Asyncify.handleSleep(). https://emscripten.org/docs/porting/asyncify.html?highlight=emscripten_sleep#ways-to-use-async-apis-in-older-engines
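
A minimal sketch of how `Asyncify.handleSleep()` might be combined with `FS.syncfs()` follows. The function name `syncBlocking` is hypothetical, and `asyncify` and `fs` are passed in as parameters standing for Emscripten's `Asyncify` and `FS` objects. In a real build this body would sit inside an `EM_JS` function or a JS library function compiled with Asyncify enabled, and would use the globals directly.

```javascript
// Pause the calling C++ code until the sync callback fires, then resume it
// with a status code. `asyncify` and `fs` stand in for Emscripten's
// Asyncify and FS objects.
function syncBlocking(asyncify, fs, populate) {
  // The value passed to wakeUp() becomes the return value of handleSleep().
  return asyncify.handleSleep((wakeUp) => {
    fs.syncfs(populate, (err) => {
      // 0 means the sync succeeded, 1 means it reported an error.
      wakeUp(err ? 1 : 0);
    });
  });
}
```

From the C++ side, such a function would appear to be an ordinary blocking call that returns 0 or 1, so no polling loop is needed.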

There are also the Asynchronous File System API and Asynchronous IndexedDB API, which might be useful for you. I have no experience with these. https://emscripten.org/docs/api_reference/emscripten.h.html#asynchronous-file-system-api https://emscripten.org/docs/api_reference/emscripten.h.html?highlight=idb_store#asynchronous-indexeddb-api

I haven't done anything with pthreads, so I can't help with that. My work with workers has been limited to some very basic experiments, so I don't think I can give much insight there, either.

Jeff

@jhirschberg70

> My main issue was that I could see the files listed but I had no idea how to inspect the file contents (or if indeed there were any file contents) from the chrome tool inspector.
>
> Which is why I was asking how it was stored... maybe it's just a dev tool issue? I haven't looked in a while since support for OPFS high performance handles landed.

@hcldan, do you mean something other than the Uint8Array shown here:

image

Jeff

@hcldan
Author

hcldan commented Apr 13, 2023

> @hcldan, do you mean something other than the Uint8Array shown here:

Yes. I don't recall ever finding that back then, but this was exactly what I was looking for.
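
On the original question of telling links from regular files, one possible approach is to look at the file-type bits of each stored record's mode. The sketch below assumes that each record is an object shaped like `{ timestamp, mode, contents }`, with `contents` being the Uint8Array mentioned above. That shape is an assumption here, not something this thread confirms, and the thread also does not establish whether IDBFS writes symlink records at all. The helper name `describeEntry` is hypothetical. The `S_IF*` values are the standard POSIX file-type constants.

```javascript
// File-type bits of a POSIX st_mode value.
const S_IFMT = 0o170000;  // mask selecting the file-type bits
const S_IFREG = 0o100000; // regular file
const S_IFDIR = 0o040000; // directory
const S_IFLNK = 0o120000; // symbolic link

// Classify one stored record by its mode bits.
// Assumption: `entry` looks like { timestamp, mode, contents }.
function describeEntry(entry) {
  switch (entry.mode & S_IFMT) {
    case S_IFDIR:
      return "directory";
    case S_IFLNK:
      return "symlink";
    case S_IFREG: {
      const size = entry.contents ? entry.contents.length : 0;
      return `file (${size} bytes)`;
    }
    default:
      return "other";
  }
}
```

Applied to each value read from the database, this would show at a glance which records are regular files, which are directories, and whether any carry the symlink type.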

5 participants