Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3.exec throws "RuntimeError: memory access out of bounds" #156

Closed
getvictor opened this issue Feb 5, 2024 · 11 comments
Closed

sqlite3.exec throws "RuntimeError: memory access out of bounds" #156

getvictor opened this issue Feb 5, 2024 · 11 comments

Comments

@getvictor
Copy link

getvictor commented Feb 5, 2024

We have a Chrome extension using wa-sqlite (latest versions), and some Chromebooks are seeing:

image

The error typically takes ~2 days to show up, where the database is used for ~10 queries per minute. Once this error happens, it continues happening for each subsequent exec call.

I'm not familiar with WASM. Some thoughts/questions:

  1. Could there be a memory leak? If so, how to check?
  2. Could enabling Chrome extensions dev mode be causing this?
  3. Can I work around this by catching this error and re-initializing the DB? Or will the issue persist since it still uses the same WASM memory?
  4. Can this be fixed by compiling with more memory, like TOTAL_STACK=2mb
@rhashimoto
Copy link
Owner

Answers to your questions:

  1. That would be one of my guesses. I'm not sure of the best way to monitor available memory in C.
  2. I don't know but I wouldn't think so.
  3. I don't think that will work because of the reason you state. You might be able to destroy the current WebAssembly module and instantiate a new one. I believe that should work but I've never done that so I don't know the details on the destroying part.
  4. I'm not sure yet. It's possible that messing with some build parameters would make it go away, but I'd be nervous that it just puts off the problem.

Some questions for you:

  1. What build are you using (default or Asyncify)? Are you using a custom build, and if so how is it customized?
  2. What VFS (if not the default)?
  3. You're using a module (virtual table), is that correct? If yes, are you making any API calls from module methods?
  4. Can you list the wa-sqlite wrapped APIs you call, i.e. from this list regularly (after all the initialization)? Is it only exec()?
  5. Does your SQL change over time or are you just issuing the same query over and over?
  6. Does your SQL modify the database or only read?
  7. Aren't Chrome extensions basically service workers now? How do you keep an extension context alive for two days?
  8. If you're using any SQL PRAGMA statements, can you list them?

@rhashimoto
Copy link
Owner

rhashimoto commented Feb 6, 2024

The error typically takes ~2 days to show up, where the database is used for ~10 queries per minute. Once this error happens, it continues happening for each subsequent exec call.

That's roughly 30K queries, yes? Is it possible to log all the SQL you submit, including any setup, so we can just replay it and reproduce the error that way? Or even better, can you write a succinct generator that will initialize the schema and database and produce an arbitrary number of simulated queries?

@getvictor
Copy link
Author

I will put together a detailed answer to your questions tomorrow. A dedicated Chromebook is always on and only running my extension.

I rebuilt wa-sqlite with TOTAL_STACK=2mb and so far it hasn't hit this issue.

@getvictor
Copy link
Author

getvictor commented Feb 8, 2024

  1. What build are you using (default or Asyncify)? Are you using a custom build, and if so how is it customized?

Asyncify

  1. What VFS (if not the default)?

Default

  1. You're using a module (virtual table), is that correct? If yes, are you making any API calls from module methods?

Yes, using 11 modules.

  1. Can you list the wa-sqlite wrapped APIs you call, i.e. from this list regularly (after all the initialization)? Is it only exec()?

exec() and virtual tables call result() inside xColumn() function

  1. Does your SQL change over time or are you just issuing the same query over and over?

It can change, but it is the same set of queries 95% of the time.

  1. Does your SQL modify the database or only read?

Only reads -- only SELECT statements.

  1. Aren't Chrome extensions basically service workers now? How do you keep an extension context alive for two days?

Dedicated Chromebook.

  1. If you're using any SQL PRAGMA statements, can you list them?

No

For reference, the relevant extension code is here:
https://github.com/fleetdm/fleet/blob/main/ee/fleetd-chrome/src/db.ts

I'll try and see if I can reproduce this issue faster.

@getvictor
Copy link
Author

getvictor commented Feb 8, 2024

I created a repo reproducing this issue: https://github.com/getvictor/fleetd-chrome

And video instructions: https://www.loom.com/share/603cb991739f4e54b1603e7fed1ff273?sid=e31ab9ee-9660-43bb-b477-9146b8fedff3

My build with TOTAL_STACK=2mb: github:getvictor/wa-sqlite#7430694

@rhashimoto
Copy link
Owner

I can see an error with the repo and the instructions. With a debug wa-sqlite build, I get:

Query (sql: "SELECT * FROM osquery_info") failed: RuntimeError: Aborted(RuntimeError: unreachable). "unreachable" may be due to ASYNCIFY_STACK_SIZE not being large enough (try increasing it)

ASYNCIFY_STACK_SIZE was one of my top suspects when I learned this is an Asyncify build. But this is a very simple query so I don't see why that should take a lot of stack space, and when I increased ASYNCIFY_STACK_SIZE all the way up to 8 MB (from 16 KB) the error persists. So I think this is not the problem. Or perhaps it could be part of the problem but not the only problem.

My other top suspect was memory leak. This also seems less likely now that it doesn't take days to reproduce. I have looked over the exec() code several times and I don't see a leak.

My new top suspect is one I didn't consider earlier: actual memory corruption in the virtual table structs. It's possible there is an error in translating JavaScript objects to C structs. I'll follow up on that. @getvictor In the reproduction repo, is there anything in those virtual tables when the error happens or are they empty?

@getvictor
Copy link
Author

The virtual tables should not be persisting any data. Their return values are either hardcoded, or they call chrome APIs to get their values. I think their values have to be re-translated from JS to C and back to JS on every SELECT.

@rhashimoto

This comment was marked as outdated.

@rhashimoto
Copy link
Owner

rhashimoto commented Feb 13, 2024

@getvictor I suspect this occurs because your virtual table modules throw an error here. It looks like when this happens the WebAssembly stack is not properly unwound and subsequent calls into WebAssembly will start with that part of the stack unavailable. Eventually all these now inaccessible stack regions take up all the allotted stack space.

Can you please try returning a valid SQLite result code from all module methods, and see if that fixes the problem?

Investigative details follow:

I patched my SQLite source to log the stack address when sqlite3_prepare() is called like this:

SQLITE_API int sqlite3_prepare_v2(
  sqlite3 *db,              /* Database handle. */
  const char *zSql,         /* UTF-8 encoded SQL statement. */
  int nBytes,               /* Length of zSql in bytes. */
  sqlite3_stmt **ppStmt,    /* OUT: A pointer to the prepared statement */
  const char **pzTail       /* OUT: End of parsed string */
){
  int rc;
  printf("sqlite3_prepare_v2 %p %s\n", &rc, zSql);
  ...

A couple iterations of output looks like this:

sqlite3_prepare_v2 0x906b8 SELECT * from chrome_extensions
sqlite3_prepare_v2 0x8f938 SELECT*FROM"main".sqlite_master ORDER BY rowid
sqlite3_prepare_v2 0x906b8 
sqlite3_prepare_v2 0x906b8 SELECT * from disk_info
sqlite3_prepare_v2 0x906b8 
sqlite3_prepare_v2 0x906b8 SELECT * FROM network_interfaces
sqlite3_prepare_v2 0x8fdf8 SELECT * FROM os_version
sqlite3_prepare_v2 0x8fdf8 
sqlite3_prepare_v2 0x8fdf8 SELECT * FROM osquery_info
sqlite3_prepare_v2 0x8fdf8 
sqlite3_prepare_v2 0x8fdf8 SELECT * FROM privacy_preferences
sqlite3_prepare_v2 0x8fdf8 
sqlite3_prepare_v2 0x8fdf8 SELECT * FROM screenlock
sqlite3_prepare_v2 0x8f538 SELECT * FROM system_info
sqlite3_prepare_v2 0x8f538 
sqlite3_prepare_v2 0x8f538 SELECT * FROM system_state
sqlite3_prepare_v2 0x8ec78 SELECT * FROM users
sqlite3_prepare_v2 0x8ec78 
sqlite3_prepare_v2 0x8ec78 SELECT * from chrome_extensions
sqlite3_prepare_v2 0x8ec78 
sqlite3_prepare_v2 0x8ec78 SELECT * from disk_info
sqlite3_prepare_v2 0x8ec78 
sqlite3_prepare_v2 0x8ec78 SELECT * FROM network_interfaces
sqlite3_prepare_v2 0x8e3b8 SELECT * FROM os_version
sqlite3_prepare_v2 0x8e3b8 
sqlite3_prepare_v2 0x8e3b8 SELECT * FROM osquery_info
sqlite3_prepare_v2 0x8e3b8 
sqlite3_prepare_v2 0x8e3b8 SELECT * FROM privacy_preferences
sqlite3_prepare_v2 0x8e3b8 
sqlite3_prepare_v2 0x8e3b8 SELECT * FROM screenlock
sqlite3_prepare_v2 0x8daf8 SELECT * FROM system_info
sqlite3_prepare_v2 0x8daf8 
sqlite3_prepare_v2 0x8daf8 SELECT * FROM system_state
sqlite3_prepare_v2 0x8d238 SELECT * FROM users
sqlite3_prepare_v2 0x8d238 

We expect the address when entering from JavaScript always to be the same, but instead we see it is lower after accessing network_interfaces, screenlock, and system_state virtual tables. Those specific virtual tables are suspiciously labeled as "bad_tables". Error logging from these tables is suppressed, but it appears likely these unlogged exceptions are skipping stack unwinding. That might be an Emscripten bug, or it might just be that you aren't supposed to do that.

@getvictor
Copy link
Author

Removing throw worked. Closing issue. Thank you.

@rhashimoto
Copy link
Owner

rhashimoto commented Feb 15, 2024

Filed an Emscripten issue in case this is a bug.

Update: Not an Emscripten bug - don't throw on calls from SQlite WebAssembly to JavaScript, that produces undefined behavior.

getvictor added a commit to fleetdm/fleet that referenced this issue Feb 21, 2024
#16394
As suggested:
rhashimoto/wa-sqlite#156 (comment)

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- [x] Manual QA for all new/changed functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants