Support for Wasm Coredump #5732
Comments
Reading the linear memory after a crash is already possible. As for getting the locals and stack values, this is much more complicated. Wasmtime uses the Cranelift optimizing compiler, which can eliminate locals and stack values entirely and leaves those that remain at whichever location it likes. It would be necessary to somehow prevent optimizing locals away, at least for points where a trap could happen. There is debugger support for getting the location of locals and stack values which aren't optimized away to generate debuginfo, but I'm not sure if it is 100% accurate. By the way, #5537 is somewhat relevant to this. |
I don't think Wasm coredumps should prevent optimizations, given that ideally they are enabled by default. It's not uncommon to see coredumps in native environments with missing values because they were optimized away. They are usually not very helpful for debugging. |
The wasm coredump format doesn't seem to allow omitting values that are optimized away, but if it is allowed, then it should be possible to implement without too many changes to Cranelift. I think it would need some changes to the unwind table generation code to store the location of callee-saved registers, but that will need to be done anyway for handling exceptions. After that I guess it would be a matter of telling Cranelift to generate debuginfo and then, during a crash, unwinding the stack and recording all preserved locals and stack values for every frame from Wasmtime. |
Correct, at the moment it doesn't. I'm going to add it, thanks for your input! |
This is an area I haven't dug into much, but doesn't Cranelift's support for GC already support tracking the information we need for this? I think we would need to mark potentially-trapping instructions as "safe points" and then request stack maps from Cranelift. And my impression was that calls are already considered safe points. But this is all conjecture based on a CVE that I was peripherally paying attention to last year, so I could have it all wrong. |
Stack maps only track reference values ( I don't think we would want to use stack maps for this stuff. |
On the flip-side, if you're proposing altering the generated code to assist debugging observability @jameysharp, there is a large design space that we haven't really explored. A relatively simple change would be to define a pseudoinstruction that takes all locals as inputs, with "any" constraints to regalloc (stack slot or register), and insert these wherever a crash could happen. This "state snapshot" instruction would then guarantee observability of all values, at the cost of hindering optimization. This goes somewhat against the "don't alter what you're observing" principle that is common in debug infrastructure but I'll note that we do already have some hacks to keep important values alive (in this case, the vmctx, which makes all other wasm state reachable) for the whole function body. There's also the "recovery instruction" approach, used in IonMonkey at least: whenever a value is optimized out, generate a side-sequence of instructions that can recompute it. That's a much larger compiler-infrastructure undertaking but in principle we could do it, if perfect debug observability were a goal. |
WebAssembly/tool-conventions#198 has been closed. The coredump format now allows marking local/stack values as missing. |
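For illustration, here is a minimal sketch of how a frame's locals could be modelled and serialized once values may be marked as missing. The names `CoreValue`, `encode_value` and `encode_locals` are hypothetical and not part of Wasmtime. The tag bytes (0x01 for a missing value, the usual Wasm value-type bytes 0x7F and 0x7E for i32 and i64) and the LEB128 integer encoding are my reading of the tool-conventions Coredump.md document, so treat them as assumptions and check the spec before relying on them.

```rust
/// Hypothetical model of one local or operand-stack slot in a coredump frame.
#[derive(Debug, Clone, PartialEq)]
pub enum CoreValue {
    /// The compiler optimized the value away, so nothing could be recovered.
    Missing,
    I32(i32),
    I64(i64),
}

/// Append `value` as unsigned LEB128.
fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Append `value` as signed LEB128.
fn write_sleb128(out: &mut Vec<u8>, mut value: i64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift, so the sign is preserved
        let sign_bit_clear = byte & 0x40 == 0;
        if (value == 0 && sign_bit_clear) || (value == -1 && !sign_bit_clear) {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Encode a single value: a tag byte, then the payload if there is one.
/// Tag bytes are assumed from the spec, see the note above.
pub fn encode_value(out: &mut Vec<u8>, value: &CoreValue) {
    match value {
        CoreValue::Missing => out.push(0x01),
        CoreValue::I32(n) => {
            out.push(0x7f);
            write_sleb128(out, i64::from(*n));
        }
        CoreValue::I64(n) => {
            out.push(0x7e);
            write_sleb128(out, *n);
        }
    }
}

/// Encode a vector of values: element count, then each value in order.
pub fn encode_locals(locals: &[CoreValue]) -> Vec<u8> {
    let mut out = Vec::new();
    write_uleb128(&mut out, locals.len() as u32);
    for value in locals {
        encode_value(&mut out, value);
    }
    out
}
```

With this shape, a runtime that cannot recover a local simply emits `CoreValue::Missing` for that slot, so slot indices still line up with the function's local indices.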
This change adds basic coredump generation after a WebAssembly trap is encountered. The coredump includes rudimentary stack / process debugging information. A new CLI argument is added to enable coredump generation:
```
wasmtime --coredump-on-trap=/path/to/coredump/directory module.wasm
```
See ./docs/examples-coredump.md for a working example. Refs bytecodealliance#5732
I made a change to add initial/basic coredump generation: #5868. Could you please have a look and let me know if this is the right direction? |
This change adds basic coredump generation after a WebAssembly trap is encountered. The coredump includes rudimentary stack / process debugging information. A new CLI argument is added to enable coredump generation:
```
wasmtime --coredump-on-trap=/path/to/coredump/file module.wasm
```
See ./docs/examples-coredump.md for a working example. Refs bytecodealliance#5732
Basic coredump generation has been merged (thanks!). Now, to have the complete debugger experience, we need to collect the following information:
|
Is there a chance we could revive this thread? I'm working on cloud There has been a plethora of academic papers published about using execution I am trying to extend this idea with a construction called Nondeterministic In addition, we can create conditional snapshots that let application developers I was looking into Wizer quite a bit and the design decisions it makes, and I
Is this just a lint against the produced module being potentially non-portable
I don't anticipate the application code running on my system to need any of
This makes sense. Application code in my system should not need to use these. More fundamentally, the major roadblock to my design working with WebAssembly My main question is (and I apologize for taking a page to get there), is what
I had the intuition that the application library could just run some WebAssembly Thanks everyone for reading. You all do great work, and I'd love to contribute |
Something that may work is if you reuse the exact same compiled machine code: then you could take a snapshot of the part of the native stack that contains the wasm frames and restore it later. You would have to fix up pointers (which probably requires emitting extra metadata and maybe some changes to avoid keeping pointers alive across function calls) and make sure that no native frames are on the stack, as those can't safely be snapshotted. By keeping the same compiled machine code you know that the stack layout is identical. Wasmtime already allows emitting compiled wasm modules (.cwasm extension) and loading them again. You would only need to implement the stack snapshotting and pointer fixups. This is still not exactly trivial, but likely much easier than perfectly reconstructing the wasm vm state.
I would guess this is a combination of there being no way to hook up any imported functions from the host to wizer and this limitation ensuring that there is no native state that wizer can't snapshot. But I'm not a contributor to it, so it is nothing but a guess. |
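To make the pointer-fixup step from the stack-snapshot idea above concrete, here is a toy sketch. Everything in it is hypothetical: `StackSnapshot`, its fields and `restore_at` do not exist in Wasmtime, and it assumes the compiler has emitted metadata (`pointer_slots`) naming every stack word that holds a pointer into the snapshotted region. It deliberately ignores pointers to anything outside the stack, which is the much harder part.

```rust
/// Hypothetical snapshot of the wasm-only part of a native stack.
pub struct StackSnapshot {
    /// Address the first word lived at when the snapshot was taken.
    pub base: u64,
    /// Raw stack words, lowest address first.
    pub words: Vec<u64>,
    /// Indices of words that hold pointers into the snapshot itself
    /// (the "extra metadata" the compiler would have to emit).
    pub pointer_slots: Vec<usize>,
}

/// Return the stack words rebased so they are valid when copied to `new_base`.
/// Returns `None` if a recorded pointer does not point inside the snapshot.
pub fn restore_at(snapshot: &StackSnapshot, new_base: u64) -> Option<Vec<u64>> {
    let len_bytes = (snapshot.words.len() as u64).checked_mul(8)?;
    let mut words = snapshot.words.clone();
    for &slot in &snapshot.pointer_slots {
        let old = *words.get(slot)?;
        // Turn the absolute address into an offset from the old base...
        let offset = old.checked_sub(snapshot.base)?;
        if offset >= len_bytes {
            return None; // points outside the snapshotted region
        }
        // ...and back into an absolute address relative to the new base.
        words[slot] = new_base.checked_add(offset)?;
    }
    Some(words)
}
```

Words not listed in `pointer_slots` are copied verbatim, which is why the metadata has to be exact: a missed slot leaves a dangling pointer, and a wrongly listed slot corrupts an integer.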
@RyanTorok there are a lot of interesting ideas in your comment (I have to admit that I skimmed it in parts; I'd encourage a "tl;dr" of points for comments this long!). A few thoughts:
So I think some form of this is possible but it's a deep research project and requires a bunch of intimate knowledge of the compiler and runtime. We likely don't have the resources to help you design this in detail, but I'm personally curious to see what you come up with... |
The Wasm stack doesn't really exist anymore by the time Cranelift is done emitting machine code (it is erased very early in the pipeline, basically the first thing to go). Instead you would need to capture the actual native stack. This has issues that @bjorn3 mentioned around native frames in between Wasm frames, but even if it is just Wasm there will be pointers on the stack to things Backing up a bit: this topic would be better discussed in a dedicated issue or on zulip, since this issue is specifically about implementing the proposed standard Wasm coredump format, which won't help with this feature since it is strictly about the Wasm-level. I suggest filing a new issue or starting a thread on zulip if you have further questions. |
Thank you to everyone for the quick responses and insightful comments! TL;DR: Issues with ASLR and the level of introspection into the runtime that would be required make stack snapshots pretty much a non-starter, and in fact they alerted me to limitations in the existing work on cold-starts I wasn't aware of.
Based on @fitzgen's comments about ASLR, I took another look back at the existing literature on cold-starts, and it turns out that the traditional method of snapshotting the entire state of the VM or language runtime is not compatible with ASLR at all, and for the exact reason @fitzgen pointed out. A summary of the problem is that language runtimes (e.g. JVM, Python, Node.js, wasmtime, ...) inherently need to compile code using native addresses, thereby making the VM state not portable to different addresses.
Traditionally, the way to deal with this portability issue would be to introduce another level of indirection (i.e. position-independent addresses), but @fitzgen, @cfallin, and @bjorn3 all pointed out that any such scheme would require very deep introspection into the language runtime to convert the indirect addresses to direct addresses, which would be an enormous endeavor to the point you'd be better off redesigning the entire runtime to support this indirection. Otherwise, you're really walking a tightrope on both performance and security (mess up the indirection once, and the tenant can read memory their program doesn't own).
The existing literature on cold-starts essentially punts on this issue; it requires all memory owned by the VM or runtime to be loaded at the same address every time. While I don't see any major reasons wasmtime couldn't support this from an implementation standpoint, I don't recommend this as a direction for multiple reasons:
To summarize (in research paper speak), there are several open problems that have to be addressed with language runtimes in general, not just wasmtime, in order for generalized snapshots to be a practical solution for the cloud. I'm going to continue looking into how we might provide a subset of this feature set via library abstractions that work with the designs of existing language runtimes. Thanks for all your help everyone! |
As an aside, I think this question from my original comment:
was a simple misunderstanding by me about the mechanics of cranelift. Clearly everything has to be compiled in order to run, it's just a matter of when that happens (AOT or JIT). My last project was in browser security, and in JavaScript engines we actually have to worry about code running at multiple optimization levels, and my confusion stemmed from there. This doesn't change anything about the issues with ASLR or introspection, however. |
What tools can I use to inspect the coredumps? |
@whitequark unfortunately there isn't much off-the-shelf at the moment. There was https://github.com/xtuc/wasm-coredump/tree/main/bin/wasmgdb but as far as I know it only works with an old version of the format. There are plans to build support for inspecting them via the debug adapter protocol in Wasmtime itself, as a stepping stone towards fuller debugging capabilities. See bytecodealliance/rfcs#34 for more details. Unfortunately, that doesn't exist yet. In the meantime, Wasm's core dumps are just wasm modules themselves, so you can use any tool that you might inspect a wasm module with to get at the information inside a core dump, e.g. I know this isn't a great answer. I wish I had a better one. But we are planning on getting there! |
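Since a core dump is itself a Wasm module, even a few lines of code are enough to see what it contains. The sketch below walks the top-level sections of a Wasm binary and lists its custom sections by name and payload size, using only the standard library. The function names are made up for this example. As far as I can tell from the spec, the coredump data lives in custom sections with names such as `core` and `corestack`, but treat those names as an assumption and check Coredump.md.

```rust
/// Read an unsigned LEB128 integer at `*pos`, advancing `*pos` past it.
fn read_uleb128(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None; // too many continuation bytes for a u32
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Return `(name, payload_len)` for every custom section (section id 0).
/// Returns `None` if the input is not a well-formed Wasm binary.
pub fn custom_sections(module: &[u8]) -> Option<Vec<(String, usize)>> {
    // Magic number "\0asm" followed by binary format version 1.
    if !module.starts_with(b"\0asm\x01\0\0\0") {
        return None;
    }
    let mut pos = 8;
    let mut found = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        let size = read_uleb128(module, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = module.get(pos..end)?;
        if id == 0 {
            // A custom section starts with its name as a length-prefixed string.
            let mut p = 0;
            let name_len = read_uleb128(body, &mut p)? as usize;
            let name = body.get(p..p.checked_add(name_len)?)?;
            found.push((
                String::from_utf8_lossy(name).into_owned(),
                size - p - name_len,
            ));
        }
        pos = end; // skip to the next section regardless of its kind
    }
    Some(found)
}
```

Calling `custom_sections(&std::fs::read(path)?)` on a dump file gives a quick inventory. Decoding the payloads themselves still requires following the spec.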
Thanks! I'll keep it in mind--I have to use |
Sorry about that. I'm planning to update wasmgdb to the latest spec but haven't had the time yet. |
Feature
When a Wasm instance traps, it's sometimes difficult to understand what happened. Post-mortem debugging using coredumps (which is extensively used in native environments) would be helpful for investigating and fixing crashes.
Wasm coredumps are especially useful in serverless environments, where production binaries are stripped and/or have access to only limited logging.
Implementation
Implement Wasm coredumps as specified by https://github.com/WebAssembly/tool-conventions/blob/main/Coredump.md.
Note that the spec is early and subject to changes. Feedback very welcome!
cc @fitzgen