Coredump: switch format to Wasm module #197

Merged
merged 1 commit into from
Feb 6, 2023

Conversation

xtuc
Contributor

@xtuc xtuc commented Jan 25, 2023

Reuse the Wasm module container to store a coredump. Debugging information is stored in custom sections, and the main memory in the data section.

Note that the naming of the custom sections is still a work in progress, and that we can add more information to process/thread-info if and when needed.

Member

@dschuff dschuff left a comment

The idea of using the data section to encode the dumped memory makes sense in principle. It does raise some things to think about.

  1. Memory size. If there's a data section it probably makes sense to have a memory section as well, to declare the memory's size, and then this could also eventually extend to multiple memories.
  2. Partial dumps. Often memory dumps only include part of the process' memory space, but I don't see a way here to tell the difference between a full dump that just includes a lot of 0 values (which wouldn't be included in any segment) and a partial dump.

I wonder if it makes sense to put the process info or some other custom section as the first section in the binary, to make it easier to identify coredumps. Especially given that we're using some known sections (i.e. memory) the fact that it's a coredump may change how a tool might process or interpret those known sections.

@xtuc
Contributor Author

xtuc commented Jan 30, 2023

Thanks for the review @dschuff!

I agree about 1.

  2. Partial dumps. Often memory dumps only include part of the process' memory space, but I don't see a way here to tell the difference between a full dump that just includes a lot of 0 values (which wouldn't be included in any segment) and a partial dump.

We can specify multiple data segments with their corresponding offset in memory:

(data (i32.const 1) "...10 bytes")
(data (i32.const 100) "...100 bytes")

This also plays well with multiple memories. Unlike ELF, Wasm modules don't include a memory mapping table that would help with partial dumps or with identifying the memory segments. Compilers could emit such a section for coredumps to rely on, but this is outside the scope of coredumps.
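For illustration only, here is a minimal Python sketch of how the two segments above could be serialized into a Wasm data section. It assumes active segments targeting memory 0 and offsets below 2^31 (so that the signed LEB128 operand of `i32.const` is non-negative). The helper names `uleb128`, `sleb128`, and `data_section` are made up for this example and are not part of any coredump tooling.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(n: int) -> bytes:
    """Encode an integer as signed LEB128 (used by i32.const)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7  # Python's >> is an arithmetic shift for ints
        done = (n == 0 and not byte & 0x40) or (n == -1 and byte & 0x40)
        if done:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def data_section(segments) -> bytes:
    """Serialize (offset, payload) pairs as a Wasm data section.

    Assumption: every segment is active, targets memory 0, and has
    an offset below 2**31.
    """
    body = bytearray(uleb128(len(segments)))
    for offset, payload in segments:
        body += b"\x00"                              # active segment, memory 0
        body += b"\x41" + sleb128(offset) + b"\x0b"  # i32.const <offset>; end
        body += uleb128(len(payload)) + payload      # byte vector
    return b"\x0b" + uleb128(len(body)) + bytes(body)  # section id 11 = data


# Placeholder payloads standing in for the "...10 bytes" / "...100 bytes" above.
section = data_section([(1, b"a" * 10), (100, b"b" * 100)])
```

Each segment carries its own offset, so a reader can tell where in memory the captured bytes belong, but not why the ranges in between are absent.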

Maybe it's only me, but I don't see any reference to partial coredumps in the ELF spec. systemd-coredump mentions that a coredump can be truncated, which presumably removes some data segments.

I wonder if it makes sense to put the process info or some other custom section as the first section in the binary, to make it easier to identify coredumps

Yes, I agree. The only reason I haven't done this is because it would break my early tooling. I like that ELF coredumps can be identified by reading the first few bytes.
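As a rough sketch of what "identify by reading the first few bytes" could look like for the Wasm container: check the module preamble, then check that the first section is a custom section with an expected name. The function names are hypothetical, and the marker name is passed in by the caller because the section naming is still undecided in this thread.

```python
WASM_MAGIC = b"\x00asm"
WASM_VERSION = b"\x01\x00\x00\x00"


def read_uleb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError on truncated input
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def looks_like_coredump(buf: bytes, marker_name: str) -> bool:
    """Return True if the first section is a custom section named marker_name.

    marker_name is an assumption supplied by the caller; the proposal
    discussed here has not fixed the section names yet.
    """
    if buf[:4] != WASM_MAGIC or buf[4:8] != WASM_VERSION:
        return False
    if len(buf) <= 8 or buf[8] != 0:  # section id 0 = custom section
        return False
    try:
        _section_size, pos = read_uleb128(buf, 9)
        name_len, pos = read_uleb128(buf, pos)
    except IndexError:
        return False
    return buf[pos:pos + name_len] == marker_name.encode("utf-8")
```

A tool could run this check before deciding how to interpret the memory and data sections that follow.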

@xtuc xtuc requested a review from dschuff January 31, 2023 18:04
@dschuff
Member

dschuff commented Jan 31, 2023

What I meant about data segments is:
In a regular wasm file the memory initialization is the combination of the specified memory size (which implicitly initializes everything to 0) plus the set of data segments that describe just the parts of memory that have nonzero contents. If a dump is a full dump of all memory, it can also be encoded this way. But there's no way to tell the difference between a full dump that is known to contain 0s in part of its memory (which would have no data segment covering that part) and a dump that only includes some parts (where other parts have unknown contents).
Maybe that doesn't really matter, and we don't really consider it a problem? Not sure.
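To make the ambiguity concrete, here is a small Python sketch, under the assumption that a full dump is encoded by skipping zero runs, as a regular Wasm file would. `nonzero_segments` and `build_image` are hypothetical helpers written for this example.

```python
def nonzero_segments(memory: bytes):
    """Split a full memory image into (offset, payload) runs of nonzero bytes."""
    segments = []
    start = None
    for i, b in enumerate(memory):
        if b and start is None:
            start = i                                  # a nonzero run begins
        elif not b and start is not None:
            segments.append((start, memory[start:i]))  # the run ends
            start = None
    if start is not None:
        segments.append((start, memory[start:]))
    return segments


def build_image(size: int, segments) -> bytes:
    """Apply segments on top of zero-initialized memory of the given size."""
    image = bytearray(size)  # everything starts as 0, like a fresh Wasm memory
    for offset, payload in segments:
        if offset + len(payload) > size:
            raise ValueError("segment out of bounds")
        image[offset:offset + len(payload)] = payload
    return bytes(image)


# A full dump: bytes 0-1, 4-5 and 7 are known to be zero.
full_memory = bytes([0, 0, 1, 2, 0, 0, 3, 0])
full_dump = nonzero_segments(full_memory)

# A partial dump that only captured the same two ranges.
partial_dump = [(2, b"\x01\x02"), (6, b"\x03")]

# Both encodings are identical, so a reader cannot tell them apart.
same_encoding = full_dump == partial_dump
```

Rebuilding the image with `build_image` fills the uncovered ranges with zeros in both cases, which is correct for the full dump and a guess for the partial one.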

@fitzgen
Contributor

fitzgen commented Jan 31, 2023

FWIW, Wizer's memory snapshots are literally Wasm files with data segments for the nonzero ranges (although we have to be careful not to run into implementation limits on the number of data segments, and we merge nearby data segments together when we get close to the limit).
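The merging idea can be sketched as follows. This is not Wizer's actual implementation, only a simple greedy approach written for illustration: while there are too many segments, join the adjacent pair with the smallest gap and fill that gap with explicit zeros. It assumes the input segments do not overlap; `merge_nearby` and `max_segments` are made-up names.

```python
def merge_nearby(segments, max_segments: int):
    """Greedily merge adjacent segments until at most max_segments remain.

    Assumption: segments are (offset, payload) pairs that do not overlap.
    """
    segs = sorted((offset, bytes(payload)) for offset, payload in segments)
    while len(segs) > max_segments and len(segs) > 1:
        # Gap between the end of each segment and the start of the next one.
        gaps = [
            segs[i + 1][0] - (segs[i][0] + len(segs[i][1]))
            for i in range(len(segs) - 1)
        ]
        i = min(range(len(gaps)), key=gaps.__getitem__)  # smallest gap
        offset, left = segs[i]
        _, right = segs[i + 1]
        # bytes(n) yields n zero bytes, which pads the gap explicitly.
        segs[i:i + 2] = [(offset, left + bytes(gaps[i]) + right)]
    return segs
```

Padding with zeros is safe for a snapshot, where uncovered memory really is zero. For a partial coredump the same merge would turn unknown bytes into claimed zeros, which is the distinction being discussed in this thread.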

(Aside: I'm interested in this proposal! But I haven't had time to dig in yet, unfortunately. Sorry about that!)

@xtuc
Contributor Author

xtuc commented Feb 1, 2023

@dschuff got it now. Coredumps aren't instantiated like regular Wasm files (they use the Wasm binary encoding for generation/decoding convenience).
I'll clarify that in the Coredump spec.

@fitzgen

(Aside: I'm interested in this proposal! But I haven't had time to dig in yet, unfortunately. Sorry about that!)

Glad to hear it. I'm happy to have a video chat if that helps.

@xtuc xtuc force-pushed the sven/coredump-wasm-format branch 2 times, most recently from 6b9e373 to d77eb85 Compare February 4, 2023 21:24
Reuse the Wasm module container to store a coredump. Debugging
information is stored in custom sections and the main memory in the
data section.
@xtuc
Contributor Author

xtuc commented Feb 6, 2023

@dschuff could you please merge the PR, or do you have more questions about memory segments? I also added a global section to the Coredump.

I'm planning to reach out to potential implementers to get more feedback / input.

@dschuff
Member

dschuff commented Feb 6, 2023

Sorry, I didn't mean to hold this up.
I think this is fine, I'll merge it.

The way it's currently written suggests that any dump using multiple segments is partial (i.e. incomplete), but that situation is still indistinguishable from a dump that is known to be complete, but is encoded with multiple segments (so that zeros don't need to be written into the image). Practically speaking it may not matter, since a debugging tool might not do anything different. But if e.g. it finds a pointer that points to a missing portion, it might be good to know whether the pointed-to data is expected to be zero or is just missing from the dump.
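A debugger-side sketch of that last point, in Python. It assumes the dump carries some explicit completeness indicator; the `dump_is_complete` flag below is hypothetical and does not exist in the format as merged here, and `Lookup` and `classify` are names invented for this example.

```python
from enum import Enum


class Lookup(Enum):
    PRESENT = "present"  # address is covered by a data segment
    ZERO = "zero"        # not covered, but the dump is known to be complete
    MISSING = "missing"  # not covered, and contents are unknown


def classify(addr: int, segments, memory_size: int, dump_is_complete: bool) -> Lookup:
    """Classify what a tool can say about the byte at addr.

    dump_is_complete is a hypothetical flag; without something like it,
    ZERO and MISSING cannot be told apart.
    """
    if not 0 <= addr < memory_size:
        raise ValueError("address outside of declared memory")
    for offset, payload in segments:
        if offset <= addr < offset + len(payload):
            return Lookup.PRESENT
    return Lookup.ZERO if dump_is_complete else Lookup.MISSING
```

With such a flag, a tool following a pointer into an uncovered range could report either "zero" or "not captured" instead of guessing.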

@dschuff dschuff merged commit 31165d1 into WebAssembly:main Feb 6, 2023
@xtuc xtuc deleted the sven/coredump-wasm-format branch February 6, 2023 20:03