Skip to content

DWARF debug line updating #2545

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Dec 21, 2019
Merged

DWARF debug line updating #2545

merged 21 commits into from
Dec 21, 2019

Conversation

kripken
Copy link
Member

@kripken kripken commented Dec 20, 2019

With this, we can update DWARF debug line info properly as
we write a new binary.

To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).

writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.

The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.

A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.

This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)

This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.

This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)

This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.

To actually emit correct offsets a few minor fixes are
done here:

  • Fix the code section location tracking during reading -
    the correct offset we care about is the body of the code
    section, not including the section declaration and size.
  • Fix wasm-stack debug line emitting. We need to update
    in BinaryInstWriter::visit(), that is, right before writing
    bytes for the instruction. That differs from
    BinaryenIRWriter::visit which is a recursive function
    that also calls the children - so the offset there would be
    of the first child. For some reason that is correct with
    source maps, I don't understand why, but it's wrong for
    DWARF...
  • Print code section offsets in hex, to match other tools.
  • Remove DWARFUpdate pass, which was useful for testing
    temporarily, but doesn't make sense now (it just updates without
    writing a binary).

cc @yurydelendik

@dschuff
Copy link
Member

dschuff commented Dec 20, 2019

which "special" debug line opcodes are you referring to?

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't looked at LineState yet...

@@ -144,6 +152,8 @@ class BinaryInstWriter : public OverriddenVisitor<BinaryInstWriter> {
WasmBinaryWriter& parent;
BufferWithRandomAccess& o;
Function* func = nullptr;
bool sourceMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just use for source maps for for dwarf debug_line too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for source maps - when source maps are enabled, we don't update debug info in the way that DWARF needs, and vice versa. So we need to know if source maps are used (previously there was just one way to do debug locations).

return writeU32LEBPlaceholder(); // section size to be filled in later
}

void WasmBinaryWriter::finishSection(int32_t start) {
// section size does not include the reserved bytes of the size field itself
int32_t size = o.size() - start - MaxLEB32Bytes;
auto sizeFieldSize = o.writeAt(start, U32LEB(size));
if (sizeFieldSize != MaxLEB32Bytes) {
auto adjustment = MaxLEB32Bytes - sizeFieldSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that adjustment isn't quite as local, maybe it should be called something more descriptive such as sectionHeaderSizeAdjustment (or something better/shorter if you can think of something)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, renaming to adjustmentForLEBShrinking, how is that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just LEB shrinking of the header though, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just the header's LEB (here, of the section - the same happens in each function as they also have an LEB size).

// We added the binary locations, adjust them: they must be relative
// to the code section.
assert(binaryLocationsSizeAtSectionStart == 0);
// The actual section start is right before the LEB for the size; we want
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means that the "actual" section start is at the start of the size, but not at the section type? Is that common in other tools/elsewehre in Binaryen? It seems really counterintuitive to me. I would think of the section as starting after the "header" which is the type + size. It's confusing that the payload includes the name, but that's a separate issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment is confusing, yeah, I'll reword. Basically we want to have offsets relative to the "body", which is after the section type byte and the size LEB.


void writeDWARFSections(Module& wasm) {
// Represents the state when parsing a line table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where in LLVM is it that does this same logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the main place:

Error DWARFDebugLine::LineTable::parse(

@kripken
Copy link
Member Author

kripken commented Dec 20, 2019

which "special" debug line opcodes are you referring to?

Oh, the debug line opcodes that update both the line + the address + end the sequence, all in one byte. Here is the LLVM code for them:

bool prologueEnd = false;
bool epilogueBegin = false;

LineState() = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible, by avoiding newAddrInfo[newAddr] accesses (which I guess makes the hash map code consider adding an empty element). Using .at, .emplace allows this to work without that. That's safer I guess, although slightly less readable perhaps, but also useful due to other reasons in the discussion below.

void writeDWARFSections(Module& wasm) {
// Represents the state when parsing a line table.
struct LineState {
uint32_t addr = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be Address?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Address is a wasm address, so something in the wasm memory. This is an offset in the wasm code section. I think that has a 4GB limit in wasm, so uint32 seems reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so... until wasm64 starts allowing >4G code as well as data...

}

// Given an old state, emit the diff from it to this state into a new line
// table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be "into a new LineState" or "into a new line" or "into a new line table entry"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. I'll add that and clarify more.

if (iter != newLocations.end()) {
uint32_t newAddr = iter->second;
newAddrs.push_back(newAddr);
auto& updatedState = newAddrInfo[newAddr] = state;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty ugly compound assignment since the intermediate assignment is a copy but the final one is actually taking a reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do auto& updatedState = newAddrInfo.insert({newAddr, state}).first? Also a bit ugly but at least it's maybe more clear all the stuff that's going on, and it's still one line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring this to also help getting rid of that default constructor you didn't like.

bool epilogueBegin = false;

LineState() = default;
LineState(const LineState& other) = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we have an explicitly-defaulted copy constructor we should have an explicitly-defaulted assignment operator too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be used yet - not sure what the benefit is to adding one at this point? Is it easier to read for you that way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used, in that compound assignment I complained about. More generally it's just a rule of thumb that whatever you do with the copy constructor (default, delete, etc) you should also do with the assignment operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good, added.

void writeDWARFSections(Module& wasm) {
// Represents the state when parsing a line table.
struct LineState {
uint32_t addr = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so... until wasm64 starts allowing >4G code as well as data...

bool epilogueBegin = false;

LineState() = default;
LineState(const LineState& other) = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used, in that compound assignment I complained about. More generally it's just a rule of thumb that whatever you do with the copy constructor (default, delete, etc) you should also do with the assignment operator.

@kripken kripken merged commit 3f6fd58 into master Dec 21, 2019
@kripken kripken deleted the binloc3 branch December 21, 2019 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants