-
Notifications
You must be signed in to change notification settings - Fork 779
DWARF debug line updating #2545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
which "special" debug line opcodes are you referring to? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't looked at LineState yet...
@@ -144,6 +152,8 @@ class BinaryInstWriter : public OverriddenVisitor<BinaryInstWriter> { | |||
WasmBinaryWriter& parent; | |||
BufferWithRandomAccess& o; | |||
Function* func = nullptr; | |||
bool sourceMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just use for source maps for for dwarf debug_line too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for source maps - when source maps are enabled, we don't update debug info in the way that DWARF needs, and vice versa. So we need to know if source maps are used (previously there was just one way to do debug locations).
src/wasm/wasm-binary.cpp
Outdated
return writeU32LEBPlaceholder(); // section size to be filled in later | ||
} | ||
|
||
void WasmBinaryWriter::finishSection(int32_t start) { | ||
// section size does not include the reserved bytes of the size field itself | ||
int32_t size = o.size() - start - MaxLEB32Bytes; | ||
auto sizeFieldSize = o.writeAt(start, U32LEB(size)); | ||
if (sizeFieldSize != MaxLEB32Bytes) { | ||
auto adjustment = MaxLEB32Bytes - sizeFieldSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that adjustment
isn't quite as local, maybe it should be called something more descriptive such as sectionHeaderSizeAdjustment
(or something better/shorter if you can think of something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, renaming to adjustmentForLEBShrinking
, how is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just LEB shrinking of the header though, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just the header's LEB (here, of the section - the same happens in each function as they also have an LEB size).
src/wasm/wasm-binary.cpp
Outdated
// We added the binary locations, adjust them: they must be relative | ||
// to the code section. | ||
assert(binaryLocationsSizeAtSectionStart == 0); | ||
// The actual section start is right before the LEB for the size; we want |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means that the "actual" section start is at the start of the size, but not at the section type? Is that common in other tools/elsewehre in Binaryen? It seems really counterintuitive to me. I would think of the section as starting after the "header" which is the type + size. It's confusing that the payload includes the name, but that's a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment is confusing, yeah, I'll reword. Basically we want to have offsets relative to the "body", which is after the section type byte and the size LEB.
|
||
void writeDWARFSections(Module& wasm) { | ||
// Represents the state when parsing a line table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where in LLVM is it that does this same logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the main place:
Error DWARFDebugLine::LineTable::parse( |
Oh, the debug line opcodes that update both the line + the address + end the sequence, all in one byte. Here is the LLVM code for them:
|
src/wasm/wasm-debug.cpp
Outdated
bool prologueEnd = false; | ||
bool epilogueBegin = false; | ||
|
||
LineState() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible, by avoiding newAddrInfo[newAddr]
accesses (which I guess makes the hash map code consider adding an empty element). Using .at
, .emplace
allows this to work without that. That's safer I guess, although slightly less readable perhaps, but also useful due to other reasons in the discussion below.
void writeDWARFSections(Module& wasm) { | ||
// Represents the state when parsing a line table. | ||
struct LineState { | ||
uint32_t addr = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be Address
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Address is a wasm address, so something in the wasm memory. This is an offset in the wasm code section. I think that has a 4GB limit in wasm, so uint32 seems reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so... until wasm64 starts allowing >4G code as well as data...
src/wasm/wasm-debug.cpp
Outdated
} | ||
|
||
// Given an old state, emit the diff from it to this state into a new line | ||
// table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be "into a new LineState" or "into a new line" or "into a new line table entry"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter. I'll add that and clarify more.
src/wasm/wasm-debug.cpp
Outdated
if (iter != newLocations.end()) { | ||
uint32_t newAddr = iter->second; | ||
newAddrs.push_back(newAddr); | ||
auto& updatedState = newAddrInfo[newAddr] = state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty ugly compound assignment since the intermediate assignment is a copy but the final one is actually taking a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do auto& updatedState = newAddrInfo.insert({newAddr, state}).first
? Also a bit ugly but at least it's maybe more clear all the stuff that's going on, and it's still one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactoring this to also help getting rid of that default constructor you didn't like.
bool epilogueBegin = false; | ||
|
||
LineState() = default; | ||
LineState(const LineState& other) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if we have an explicitly-defaulted copy constructor we should have an explicitly-defaulted assignment operator too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't be used yet - not sure what the benefit is to adding one at this point? Is it easier to read for you that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was used, in that compound assignment I complained about. More generally it's just a rule of thumb that whatever you do with the copy constructor (default, delete, etc) you should also do with the assignment operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good, added.
void writeDWARFSections(Module& wasm) { | ||
// Represents the state when parsing a line table. | ||
struct LineState { | ||
uint32_t addr = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so... until wasm64 starts allowing >4G code as well as data...
bool epilogueBegin = false; | ||
|
||
LineState() = default; | ||
LineState(const LineState& other) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was used, in that compound assignment I complained about. More generally it's just a rule of thumb that whatever you do with the copy constructor (default, delete, etc) you should also do with the assignment operator.
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
writeDWARFSections()
now takes a second param, the newlocations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
The core logic for updating the debug line section is in
wasm-debug.cpp
. It basically tracks state machine logicboth to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection
we just update the first 4 bytes in placeto have the section size, after we've emitted it and know
the size.
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
the correct offset we care about is the body of the code
section, not including the section declaration and size.
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
BinaryenIRWriter::visit which is a recursive function
that also calls the children - so the offset there would be
of the first child. For some reason that is correct with
source maps, I don't understand why, but it's wrong for
DWARF...
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik