Skip to content

Fix parsing of dwarfdump output to be more resilient #20777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 14, 2023

Conversation

kon72
Copy link
Contributor

@kon72 kon72 commented Nov 25, 2023

To generate source maps, we run llvm-dwarfdump against wasm file and parse its plain-text output heavily relied on regex matches.
Previously, in order to retrieve comp dir from debug_info section, we search regex patterns assuming that the output has entries DW_AT_stmt_list and DW_AT_comp_dir consecutively in this order, but I noticed that assumption doesn't hold in some cases.

The output that the previous code can parse comp_dir successfully:

0x00007a49: DW_TAG_compile_unit
              DW_AT_producer    ("clang version 18.0.0 (https://github.com/llvm/llvm-project 269685545e439ad050b67740533c59f965cae955)")
              DW_AT_language    (DW_LANG_C99)
              DW_AT_name    ("../../../system/lib/libc/musl/src/stdio/fflush.c")
              DW_AT_stmt_list   (0x00005c46)
              DW_AT_comp_dir    ("/emsdk/emscripten/cache/build/libc-debug-tmp")
              DW_AT_low_pc  (0x00002902)
              DW_AT_high_pc (0x00002a40)

The output that the previous code fails to parse comp_dir:

0x00007ddd: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00005f56)
              DW_AT_ranges  (0x000006b0
                 [0x00002c46, 0x00002c4e)
                 [0x00002c4f, 0x00002c59)
                 [0x00002c5a, 0x00002c74)
                 [0x00002c75, 0x00002c7d))
              DW_AT_name    ("../../../system/lib/compiler-rt/stack_ops.S")
              DW_AT_comp_dir    ("/emsdk/emscripten/cache/build/libcompiler_rt-tmp")
              DW_AT_producer    ("clang version 18.0.0 (https://github.com/llvm/llvm-project 269685545e439ad050b67740533c59f965cae955)")
              DW_AT_language    (DW_LANG_Mips_Assembler)

With this change, it can parse comp_dir regardless of the order of DW_* entries

@sbc100 sbc100 requested a review from dschuff November 27, 2023 20:38
@@ -198,12 +211,10 @@ def read_dwarf_entries(wasm, options):

entries = []
debug_line_chunks = re.split(r"debug_line\[(0x[0-9a-f]*)\]", output.decode('utf-8'))
maybe_debug_info_content = debug_line_chunks[0]
map_stmt_list_to_comp_dir = parse_debug_info_contents(debug_line_chunks[0])
for i in range(1, len(debug_line_chunks), 2):
stmt_list = debug_line_chunks[i]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but it looks like these two lines can be written as:

  for stmt_list in debug_line_chunks[1::2]:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I included this change in the PR.

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall this looks right to me.
@sbc100 should the minimal_runtime test failures go away with a rebase?

@@ -178,6 +178,19 @@ def remove_dead_entries(entries):
block_start = cur_entry


def parse_debug_info_contents(text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it doesn't parse all the debug info contents (and since "debug info" is a bit ambiguous in this context since it could mean debug info generally or the specific .debug_info section), maybe this function could be called something that indicates that it extracts comp_dir information (e.g. "extract_comp_dir_map" or something like that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Renamed to extract_comp_dir_map

@kon72 kon72 force-pushed the fix-dwarfdump-parse branch from b6ce2cb to 87bba5b Compare December 1, 2023 08:37
@dschuff
Copy link
Member

dschuff commented Dec 6, 2023

One other request: if this is your first contribution, please add yourself to the AUTHORS file.

@kripken
Copy link
Member

kripken commented Dec 6, 2023

@dschuff We recently found that adding to AUTHORS is optional,

emscripten/AUTHORS

Lines 1 to 7 in f9f6f13

Contributors to Emscripten may optionally add themselves to this list of
authors if they choose to do so. This is optional because the git history is
good enough for that purpose, and in fact most projects do not maintain such a
manual list.
While the file is not necessary, to preserve the history of the project we are
not removing it.

So in general it is fine to either add to it, or not.

Separately, would it be practical to add testing in this PR? I commented in one of the related PRs here: #20776 (comment) But if it would be simpler to test this after those PRs that's find of course.

@kon72
Copy link
Contributor Author

kon72 commented Dec 9, 2023

@kripken

So in general it is fine to either add to it, or not.

Anyway, I added myself to AUTHORS :)

Separately, would it be practical to add testing in this PR?

I added a test but I have one question:
Since wasm-sourcemap.py has a dash - in its file name, I couldn't import it with a normal import statement, which I resort to use importlib.import_module.
Should I rename wasm-sourcemap.py to wasm_sourcemap.py so we can simply import it? or should I test it by running wasm-sourcemap.py as an executable so we don't have to import it? or is it good to go with current test code?

@@ -45,6 +46,8 @@
from tools import webassembly
from tools.settings import settings

wasm_sourcemap = importlib.import_module('tools.wasm-sourcemap')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps move this into the test itself? But, if it is simpler to run it as an executable, that seems fine as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved import_module into the test.

@dschuff
Copy link
Member

dschuff commented Dec 13, 2023

IIUC running it as an executable is consistent with how most of these auxiliary scripts are executed. It might be more optimal to do a broader change there, but I think keeping consistent here is fine.

@kripken kripken merged commit 79ac328 into emscripten-core:main Dec 14, 2023
@kripken
Copy link
Member

kripken commented Dec 14, 2023

Thanks for these fixes @kon72 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants