Don't dump full debug info section when creating source maps #9580

dschuff · 2019-10-05T00:41:39Z

wasm-sourcemap.py dumps the entire .debug_info section when creating source maps (which really only contains the info from the .debug_line section). It does this because it wants to find the compilation directory (DW_AT_comp_dir) associated with each compile unit, and remove it from the path of each file in the debug line section. However dumping the entire debug info section is very slow for large wasm files, and is unnecessary. Instead, only dump the top-level entities (the DW_TAG_compile_unit and don't recurse into their children).

This goes a long way toward fixing #8948.

kripken

Very cool!

…ten-core#9580) wasm-sourcemap.py dumps the entire .debug_info section when creating source maps (which really only contains the info from the .debug_line section). It does this because it wants to find the compilation directory (DW_AT_comp_dir) associated with each compile unit, and remove it from the path of each file in the debug line section. However dumping the entire debug info section is very slow for large wasm files, and is unnecessary. Instead, only dump the top-level entities (the DW_TAG_compile_unit and don't recurse into their children). This goes a long way toward fixing emscripten-core#8948.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. To do this, you can use `emcc -gsource-map=names`. This also adds separate internal settings for this namd generation and the existing source embedding, making them more readable. Also because they are internal settings, they don't add the number of external options. When you run `wasm-sourcemap.py` standalone you can use `--names`. While we have the name sections and DWARF, I think it is generally good to support, given that the field exists for that purpose and JS source maps support it. It looks Dart toolchain also supports it: https://github.com/dart-lang/sdk/blob/187c3cb004b5f6a0a1f1b242b7d1b8a6b33b9a7a/pkg/wasm_builder/lib/source_map.dart#L105-L118 To measure source map size increase, I ran this on `wasm-opt.wasm` built by the `if (EMSCRIPTEN)` setup here (https://github.com/WebAssembly/binaryen/blob/969bf763a495b475e2a28163e7d70a5dd01f9dda/CMakeLists.txt#L299-L365) with `-gsource-map` vs. `-gsource-map=names`. The source map file size increased from 352743 to 443373, about 25%. While I think 25% increase of the source map file size is tolerable, this option is off by default, because with this we can't use emscripten-core#9580. So far we only needed `DW_TAG_compile_unit`s in `llvm-dwarfdump` results, and for that we could get away with printing only the top level tags using `--recurse-depth=0`. But to gather function information, we need to parse all `DW_TAG_subprogram`s, which can be at any depth (because functions can be within nested namespaces or classes). So the trick in emscripten-core#9580 does not work and dumping all `.debug_info` section will be slow. To avoid this problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. Fixes emscripten-core#20715.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen. With this change and #???, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--child-tags / -t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increases running time of Fixes emscripten-core#20715 and closes emscripten-core#25116.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen. With this change and #???, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increases running time of Fixes emscripten-core#20715 and closes emscripten-core#25116.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with `-g -gsource-map`. With this PR and #???, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increased running time of `wasm-sourcemap.py`, in case of the `wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes emscripten-core#20715 and closes emscripten-core#25116.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increased running time of `wasm-sourcemap.py`, in case of the `wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes emscripten-core#20715 and closes emscripten-core#25116.

This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increased running time of `wasm-sourcemap.py`, in case of the `wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes #20715 and closes #25116.

dschuff added 2 commits October 4, 2019 16:44

Don't dump debug info section when creating source maps

bc65e71

just print top-level

4493ecb

dschuff requested review from kripken and yurydelendik October 5, 2019 00:41

kripken approved these changes Oct 5, 2019

View reviewed changes

dschuff merged commit 87980c0 into incoming Oct 7, 2019

dschuff deleted the nodebuginfo branch October 7, 2019 19:29

sbc100 mentioned this pull request Oct 7, 2019

wasm-sourcemaps.py is slow and memory inefficient #8948

Closed

This was referenced Aug 25, 2025

Support names field in source maps #25044

Closed

Do source maps represent original location or current location? #25116

Closed

aheejin mentioned this pull request Nov 26, 2025

Support names field in source maps #25870

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Don't dump full debug info section when creating source maps #9580

Don't dump full debug info section when creating source maps #9580

Uh oh!

dschuff commented Oct 5, 2019

Uh oh!

kripken left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Don't dump full debug info section when creating source maps #9580

Don't dump full debug info section when creating source maps #9580

Uh oh!

Conversation

dschuff commented Oct 5, 2019

Uh oh!

kripken left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants