Don't dump full debug info section when creating source maps #9580
Merged
kripken (Member) approved these changes on Oct 5, 2019 and left a comment:
Very cool!
belraquib pushed a commit to belraquib/emscripten that referenced this pull request on Dec 23, 2020:
…ten-core#9580) wasm-sourcemap.py dumps the entire .debug_info section when creating source maps (which really only contain the info from the .debug_line section). It does this because it wants to find the compilation directory (DW_AT_comp_dir) associated with each compile unit, and remove it from the path of each file in the debug line section. However, dumping the entire debug info section is very slow for large wasm files, and is unnecessary. Instead, only dump the top-level entries (the DW_TAG_compile_units) and don't recurse into their children. This goes a long way toward fixing emscripten-core#8948.
aheejin added a commit to aheejin/emscripten that referenced this pull request on Aug 23, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. To do this, use `emcc -gsource-map=names`. This also adds separate internal settings for name generation and for the existing source embedding, which makes them more readable. Because they are internal settings, they don't increase the number of external options. When you run `wasm-sourcemap.py` standalone, you can use `--names`. Although we already have the name section and DWARF, I think this is generally good to support, given that the field exists for that purpose and JS source maps support it. It looks like the Dart toolchain also supports it (https://github.com/dart-lang/sdk/blob/187c3cb004b5f6a0a1f1b242b7d1b8a6b33b9a7a/pkg/wasm_builder/lib/source_map.dart#L105-L118). To measure the source map size increase, I ran this on `wasm-opt.wasm` built with the `if (EMSCRIPTEN)` setup here (https://github.com/WebAssembly/binaryen/blob/969bf763a495b475e2a28163e7d70a5dd01f9dda/CMakeLists.txt#L299-L365), with `-gsource-map` vs. `-gsource-map=names`. The source map file size increased from 352743 to 443373, about 25%. While I think a 25% increase in source map file size is tolerable, this option is off by default, because with it we can't use the trick from emscripten-core#9580. So far we only needed the `DW_TAG_compile_unit`s in the `llvm-dwarfdump` output, and for that we could get away with printing only the top-level tags using `--recurse-depth=0`. But to gather function information, we need to parse all `DW_TAG_subprogram`s, which can be at any depth (because functions can be within nested namespaces or classes). So the trick in emscripten-core#9580 does not work, and dumping the whole `.debug_info` section will be slow.
To avoid this problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point. Fixes emscripten-core#20715.
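For context on what the `names` field adds: in the Source Map v3 format, each segment of `mappings` has 1, 4 or 5 fields (generated column, source index, original line, original column, and an optional index into `names`), each stored as a Base64 VLQ delta from the previous value of that field. The following is a minimal encoder sketch, not the code in `wasm-sourcemap.py`; it assumes, as is typical for WebAssembly source maps, a single generated line whose "column" is the code offset, and the function names are hypothetical.

```python
B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def vlq_encode(value):
    """Encode one signed integer as a Base64 VLQ string."""
    # The sign is stored in the lowest bit of the first digit.
    v = ((-value) << 1) | 1 if value < 0 else value << 1
    out = []
    while True:
        digit = v & 0x1F  # 5 data bits per Base64 digit
        v >>= 5
        if v:
            digit |= 0x20  # continuation bit: more digits follow
        out.append(B64[digit])
        if not v:
            return ''.join(out)


def encode_mappings(segments):
    """Encode the segments of a single generated line into a mappings string.

    Each segment is (code_offset, src_idx, src_line, src_col) or
    (code_offset, src_idx, src_line, src_col, name_idx), absolute and 0-based.
    Each field is written as a delta from the same field in the most recent
    segment that had it.
    """
    prev = [0, 0, 0, 0, 0]
    encoded = []
    for seg in segments:
        encoded.append(''.join(vlq_encode(v - prev[i]) for i, v in enumerate(seg)))
        prev[:len(seg)] = seg
    return ','.join(encoded)
```

For example, `encode_mappings([(5, 0, 9, 2, 0), (12, 0, 3, 4, 1), (20, 0, 4, 4)])` returns `'KASEA,OANEC,QACA'`: the first two segments point at `names[0]` and `names[1]`, and the four-field segment carries no name. The extra fifth field per segment, plus the name strings themselves, is where the size increase comes from.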
This was referenced Aug 25, 2025
aheejin added a commit to aheejin/emscripten that referenced this pull request on Nov 20, 2025.
aheejin added a commit to aheejin/emscripten that referenced this pull request on Nov 25, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. While source maps don't provide the full inlining hierarchy, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable. I tested with `wasm-opt.wasm` from Binaryen. With this change and #???, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings. This also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags in the `llvm-dwarfdump` output, which can be at any depth (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the `llvm-dwarfdump` text output by 27.5x, but with the `--child-tags` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. This disables `names` field generation when the `-t` option is not available in `llvm-dwarfdump`, because the option was added only recently. To avoid this text size problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point. This also increases running time of Fixes emscripten-core#20715 and closes emscripten-core#25116.
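Why `--recurse-depth=0` stops being enough can be seen in a simplified sketch: function entries may sit at any nesting depth, so a scanner has to examine every DIE header rather than only top-level ones. This hypothetical helper is not the PR's parser; it assumes llvm-dwarfdump's default text layout (a `0x...: DW_TAG_...` header per entry, with nested entries indented further) and takes the name from `DW_AT_name` or, for inlined calls, from the quoted name printed after `DW_AT_abstract_origin`. It ignores other attributes such as `DW_AT_linkage_name` and address ranges.

```python
import re

# DIE header, e.g. "0x0000002a:     DW_TAG_subprogram". Nested DIEs are
# indented further, so the pattern accepts any amount of whitespace.
DIE_RE = re.compile(r'^0x[0-9a-fA-F]+:\s+(DW_TAG_\w+)')
# A quoted name from DW_AT_name ("f") or DW_AT_abstract_origin (0x... "f").
NAME_RE = re.compile(
    r'^\s+(?:DW_AT_name|DW_AT_abstract_origin)\s*\((?:0x[0-9a-fA-F]+ )?"([^"]*)"\)')

FUNC_TAGS = {'DW_TAG_subprogram', 'DW_TAG_inlined_subroutine'}


def function_dies(dump_text):
    """Return (tag, name) for every function-like DIE, at any nesting depth."""
    found = []
    tag = name = None
    for line in dump_text.splitlines():
        header = DIE_RE.match(line)
        if header:
            if tag in FUNC_TAGS:
                found.append((tag, name))  # finish the previous DIE
            tag, name = header.group(1), None
            continue
        attr = NAME_RE.match(line)
        if attr and name is None:
            name = attr.group(1)
    if tag in FUNC_TAGS:
        found.append((tag, name))
    return found
```

A `DW_TAG_subprogram` inside a `DW_TAG_namespace` inside a compile unit is found the same way as a top-level one, which is exactly the information a depth-0 dump leaves out.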
aheejin added a commit to aheejin/emscripten that referenced this pull request on Nov 25, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. While source maps don't provide the full inlining hierarchy, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable. I tested with `wasm-opt.wasm` from Binaryen. With this change and #???, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings. This also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags in the `llvm-dwarfdump` output, which can be at any depth (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the `llvm-dwarfdump` text output by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. This disables `names` field generation when the `-t` option is not available in `llvm-dwarfdump`, because the option was added only recently. To avoid this text size problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point. This also increases running time of Fixes emscripten-core#20715 and closes emscripten-core#25116.
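Because the `-t` option is new, a tool has to detect it before relying on it. One possible approach, sketched here as an assumption rather than the PR's actual check, is to look for the option name in `llvm-dwarfdump --help` output and skip `names` generation when it is absent. This relies on the option being listed in the regular help text; the helper names are hypothetical.

```python
import subprocess


def help_mentions_option(help_text, option):
    """True if `option` appears as a whole option token in --help output.

    Tokens such as '--recurse-depth=<N>' are compared without the '=...' part.
    """
    for token in help_text.replace(',', ' ').split():
        if token.split('=', 1)[0] == option:
            return True
    return False


def dwarfdump_supports(option, dwarfdump='llvm-dwarfdump'):
    """Best-effort check: False if the tool is missing or does not list the option."""
    try:
        result = subprocess.run([dwarfdump, '--help'],
                                capture_output=True, text=True)
    except OSError:  # e.g. the executable was not found
        return False
    return help_mentions_option(result.stdout, option)
```

A caller could then emit the `names` field only when `dwarfdump_supports('--filter-child-tag')` is true, mirroring the fallback the commit message describes. Matching whole tokens avoids treating a prefix such as `--filter` as a hit.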
aheejin added a commit to aheejin/emscripten that referenced this pull request on Nov 26, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. While source maps don't provide the full inlining hierarchy, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable. I tested with `wasm-opt.wasm` from Binaryen, built using the `if (EMSCRIPTEN)` setup here (https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372) with `-g -gsource-map`. With this PR and #???, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings. This also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags in the `llvm-dwarfdump` output, which can be at any depth (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the `llvm-dwarfdump` text output by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. This disables `names` field generation when the `-t` option is not available in `llvm-dwarfdump`, because the option was added only recently. To avoid this text size problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point.
This also increased the running time of `wasm-sourcemap.py` on `wasm-opt.wasm` by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes emscripten-core#20715 and closes emscripten-core#25116.
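Since running time matters here, spawning `llvm-cxxfilt` once per symbol would be wasteful. A natural design, sketched below as an assumption rather than the PR's code, is to demangle all distinct names in one batch: `llvm-cxxfilt` reads names from standard input when none are given on the command line and prints one result per input line, typically leaving names it cannot demangle unchanged. The helper names are hypothetical.

```python
import subprocess


def demangle_with_cxxfilt(names, cxxfilt='llvm-cxxfilt'):
    """Demangle names in one process: one name per input line, one per output line."""
    result = subprocess.run([cxxfilt], input='\n'.join(names) + '\n',
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def demangle_unique(names, demangle_batch):
    """Demangle each distinct name once, then map results back in input order."""
    if not names:
        return []
    unique = sorted(set(names))
    results = demangle_batch(unique)
    if len(results) != len(unique):
        raise RuntimeError('demangler returned %d lines for %d names'
                           % (len(results), len(unique)))
    table = dict(zip(unique, results))
    return [table[n] for n in names]
```

Usage would be `demangle_unique(raw_names, demangle_with_cxxfilt)`. Passing the batch function in as a parameter keeps the deduplication logic independent of the external tool.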
aheejin added a commit to aheejin/emscripten that referenced this pull request on Nov 26, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. While source maps don't provide the full inlining hierarchy, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable. I tested with `wasm-opt.wasm` from Binaryen, built using the `if (EMSCRIPTEN)` setup here (https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372) with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings. This also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags in the `llvm-dwarfdump` output, which can be at any depth (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the `llvm-dwarfdump` text output by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. This disables `names` field generation when the `-t` option is not available in `llvm-dwarfdump`, because the option was added only recently. To avoid this text size problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point.
This also increased the running time of `wasm-sourcemap.py` on `wasm-opt.wasm` by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes emscripten-core#20715 and closes emscripten-core#25116.
aheejin added a commit that referenced this pull request on Dec 9, 2025:
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information using only source maps. While source maps don't provide the full inlining hierarchy, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable. I tested with `wasm-opt.wasm` from Binaryen, built using the `if (EMSCRIPTEN)` setup here (https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372) with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings. This also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags in the `llvm-dwarfdump` output, which can be at any depth (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (#9580) anymore. For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the `llvm-dwarfdump` text output by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. This disables `names` field generation when the `-t` option is not available in `llvm-dwarfdump`, because the option was added only recently. To avoid this text size problem, we could consider using a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but this would add another third-party dependency, so I'm not sure it's worth it at this point.
This also increased the running time of `wasm-sourcemap.py` on `wasm-opt.wasm` by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes #20715 and closes #25116.
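With `names` in the map, a consumer can answer "which function is at this code offset?" from the source map alone. The following is a hypothetical sketch, not emsymbolizer's actual code: it decodes a single-line `mappings` string back into absolute segments (Base64 VLQ, each field a delta from its previous value) and picks the last segment at or before the address with `bisect`. It assumes the WebAssembly convention of one generated line whose column is the code offset.

```python
import bisect

B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
B64_INDEX = {ch: i for i, ch in enumerate(B64)}


def decode_mappings(mappings):
    """Decode a single-line mappings string into absolute segments.

    Each result is a tuple of 1, 4 or 5 ints:
    (code_offset[, src_idx, src_line, src_col[, name_idx]]), all 0-based.
    """
    state = [0, 0, 0, 0, 0]  # running totals; fields are stored as deltas
    segments = []
    for seg in mappings.split(','):
        if not seg:
            continue
        values, acc, shift = [], 0, 0
        for ch in seg:
            digit = B64_INDEX[ch]
            acc += (digit & 0x1F) << shift
            if digit & 0x20:  # continuation bit: more digits follow
                shift += 5
            else:  # last digit: the lowest bit of acc is the sign
                values.append(-(acc >> 1) if acc & 1 else acc >> 1)
                acc, shift = 0, 0
        for i, delta in enumerate(values):
            state[i] += delta
        segments.append(tuple(state[:len(values)]))
    return segments


def lookup(segments, sources, names, address):
    """Return (source, line, column, function_name) for a code address.

    Uses the last segment whose offset is <= address. Line and column are
    reported 1-based; function_name is None when the segment has no name
    index. Returns None if no segment with source info covers the address.
    """
    offsets = [seg[0] for seg in segments]
    i = bisect.bisect_right(offsets, address) - 1
    if i < 0 or len(segments[i]) < 4:
        return None
    seg = segments[i]
    name = names[seg[4]] if len(seg) == 5 else None
    return sources[seg[1]], seg[2] + 1, seg[3] + 1, name
```

For example, with `segs = decode_mappings('KASEA,OANEC,QACA')`, `lookup(segs, ['a.cpp'], ['main', 'helper'], 15)` returns `('a.cpp', 4, 5, 'helper')`.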
`wasm-sourcemap.py` dumps the entire `.debug_info` section when creating source maps, even though a source map really only contains the information from the `.debug_line` section. It does this because it wants to find the compilation directory (`DW_AT_comp_dir`) associated with each compile unit, and remove it from the path of each file in the debug line section. However, dumping the entire debug info section is very slow for large wasm files, and is unnecessary. Instead, only dump the top-level entries (the `DW_TAG_compile_unit`s) and don't recurse into their children. This goes a long way toward fixing #8948.
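To make the approach concrete, here is a minimal Python sketch, not the actual code in `wasm-sourcemap.py`. It asks `llvm-dwarfdump` for only the top-level entries using `--recurse-depth=0` (the flag the commit messages in this thread name for this trick), maps each compile unit's line-table offset (`DW_AT_stmt_list`, which points into `.debug_line`) to its `DW_AT_comp_dir`, and strips that directory from a file path. The regexes assume llvm-dwarfdump's default text layout, e.g. `DW_AT_comp_dir ("/path")`; the function names are hypothetical.

```python
import re
import subprocess

STMT_LIST_RE = re.compile(r'DW_AT_stmt_list\s*\((0x[0-9a-fA-F]+)\)')
COMP_DIR_RE = re.compile(r'DW_AT_comp_dir\s*\("([^"]*)"\)')


def dump_compile_units(wasm_path, dwarfdump='llvm-dwarfdump'):
    """Dump .debug_info, printing only top-level DIEs (no children)."""
    result = subprocess.run(
        [dwarfdump, '--debug-info', '--recurse-depth=0', wasm_path],
        capture_output=True, text=True, check=True)
    return result.stdout


def comp_dirs_by_line_table(dump_text):
    """Map each unit's DW_AT_stmt_list (.debug_line offset) to its comp dir."""
    comp_dirs = {}
    # With depth 0, the text between one DW_TAG_compile_unit and the next
    # holds only that unit's own attributes.
    for chunk in dump_text.split('DW_TAG_compile_unit')[1:]:
        stmt = STMT_LIST_RE.search(chunk)
        comp_dir = COMP_DIR_RE.search(chunk)
        if stmt and comp_dir:
            comp_dirs[int(stmt.group(1), 16)] = comp_dir.group(1)
    return comp_dirs


def strip_comp_dir(path, comp_dir):
    """Make `path` relative to `comp_dir` when it lies inside that directory."""
    prefix = comp_dir.rstrip('/') + '/'
    return path[len(prefix):] if path.startswith(prefix) else path
```

The output of `dump_compile_units` grows with the number of compile units rather than with the number of functions, types, and variables, which is why skipping the children is so much cheaper on large wasm files.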