@dschuff dschuff commented Oct 5, 2019

wasm-sourcemap.py dumps the entire .debug_info section when creating source maps, even though the source maps themselves only contain information from the .debug_line section. It does this to find the compilation directory (DW_AT_comp_dir) of each compile unit, so that it can strip that directory from the path of each file in the debug line section. However, dumping the entire .debug_info section is unnecessary and very slow for large wasm files. Instead, dump only the top-level entries (the DW_TAG_compile_unit entries) and don't recurse into their children.

This goes a long way toward fixing #8948.
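
To illustrate the idea, here is a minimal sketch, not the actual code in wasm-sourcemap.py. It assumes `llvm-dwarfdump` prints one `DW_AT_...` attribute per indented line under each `DW_TAG_compile_unit` header, and that `DW_AT_stmt_list` can be used to tie a compile unit to its line table. The function names and the sample text are hypothetical.

```python
import re

# Assumed shape of the llvm-dwarfdump text output (sample data, not real output).
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_producer ("clang version 10.0.0")
              DW_AT_name ("a.c")
              DW_AT_stmt_list (0x00000000)
              DW_AT_comp_dir ("/home/user/proj")

0x00000040: DW_TAG_compile_unit
              DW_AT_name ("b.c")
              DW_AT_stmt_list (0x0000003c)
              DW_AT_comp_dir ("/home/user/other")
"""

CU_RE = re.compile(r"^0x[0-9a-f]+:[ \t]+DW_TAG_compile_unit\b", re.MULTILINE)
ATTR_RE = re.compile(r"^[ \t]+(DW_AT_\w+)[ \t]+\((.*)\)[ \t]*$", re.MULTILINE)


def dwarfdump_command(wasm_path, dwarfdump="llvm-dwarfdump"):
    """Build a command that prints only the top-level entries of .debug_info."""
    # --recurse-depth=0 stops llvm-dwarfdump from descending into children.
    return [dwarfdump, "--debug-info", "--recurse-depth=0", wasm_path]


def comp_dirs_by_stmt_list(text):
    """Map each compile unit's DW_AT_stmt_list offset to its DW_AT_comp_dir."""
    starts = [m.start() for m in CU_RE.finditer(text)]
    result = {}
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        attrs = dict(ATTR_RE.findall(text[begin:end]))
        if "DW_AT_stmt_list" in attrs and "DW_AT_comp_dir" in attrs:
            offset = int(attrs["DW_AT_stmt_list"], 16)
            result[offset] = attrs["DW_AT_comp_dir"].strip('"')
    return result


def strip_comp_dir(path, comp_dir):
    """Remove the compilation directory prefix from a file path, if present."""
    prefix = comp_dir.rstrip("/") + "/"
    return path[len(prefix):] if path.startswith(prefix) else path
```

With the sample above, `comp_dirs_by_stmt_list(SAMPLE)` yields one entry per compile unit, and `strip_comp_dir("/home/user/proj/src/a.c", "/home/user/proj")` gives the relative path `src/a.c`.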

@kripken kripken left a comment

Very cool!

@dschuff dschuff merged commit 87980c0 into incoming Oct 7, 2019
@dschuff dschuff deleted the nodebuginfo branch October 7, 2019 19:29
belraquib pushed a commit to belraquib/emscripten that referenced this pull request Dec 23, 2020
…ten-core#9580)

aheejin added a commit to aheejin/emscripten that referenced this pull request Aug 23, 2025
This adds support for the `names` field in source maps, which contains
function names. Source map mappings are updated accordingly, and
emsymbolizer can now provide function names using source maps alone.

To enable this, use `emcc -gsource-map=names`. This also adds separate
internal settings for name generation and for the existing source
embedding, which makes them more readable. Because they are internal
settings, they don't increase the number of external options. When you
run `wasm-sourcemap.py` standalone, use `--names`.

Although we already have the name section and DWARF, I think this is
generally good to support, given that the field exists for this purpose
and JS source maps support it. It looks like the Dart toolchain also
supports it:
https://github.com/dart-lang/sdk/blob/187c3cb004b5f6a0a1f1b242b7d1b8a6b33b9a7a/pkg/wasm_builder/lib/source_map.dart#L105-L118

To measure the source map size increase, I ran this on `wasm-opt.wasm`
built by the `if (EMSCRIPTEN)` setup here
(https://github.com/WebAssembly/binaryen/blob/969bf763a495b475e2a28163e7d70a5dd01f9dda/CMakeLists.txt#L299-L365)
with `-gsource-map` vs. `-gsource-map=names`. The source map file size
increased from 352743 to 443373 bytes, about 26%.

While I think a 26% increase in source map file size is tolerable, this
option is off by default because with it we can't use the optimization
from emscripten-core#9580. So far we only needed the
`DW_TAG_compile_unit`s in the `llvm-dwarfdump` output, so we could get
away with printing only the top-level tags using `--recurse-depth=0`.
But to gather function information, we need to parse every
`DW_TAG_subprogram`, which can be at any depth (because functions can
be nested within namespaces or classes). So the trick in
emscripten-core#9580 does not work, and dumping the entire `.debug_info`
section will be slow. To avoid this problem, we could consider using a
DWARF-parsing Python library such as
https://github.com/eliben/pyelftools, but that would add another
third-party dependency, so I'm not sure it's worth it at this point.

Fixes emscripten-core#20715.
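
As a rough illustration of what the `names` field adds, here is a minimal sketch, not the code in `wasm-sourcemap.py`. It assumes the usual source map v3 encoding: each mapping segment is a run of base64 VLQ deltas, and a fifth field, when present, is a delta-encoded index into `names`. It also assumes all segments sit on a single generated "line", with the wasm address in the generated-column slot. The helper names and the sample entries are hypothetical.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_vlq(value):
    """Encode a signed integer as a base64 VLQ string."""
    # The sign goes into the lowest bit, the magnitude into the rest.
    bits = (value << 1) if value >= 0 else ((-value << 1) | 1)
    out = ""
    while True:
        digit = bits & 31
        bits >>= 5
        if bits:
            digit |= 32  # continuation bit: more digits follow
        out += B64[digit]
        if not bits:
            return out


def build_source_map(entries, sources):
    """entries: (address, source_index, line, column, function_name or None)."""
    names, name_index, segments = [], {}, []
    prev = [0, 0, 0, 0]  # previous address, source index, line, column
    prev_name = 0
    for address, source, line, column, func in entries:
        fields = [address, source, line, column]
        segment = "".join(encode_vlq(cur - old) for cur, old in zip(fields, prev))
        prev = fields
        if func is not None:
            index = name_index.setdefault(func, len(names))
            if index == len(names):
                names.append(func)  # first time this name is seen
            # The optional fifth field is relative to the last name used.
            segment += encode_vlq(index - prev_name)
            prev_name = index
        segments.append(segment)
    return {"version": 3, "sources": sources, "names": names,
            "mappings": ",".join(segments)}
```

Each function name is stored once in `names` and referenced by index from the segments, so the size increase comes mostly from the name strings plus one extra VLQ field per named segment.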
aheejin added a commit to aheejin/emscripten that referenced this pull request Nov 20, 2025
aheejin added a commit to aheejin/emscripten that referenced this pull request Nov 25, 2025
aheejin added a commit to aheejin/emscripten that referenced this pull request Nov 26, 2025
aheejin added a commit that referenced this pull request Dec 9, 2025
This adds support for the `names` field in source maps, which contains
function names. Source map mappings are updated accordingly, and
emsymbolizer can now provide function names using source maps alone.

While source maps don't provide the full inlining hierarchy, this
provides the name of the original (= pre-inlining) function, which may
not exist in the final binary because it was inlined. This is because
source maps are primarily intended for user debugging.

This also demangles C++ function names using `llvm-cxxfilt`, so the
printed names are human-readable.

I tested with `wasm-opt.wasm` from Binaryen, built by the
`if (EMSCRIPTEN)` setup here:

https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372
with `-g -gsource-map`. With this PR and
WebAssembly/binaryen#8068, the source map file
size increases by 3.5x (8632423 -> 30070042), primarily due to the
function name strings.

Parsing the `llvm-dwarfdump` output now also requires handling
`DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags, which can be
at any depth (because functions can be nested within namespaces or
classes), so we can no longer use `--recurse-depth=0` (#9580). For
`wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0`
from the command line increased the size of the text output by 27.5x,
but with the `--filter-child-tag` / `-t` option
(llvm/llvm-project#165720) the text output increased by only 3.2x,
which I think is tolerable. Because `-t` was added recently, `names`
field generation is disabled when `llvm-dwarfdump` does not support it.
To avoid this text size problem, we could consider using a
DWARF-parsing Python library such as
https://github.com/eliben/pyelftools, but that would add another
third-party dependency, so I'm not sure it's worth it at this point.

This also increased the running time of `wasm-sourcemap.py` on
`wasm-opt.wasm` by 2.3x (6.6s -> 15.4s), but compared to the linking
time this was not very noticeable.

Fixes #20715 and closes #25116.
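
To show why the function tags matter, here is a minimal sketch of turning `DW_TAG_subprogram` entries into an address-to-name lookup. It is an illustration under assumptions, not the code in `wasm-sourcemap.py`: it assumes `llvm-dwarfdump` prints `DW_AT_low_pc` and `DW_AT_high_pc` as absolute hexadecimal addresses, it ignores `DW_TAG_inlined_subroutine`, and it skips demangling. The function names and sample text are hypothetical.

```python
import bisect
import re

# Assumed shape of the llvm-dwarfdump text output (sample data, not real output).
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name ("a.cpp")

0x00000026:   DW_TAG_namespace
                DW_AT_name ("ns")

0x0000002b:     DW_TAG_subprogram
                  DW_AT_low_pc (0x0000000000000005)
                  DW_AT_high_pc (0x0000000000000020)
                  DW_AT_linkage_name ("_ZN2ns3fooEv")
                  DW_AT_name ("foo")

0x00000044:   DW_TAG_subprogram
                DW_AT_low_pc (0x0000000000000021)
                DW_AT_high_pc (0x0000000000000040)
                DW_AT_name ("main")
"""

DIE_RE = re.compile(r"^0x[0-9a-f]+:[ \t]+(DW_TAG_\w+)", re.MULTILINE)
ATTR_RE = re.compile(r"^[ \t]+(DW_AT_\w+)[ \t]+\((.*)\)[ \t]*$", re.MULTILINE)


def function_ranges(text):
    """Return sorted (low_pc, high_pc, name) tuples for every subprogram."""
    dies = list(DIE_RE.finditer(text))
    ranges = []
    for die, following in zip(dies, dies[1:] + [None]):
        if die.group(1) != "DW_TAG_subprogram":
            continue
        end = following.start() if following else len(text)
        attrs = dict(ATTR_RE.findall(text[die.end():end]))
        if not {"DW_AT_low_pc", "DW_AT_high_pc", "DW_AT_name"} <= attrs.keys():
            continue  # e.g. declarations without an address range
        ranges.append((int(attrs["DW_AT_low_pc"], 16),
                       int(attrs["DW_AT_high_pc"], 16),
                       attrs["DW_AT_name"].strip('"')))
    ranges.sort()
    return ranges


def lookup(ranges, address):
    """Find the function whose [low_pc, high_pc) range contains address."""
    i = bisect.bisect_right(ranges, (address, float("inf"))) - 1
    if i >= 0 and ranges[i][0] <= address < ranges[i][1]:
        return ranges[i][2]
    return None
```

The subprogram inside the namespace sits one level deeper than the other, which is why a fixed `--recurse-depth` cannot reach every function.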