Skip to content

[Bazel] Support for feature debug fission in emsdk-bazel-toolchain #1479 #1531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions bazel/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,18 @@ alias(
}),
)

alias(
name = "dwp_files",
actual = select({
":linux": "@emscripten_bin_linux//:dwp_files",
":linux_arm64": "@emscripten_bin_linux_arm64//:dwp_files",
":macos": "@emscripten_bin_mac//:dwp_files",
":macos_arm64": "@emscripten_bin_mac_arm64//:dwp_files",
":windows": "@emscripten_bin_win//:dwp_files",
"//conditions:default": ":empty",
}),
)

platform(
name = "platform_wasm",
constraint_values = [
Expand Down
7 changes: 7 additions & 0 deletions bazel/emscripten_deps.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,13 @@ filegroup(
],
),
)

filegroup(
name = "dwp_files",
srcs = [
"bin/llvm-dwp{bin_extension}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need emdwp.py like the above uses emar.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also related to the problem of passing environment variables to the dwp tool:
emdwp.py script base on some envs to ensure where are llvm bins are.
#1531 (comment)

],
)
"""

def emscripten_deps(emscripten_version = "latest"):
Expand Down
15 changes: 14 additions & 1 deletion bazel/emscripten_toolchain/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,25 @@ filegroup(
],
)

filegroup(
name = "dwp_files",
srcs = [
"emdwp-emscripten_bin_linux_arm64.sh",
"emdwp-emscripten_bin_linux.sh",
"emdwp-emscripten_bin_mac_arm64.sh",
"emdwp-emscripten_bin_mac.sh",
"emdwp-emscripten_bin_win.bat",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this tool require 5 files where the above tools all just use two (one .sh and one .bat)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why the dwp/emdwp_script tool doesn't have access to the environment variables, OR
I don't know how to make it pass these variables to this tool.

I've seen how to do it for emcc and emcc_link in bazel/emscripten_toolchain/toolchain.bzl,
that this is done by means of env_set for specific actions, I tried to complete the action list but it didn't help.
So I got around this by using five dedicated files.

I also ask about this on bazel-slack (Fri, Feb 14th), no answer yet: https://bazelbuild.slack.com/archives/CGA9QFQ8H/p1739488236571719

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printenv from dwp tool:

...
SUBCOMMAND: # //:main [action 'CcGenerateDwp main.dwp', configuration: 77bdbcfa4ce6db4cf8539adb5465d4c7ee5d7734a59bd2c6e99a60f4e1c191e4, execution platform: @@platforms//host:host, mnemonic: CcGenerateDwp]
(cd /private/var/tmp/_bazel_damian/1c7f831e8ac7ba8a4146146efff137a8/execroot/_main && \
  exec env - \
  external/emsdk/emscripten_toolchain/emdwp-emscripten_bin_mac_arm64.sh bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/bar.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/foo.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/main/main.dwo -o bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/main.dwp)
# Configuration: 77bdbcfa4ce6db4cf8539adb5465d4c7ee5d7734a59bd2c6e99a60f4e1c191e4
# Execution platform: @@platforms//host:host
INFO: From CcGenerateDwp main.dwp:
+ printenv
TMPDIR=/tmp
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
PWD=/private/var/tmp/_bazel_damian/1c7f831e8ac7ba8a4146146efff137a8/sandbox/darwin-sandbox/30/execroot/_main
SHLVL=1
_=/usr/bin/printenv
+ exec external/emscripten_bin_mac_arm64/bin/llvm-dwp bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/bar.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/foo.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/main/main.dwo -o bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/main.dwp
…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why environment variables are unavailable is in bazel itself.
The action named CcGenerateDwp is declared here: LINK

ctx.actions.run(
   mnemonic = "CcGenerateDwp",
   tools = packager["tools"],
   executable = packager["executable"],
   toolchain = cc_helper.CPP_TOOLCHAIN_TYPE,
   arguments = [packager["arguments"]],
   inputs = packager["inputs"],
   outputs = packager["outputs"],
)

As you can see, the env variable is not declared for this action.

For comparison, I found the CppArchive action where the env variable is set LINK:

   env = cc_common.get_environment_variables(
       feature_configuration = feature_configuration,
       action_name = CPP_LINK_STATIC_LIBRARY_ACTION_NAME,
       variables = archiver_variables,
   )

   # TODO(bazel-team): PWD=/proc/self/cwd env var is missing, but it is present when an analogous archiving
   # action is created by cc_library
   ctx.actions.run(
       executable = archiver_path,
       toolchain = cc_helper.CPP_TOOLCHAIN_TYPE,
       arguments = [args],
       env = env,
       inputs = depset(
           direct = object_files,
           transitive = [
               cc_toolchain.all_files,
           ],
       ),
       use_default_shell_env = True,
       outputs = [output_file],
       mnemonic = "CppArchive",
   )

So as you can see, on the master version (>8.1.0), these variables will still not be there.
So I don't know if it's worth waiting for it at all, or using what I suggested, or improving it somehow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It we do end up landed this alternative method of have 5 different scripts, I think we should at least document why its needed, and perhaps link to a bazel bug, so we can potentially revert to the more common pattern in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can report this to bazel and add a link to the report here.
The question is, where should I add a comment/TODO with a link in the sourcecode to make it clear enough?

In a comment in each of the five scripts? Or somewhere in a collective place? Maybe link the comment to this PR there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a comment in each of the 5 scripts sounds good.

Something like "This scripts is needed because we cannot use the emscripten emdwp.py entry point because of a limiation/bug with bazel: "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll prepare something for tomorrow.

Copy link
Contributor Author

@trzeciak trzeciak Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think about it, it's probably possible to do it like this:

  • add an emdwp-template.py file with the following content:
#!/bin/bash
#
#  This script…
#

export EMCC_WASM_BACKEND=__EMCC_WASM_BACKEND__
export EM_BIN_PATH=__EM_BIN_PATH__
export EM_CONFIG_PATH=__EM_CONFIG_PATH__
export NODE_JS_PATH=__NODE_JS_PATH__

source $(dirname $0)/env.sh

exec python3 $EMSCRIPTEN/tools/emdwp.py "$@"
  • add some bazel-stuff to extract those variable using select functionality to something like TemplateVariableInfo
  • add genrule to update emdwp-template.py based on TemplateVariableInfo
  • and connect the output of this genrule as a input of cc_toolchain.dwp_files

I think this should work. Unfortunately it's a bit beyond the time I wanted to spend on it.
On the other hand, I think it will be less readable than it is now (although I may be biased/lazy).

Copy link
Contributor Author

@trzeciak trzeciak Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reported it to bazel: bazelbuild/bazel#25336

I also updated all scripts to add this comment:

This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
For more info, see PR discussion and bazel issue:
- https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
- https://github.com/bazelbuild/bazel/issues/25336

"@emsdk//:dwp_files",
],
)

filegroup(
name = "all_files",
srcs = [
":ar_files",
":compiler_files",
":linker_files",
":dwp_files",
],
)

Expand All @@ -75,7 +88,7 @@ cc_toolchain(
ar_files = ":ar_files",
as_files = ":empty",
compiler_files = ":compiler_files",
dwp_files = ":empty",
dwp_files = ":dwp_files",
linker_files = ":linker_files",
objcopy_files = ":empty",
strip_files = ":empty",
Expand Down
10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/emdwp-emscripten_bin_linux.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/bin/bash
#
# This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
# Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
# For more info, see PR discussion and bazel issue:
# - https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
# - https://github.com/bazelbuild/bazel/issues/25336
#

exec external/emscripten_bin_linux/bin/llvm-dwp "$@"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The emdwp.py wrapper should be able to find these files based on LLVM_ROOT, so I don't think you need 5 different wrappers here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but it doesn't work / or I don't know how to do it, #1531 (comment)

10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/emdwp-emscripten_bin_linux_arm64.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/bin/bash
#
# This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
# Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
# For more info, see PR discussion and bazel issue:
# - https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
# - https://github.com/bazelbuild/bazel/issues/25336
#

exec external/emscripten_bin_linux_arm64/bin/llvm-dwp "$@"
10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/emdwp-emscripten_bin_mac.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/bin/bash
#
# This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
# Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
# For more info, see PR discussion and bazel issue:
# - https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
# - https://github.com/bazelbuild/bazel/issues/25336
#

exec external/emscripten_bin_mac/bin/llvm-dwp "$@"
10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/emdwp-emscripten_bin_mac_arm64.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/bin/bash
#
# This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
# Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
# For more info, see PR discussion and bazel issue:
# - https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
# - https://github.com/bazelbuild/bazel/issues/25336
#

exec external/emscripten_bin_mac_arm64/bin/llvm-dwp "$@"
10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/emdwp-emscripten_bin_win.bat
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
::
:: This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
:: Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
:: For more info, see PR discussion and bazel issue:
:: - https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
:: - https://github.com/bazelbuild/bazel/issues/25336
::
@ECHO OFF

call external\emscripten_bin_win\bin\llvm-dwp %*
46 changes: 46 additions & 0 deletions bazel/emscripten_toolchain/toolchain.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -72,12 +72,14 @@ def _impl(ctx):

emscripten_dir = ctx.attr.emscripten_binaries.label.workspace_root
nodejs_path = ctx.file.nodejs_bin.path
emscripten_name = ctx.attr.emscripten_binaries.label.workspace_name

builtin_sysroot = emscripten_dir + "/emscripten/cache/sysroot"

emcc_script = "emcc.%s" % ctx.attr.script_extension
emcc_link_script = "emcc_link.%s" % ctx.attr.script_extension
emar_script = "emar.%s" % ctx.attr.script_extension
emdwp_script = "emdwp-%s.%s" % (emscripten_name, ctx.attr.script_extension)

################################################################
# Tools
Expand All @@ -99,6 +101,7 @@ def _impl(ctx):
tool_path(name = "nm", path = "NOT_USED"),
tool_path(name = "objdump", path = "/bin/false"),
tool_path(name = "strip", path = "NOT_USED"),
tool_path(name = "dwp", path = emdwp_script),
]

################################################################
Expand Down Expand Up @@ -460,6 +463,49 @@ def _impl(ctx):
feature(
name = "wasm_standalone",
),
# Support for debug fission. In short, debugging fission should:
# * reduce linking time, RAM usage and disk usage
# * speed up incremental builds
# * speed up debugger work (reduce startup and breakpoint time)
# (to use this, follow the --fission=yes flag)
# https://developer.chrome.com/blog/faster-wasm-debugging
# https://bazel.build/docs/user-manual#fission
feature(
name = "per_object_debug_info",
flag_sets = [
flag_set(
actions = [
ACTION_NAMES.c_compile,
ACTION_NAMES.cpp_compile,
ACTION_NAMES.cpp_module_codegen,
ACTION_NAMES.assemble,
ACTION_NAMES.preprocess_assemble,
],
flag_groups = [
flag_group(
flags = ["-g", "-gsplit-dwarf", "-gdwarf-5", "-gpubnames"],
expand_if_available = "per_object_debug_info_file",
),
],
),
],
enabled = True,
),
feature(
name = "fission_support",
flag_sets = [
flag_set(
actions = all_link_actions,
flag_groups = [
flag_group(
flags = ["-sWASM_BIGINT"], # WASM_BIGINT required to support dwarf-5
expand_if_available = "is_using_fission",
),
],
),
],
enabled = True,
)
]

crosstool_default_flag_sets = [
Expand Down
10 changes: 10 additions & 0 deletions bazel/emscripten_toolchain/wasm_binary.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
import argparse
import os
import tarfile
import shutil


def ensure(f):
Expand All @@ -26,11 +27,20 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument('--archive', help='The archive to extract from.')
parser.add_argument('--outputs', help='Comma separated list of files that should be extracted from the archive. Only the extname has to match a file in the archive.')
parser.add_argument('--dwp_file', help='Optional dwp input file, generated when fission flags set.')
parser.add_argument('--allow_empty_outputs', help='If an output listed in --outputs does not exist, create it anyways.', action='store_true')
args = parser.parse_args()

args.archive = os.path.normpath(args.archive)
args.outputs = args.outputs.split(",")
args.dwp_file = os.path.normpath(args.dwp_file) if args.dwp_file else None

if args.dwp_file:
for idx, output in enumerate(args.outputs):
if output.endswith(".dwp"): # also update extension 'binary.dwp' to 'binary.wasm.dwp'
shutil.copy2(args.dwp_file, output)
args.outputs.pop(idx)
break

tar = tarfile.open(args.archive)

Expand Down
32 changes: 24 additions & 8 deletions bazel/emscripten_toolchain/wasm_cc_binary.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ _ALLOW_OUTPUT_EXTNAMES = [
".fetch.js",
".js.symbols",
".wasm.debug.wasm",
".wasm.dwp",
".html",
".aw.js",
]
Expand Down Expand Up @@ -107,10 +108,11 @@ _WASM_BINARY_COMMON_ATTRS = {
}

def _wasm_cc_binary_impl(ctx):
args = ctx.actions.args()
cc_target = ctx.attr.cc_target[0]
dwp_file = cc_target[DebugPackageInfo].dwp_file if DebugPackageInfo in cc_target else None
outputs = ctx.outputs.outputs

for output in ctx.outputs.outputs:
for output in outputs:
valid_extname = False
for allowed_extname in _ALLOW_OUTPUT_EXTNAMES:
if output.path.endswith(allowed_extname):
Expand All @@ -119,28 +121,35 @@ def _wasm_cc_binary_impl(ctx):
if not valid_extname:
fail("Invalid output '{}'. Allowed extnames: {}".format(output.basename, ", ".join(_ALLOW_OUTPUT_EXTNAMES)))

inputs = ctx.files.cc_target
args = ctx.actions.args()
args.add_all("--archive", ctx.files.cc_target)
args.add_joined("--outputs", ctx.outputs.outputs, join_with = ",")
args.add_joined("--outputs", outputs, join_with = ",")

if dwp_file:
args.add("--dwp_file", dwp_file)
inputs = inputs + [dwp_file]

ctx.actions.run(
inputs = ctx.files.cc_target,
outputs = ctx.outputs.outputs,
inputs = inputs,
outputs = outputs,
arguments = [args],
executable = ctx.executable._wasm_binary_extractor,
)

return [
DefaultInfo(
files = depset(ctx.outputs.outputs),
files = depset(outputs),
# This is needed since rules like web_test usually have a data
# dependency on this target.
data_runfiles = ctx.runfiles(transitive_files = depset(ctx.outputs.outputs)),
data_runfiles = ctx.runfiles(transitive_files = depset(outputs)),
),
OutputGroupInfo(_wasm_tar = cc_target.files),
]

def _wasm_cc_binary_legacy_impl(ctx):
cc_target = ctx.attr.cc_target[0]
dwp_file = cc_target[DebugPackageInfo].dwp_file if DebugPackageInfo in cc_target else None
outputs = [
ctx.outputs.loader,
ctx.outputs.wasm,
Expand All @@ -151,17 +160,23 @@ def _wasm_cc_binary_legacy_impl(ctx):
ctx.outputs.data,
ctx.outputs.symbols,
ctx.outputs.dwarf,
ctx.outputs.dwp,
ctx.outputs.html,
ctx.outputs.audio_worklet,
]

inputs = ctx.files.cc_target
args = ctx.actions.args()
args.add("--allow_empty_outputs")
args.add_all("--archive", ctx.files.cc_target)
args.add_joined("--outputs", outputs, join_with = ",")

if dwp_file:
args.add("--dwp_file", dwp_file)
inputs = inputs + [dwp_file]

ctx.actions.run(
inputs = ctx.files.cc_target,
inputs = inputs,
outputs = outputs,
arguments = [args],
executable = ctx.executable._wasm_binary_extractor,
Expand Down Expand Up @@ -202,6 +217,7 @@ def _wasm_binary_legacy_outputs(name, cc_target):
"data": "{}/{}.data".format(name, basename),
"symbols": "{}/{}.js.symbols".format(name, basename),
"dwarf": "{}/{}.wasm.debug.wasm".format(name, basename),
"dwp": "{}/{}.wasm.dwp".format(name, basename),
"html": "{}/{}.html".format(name, basename),
"audio_worklet": "{}/{}.aw.js".format(name, basename)
}
Expand Down