Skip to content

[Bazel] Support for feature debug fission in emsdk-bazel-toolchain #1479 #1531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

trzeciak
Copy link
Contributor

Support for debug fission. In short, debugging fission should:

  • reduce linking time, RAM usage and disk usage
  • speed up incremental builds
  • speed up debugger work (reduce startup and breakpoint time)

To use this, follow the --fission=yes flag.

References:

@trzeciak
Copy link
Contributor Author

This fixes issue 1479

@trzeciak
Copy link
Contributor Author

trzeciak commented Feb 19, 2025

Can I sync bazel for windows (from old ones 5.4.0), to be sync with linux/mac?
https://github.com/emscripten-core/emsdk/pull/1531/checks?check_run_id=37473055285

I'll try this, but so far no luck: #1532

(edited)
Okay, I managed to make Bazel 5.4.1 supported too, so it doesn't block this PR.

"emdwp-emscripten_bin_linux.sh",
"emdwp-emscripten_bin_mac_arm64.sh",
"emdwp-emscripten_bin_mac.sh",
"emdwp-emscripten_bin_win.bat",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this tool require 5 files where the above tools all just use two (one .sh and one .bat)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why the dwp/emdwp_script tool doesn't have access to the environment variables, OR
I don't know how to make it pass these variables to this tool.

I've seen how to do it for emcc and emcc_link in bazel/emscripten_toolchain/toolchain.bzl,
that this is done by means of env_set for specific actions, I tried to complete the action list but it didn't help.
So I got around this by using five dedicated files.

I also ask about this on bazel-slack (Fri, Feb 14th), no answer yet: https://bazelbuild.slack.com/archives/CGA9QFQ8H/p1739488236571719

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printenv from dwp tool:

...
SUBCOMMAND: # //:main [action 'CcGenerateDwp main.dwp', configuration: 77bdbcfa4ce6db4cf8539adb5465d4c7ee5d7734a59bd2c6e99a60f4e1c191e4, execution platform: @@platforms//host:host, mnemonic: CcGenerateDwp]
(cd /private/var/tmp/_bazel_damian/1c7f831e8ac7ba8a4146146efff137a8/execroot/_main && \
  exec env - \
  external/emsdk/emscripten_toolchain/emdwp-emscripten_bin_mac_arm64.sh bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/bar.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/foo.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/main/main.dwo -o bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/main.dwp)
# Configuration: 77bdbcfa4ce6db4cf8539adb5465d4c7ee5d7734a59bd2c6e99a60f4e1c191e4
# Execution platform: @@platforms//host:host
INFO: From CcGenerateDwp main.dwp:
+ printenv
TMPDIR=/tmp
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
PWD=/private/var/tmp/_bazel_damian/1c7f831e8ac7ba8a4146146efff137a8/sandbox/darwin-sandbox/30/execroot/_main
SHLVL=1
_=/usr/bin/printenv
+ exec external/emscripten_bin_mac_arm64/bin/llvm-dwp bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/bar.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/sdk/foo.dwo bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/_objs/main/main.dwo -o bazel-out/wasm-dbg-ST-6bab6f2a10f7/bin/main.dwp
…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why environment variables are unavailable is in bazel itself.
The action named CcGenerateDwp is declared here: LINK

ctx.actions.run(
   mnemonic = "CcGenerateDwp",
   tools = packager["tools"],
   executable = packager["executable"],
   toolchain = cc_helper.CPP_TOOLCHAIN_TYPE,
   arguments = [packager["arguments"]],
   inputs = packager["inputs"],
   outputs = packager["outputs"],
)

As you can see, the env variable is not declared for this action.

For comparison, I found the CppArchive action where the env variable is set LINK:

   env = cc_common.get_environment_variables(
       feature_configuration = feature_configuration,
       action_name = CPP_LINK_STATIC_LIBRARY_ACTION_NAME,
       variables = archiver_variables,
   )

   # TODO(bazel-team): PWD=/proc/self/cwd env var is missing, but it is present when an analogous archiving
   # action is created by cc_library
   ctx.actions.run(
       executable = archiver_path,
       toolchain = cc_helper.CPP_TOOLCHAIN_TYPE,
       arguments = [args],
       env = env,
       inputs = depset(
           direct = object_files,
           transitive = [
               cc_toolchain.all_files,
           ],
       ),
       use_default_shell_env = True,
       outputs = [output_file],
       mnemonic = "CppArchive",
   )

So as you can see, on the master version (>8.1.0), these variables will still not be there.
So I don't know if it's worth waiting for it at all, or using what I suggested, or improving it somehow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It we do end up landed this alternative method of have 5 different scripts, I think we should at least document why its needed, and perhaps link to a bazel bug, so we can potentially revert to the more common pattern in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can report this to bazel and add a link to the report here.
The question is, where should I add a comment/TODO with a link in the sourcecode to make it clear enough?

In a comment in each of the five scripts? Or somewhere in a collective place? Maybe link the comment to this PR there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a comment in each of the 5 scripts sounds good.

Something like "This scripts is needed because we cannot use the emscripten emdwp.py entry point because of a limiation/bug with bazel: "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll prepare something for tomorrow.

Copy link
Contributor Author

@trzeciak trzeciak Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think about it, it's probably possible to do it like this:

  • add an emdwp-template.py file with the following content:
#!/bin/bash
#
#  This script…
#

export EMCC_WASM_BACKEND=__EMCC_WASM_BACKEND__
export EM_BIN_PATH=__EM_BIN_PATH__
export EM_CONFIG_PATH=__EM_CONFIG_PATH__
export NODE_JS_PATH=__NODE_JS_PATH__

source $(dirname $0)/env.sh

exec python3 $EMSCRIPTEN/tools/emdwp.py "$@"
  • add some bazel-stuff to extract those variable using select functionality to something like TemplateVariableInfo
  • add genrule to update emdwp-template.py based on TemplateVariableInfo
  • and connect the output of this genrule as a input of cc_toolchain.dwp_files

I think this should work. Unfortunately it's a bit beyond the time I wanted to spend on it.
On the other hand, I think it will be less readable than it is now (although I may be biased/lazy).

Copy link
Contributor Author

@trzeciak trzeciak Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reported it to bazel: bazelbuild/bazel#25336

I also updated all scripts to add this comment:

This script differs in form from emcc.{py,bat}/…, because bazel are limited/bugged in the way of executing dwp tool.
Bazel dwp action configuration does not pass environment variables, so we cannot use them in this script.
For more info, see PR discussion and bazel issue:
- https://github.com/emscripten-core/emsdk/pull/1531#discussion_r1962090650
- https://github.com/bazelbuild/bazel/issues/25336

filegroup(
name = "dwp_files",
srcs = [
"bin/llvm-dwp{bin_extension}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need emdwp.py like the above uses emar.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also related to the problem of passing environment variables to the dwp tool:
emdwp.py script base on some envs to ensure where are llvm bins are.
#1531 (comment)

@@ -0,0 +1,3 @@
#!/bin/bash

exec external/emscripten_bin_linux/bin/llvm-dwp "$@"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The emdwp.py wrapper should be able to find these files based on LLVM_ROOT, so I don't think you need 5 different wrappers here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but it doesn't work / or I don't know how to do it, #1531 (comment)

phlax added a commit to envoyproxy/envoy that referenced this pull request Jun 2, 2025
Also, install before `proxy-wasm-cpp-sdk` to ensure it is used/patched

Currently its installed after, making our setup redundant

includes patch of emscripten-core/emsdk#1531

the context for this is that its required to use a hermetic llvm
toolchain (other than being generally useful)

without it, redundant hackery/patching would be required as the newer
platforms way of setting up toolchains expects _something_ to be setup
for various bins. Rather than adding a stub i figured add the fix. This
has historically been the main blocker to adding llvm toolchains, or
using the bazel platforms setup

Signed-off-by: Ryan Northey <ryan@synca.io>
@phlax
Copy link

phlax commented Jun 3, 2025

would be great for this to land it unblocks us (envoy)

@sbc100
Copy link
Collaborator

sbc100 commented Jun 3, 2025

would be great for this to land it unblocks us (envoy)

@phlax do you know much about Bazel itself? Would you be interested in sending them very small (maybe just one line) PR so that we don't need maintain all these extra wrapper scripts: bazelbuild/bazel#25336

parser.add_argument('--allow_empty_outputs', help='If an output listed in --outputs does not exist, create it anyways.', action='store_true')
args = parser.parse_args()

args.archive = os.path.normpath(args.archive)
args.outputs = args.outputs.split(",")
args.dwp_file = os.path.normpath(args.dwp_file) if args.dwp_file else None

if args.dwp_file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put args.dwp_file = os.path.normpath(args.dwp_file) here instead of line 36 (one less conditional).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. using dwp files is optional, won't that be a problem for the normpath method?


if args.dwp_file:
for idx, output in enumerate(args.outputs):
if output.endswith(".dwp"): # also update extension 'binary.dwp' to 'binary.wasm.dwp'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use single quotes here

@@ -26,11 +27,20 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument('--archive', help='The archive to extract from.')
parser.add_argument('--outputs', help='Comma separated list of files that should be extracted from the archive. Only the extname has to match a file in the archive.')
parser.add_argument('--dwp_file', help='Optional dwp input file, generated when fission flags set.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't the dwp file live in the archive like the rest on the files?

Is is because bazel actually supports the dwp file natively (unlike the other extra files we generate). If there is support for dwp natively in bazel why do we need to involve the wasm_binary.py script at all here? i.e. if its not part of the archive why we even involve the wasm-binary script at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is more complex, although I am no expert, I will try to shed more light on it:

  • during compilation we can generate .o files containing debugging information, which can be extracted from the binary after the linking phase (but this is not the case),
  • debug fission, is the extraction of part of the debug information during compilation, then during each compilation, two files are created: .o and .dwo,
  • a .dwp file is a collection of .dwo files
  • bazel aggregates .dwo files into (multiple, if necessary) .dwp files, and multiple .dwp files into a single .dwp file (~100-leaf tree)
  • dwp file is created as a separate target (file) next to the output file from the cc_binary target (the first time it wasn't obvious to me that if I have target cc_binary(name = "foo", …), then to get a .dwp file I have to add a requirement to the target (file): foo.dwp)
  • considering this, if we want to generate both files at once, we can add a filegroup that will have a dependency on both files (foo and foo.dwp)
  • forwarding dependencies to the dwp file via the wasm_binary.py file (~via wasm_cc_binary), simply makes the same configuration apply to both files,
    doing it differently would involve duplicating the wasm_cc_binary method.

A bit of text, but I hope you know what I mean.

@phlax
Copy link

phlax commented Jun 3, 2025

@sbc100 i can take a look if i get some spare cycles - but can i suggest we land this with the multiple scripts ... iiuc the fix would depend on the end users bazel version - and i know at least for us if it lands on bazel main now it may be a year+ before we are going to be up to that version

@trzeciak
Copy link
Contributor Author

trzeciak commented Jun 3, 2025

I would also suggest not to wait for changes bazelbuild/bazel#25336 as it will block merging this MR only with the latest Bazel version (which is not necessary now) and in worst case this version will be released in the next major version.
At the same time, although I tried to describe exactly what is needed, it has not been taken any further.
The solution itself may be "one-line", but someone may expect tests (which I generally agree with), which may require more understanding / more time.
At the same time, adding the requirement to upgrade bazel to the latest version (even though I think it's good and progressive), is no longer attractive to me (it's about other dependencies).

If we agreed not to wait for those changes, just add TODO in the code, then I can try to come back to this MR, merge the current branch, apply the reported fixes, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants