Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize sandbox performance #1045

Merged
merged 6 commits into from
May 9, 2022

Conversation

jfirebaugh
Copy link
Contributor

Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build times on macOS, where the sandbox is known to be slow (bazelbuild/bazel#8230).

Fixes #1026

Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build times on macOS, where the sandbox is known to be slow (bazelbuild/bazel#8230).
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this works I think it would be great improvement!

"bin/llvm-objcopy",
"bin/wasm-emscripten-finalize",
"bin/wasm-ld",
"bin/wasm-opt",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the libraries (from the sysroot) that are used during linking? Are they handled separately?

@@ -68,12 +76,12 @@ emscripten_cc_toolchain_config_rule(

cc_toolchain(
name = "cc-compiler-wasm",
all_files = ":every-file",
ar_files = ":common-script-includes",
all_files = ":all_files",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is all_files even needed at all anymore?

filegroup(
name = "compiler_files",
srcs = [
"emscripten/emcc.py",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do tools/*.py and stuff like that get included?

@jfirebaugh
Copy link
Contributor Author

I should say -- I don't have any deep knowledge of emscripten. My methodology here was just to start each of compiler_files, linker_files, ar_files with an empty list and add things until both local testing (on macOS) and my own project built successfully. So it is likely that dependencies are missing if they aren't covered by those cases.

What about the libraries (from the sysroot) that are used during linking? Are they handled separately?

I don't know. What libraries would those be and when would they be needed?

Where do tools/*.py and stuff like that get included?

What are those used for?

@sbc100
Copy link
Collaborator

sbc100 commented May 2, 2022

I should say -- I don't have any deep knowledge of emscripten. My methodology here was just to start each of compiler_files, linker_files, ar_files with an empty list and add things until both local testing (on macOS) and my own project built successfully. So it is likely that dependencies are missing if they aren't covered by those cases.

What about the libraries (from the sysroot) that are used during linking? Are they handled separately?

I don't know. What libraries would those be and when would they be needed?

At link time you would access to all the system library in sysroot/libc/wasm32-emscripten. For example libc.a. The fact that your link is working without this makes me thing maybe the whole sysroot is still available somehow.

Where do tools/*.py and stuff like that get included?

What are those used for?

They are part of the emcc compiler. emcc.py depends a whole lot of other .py file such as the ones in tools/ and won't load without those. If emcc.py is running then somehow those files must be available.. but I can't see how you are making them available.

@jfirebaugh
Copy link
Contributor Author

At link time you would access to all the system library in sysroot/libc/wasm32-emscripten. For example libc.a. The fact that your link is working without this makes me thing maybe the whole sysroot is still available somehow.

It looks like the answer is simply that nothing I've tested actually requires linking any of these libraries. For example, I can in fact remove the --sysroot argument entirely (by removing ACTION_NAMES.cpp_link* from the appropriate flag_set) and test_external still builds successfully.

They are part of the emcc compiler. emcc.py depends a whole lot of other .py file such as the ones in tools/ and won't load without those. If emcc.py is running then somehow those files must be available.. but I can't see how you are making them available.

Ah, this appears to be because python adds the containing directory of the script as the first entry in sys.path and if the script is a symlink, does so after resolving the symlink (I can't find this detail documented however). This is a sandbox bypass so I imagine bazel has found a workaround for it somewhere, but I'm not sure where to look.

@jfirebaugh
Copy link
Contributor Author

It's an unresolved issue in Bazel: bazelbuild/bazel#7091

@walkingeyerobot
Copy link
Collaborator

Overall I'm good with this, but it looks like the windows CI for bazel failed here. It's possible something additional is needed for windows?

@jfirebaugh
Copy link
Contributor Author

I don't know. As far as I can tell, bin/clang and bin/clang++ are correctly registered as inputs for compile actions, and I don't have access to a Windows environment to debug further. In case it was something related to unsandboxed builds, I tested bazel build //hello-world:hello-world-wasm --spawn_strategy=local locally, but that works.

@walkingeyerobot
Copy link
Collaborator

I don't have a windows set up for bazel at the moment. I'll try to get one soon to see if we can debug this break a bit easier.

Comment on lines 17 to 18
"bin/clang",
"bin/clang++",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On windows all these executables need to have the .exe extension. A separate BUILD file specifically for windows may be necessary or maybe using build_file_content and template it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this hint! I've taken the templated build_file_content approach.

@walkingeyerobot walkingeyerobot merged commit 93f21c9 into emscripten-core:main May 9, 2022
@sbc100
Copy link
Collaborator

sbc100 commented May 9, 2022

I don't really understand how this change can function. It doesn't seem to include any of the files under lib/.. or tools/... many of which are needed for even simple emscripten linking.

For example, src/library.js .. this file is always needed during linking. As are all the other .js files in that directory.

radekdoulik pushed a commit to radekdoulik/emsdk that referenced this pull request May 31, 2022
* Optimize sandbox performance

Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build times on macOS, where the sandbox is known to be slow (bazelbuild/bazel#8230).

* Linux wants clang to link

* all_files not needed?

* Linux wants llc

* And llvm-ar

* Templated build_file_content
@kjlubick
Copy link
Contributor

kjlubick commented Jun 6, 2022

@trybka, you had asked about a before and after benchmark. I noticed a substantial improvement when building CanvasKit on my 24 core Linux machine.

# emsdk-3.1.10
$ bazel clean && bazel build //modules/canvaskit:canvaskit_webgl --compilation_mode=dbg
...
INFO: Elapsed time: 6031.283s, Critical Path: 701.38s
INFO: 1662 processes: 101 internal, 1561 linux-sandbox.
INFO: Build completed successfully, 1662 total actions

# emsdk-3.1.10 with RAM disk
$ bazel clean && bazel build //modules/canvaskit:canvaskit_webgl --compilation_mode=dbg --sandbox_base=/dev/shm
...
INFO: Elapsed time: 179.974s, Critical Path: 75.22s
INFO: 1662 processes: 101 internal, 1561 linux-sandbox.
INFO: Build completed successfully, 1662 total actions

# emsdk-3.1.13
$ bazel clean && bazel build //modules/canvaskit:canvaskit_webgl --compilation_mode=dbg
...
INFO: Elapsed time: 215.952s, Critical Path: 71.56s
INFO: 1662 processes: 101 internal, 1561 linux-sandbox.
INFO: Build completed successfully, 1662 total actions

# emsdk-3.1.13 with RAM disk
$ bazel clean && bazel build //modules/canvaskit:canvaskit_webgl --compilation_mode=dbg --sandbox_base=/dev/shm
...
INFO: Elapsed time: 126.265s, Critical Path: 78.31s
INFO: 1662 processes: 101 internal, 1561 linux-sandbox.
INFO: Build completed successfully, 1662 total actions

With my sandbox on a RAM disk - a 30% improvement. Using the default sandboxing, a ~30x improvement.

However, I do notice some issues when trying to compile the release mode and with a remote executor. Not all the files are specified, e.g. the closure compiler. I can get a PR up that addresses those issues for me.

lewing pushed a commit to dotnet/emsdk that referenced this pull request Feb 7, 2023
* [bazel] Set CLOSURE_COMPILER to workaround RBE+symlinks issue (emscripten-core#1037)

* [bazel] Set CLOSURE_COMPILER to workaround RBE+symlinks issue

* space

* specify node_js

* 3.1.10 (emscripten-core#1046)

* Optimize sandbox performance (emscripten-core#1045)

* Optimize sandbox performance

Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build times on macOS, where the sandbox is known to be slow (bazelbuild/bazel#8230).

* Linux wants clang to link

* all_files not needed?

* Linux wants llc

* And llvm-ar

* Templated build_file_content

* include node modules glob with linker files. also some minor formatting fixes. (emscripten-core#1052)

* Using bazelisk on macOS CI (emscripten-core#1049)

* Explicit outputs for wasm_cc_binary (emscripten-core#1047)

* Explicit outputs for wasm_cc_binary

* Backwards compatibility

* data_runfiles restore

* restore test_bazel.sh

* Using wrong path on accident

* two separate rules for legacy support

* Added name attribute to wasm_cc_binary rule

* 3.1.11 (emscripten-core#1053)

* 3.1.12 (emscripten-core#1054)

* 3.1.13 (emscripten-core#1055)

* [bazel] Add additional files necessary for building with closure and on RBE (emscripten-core#1057)

* 3.1.14 (emscripten-core#1061)

* 3.1.15 (emscripten-core#1066)

* Pin `latest` to a specific version for arm64-linux (emscripten-core#1065)

Fixes: emscripten-core#1040

* 3.1.16 (emscripten-core#1071)

* 3.1.17 (emscripten-core#1076)

* Exclude msys from path fix function. (emscripten-core#1078)

Fixes: #911

* 3.1.18 (emscripten-core#1081)

* 3.1.18

* Update LLVM include path in Bazel files

* Version 3.1.18-2 (emscripten-core#1083)

3.1.18 had a bad release binary on ARM64 Mac so push an updated version of the release.

* 3.1.19 (emscripten-core#1090)

* Add EMSDK_QUIET to make emsdk_env less chatting (emscripten-core#1091)

Without this the recommended way to silence emsdk_env was to pipe its
stderr to /dev/null.. but then you also loose potentially useful error
message.

Fixes: #946

* 3.1.20 (emscripten-core#1095)

* Add double-quotes to allow spaces in path (emscripten-core#1097)

* 3.1.21 (emscripten-core#1101)

* Update latest-arm64-linux to 3.1.21 (emscripten-core#1102)

* Update XCode version on CircleCI (emscripten-core#1103)

12.2 is being deprecated

* 3.1.22 (emscripten-core#1107)

* 3.1.23 (emscripten-core#1111)

* Avoid exporting EM_CONFIG for modern SDK versions (emscripten-core#1110)

Newer versions of emscipten, starting all the way back in 1.39.13, can
automatically locate the `.emscripten` config file that emsdk creates so
there is no need for the explicit EM_CONFIG environment variable.  Its
redundant and adds unnessary noisce/complexity.

Really, adding emcc to the PATH is all the is needed these days.

One nice thing about this change is that it allows folks to run
whichever emcc they want to and have it just work, even if they have
configured emsdk.   Without this change, if I activate emsdk and I run
`some/other/emcc` then emsdk's `EM_CONFIG` will still be present and
override the configuration embedded in `some/other/emcc`.

e.g. in the same shell, with emsdk activated, I can run both these
commands and have them both just work as expected.

```
$ emcc --version
$ /path/to/my/emcc --version
```

* Use x64 version for Windows on Arm (emscripten-core#1115)

* 3.1.24 (emscripten-core#1122)

* 3.1.25 (emscripten-core#1130)

* [bazel] Switch to platforms-based toolchain resolution (emscripten-core#1036)

* remove "name" attribute from bazel rules (emscripten-core#1131)

* 3.1.26 (emscripten-core#1134)

* Update remote_docker version in CircleCI config (emscripten-core#1117)

20.10.17 is the current default.

* docker image: Change base to Ubuntu 22.04 LTS (jammy) (emscripten-core#1135)

Done to upgrade from CMake 3.16.3 to 3.22.1. CMake 3.21 or newer is needed to build the Qt 6.4.1 sources with emscripten.

Also update to libidn12 to resolve an "Unable to locate package libidn11" error.

* 3.1.27 (emscripten-core#1139)

* Use constants for fixed paths. NFC (emscripten-core#1140)

* Add standalone_wasm feature to bazel emscripten_toolchain (emscripten-core#1145)

* 3.1.28 (emscripten-core#1149)

* Upgrade to rules_nodejs 5.8.0 (emscripten-core#1150)

Fixes emscripten-core#1020

* 3.1.29 (emscripten-core#1160)

* Pin Windows CI to Bazel 5.4.0 (emscripten-core#1163)

* Remove reference to fastcomp-latest. NFC (emscripten-core#1164)

fastcomp can only be install using explicit versions names so this name
doesn't work.

* Remove fastcomp SDK and fastcomp build rules. NFC (emscripten-core#1165)

Folks that want to work with fastcomp will now need to use an older
checkout of emsdk.

* 3.1.30 (emscripten-core#1167)

* Bump emscripten to 3.1.30

* Bump clang version

* Do not include test directory

* Update eng/emsdk.proj

Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>

---------

Co-authored-by: Kevin Lubick <kjlubick@users.noreply.github.com>
Co-authored-by: Sam Clegg <sbc@chromium.org>
Co-authored-by: John Firebaugh <john.firebaugh@gmail.com>
Co-authored-by: walkingeyerobot <mitch@thefoley.net>
Co-authored-by: Ezekiel Warren <zaucy@users.noreply.github.com>
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
Co-authored-by: Tim Ebbeke <Tim06TR@gmail.com>
Co-authored-by: Derek Schuff <dschuff@chromium.org>
Co-authored-by: Joel Van Eenwyk <joel.vaneenwyk@gmail.com>
Co-authored-by: Pierrick Bouvier <101587250+pbo-linaro@users.noreply.github.com>
Co-authored-by: Trevor Hickey <TrevorJamesHickey@gmail.com>
Co-authored-by: Fredrik Orderud <forderud@users.noreply.github.com>
Co-authored-by: Robbert van Ginkel <570934+robbertvanginkel@users.noreply.github.com>
Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>
walkingeyerobot added a commit to walkingeyerobot/emsdk that referenced this pull request Aug 28, 2023
Fixes emscripten-core#1273. This was broken by emscripten-core#1045 with the comment "* all_files not needed?" They were needed.
walkingeyerobot added a commit that referenced this pull request Aug 29, 2023
Fixes #1273. This was broken by #1045 with the comment "* all_files not needed?" They were needed.
shlomif pushed a commit to shlomif/emsdk that referenced this pull request Sep 29, 2023
Fixes emscripten-core#1273. This was broken by emscripten-core#1045 with the comment "* all_files not needed?" They were needed.
krinkinmu added a commit to krinkinmu/proxy-wasm-cpp-sdk that referenced this pull request Oct 4, 2024
This CL mostly follows the instructions in
https://github.com/emscripten-core/emsdk/blob/main/bazel/README.md.
Even so, there are a few things that are worth mentioning:

1. I tested the changes locally and in my tests
   incompatible_enable_cc_toolchain_resolution bazel flag made no
   difference (e.g. everything builds with or without the flag);
   I still keep it though because instructions say that it should
   be there and it does not cause any harm.
2. I don't think that we still need to patch emsdk to exclude npm
   modules from the toolchain, because it appears that upstream
   already removed those as a toolchain dependency in
   emscripten-core/emsdk#1045.

It's worth noting, that even though I don't think that emsdk patch
is still needed, I actually wasn't able to reproduce the problem
reported in proxy-wasm#149
locally without the emsdk.patch even with the current version of
emsdk used by C++ SDK.

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>
krinkinmu added a commit to krinkinmu/proxy-wasm-cpp-sdk that referenced this pull request Oct 7, 2024
This CL mostly follows the instructions in
https://github.com/emscripten-core/emsdk/blob/main/bazel/README.md.
Even so, there are a few things that are worth mentioning:

1. I tested the changes locally and in my tests
   incompatible_enable_cc_toolchain_resolution bazel flag made no
   difference (e.g. everything builds with or without the flag);
   I still keep it though because instructions say that it should
   be there and it does not cause any harm.
2. I don't think that we still need to patch emsdk to exclude npm
   modules from the toolchain, because it appears that upstream
   already removed those as a toolchain dependency in
   emscripten-core/emsdk#1045.

It's worth noting, that even though I don't think that emsdk patch
is still needed, I actually wasn't able to reproduce the problem
reported in proxy-wasm#149
locally without the emsdk.patch even with the current version of
emsdk used by C++ SDK.

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>
martijneken pushed a commit to proxy-wasm/proxy-wasm-cpp-sdk that referenced this pull request Oct 10, 2024
* Refresh and fix SDK Docker image build scripts

The way the script is currently it does not work (anymore). I'm aware
of two issues with the current setup:

1. emsdk underneath installs and uses Node JS and the version of Node JS
   is not pinned by the script and isn't tied to the version of the SDK
   used.

   As a result, emsdk moved to a newer version of Node JS than the one
   that was probably used when the scripts were created. This new
   version (it seems, the version currently being used is 18.20.3)
   requires a relatively fresh version of glibc which just isn't
   available in Ubuntu Bionic used as a base image. That results in
   obscure errors from CMake (see
   #170)
2. for whatever reason, the version of the protobuf static libraries
   currently in the repo, don't seem to work. I don't really know
   how the issue happened, but there were at least 2 users that faced
   the problem (see
   #161).

To resolve the first issue, we have a bunch of options:

1. Fix the version of emsdk
2. Fix the version of Node JS
3. Update the version of glibc

I figured I can combine options 1 and 3 together for the following
reasons:

1. Fixing the version of emsdk fixes the problem
2. The problem arised from the fact that the versions of software
   used in the build script are a bit old, so an update might be
   in order, even though we have other solutions to the problem.

> NOTE: It's my understanding that fixing emsdk version should also
> pin Node JS version, so if we deploy option 1, option 2 seem
> redundant.

For the second problem, I think there is only one ultimate solution
- not store binary artifacts in the repository and instead build
them from the sources in the repo.

It seems that in the past there was a concern that building protobuf
libraries takes a long time - it's still certainly the case. However,
I think we can compensate for that in two ways:

1. Drop WAVM - it does not seem like WAVM is still needed (the project
   itself appears to be dead and hasn't had any updates for at least 2
   years), but also one of the comments in
   #158, that
   removed the WAVM from the docs, also suggests that three doesn't
   seem to be a good reason to keep WAVM in the SDK build script.
2. Take advantage of potential hardware threads in the system when
   calling make - most laptops or servers these days have multiple
   cores, so if we have to build protobuf libraries twice, we can
   speed it up by using more cores.

With all those changes made, the build time adds up to something like
32 minutes on my laptop, compared to the 48m without building
protobuf libraries from sources (and without adding -j option to make).
So I think the additional time spent on building protobuf library is
compensated by other changes.

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>

* Update emsdk version to 3.1.67 in Bazel configuration

This CL mostly follows the instructions in
https://github.com/emscripten-core/emsdk/blob/main/bazel/README.md.
Even so, there are a few things that are worth mentioning:

1. I tested the changes locally and in my tests
   incompatible_enable_cc_toolchain_resolution bazel flag made no
   difference (e.g. everything builds with or without the flag);
   I still keep it though because instructions say that it should
   be there and it does not cause any harm.
2. I don't think that we still need to patch emsdk to exclude npm
   modules from the toolchain, because it appears that upstream
   already removed those as a toolchain dependency in
   emscripten-core/emsdk#1045.

It's worth noting, that even though I don't think that emsdk patch
is still needed, I actually wasn't able to reproduce the problem
reported in #149
locally without the emsdk.patch even with the current version of
emsdk used by C++ SDK.

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>

* Update building.md to match updated sdk_container.sh

This is a followup to the previous commits that refreshed the build
scripts and emsdk. This change reconciles the docs with the current
state of the build scripts.

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>

---------

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sandboxed bazel builds with emsdk are slow
5 participants