-
-
Couldn't load subscription status.
- Fork 635
perf: improve analysis performance by 95% for py_binary and py_test rules
#3380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf: improve analysis performance by 95% for py_binary and py_test rules
#3380
Conversation
Summary of ChangesHello @tobyh-canva, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the analysis performance of Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
c10ec90 to
7ef352e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a fantastic performance improvement! Moving the expensive depset manipulation and path normalization from the analysis phase to the execution phase by introducing a C++ wrapper is a great strategy. The flame graphs clearly show the massive impact of this change.
I have a few comments on the C++ implementation regarding correctness and security. Specifically, I've pointed out a potential issue with locating the zipper tool and the use of std::system, which can be a security risk.
Additionally, while you've done manual testing across platforms, this change would greatly benefit from automated tests. Adding unit tests for the new C++ wrapper (py_executable_zip_gen.cc) would ensure its logic is correct and robust. Also, adding integration tests to verify the PythonZipper action with the new wrapper would prevent future regressions. Since this is a critical path for building Python binaries, ensuring its correctness with a solid test suite is very important.
Overall, this is a very valuable contribution that will significantly improve analysis performance for many users of rules_python.
| std::exit(1); | ||
| } | ||
|
|
||
| std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper/zipper"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path bazel_tools/tools/zip/zipper/zipper seems incorrect. The zipper target from @bazel_tools//tools/zip:zipper should be available in runfiles at bazel_tools/tools/zip/zipper. The extra /zipper at the end is likely a typo and will cause this tool to fail to find the zipper executable.
| std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper/zipper"); | |
| std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper"); |
| std::string cmd = zipper_path + " cC " + output + " @" + manifest_path; | ||
| int result = std::system(cmd.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::system is generally discouraged as it can introduce security vulnerabilities (command injection) and is not robust against paths with spaces or other special characters. Since the output path is passed from an external source (command-line arguments), it could potentially be manipulated to execute arbitrary commands.
A safer approach would be to use fork and execv on POSIX systems and CreateProcess on Windows to execute the zipper tool directly, without involving a shell.
If a full rewrite is not feasible, at a minimum, all path arguments in the command string should be properly quoted to handle spaces and prevent trivial command injection. For example:
std::string cmd = "\"" + zipper_path + "\" cC \"" + output + "\" @\"" + manifest_path + "\"";However, replacing std::system is the recommended solution for robustness and security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintainers, please let me know if this is a genuine concern in this circumstance 😅
12c96bc to
6db7a1a
Compare
1b2dbc1 to
fe6038f
Compare
Yes, it's a historical thing that was difficult to remove when much of the rules_python implementation was in upstream bazelbuild/bazel. I think now that the starlarkification efforts have landed, it might be possible to flip the default. The tracking issue is here: #2945 WDYT @rickeylev? Are we now in a position to flip this? |
2d51dc1 to
4464bb7
Compare
4464bb7 to
cef10a3
Compare
|
Thanks for this! Yeah, the What does perf look like if L853 Basically, I'm wondering what part of the perf improvement is coming from avoiding the analysis time depset flattening vs c++ doing string operations instead of Starlark/Java. I'm not super keen to require on-demand building a C++ binary (and C++ is a bit of a multi-platform PITA), but if the perf gains are this significant, it's pretty compelling. That get_zip_runfiles_path is showing up so prominently is weird/surprising. Those should be fairly simple string manipulations. But, looking at the paths.relativize implementation, it's actually doing quite a bit of work in there (normalizing both paths (which incurs splits/appends), then some zip+looping, etc). L876 can be replaced with a simple stripprefix() call. L882 can be replaced with e.g. Also, how big are the depset's you're working with? 40k? 100k? I ask because the experimental venv construction logic has to do depset flattening |
|
Hi @rickeylev :) I was also hoping I could avoid the additional C++ binary 😅 I did not think to replace the calls to
I'm actually not sure, though I'd be happy to profile this experimental logic on our repo. Is there a config setting I should toggle to try it out? |
|
It is |
|
This is the profile after applying @rickeylev's suggestions - instead of using a C++ script, replacing
I thought |

PR Status: Addressing windows compatibility issues
Context
I've recently been looking into Bazel analysis performance in my company's monorepo (predominantly Java and Typescript), and was surprised to find that
rules_pythonaccounts for almost 60% of the CPU time during analysis across our repo (433s of 741s).From a profile, it was clear that
depset.to_list()being called out of_create_zip_file()was accounting for the vast majority of the analysis time. So I focused on eliminating these depset expansions, which are expensive for analysis (anyone who's unaware, see Bazel's Optimizing Performance page). Path normalization in_get_zip_runfiles_pathwas also accounting for a large chunk of the analysis time.What was slightly less clear, but eventually apparent, was that
runfiles.empty_filenamesbeing provided as an arg to thezipperaction was also causing a hit to analysis performance. I traced this back to the fact that therunfiles.empty_filenamesdepset in this context is actually coming from theGetInitPyFilesJava class in the Python utils built into Bazel, and this class was also causing an implicit expansion of therunfiles.filesdepset. You can verify this by running a profile using my PR's changes and with--incompatible_default_to_explicit_init_py=false(the default).Note that despite the Bazel-native
--build_python_zipflag being disabled by default on non-Windows systems, this only affects whether the zipapp is considered a default output, i.e. the analysis cost is still incurred if this flag is disabled.Intent
My strategy for optimizing this was basically to get rid of any
depset.to_list()calls, and shift reproducible but expensive business logic (e.g. path normalization) from the analysis phase to the build phase.Changes
@bazel_tools//tools/zip:zipper, written in C++, which prepares the input manifest to thezippertool, including all path rewriting_create_zip_fileto provide only what it needs (runfiles_without_exe) and avoid a depset expansion + filter.I chose to write the script in C++ because it would keep the actual build action performant whilst avoiding adding new dependencies to
rules_python(rules_ccis already a dependency).Testing
I haven't added new tests specific to this wrapper or the affected functionality, instead relying on the existing
//tests. Please let me know if you'd prefer specific shell tests or C++ tests.Snapshot tests
To verify builds were the same, I wrote some snapshot tests, which are not integrated in this branch. Let me know if you'd like these to be brought in (they're platform-specific, and the snapshots may need to be updated somewhat frequently for innocuous changes).
Manual tests
I manually ran snapshot tests above on macOS, Linux, and Windows, comparing a snapshot from the main branch with this branch.
Results
Here's the
--starlark_cpu_profileflame graph before/after these changes (this is profiling analysis ofbazel build --nobuild //...in our monorepo.Overall the CPU time appears to have been reduced by at least 95% for
py_binaryandpy_testtargets.The memory profile shows a similar 95% reduction in memory usage for
py_binaryandpy_test.PythonZipperaction is effectively unchanged - the additional work done by the C++ wrapper is barely visible compared to the work which was already being done by the actualzipper.Notes
I wasn't aware that
--incompatible_default_to_explicit_init_py=truemade such a significant improvement to analysis performance. Is it worth mentioning this somewhere in the docs ofrules_python? It smells to me like it just disables a legacy Google-specific quirk, though I could be wrong.I initially moved the
GetInitPyFileslogic from Java into the C++ wrapper, so at least it would remove the analysis performance hit. But since it's only optimising legacy behaviour, and users can just toggle a flag to disable it, I reverted that change and left the empty__init__.pygeneration alone.