perf: improve analysis performance by 95% for `py_binary` and `py_test` rules #3380

tobyh-canva · 2025-10-26T22:09:24Z

PR Status: Addressing windows compatibility issues

Context

I've recently been looking into Bazel analysis performance in my company's monorepo (predominantly Java and Typescript), and was surprised to find that rules_python accounts for almost 60% of the CPU time during analysis across our repo (433s of 741s).

From a profile, it was clear that depset.to_list() being called out of _create_zip_file() was accounting for the vast majority of the analysis time. So I focused on eliminating these depset expansions, which are expensive for analysis (anyone who's unaware, see Bazel's Optimizing Performance page). Path normalization in _get_zip_runfiles_path was also accounting for a large chunk of the analysis time.

What was slightly less clear, but eventually apparent, was that runfiles.empty_filenames being provided as an arg to the zipper action was also causing a hit to analysis performance. I traced this back to the fact that the runfiles.empty_filenames depset in this context is actually coming from the GetInitPyFiles Java class in the Python utils built into Bazel, and this class was also causing an implicit expansion of the runfiles.files depset. You can verify this by running a profile using my PR's changes and with --incompatible_default_to_explicit_init_py=false (the default).

Note that despite the Bazel-native --build_python_zip flag being disabled by default on non-Windows systems, this only affects whether the zipapp is considered a default output, i.e. the analysis cost is still incurred if this flag is disabled.

Intent

My strategy for optimizing this was basically to get rid of any depset.to_list() calls, and shift reproducible but expensive business logic (e.g. path normalization) from the analysis phase to the build phase.

Changes

Introduced a wrapper for @bazel_tools//tools/zip:zipper, written in C++, which prepares the input manifest to the zipper tool, including all path rewriting
Change the inputs to _create_zip_file to provide only what it needs (runfiles_without_exe) and avoid a depset expansion + filter.

I chose to write the script in C++ because it would keep the actual build action performant whilst avoiding adding new dependencies to rules_python (rules_cc is already a dependency).

Testing

I haven't added new tests specific to this wrapper or the affected functionality, instead relying on the existing //tests. Please let me know if you'd prefer specific shell tests or C++ tests.

Snapshot tests

To verify builds were the same, I wrote some snapshot tests, which are not integrated in this branch. Let me know if you'd like these to be brought in (they're platform-specific, and the snapshots may need to be updated somewhat frequently for innocuous changes).

Manual tests

I manually ran snapshot tests above on macOS, Linux, and Windows, comparing a snapshot from the main branch with this branch.

Results

Here's the --starlark_cpu_profile flame graph before/after these changes (this is profiling analysis of bazel build --nobuild //... in our monorepo.

Overall the CPU time appears to have been reduced by at least 95% for py_binary and py_test targets.

The memory profile shows a similar 95% reduction in memory usage for py_binary and py_test.

All build outputs are identical after these changes.
The overall build performance is impacted by the compilation of the new C++ wrapper script.
The build performance of the actual PythonZipper action is effectively unchanged - the additional work done by the C++ wrapper is barely visible compared to the work which was already being done by the actual zipper.

Notes

I wasn't aware that --incompatible_default_to_explicit_init_py=true made such a significant improvement to analysis performance. Is it worth mentioning this somewhere in the docs of rules_python? It smells to me like it just disables a legacy Google-specific quirk, though I could be wrong.

I initially moved the GetInitPyFiles logic from Java into the C++ wrapper, so at least it would remove the analysis performance hit. But since it's only optimising legacy behaviour, and users can just toggle a flag to disable it, I reverted that change and left the empty __init__.py generation alone.

gemini-code-assist · 2025-10-26T22:09:39Z

Summary of Changes

Hello @tobyh-canva, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the analysis performance of py_binary and py_test rules in Bazel by addressing inefficiencies related to depset.to_list() expansions and path normalization. It introduces a new C++ wrapper to handle manifest preparation for the zipper tool, moving expensive logic from the analysis phase to the build phase, resulting in substantial reductions in both CPU time and memory consumption during Bazel analysis.

Highlights

Performance Improvement: Achieved a 95% reduction in CPU time and memory usage for py_binary and py_test targets during Bazel analysis by optimizing how Python zip manifests are generated.
Elimination of depset.to_list(): Replaced expensive depset.to_list() calls in Starlark with a C++ wrapper, shifting complex processing from the analysis phase to the build phase.
New C++ Wrapper for Zipper: Introduced py_executable_zip_gen, a C++ binary that prepares the input manifest for the zipper tool, handling path rewriting and __init__.py generation more efficiently.
Streamlined Starlark Logic: Simplified the _create_zip_file function in Starlark by reducing its inputs and offloading manifest generation details to the new C++ wrapper.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This is a fantastic performance improvement! Moving the expensive depset manipulation and path normalization from the analysis phase to the execution phase by introducing a C++ wrapper is a great strategy. The flame graphs clearly show the massive impact of this change.

I have a few comments on the C++ implementation regarding correctness and security. Specifically, I've pointed out a potential issue with locating the zipper tool and the use of std::system, which can be a security risk.

Additionally, while you've done manual testing across platforms, this change would greatly benefit from automated tests. Adding unit tests for the new C++ wrapper (py_executable_zip_gen.cc) would ensure its logic is correct and robust. Also, adding integration tests to verify the PythonZipper action with the new wrapper would prevent future regressions. Since this is a critical path for building Python binaries, ensuring its correctness with a solid test suite is very important.

Overall, this is a very valuable contribution that will significantly improve analysis performance for many users of rules_python.

gemini-code-assist · 2025-10-26T22:17:31Z

python/private/py_executable_zip_gen.cc

+    std::exit(1);
+  }
+
+  std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper/zipper");


The path bazel_tools/tools/zip/zipper/zipper seems incorrect. The zipper target from @bazel_tools//tools/zip:zipper should be available in runfiles at bazel_tools/tools/zip/zipper. The extra /zipper at the end is likely a typo and will cause this tool to fail to find the zipper executable.

Suggested change

std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper/zipper");

std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper");

gemini-code-assist · 2025-10-26T22:17:31Z

python/private/py_executable_zip_gen.cc

+  std::string cmd = zipper_path + " cC " + output + " @" + manifest_path;
+  int result = std::system(cmd.c_str());


Using std::system is generally discouraged as it can introduce security vulnerabilities (command injection) and is not robust against paths with spaces or other special characters. Since the output path is passed from an external source (command-line arguments), it could potentially be manipulated to execute arbitrary commands.

A safer approach would be to use fork and execv on POSIX systems and CreateProcess on Windows to execute the zipper tool directly, without involving a shell.

If a full rewrite is not feasible, at a minimum, all path arguments in the command string should be properly quoted to handle spaces and prevent trivial command injection. For example:

std::string cmd = "\"" + zipper_path + "\" cC \"" + output + "\" @\"" + manifest_path + "\"";

However, replacing std::system is the recommended solution for robustness and security.

Maintainers, please let me know if this is a genuine concern in this circumstance 😅

python/private/BUILD.bazel

python/private/py_executable_zip_gen.cc

python/private/py_executable.bzl

groodt · 2025-10-26T23:20:28Z

I wasn't aware that --incompatible_default_to_explicit_init_py=true made such a significant improvement to analysis performance. Is it worth mentioning this somewhere in the docs of rules_python? It smells to me like it just disables a legacy Google-specific quirk, though I could be wrong.

Yes, it's a historical thing that was difficult to remove when much of the rules_python implementation was in upstream bazelbuild/bazel. I think now that the starlarkification efforts have landed, it might be possible to flip the default. The tracking issue is here: #2945

WDYT @rickeylev? Are we now in a position to flip this?

…t` rules

rickeylev · 2025-10-27T01:39:25Z

Thanks for this! Yeah, the runfiles.files.to_list() flattening was necessary to filter out the main executable file; it predates runfiles_without_exe.

What does perf look like if L853 for artifact in runfiles.files.to_list(): ... is moved to execution phase using map_each (and the C++ tool isn't used)? By using runfiles_without_exe, that custom filtering logic doesn't need to occur, so the depset can be passed directly.

Basically, I'm wondering what part of the perf improvement is coming from avoiding the analysis time depset flattening vs c++ doing string operations instead of Starlark/Java. I'm not super keen to require on-demand building a C++ binary (and C++ is a bit of a multi-platform PITA), but if the perf gains are this significant, it's pretty compelling.

That get_zip_runfiles_path is showing up so prominently is weird/surprising. Those should be fairly simple string manipulations. But, looking at the paths.relativize implementation, it's actually doing quite a bit of work in there (normalizing both paths (which incurs splits/appends), then some zip+looping, etc). L876 can be replaced with a simple stripprefix() call. L882 can be replaced with e.g. "{workspace}/{path}" if not path.startswith("../") else path[3:]

Also, how big are the depset's you're working with? 40k? 100k? I ask because the experimental venv construction logic has to do depset flattening

tobyh-canva · 2025-10-27T02:31:40Z

Hi @rickeylev :) I was also hoping I could avoid the additional C++ binary 😅

I did not think to replace the calls to path.relativize - I'll attempt your suggestions and regenerate the profile to check. Thanks! 🫡

Also, how big are the depset's you're working with? 40k? 100k? I ask because the experimental venv construction logic has to do depset flattening

I'm actually not sure, though I'd be happy to profile this experimental logic on our repo. Is there a config setting I should toggle to try it out?

aignas · 2025-10-27T03:19:12Z

It is --@rules_python//python/config_settings:venv_site_packages=yes, it is going to use the codepaths introduced in #2156 to build a virtualenv just like what a venv or uv tool would.

tobyh-canva · 2025-10-27T03:21:41Z

This is the profile after applying @rickeylev's suggestions - instead of using a C++ script, replacing paths.relativize and paths.normalize with simpler logic. Oddly map_zip_runfiles is still fairly significant (15% overall) on the profile. See #3381 for the actual implementation I tested this with.

I thought map_each functions were deferred to build time, so I'm a bit confused why map_zip_runfiles is showing up in the analysis profile during a --nobuild - is this expected when we're using a closure for the map function?

rickeylev · 2025-11-09T21:55:42Z

map_zip_runfiles takes 46 seconds. half is format(), half is startswith()

well, that's a big improvement, but yeah, that's still a lot of wasted time spent. At the least, let's merge #3381 -- it's a significant improvement without any questions about adding C++ tools. I'll go review that.

--nobuild seems to trigger map_each in analysis, but it's supposed to be deferred to execution

Yeah, that's weird. Double check your invocation? But otherwise sounds like a bazel bug. I don't see anything in your PR that would cause that.

…nalysis phase performance (#3381) When py_binary/py_test were being built, they were flattening the runfiles depsets at analysis time in order to create the zip file mapping manifest for their implicit zipapp outputs. This flattening was necessary because they had to filter out the original main executable from the runfiles that didn't belong in the zipapp. This flattening is expensive for large builds, in some cases adding over 400 seconds of time and significant memory overhead. To fix, have the zip file manifest use the `runfiles_with_exe` object, which is the runfiles, but pre-filtered for the files zip building doesn't want. This then allows passing the depsets directly to `Args.add_all` and using map_each to transform them. Additionally, pass `runfiles.empty_filenames` using a lambda. Accessing that attribute implicitly flattens the runfiles. Finally, because the original profiles indicated `str.format()` was a non-trivial amount of time (46 seconds / 15% of build time), switch to using `+` instead. This is a more incremental alternative to #3380 which achieves _most_ of the same optimization with only Starlark changes, as opposed to introducing an external script written in C++. [Profile of a large build](https://github.com/user-attachments/assets/e90ae699-a04d-44df-b53c-1156aa890af5), which shows a Starlark CPU profile. It shows an overall build time of 305 seconds. 46 seconds (15%) are spent in `map_zip_runfiles`, half of which is in `str.startswith()` and the other half in `str.format()`. --------- Co-authored-by: Richard Levasseur <rlevasseur@google.com>

rickeylev · 2025-11-22T03:14:42Z

We talked about this in our maintainer meeting. We don't want to take on a built-from-source C++ dependency because of Windows -- it's already difficult to support windows without having to worry about C++ compiling. (We recently went through this with the Gazelle plugin, and it took about a week to figure out and we gave up on the gazelle release because of it).

So, options:

Use this for non-windows only.
Setup prebuilt binaries.

aignas · 2025-11-22T05:50:56Z

Also we talked about the fact that the current algorithm is using startswith, which seems to be inefficient to do for each file. Ideas:

You can do this for the first encountered file from a particular repo only.
You can instead check the repo name and decide what to do.
Is the external prefix that we are sniffing for disabled by some combination of bazel flags or not?

tobyh-canva requested review from aignas and rickeylev as code owners October 26, 2025 22:09

tobyh-canva force-pushed the create-zip-file-analysis-optimisations-2 branch from c10ec90 to 7ef352e Compare October 26, 2025 22:16

gemini-code-assist bot reviewed Oct 26, 2025

View reviewed changes

tobyh-canva force-pushed the create-zip-file-analysis-optimisations-2 branch 3 times, most recently from 12c96bc to 6db7a1a Compare October 26, 2025 22:57

mattem reviewed Oct 26, 2025

View reviewed changes

python/private/py_executable.bzl Outdated Show resolved Hide resolved

tobyh-canva force-pushed the create-zip-file-analysis-optimisations-2 branch 2 times, most recently from 1b2dbc1 to fe6038f Compare October 26, 2025 23:08

tobyh-canva force-pushed the create-zip-file-analysis-optimisations-2 branch from 2d51dc1 to 4464bb7 Compare October 27, 2025 00:19

perf: improve analysis performance by 95% for py_binary and `py_tes…

cef10a3

…t` rules

tobyh-canva force-pushed the create-zip-file-analysis-optimisations-2 branch from 4464bb7 to cef10a3 Compare October 27, 2025 00:43

tobyh-canva mentioned this pull request Oct 27, 2025

refactor: defer zip manifest building to execution phase to improve analysis phase performance #3381

Merged

	std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper/zipper");
	std::string zipper_path = runfiles->Rlocation("bazel_tools/tools/zip/zipper");

		std::string cmd = zipper_path + " cC " + output + " @" + manifest_path;
		int result = std::system(cmd.c_str());

Uh oh!

perf: improve analysis performance by 95% for py_binary and py_test rules #3380

Are you sure you want to change the base?

perf: improve analysis performance by 95% for py_binary and py_test rules #3380

Uh oh!

Conversation

tobyh-canva commented Oct 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Intent

Changes

Testing

Snapshot tests

Manual tests

Results

Notes

Uh oh!

gemini-code-assist bot commented Oct 26, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 26, 2025

Choose a reason for hiding this comment

Uh oh!

tobyh-canva Oct 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

groodt commented Oct 26, 2025

Uh oh!

rickeylev commented Oct 27, 2025

Uh oh!

tobyh-canva commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aignas commented Oct 27, 2025

Uh oh!

tobyh-canva commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rickeylev commented Nov 9, 2025

Uh oh!

rickeylev commented Nov 22, 2025

Uh oh!

aignas commented Nov 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

perf: improve analysis performance by 95% for `py_binary` and `py_test` rules #3380

perf: improve analysis performance by 95% for `py_binary` and `py_test` rules #3380

tobyh-canva commented Oct 26, 2025 •

edited

Loading

tobyh-canva commented Oct 27, 2025 •

edited

Loading

tobyh-canva commented Oct 27, 2025 •

edited

Loading