@MichaReiser
Member

@MichaReiser MichaReiser commented Oct 14, 2025

Summary

This PR switches from thin to fat LTO.

The main motivation is that using lto=fat fixes the binary size increase that we've seen after switching to inventory for ingredient registration. The regression is mainly due to symbols not being removed when using lto=thin, but they're successfully removed when using lto=fat (from 19MB to 17MB).

Using lto=fat also results in a significant performance improvement, which I think is worth it on its own.

Unfortunately, using lto=fat does have the downside that release builds take significantly longer. One of the main motivations for using lto=thin when we switched from fat to thin in #9031 was to improve compile times. To mitigate this, I changed the --profiling profile to only use lto=thin, similar to what we do in uv (where we disable lto entirely). This PR is also likely to make the mypy primer and benchmark jobs slower (as well as local mypy primer runs), because of the significant increase in compile time.

This setup now mirrors uv's (with the exception that profiling uses lto=thin instead of disabling lto).
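
For readers unfamiliar with Cargo profiles, the setup described above could look roughly like the following. This is a minimal sketch, not the actual diff from this PR: only the `lto` values come from the description, and the assumption that the profiling profile inherits from `release` is mine.

```toml
# Illustrative sketch only; the real Cargo.toml may set additional keys.

[profile.release]
# Fat LTO: slower to build, but in this PR it shrinks the binary
# (19MB to 17MB) and improves runtime performance.
lto = "fat"

[profile.profiling]
# Assumption: the profiling profile inherits from release.
inherits = "release"
# Thin LTO keeps profiling builds faster to compile.
lto = "thin"
```

With a layout like this, `cargo build --release` would pay the longer fat-LTO build time, while `cargo build --profile profiling` would stay closer to the previous thin-LTO build time.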

I do feel a bit bad about making this change while @BurntSushi is out, because I know he is the most vocal about this.

Clean build timing:

  • fat: 2m 02s
  • thin: 1m 00s

Fixes #20845

Relevant discussions:

@codspeed-hq

codspeed-hq bot commented Oct 14, 2025

CodSpeed Performance Report

Merging #20863 will degrade performances by 5.28%

Comparing micha/fat-lto (6e2b809) with main (abf685b)

Summary

⚡ 25 improvements
❌ 1 (👁 1) regression
✅ 25 untouched

Benchmarks breakdown

| | Mode | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|---|
| | Instrumentation | linter/all-rules[large/dataset.py] | 17.7 ms | 16.2 ms | +9.37% |
| | Instrumentation | linter/all-rules[numpy/ctypeslib.py] | 4.2 ms | 3.9 ms | +7.89% |
| | Instrumentation | linter/all-rules[numpy/globals.py] | 712.7 µs | 648.1 µs | +9.97% |
| | Instrumentation | linter/all-rules[pydantic/types.py] | 8.1 ms | 7.5 ms | +8.33% |
| | Instrumentation | linter/all-rules[unicode/pypinyin.py] | 1.8 ms | 1.7 ms | +10.14% |
| 👁 | Instrumentation | linter/default-rules[numpy/globals.py] | 194.4 µs | 205.3 µs | -5.28% |
| | Instrumentation | linter/all-with-preview-rules[large/dataset.py] | 21.3 ms | 19.5 ms | +9.07% |
| | Instrumentation | linter/all-with-preview-rules[numpy/ctypeslib.py] | 4.9 ms | 4.6 ms | +7.88% |
| | Instrumentation | linter/all-with-preview-rules[numpy/globals.py] | 805 µs | 731.3 µs | +10.09% |
| | Instrumentation | linter/all-with-preview-rules[pydantic/types.py] | 9.6 ms | 8.9 ms | +7.92% |
| | Instrumentation | linter/all-with-preview-rules[unicode/pypinyin.py] | 2.1 ms | 1.9 ms | +10.15% |
| | Instrumentation | ty_check_file[cold] | 122.8 ms | 115.2 ms | +6.6% |
| | Instrumentation | ty_check_file[incremental] | 5.2 ms | 4.4 ms | +17.85% |
| | Instrumentation | ty_micro[many_enum_members] | 92.6 ms | 88.1 ms | +5.03% |
| | Instrumentation | anyio | 898.9 ms | 828.5 ms | +8.5% |
| | Instrumentation | attrs | 412.6 ms | 382.7 ms | +7.8% |
| | Instrumentation | DateType | 194.9 ms | 182.3 ms | +6.91% |
| | Instrumentation | hydra-zen | 935.8 ms | 853 ms | +9.7% |
| | WallTime | large[sympy] | 42.9 s | 38.9 s | +10.33% |
| | WallTime | medium[colour-science] | 12.3 s | 11.1 s | +11.07% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@github-actions
Contributor

github-actions bot commented Oct 14, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser force-pushed the micha/fat-lto branch 2 times, most recently from b264e60 to c2f89bb Compare October 14, 2025 14:12
@MichaReiser MichaReiser added the release Related to the release process label Oct 14, 2025
@MichaReiser MichaReiser reopened this Oct 14, 2025
@MichaReiser MichaReiser marked this pull request as ready for review October 14, 2025 14:49
@MichaReiser
Member Author

I'm not sure why that one instrumented benchmark regresses, but a 10% improvement on our walltime benchmarks is very convincing.

Contributor

@ntBre ntBre left a comment

Makes sense to me!

@MichaReiser MichaReiser changed the title from "Enable fat LTO" to "Enable lto=fat" Oct 14, 2025
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
@MichaReiser MichaReiser merged commit a93618e into main Oct 15, 2025
38 checks passed
@MichaReiser MichaReiser deleted the micha/fat-lto branch October 15, 2025 07:00
@AlexWaygood AlexWaygood mentioned this pull request Oct 15, 2025
13 tasks
dcreager added a commit that referenced this pull request Oct 15, 2025
…rable

* origin/main:
  [ty] Add (unused) `inferable` parameter to type property methods (#20865)
  Run macos tests on macos (#20889)
  Remove `release` CI job (#20887)
  [ty] CI: Faster ecosystem analysis (#20886)
  Remove `strip` from release profile (#20885)
  [ty] Sync vendored typeshed stubs (#20876)
  [ty] Add some completion ranking improvements (#20807)
  Improved error recovery for unclosed strings (including f- and t-strings) (#20848)
  Enable lto=fat (#20863)
  [`pyupgrade`] Extend `UP019` to detect `typing_extensions.Text` (`UP019`) (#20825)
  [`flake8-bugbear`] Omit annotation in preview fix for `B006` (#20877)
  fix(docs): Fix typo in `RUF015` description (#20873)
  [ty] Improve and extend tests for instance attributes redeclared in subclasses (#20866)
  [ty] Ignore slow seeds as a temporary measure (#20870)
  Remove parentheses around multiple exception types on Python 3.14+ (#20768)
  Update Black tests (#20794)
dcreager added a commit that referenced this pull request Oct 15, 2025
…nt-sets

* dcreager/non-non-inferable: (174 commits)
  [ty] Add (unused) `inferable` parameter to type property methods (#20865)
  Run macos tests on macos (#20889)
  Remove `release` CI job (#20887)
  [ty] CI: Faster ecosystem analysis (#20886)
  Remove `strip` from release profile (#20885)
  [ty] Sync vendored typeshed stubs (#20876)
  [ty] Add some completion ranking improvements (#20807)
  Improved error recovery for unclosed strings (including f- and t-strings) (#20848)
  Enable lto=fat (#20863)
  [`pyupgrade`] Extend `UP019` to detect `typing_extensions.Text` (`UP019`) (#20825)
  [`flake8-bugbear`] Omit annotation in preview fix for `B006` (#20877)
  fix(docs): Fix typo in `RUF015` description (#20873)
  [ty] Improve and extend tests for instance attributes redeclared in subclasses (#20866)
  [ty] Ignore slow seeds as a temporary measure (#20870)
  use existing method
  Remove parentheses around multiple exception types on Python 3.14+ (#20768)
  Update Black tests (#20794)
  just the api parts
  [ty] Fix further issues in `super()` inference logic (#20843)
  [ty] Document when a rule was added (#20859)
  ...
@BurntSushi
Member

Yeah I think I've slowly become okay with this sort of change. I even recently did the same for ripgrep. I still don't like that we have different compilation settings for profiling versus release, but I haven't been bitten too hard by it yet.

Labels

release Related to the release process

Development

Successfully merging this pull request may close these issues.

Reduce size of WASM builds

4 participants