Add CMake tryrun cache for macOS ARM64 to speed up configure #124046
steveisok wants to merge 2 commits into dotnet:main from
Add a pre-configured CMake cache file for macOS ARM64 (Apple Silicon) that eliminates redundant feature detection checks during the configure phase.
Performance improvement:
- CMake configure time: 105s → 12s (89% faster)
- Full clean build (clr+libs): 9:51 → 7:36 (18% faster)
The build currently runs CMake configuration 3 times (coreclr, native libs, host), with 597 total checks of which 395 are duplicated across configurations. The cache file pre-populates known results for macOS ARM64, similar to the existing tryrun.browser.cmake for WebAssembly builds.
Valid for:
- macOS 14.0+ (Sonoma and later)
- Xcode 15.0+ / AppleClang 15.0+
- Architecture: arm64 (Apple Silicon)
To disable if issues arise (e.g., after Xcode upgrade):
    export CLR_CMAKE_SKIP_PLATFORM_CACHE=1
|
@dotnet/runtime-infrastructure @jkotas @am11 I have this in draft as I think we need to make sure we strike the right balance in terms of caching and to make sure there isn't a great deal of friction using and updating this. I do believe this is the right path for most of our configurations. If you're all on board, easiest to start with one and we'll methodically go through the rest. |
Pull request overview
This PR adds a CMake tryrun cache file for macOS ARM64 builds to significantly reduce CMake configure time. The change introduces a pre-configured cache file (tryrun.osx-arm64.cmake) containing 337 feature detection results that are known for macOS 14.0+ with Xcode 15.0+, eliminating redundant checks during the build configuration phase. The cache is automatically applied for macOS ARM64 builds and can be disabled via the CLR_CMAKE_SKIP_PLATFORM_CACHE=1 environment variable.
Changes:
- Added a new CMake cache file with pre-populated feature detection results for macOS ARM64 platform
- Modified the build system to automatically load this cache file when building for macOS ARM64
- Implemented an opt-out mechanism for users who encounter issues
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| eng/native/tryrun.osx-arm64.cmake | New CMake cache file containing 337 pre-configured feature detection results for macOS ARM64 (Sonoma 14.0+, Xcode 15.0+) to eliminate redundant checks during configuration |
| eng/native/gen-buildsys.sh | Added logic to automatically load the macOS ARM64 cache file when target_os is "osx" and host_arch is "arm64", with opt-out via CLR_CMAKE_SKIP_PLATFORM_CACHE=1 environment variable |
    # Use platform-specific tryrun cache to speed up CMake configure (opt-out via CLR_CMAKE_SKIP_PLATFORM_CACHE=1)
    if [[ "$CLR_CMAKE_SKIP_PLATFORM_CACHE" != "1" ]]; then
        if [[ "$target_os" == "osx" && "$host_arch" == "arm64" && -f "$scriptroot/tryrun.osx-arm64.cmake" ]]; then
            cmake_extra_defines="-C $scriptroot/tryrun.osx-arm64.cmake $cmake_extra_defines"
        fi
    fi
When cross-compiling to macOS ARM64, both the new platform-specific cache (line 91) and the cross-compilation cache (line 75) will be loaded. Since tryrun.osx-arm64.cmake is prepended, CMake processes it first, but values can be overridden by the later tryrun.cmake.
There's a specific value discrepancy: HAVE_SHM_OPEN_THAT_WORKS_WELL_ENOUGH_WITH_MMAP_EXITCODE is set to 255 in tryrun.osx-arm64.cmake (line 233) but 1 in tryrun.cmake for Darwin ARM64 (line 75 context). In cross-compile scenarios, tryrun.cmake's value of 1 will override. However, for native macOS ARM64 builds, the value of 255 from tryrun.osx-arm64.cmake will be used.
Additionally, tryrun.osx-arm64.cmake is missing several _EXITCODE variables that are defined in tryrun.cmake for Darwin (HAVE_BROKEN_FIFO_SELECT_EXITCODE, HAVE_CLOCK_MONOTONIC_COARSE_EXITCODE, HAVE_CLOCK_GETTIME_NSEC_NP_EXITCODE, HAVE_MMAP_DEV_ZERO_EXITCODE, HAVE_SCHED_GETCPU_EXITCODE, MMAP_ANON_IGNORES_PROTECTION_EXITCODE, SEM_INIT_MODIFIES_ERRNO_EXITCODE).
Consider adding a check to skip loading tryrun.osx-arm64.cmake when CROSSCOMPILE is set to avoid any potential conflicts, or verify that these value differences are intentional and correct for the different build scenarios.
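The guard suggested in this comment could look roughly like the sketch below. It is not the change made in the PR: `select_platform_cache` is a hypothetical helper, and it assumes the cross-compilation flag reaches the script as an environment variable `CROSSCOMPILE` set to `1`. The remaining names come from the snippet under review.

```shell
#!/usr/bin/env bash
# Sketch only: select_platform_cache is a hypothetical helper, and reading
# CROSSCOMPILE=1 from the environment is an assumption about the build scripts.
select_platform_cache() {
    local target_os="$1" host_arch="$2" scriptroot="$3"
    local cache="$scriptroot/tryrun.osx-arm64.cmake"

    # Honor the opt-out described in the PR.
    [[ "${CLR_CMAKE_SKIP_PLATFORM_CACHE:-}" == "1" ]] && return 0
    # Cross builds already load tryrun.cmake, so skip the platform cache there.
    [[ "${CROSSCOMPILE:-0}" == "1" ]] && return 0

    # Print the extra cmake argument only for macOS ARM64 when the file exists.
    if [[ "$target_os" == "osx" && "$host_arch" == "arm64" && -f "$cache" ]]; then
        printf '%s\n' "-C $cache"
    fi
}
```

The caller would prepend whatever the function prints to `cmake_extra_defines`. An empty result means no platform cache is loaded, so the two cache files never overlap.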
    # CMake pre-configured cache for macOS ARM64 (Apple Silicon) native builds
    #
    # This file caches the results of CMake feature detection checks to significantly
    # speed up the CMake configure phase for macOS ARM64 builds.
The comment states this is for "macOS ARM64 (Apple Silicon) native builds" but the file can also be loaded during cross-compilation to macOS ARM64 (when CROSSCOMPILE=1 is set). Consider updating the comment to clarify that it applies to both native and cross-compilation scenarios, or alternatively, modify the loading logic in gen-buildsys.sh to skip this cache file when CROSSCOMPILE=1 to avoid the potential conflicts noted in the other comment.
    -# CMake pre-configured cache for macOS ARM64 (Apple Silicon) native builds
    -#
    -# This file caches the results of CMake feature detection checks to significantly
    -# speed up the CMake configure phase for macOS ARM64 builds.
    +# CMake pre-configured cache for macOS ARM64 (Apple Silicon) builds
    +#
    +# This file caches the results of CMake feature detection checks to significantly
    +# speed up the CMake configure phase when targeting macOS ARM64 (both native
    +# builds on Apple Silicon and cross-compilation with CROSSCOMPILE=1).
|
I think we need some sort of detection for when the checked-in configs get out of sync before adding more of them. We seem to have bugs due to messed-up checked-in configs #123950 |
    set(COMPILER_SUPPORTS_W_RESERVED_IDENTIFIER 1 CACHE INTERNAL "")
    set(FNO_LTO_AVAILABLE 1 CACHE INTERNAL "")
    set_cache_value(HAS_POSIX_SEMAPHORES_EXITCODE 1)
    set(HAS_POSIX_SEMAPHORES "" CACHE INTERNAL "")
We may want to check how many of these are actually needed. I checked a few and immediately discovered a cluster of dead code: #124049
I think the kinds of checks we need to do are likely to be different per target. With wasm, it's when we bump emscripten. What do you think is appropriate here? Xcode major/minor versions? OS version? |
I think we have a minimum version the cache is valid for and a check that we don't get too far ahead (2 major versions?). That way regeneration would likely occur when we bump above the minimum version on our CI machines. Do we want to warn or error when we violate the cache checks? |
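One way to read this proposal is as a window check on the toolchain's major version. The sketch below is illustrative only: the function name, the `ok` / `below-min` / `too-new` labels, and the default window of two major versions are assumptions, and whether an out-of-window result should warn or fail is exactly the open question raised here.

```shell
#!/usr/bin/env bash
# Sketch only: check_cache_window, its labels, and the default window of two
# major versions are illustrative assumptions, not behavior defined by the PR.
check_cache_window() {
    local detected_major="$1" min_major="$2" max_ahead="${3:-2}"

    if (( detected_major < min_major )); then
        echo "below-min"   # older than the floor the cache was generated for
    elif (( detected_major > min_major + max_ahead )); then
        echo "too-new"     # far enough ahead that cached values may not hold
    else
        echo "ok"
    fi
}
```

With a floor of 15, a detected major version of 16 or 17 would be reported as `ok`, and 18 as `too-new`. The caller can then decide whether `too-new` skips the cache, prints a warning, or stops the build.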
|
Caching the introspection result is fine for CI-like environments, which are deterministic. But for the dev inner loop, we can't predict what a user's machine has installed. I honestly don't think saving a few seconds on non-critical use cases is worth the hassle, and it also raises questions like "why only macOS?". |
I do not know what the best practices around this are. Copilot highlighted some of the discrepancies with what's getting added here in https://github.com/dotnet/runtime/pull/124046/files#r2769414845 |
I think we have to try because the various build time savings (this plus ninja to start) are hard to ignore. I think around 18% for caching is about what you'd find in most cases (in addition to the ~8% ninja boost). Those savings add up.
I want to apply this approach everywhere that makes sense. Starting with one allows us to keep the discussion focused and then we can go down the line. |
Nice! I wondered if there was a difference between when we cross-build on CI and when we just run. Apple does a fairly decent job of making it appear seamless, so this is easy to miss.
I double-checked with Copilot and here's what it suggests (I think I pretty much agree). We also need to factor in cross-compiling. |
|
One more thing... There are comments in wasm highlighting how to reconstruct the cache. We need to automate that part to make it easier on whoever is charged with updating the cache. |
An alternative way to address this is to ask why we have so many of these and why they are so slow to evaluate. |
I am thinking about the potential risks. The whole point of CMake introspection is to adapt to the machine or to ensure the "desired state" of the machine. Otherwise, we could have a hand-rolled config.h file. :) In CMake, there is no single type of introspection. In places, we raise a manual error in the CMake script for an unexpected state, so this will paper over those situations where something can go wrong. IOW, it has the potential to break stuff in non-obvious ways. Perhaps, we can add a |
Agreed - I think part of what we're also trying to figure out is where the line is. It's likely different for every platform.
I think that's a good idea. |
Summary of analysis from the coreclr part of configure:
The fundamental issue is that cmake has to invoke the compiler many times (~300+ try_compile calls) to probe the toolchain/platform. There's no way to make those faster other than:
So yes - caching is the right lever to pull. The values are stable, and for CI where build times matter most, a checked-in cache makes sense. |
Add options to validate and regenerate platform-specific CMake cache files:
- --validate-config: Runs cmake configure without the cache and compares detected values against the cached file, reporting any differences.
- --regenerate-config: Same as validate, but updates the cache file if differences are found.
The new validate-platform-cache.sh script supports multiple platforms:
- osx-arm64, osx-x64, linux-x64, linux-arm64, browser, ios/tvos
This addresses feedback about providing a way to verify cached preset values match the current system configuration.
I added a validation and regeneration switch along with a script to carry it out. Is the concept in line with what you were thinking? We may want to tweak the regen script, but what matters first is if this is what we want. |
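The comparison step that --validate-config describes can be sketched as follows. This is not the PR's validate-platform-cache.sh: `compare_cache` is a hypothetical function, and it assumes both the cached and the freshly detected values have already been flattened into files of `NAME=VALUE` lines, with values that contain no `=`.

```shell
#!/usr/bin/env bash
# Sketch only: compare_cache and the NAME=VALUE input format are assumptions
# for illustration; the real validation script may work differently.
compare_cache() {
    local cached_file="$1" detected_file="$2"
    # The first file fills the 'cached' table; the second is checked against it.
    # Appending "" forces a string comparison, so '' and '0' are never equal.
    awk -F= '
        NR == FNR { cached[$1] = $2; next }
        ($1 in cached) && (cached[$1] "") != ($2 "") {
            printf "- %s: cached=\047%s\047 detected=\047%s\047\n", $1, cached[$1], $2
            diffs++
        }
        END { exit (diffs > 0) }
    ' "$cached_file" "$detected_file"
}
```

The function prints one line per mismatch and returns a non-zero status when any difference is found, so a caller could use the status to decide whether to regenerate. Variables present in only one of the two files are ignored in this sketch.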
    echo "Run with --regenerate to update the cache file:"
    echo "  $0 --regenerate $platform"
    -echo "Run with --regenerate to update the cache file:"
    -echo "  $0 --regenerate $platform"
    +echo "Run with --regenerate-config to update the cache file:"
Maybe it's better to use the arg name in the top-level script.
|
Just tried and it has caught one difference:
    $ ./build.sh --regenerate-config
    ...
    ============================================================
    Results
    ============================================================
    Total variables checked: 312
    Matched: 311
    Differences: 1
    Differences found:
    - HAVE_SYS_ENDIAN_H: cached='' detected='1'
    Run with --regenerate to update the cache file:
      /Users/adeel/projects/runtime5/eng/native/validate-platform-cache.sh --regenerate osx-arm64
Curious, what is your Mac setup? I surfaced it on my setup. I generated the cache from macOS 15.4 / Xcode 16.3. My other Mac has 26.2 and a newer SDK. The initial cache as part of this PR is closer to the floor that we want. You're too new. |
|
Mine is macos |