[CMake] Make precompiled headers opt-in for ccache builds #141927


Open · wants to merge 2 commits into main

Conversation

@mrkajetanp (Contributor) commented May 29, 2025

Using precompiled headers in ccache builds seems to be causing intermittent CI failures where CMake fails to rebuild the precompiled header when it's supposed to.

Add a new LLVM_CCACHE_PCH option, set to OFF by default, to make using PCH in ccache builds opt-in and avoid these failures in the CI.

This is a workaround for an unknown underlying issue.
See: #142449
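
For illustration, the general shape of the change is sketched below. This is not the actual diff: LLVM_CCACHE_BUILD and LLVM_CCACHE_PCH are the real option names, but the placement and the use of CMAKE_DISABLE_PRECOMPILE_HEADERS are assumptions.

```cmake
# Sketch only -- not the actual patch. Gate PCH usage behind an explicit
# opt-in whenever ccache is enabled.
option(LLVM_CCACHE_BUILD "Set to ON for a ccache enabled build" OFF)
option(LLVM_CCACHE_PCH "Use precompiled headers together with ccache" OFF)

if(LLVM_CCACHE_BUILD AND NOT LLVM_CCACHE_PCH)
  # FIXME: workaround for intermittent PCH rebuild failures, see #142449.
  # CMAKE_DISABLE_PRECOMPILE_HEADERS turns target_precompile_headers() into a no-op.
  set(CMAKE_DISABLE_PRECOMPILE_HEADERS ON)
endif()
```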

@mrkajetanp (Contributor, Author) commented May 29, 2025

I've seen this happen on a pre-merge check for my own PR too. I have no idea which part of the environment could be causing it, but now that we know the feature works in general, we can make it opt-in, either for local developer setups to make use of or for specific build bots to try out.

@mgorny (Member) left a comment

WFM

@Meinersbur (Member)

What happened with the -fno-pch-timestamp solution?

@mrkajetanp (Contributor, Author)

What happened with the -fno-pch-timestamp solution?

That part is fine, but this is a different problem. From what I understand, the compiler validates a PCH against either its timestamp or its file size. The timestamp check causes obvious problems with ccache because it gets out of sync; that is solved by -fno-pch-timestamp.
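
(For context, the usual recipe for clang looks roughly like the sketch below; this follows the standard ccache recommendation and is not necessarily what LLVM's CMake does verbatim.)

```cmake
# Sketch: stop clang from embedding a timestamp in the PCH so ccache can
# reuse cached objects built against it. ccache itself also needs sloppiness
# configured, e.g. in the environment:
#   CCACHE_SLOPPINESS=pch_defines,time_macros
if(LLVM_CCACHE_BUILD AND CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  add_compile_options(-Xclang -fno-pch-timestamp)
endif()
```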

In this case, the errors are:

has been modified since the precompiled header '(..)/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/cmake_pch.hxx.pch' was built: size changed (was 25444, now 25753)

So this is an actual size mismatch. If the header in question has actually changed, then CMake should have rebuilt it before rebuilding its users and this should not happen. But somehow it seems to pop up irregularly on some build bots. I have yet to see it on a local build, though that doesn't mean nobody else has.

@boomanaiden154 (Contributor) left a comment

Have you tested how this interacts with CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER rather than LLVM_CCACHE_BUILD? That might be a bit of a footgun if we end up changing how the premerge system works (e.g. https://discourse.llvm.org/t/llvm-ccache-build-is-deprecated/68431?u=boomanaiden154-1).

@mrkajetanp (Contributor, Author)

Have you tested how this interacts with CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER rather than LLVM_CCACHE_BUILD? That might be a bit of a footgun if we end up changing how the premerge system works (e.g. https://discourse.llvm.org/t/llvm-ccache-build-is-deprecated/68431?u=boomanaiden154-1).

If LLVM_CCACHE_BUILD gets removed, then that whole part of this file will have to be reworked anyway. The last time I made changes around ccache, I was told that enabling ccache through the *_LAUNCHER options is not supposed to be supported to the same extent as LLVM_CCACHE_BUILD: the former is expected to work, but it does not proactively set things up for the user the way the latter does.
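
To spell out the two configurations being compared (illustrative only; the using_ccache variable is a hypothetical name, not code from this patch):

```cmake
# Option A: LLVM's own switch, which also configures ccache behaviour for the user.
#   cmake -DLLVM_CCACHE_BUILD=ON ...
# Option B: plain CMake launcher variables; ccache is used, but nothing else is set
#   up on the user's behalf, so logic keyed only on LLVM_CCACHE_BUILD (such as an
#   LLVM_CCACHE_PCH guard) would not see this configuration.
#   cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ...
if(LLVM_CCACHE_BUILD OR CMAKE_CXX_COMPILER_LAUNCHER MATCHES "ccache")
  set(using_ccache TRUE)  # hypothetical flag covering both ways of enabling ccache
endif()
```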

@Meinersbur (Member)

So this is an actual size mismatch. If the header in question has actually changed, then CMake should have rebuilt it before rebuilding its users and this should not happen. But somehow it seems to pop up irregularly on some build bots. I have yet to see it on a local build, though that doesn't mean nobody else has.

Sounds like a problem of dependencies not being honored. So this is just a workaround for a different problem?

In any case, I find the summary and source comment insufficient. If this is indeed a workaround for a yet-unknown cause, please also link to the issue # (#142449?) and add a FIXME comment. The fault may also lie in ccache not restoring the right .pch files before invoking clang.

@mrkajetanp (Contributor, Author)

So this is just a workaround for a different problem?

Exactly. It doesn't seem to happen very consistently; otherwise the CI would have been failing a lot more than it has lately. It seems like there's a bug in either CMake or ccache in how the two interact, unless it's some other environment factor I'm not considering. If we make it opt-in this way, it should become easier to pinpoint the issue over time, because people configuring build bots can enable it and then see whether the failure shows up for them.
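
(For example, a bot wanting to test the combination could opt in with an initial cache file passed via cmake -C; a hypothetical sketch, with a made-up file name:)

```cmake
# ccache-pch.cmake -- hypothetical initial cache for a build bot opting in.
# Usage: cmake -C ccache-pch.cmake <usual options> <path-to-llvm>
set(LLVM_CCACHE_BUILD ON CACHE BOOL "")
set(LLVM_CCACHE_PCH ON CACHE BOOL "")
```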

@Meinersbur (Member) left a comment

LGTM. The summary, when used as the commit message, could also mention that this is a workaround and refer to the issue #.

Using precompiled headers in ccache builds seems to be causing
intermittent CI failures where CMake fails to rebuild the header
when it's supposed to, such as:

https://www.mail-archive.com/llvm-bugs@lists.llvm.org/msg88645.html

Add a new LLVM_CCACHE_PCH option, set to OFF by default, in order
to make using pch in ccache opt-in and avoid these failures in the CI.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
@boomanaiden154 (Contributor)

Also, has the issue been observed at all on the GitHub-hosted runners? They use a different caching setup and I've only seen reports on the Buildkite side. None of the issues have links to GitHub workflows, but that doesn't necessarily indicate anything.

@mrkajetanp (Contributor, Author) commented Jun 4, 2025

Also, has the issue been observed at all on the GitHub-hosted runners? They use a different caching setup and I've only seen reports on the Buildkite side. None of the issues have links to GitHub workflows, but that doesn't necessarily indicate anything.

I've only seen the issue on Buildkite as well. It doesn't seem to be happening on the buildbots either, because the flang builds have been passing pretty consistently lately, but maybe I just haven't noticed.

Do you know what's different about the Buildkite caching setup?

@boomanaiden154 (Contributor)

Do you know what's different about the Buildkite caching setup?

It's different from GitHub in that it's persistent across builds inside the same container. It could even be a single runner that's messed up somewhere.

Can we hold off on landing this until after we sunset Buildkite (that should happen in the next couple of days)? If it's still occurring on the GitHub side we can land it, but otherwise we probably don't need it.

@mrkajetanp (Contributor, Author)

It's different from GitHub in that it's persistent across builds inside the same container. It could even be a single runner that's messed up somewhere.

Oh I see, that sounds like it could indeed be the cause of this.

Can we hold off on landing this until after we sunset Buildkite (that should happen in the next couple of days)? If it's still occurring on the GitHub side we can land it, but otherwise we probably don't need it.

Sure, I'm happy to wait. Is there somewhere I can track the progress of moving the CI to GitHub?

@boomanaiden154 (Contributor)

Sure, I'm happy to wait. Is there somewhere I can track the progress of moving the CI to GitHub?

We just fully switched over today.
