Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling for caches in skyfunctions #23101

Open
katre opened this issue Jul 24, 2024 · 7 comments
Open

Better handling for caches in skyfunctions #23101

katre opened this issue Jul 24, 2024 · 7 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request

Comments

@katre
Copy link
Member

katre commented Jul 24, 2024

We have several caches used currently in skyfunction implementations, with many specifically, used directly or indirectly from ConfiguredTargetFunction. Some examples are:

Each of these are used to cache expansive computations that may be repeated multiple times in the same build (ie, in one bazel build call). Each of these would ideally also persist from build to build in case the same computation is needed across builds.

We've also seen several classes of error arising from the files used to determine cache values being changed between builds, but the cache not being invalidated:

There may well be more of these.

@katre
Copy link
Member Author

katre commented Jul 24, 2024

One potential solution: a way to get a cache from Skyframe, giving a root skykey, such that when the skykey is invalidated any entries added to the cache are also invalidated.

This would prevent the above problems because when the platform BUILD file or the transition bzl file are changed, those skykeys are dirtied, and that propagates to the ConfiguredTargetKey for the ConfiguredTargetFunction call that actually uses these caches.

This would require more state and more tracking from Skyframe, but would avoid the current fix, which is to entirely clear these caches between builds.

@katre
Copy link
Member Author

katre commented Jul 24, 2024

Pinging @haxorz and @gregestren for thoughts.

@katre katre added type: feature request P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels Jul 24, 2024
@haxorz
Copy link
Contributor

haxorz commented Jul 24, 2024

@aoeui Shahan can you please advise?

@haxorz
Copy link
Contributor

haxorz commented Jul 24, 2024

@katre Can you explain why the simple approach of using a Skyframe node for each computation doesn't work? By construction, seems like it'd address the correctness concern. Is the issue that it wouldn't give you the memory properties you want? Looks like the code you linked uses a mix of weak and soft caches.

@katre
Copy link
Member Author

katre commented Jul 25, 2024

@haxorz: The reason we've increased the use of state-machine producers for ConfiguredTargetFunction sub-steps is because adding more actual skyfunctions was causing the edge count to rise too high: CTF already has edges for each dependency, more for toolchains and configurations, but adding even more to break up the computation further was (I was told) too expensive.

I suspect that Shahan knows more about this than I do. I'd be very happy to use the existing caching skyframe brings rather than adding new systems on top of it.

@katre
Copy link
Member Author

katre commented Jul 25, 2024

Regarding weak and soft caches: I doubt we're being systematic here, and could probably do a lot to better tune these caches and how they handle references. I think this is actually another point in favor of having a generic system to get a cache from skyframe rather than adding them in an ad hoc manner throughout the system.

@gregestren
Copy link
Contributor

gregestren commented Aug 7, 2024

I think each weak vs. soft choice was consciously considered. Doesn't mean the current choices are right but I'd be curious about the original rationales.

Honestly, the surrounding code has changed so much since these caches were added that it's not remotely clear they still represent ideal implementation logic. I'm super-curious if a Skyframe-native caching approach could work for all the reasons discussed here.

Acknowledged about extensive edges. Maybe there are ways to mitigate that by combining some keys or only introducing them when needed (especially StarlarkTransitionCache, which I don't think would add edges to most nodes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request
Projects
None yet
Development

No branches or pull requests

3 participants