Better handling for caches in skyfunctions #23101

katre · 2024-07-24T19:44:53Z

We have several caches used currently in skyfunction implementations, with many specifically, used directly or indirectly from ConfiguredTargetFunction. Some examples are:

Each of these are used to cache expansive computations that may be repeated multiple times in the same build (ie, in one bazel build call). Each of these would ideally also persist from build to build in case the same computation is needed across builds.

We've also seen several classes of error arising from the files used to determine cache values being changed between builds, but the cache not being invalidated:

Platform-based flags are cached incorrectly #22995 - Flags from platform weren't re-computed when the underlying BUILD file changed
Label flag transitions are cached incorrectly #23097 - Transitions weren't re-computed when the underlying bzl file changed.

There may well be more of these.

The text was updated successfully, but these errors were encountered:

katre · 2024-07-24T19:48:02Z

One potential solution: a way to get a cache from Skyframe, giving a root skykey, such that when the skykey is invalidated any entries added to the cache are also invalidated.

This would prevent the above problems because when the platform BUILD file or the transition bzl file are changed, those skykeys are dirtied, and that propagates to the ConfiguredTargetKey for the ConfiguredTargetFunction call that actually uses these caches.

This would require more state and more tracking from Skyframe, but would avoid the current fix, which is to entirely clear these caches between builds.

katre · 2024-07-24T19:48:42Z

Pinging @haxorz and @gregestren for thoughts.

haxorz · 2024-07-24T21:24:39Z

@aoeui Shahan can you please advise?

haxorz · 2024-07-24T21:28:05Z

@katre Can you explain why the simple approach of using a Skyframe node for each computation doesn't work? By construction, seems like it'd address the correctness concern. Is the issue that it wouldn't give you the memory properties you want? Looks like the code you linked uses a mix of weak and soft caches.

katre · 2024-07-25T17:37:00Z

@haxorz: The reason we've increased the use of state-machine producers for ConfiguredTargetFunction sub-steps is because adding more actual skyfunctions was causing the edge count to rise too high: CTF already has edges for each dependency, more for toolchains and configurations, but adding even more to break up the computation further was (I was told) too expensive.

I suspect that Shahan knows more about this than I do. I'd be very happy to use the existing caching skyframe brings rather than adding new systems on top of it.

katre · 2024-07-25T17:38:29Z

Regarding weak and soft caches: I doubt we're being systematic here, and could probably do a lot to better tune these caches and how they handle references. I think this is actually another point in favor of having a generic system to get a cache from skyframe rather than adding them in an ad hoc manner throughout the system.

gregestren · 2024-08-07T20:36:54Z

I think each weak vs. soft choice was consciously considered. Doesn't mean the current choices are right but I'd be curious about the original rationales.

Honestly, the surrounding code has changed so much since these caches were added that it's not remotely clear they still represent ideal implementation logic. I'm super-curious if a Skyframe-native caching approach could work for all the reasons discussed here.

Acknowledged about extensive edges. Maybe there are ways to mitigate that by combining some keys or only introducing them when needed (especially StarlarkTransitionCache, which I don't think would add edges to most nodes).

katre added type: feature request P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels Jul 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better handling for caches in skyfunctions #23101

Better handling for caches in skyfunctions #23101

katre commented Jul 24, 2024

katre commented Jul 24, 2024

katre commented Jul 24, 2024

haxorz commented Jul 24, 2024

haxorz commented Jul 24, 2024

katre commented Jul 25, 2024

katre commented Jul 25, 2024

gregestren commented Aug 7, 2024 •

edited

Loading

Better handling for caches in skyfunctions #23101

Better handling for caches in skyfunctions #23101

Comments

katre commented Jul 24, 2024

katre commented Jul 24, 2024

katre commented Jul 24, 2024

haxorz commented Jul 24, 2024

haxorz commented Jul 24, 2024

katre commented Jul 25, 2024

katre commented Jul 25, 2024

gregestren commented Aug 7, 2024 • edited Loading

gregestren commented Aug 7, 2024 •

edited

Loading