The current implementation from #12634 has support for manual cleaning of cache data based on size. This is just a preliminary implementation, with the intent to gather better use cases to understand how size-based cleaning should work.
Some considerations for changes:
- What are the use cases for cleaning based on size?
- What should be included in size calculations (src, cache, git db, git co, indexes)?
- Does the current max-size behavior make sense? It combines src and cache and sorts on time. But should it give higher precedence to deleting src first, since it can be recreated locally? (See the deletion-priority sketch after this list.)
- Should it include indexes? (It currently doesn't, because deleting an index seems like something you would rarely want, and there are relatively few indexes.)
- What should be the priorities for what it cleans first?
- What should the CLI options be for specifying manual size-based cleaning?
- Should `--max-download-size` include git caches?
- Should automatic gc support size-based cleaning?
  - This may be tricky to implement, since you don't want to delete anything that is being used by the current build, and it could potentially cause cache thrashing.
  - It's not clear to me at which point during the build it should do a size-based cleaning.
  - Tracking size data is slow.
- The current `du` implementation is primitive: it doesn't know about block sizes, and thus vastly undercounts the disk usage of small files. Should it be updated or replaced with a better implementation? (It also doesn't handle hard links; see gc: Verify `du_git_checkout` works on Windows #13064.) See the block-size sketch after this list.
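
To make the precedence question concrete, here is a minimal sketch (hypothetical types and names, not Cargo's actual code) of a cleanup plan that deletes extracted `src` directories before downloaded `.crate` files in `cache`, removing the least recently used entries within each tier until the total size falls under the limit. The current behavior described above instead mixes src and cache and sorts on time alone.

```rust
use std::time::SystemTime;

/// Hypothetical cache-entry kinds, ordered by deletion precedence:
/// variants declared earlier are deleted first.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    /// Extracted sources: can be recreated locally from the .crate file.
    Src,
    /// Downloaded .crate files: must be re-downloaded if deleted.
    Cache,
    /// Git checkouts: can be recreated from the git db.
    GitCheckout,
    /// Git dbs: must be re-fetched if deleted.
    GitDb,
}

/// Hypothetical record of one cache entry.
struct CacheEntry {
    kind: Kind,
    size: u64,
    last_use: SystemTime,
    // path, registry name, etc. omitted for brevity
}

/// Pick the entries to delete so the remaining total is at most `max_size`,
/// deleting in order of (kind precedence, least recently used).
fn plan_deletions(mut entries: Vec<CacheEntry>, max_size: u64) -> Vec<CacheEntry> {
    let mut total: u64 = entries.iter().map(|e| e.size).sum();
    entries.sort_by_key(|e| (e.kind, e.last_use));
    let mut to_delete = Vec::new();
    for entry in entries {
        if total <= max_size {
            break;
        }
        total -= entry.size;
        to_delete.push(entry);
    }
    to_delete
}
```

Whether git checkouts and git dbs belong in this ordering, and whether indexes should be excluded entirely, are exactly the open questions above.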
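
On the `du` point: a file's logical length (`metadata.len()`) can be much smaller than the space it actually occupies, because filesystems allocate whole blocks, so a naive sum vastly undercounts directories full of small files. Below is a rough sketch that rounds each file up to an assumed 4096-byte block size; a real implementation would query the filesystem (e.g. `st_blocks` on Unix) and de-duplicate hard links, which this does not do.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Round a logical file size up to the space it likely occupies on disk,
/// assuming a fixed allocation block size (real filesystems vary).
fn allocated_size(len: u64, block_size: u64) -> u64 {
    if len == 0 {
        0
    } else {
        len.div_ceil(block_size) * block_size
    }
}

/// Naive recursive `du`: sums block-rounded sizes of regular files.
/// Does not follow symlinks and does not de-duplicate hard links.
fn du(path: &Path, block_size: u64) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        // DirEntry::metadata does not traverse symlinks.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            total += du(&entry.path(), block_size)?;
        } else if meta.is_file() {
            total += allocated_size(meta.len(), block_size);
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Compare against `du -sh .` to see how much block rounding changes the total.
    println!("{} bytes (block-rounded)", du(Path::new("."), 4096)?);
    Ok(())
}
```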