Reducing Mypy's Cache Size
This is a meta-issue about different ways to reduce Mypy's cache size. I've been working on a branch in my free time, though it is probably too big for a single PR, so I thought it would be best to get everyone's opinion on which optimizations (if any) would be worthwhile.
In short, I've reduced Mypy's filesystem cache by about 27% using a few different techniques.
Here is a breakdown of each of the commits, what I did to reduce the cache size, and by how much. We probably don't need to include all of these techniques since a lot of them only marginally reduce the cache size. The cache I've been using as a comparison is Mypy's own cache when checking itself.
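For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a cache directory could be measured. The helper names `cache_size_bytes` and `percent_saved` are hypothetical, and the `.mypy_cache` path is only an assumption about where the cache lives; this is not the script used to produce the numbers below.

```python
import os


def cache_size_bytes(cache_dir: str) -> int:
    """Sum the sizes of all regular files under cache_dir (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def percent_saved(before: int, after: int) -> float:
    """Savings relative to the original size, as a percentage."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Assumed default cache location; adjust to your setup.
    print(cache_size_bytes(".mypy_cache"))
```

Running it once on the `master` cache and once on the branch's cache gives the two sizes to feed into `percent_saved`.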
The Numbers
Commit | Total Size | Diff | Percent * | Optimization Technique |
---|---|---|---|---|
master | 26.9MB | | | |
763a94d5e | 26.5MB | 0.4MB | 1.6% | Use ints instead of bools for certain truthy/falsey values |
623266f47 | 24.2MB | 2.2MB | 8.4% | Don't store fields that are nullable |
6cefbfb27 | 23.4MB | 0.8MB | 3.0% | Drop builtins. prefix for common types |
27e9e0d56 | 23.4MB | 47KB | 0.2% | Don't cache builtins.object in MRO because everything derives from it |
d2d0aa005 | 22.2MB | 1.2MB | 4.5% | Don't store empty list/tuples in TypeInfo classes |
cce01a60f | 22.1MB | 17KB | 0.1% | Don't store empty symbol tables |
88b5b6a3d | 21.3MB | 0.8MB | 3.0% | Don't store arg_names and arg_kinds for func defs with type info |
4275d51b5 | 21.1MB | 0.2MB | 0.8% | Store LDEF/GDEF/MDEF as ints |
0a4dfeab2 | 19.9MB | 1.2MB | 4.4% | Replace .class key with empty string |
abf66951c | 19.9MB | 0.1MB | 0.5% | Store NoneType node as "None" string literal |
22b91d94d | 19.6MB | 0.1MB | 0.6% | Optimize def_extras usage |
98c4ce817 | 19.6MB | 5KB | 0.0% | Don't store Instance args if they're empty |
Total | | 7.0MB | 27.0% | |
* Percent is calculated based on the original cache size of the `master` branch
The best techniques based on total savings are:
- Don't store fields that are nullable (8.4%)
- Don't store empty list/tuples in TypeInfo classes (4.5%)
- Replace `.class` key with empty string (4.4%)
- Drop `builtins.` prefix for common types (3.0%)
- Don't store `arg_names` and `arg_kinds` for func defs with type info (3.0%)
After that point the savings start to drop off, though the remaining techniques still might be worth including.
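To make the two biggest wins concrete, here is a minimal sketch of dropping nullable and empty fields before writing JSON. The `compact` helper and the field names in the example dict are illustrative assumptions, not the code on the branch or Mypy's actual serializer.

```python
import json
from typing import Any, Dict


def compact(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data without None values or empty list/tuple/dict values.

    Falsey scalars such as False and 0 are kept, since they carry information.
    """
    result = {}
    for key, value in data.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple, dict)) and len(value) == 0:
            continue
        result[key] = value
    return result


# Hypothetical serialized class entry, loosely shaped like a cache record.
full = {
    ".class": "TypeInfo",
    "fullname": "pkg.Foo",
    "type_vars": [],
    "metaclass_type": None,
    "is_abstract": False,
    "mro": ["pkg.Foo", "builtins.object"],
}
small = compact(full)

# Compare the serialized sizes of the two forms.
print(len(json.dumps(full)), len(json.dumps(small)))
```

The trade-off is that the reader can no longer tell "missing" apart from "explicitly empty", so the loader has to supply a default for every field that may be omitted.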
Backwards Compatibility
I made sure to check that my changes would not break backwards compatibility. These new techniques allow for loading of both old and new caches, but of course the new cache format will be used when cache files need to be rebuilt.
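As a rough illustration of what "loading both old and new caches" can look like, here is a sketch of a tolerant reader. Everything in it (`class_tag`, `expand_name`, `load_info`, the `COMMON_BUILTINS` set, and the field names) is hypothetical and chosen for the example; it is not the branch's implementation.

```python
from typing import Any, Dict

# Assumed set of short names that the new format writes without "builtins.".
COMMON_BUILTINS = frozenset({"str", "int", "float", "bool", "object", "list", "dict"})


def class_tag(data: Dict[str, Any]) -> str:
    """Read the class tag from the new "" key, falling back to the old ".class" key."""
    if "" in data:
        return data[""]
    return data[".class"]


def expand_name(name: str) -> str:
    """Restore the "builtins." prefix for shortened names; leave full names alone."""
    return "builtins." + name if name in COMMON_BUILTINS else name


def load_info(data: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize an old-format or new-format entry into one in-memory shape."""
    return {
        "class": class_tag(data),
        "fullname": data["fullname"],
        # Fields omitted by the new format fall back to their defaults.
        "type_vars": list(data.get("type_vars", [])),
        "metaclass_type": data.get("metaclass_type"),
        "bases": [expand_name(base) for base in data.get("bases", [])],
    }


old_entry = {
    ".class": "TypeInfo",
    "fullname": "pkg.Foo",
    "type_vars": [],
    "metaclass_type": None,
    "bases": ["builtins.str"],
}
new_entry = {"": "TypeInfo", "fullname": "pkg.Foo", "bases": ["str"]}

# Both formats should normalize to the same result.
print(load_info(old_entry) == load_info(new_entry))
```

The key idea is that every compaction has an unambiguous inverse at load time, so old files keep working until they are rewritten in the new format.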
Why?
I was poking around the cache folder to see why the cache was the size it was, and I noticed that there were a lot of optimizations that could be made. In theory, smaller caches are quicker to save and load, while taking up less space on the user's computer. In CI systems where storage space is metered, having a smaller cache size should speed up CI workflows and use up less cloud storage.
Let me know if this is something you guys would be interested in! If so, I'll start splitting this into separate PRs. Thanks!