Feature: Reduce Mypy's Cache Size #15731

Open
@dosisod

Reducing Mypy's Cache Size

This is a meta-issue discussing different ways to reduce Mypy's cache size. I've been working on a branch in my free time, but it is probably too big for a single PR, so I thought it would be best to get everyone's opinion on which optimizations (if any) would be worthwhile.

In short, I've reduced Mypy's filesystem cache by about 27% using a few different techniques.

Here is a breakdown of each of the commits, what I did to reduce the cache size, and by how much. We probably don't need to include all of these techniques since a lot of them only marginally reduce the cache size. The cache I've been using as a comparison is Mypy's own cache when checking itself.

The Numbers

Commit     Total Size  Diff    Percent *  Optimization Technique
master     26.9MB
763a94d5e  26.5MB      0.4MB   1.6%       Use ints instead of bools for certain truthy/falsey values
623266f47  24.2MB      2.2MB   8.4%       Don't store fields that are nullable
6cefbfb27  23.4MB      0.8MB   3.0%       Drop builtins. prefix for common types
27e9e0d56  23.4MB      47KB    0.2%       Don't cache builtins.object in MRO because everything derives from it
d2d0aa005  22.2MB      1.2MB   4.5%       Don't store empty list/tuples in TypeInfo classes
cce01a60f  22.1MB      17KB    0.1%       Don't store empty symbol tables
88b5b6a3d  21.3MB      0.8MB   3.0%       Don't store arg_names and arg_kinds for func defs with type info
4275d51b5  21.1MB      0.2MB   0.8%       Store LDEF/GDEF/MDEF as ints
0a4dfeab2  19.9MB      1.2MB   4.4%       Replace .class key with empty string
abf66951c  19.9MB      0.1MB   0.5%       Store NoneType node as "None" string literal
22b91d94d  19.6MB      0.1MB   0.6%       Optimize def_extras usage
98c4ce817  19.6MB      5KB     0.0%       Don't store Instance args if they're empty
Total                  7.0MB   27.0%

* Percent is calculated based on the original cache size of the master branch
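
For anyone who wants to reproduce this kind of measurement on their own project, here is a minimal sketch. The issue doesn't say how the sizes above were measured, so this assumes the cache size is just the sum of all file sizes under the cache directory. The helper names `cache_size_bytes` and `percent_of_baseline` are made up for illustration and are not part of Mypy.

```python
from pathlib import Path


def cache_size_bytes(cache_dir):
    """Sum the sizes of all regular files under cache_dir (recursively)."""
    return sum(p.stat().st_size for p in Path(cache_dir).rglob("*") if p.is_file())


def percent_of_baseline(saved_bytes, baseline_bytes):
    """Express a saving as a percentage of the baseline (e.g. master) cache size."""
    return 100.0 * saved_bytes / baseline_bytes
```

Running `cache_size_bytes` on the cache folder before and after a change, then passing the difference and the master size to `percent_of_baseline`, gives numbers in the same form as the Percent column.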

The best techniques based on total savings are:

  • Don't store fields that are nullable (8.4%)
  • Don't store empty list/tuples in TypeInfo classes (4.5%)
  • Replace .class key with empty string (4.4%)
  • Drop builtins. prefix for common types (3.0%)
  • Don't store arg_names and arg_kinds for func defs with type info (3.0%)

After that point the savings start to drop off, though the remaining techniques still might be worth including.
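
To give a rough idea of what the two biggest wins look like, here is a simplified sketch of dropping `None` values and empty lists/tuples from a dict before it gets written out as JSON. This is not the code from my branch: the `compact` helper and the field names in `node` are hypothetical, and the real serialization code is more involved.

```python
import json


def compact(data):
    """Return a copy of a serialized node without None or empty list/tuple values."""
    return {k: v for k, v in data.items() if v is not None and v != [] and v != ()}


# Hypothetical serialized node; the field names are illustrative only.
node = {"name": "f", "type_vars": [], "metaclass_type": None, "is_abstract": False}

full = json.dumps(node)
small = json.dumps(compact(node))
print(len(full), len(small))  # the compacted form is shorter
```

Note that `False` is kept here, since only `None` and empty sequences are treated as "not worth storing".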

Backwards Compatibility

I made sure to check that my changes would not break backwards compatibility. These new techniques allow for loading of both old and new caches, but of course the new cache format will be used when cached files need to be rebuilt.
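
As an illustration of how a reader can accept both formats, here is a minimal sketch. It assumes the old format stores the class tag under a `.class` key and the new one under an empty-string key, and that omitted fields fall back to a default. The functions `read_class_tag` and `read_node` and the field names are hypothetical, not the actual deserialization code.

```python
def read_class_tag(data):
    """Return the class tag from either the new '' key or the old '.class' key."""
    if "" in data:
        return data[""]
    return data[".class"]


def read_node(data):
    """Decode a node dict, filling in defaults for fields the new format omits."""
    return {
        "class": read_class_tag(data),
        "args": list(data.get("args", [])),               # missing means empty
        "metaclass_type": data.get("metaclass_type"),     # missing means None
        "is_abstract": bool(data.get("is_abstract", 0)),  # accepts bool or int
    }


# The same hypothetical node in the old and the new layout.
old = {".class": "Instance", "args": [], "metaclass_type": None, "is_abstract": False}
new = {"": "Instance"}
assert read_node(old) == read_node(new)
```

Because every omitted field has an unambiguous default, the reader doesn't need to know which format it is looking at.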

Why?

I was poking around the cache folder to see why the cache was the size it was, and I noticed that there were a lot of optimizations that could be made. In theory, smaller caches are quicker to save and load, while taking up less space on the user's computer. In CI systems where storage space is metered, a smaller cache should speed up CI workflows and use less cloud storage.

Let me know if this is something you guys would be interested in! If so I'll start splitting this into separate PRs. Thanks!
