Feature: Reduce Mypy's Cache Size #15731

Open
dosisod opened this issue Jul 21, 2023 · 6 comments · May be fixed by #15981
dosisod commented Jul 21, 2023

Reducing Mypy's Cache Size

This is a meta-issue discussing different ways to reduce Mypy's cache size. I've been working on a branch in my free time, though it is probably too big for a single PR, so I thought it would be best to get everyone's opinion on which optimizations (if any) would be worthwhile.

In short, I've reduced Mypy's filesystem cache by about 27% using a few different techniques.

Here is a breakdown of each of the commits, what I did to reduce the cache size, and by how much. We probably don't need to include all of these techniques since a lot of them only marginally reduce the cache size. The cache I've been using as a comparison is Mypy's own cache when checking itself.

The Numbers

| Commit | Total Size | Diff | Percent * | Optimization Technique |
| --- | --- | --- | --- | --- |
| master | 26.9MB | | | |
| 763a94d5e | 26.5MB | 0.4MB | 1.6% | Use ints instead of bools for certain truthy/falsey values |
| 623266f47 | 24.2MB | 2.2MB | 8.4% | Don't store fields that are nullable |
| 6cefbfb27 | 23.4MB | 0.8MB | 3.0% | Drop `builtins.` prefix for common types |
| 27e9e0d56 | 23.4MB | 47KB | 0.2% | Don't cache `builtins.object` in MRO because everything derives from it |
| d2d0aa005 | 22.2MB | 1.2MB | 4.5% | Don't store empty lists/tuples in TypeInfo classes |
| cce01a60f | 22.1MB | 17KB | 0.1% | Don't store empty symbol tables |
| 88b5b6a3d | 21.3MB | 0.8MB | 3.0% | Don't store arg_names and arg_kinds for func defs with type info |
| 4275d51b5 | 21.1MB | 0.2MB | 0.8% | Store LDEF/GDEF/MDEF as ints |
| 0a4dfeab2 | 19.9MB | 1.2MB | 4.4% | Replace `.class` key with empty string |
| abf66951c | 19.9MB | 0.1MB | 0.5% | Store NoneType node as "None" string literal |
| 22b91d94d | 19.6MB | 0.1MB | 0.6% | Optimize def_extras usage |
| 98c4ce817 | 19.6MB | 5KB | 0.0% | Don't store Instance args if they're empty |
| **Total** | | 7.0MB | 27.0% | |

\* Percent is calculated relative to the original cache size on the master branch

The best techniques based on total savings are:

  • Don't store fields that are nullable (8.4%)
  • Don't store empty list/tuples in TypeInfo classes (4.5%)
  • Replace .class key with empty string (4.4%)
  • Drop builtins. prefix for common types (3.0%)
  • Don't store arg_names and arg_kinds for func defs with type info (3.0%)

After that point the savings start to drop off, though the remaining techniques might still be worth including.
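As a concrete illustration of the two biggest wins (omitting nullable fields and omitting empty lists/tuples), here's a minimal sketch; `serialize_type_info`/`deserialize_type_info` are made-up helpers for this example, not mypy's actual serializers:

```python
import json

def serialize_type_info(name, bases=(), is_abstract=False):
    """Serialize a node, omitting fields that hold a default/empty value."""
    data = {".class": "TypeInfo", "name": name}
    if bases:            # skip empty lists/tuples entirely
        data["bases"] = list(bases)
    if is_abstract:      # store flags only when truthy, as 0/1 instead of bools
        data["is_abstract"] = 1
    return data

def deserialize_type_info(data):
    """Restore the omitted defaults with dict.get() when loading."""
    return (data["name"],
            tuple(data.get("bases", ())),
            bool(data.get("is_abstract", 0)))

compact = json.dumps(serialize_type_info("C"))
verbose = json.dumps({".class": "TypeInfo", "name": "C",
                      "bases": [], "is_abstract": False})
print(len(compact), "<", len(verbose))
```

The key point is that deserialization has to supply the defaults, which is also what keeps the new reader compatible with old caches that still contain the explicit fields.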

Backwards Compatibility

I made sure to check that my changes would not break backwards compatibility. These new techniques allow loading both old and new caches, but of course the new cache format will be used whenever cache files need to be rebuilt.

Why?

I was poking around the cache folder to see why the cache was the size it was, and I noticed that there were a lot of optimizations that could be made. In theory, smaller caches are quicker to save and load, while taking up less space on the user's computer. In CI systems where storage space is metered, a smaller cache will speed up CI workflows and use less cloud storage.

Let me know if this is something you'd be interested in! If so, I'll start splitting this into separate PRs. Thanks!

ikonst commented Jul 21, 2023

While we're at it, we could also pickle the cache, or use some other serializer more compact than JSON. We could also compress it. Any of these sounds better than introducing file-size optimization tricks into the individual (de)serializers.

dosisod commented Jul 21, 2023

Good idea, I will look into this more. Some thoughts/comments:

  • Compressing the JSON might be more effective if all the cache files were in one big JSON file instead of a bunch of smaller ones (over 1,000). I'll look into both.
  • Pickling might be larger since we only cache certain fields, but AFAIK, pickling requires serializing the whole object. Might be wrong on this, will investigate.
  • I'm not familiar with how the SQLite cache system works in Mypy: is there a special (de)serializer for SQLite? How does caching differ between the filesystem cache and the SQLite cache?
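On the pickle question, one quick sanity check is simply comparing serialized sizes side by side. The dict below is shaped vaguely like a cache entry but is made up for illustration, not mypy's actual cache schema:

```python
import json
import pickle

# Made-up, cache-entry-shaped payload (NOT mypy's real cache format).
entry = {
    ".class": "SymbolTableNode",
    "kind": "Gdef",
    "node": {".class": "Var", "name": "x", "type": "builtins.int"},
}

# Compact JSON (no whitespace) vs. the latest pickle protocol.
json_size = len(json.dumps(entry, separators=(",", ":")).encode())
pickle_size = len(pickle.dumps(entry, protocol=pickle.HIGHEST_PROTOCOL))
print(f"json: {json_size} bytes, pickle: {pickle_size} bytes")
```

Which one wins depends heavily on the payload shape (pickle memoizes repeated strings, JSON has less per-object framing), so measuring on real cache data would be the only reliable answer.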

dosisod commented Aug 27, 2023

I finally got around to running the numbers, and we can reduce the cache size by about 5.5× by gzipping the JSON files:

| Test | No Cache (seconds) | With Cache (seconds) | Total Cache Size |
| --- | --- | --- | --- |
| Baseline (3240da4) | 65.89 | 7.41 | 29474816 (29.4 MB) |
| Gzip level 0 | 65.57 | 7.41 | 29487104 (29.4 MB) |
| Gzip level 1 | 65.80 | 7.36 | 5980160 (6.0 MB) |
| Gzip level 2 | 66.07 | 7.51 | 5824512 (5.8 MB) |
| Gzip level 3 | 66.03 | 7.42 | 5734400 (5.7 MB) |
| Gzip level 4 | 66.00 | 7.46 | 5578752 (5.6 MB) |
| Gzip level 5 | 65.98 | 7.47 | 5439488 (5.4 MB) |
| Gzip level 6 | 65.93 | 7.35 | 5357568 (5.3 MB) |
| Gzip level 7 | 66.20 | 7.36 | 5337088 (5.3 MB) |
| Gzip level 8 | 66.56 | 7.44 | 5328896 (5.3 MB) |
| Gzip level 9 | 66.83 | 7.39 | 5328896 (5.3 MB) |

The times are an average of 3 runs. This was done without compiling with Mypyc.

These compression results look promising: they barely affect runtime performance while significantly reducing the cache size. This was a simple four-line change that compresses/decompresses the JSON files with gzip instead of writing them directly. If we choose to go forward with this, we would need to support detecting older, non-compressed JSON files.
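A gzip wrapper of this shape is roughly what the four-line change amounts to (hypothetical helper names; mypy's real cache I/O goes through its own metastore layer):

```python
import gzip
import json

def write_cache_file(path: str, data: dict) -> None:
    # Level 1 captured nearly all of the size win in the table above
    # while adding essentially no CPU cost.
    with gzip.open(path, "wt", encoding="utf-8", compresslevel=1) as f:
        json.dump(data, f)

def read_cache_file(path: str) -> dict:
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

If detecting old, non-compressed caches were needed, checking the first two bytes of the file for the gzip magic number (`\x1f\x8b`) would distinguish compressed files from plain JSON.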

I have yet to experiment with storing all the cache data in one JSON file instead of lots of small ones. In theory we could read and decompress the entire cache once at startup, which might be faster than reading each file individually.

Also, here's a note regarding pickling that I found during my investigation: #932 (comment). Essentially, pickling is far more volatile than JSON. I could look into other binary serializers, but I feel that the easiest and most impactful option is to compress the JSON.

hauntsaninja (Collaborator) commented:
Thanks for running these experiments; I would be supportive of using gzip level 1. I don't think we'd need to support detecting older, non-compressed files, since we already don't read caches across mypy versions.

dosisod added a commit to dosisod/mypy that referenced this issue Aug 29, 2023
@dosisod dosisod linked a pull request Aug 29, 2023 that will close this issue
JukkaL commented Aug 29, 2023

Reducing the cache size would be useful in general. However, there are a few things that may cause issues for some users.

Switching to a compressed cache would break some use cases we have at work where we manipulate mypy cache files. Our mypy runner script supports downloading cache snapshots from a remote server, and these snapshots are compressed using LZMA, which compresses better than gzip -- but LZMA is much slower when compressing, so it only makes sense if cache files are downloaded over network. Compressing previously compressed files isn't effective, so we'd probably have to implement an extra step to decompress cache files and compress them again using LZMA, and similarly after downloading.

The second use case involves a tool that reads and parses mypy cache files. We'd need to add a decompression step to the tool, which should be simple.

> This was done without compiling with Mypyc.

We primarily care about performance when compiled with mypyc, so it would be important to have performance measurements with a compiled mypy. Also, performance measurements are hard to do accurately, due to many potential sources of noise. Generally an average of at least 10 runs is needed (and reporting the percent standard deviation would be nice), and runs of different variants should be interleaved (e.g. run "variant 1, variant 2, variant 3, variant 1, variant 2, ..." instead of "variant 1, variant 1, variant 1, variant 2, variant 2, ..."). The number of other running processes should be reduced to a minimum (e.g. close browser windows and background services that use CPU). Finally, some laptops have aggressive CPU throttling, so using a desktop computer is usually better.
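The interleaving scheme described above can be sketched as a small generic harness (this is only an illustration, not mypy's misc/perf_checker.py):

```python
import statistics
import subprocess
import time

def bench_interleaved(variants: dict[str, list[str]], runs: int = 10):
    """Time each variant's command `runs` times, interleaving the variants
    (v1, v2, ..., v1, v2, ...) so slow drift in machine state (thermal
    throttling, background load) affects every variant roughly equally."""
    times: dict[str, list[float]] = {name: [] for name in variants}
    for _ in range(runs):
        for name, cmd in variants.items():
            start = time.perf_counter()
            subprocess.run(cmd, check=False, capture_output=True)
            times[name].append(time.perf_counter() - start)
    for name, ts in times.items():
        mean, stdev = statistics.mean(ts), statistics.stdev(ts)
        print(f"{name}: mean={mean:.2f}s stdev={stdev:.2f}s "
              f"({100 * stdev / mean:.1f}%)")
    return times
```

Interleaving matters because noise is rarely stationary: if the machine heats up halfway through a back-to-back run, the second variant absorbs all of the penalty and the comparison is biased.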

dosisod commented Aug 30, 2023

Thanks, this is really helpful info to keep in mind. I found a misc/perf_checker.py script that does most of what you describe, though it doesn't do the interleaving you mentioned; it just runs N consecutive trials. I also had to make some modifications to get it to work. I could open a PR to spruce up that file if you'd like.

Regardless, here's the data that it's currently giving me for 10 trials, compiled with O3 and no debug info:

Baseline (010da0b)

Testing baseline
Baseline:
  Times: [19.297168731689453, 0.4640381336212158, 0.4504966735839844, 0.4662041664123535, 0.4392967224121094, 0.45980358123779297, 0.46353912353515625, 0.4629957675933838, 0.45834779739379883, 0.4324827194213867]
  Mean:  2.3394373416900636
  Stdev: 5.958350198754421

Testing cold cache
Cold cache:
  Times: [19.375569105148315, 18.87609052658081, 19.062970638275146, 19.143561124801636, 19.825632095336914, 18.859866857528687, 18.819206476211548, 18.90439224243164, 19.012978076934814, 19.017186641693115]
  Mean:  19.089745378494264
  Stdev: 0.30623794048405684

Testing warm cache
Warm cache:
  Times: [0.4719367027282715, 0.46656227111816406, 0.45354509353637695, 0.464038610458374, 0.43805575370788574, 0.46274590492248535, 0.4289665222167969, 0.4538865089416504, 0.46797966957092285, 0.4691941738128662]
  Mean:  0.4576911211013794
  Stdev: 0.014251567798474972

This PR (936ba7d)

Testing baseline
Baseline:
  Times: [2.7825253009796143, 0.4576442241668701, 0.4734163284301758, 0.44872307777404785, 0.47516632080078125, 0.43790364265441895, 0.4450533390045166, 0.4777059555053711, 0.44003915786743164, 0.4935641288757324]
  Mean:  0.693174147605896
  Stdev: 0.7343564692142952

Testing cold cache
Cold cache:
  Times: [19.112578630447388, 19.104485273361206, 19.529428482055664, 19.12293291091919, 19.45955729484558, 19.144737243652344, 20.329124689102173, 19.072441339492798, 19.56252908706665, 19.2300808429718]
  Mean:  19.36678957939148
  Stdev: 0.3868663729927083

Testing warm cache
Warm cache:
  Times: [0.481090784072876, 0.44382429122924805, 0.4377138614654541, 0.44713258743286133, 0.46593570709228516, 0.4480412006378174, 0.4819929599761963, 0.4391357898712158, 0.47620415687561035, 0.4763009548187256]
  Mean:  0.459737229347229
  Stdev: 0.018237852583062614

I assume this script was made before -i was the default, since the baseline numbers are off compared to the other tests. For reference, this is what I modified each of the execute() calls to (pulled from runtests.py):

```python
execute([
    "python3", "-m", "mypy",
    "--config-file", "mypy_self_check.ini",
    "-p", "mypy",
    "-p", "mypyc",
])
```
