Gzip profile files #71
Conversation
Read multiple words at once if we know the exact number of words to read. Speedup on a 60MiB profile: PyPy: 1.24s vs 0.84s (1.48x) CPython: 7.85s vs 3.05s (2.57x)
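The batched read could look roughly like this minimal sketch, assuming the profile stores little-endian 8-byte words; the helper name is illustrative, not vmprof's actual API:

```python
import struct

def read_words(fileobj, count, word_size=8):
    """Read `count` little-endian signed words with a single read()
    and a single struct.unpack(), instead of one read() per word."""
    data = fileobj.read(count * word_size)
    return struct.unpack("<%dq" % count, data)
```

Doing one large `read()` plus one `unpack()` avoids per-word Python call overhead, which is where the CPython speedup comes from.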
Force Python to re-use existing integer objects instead of allocating multiple objects for the same integer value. This can save a ton of memory depending on your profile. In my benchmarks the savings were somewhere in the range of 30-70%.
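The interning trick can be sketched with a tiny cache (hypothetical class, not vmprof's actual code): every equal integer value is mapped back to the first object that was seen, so repeated instruction pointers share one Python object instead of each allocating a new one.

```python
class IntCache:
    """Return the same object for equal integer values, so repeated
    values (e.g. instruction pointers) don't each allocate a new int."""

    def __init__(self):
        self._cache = {}

    def get(self, value):
        # setdefault returns the stored (first-seen) object if the
        # key already exists, otherwise stores and returns `value`.
        return self._cache.setdefault(value, value)
```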
This can reduce profile file size by multiple orders of magnitude. In my benchmarks I found the compressed files to be up to 97% (!) smaller than uncompressed ones. The main reason for this is that traces contain the same instruction pointers again and again; thus they are very well suited for compression. Old (uncompressed) profile files are still supported. In fact, the profile file format did not change, it only got a gzip wrapper. In the reader we simply check if a given profile is gzip-compressed using the gzip magic bytes "0x1f 0x8b"; if it is compressed, we decompress it and then continue parsing the now-uncompressed profile just like we parse an uncompressed profile.
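The magic-byte check described above can be sketched as follows (an illustrative helper, not the actual vmprof reader): peek at the first two bytes and wrap the file in a gzip decompressor only when they match `0x1f 0x8b`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def open_profile(path):
    """Open a profile file, transparently decompressing it if the
    first two bytes are the gzip magic bytes."""
    f = open(path, "rb")
    magic = f.read(2)
    f.seek(0)  # rewind so the parser sees the file from the start
    if magic == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=f)
    return f
```

Because the decision is made per file, old uncompressed profiles keep working with the same reader.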
|
Two concerns: one, Travis failed; two, are we sure gzwrite is signal-safe (e.g. that it can't allocate memory)? If it's not, then we need to introduce an async pipe that will do the gzipping (or even a pre-analysis). |
|
I'll have a look at the signal-safety problem soon; my current guess is that it's not safe. |
|
While I haven't found any official documentation about whether gzwrite allocates memory, I also print-debugged allocations (by modifying the allocation routines zlib uses in the zlib source code), which confirmed that no allocations are done. I will check with the zlib mailing list to be 100% sure. |
|
Thanks! Signal-safety is more involved than just allocations, though, so please ask about that too. |
|
gzip output may interfere with SIGPROF. |
|
Yes, I'm far more keen to pipe it out to some other program (then you can do accumulation and other tricks) |
|
Thing is that for large profiles the compression step can easily take 30s to a few minutes, which is terrible... We can make gzip optional, and you can always lower the sample frequency if it slows down your program too much or you get the impression that the profile isn't accurate enough (or accurate profiles are of high importance to you). If you want, I can also try to collect some data on the sample recording duration variance etc. |
|
I think you misunderstood the idea - the idea is not to first finish vmprof and then run gzip, but instead run vmprof to a pipe that will be gzipping it on the fly. |
|
I see. In terms of performance this is probably a bit worse. In terms of code architecture it could be done very similarly to what is done in this PR; we could simply spawn a new gzip process from within vmprof, get the pipe file descriptor, and write to that (as opposed to writing to the output file directly). This doesn't require many changes in vmprof (only in the CLI module) and it's very convenient to use, as we can integrate it with the CLI etc. Otherwise we could simply use the |
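The "spawn gzip, write to its pipe fd" idea can be sketched like this (a hypothetical helper, assuming a `gzip` binary is on PATH; not the prototype linked below):

```python
import subprocess

def open_gzip_pipe(path):
    """Spawn `gzip -c` writing to `path`, and return the process
    whose stdin pipe can be used as the profile output fd, so
    compression happens on the fly in a separate process."""
    outfile = open(path, "wb")
    proc = subprocess.Popen(["gzip", "-c"],
                            stdin=subprocess.PIPE,
                            stdout=outfile)
    return proc, outfile
```

vmprof would then write raw profile data to `proc.stdin.fileno()` instead of a plain file descriptor; when profiling stops, closing stdin and waiting for the process flushes the compressed output.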
|
I agree (note that the performance can be better because you're suddenly using two cores). |
|
Another pro of piping: the pipe buffer acts as a write buffer. |
|
I made a prototype "gzip as a subprocess" implementation. Please have a look here and give me feedback: https://github.com/jonashaag/vmprof-python/tree/gzip-subproc |
|
Bump |
|
Yes, sorry I'll get back to you
|
Hi, I'm a bit unhappy about packing the gzip stuff into C. I can be convinced it's a good idea, but a pure-Python solution would make it magically work with PyPy too, for example. It doesn't seem too hard; if you have opinions, I'd like to know them. |
|
It's just a matter of performance, although I haven't done any benchmarking with a pure-Python solution. It shouldn't be a problem if you have multiple cores and large enough pipe buffers, so I guess a pure-Python solution would be fine as well! |
|
Let's move discussion to #88 |
Depends on #70
Gzip profile files, reducing file size by up to 97%. Backwards compatible. Increases parse duration by about 5-10%. (Note that my other PR speeds up parsing by roughly 1.5-2.5x.)
I also tried other file size reduction techniques:
- Use smaller integer types (`short` instead of `long`): map 32-bit IPs to 16-bit IPs using an associative array when writing the profile. This works well and isn't too difficult to implement. It cuts profile file size by up to 50%.
- De-duplicate consecutive identical instruction pointers into an `(IP, count)` tuple. Easy to implement; saves 10% in my benchmarks (depending, of course, on the program being profiled).

An idea I did not look into is de-duplicating whole traces, i.e. for each trace, check whether an equivalent trace has already been written and, if so, only write a pointer to that other trace.
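The `(IP, count)` run-length idea mentioned above can be sketched in a few lines (an illustrative function, not part of this PR):

```python
def dedup_runs(ips):
    """Collapse consecutive repeated instruction pointers into
    (ip, count) pairs, e.g. [1, 1, 2] -> [(1, 2), (2, 1)]."""
    out = []
    for ip in ips:
        if out and out[-1][0] == ip:
            # extend the current run
            out[-1] = (ip, out[-1][1] + 1)
        else:
            # start a new run
            out.append((ip, 1))
    return out
```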
Gzipping everything is probably the easiest and most robust solution to reducing file sizes.