
Conversation

@jonashaag
Contributor

Depends on #70

Gzip profile files; this reduces file size by up to 97%. Backwards compatible. Increases parse duration by about 5-10%. (Note that my other PR speeds up parsing by roughly 1.5-2.5x.)

I also tried other file size reduction techniques:

  • Use 16-bit integers for instruction pointers instead of 32-bit ones (short instead of long). Map 32-bit IPs to 16-bit IPs using an associative array while writing the profile. This works well and isn't too difficult to implement. It cuts profile file size by up to 50%.
  • Count repeated IPs within a trace (recursion etc.) and replace them with an (IP, count) tuple. Easy to implement; saves 10% in my benchmarks (depending on the program being profiled, of course).
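Both techniques can be sketched in a few lines of Python (the helper names here are made up for illustration; the real writer lives in vmprof's C code):

```python
def remap_ips(trace, ip_table):
    """Map full-width instruction pointers to small 16-bit ids.

    ip_table is the associative array built up while writing the profile;
    each distinct IP gets the next free id.
    """
    small = []
    for ip in trace:
        if ip not in ip_table:
            ip_table[ip] = len(ip_table)  # next free id, fits in 16 bits
        small.append(ip_table[ip])
    return small


def run_length_encode(trace):
    """Collapse runs of repeated IPs (recursion etc.) into (ip, count) pairs."""
    encoded = []
    for ip in trace:
        if encoded and encoded[-1][0] == ip:
            encoded[-1] = (ip, encoded[-1][1] + 1)
        else:
            encoded.append((ip, 1))
    return encoded
```

The writer would emit the 16-bit ids (plus the id-to-IP table once) instead of raw pointers, and (ip, count) pairs instead of repeated entries.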

An idea I did not look into is de-duplicating whole traces, i.e. for each trace, check whether an equivalent trace has already been written and, if so, only write a pointer to that other trace.

Gzipping everything is probably the easiest and most robust solution to reducing file sizes.

Jonas Haag and others added 4 commits May 2, 2016 12:48
Read multiple words at once if we know the exact number of words to
read.

Speedup on a 60MiB profile:

PyPy:    1.24s vs 0.84s (1.48x)
CPython: 7.85s vs 3.05s (2.57x)
Force Python to re-use existing integer objects instead of allocating
multiple objects for the same integer value.

This can save a ton of memory depending on your profile. In my
benchmarks the savings were somewhere in the range of 30-70%.
This can reduce profile file size by multiple orders of magnitude. In my
benchmarks I found the compressed files to be up to 97% (!) smaller than
uncompressed ones.  The main reason for this is that traces contain the
same instruction pointers again and again; thus they are very well
suited for compression.
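The effect is easy to demonstrate with a toy example: a buffer built from the same small set of machine words, as in real stack traces, compresses dramatically (an illustrative sketch, not vmprof code; the numbers are synthetic):

```python
import gzip
import struct

# Simulate a profile: the same small set of "instruction pointers"
# repeated over and over, as happens with real stack traces.
trace = struct.pack("<64q", *range(64))   # one fake 64-entry trace (512 bytes)
data = trace * 1000                       # ~512 KiB of repetitive data

compressed = gzip.compress(data)
ratio = len(compressed) / len(data)
print(f"{len(data)} -> {len(compressed)} bytes ({1 - ratio:.0%} smaller)")
```

With highly repetitive input like this, gzip routinely removes well over 90% of the bytes.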

Old (uncompressed) profile files are still supported. In fact, the
profile file format did not change, it only got a gzip wrapper. In the
reader we simply check if a given profile is gzip-compressed using the
gzip magic bytes "0x1f 0x8b"; if it is compressed, we decompress it and
then continue parsing the now-uncompressed profile just like we parse an
uncompressed profile.
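The reader-side check can be sketched like this (a simplified stand-in for the actual reader code; `open_profile` is a hypothetical name):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def open_profile(path):
    """Open a profile, transparently handling the optional gzip wrapper."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")   # decompress on the fly
    return open(path, "rb")            # plain, uncompressed profile
```

Either way the caller gets a binary file object containing the same profile format, so the parser downstream is unchanged.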
@fijal
Member

fijal commented May 13, 2016

Two concerns: first, Travis failed; second, are we sure gzwrite is signal-safe (e.g. that it can't allocate memory)? If it's not, then we need to introduce an async pipe that does the gzipping (or even a pre-analysis).

@jonashaag
Contributor Author

I'll have a look at the signal-safety problem soon; my current guess is that it's not safe.

@jonashaag
Contributor Author

While I haven't found any official documentation on whether gzwrite (or deflate, for that matter) does any allocations, I had a look at the zlib source code. It doesn't look like it allocates during normal writes; the only allocations are done during the first gzwrite call, which is always made before any traces are written.

I also print-debugged allocations (by modifying the allocation routines used by zlib in the zlib source code) which confirmed that no allocations are done.

I will check with the zlib mailing list to be 100% sure.

@fijal
Member

fijal commented May 14, 2016

Thanks! Signal-safety is about more than just allocations, though, so please ask about that as well.

@methane
Contributor

methane commented Jun 3, 2016

Gzip output may affect SIGPROF timing.
I think piping to an external gzip (or lzop, lz4c, etc.) is better for accurate profiling.

@fijal
Member

fijal commented Jun 4, 2016

Yes, I'm far more keen to pipe it out to some other program (then you can do accumulation and other tricks)

@jonashaag
Contributor Author

Thing is, for large profiles the compression step can easily take 30 seconds to a few minutes, which is terrible... We can make gzip optional, and you can always change the sample frequency if profiling slows down your program too much or you get the impression that the profile isn't accurate enough (or accurate profiles are of high importance to you).

If you want, I can also try to collect some data on the variance of the sample recording duration etc.

@fijal
Member

fijal commented Jun 6, 2016

I think you misunderstood the idea - the idea is not to first finish vmprof and then run gzip, but instead run vmprof to a pipe that will be gzipping it on the fly.

@jonashaag
Contributor Author

I see. In terms of performance this is probably a bit worse, but in terms of code architecture it could be done very similarly to this PR: we could simply spawn a gzip process from within vmprof, get the pipe's file descriptor, and write to that instead of writing to the output file directly. This doesn't require many changes in vmprof (only in the CLI module), and it's very convenient to use because we can integrate it with the CLI etc. Alternatively, we could use the -o option to write to stdout and pipe that into gzip on the command line, but that wouldn't be as convenient from within Python, because everyone would have to roll their own gzip wiring. So I'd opt for the first option.
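A minimal sketch of the first option, assuming we shell out to the system gzip binary (the function name is hypothetical, not vmprof's actual API):

```python
import subprocess

def open_gzip_pipe(output_path):
    """Spawn gzip as a child process and return (proc, fd).

    The profiler writes raw profile data to fd; gzip compresses it on
    the fly and writes the result to output_path.
    """
    out = open(output_path, "wb")
    proc = subprocess.Popen(["gzip", "-c"], stdin=subprocess.PIPE, stdout=out)
    out.close()  # the child process holds its own reference to the file
    return proc, proc.stdin.fileno()

# Usage: hand fd to the profiler in place of a plain file's fd; when
# profiling stops, close proc.stdin and proc.wait() so gzip can flush.
```

The profiler itself keeps doing ordinary write(2) calls; only the destination file descriptor changes, which is why this needs so few changes in vmprof.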

@fijal
Member

fijal commented Jun 6, 2016

I agree (note that the performance can be better because you're suddenly using two cores).

@methane
Contributor

methane commented Jun 6, 2016

Another pro of piping: the pipe buffer acts as a write buffer. A write(2) to a file blocks easily; with a piped gzip, the Python process isn't blocked until the pipe buffer (64 KB) is filled.

@jonashaag
Contributor Author

I made a prototype "gzip as a subprocess" implementation. Please have a look here and give me feedback: https://github.com/jonashaag/vmprof-python/tree/gzip-subproc

@jonashaag
Contributor Author

Bump

@fijal
Member

fijal commented Jul 11, 2016

Yes, sorry I'll get back to you

@fijal
Member

fijal commented Jul 13, 2016

Hi

I'm a bit unhappy about packing the gzip stuff into C. I can be convinced it's a good idea, but a pure-Python solution would magically work with PyPy too, for example. It does not seem too hard; if you have opinions, I would like to hear them.

@jonashaag
Contributor Author

It's just a matter of performance, although I haven't benchmarked a pure-Python solution. It shouldn't be a problem if you have multiple cores and large enough pipe buffers, so I guess a pure-Python solution would be fine as well!

@jonashaag
Contributor Author

Let's move discussion to #88
