Gzip subproc 2 #90
Conversation
This can reduce profile file size by multiple orders of magnitude. In my benchmarks I found the compressed files to be up to 97% (!) smaller than uncompressed ones. The main reason for this is that traces contain the same instruction pointers again and again, so they are very well suited for compression. Old (uncompressed) profile files are still supported. In fact, the profile file format did not change; it only got a gzip wrapper. In the reader we simply check whether a given profile is gzip-compressed using the gzip magic bytes "0x1f 0x8b". If it is compressed, we decompress it and then parse the result exactly as we would an uncompressed profile.
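The detection step described above can be sketched as follows (a minimal illustration; `open_profile` is a hypothetical helper name, not vmprof's actual reader API):

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the gzip magic bytes mentioned above

def open_profile(path):
    # Peek at the first two bytes; if they match the gzip magic,
    # transparently decompress, otherwise read the file as-is.
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")

# Demo: the same payload stored uncompressed and gzip-compressed.
payload = b"\x01\x02fake-profile-data" * 100
d = tempfile.mkdtemp()
plain = os.path.join(d, "p.prof")
packed = os.path.join(d, "p.prof.gz")
with open(plain, "wb") as f:
    f.write(payload)
with gzip.open(packed, "wb") as f:
    f.write(payload)

assert open_profile(plain).read() == payload
assert open_profile(packed).read() == payload
```

Checking the magic bytes rather than the file name is what keeps old uncompressed profiles working with no format change.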
src/vmprof_common.h (outdated)

    close(out_fd);
    /* Try system gzip */
    execlp("gzip", "gzip", NULL);
    perror("gzip");
Not sure how to improve this code here; how can I know if there's an executable gzip file somewhere in PATH without executing it?
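One possible answer to the question above, sketched in Python rather than C: `shutil.which()` (Python 3.3+) searches PATH for an executable without ever spawning a process, and the manual loop below is roughly what it does internally (an illustration, not what the C code would use directly):

```python
import os
import shutil

# shutil.which() walks PATH and returns the first entry that exists
# and is executable, without executing anything.
gzip_path = shutil.which("gzip")
print("gzip on PATH:", gzip_path)

# Roughly the same check by hand, using os.access to test the
# execute permission bit.
def find_on_path(name):
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The C equivalent would walk the PATH entries the same way and call access(candidate, X_OK) on each.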
OK, I reimplemented the gzip part entirely in the Python frontend. I decided to put it into vmprof/__init__.py:
vmprof/__init__.py (outdated)

    else:
        gzip_cmd = ["python", "-m", "gzip"]
    proc = subprocess.Popen(gzip_cmd, stdin=subprocess.PIPE,
                            stdout=fileno, bufsize=-1, close_fds=True)
I'm sorry: close_fds=True is idiomatic, but it will not work on Windows when the standard handles are redirected, as they are here.
See https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen
Maybe close_fds=(sys.platform != 'win32') is OK.
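Applied to the snippet above, the suggestion could look like this sketch (hedged: `start_gzip_writer` is an illustrative name, and `sys.executable -m gzip` stands in for the PR's `["python", "-m", "gzip"]` fallback):

```python
import gzip
import os
import subprocess
import sys
import tempfile

def start_gzip_writer(fileno):
    # Pipe profile bytes through a gzip subprocess. close_fds follows
    # the suggested platform check, since under Python 2.7 on Windows
    # close_fds=True cannot be combined with redirected std handles.
    return subprocess.Popen([sys.executable, "-m", "gzip"],
                            stdin=subprocess.PIPE,
                            stdout=fileno,
                            bufsize=-1,
                            close_fds=(sys.platform != "win32"))

# Round-trip demo: write through the pipe, then read the result back.
out_path = os.path.join(tempfile.mkdtemp(), "out.prof.gz")
with open(out_path, "wb") as out:
    proc = start_gzip_writer(out.fileno())
    proc.stdin.write(b"profile bytes " * 1000)
    proc.stdin.close()  # EOF lets the child flush and exit
    proc.wait()

with gzip.open(out_path, "rb") as f:
    data = f.read()
```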
Thanks. To be honest, I simply copied that from your code when I could not figure out how to properly set up the child process with Python (it never shut down, probably because the fds were still open). I have given you push access to my fork; if you have the time, you can fix it yourself, as I won't get to it for a few days.
I've pushed to your branch.
Don't see any changes?
Hi, first: thanks for your contribution. I left two comments on how we could improve it.
I'm thinking about having some tests that start python + vmprof and then wait on a signal. In the test you could then verify that the gzip process has been spawned.
What would that be useful for? Isn't the fact that gzip is spawned as a subprocess an implementation detail rather than part of the interface? We already check that the resulting profile is gzipped and that we can read a gzipped profile.
Here is my concern: if we do not test that this implementation detail (spawning the gzip process) works correctly, it will break at some point. That is a very common thing, and when it happens it makes users very unhappy. Multiprocessing seems easy, but it really is not. There are questions like: did the process start? Was the process killed? Do we handle such scenarios?
I think leaving it on is fine, but we need to make sure we test it properly (e.g. if gzip explodes because we run out of disk space, what happens? Do we get a hanging process or a clear error message?).
I don't necessarily disagree with having tests for this. But are these kinds of circumstances even tested with the current vmprof master? Out of disk space, out of memory? Maybe we can add a simple test that kills the gzip subprocess and makes sure we handle that case gracefully.
@jonashaag Having such a test (e.g. killing the gzip subprocess) would really be a good start.
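Such a test could be sketched like this (names and structure are illustrative, not vmprof's actual test suite; `python -m gzip` stands in for the gzip subprocess so the sketch runs anywhere):

```python
import os
import subprocess
import sys
import tempfile

def gzip_writer_survives_kill():
    # Spawn the compressor subprocess, kill it mid-profile, and check
    # that writing afterwards fails fast with a clear error instead of
    # hanging -- the scenario discussed above.
    out_path = os.path.join(tempfile.mkdtemp(), "prof.gz")
    with open(out_path, "wb") as out:
        proc = subprocess.Popen([sys.executable, "-m", "gzip"],
                                stdin=subprocess.PIPE, stdout=out)
        proc.kill()
        proc.wait()
        try:
            # Large enough to bypass buffering; the dead child's closed
            # pipe end makes this raise BrokenPipeError (an OSError),
            # which the profiler should turn into a clear error message.
            proc.stdin.write(b"\x00" * 65536)
            proc.stdin.flush()
        except OSError:
            return True  # the failure was detected, no hang
    return False
```

The same pattern extends to the out-of-disk-space case: any write-side failure of the child surfaces as an OSError on the pipe.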
Yes, that's what I meant (gzip exiting).
On 19 Jul 2016 1:17 PM, "Jonas Haag" notifications@github.com wrote:
OK, will do!
I committed the code improvements as suggested.
We decided that we should give it a shot in a pre-release version. Another good improvement would be to upload the gzipped profile to vmprof.com instead of decoding it to JSON. That should be easy, because it is already done for the jitlog (it would require some changes to the service as well).
Cool! I'll resolve the conflicts. Regarding the server changes: we have implemented our own clone of the vmprof server that also supports memory profiles (and more). I'll introduce it to you in a few days, and I'll wait for feedback on it before making changes to the official server.
Here we go. I added two unrelated changes: the missing memory warning on PyPy looks like an oversight to me, and is it intentional that the error message for PyPy < 4.1.0 is both printed and raised?
@fijal @methane Here we go!