Conversation

@jonashaag
Contributor

@jonashaag jonashaag commented Jul 14, 2016

@fijal @methane Here we go!

This can reduce profile file sizes dramatically: in my benchmarks,
compressed files were up to 97% (!) smaller than uncompressed ones.
The main reason is that traces contain the same instruction pointers
again and again, which makes them compress very well.

Old (uncompressed) profile files are still supported. In fact, the
profile file format did not change; it only got a gzip wrapper. The
reader checks whether a given profile is gzip-compressed using the gzip
magic bytes "0x1f 0x8b"; if it is, we decompress it and parse the
decompressed stream exactly as we parse an uncompressed profile.
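
The detection described above can be sketched in a few lines of Python; `open_profile` is a hypothetical helper name for illustration, not vmprof's actual API:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_profile(path):
    # Hypothetical helper: peek at the first two bytes to decide whether
    # the profile needs a transparent gzip wrapper before parsing.
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Either way, the caller reads the same byte stream, so the parser does not need to know whether the file on disk was compressed.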
close(out_fd);
/* Try system gzip */
execlp("gzip", "gzip", NULL);
/* execlp only returns on failure */
perror("gzip");
_exit(1);
Contributor Author

Not sure how to improve this code; how can I know whether there is an executable gzip somewhere in PATH without executing it?

@jonashaag
Contributor Author

OK, I reimplemented the gzip part entirely in the Python frontend. I decided to put it into the enable/disable methods so that it is separated from the Profiler component. One oddity is that we have to keep a list of gzip processes internally so that we can wait for their termination when disable is called. Since only one vmprof can run per interpreter at a time, it is safe to terminate all gzip child processes there.

@jonashaag jonashaag mentioned this pull request Jul 14, 2016
else:
    gzip_cmd = ["python", "-m", "gzip"]
proc = subprocess.Popen(gzip_cmd, stdin=subprocess.PIPE,
                        stdout=fileno, bufsize=-1, close_fds=True)
Contributor

I'm sorry; close_fds=True is idiomatic, but it will not work on Windows when stdin/stdout/stderr are redirected.
See https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen

Maybe close_fds=(sys.platform != 'win32') is OK.

Contributor Author

Thanks. To be honest, I simply copied that from your code when I could not figure out how to properly set up the child process in Python (it never shut down, probably because the fds were still open). I have given you push access to my fork; if you have the time, feel free to fix it yourself, as I won't get to it for a few days.

Contributor

I've pushed to your branch.

Contributor Author

Don't see any changes?

@planrich
Contributor

Hi, first of all: thanks for your contribution. I left two comments on how we could improve it.
What about some tests? Without them, critical issues tend to be overlooked as time goes by.

@planrich
Contributor

I'm thinking about adding a test that starts Python + vmprof and then waits on a signal; in the test you can then verify that the gzip process has been spawned.

@jonashaag
Contributor Author

What would that be useful for? Isn't the fact that gzip is spawned as a subprocess an implementation detail and not part of the interface? We already check that the resulting profile is gzipped and that we can read a gzipped profile.

@planrich
Contributor

Here is my concern: if we do not test that this implementation detail (spawning the gzip process) works correctly, it will break at some point. That is very common, and when it happens it makes users very unhappy.

Multiprocessing seems easy, but it really is not. There are questions like: did the process start? Was the process killed? Do we handle such scenarios?

@methane
Contributor

methane commented Jul 19, 2016

I share @planrich's concern. That's why I prefer to make gzip optional (see #89).
@fijal What do you think?

@fijal
Member

fijal commented Jul 19, 2016

I think leaving it on is fine, but we need to make sure we test it properly (e.g. what happens if gzip explodes because we run out of disk space? Do we end up with a hanging process, or do we get a clear error message?).

@jonashaag
Contributor Author

I don't necessarily disagree with having tests for this. But are these kinds of circumstances even tested with the current vmprof master? Out of disk space, out of memory?

Maybe we can add a simple test that kills the gzip subprocess and makes sure we handle that case gracefully.
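
One way such a test could be sketched, independent of vmprof internals (the name and the exact failure handling here are assumptions, not the branch's actual code):

```python
import subprocess
import sys
import tempfile

def test_killed_gzip_is_detected():
    # Simulate the compressor dying mid-profile: writes to the dead
    # child's pipe must fail fast with EPIPE instead of blocking, which
    # is the condition vmprof would have to turn into a clear error.
    out = tempfile.TemporaryFile()
    proc = subprocess.Popen([sys.executable, "-m", "gzip"],
                            stdin=subprocess.PIPE, stdout=out)
    proc.kill()
    proc.wait()
    try:
        # More than the pipe buffer, so the write cannot be absorbed.
        proc.stdin.write(b"\x00" * (1 << 20))
        proc.stdin.flush()
    except OSError:  # BrokenPipeError on POSIX
        return True  # graceful failure path
    finally:
        out.close()
    return False
```

A real test in the suite would additionally assert that vmprof surfaces this as an explicit error to the user rather than silently truncating the profile.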

@planrich
Contributor

@jonashaag having such a test (e.g. killing gzip sub process) would really be a good start.

@fijal
Member

fijal commented Jul 19, 2016

Yes, that's what I meant (gzip exiting)


@jonashaag
Contributor Author

Ok will do!

@jonashaag
Contributor Author

I committed the code improvements as suggested.

@planrich
Contributor

We decided that we should give it a shot in a pre-release version.
I can't merge it right now (I committed some changes recently). Can you resolve the conflicts?

Another good improvement would be to upload the gzipped profile to vmprof.com instead of decoding it to JSON. That should be easy, because it is already done for the jitlog (it would require some changes to the service as well).

@jonashaag
Contributor Author

Cool! I'll resolve the conflicts.

Regarding the server changes: we have implemented our own clone of the vmprof server that also supports memory profiles (and more). I'll introduce it to you guys in a few days and wait for feedback on it before making changes to the official server.

@jonashaag
Contributor Author

Here we go. I added two unrelated changes. The missing memory warning on PyPy looks like an oversight to me, as does the fact that the error message for PyPy < 4.1.0 is both printed and raised.
