Gzip subproc 2 #90
Conversation
This can reduce profile file size by multiple orders of magnitude. In my benchmarks I found the compressed files to be up to 97% (!) smaller than uncompressed ones. The main reason for this is that traces contain the same instruction pointers again and again, so they are very well suited for compression. Old (uncompressed) profile files are still supported. In fact, the profile file format did not change; it only got a gzip wrapper. In the reader we simply check whether a given profile is gzip-compressed using the gzip magic bytes "0x1f 0x8b". If it is compressed, we decompress it and then parse the result exactly as we would an uncompressed profile.
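The detection step described above can be sketched as follows (a minimal illustration; `open_profile` is a hypothetical helper name, not vmprof's actual reader API):

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the gzip magic bytes mentioned above

def open_profile(path):
    # Peek at the first two bytes; if they match the gzip magic,
    # transparently decompress, otherwise read the file as-is.
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")

# Demo: the same payload stored uncompressed and gzip-compressed.
payload = b"\x01\x02fake-profile-data" * 100
d = tempfile.mkdtemp()
plain = os.path.join(d, "p.prof")
packed = os.path.join(d, "p.prof.gz")
with open(plain, "wb") as f:
    f.write(payload)
with gzip.open(packed, "wb") as f:
    f.write(payload)

assert open_profile(plain).read() == payload
assert open_profile(packed).read() == payload
```

Checking the magic bytes rather than the file name is what keeps old uncompressed profiles working with no format change.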
src/vmprof_common.h (outdated)

    close(out_fd);
    /* Try system gzip */
    execlp("gzip", "gzip", NULL);
    perror("gzip");
Not sure how to improve this code here; how can I know if there's an executable gzip file somewhere in PATH without executing it?
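One possible answer to the question above, sketched in Python rather than C: `shutil.which()` (Python 3.3+) searches PATH for an executable without ever spawning a process, and the manual loop below is roughly what it does internally (an illustration, not what the C code would use directly):

```python
import os
import shutil

# shutil.which() walks PATH and returns the first entry that exists
# and is executable, without executing anything.
gzip_path = shutil.which("gzip")
print("gzip on PATH:", gzip_path)

# Roughly the same check by hand, using os.access to test the
# execute permission bit.
def find_on_path(name):
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The C equivalent would walk the PATH entries the same way and call access(candidate, X_OK) on each.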
OK, I reimplemented the gzip part entirely in the Python frontend. I decided to put it into vmprof/__init__.py:
vmprof/__init__.py (outdated)

    else:
        gzip_cmd = ["python", "-m", "gzip"]
    proc = subprocess.Popen(gzip_cmd, stdin=subprocess.PIPE,
                            stdout=fileno, bufsize=-1, close_fds=True)
I'm sorry: close_fds=True is idiomatic, but it will not work on Windows when the standard handles are redirected, as they are here.
See https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen
Maybe close_fds=(sys.platform != 'win32') is OK.
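Applied to the snippet above, the suggestion could look like this sketch (hedged: `start_gzip_writer` is an illustrative name, and `sys.executable -m gzip` stands in for the PR's `["python", "-m", "gzip"]` fallback):

```python
import gzip
import os
import subprocess
import sys
import tempfile

def start_gzip_writer(fileno):
    # Pipe profile bytes through a gzip subprocess. close_fds follows
    # the suggested platform check, since under Python 2.7 on Windows
    # close_fds=True cannot be combined with redirected std handles.
    return subprocess.Popen([sys.executable, "-m", "gzip"],
                            stdin=subprocess.PIPE,
                            stdout=fileno,
                            bufsize=-1,
                            close_fds=(sys.platform != "win32"))

# Round-trip demo: write through the pipe, then read the result back.
out_path = os.path.join(tempfile.mkdtemp(), "out.prof.gz")
with open(out_path, "wb") as out:
    proc = start_gzip_writer(out.fileno())
    proc.stdin.write(b"profile bytes " * 1000)
    proc.stdin.close()  # EOF lets the child flush and exit
    proc.wait()

with gzip.open(out_path, "rb") as f:
    data = f.read()
```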
Thanks. To be honest, I simply copied that from your code when I could not figure out how to properly set up the child process with Python (it never shut down, probably because the fds were still open). I have given you push access to my fork; if you have the time, you can fix it yourself, as I won't get to it for a few days.
I've pushed to your branch.
Don't see any changes?
Hi, first: thanks for your contribution. I left two comments on how we could improve it.
I'm thinking about having some tests that start python + vmprof and then wait on a signal. In the test you could then verify that the gzip process has been spawned.
What would that be useful for? Isn't the fact that gzip is spawned as a subprocess an implementation detail rather than part of the interface? We already check that the resulting profile is gzipped and that we can read a gzipped profile.
Here is my concern: if we do not test that this implementation detail (spawning the gzip process) works correctly, it will break at some point. That is a very common thing, and when it happens it makes users very unhappy. Multiprocessing seems easy, but it really is not. There are questions like: did the process start? Was the process killed? Do we handle such scenarios?
I think leaving it on is fine, but we need to make sure we test it properly (e.g. if gzip explodes because we run out of disk space, what happens? Do we get a hanging process or a clear error message?).
I don't necessarily disagree with having tests for this. But are these kinds of circumstances even tested with the current vmprof master? Out of disk space, out of memory? Maybe we can add a simple test that kills the gzip subprocess and makes sure we handle that case gracefully.
@jonashaag Having such a test (e.g. killing the gzip subprocess) would really be a good start.
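Such a test could be sketched like this (names and structure are illustrative, not vmprof's actual test suite; `python -m gzip` stands in for the gzip subprocess so the sketch runs anywhere):

```python
import os
import subprocess
import sys
import tempfile

def gzip_writer_survives_kill():
    # Spawn the compressor subprocess, kill it mid-profile, and check
    # that writing afterwards fails fast with a clear error instead of
    # hanging -- the scenario discussed above.
    out_path = os.path.join(tempfile.mkdtemp(), "prof.gz")
    with open(out_path, "wb") as out:
        proc = subprocess.Popen([sys.executable, "-m", "gzip"],
                                stdin=subprocess.PIPE, stdout=out)
        proc.kill()
        proc.wait()
        try:
            # Large enough to bypass buffering; the dead child's closed
            # pipe end makes this raise BrokenPipeError (an OSError),
            # which the profiler should turn into a clear error message.
            proc.stdin.write(b"\x00" * 65536)
            proc.stdin.flush()
        except OSError:
            return True  # the failure was detected, no hang
    return False
```

The same pattern extends to the out-of-disk-space case: any write-side failure of the child surfaces as an OSError on the pipe.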
Yes, that's what I meant (gzip exiting).
On 19 Jul 2016 1:17 PM, "Jonas Haag" notifications@github.com wrote:
OK, will do!
I committed the code improvements as suggested.
We decided that we should give it a shot in a pre-release version. Another good improvement would be to upload the gzipped profile to vmprof.com instead of decoding it to JSON. That should be easy, because it is already done for the jitlog (it would require some changes to the service as well).
Cool! I'll resolve the conflicts. Regarding the server changes: we have implemented our own clone of the vmprof server that also supports memory profiles (and more). I'll introduce it to you in a few days, and I'll wait for feedback on it before making changes to the official server.
Here we go. I added two unrelated changes: the missing memory warning on PyPy looks like an oversight to me, and is it intentional that the error message for PyPy < 4.1.0 is both printed and raised?
@fijal @methane Here we go!