
Add subprocess support #791

Merged — 51 commits, Dec 27, 2018
Conversation

oremanj
Member

@oremanj oremanj commented Nov 28, 2018

Subprocess support! Add trio.subprocess, an async wrapper around the stdlib subprocess module with a similar interface: Process (renamed from Popen) to create a process and interact with it at your leisure, run to run a process to completion and report the results, and reexports of all the stdlib subprocess exceptions and constants.

There is a subprocess.Popen object hiding inside each trio.subprocess.Process, so all the stdlib options for starting processes in various wacky ways work unmodified. (Currently we only support unbuffered byte streams for communicating with subprocess pipes, so the universal_newlines, encoding, errors, and bufsize arguments to subprocess.Popen are not supported by the Trio version.) Instead of file objects, the stdin, stdout, and stderr attributes of a trio.subprocess.Process are Trio streams.

There is no trio.subprocess.Process.communicate, because the stdlib version has confusing cancellation semantics - use trio.subprocess.run or interact with the pipe streams directly.
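For reference, the blocking stdlib interface being wrapped looks like this (a minimal stdlib-only sketch; the proposed trio.subprocess.run mirrors it with async semantics):

```python
import subprocess
import sys

# Blocking stdlib analogue of the proposed trio.subprocess.run:
# run a process to completion, capture stdout, and check the result.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    check=True,
)
assert result.returncode == 0
assert result.stdout.strip() == b"hello"
```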

Fixes #4. Substantially improves our IOCP support (Windows native async I/O), which is relevant to #52.

Still TODO:

  • documentation writeup
  • decide about Process vs Popen naming
  • decide about wait_child_exiting taking a stdlib subprocess.Popen vs a trio.subprocess.Process (or Popen or whatever we call it)

Adds `trio.subprocess.Process` and `trio.subprocess.run`, async wrappers for the stdlib subprocess module. No communicate() yet due to discussion on python-trio#783.

Note that this did not wind up using the code in linux_waitpid; instead it runs waitid() (which can report a child's status without consuming it) in a thread, in order to interoperate better with subprocess.Popen code.
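The waitid()-in-a-thread approach relies on the WNOWAIT flag leaving the child reapable. A stdlib-only sketch of that behavior (Unix-only; os.waitid is not available on Windows):

```python
import os
import subprocess
import sys

# Start a short-lived child, then wait for it with WNOWAIT: the exit
# status is reported but NOT consumed, so the normal reaping path
# (subprocess.Popen.wait) still works afterwards.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
info = os.waitid(os.P_PID, proc.pid, os.WEXITED | os.WNOWAIT)
assert info.si_pid == proc.pid
# The zombie is still there; Popen can reap it as usual.
assert proc.wait() == 0
```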

Still TODO:
[ ] Windows support (need pipe implementations and testing)
[ ] documentation writeup, newsfragment
[ ] consider whether this module layout is the one we want (other option would be to move _subprocess/unix_pipes.py to just _unix_pipes.py, _subprocess/highlevel.py to _subprocess.py, and get rid of _subprocess/linux_waitpid.py)
[ ] expose other subprocess functions such as call, check_call, output (simple wrappers around run())
[ ] expose a basic communicate()?
@oremanj oremanj requested a review from njsmith November 28, 2018 00:20
@codecov

codecov bot commented Nov 28, 2018

Codecov Report

Merging #791 into master will increase coverage by 0.22%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #791      +/-   ##
==========================================
+ Coverage   99.01%   99.23%   +0.22%     
==========================================
  Files          94      101       +7     
  Lines       11631    12182     +551     
  Branches      832      882      +50     
==========================================
+ Hits        11516    12089     +573     
+ Misses         94       72      -22     
  Partials       21       21
Impacted Files Coverage Δ
trio/hazmat.py 100% <ø> (ø) ⬆️
trio/_unix_pipes.py 100% <100%> (ø)
trio/_core/tests/test_windows.py 100% <100%> (ø) ⬆️
trio/tests/test_unix_pipes.py 100% <100%> (ø)
trio/_subprocess_platform/__init__.py 100% <100%> (ø)
trio/_core/_io_windows.py 91.59% <100%> (+13.43%) ⬆️
trio/_subprocess_platform/waitid.py 100% <100%> (ø)
trio/_subprocess_platform/windows.py 100% <100%> (ø)
trio/_subprocess.py 100% <100%> (ø)
trio/__init__.py 100% <100%> (ø) ⬆️
... and 18 more


@njsmith
Member

njsmith commented Dec 3, 2018

First: thank you, this is amazing work. I'm super excited!

This isn't a complete review (I'm still chewing on the high-level API design issues), but rather than hold things up I'll write down some of the simpler comments :-)

  1. That waitid trick is slick! I didn't know WNOWAIT existed. But, there is an issue: the way it's implemented here leaks a thread every time a call to wait is cancelled. cancellable=True doesn't actually make blocking calls cancellable – it just tells run_sync_in_worker_thread to abandon the thread to keep running if it's cancelled. (Basically, imagine what happens if someone keeps calling wait and then cancelling it on the same Process object, we'll end up with lots of redundant threads waiting for the same subprocess.) That's why linux_waitpid.py has the stuff involving spawn_system_task. You're right that we don't necessarily want to use it directly (with your approach we don't need the .outcome attribute, and the waitpid function was always for testing rather than something we could use directly), but we want to follow the basic pattern of spawning a system task once per process, either when it's spawned or on the first call to wait, and then future calls to wait just block on the Event.

  2. Currently we only depend on cffi on Windows, so it's not actually available on Linux/macOS. Also, your waitid wrapper doesn't handle EAGAIN. We could add a cffi dependency when on PyPy, and fix the EAGAIN thing, but... I'm kind of leaning towards, let's just document that subprocesses aren't supported on PyPy until they fix their bugs? We do try to support PyPy in general, but when they're just missing a crucial feature then I don't know that it's worth spending a lot of time on working around it, esp. since we know we're going to throw that out again later... and we have plenty of other interesting things to spend time on, like Windows pipe support :-).

  3. I expect we'll go around a few times on the high-level API details, as we figure out how to get them to fit into Trio best... but one easy thing to say now is, let's drop the check_call, call, check_output functions entirely, they're effectively obsolete since run was added, and only kept for legacy support.

  4. in test_kill_when_context_cancelled, I'm confused at why we need to sleep. Surely the process creation happens-before the return from the Process constructor? On Unix the exec might not have happened, but the fork will have completed, the pid is valid, etc. Or am I missing something?

  5. Process should implement the AsyncResource interface. Meaning: (a) inherit from it, (b) rename the current __aexit__ to aclose, (c) drop __aenter__, b/c the inherited version works.

  6. I feel like Process.aclose should probably clean up all the process-related resources? So: kill the process (like it does now), close stdin (like it does now), and also close stdout/stderr (which it doesn't do now)? This might also help in some obscure corner cases where the process deadlocks trying to write to stdout/stderr, but we're not listening because we're sitting in the call to self.wait()...

  7. I was going to say that the way the windows code calls OpenProcess on every call to wait was dangerous, because the pid could be reallocated, so we should hold a single handle open the whole time. But, actually it is safe I think, because this blog post says that so long as any handle remains open the pid won't be reallocated, and AFAICT subprocess.Popen does hold a handle open for the lifetime of the object. But... this is super subtle, so at the least we should have some kind of comment about it (including a link to that blog post)... and maybe consider using self._proc._handle instead of calling OpenProcess, even though it's icky to reach into a stdlib hidden attribute like that.

  8. In the case noted in a comment, where macOS kqueue says that the process has exited, but waitpid doesn't (WTF), would it be simpler to just do a blocking waitpid? If we're simply waiting for the kernel to finish updating some in-memory data structures then that means a "blocking" waitpid is actually a quick CPU-bound operation, I think?
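The deduplication pattern described in point 1 - exactly one waiter per process, with every wait() call blocking on a shared Event - has roughly this shape. This is a sketch using stdlib threading primitives rather than Trio's spawn_system_task and trio.Event; the ProcessWaiter name is illustrative, not an actual trio API:

```python
import subprocess
import sys
import threading

class ProcessWaiter:
    """One background waiter per process: repeated wait() calls
    (including ones that time out and retry) block on the same Event
    instead of each spawning a fresh thread."""

    def __init__(self, proc):
        self._proc = proc
        self._exited = threading.Event()
        self._lock = threading.Lock()
        self._started = False

    def wait(self, timeout=None):
        with self._lock:
            if not self._started:
                self._started = True
                threading.Thread(target=self._waiter, daemon=True).start()
        return self._exited.wait(timeout)

    def _waiter(self):
        self._proc.wait()   # blocking reap, done exactly once
        self._exited.set()

proc = subprocess.Popen([sys.executable, "-c", "pass"])
waiter = ProcessWaiter(proc)
assert waiter.wait(timeout=10)
assert waiter.wait(timeout=0)  # reuses the finished waiter, no new thread
```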

- _subprocess/highlevel.py becomes _subprocess.py
- _subprocess/unix_pipes.py becomes _unix_pipes.py
- _subprocess/linux_waitpid.py removed (obsoleted by use of `waitid`)
@oremanj
Member Author

oremanj commented Dec 13, 2018

  1. Good point! As committed, linux_waitpid.py doesn't deduplicate calls per process either, so I figured someone had thought about that and decided we didn't care.

  2. http://doc.pypy.org/en/latest/extending.html claims that PyPy includes CFFI by default. I don't mind removing the support if you think it's going to create too much of a maintenance burden, though.

  3. Ok! I do find subprocess.check_output(...) to be a substantially simpler way to write a common operation than subprocess.run(..., stdout=subprocess.PIPE, check=True).stdout, which is why I originally included these methods, but it's easy to write as a local helper without cluttering our public API.

  4. I'm... really not sure what I was thinking there. :P Will simplify.

5-6. Makes sense, will make these changes.

  7. Thanks for the link - I didn't even realize this was something to worry about! If we expose wait_reapable publicly in some form (which is the direction I'm leaning - current thought is an IO manager method wait_process_exiting) I think it will want to be expressed in terms of the PID on all platforms, so my inclination is to go with the comment rather than accessing _handle.

  8. My thought here was that if our model of what's possible is wrong, and there is some rare circumstance where we can get a kevent NOTE_EXIT for a process that's nowhere near exiting, it's better to use a lot of CPU than to hang. Happy to go either way on this one.
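For what it's worth, the equivalence behind point 3 (with the CompletedProcess attribute spelled .stdout) can be demonstrated directly:

```python
import subprocess
import sys

# check_output vs the equivalent run() call: same bytes either way.
cmd = [sys.executable, "-c", "print('ok')"]
out1 = subprocess.check_output(cmd)
out2 = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
assert out1 == out2 == b"ok\n"
```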

@oremanj
Member Author

oremanj commented Dec 13, 2018

Never mind - wait_process_exiting can't be in the IO manager because it uses stuff from outside _core. But I still think it would be a useful public API.

@njsmith njsmith (Member) left a comment

Did a quick skim and made one comment, but there are several test failures and strange coverage results currently, so I figure I'll let you clear that up before I do a close read :-).

I'm wary about adding wait_process_exiting as a generic API... the native wait APIs are pretty quirky. On Windows users should be encouraged to use a process handle, not a PID, to avoid races. Windows and kqueue platforms let you wait on arbitrary processes, but waitid only works on children. Unix can wait for stop/continue, not just exiting, but Windows doesn't have anything like that. With any luck, Linux will eventually make it possible to get an fd referring to a child process and then use that to get death notifications, instead of using a PID. (So then it would work like Windows, basically. BTW, FreeBSD also has process descriptors, but only for child processes, and macOS doesn't support it at all...)

If all we expose is Process.wait, then it's easy to hide all this complexity from users. If we expose wait_process_exiting, we have to figure out which bits to abstract away and how...

My thought here was that if our model of what's possible is wrong, and there is some rare circumstance where we can get a kevent NOTE_EXIT for a process that's nowhere near exiting, it's better to use a lot of CPU than to hang. Happy to go either way on this one.

NOTE_EXIT actually includes the exit status in the notification, so it seems unlikely that the kernel could send a spurious note for a process that hasn't actually exited. Looking at macOS kern_exit.c and FreeBSD kern_exit.c, they both seem to do some bookkeeping immediately after sending NOTE_EXIT, so I guess it's just a little race waiting for this to happen.

I guess under really bizarre circumstances we could have:

  1. child process exits
  2. trio gets NOTE_EXIT
  3. some other Evil Code that's running in our process calls wait to reap the child (since only our process can reap the child)
  4. Evil Code in our process spawns another child, which happens to be assigned the same PID (since only a process we spawned is a valid argument to waitpid)
  5. trio calls waitpid, and it hangs waiting for the new child

This seems pretty unlikely to me. And I don't think there's really any correct way to handle this situation anyway... the Unix subprocess API has a deeply baked-in assumption that the parent will be careful to call waitpid exactly once on each child, no more, no less. The only real solution is for the user to remove the Evil Code.

trio/_platform/waitid.py (inline review thread, since resolved)
@oremanj
Member Author

oremanj commented Dec 15, 2018

I believe I've addressed everything that was outstanding except for docs and a few remaining corners of coverage. I haven't tested the Windows support with anything more strenuous than the unit tests, but it seems to be holding up well enough so far...

@ncoghlan

ncoghlan commented Dec 21, 2018

We went with capture_output over capture_stdout and capture_stderr because we wanted the calling application to have complete control over the output display, and didn't want to add 3 new options when one would do (if you only want to capture stdout, just pass stdout=PIPE, the same as you always have).

However, if you wanted to tweak the trio variant of the API on the basis of "Errors should never pass silently", then my recommendation would be to:

  1. Make check=True the default, so check=False is explicitly opt-in (doing that in the subprocess module would have been overly inconsistent with the other APIs in that module, but trio doesn't have that problem)

  2. Change CompletedProcess.check_returncode() to print a captured stderr to sys.stderr in the current process by default, with an explicit print_captured_stderr=False opt-out on that method

Edit to clarify: the check=True behaviour would be the default behaviour of check_returncode(print_captured_stderr=True). If you didn't want the subprocess stderr printed, then you'd need to use check=False in the original call.
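A rough sketch of what suggestion 2 could look like (the standalone helper and the print_captured_stderr parameter are hypothetical illustrations, not an existing trio or stdlib API):

```python
import subprocess
import sys

def check_returncode(result, print_captured_stderr=True):
    # Hypothetical variant of CompletedProcess.check_returncode() that
    # echoes captured stderr to our own stderr before raising.
    if result.returncode != 0:
        if print_captured_stderr and result.stderr:
            sys.stderr.write(result.stderr.decode(errors="replace"))
        raise subprocess.CalledProcessError(
            result.returncode, result.args,
            output=result.stdout, stderr=result.stderr,
        )

result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    stderr=subprocess.PIPE,
)
try:
    check_returncode(result, print_captured_stderr=False)
except subprocess.CalledProcessError as exc:
    assert exc.returncode == 3
```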

@takluyver
Contributor

takluyver commented Dec 21, 2018 via email

@oremanj
Member Author

oremanj commented Dec 21, 2018

I removed test_send_on_closed_pipe and the Unix test_close_during_write. The Windows one was needed to resolve a coverage gap, since Windows doesn't support wait_send_all_might_not_block.

As far as I can tell, check_one_way_stream never receives more than one byte at a time -- its do_receive_some helper even has the argument to the underlying receive_some hardcoded at 1 (probably by mistake). So I'm keeping the bulk transfer tests for now.

@oremanj
Member Author

oremanj commented Dec 21, 2018

The change I just uploaded doesn't include the run API changes; I agree with the commentary so far, and want to think a bit about how to make the best interface if we're not slavishly following what subprocess does anymore.

@njsmith
Member

njsmith commented Dec 21, 2018

@takluyver

I don't remember any particular design reason for making the default check=False.

Yeah, I read through the original thread and didn't see any discussion about this either way. And of course you couldn't have done a corpus survey of how people were using the API you hadn't invented yet :-).

I do still occasionally like the flexibility of connecting std streams directly to a file descriptor

Right, I think it makes total sense to pass an explicit file descriptor, or DEVNULL, or stderr=STDOUT. To me those fit in well with the general philosophy of "here's **kwargs, for any long-tail quirky requirements". It's specifically the PIPE case that I find weird, since it's exposing an implementation detail. I bet there are lots of users who don't know that output capturing involves pipes, or even what a pipe is.
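For concreteness: capture_output (added in Python 3.7) is exactly sugar for passing PIPE for both streams, which is the implementation detail under discussion:

```python
import subprocess
import sys

# capture_output=True is shorthand for stdout=PIPE, stderr=PIPE.
cmd = [sys.executable, "-c", "print('x')"]
r1 = subprocess.run(cmd, capture_output=True)
r2 = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
assert r1.stdout == r2.stdout == b"x\n"
assert r1.stderr == r2.stderr == b""
```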

@ncoghlan

Edit to clarify: the check=True behaviour would be the default behaviour of check_returncode(print_captured_stderr=True).

Hmmmm interesting idea. Need to ponder that some more.

@takluyver
Contributor

takluyver commented Dec 22, 2018 via email

@njsmith
Member

njsmith commented Dec 22, 2018

@oremanj perhaps we should split out run into a separate PR, so that the discussion doesn't block the rest of this from landing?

Note for records: significant discussion about this in chat, starting here: https://gitter.im/python-trio/general?at=5c1d6718babbc178b2bb1235
The biggest new thing brought up is that subprocess's handling of shell= and string-vs-list commands is also pretty confusing.
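A quick illustration of the string-vs-list confusion on POSIX (the shell's word-splitting is what usually bites people):

```python
import subprocess

# A list is an argv, passed to the program verbatim...
out1 = subprocess.run(["echo", "a  b"], stdout=subprocess.PIPE).stdout
# ...a string with shell=True goes through /bin/sh, which word-splits,
# so the double space collapses to one:
out2 = subprocess.run("echo a  b", shell=True, stdout=subprocess.PIPE).stdout
assert out1 == b"a  b\n"
assert out2 == b"a b\n"
# ...and a string WITHOUT shell=True names a single executable:
try:
    subprocess.run("echo hi")
except FileNotFoundError:
    pass  # there is no program literally named "echo hi"
```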

@njsmith
Member

njsmith commented Dec 22, 2018

As far as I can tell, check_one_way_stream never receives more than one byte at a time -- its do_receive_some helper even has the argument to the underlying receive_some hardcoded at 1 (probably by mistake). So I'm keeping the bulk transfer tests for now.

Okay, I split that off into #813.

@oremanj
Member Author

oremanj commented Dec 23, 2018

OK, I fixed the silly Windows test bug and I think this is ready modulo API tweaking.

I'm a little reluctant to provide subprocess support without run, since I think almost anyone who tries to use it will wind up implementing something akin to run, and it's easier to get the details right in one place. On the other hand, I understand that we don't want to provide an interface that we expect to change in the near future.

I suggested a little ways upthread that maybe we could keep the code in this PR in trio.subprocess as a sort of "lower-level" interface, akin to trio.socket vs the higher-level streams code, and come back and write the higher-level interface later. @njsmith, what do you think of that possibility?

@njsmith njsmith (Member) left a comment
Two small comments, but otherwise yeah, I agree this is ready to go except for run.

I'm not a big fan of having both trio.run_process and trio.subprocess.run, which do almost the same thing with slightly different APIs. Process is the low-level API, like trio.socket.

Let's split it into a separate PR. It doesn't mean we aren't going to provide a high-level API; it means we can have a focused discussion on getting that API right, without being distracted by stuff like details of how IOCP works, and without holding up the rest of this.

finally:
# Run loop is exited without unwinding running tasks, so
# we don't get here until the main() coroutine is GC'ed
assert left_run_yet
@njsmith (Member) commented:
This assert unfortunately doesn't accomplish much, since it's called from inside a __del__ method, so exceptions get converted into text-printed-on-stderr. And then pytest will hide that text. So even if it fails, we're unlikely to notice...

@oremanj oremanj (Member Author) Dec 23, 2018:
Well, if it is in fact called from inside __del__, left_run_yet should be True, and we're fine regardless. But if the task gets unwound normally, we won't be inside a __del__, and the assert will fire.

# Suppress the Nursery.__del__ assertion about dangling children,
# for both the nursery we created and the system nursery
nursery._children.clear()
nursery.parent_task.parent_nursery._children.clear()
@njsmith (Member) commented:
Since this is just hiding some messages printed by __del__, I think it'd be simpler to let them be printed, and let pytest hide them?

We should use gc_collect_harder() though to make sure that they do get printed during this test, so pytest will attach them to the right place.

@oremanj
Member Author

oremanj commented Dec 23, 2018

Excising run() from this PR is not totally trivial, since there are tests that use it which are the sole source of coverage for some things that aren't in run(), and the docs need to be restructured without it. I'm home for the holidays at the moment but will try to get this done soonish.

@ncoghlan

To minimise refactoring without committing to a public 'run()' API yet, you could call the interim internal-only API '_run()'

@oremanj
Member Author

oremanj commented Dec 26, 2018

run() removed - @njsmith, does this look good for landing?

@njsmith
Member

njsmith commented Dec 27, 2018

🎉

Now to open all the follow-up tickets... :-)

@njsmith
Member

njsmith commented Dec 28, 2018

Followup issues:

#661 is also relevant (a subtle bug in Pipe*Stream)

@pringshia

Just curious - are the docs on this up to date or is that still TBD? Thanks for all the work on this!

Successfully merging this pull request may close these issues.

Subprocess support
5 participants