Don't setpgid() on actors #3347

ericl · 2018-11-18T04:54:11Z

What do these changes do?

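Don't call setpgid() on actors (the call was introduced in #3297), so actors stay in the driver's process group.
pros: Ctrl-C reaches the actors too, so it won't leak actor processes
cons: actors can more easily leak their own subprocesses (but you can manage these via atexit hooks)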
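
To illustrate the atexit approach mentioned in the cons above, here is a minimal sketch of a subprocess tied to an exit hook. The helper name `start_managed` is hypothetical and is not part of Ray; it only assumes the standard `atexit` and `subprocess` modules.

```python
import atexit
import subprocess


def start_managed(cmd):
    """Start cmd (a list of arguments) and register an exit hook that stops it.

    Hypothetical helper for illustration only; not a Ray API.
    """
    proc = subprocess.Popen(cmd)

    def cleanup():
        # Only act if the child is still running.
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # The child ignored the polite request; force it.
                proc.kill()
                proc.wait()

    # Runs on normal interpreter shutdown, including sys.exit().
    atexit.register(cleanup)
    return proc, cleanup
```

The hook only fires on a normal interpreter shutdown. It does not run if the process exits via os._exit() or is killed by a signal that Python does not handle.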

Related issue number

Closes #3345

ericl · 2018-11-18T04:57:48Z

python/ray/worker.py

@@ -976,7 +972,7 @@ def _wait_for_and_process_task(self, task):
            driver_id, function_id.id()) == execution_info.max_calls)
        if reached_max_executions:
            self.local_scheduler_client.disconnect()
-            os._exit(0)
+            sys.exit(0)


@robertnishihara this doesn't have to be an os._exit, right? os._exit prevents exit hooks from running.

robertnishihara replied: It doesn't need to be anything in particular. It just needs to exit the process.
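
The difference between the two exit calls can be checked with a small standalone script. This is a sketch rather than Ray code: `CHILD` and `run_child` are names made up for the illustration. It runs a child interpreter that registers an atexit hook and then exits either way.

```python
import subprocess
import sys

# Child program: registers an exit hook, then exits "hard" or "soft".
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("hook ran", flush=True))
if sys.argv[1] == "hard":
    os._exit(0)   # terminates immediately, skipping atexit hooks
else:
    sys.exit(0)   # raises SystemExit, so normal shutdown runs the hooks
"""


def run_child(mode):
    """Run the child in the given mode and return whatever it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("sys.exit :", repr(run_child("soft")))
    print("os._exit :", repr(run_child("hard")))
```

Because sys.exit() works by raising SystemExit, it only ends the process when the exception propagates out of the main thread.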

ericl · 2018-11-18T04:58:39Z

python/ray/rllib/test/test_env_with_subprocess.py

@@ -29,12 +29,13 @@ def __init__(self, config):
        self.action_space = Discrete(2)
        self.observation_space = Discrete(2)
        # Subprocess that should be cleaned up
-        self.subproc = subprocess.Popen(UNIQUE_CMD, shell=True)
+        self.subproc = subprocess.Popen(UNIQUE_CMD.split(" "), shell=False)


We can't launch this in a shell now, since we no longer have process group containment. With shell=True the Popen handle refers to the shell rather than the command, so the command can escape cleanup.
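
A sketch of launching without a shell, so that the PID held by the Popen object belongs to the command itself. The helper `launch_direct` is hypothetical. It uses shlex.split instead of the diff's split(" ") because it also handles quoted arguments; for a command without quotes or repeated spaces the two give the same result.

```python
import shlex
import subprocess


def launch_direct(command):
    """Launch a command string without an intermediate shell.

    With shell=False, proc.pid is the PID of the command itself, so
    proc.terminate() or proc.kill() acts on it directly.
    Hypothetical helper for illustration only.
    """
    return subprocess.Popen(shlex.split(command), shell=False)
```

The trade-off is that shell features such as pipes, globbing, and variable expansion are not available when the command is launched this way.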

AmplabJenkins · 2018-11-18T05:35:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9430/
Test FAILed.

AmplabJenkins · 2018-11-18T06:29:13Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9429/
Test FAILed.

robertnishihara · 2018-11-18T07:54:03Z

You seem to be reverting the code change from #3297 but leaving the test in place, is that right? What's the point of the test if we revert the pgid change?

ericl · 2018-11-18T08:32:43Z

We're still testing the atexit hooks, just not the automatic process-group killing, hence the addition of the kill line to the test. This is also only reverting a portion of that PR. The important thing is to not call os._exit(), which bypasses exit hooks.

AmplabJenkins · 2018-11-18T23:38:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9431/
Test FAILed.

dont setpgid

04fbb0f

ericl mentioned this pull request Nov 18, 2018

KeyboardInterrupt: Fatal Error when using tune #3345

Closed

also sys.exit

a57db20

ericl commented Nov 18, 2018

fmt

6b8ec11

ericl added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) Nov 19, 2018

robertnishihara approved these changes Nov 20, 2018

robertnishihara merged commit afc48d7 into ray-project:master Nov 20, 2018

robertnishihara deleted the no-pgid branch November 20, 2018 01:35

stephanie-wang mentioned this pull request Nov 21, 2018

Travis stress tests failing nondeterministically #3375

Closed
