Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30703: Improve signal delivery #2415

Merged
merged 10 commits into from
Jun 28, 2017
Merged

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jun 26, 2017

Avoid using Py_AddPendingCall from signal handler, to avoid calling signal-unsafe functions.

Avoid using Py_AddPendingCall from signal handler, to avoid calling signal-unsafe functions.
@mention-bot
Copy link

@pitrou, thanks for your PR! By analyzing the history of the files in this pull request, we identified @benjaminp, @serhiy-storchaka and @1st1 to be potential reviewers.

@pitrou pitrou changed the title Improve signal delivery bpo-30703: Improve signal delivery Jun 26, 2017
@pitrou pitrou added the type-bug An unexpected behavior, bug, or error label Jun 26, 2017
@pitrou
Copy link
Member Author

pitrou commented Jun 26, 2017

The test I'm adding here fails without the rest of the patch (on Linux and OS X). This means our signal delivery logic really had holes in it.

@pitrou
Copy link
Member Author

pitrou commented Jun 27, 2017

@tim-one, would you like to review this one?

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I wrote the first version of this change, I like the design, haha. More seriously, the overall change LGTM.

But I added a few comments with some questions.

@@ -1556,8 +1552,10 @@ PyErr_CheckSignals(void)
arglist);
Py_DECREF(arglist);
}
if (!result)
if (!result) {
is_tripped = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IHMO it would be more explicit to have a "error:" label and use goto here, and if arglist creation failed. But well, it's just a coding style suggestion, it's up to you ;-)

Python/ceval.c Outdated
{
/* bpo-30703: Function called when the C signal handler of Python gets a
signal. We cannot queue a callback using Py_AddPendingCall() since this
function is not reentrant (uses a lock and a list). */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you asked me to write "async-signal safe" :-) maybe remove "(uses a lock and a list)"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you. I had forgotten that.

@@ -408,6 +415,16 @@ Py_MakePendingCalls(void)
if (busy)
return 0;
busy = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand well the purpose of busy... is it a flag to catch reentrant call? If no, maybe it should use volalite? If yes, please add a comment on the variable declaration to document this flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"don't perform recursive pending calls" as written above :-)

Python/ceval.c Outdated
that a signal was received, see _PyEval_SignalReceived(). */
UNSIGNAL_PENDING_CALLS();
if (PyErr_CheckSignals() < 0) {
busy = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move this code at the end of the function, in a new "error:" label and use goto. IMHO it's more readable like that (set busy to 1 at the function entry and set it to 0 at exit) (and it would avoid code duplication).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will do.

@@ -408,6 +415,16 @@ Py_MakePendingCalls(void)
if (busy)
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to add assert(PyGILState_Check()); at the function entry to make it more explicit that this function requires to hold the GIL?

(Right now, it's not obvious for me if the pending_lock initialization is safe.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

Python/ceval.c Outdated
UNSIGNAL_PENDING_CALLS();
if (PyErr_CheckSignals() < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please copy the comment before the if:

  • /* Python signal handler doesn't really queue a callback: it only signals
  •   that a signal was received, see _PyEval_SignalReceived(). */
    

@@ -941,6 +942,101 @@ def handler(signum, frame):
(exitcode, stdout))


class StressTest(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I dislike such tests, I expect that I will have to fix corner cases in the test itself on BSD, Solaris, etc. I understand why you wrote the test, but I also expect that it will be a pain to "maintain" it :-(

setsig(signal.SIGALRM, second_handler) # for ITIMER_REAL

expected_sigs = 0
deadline = time.time() + 15.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 000 and 15 seconds? Oh, this test seems expensive :-/ Maybe tag it using the "cpu" resource? Or reduce these parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test takes 0.5 second on my machine. The large 15 seconds timeout was just there to make you (and the buildbots) happy :-)

@vstinner
Copy link
Member

About the test, I would even prefer to not have such test. Or maybe only run it a few times (ex: 100 signals, not 10 000).

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm not confortable to backport this change on stable versions right now. If you want to backport it, maybe wait at least one week to see if the test works well on all buildbots.

@pitrou
Copy link
Member Author

pitrou commented Jun 28, 2017

Agreed. If it goes well then I may consider a backport to 3.6. I think 2.7 and 3.5 are out of question.

@pitrou pitrou merged commit c08177a into python:master Jun 28, 2017
@pitrou pitrou deleted the signal_delivery branch June 28, 2017 21:29
@pitrou
Copy link
Member Author

pitrou commented Jun 28, 2017

It seems some systems may have crappy resolution for setitimer. For example, the FreeBSD man page says:

Time values smaller than the resolution of the system clock are rounded up to this resolution (typically 10 milliseconds).

(https://www.freebsd.org/cgi/man.cgi?query=setitimer&apropos=0&sektion=2&manpath=FreeBSD+10.0-RELEASE&arch=default&format=html)

This might explain the failure here: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/275/steps/test/logs/stdio
(note signal 14 is SIGALRM)

@pitrou
Copy link
Member Author

pitrou commented Jun 28, 2017

Same for AIX:

If a nonprivileged user attempts to submit a fine granularity timer (that is, a timer request of less than 10 milliseconds), the timer request interval is automatically raised to 10 milliseconds.

(https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.basetrf1/getinterval.htm)

@vstinner
Copy link
Member

vstinner commented Jun 28, 2017 via email

@pitrou
Copy link
Member Author

pitrou commented Jun 28, 2017

Please stop being offensive with my test :-)

@vstinner
Copy link
Member

vstinner commented Jun 28, 2017 via email

pitrou added a commit to pitrou/cpython that referenced this pull request Jul 1, 2017
* Improve signal delivery

Avoid using Py_AddPendingCall from signal handler, to avoid calling signal-unsafe functions.

* Remove unused function

* Improve comments

* Add stress test

* Adapt for --without-threads

* Add second stress test

* Add NEWS blurb

* Address comments @Haypo.
(cherry picked from commit c08177a)
pitrou added a commit that referenced this pull request Jul 1, 2017
* [3.6] bpo-30703: Improve signal delivery (GH-2415)

* Improve signal delivery

Avoid using Py_AddPendingCall from signal handler, to avoid calling signal-unsafe functions.

* Remove unused function

* Improve comments

* Add stress test

* Adapt for --without-threads

* Add second stress test

* Add NEWS blurb

* Address comments @Haypo.
(cherry picked from commit c08177a)

* bpo-30796: Fix failures in signal delivery stress test (#2488)

* bpo-30796: Fix failures in signal delivery stress test

setitimer() can have a poor minimum resolution on some machines,
this would make the test reach its deadline (and a stray signal
could then kill a subsequent test).

* Make sure to clear the itimer after the test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants