Intermittent test failure in test_subprocess.test_signals #851

Closed
njsmith opened this issue Jan 11, 2019 · 11 comments

Comments

@njsmith
Member

njsmith commented Jan 11, 2019

It looks like #849 failed to automerge because of an intermittent failure in test_subprocess.test_signals: https://ci.cryptography.io/blue/organizations/jenkins/python-trio%2Ftrio/detail/PR-849/1/pipeline

The traceback is:

_________________________________ test_signals _________________________________

    async def test_signals():
        async def test_one_signal(send_it, signum):
            with move_on_after(1.0) as scope:
                async with subprocess.Process(SLEEP(3600)) as proc:
                    send_it(proc)
            assert not scope.cancelled_caught
            if posix:
                assert proc.returncode == -signum
            else:
                assert proc.returncode != 0
    
        await test_one_signal(subprocess.Process.kill, SIGKILL)
>       await test_one_signal(subprocess.Process.terminate, SIGTERM)
../.venv/lib/python3.5/site-packages/trio/tests/test_subprocess.py:240: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
send_it = <function Process.terminate at 0x102bc6d90>
signum = <Signals.SIGTERM: 15>

    async def test_one_signal(send_it, signum):
        with move_on_after(1.0) as scope:
            async with subprocess.Process(SLEEP(3600)) as proc:
                send_it(proc)
        assert not scope.cancelled_caught
        if posix:
>           assert proc.returncode == -signum
E           assert -9 == -<Signals.SIGTERM: 15>
E            +  where -9 = <trio._subprocess.Process object at 0x105d6c908>.returncode

../.venv/lib/python3.5/site-packages/trio/tests/test_subprocess.py:235: AssertionError

CC: @oremanj
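For reading the assertion above: the test relies on the POSIX convention that Python's `subprocess` module documents for `Popen.returncode`, where a child terminated by signal N reports a returncode of -N. So `-9` means the child died from SIGKILL, even though `terminate()` should have delivered SIGTERM (15). The helper below is a hypothetical illustration of that convention, not part of trio.

```python
import signal


def describe_returncode(returncode):
    """Hypothetical helper: decode a POSIX subprocess returncode.

    A negative value -N means the child was terminated by signal N;
    a non-negative value is an ordinary exit status.
    """
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)


print(describe_returncode(-9))   # killed by SIGKILL (what the test saw)
print(describe_returncode(-15))  # killed by SIGTERM (what the test expected)
```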

@njsmith
Member Author

njsmith commented Jan 11, 2019

(This failure is in Jenkins, which we're phasing out, and indeed I have no idea why it's even running here. But I assume that if it failed here, it can probably fail on other systems too.)

@oremanj
Member

oremanj commented Jan 14, 2019

I'm thoroughly confused by this one: we're calling Popen.terminate(), which should always send SIGTERM; we're waiting for the process to exit after sending the signal; and the cancel scope wasn't cancelled, so we wouldn't have sent kill()...

I can nerf the test so that it permits exited-on-SIGKILL in response to sending SIGTERM, but I'm a little reluctant to do that unless we see this failure more than once. Thoughts?
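A minimal stdlib sketch of the reasoning above, assuming plain `subprocess.Popen` rather than trio's `Process`: the child is only SIGKILLed by us on the path where the deadline expires, so a returncode of -9 should always coincide with that path. `wait_then_kill` and `SLEEP_FOREVER` are hypothetical names, and this is an analogy, not trio's actual cleanup code.

```python
import subprocess
import sys

# Hypothetical stand-in for the test's SLEEP(3600) child.
SLEEP_FOREVER = [sys.executable, "-c", "import time; time.sleep(3600)"]


def wait_then_kill(proc, timeout):
    """Wait up to `timeout` seconds for proc to exit. Only if that deadline
    expires do we send SIGKILL ourselves. Returns (returncode, killed_by_us)."""
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait(), True


if __name__ == "__main__":
    proc = subprocess.Popen(SLEEP_FOREVER)
    proc.terminate()  # sends SIGTERM on POSIX
    print(wait_then_kill(proc, timeout=10))  # (-15, False)
```

In the failing run `scope.cancelled_caught` was false, the analogue of `killed_by_us` being False, yet the returncode was still -9, which is what makes the failure so puzzling.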

@njsmith
Member Author

njsmith commented Jan 15, 2019

Thanks for taking a look! It seems pretty mysterious to me too.

I guess we have two goals here:

  • Prevent random failures from interfering with other unrelated development
  • Understand what happened, because if we don't understand then it might indicate some kind of latent bug

Right now we've only seen the failure once, so it's probably pretty rare, and it's not really interfering with other development. And if we nerf the test, then we won't notice future examples (if any), which will make it harder for us to understand what's going on.

So I'd leave the test alone for now, and if you don't understand it either then we can leave this open for a while and see if it happens again...

@njsmith
Member Author

njsmith commented Jan 17, 2019

Here's another example, also on Jenkins macOS, same python version (3.5), exact same error: https://ci.cryptography.io/blue/organizations/jenkins/python-trio%2Ftrio/detail/PR-805/17/pipeline

@oremanj
Member

oremanj commented May 2, 2019

We're not using Jenkins anymore, and I don't think this has shown up elsewhere. OK to close?

@njsmith
Member Author

njsmith commented May 2, 2019

It seems very unlikely to me that Jenkins was actually the culprit, as opposed to some weird macOS thing. But since it isn't causing problems and we can't even prove the bug exists, I guess we can close it. (And re-open if it ever resurfaces.)

@njsmith njsmith closed this as completed May 2, 2019
@pquentin
Member

Seen again on Linux Python 3.6 in Azure Pipelines: https://dev.azure.com/python-trio/trio/_build/results?buildId=1461&view=logs&jobId=9a864fd9-6c8f-52ca-79ce-2aa6dca1a1de&j=9a864fd9-6c8f-52ca-79ce-2aa6dca1a1de&t=14435553-4bd6-5ab0-e7af-caf830f382a2

_________________________________ test_signals _________________________________

    async def test_signals():
        async def test_one_signal(send_it, signum):
        await test_one_signal(Process.terminate, SIGTERM)
        if posix:
>           await test_one_signal(lambda proc: proc.send_signal(SIGINT), SIGINT)

/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/trio/tests/test_subprocess.py:366: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

send_it = <function test_signals.<locals>.<lambda> at 0x7f57d234a730>
signum = <Signals.SIGINT: 2>

    async def test_one_signal(send_it, signum):
        with move_on_after(1.0) as scope:
            async with await open_process(SLEEP(3600)) as proc:
                send_it(proc)
        assert not scope.cancelled_caught
        if posix:
>           assert proc.returncode == -signum
E           assert 1 == -2
E             -1
E             +-2

/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/trio/tests/test_subprocess.py:359: AssertionError
----------------------------- Captured stderr call -----------------------------
Failed to import the site module
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site.py", line 73, in <module>
    import os
  File "/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/os.py", line 57, in <module>
    import posixpath as path
  File "/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/posixpath.py", line 28, in <module>
    import genericpath
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 779, in get_code
  File "<frozen importlib._bootstrap_external>", line 487, in _compile_bytecode
KeyboardInterrupt
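The captured stderr suggests a startup race: SIGINT reached the child while its interpreter was still importing `site`, so it surfaced as a `KeyboardInterrupt` and the child exited with status 1 instead of dying from the signal (-2). One possible way to close that window, sketched here with plain `subprocess` and not something proposed in this thread, is a readiness handshake: the child restores the default SIGINT action and prints a line once it is fully started, and the parent only sends the signal after reading that line. `CHILD` and `sigint_after_ready` are hypothetical names; POSIX is assumed.

```python
import signal
import subprocess
import sys

# Hypothetical child program: restore the default SIGINT action so the signal
# terminates the process (instead of raising KeyboardInterrupt), then announce
# that interpreter startup, including the `site` import, has finished.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_DFL)\n"
    "print('ready', flush=True)\n"
    "time.sleep(3600)\n"
)


def sigint_after_ready(timeout=10):
    """Send SIGINT only after the child reports readiness; return its returncode."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    try:
        line = proc.stdout.readline()  # blocks until the child prints 'ready'
        if line.strip() != b"ready":
            raise RuntimeError("unexpected child output: {!r}".format(line))
        proc.send_signal(signal.SIGINT)
        return proc.wait(timeout=timeout)
    finally:
        if proc.poll() is None:  # only reached if something above failed
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print(sigint_after_ready())  # -2 on POSIX: killed by SIGINT
```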

@njsmith
Member Author

njsmith commented Jan 31, 2020

@pquentin I think that's actually another variant of #851, so I reposted over there.

@njsmith njsmith closed this as completed Jan 31, 2020
@pquentin
Copy link
Member

Thanks 👍 I think you meant #1170

@njsmith
Copy link
Member Author

njsmith commented Jan 31, 2020

Uh... right :-)

@tjstum
Member

tjstum commented Dec 3, 2021

I mentioned this in Gitter, but I am reliably seeing this on my own macOS laptop. Happy to help with the triage or the fix, and super receptive to hints!
