Skip to content

Conversation

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Jun 12, 2018

https://bugs.python.org/issue33630

This PR is triying to fix this error (more info in the bpo):

test_no_such_executable (test.test_posix.TestPosixSpawn) ... ok
test_open_file (test.test_posix.TestPosixSpawn) ... ERROR
test_returns_pid (test.test_posix.TestPosixSpawn) ... ok
Warning -- files was modified by test_posix
  Before: []
  After:  ['\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb'] 
test test_posix failed
test_specify_environment (test.test_posix.TestPosixSpawn) ... ok

======================================================================
ERROR: test_open_file (test.test_posix.TestPosixSpawn)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64/build/Lib/test/test_posix.py", line 1542, in test_open_file
    with open(outfile) as f:
FileNotFoundError: [Errno 2] No such file or directory: '@test_42482_tmp'

@pablogsal
Copy link
Member Author

pablogsal commented Jun 12, 2018

@serhiy-storchaka Do you mind reviewing this? I am not completely happy about the code to wait for the inode to be created but I don't know if I am missing something obviously better.

I am referring to this code:

        max_tries = 3
        while not os.path.exists(outfile) and max_tries:
            max_tries -= 1
            time.sleep(0.1)

Note: This code has changed in 28a8ced

@pablogsal pablogsal changed the title bpo-33630: Fix race condition in `Lib/test/test_posix.py:TestPosixSpawn.test_ope… bpo-33630: Fix race condition in TestPosixSpawn.test_open Jun 12, 2018
os.environ, file_actions)
self.assertEqual(os.waitpid(pid, 0), (pid, 0))

max_tries = 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests, I prefer to use a timeout rather than a number of tries. Like:

deadline = time.monotonic()
while 1:
if time.monotonic() > deadline: ... timeout ...
... the code ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 28a8ced

@serhiy-storchaka
Copy link
Member

There is still a race condition. It is possible that the file is created, but is still empty.

Relying on time is not reliable. The child can be paused at any moment.

@pablogsal
Copy link
Member Author

pablogsal commented Jun 12, 2018

If the child is paused the self.assertEqual(os.waitpid(pid, 0), (pid, 0)) will still block until the child actually finishes, right? Notice that we are waiting until the inode is created after the child has finished. Also, we are flushing now stdout before the child finishes.

@pablogsal
Copy link
Member Author

pablogsal commented Jun 12, 2018

Commit 1ad0a08 handles the situation in which the file is created but nothing is still written into it.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to first see the posix_spawn() memory issue fixed, before fixing the Python test.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal
Copy link
Member Author

@vstinner It seems that the memory corruption is due to a bug in glibc<2.20. There is a discussion on the issue to skip the test for older versions of glibc or add a workaround.

@pablogsal
Copy link
Member Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

Copy link
Member

@zhangyangyu zhangyangyu Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the version is glibc < 2.20. It would also be nice to mention CVE-2014-4043 so later code readers could easily find out details about the glibc bug if they are interested.

Copy link
Member

@zhangyangyu zhangyangyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is we should use the temp list always or just use it for glibc < 2.20? We could use macros __GLIBC__ and __GLIBC_MINOR__.

@pablogsal
Copy link
Member Author

@zhangyangyu Done in 869dd1c

@pablogsal
Copy link
Member Author

pablogsal commented Jun 13, 2018

I can confirm that this PR fixes both bugs. Tested on gcc110.fsffrance.org:

[pablogsal@gcc1-power7 cpython]$ git rev-parse HEAD
2c7f50d712bb9f64c53404c2629ce1998e6e2b0f

[pablogsal@gcc1-power7 cpython]$ uname -a
Linux gcc1-power7.osuosl.org 3.10.0-514.26.2.el7.ppc64 #1 SMP Mon Jul 10 02:26:53 GMT 2017 ppc64 ppc64 ppc64 GNU/Linux

[pablogsal@gcc1-power7 cpython]$ ldd --version
ldd (GNU libc) 2.17

[pablogsal@gcc1-power7 cpython]$ ./python -m unittest test.test_posix
ss....s....................ss................ss............s...s.....................................
----------------------------------------------------------------------
Ran 101 tests in 1.684s

OK (skipped=9)

[pablogsal@gcc1-power7 cpython]$ ./python -m test test_posix -R 3:3

Run tests sequentially
0:00:00 load avg: 3.94 [1/1] test_posix
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 9 sec
Tests result: SUCCESS

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think the temp list should always be used. This would make code clearer and make testing easier.

  2. Make two different PRs for these changes. Changes to the test still don't LGTM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__GLIBC__ and __GLIBC_MINOR__ give you the version of glibc used at compile time. But can not the different version be dynamically linked at run time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I updated the issue to check what people think.

Copy link
Member

@zhangyangyu zhangyangyu Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka Isn't this the case for all other condition macros? For example HAVE_POSIX_SPAWN could be defined at compile time but at run time you could link to a version doesn't have posix_spawn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can link only to the lib with equal or larger version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7: Add space before *, not after it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the new PR (see comment below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyList_Append() can fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the new PR (see comment below).

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal
Copy link
Member Author

@serhiy-storchaka I am going to open another PR for the changes in posixmodule.c leaving this one for the tests only.

@pablogsal
Copy link
Member Author

Created #7685 for the memory problems. This PR is only for the test race conditions.

@pablogsal
Copy link
Member Author

I have simplified this bugfix:

As far as I understand this won't finish until the inode is created and the data is written to disk:

import sys, os
sys.stdout.write("hello")
sys.stdout.flush()
os.fsync(sys.stdout.fileno())

and on the parent waitpid blocks until the child finished with 0 (otherwise it fails the check):

self.assertEqual(os.waitpid(pid, 0), (pid, 0))

Unless I am missing something there is no race condition. When waitpid exists successfully, the file has to be there.

@vstinner
Copy link
Member

Let's see if the commit cb97073 fixed tests on buildbots.

@vstinner
Copy link
Member

PPC64 Fedora 3.x, PPC64LE Fedora 3.x, s390x Debian 3.x and s390x RHEL 3.x buildbots are back to green thanks to the commit cb97073, so I don't think that this change is still needed. I close the PR.

@vstinner vstinner closed this Jun 19, 2018
@pablogsal pablogsal deleted the bpo33630 branch May 19, 2021 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants