Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-101766: Fix refleak for _BlockingOnManager resources #101942

Merged
merged 2 commits into from
Feb 17, 2023

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Feb 16, 2023

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 6ff177a 🤖

If you want to schedule another build, you need to add the :hammer: test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 16, 2023
@corona10 corona10 added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 16, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 6ff177a 🤖

If you want to schedule another build, you need to add the :hammer: test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 16, 2023
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I never saw this code before.

@corona10
Copy link
Member Author

corona10 commented Feb 16, 2023

The current failure does not relate to this PR..
Also, it is failed with the main branch too: eb0c485 and also for 89ac665

$> ./python -m test test_imp -R 3:3 -v

======================================================================
FAIL: test_singlephase_variants (test.test_imp.ImportTests.test_singlephase_variants) [_testsinglephase]
Exercise the most meaningful variants described in Python/import.c.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cpython/Lib/test/test_imp.py", line 366, in test_singlephase_variants
    self.assertEqual(init_count, expected_init_count)
AssertionError: 3 != 1

======================================================================
FAIL: test_singlephase_variants (test.test_imp.ImportTests.test_singlephase_variants) [_testsinglephase_basic_wrapper]
Exercise the most meaningful variants described in Python/import.c.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cpython/Lib/test/test_imp.py", line 383, in test_singlephase_variants
    self.assertEqual(init_count, expected_init_count)
AssertionError: 3 != 2

----------------------------------------------------------------------
Ran 30 tests in 0.106s

@corona10
Copy link
Member Author

@ericsnowcurrently Can you please take a look?

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can safely remove it, because it is reset here: self.blocked_on = _blocking_on.setdefault(self.thread_id, [])

Thanks for fixing this!

@corona10
Copy link
Member Author

@gvanrossum @sobolevn
I will merge this PR after I check the leak test is passed.

@corona10 corona10 self-assigned this Feb 16, 2023
Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I left one comment but feel free to ignore it if you don't think it matters.

Lib/importlib/_bootstrap.py Show resolved Hide resolved
@corona10
Copy link
Member Author

@ericsnowcurrently

Hi Eric
096d009 looks causing
$> ./python -m test test_imp -R 3:3 failure.
Is this under investigation?

@ericsnowcurrently
Copy link
Member

I'll take a look at the refleak.

@ericsnowcurrently
Copy link
Member

I'm working on it via gh-101969.

@corona10 corona10 added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 17, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 79fd697 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 17, 2023
@corona10
Copy link
Member Author

Merge this PR since refleaks test are passed, and macOS CI is another issue:
#101981

@corona10 corona10 merged commit 775f881 into python:main Feb 17, 2023
@corona10 corona10 deleted the gh-101766 branch February 17, 2023 10:14
Comment on lines +88 to +92
if len(self.blocked_on) == 0:
# gh-101766: glboal cache should be cleaned-up
# if there is no more _blocking_on for this thread.
del _blocking_on[self.thread_id]
del self.blocked_on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corona10 @gvanrossum @sobolevn

This change re-introduces the re-entrancy bug that _BlockingOnManager was introduced to fix.

Consider: A thread has executed __enter__ and begins executing __exit__. It executes line 87 and the length of self.blocked_on becomes 0. The check on the new line 88 passes and the thread enters the block starting with the new line 89. Call this "point x". Now the garbage collector is triggered and an object with a __del__ is collected. The __del__ runs and imports a module (as is very common now that there are ResourceWarnings). As part of satisfying the import, the thread runs __enter__. It executes line 82 and receives the list that is already in _blocking_on (it uses the same key because it is the same thread - this is a single-threaded case). It executes line 83 to add its lock to the list. The import eventually finishes and it executes __exit__. It removes its lock from the list (line 87) and removes the list from _blocking_on (new line 91). Garbage collection proceeds and we get back to the original import being resolved. Execution returns to "point x". At point x we have already decided there are no locks in self.blocked_on and we're going to clean up the global dictionary. New line 91 runs and finds no item at key self.thread_id. This raises a KeyError which goes unhandled all the way up to some application code that was doing import somemodule.

I recommend reverting this change until a re-entrant safe version of the fix can be determined. The resource leak is not great but if I understand correctly, it is also more or less academic right now (it causes a very small amount of memory to be wasted while the test suite is running). The re-entrancy bug is real and breaks many real applications in extremely confusing ways (consider that the original bug reports were open for years before anyone tracked down the problem, at great expense, and it took more than half a year to get the fix reviewed and merged).

Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exarkun cc @gvanrossum @sobolevn
Okay, I understand your intention.
I will try to submit the patch if the resource can be cleaned up at the test suite level including reverting the current change..

And also would you like to submit the test code to prevent regression?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to submit the patch if the resource can be cleaned up at the test suite level including reverting the current change..

Thanks you very much.

And also would you like to submit the test code to prevent regression?

That would be ideal, certainly. There is a reproducer on #94504 but it is CPU intensive and non-deterministic without additional instrumentation inside _bootstrap.py itself. It's not clear to me exactly how to turn this into a regression test that can be included in the automated suite. I'd be happy to discuss it further though.

@gvanrossum
Copy link
Member

Would it be possible to delete the cached empty list in a __del__ method?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants