Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure to properly untrack gc objects before freeing them #4461

Merged
merged 4 commits into from
Jan 18, 2023

Conversation

albanD
Copy link
Contributor

@albanD albanD commented Jan 18, 2023

Description

There is no issue for this as the problem is hard to describe without showing code.
The problem is that this dealloc function is sometimes associated with GC objects but it doesn't properly call PyObject_GC_UnTrack as required by https://docs.python.org/3/c-api/gcsupport.html (note that the track call is implicitly done by the default tp_alloc which we never override).

This was detected because the latest 3.11 release now raises a warning for this when pydebug is enabled.
This warning is being raised in particular for this class in PyTorch: https://github.com/pytorch/pytorch/blob/764f79f6804a57a75d9d4da97824b4f57b638ef5/torch/csrc/jit/python/script_init.cpp#L1454
Since similar warnings for some of our custom python objects do lead to hard crash, I think it is important to fix it (details pytorch/pytorch#91161 )

There is no test in this PR as I don't know how to test this. The warning is only raised in python debug builds (and is a weird warning that cannot be turned into an error). If you have an idea on how to test this, I am happy to do it!

Suggested changelog entry:

* Fix a warning when pydebug is enabled on Python 3.11

@albanD
Copy link
Contributor Author

albanD commented Jan 18, 2023

CI failure looks unrelated to my change right?

@rwgk
Copy link
Collaborator

rwgk commented Jan 18, 2023

The flake8 failure is certainly unrelated, I don't know how that slipped in.

The clang-tidy failure is real, but just use the suggested fix.

@rwgk
Copy link
Collaborator

rwgk commented Jan 18, 2023

If you have an idea on how to test this
...
Since similar warnings for some of our custom python objects do lead to hard crash,

Could that be reduced to a unit test here (IIUC it's a pytorch unit test)?

@albanD
Copy link
Contributor Author

albanD commented Jan 18, 2023

Could that be reduced to a unit test here (IIUC it's a pytorch unit test)?

The crash I observed were only with debug build and for a class that was bound manually (not with pybind).
For the pybind-bound classes, the warning was raised but I never observed the crash.

Do you actually have a CI config with 3.11 and pydebug enabled?

@rwgk
Copy link
Collaborator

rwgk commented Jan 18, 2023

Do you actually have a CI config with 3.11 and pydebug enabled?

This one? https://pypi.org/project/pydebug/

No, I don't think so. How much trouble would it be to port the pytorch pydebug testing here? (for just one platform maybe)

Might be good to have in general. Maybe we discover more issues?

@albanD
Copy link
Contributor Author

albanD commented Jan 18, 2023

Ho sorry I should have explained.
No I don't mean this package.
I meant the CPython build flag: https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#building-a-standard-debug-version-of-python

I never managed to get my hand on a debug cpython easily for the pytorch CI either. But I'm running with that locally because it is very helpful as it contains many extra assert (pre conditions, refcount, etc).
But AFAIK you would have to build your own version of CPython from source for your CI to enable this.

@rwgk
Copy link
Collaborator

rwgk commented Jan 18, 2023

But AFAIK you would have to build your own version of CPython from source for your CI to enable this.

AFAIK that's true ... and too much trouble, at least for this PR.

I'll run this PR through Google's global testing, to make sure there aren't any weird surprises. (That'll take half a day or so.)

@rwgk
Copy link
Collaborator

rwgk commented Jan 18, 2023

The new flake8 error is in test code I added ... sigh. I'll fix it in a separate PR. Good to ignore here.

@rwgk rwgk mentioned this pull request Jan 18, 2023
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global testing looks good (i.e. no surprises; internal test ID OCL:502758013:BASE:502854252:1674050307767:9412450b).

Thanks for the fix!

auto *type = Py_TYPE(self);

// If this is a GC tracked object, untrack it first
if (PyType_HasFeature(type, Py_TPFLAGS_HAVE_GC) != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add your sentence from the PR description as a comment here?

// Note that the track call is implicitly done by the default tp_alloc, which we never override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good now!

@rwgk rwgk merged commit c709d2a into pybind:master Jan 18, 2023
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 18, 2023
@albanD albanD deleted the gc_obj_dealloc branch January 19, 2023 16:36
henryiii pushed a commit that referenced this pull request Feb 24, 2023
* Make sure to properly untrack gc objects before freeing them

* style: pre-commit fixes

* Fix lint

* Add comment about where the original track comes from

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Mar 16, 2023
@rwgk rwgk mentioned this pull request May 16, 2023
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 9, 2023
In particular, I would like to incorporate pybind/pybind11#4461, which means that pybind11 objects are properly untracked from the Python GC before destruction, absent which we see warnings when running in a debug build of Python 3.11.

PiperOrigin-RevId: 539148388
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants