bpo-18372: Add missing PyObject_GC_Track() calls in the pickle module #8505

Merged

Conversation

ZackerySpytz
Contributor

@ZackerySpytz ZackerySpytz commented Jul 27, 2018

@zooba
Member

zooba commented Jul 29, 2018

@ZackerySpytz We should have a NEWS entry for this (something like "ensure Pickler and Unpickler are correctly garbage collected"). Click on the "Details" link next to the failed check for info on how to add it.

@serhiy-storchaka
Member

@ZackerySpytz, please add a news entry.

@serhiy-storchaka
Member

Ping.

@ZackerySpytz
Contributor Author

@zooba I created this PR without a news entry because this is a very minor change. The "skip news" label is usually applied for such changes.

@serhiy-storchaka
Member

_Pickler_New() and _Unpickler_New() are only used in implementations of module-level functions dump(), dumps(), load(), loads(). The result of _Pickler_New() and _Unpickler_New() is never leaked to the user, so it is never GC collected. What is the benefit of adding PyObject_GC_Track() calls?

@vstinner
Member

I removed the "needs backport to 3.6" label; the 3.6 branch no longer accepts bug fixes (only security fixes are accepted): https://devguide.python.org/#status-of-python-branches

@methane
Member

methane commented Apr 8, 2019

I think @serhiy-storchaka is right. This doesn't fix a real bug, so there is no need to backport.

What should we do in 3.8? Merge this for consistency? Or add a comment like
"// We skip PyObject_GC_Track(pickler) here because pickler never leaks."?

@vstinner
Member

vstinner commented Apr 8, 2019

> The result of _Pickler_New() and _Unpickler_New() is never leaked to the user, so it is never GC collected. What is the benefit of adding PyObject_GC_Track() calls?

The doc says that PyObject_GC_Track() must be called on objects allocated using PyObject_GC_New() or PyObject_GC_NewVar().

Serhiy wrote "_pickle.Pickler and _pickle.Unpickler have the Py_TPFLAGS_HAVE_GC flag, implement tp_traverse and tp_clear, but PyObject_GC_Track is never called."

https://bugs.python.org/issue18372#msg228622

Either GC support must be removed (remove Py_TPFLAGS_HAVE_GC, remove tp_clear and tp_traverse, etc.), or the implementation should be fixed (call PyObject_GC_Track).

IMHO it's better to fix the implementation. I like the ability of using functions like gc.get_referrers() during serialization/deserialization.
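
As a rough Python-side illustration of what "tracked" means, the sketch below asks the collector whether it currently tracks a Pickler and an Unpickler. This is only an analogy: it checks objects built through the public constructors, not the temporaries created internally by _Pickler_New() and _Unpickler_New(), which are the subject of this PR. The helper name `tracked_status` is hypothetical.

```python
import gc
import io
import pickle


def tracked_status():
    """Return (pickler_tracked, unpickler_tracked) for user-constructed objects."""
    pickler = pickle.Pickler(io.BytesIO())
    unpickler = pickle.Unpickler(io.BytesIO(pickle.dumps(42)))
    # gc.is_tracked() reports whether the collector currently tracks the object.
    return gc.is_tracked(pickler), gc.is_tracked(unpickler)


print(tracked_status())
```

An untracked object is invisible to gc.get_objects() and gc.get_referrers(), which is the introspection gap described above.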

Member

@vstinner vstinner left a comment

LGTM.

I would like to merge this change and apply it to 2.7 and 3.7, unless someone sees a good reason not to fix this bug.

@vstinner
Member

vstinner commented Apr 8, 2019

> I think @serhiy-storchaka is right. This doesn't fix a real bug, so there is no need to backport.

IMHO it's a bug to not call PyObject_GC_Track(). It's too easy to create a reference cycle in Python, especially in Python 3 with exceptions keeping local variables alive.
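
A minimal sketch of the kind of cycle mentioned here, assuming CPython semantics: a caught exception stored in a local variable references its traceback, the traceback references the frame, and the frame references the local again. The names `make_cycle` and `demo` are hypothetical and chosen only for illustration.

```python
import gc
import weakref


def make_cycle():
    """Store a caught exception in a local, which creates a reference cycle."""
    payload = {1, 2, 3}  # sets support weak references
    probe = weakref.ref(payload)
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # Cycle: saved -> exc.__traceback__ -> frame -> local 'saved'
        saved = exc
    return probe


def demo():
    """Return (alive_before_collect, alive_after_collect) for the payload."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from interfering
    try:
        probe = make_cycle()
        alive_before = probe() is not None
        gc.collect()  # only the cycle collector can free the frame
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


print(demo())
```

On CPython the payload is expected to outlive the function call and to be released only once gc.collect() breaks the cycle. Anything reachable from such a frame depends on the cycle collector in the same way.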

@vstinner
Member

vstinner commented Apr 8, 2019

Hum wait, GC experts: should PyObject_GC_UnTrack() be called in pickle/unpickle dealloc functions?

@vstinner
Member

vstinner commented Apr 8, 2019

Something else, PyMemoTable keeps a strong reference to objects. Pickler_traverse() should also traverse self->memo, not only self->fast_memo, no?
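
The strong reference held by the memo can be observed from Python through the public Pickler API. The sketch below is an illustration of that observable effect, not of the C-level PyMemoTable or Pickler_traverse() code itself; the helper name `memo_keeps_alive` is hypothetical, and the immediate release after clear_memo() assumes CPython reference counting.

```python
import io
import pickle
import weakref


def memo_keeps_alive():
    """Return (alive_while_memoized, alive_after_clear) for a pickled object."""
    pickler = pickle.Pickler(io.BytesIO())
    obj = {"a", "b"}  # sets support weak references
    probe = weakref.ref(obj)
    pickler.dump(obj)
    del obj  # drop our own reference; only the memo can keep the set alive now
    alive_while_memoized = probe() is not None
    pickler.clear_memo()  # releases the memo's references
    alive_after_clear = probe() is not None
    return alive_while_memoized, alive_after_clear


print(memo_keeps_alive())
```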

@methane
Member

methane commented Apr 8, 2019

> Hum wait, GC experts: should PyObject_GC_UnTrack() be called in pickle/unpickle dealloc functions?

It is called already :)


The _pickle module exports the Pickler and Unpickler classes, so I think we should call PyObject_GC_Track.

But I'm not sure it's important enough to backport to 2.7.
GC in Python 2.7 is not as good as in recent Python 3. If no real issue is reported, I don't want
to touch Python 2.7.

@vstinner
Member

vstinner commented Apr 8, 2019

> _Pickler_New() and _Unpickler_New() are only used in implementations of module-level functions dump(), dumps(), load(), loads(). The result of _Pickler_New() and _Unpickler_New() is never leaked to the user, so it is never GC collected.

Ok, I now understand the "never leaked to the user" part: _pickle_dump_impl() doesn't pass the temporary 'pickler' object to any "user function".

> What is the benefit of adding PyObject_GC_Track() calls?

It's not only a matter of breaking reference cycles.

The GC is more than that: it's also a way to introspect all Python objects (tracked by the GC). For example, gc.get_objects() is used by some projects to measure memory usage frequently, and I like to use gc.get_referrers() to understand the relationship between objects, manually "check" the reference count of an object, etc.

Said differently, it's a matter of consistency :-)
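
To make the introspection argument concrete, here is a hedged sketch that looks for Pickler objects through gc.get_objects() while pickle.dumps() is running, using a __reduce__ hook as the observation point. The names `Probe`, `count_live_picklers`, and `demo` are hypothetical. The count depends on the interpreter: with the C accelerator and this change applied, one would expect the temporary pickler to be visible, while an untracked pickler would not appear at all.

```python
import gc
import pickle


def count_live_picklers():
    """Count Pickler instances the collector currently knows about."""
    return sum(1 for o in gc.get_objects() if type(o) is pickle.Pickler)


class Probe:
    """Records how many picklers are visible while it is being pickled."""

    def __init__(self):
        self.count_during_dump = None

    def __reduce__(self):
        # Called by the pickler in the middle of pickle.dumps().
        self.count_during_dump = count_live_picklers()
        return (dict, ())  # unpickles to an empty dict


def demo():
    probe = Probe()
    data = pickle.dumps(probe)
    return probe.count_during_dump, pickle.loads(data)


print(demo())
```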

--

What should be done with traverse functions and the memo?

@vstinner
Member

vstinner commented Apr 8, 2019

> But I'm not sure it's important enough to backport to 2.7. GC in Python 2.7 is not as good as in recent Python 3. If no real issue is reported, I don't want to touch Python 2.7.

Ok. I removed the "needs backport to 2.7" label.

@methane methane merged commit 359bd4f into python:master Apr 23, 2019
@miss-islington
Contributor

Thanks @ZackerySpytz for the PR, and @methane for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 23, 2019
…pythonGH-8505)

(cherry picked from commit 359bd4f)

Co-authored-by: Zackery Spytz <zspytz@gmail.com>
@bedevere-bot

GH-12926 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit that referenced this pull request Apr 23, 2019
…GH-8505)

(cherry picked from commit 359bd4f)

Co-authored-by: Zackery Spytz <zspytz@gmail.com>
@serhiy-storchaka
Member

I do not think this change was necessary.

@vstinner
Member

> I do not think this change was necessary.

Well, it shouldn't hurt anyone :-)
