Skip to content

Free tstate on python 3.7+ on finalize_interpreter #2020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 12, 2019

Conversation

bstaletic
Copy link
Collaborator

@bstaletic bstaletic commented Dec 10, 2019

Right now pybind leaks a single pointer worth of memory when embedding the interpreter. This is because, with the new python 3.7 thread C API, tstate in the internals struct needs to be freed with a call to PyThread_tss_free().

This PR frees the block inside the finalize_interpreter, where all the cleanup seems to be located.
An alternative solution is adding ~internals() { PyThread_tss_free(tstate); }, which would make it impossible for pybind to forget to free tstate ever again. However this solution has ABI implications. A destructor that isn't implicitly generated or explicitly defaulted makes the type non-trivial, which changes the calling convention rules. I'm fine with moving the code to the destructor if @wjakob thinks that is more appropriate.

Finally, whatever soluton we settle on, this only affects python 3.7+.

Ping @YannickJadoul

EDIT: The valgrind reported stack trace of the leak:

==25127== 8 bytes in 1 blocks are definitely lost in loss record 4 of 2,659
==25127==    at 0x483877F: malloc (vg_replace_malloc.c:309)
==25127==    by 0x506FB13: PyThread_tss_alloc (in /usr/lib/libpython3.8.so.1.0)
==25127==    by 0x49B05BF: pybind11::detail::get_internals() (in /home/bstaletic/work/ycmd/ycm_core.so)
==25127==    by 0x49AB876: YouCompleteMe::FilterAndSortCandidates(pybind11::list, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long) (in /home/bstaletic/work/ycmd/ycm_core.so)
==25127==    by 0x11DDBB: YouCompleteMe::PythonSupportFixture_FilterAndSortUnstoredCandidatesWithCommonPrefix_Benchmark::BenchmarkCase(benchmark::State&) (in /home/bstaletic/work/ycmd/build/ycm/benchmarks/ycm_core_benchmarks)
==25127==    by 0x12EE9D: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) (in /home/bstaletic/work/ycmd/build/ycm/benchmarks/ycm_core_benchmarks)
==25127==    by 0x11C017: main (in /home/bstaletic/work/ycmd/build/ycm/benchmarks/ycm_core_benchmarks)

@bstaletic bstaletic force-pushed the tstate-leak branch 4 times, most recently from 269327b to 5eb948e Compare December 11, 2019 12:47
@bstaletic
Copy link
Collaborator Author

Rebased after travis fixes. Tests are now green.

@wjakob
Copy link
Member

wjakob commented Dec 11, 2019

Hi @bstaletic, indeed I think that it would be more appropriate to put this into the destructor. Consequently, the PYBIND11 ABI version should be updated. (there is a constant indicating this in one of the header files).
Best,
Wenzel

@YannickJadoul
Copy link
Collaborator

more appropriate to put this into the destructor.

We'd need to make sure PyThread_tss_free does not (and will not, in the future) touch the interpreter, since ~internals only gets called after Py_Finalize();.

@bstaletic
Copy link
Collaborator Author

If ~internal() gets called after Py_Finalize(), we can't call PyThread_tss_free(tstate) in the destructor. Thanks for the reminder, @YannickJadoul

@YannickJadoul
Copy link
Collaborator

If ~internal() gets called after Py_Finalize(), we can't call PyThread_tss_free(tstate) in the destructor.

We might get away with it, but it wouldn't be very clean.

Also, another reminder: there's an explicit comment saying internals can't be deleted/destroyed before Py_Finalize because some Python object finalizers (or whatever this __del__ is called) might need the data in the internals struct.

But actually, are we sure none of the objects __del__ methods would depend on tstate still being working/valid? Because if not, the current solution is also incorrect :-(

That being said, it would be nice to have the internals destruction logic somewhere close to the initialization. So maybe just some kind of clean up function in internals.h that we can call from finalize_interpreter?

@bstaletic
Copy link
Collaborator Author

But actually, are we sure none of the objects del methods would depend on tstate still being working/valid? Because if not, the current solution is also incorrect :-(

Is that allowed? If __del__ is allowed to depend on Py_tss_t* pointing to a valid object, then you have no way to correctly order the cleanup function calls.

@bstaletic
Copy link
Collaborator Author

@wjakob I've updated the pull request. It turns out that using the destructor to clean up tstate is perfectly fine. There's a fat comment in the destructor itself, detailing what's going on.
Valgrind does agree that this is fine, no memory errors are reported regardless of which allocator python uses - pymalloc/malloc.

@wjakob
Copy link
Member

wjakob commented Dec 12, 2019

Ok, looks good to me -- thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants