Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix python segfaults #470

Merged
merged 3 commits into from
Nov 9, 2022
Merged

Conversation

hellkite500
Copy link
Member

@hellkite500 hellkite500 commented Nov 8, 2022

Under certain conditions (compilers, time of day, user mood...), ngen with python support active would hit a segmentation fault at the end of main, during the destruction/cleanup of the program. This was first observed in #370 and has recently been seen even outside of the parallel runtime. As noted in that issue, the static singleton used across multiple compilation units is likely to blame, as the the destruction order of resources is not guaranteed.

This PR addresses this problem while maintaining a static global interpreter util by using a weak_ptr as the static resource, and then constructing shared_ptr objects as the singleton instances. The weak_ptr ref counting prevents the destruction the of the underlying util class, which contains the py::scoped_interpreter that defines the lifecycle of the python interpreter.

Note that if an instance is called but not referenced

utils::ngenPy::InterpreterUtil::getInstance();

then the interpreter is started and finalized, as there is no reference to count. In this case, the next call to getInstance() creates and initializes a NEW interpreter.

In contrast, if the reference is held

auto a = utils::ngenPy::InterpreterUtil::getInstance();

then the next call to getInstance() returns a shared pointer to the existing interpreter.

I also added a reference to the Routing_Py_Adapter class to ensure that it always had a valid interpreter available, instead of relying strictly on the assumption that someone created and was holding a reference at the right scope for it to operate.

Changes

  • Change the singleton implementation of InterpreterUtil to use a ref-counting mechanism.

Testing

  1. Tested with and without routing enabled on a sample ngen run.

Notes

  • This was tested on multiple versions of pybind, namely 2.6.1 and 2.10.1. This was branched off the update to pybind 2.10.1 #468 branch, and may want to be merged sequentially with that PR, but I don't expect any merge conflicts regardless.

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • MacOS

Closes #370

@mattw-nws
Copy link
Contributor

Approving based on discussions in standup this morning, seems everyone's in agreement that this makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel runs with routing crash
2 participants