Make the inference server exit gracefully in case of errors instead of hanging #33
Conversation
This will be lifechanging for MoLeR users everywhere. Thank you!
Hello, I am currently using the latest version of molecule-generation (0.4.1) and am encountering an issue similar to one that was previously discussed and resolved. Here is the error message I'm receiving:
My current environment includes Python 3.8.10. Despite the solution provided here, I am still encountering an error. Any assistance in solving or understanding the problem would be greatly appreciated. Thank you.
@cankobanz The PR you posted in made it so that MoLeR no longer hangs on errors, and instead exits gracefully. Your logs indeed show it has done so, thus it's handling the error as expected - the only question is, why is it encountering the error 🙂 From your logs, I assume this happens during encoding (not decoding) of a potentially long sequence of SMILES. Is that correct? If so, do you see the error always or only at scale (which could suggest it's being triggered by outliers)? Are all of the SMILES you're encoding valid (i.e. can they be parsed by
@kmaziarz, thank you for the quick and detailed response. Now, I understand the fix that was applied here. I can easily identify and remove the invalid SMILES molecules from my dataset.
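For readers hitting the same issue: one way to pre-filter invalid SMILES before sending them to the encoder is to check each string with RDKit. This is a minimal sketch, assuming RDKit is available in the environment; the helper name `filter_valid_smiles` is illustrative, not part of the `molecule-generation` API:

```python
from rdkit import Chem


def filter_valid_smiles(smiles_list):
    """Return only the SMILES strings that RDKit can parse.

    Invalid entries would otherwise crash the encoding worker,
    so dropping them up front avoids triggering the error at all.
    """
    valid = []
    for smiles in smiles_list:
        # Chem.MolFromSmiles returns None for unparseable SMILES.
        if Chem.MolFromSmiles(smiles) is not None:
            valid.append(smiles)
    return valid
```

The same check can also be used to log which inputs were dropped, which helps confirm whether outliers in a large dataset are the culprit.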
Currently, the inference server waits for the results from child processes indefinitely, which leads to hanging if one of these processes dies. There are at least two ways to reliably trigger this:
(1) Pass in an invalid SMILES for encoding (tracked in #15)
(2) Initialize the inference server after having initialized `tensorflow` (which poisons the context of the forked child processes)

We could handle (1) better, and this is further described in #15, but I didn't find a way to detect (2) other than to let the child process die and then try to recover from that.
In this PR, I make the experience a bit smoother by making the parent process exit gracefully in case a child process dies.
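The general pattern behind this fix can be sketched as follows: instead of blocking on the result queue forever, the parent waits with a timeout and checks whether the child is still alive between attempts. This is a simplified illustration of the technique, not the actual `molecule-generation` server code; the worker and its failure mode are made up for the demo, and the `fork` start method assumes a POSIX system:

```python
import multiprocessing
import queue


def _worker(task_queue, result_queue):
    # Hypothetical worker: crashes on bad input instead of returning a result,
    # simulating what happens when e.g. an invalid SMILES is encoded.
    while True:
        item = task_queue.get()
        if item is None:
            return
        if item < 0:
            raise ValueError("bad input")  # child dies here
        result_queue.put(item * 2)


def run_with_crash_detection(items):
    # "fork" matches the forked-child setup described above (POSIX only).
    ctx = multiprocessing.get_context("fork")
    task_q, result_q = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=_worker, args=(task_q, result_q))
    proc.start()
    results = []
    try:
        for item in items:
            task_q.put(item)
            while True:
                try:
                    # Bounded wait instead of blocking indefinitely.
                    results.append(result_q.get(timeout=1.0))
                    break
                except queue.Empty:
                    # No result yet: did the child die, or is it just slow?
                    if not proc.is_alive():
                        raise RuntimeError(
                            f"child process died (exit code {proc.exitcode})"
                        )
    finally:
        task_q.put(None)  # ask the worker to shut down, if it is still running
        proc.join(timeout=5.0)
        if proc.is_alive():
            proc.terminate()
    return results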