-
-
Notifications
You must be signed in to change notification settings - Fork 758
Fix subprocess handling when killed on timeout #5220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Kami
merged 22 commits into
StackStorm:master
from
engineer-roman:fix_zombie_spawning_green_shell
Apr 7, 2021
Merged
Changes from 2 commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
b367ef2
Add WIP fix for handling urls with unicode characters.
Kami 4e0bb04
Add WIP fix for the client side and make sure urls with unicode
Kami 20e9b81
Revert "Add WIP fix for the client side and make sure urls with unicode"
Kami cab8b52
Add a fix for CLI which would result in exceptions in various scenarios
Kami e08f9f1
Move functionality for surrogate re-encoding to a uility function for
Kami 2f06857
Add additional unit tests for the utf-8 bug.
Kami 3ef58c9
Add changelog entry.
Kami db8e85e
Add an example rule fixture for rule with unicode name which can be used
Kami f3a92b1
Use a different more unique name to avoid breaking other test with a
Kami 04e58b6
Update API router code to throw a more user-friendly exception in case
Kami 15274ea
Add a test case for it.
Kami ae42830
fix subprocess handling when killed on timeout
engineer-roman fde03cc
Merge branch 'master' into url_path_unicode_fix
Kami c0ed21b
For now, remove test fixture which is causing failure in end to end
Kami b475c64
Merge branch 'master' into fix_zombie_spawning_green_shell
Kami c687259
Merge pull request #5189 from StackStorm/url_path_unicode_fix
Kami d1912e1
Merge branch 'fix_zombie_spawning_green_shell' of https://github.com/…
Kami a4e3c98
Instead of overriding process.returncode and utilizing that attribute to
Kami 708fa44
Update docstring.
Kami 39e95b4
Add integration tests for st2common.util.green.shell.run_command()
Kami 927fe07
Add TODO comment.
Kami 0770969
Add changelog entry.
Kami File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into this tonight.
I started by adding a test case and I was indeed able to reproduce the zombie process test issue with a local command runner like setup where we use a custom
kill_func()
.It seems that adding
process.wait()
after kill fixes the zombie process issue and we can leave that exit code assignment in place - I think that's important since if we remove it process will have a different exit code set which would break our timeout detection logic in some places.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just curious where you encountered this issue?
Was it via local shell command action runner or some other code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And under which Python version you encountered this issue?
Since technically now that only support Python >= 3.6, we could also simplify that function a bit and utilize
timeout
parameter forcommunicate()
method which we couldn't do when we still supported Python 2.7.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kami
That was long-running Python actions, but next morning found zombies and started to investigate root cause of them. Finally I realized those processes just never receive kill signal, checked source of
kill
/communicate
/wait
and here we are :)No,
wait()
won't work in both cases (eventlet and gevent) because this method checksreturncode
againstNone
to proceed, otherwise it will just return existingreturncode
. Also it should not cause any mess with timeout detection logic becuse it will be set to-9
right afterwait()
.Python 3.6. I'll be able to look into using
communicate()
here tonight.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
communicate()
is actually not a blocker, just a potential future simplification :)And thanks for answering those questions.
I think the change which was merged should cover all those scenarios and I believe we now also have integration tests for most of those situations (shell=True as is the case for local runner, shell=False for python action runner, etc. Only thing we are missing is tests for read_stdout_func + read_stdout_func which verify threads which call those functions are correctly stopped / killed on timeout.).
Can you please confirm the change which has been merged indeed fixes the issue (I believe it should since it's based on your changes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see :)
Just tried to reproduce the issue and everything seem to be fine:
Thank you for helping with my PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks for confirming and submitting this bug fix.