-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpc shutdown and child process handling #88
Conversation
0769ef6
to
aa9c0bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests pass and the changes look good to me.
The tests didn't fail BEFORE this patch though which makes me feel weird... is there a way we can exercise that failure to verify this change? I'm okay merging before that just to get the fix in, but would be really great to get a failing test before that is fixed after.
All the sleeps also feel kind of weird.
Yes, let me get a failing test for the before case. I don't like the sleeps either, but they were just inserted until I had time to refactor the tests (which I do now that the next TF alpha is out). I'll update the tests in this PR before merging. |
26e1831
to
fa01c2f
Compare
Rebased on the latest master. This is closer now, but I've still managed to induce some odd ordering during shutdown, so there will be at least one more commit incoming. |
The Serve method was waiting on a channel that was never closes. The Serve method is what should be closing the channel to notify when the server is done. The stdrr read loop only checks for EOF, so if there's another type of error, it ends up in a busy loop.
The reattach proess poller runs on a 1s ticker, which means it can possibly take almost 2 seconds to get a response.
This uses proper os.File file descriptors for stdin and stdout rather than io.Pipe, removing extra copy goroutines from each, and automatically closes the fds when necessary. Don't scan with slices from stdout, as they are handled concurrently and the backing array may be re-used.
Simplify the goroutine management by removing multiple channels and explicitly waiting for _all_ client goroutines to return on Kill(). Make sure Context cancellation funcs are called in all paths. Process termination it detected using the existing cancellation context. This makes detection of early termination and termination during Kill use the same mechanism, rather than the latter relying to the side-effect of the logging channel being closed. Add a flag to mark when a process was forcefully killed for tests. (this addition makes the pre-refactor grpc tests fail, to validate some of the PR changes)
Unlike the netRPC server, the gRPC server cannot terminate when the client closes its connection. Since we want to use the server as a plugin with control over its lifecycle, there needs to be another channel of communication to notify the server when to shutdown. This adds a small grpc service, currently with only a Shutdown method, which calls Stop on the GRPCServer. This will allow the server to shutdown normally without the client timing out and calling Kill on the process.
Adds a record of how the protobuf code was built, and keeps it inline with the controller code.
Remove the protobuf files from the plugin package, to avoid polluting the public API.
ffafcca
to
cd0df79
Compare
All set now. Cleaned up the temporary commits in the middle, and all tests all pass under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this add the failing test w/o this PR?
Looks a lot cleaner though.
Sorry, the failing test change was squashed into the rest of the test fixes, since it was pretty ugly to rebase in before the rest of the commits. Any of the grpc tests that call |
This fixes a bug related to shutting down of GRPC plugin interfaces (more info: hashicorp/go-plugin#88) This does not yet fix all test cases for subprocess leaking, but is a useful independant change.
This fixes a bug related to shutting down of GRPC plugin interfaces (more info: hashicorp/go-plugin#88) This does not yet fix all test cases for subprocess leaking, but is a useful independant change.
This PR's main goal is to fix the shutdown process of grpc plugins, which would always timeout.
In the process we also clean up the subprocess output handling, which speeds up the start/stop of the child process. This broke some tests that were dependent on the shutdown taking more time, but those were patched with a slight delay and can be fixed with a more robust solution at a later time.
The grpc server couldn't exit at all, because the Serve method was waiting on a channel that was never closed. The Serve method is what should be closing the channel to notify when the server is done.
The close signal is implemented by adding a small grpc service (
GRPCController
) with aShutdown
method to indicate that the process should exit. There is still something blocking aGracefulStop
attempt, but the implementation can be extended to try a graceful stop with a timeout once that is resolved.The GRPCController and GRPCBroker protobuf definitions were moved into an internal package, which prevents the generated code from polluting the public API.