Skip to content

tfexec: Enable graceful (SIGINT-based) cancellation #512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 10, 2025

Conversation

radeksimko
Copy link
Member

@radeksimko radeksimko commented Apr 8, 2025

Closes #389

Changelog draft

  • Context cancellation on Unix systems will now send Terraform process SIGINT instead of killing it (which is otherwise default os/exec behaviour)
    • You can change the default 60s WaitDelay via SetWaitDelay(time.Duration)
  • error type returned from individual commands now implements Unwrap making it possible to pass it into errors.As and access lower-level error such as exec.ExitError

@radeksimko radeksimko force-pushed the radek/sigint-cancellation branch 3 times, most recently from 84d3e2c to 7a5221f Compare April 8, 2025 17:08
@radeksimko radeksimko force-pushed the radek/sigint-cancellation branch from 7a5221f to d72de79 Compare April 8, 2025 17:31
@radeksimko radeksimko marked this pull request as ready for review April 8, 2025 17:52
@radeksimko radeksimko requested review from a team as code owners April 8, 2025 17:52
Copy link
Member

@dbanck dbanck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@radeksimko radeksimko merged commit 49c2a4c into main Apr 10, 2025
107 checks passed
@radeksimko radeksimko deleted the radek/sigint-cancellation branch April 10, 2025 10:03
@wjf3121
Copy link

wjf3121 commented Apr 27, 2025

Hi @radeksimko thanks for your effort to address this issue. However, I think this PR won't work as expected:

writeOutput uses the context spawned from the command context, which means when the command is canceled, writeOutput will in turn be canceled leading to signal: broken pipe. This will cause the Terraform command to immediately exit without waiting for the graceful shutdown.

We found this issue when making our own implementation in #454. Hopefully it can be addressed in the main branch so we can switch to the official version again.

Another difference between this PR and #454 is that we pass Setpgid: true to unix cmd, which currently is only passed in the Linux implementation. I don't recall exactly why we do this but I remember there were issues when we needed to force kill the command when graceful shutdown timed out.

@radeksimko
Copy link
Member Author

@wjf3121 I would be more than happy to address any outstanding issues if you can provide a way of reliably reproducing it, preferably in a form of a test.

That applies to both signal: broken pipe and Setpgid: true.

As far as I can tell, the linked PR #454 does not contain test for either of these issues so it is hard to understand the nature of those problems and confirm that the code actually solves them. Correct me if I'm wrong.

@dastrobu
Copy link

I am a bit late to the party, but we see the broken pipe issue as well in our project. I picked up the suggestion to create a test case for this scenario. You can find it here: #523.

It would be great to get feedback or suggestions on how to fix the code, as the PR currently just contains the test case. Feedback is welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does this library provide a way to gracefully shutdown
4 participants