-
-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set a default write timeout #1314
Conversation
Sets a default write timeout of 295 seconds to match the default read timeout. Signed-off-by: Alex Hunt <alex.hunt@materialize.com>
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #1314 +/- ##
=======================================
- Coverage 72.4% 72.3% -0.0%
=======================================
Files 75 75
Lines 6343 6344 +1
=======================================
- Hits 4587 4586 -1
- Misses 1756 1758 +2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey. Thanks for this. I think setting a default here makes sense.
Initially, thought that 295s was a weird number, and was going to suggest somewhere in the range 60-100s originally, but a quick search over github/kubernetes reveals general ballpark numbers in the same order. Probably makes sense to start high as here.
perhaps of interest, someone did report one weird lockup on the discord - have not been able to get to the bottom of it yet |
Thanks for the quick review and the link. I'll have to take a look again later, the discord isn't loading for me. |
There is an issue the lockup in #1316. If it is similar to what you are encountering then it would be good to know. |
Thank you for the link. It is hard to tell if what we saw is that same issue. It certainly could be. We do not restart our controllers within a single process, but do recreate our controller pods when their code changes (up to several times a day). We do run several controllers in a single process. On startup there can be hundreds of reconciliations. |
Thanks again. Merging this. There might be a better timeout number long term, but don't have a good justification for a smaller number (feel free to give one in the future). |
Sets a default write timeout of 295 seconds to match the default read timeout.
Motivation
We recently had an application hang during some kube operations with no obvious cause other than a lack of timeouts. When investigating, we realize that there were timeouts for connect and read by default, but write seems to have been omitted. We do not have strong evidence that this lack of timeout was the cause of our hang, but at this moment it seems like our strongest lead.
While we can set the timeout ourselves after creating the Config, it would be better to prevent this foot gun for others by setting it by default when creating the Config.
Solution
Choosing 295 seconds is a bit arbitrary, and should arguably be much smaller, but having a timeout at all seems obviously better than just hanging forever.
The possibility of setting the timeout was introduced in #971, but that PR did not include the write timeout simply because the Go client didn't have one. To me, that another client neglected a timeout on a network operation does not seem like a sufficient reason to do the same.