[BUG] Failed to send msg SaltReqTimeoutError('Message timed out') after upgrade 3003.5 -> 3005.1 #63582
Comments
This is a duplicate of the already mentioned regressions. Good to have more data though.
Out of curiosity, does the TCP transport have this problem? In my/our experience, it's much more reliable than zmq.
@OrangeDog could you please reference/link the duplicate(s)?
@gyro666 you already did
Confirmed on 3005.1; it happens randomly.
Same for me; it seems linked to the latency between minion and master.
Confirmed
We were having the same issues in our relatively small (~250 minions) Salt environment. For us, swapping from the zeromq transport to the tcp transport solved the problem. The issues we saw while still on the zeromq transport were exactly the same as described in the first issue. Over time we got more and more timeouts and minions "not responding", and when we restarted the master everything worked fine again for a while. Now, with the tcp transport, everything keeps running without those issues.
Confirmed in SUSE Manager. It happens about weekly. Changing to the TCP transport is not an option, because the proxy does not support it.
Newly installed minion can't connect ( |
Description
The minion state run intermittently hits the above timeout.
The run time is then delayed by 60s or more.
IMPORTANT: the behavior starts after 4-7 days of running and goes away after a master restart.
Usually, after a timeout, state.apply continues.
This can cause huge delays with complex highstates; some can run longer than an hour (instead of minutes).
No obvious errors are reported in the logs on either the master or the minion side.
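Because the delay only appears after several days and leaves nothing in the logs, it may help to time a cheap command at regular intervals and record when runs start to slow down. The following is a minimal sketch, not part of the original report: the helper names `time_command`, `is_delayed` and `report`, and the 30s threshold (chosen between the ~7s normal run and the 60s delayed run mentioned in this issue) are assumptions for illustration.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=300):
    """Run argv once; return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within `timeout`.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def is_delayed(elapsed, threshold_s=30.0):
    """Flag a run as delayed; threshold_s is an assumed cut-off, not a Salt setting."""
    return elapsed > threshold_s


def report(argv, elapsed, code):
    """Write one timestamped line per run to stderr for later inspection."""
    flag = "DELAYED" if is_delayed(elapsed) else "ok"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} {flag} {elapsed:.1f}s rc={code} {' '.join(argv)}", file=sys.stderr)
```

On the master this could be called from cron or a loop with something like `report(cmd, *time_command(cmd))` where `cmd = ["salt", "some-minion", "test.ping"]` (the minion ID is a placeholder). The first `DELAYED` line would then give a rough onset time to compare against master uptime.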
Setup
1 master, 58 minions (debian10)
onedir installation (but it was the same with the old style)
issue is observed within same subnet (no firewalls)
there are no iptables rules on the salt-master or the minions
~# sed '/^#/d;/^$/d' /etc/salt/master
cat salt/tests/event_test123.sls
Steps to Reproduce the behavior
run (anything else, such as test.ping, also times out)
at the same time, the event log on the master shows:
I can't include full traces because I don't have the same version in the staging environment; however, I was able to capture stack traces on a few occasions.
strace parser link: https://gitlab.com/gitlab-com/support/toolbox/strace-parser
strace done with:
Here is a clear-cut example of a successful run (no delays, ~7s):
Here is a 60s delayed run:
Here is a bit of insight into the most delayed PID in that run:
Expected behavior
Salt behaves reliably and consistently, without timeouts or delays.
Screenshots
(n/a)
Versions Report
salt --versions-report
No differences between master/minion versions (3005.1).
Additional context