Dan/bump pq#2858
Conversation
|
To explain the approach here: It seems the automatic reconnect and retry logic in lib/pq is kinda dangerous as-is: lib/pq#939 and attempts to bring back the previous retrying behavior petered out because of safety concerns: lib/pq#871 So it seems the best path forward is to assume the retrying behavior is not coming back in lib/pq, or at least not soon. I also think it's fine for Automate to return one 500 after a database disconnect/reconnnect. If you agree with that then the sensible thing to do is to make our tests a little more forgiving of the occasional 500 that we would expect to see from the database resets. |
7922677 to
0e7ecc8
Compare
There was a problem hiding this comment.
i'm not sure i understand whats happening here. We've just done a deploy. Why does restarting everything help
There was a problem hiding this comment.
The frontends come up with broken Postgres connections, so the Inspec tests get exactly 2 500s, see for example https://buildkite.com/chef/chef-automate-master-verify-private/builds/9454#d349a780-69be-4a0f-ac17-10f6ae8f2475. Retrying 500s in the Inspec would be the better solution but it's a lot harder.
There was a problem hiding this comment.
For anyone who might stumble on this later, the code comment above is the correct answer--an HAProxy somewhere has the idle timeout set to 5 minutes, which kills the connections on frontend1 during the more-than-five-minutes when frontend2 is being deployed.
d8bf7e7 to
0d35487
Compare
There was a problem hiding this comment.
should we wire this up to the config so its easy to change
2f1161c to
c3eeb38
Compare
Full set of changes: lib/pq@83612a5...3427c32 Highlights include: - A fix for a potential deadlock when related to commit and rollback - QuoteLiteral function that can replace our custom implementation Signed-off-by: Steven Danna <steve@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
Signed-off-by: Daniel DeLeo <dan@chef.io>
f39f569 to
7f00294
Compare
Signed-off-by: Daniel DeLeo <dan@chef.io>
ed041a1 to
df2e8ff
Compare
Signed-off-by: Daniel DeLeo <dan@chef.io>
df2e8ff to
730b0dd
Compare
🔩 Description: What code changed, and why?
Giving it another go to see what happens. Probably won't actively work on it a lot if it's not easy so feel free to take over if you happen upon this PR and need it done.
⛓️ Related Resources
👍 Definition of Done
👟 How to Build and Test the Change
✅ Checklist
📷 Screenshots, if applicable