Dan/bump pq #2858
Conversation
To explain the approach here: the automatic reconnect and retry logic in lib/pq is somewhat dangerous as-is (lib/pq#939), and attempts to bring back the previous retrying behavior petered out because of safety concerns (lib/pq#871). So the best path forward seems to be to assume the retrying behavior is not coming back in lib/pq, or at least not soon.

I also think it's fine for Automate to return one 500 after a database disconnect/reconnect. If you agree, then the sensible thing to do is to make our tests a little more forgiving of the occasional 500 that we would expect to see from the database resets.
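For context on why the retry isn't coming back: the driver can't tell whether a failed statement is safe to re-run. A conservative application-level pattern is to retry only work the caller knows is idempotent. The sketch below is illustrative Go only, not code from this PR; `retryIdempotent` is a hypothetical helper.

```go
package pgutil

import (
	"context"
	"database/sql"
)

// retryIdempotent runs fn and retries exactly once on failure, on the theory
// that the most common failure right after a Postgres restart is drawing a
// stale connection from the pool. Callers must only pass idempotent work;
// that judgment is exactly what the driver can't make for us, which is why
// blanket retrying was dropped from lib/pq.
func retryIdempotent(ctx context.Context, db *sql.DB, fn func(context.Context, *sql.DB) error) error {
	if err := fn(ctx, db); err != nil {
		// One retry only: a second failure more likely means the database is
		// actually down rather than that we hit a dead pooled connection.
		return fn(ctx, db)
	}
	return nil
}
```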
Force-pushed from 7922677 to 0e7ecc8
integration/tests/cluster.sh (outdated)
# Inspec tests are less tolerant of transient 500s; we expect a few of
# those as bad postgres connections get purged from the various services.
# Restart everything before running the tests
docker exec -t "$_frontend1_container_name" "$cli_bin" restart-services
I'm not sure I understand what's happening here. We've just done a deploy. Why does restarting everything help?
The frontends come up with broken Postgres connections, so the InSpec tests get exactly two 500s; see for example https://buildkite.com/chef/chef-automate-master-verify-private/builds/9454#d349a780-69be-4a0f-ac17-10f6ae8f2475. Retrying 500s in the InSpec tests would be the better solution, but it's a lot harder.
For anyone who might stumble on this later, the code comment above is the correct answer: an HAProxy somewhere has the idle timeout set to 5 minutes, which kills the connections on frontend1 during the more than five minutes it takes to deploy frontend2.
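If someone does want to pursue the retry approach later: the real checks are InSpec/shell, but here is a rough Go sketch of the idea, tolerating a bounded number of 500s while the stale connections get purged. The helper name, the error limit, and the one-second backoff are all made up for illustration.

```go
package integration

import (
	"fmt"
	"net/http"
	"time"
)

// getTolerating500s polls url until it gets a non-500 response, allowing up
// to maxServerErrors 500s first. After a deploy we expect at most a couple of
// 500s while each service replaces its dead Postgres connections.
func getTolerating500s(url string, maxServerErrors int) (*http.Response, error) {
	serverErrors := 0
	for {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusInternalServerError {
			return resp, nil
		}
		resp.Body.Close()
		serverErrors++
		if serverErrors > maxServerErrors {
			return nil, fmt.Errorf("got %d 500s in a row from %s", serverErrors, url)
		}
		time.Sleep(1 * time.Second)
	}
}
```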
Force-pushed from d8bf7e7 to 0d35487
@@ -5,3 +5,5 @@ status_port = 10146

[timeouts]
connect = 5
# 12 hours
idle = 43200
Should we wire this up to the config so it's easy to change?
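If it does get wired up, a minimal sketch of the shape it could take is below. This is illustrative Go only: the struct, the field names, and the use of BurntSushi/toml are assumptions, not how Automate's config is actually plumbed.

```go
package config

import (
	"time"

	"github.com/BurntSushi/toml"
)

// Timeouts mirrors the hard-coded [timeouts] section above: connect and idle
// are in seconds (43200 seconds = 12 hours).
type Timeouts struct {
	Connect int `toml:"connect"`
	Idle    int `toml:"idle"`
}

type serviceConfig struct {
	Timeouts Timeouts `toml:"timeouts"`
}

// LoadTimeouts reads the service config and returns the two values as
// durations so callers can change them without touching the template.
func LoadTimeouts(path string) (connect, idle time.Duration, err error) {
	var cfg serviceConfig
	if _, err = toml.DecodeFile(path, &cfg); err != nil {
		return 0, 0, err
	}
	return time.Duration(cfg.Timeouts.Connect) * time.Second,
		time.Duration(cfg.Timeouts.Idle) * time.Second, nil
}
```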
Force-pushed from 2f1161c to c3eeb38
Full set of changes: lib/pq@83612a5...3427c32

Highlights include:
- A fix for a potential deadlock related to commit and rollback
- A QuoteLiteral function that can replace our custom implementation

Signed-off-by: Steven Danna <steve@chef.io>
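On the QuoteLiteral point, usage would look roughly like the sketch below. The query and function here are invented for illustration, and parameterized queries remain preferable wherever they work.

```go
package store

import (
	"fmt"

	"github.com/lib/pq"
)

// buildTagQuery shows the new helpers: pq.QuoteIdentifier for identifiers and
// pq.QuoteLiteral for string literals, instead of a hand-rolled escaper.
func buildTagQuery(table, tag string) string {
	return fmt.Sprintf("SELECT * FROM %s WHERE tag = %s",
		pq.QuoteIdentifier(table), pq.QuoteLiteral(tag))
}
```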
Signed-off-by: Daniel DeLeo <dan@chef.io>
Force-pushed from f39f569 to 7f00294
Force-pushed from ed041a1 to df2e8ff
Force-pushed from df2e8ff to 730b0dd
🔩 Description: What code changed, and why?
Giving it another go to see what happens. I probably won't actively work on it much if it turns out not to be easy, so feel free to take over if you happen upon this PR and need it done.
⛓️ Related Resources
👍 Definition of Done
👟 How to Build and Test the Change
✅ Checklist
📷 Screenshots, if applicable