domsolutions
Contributor

Motivation

Under load, when pipeline-gw shuts down, we would occasionally get 503s because Envoy would still proxy requests while the HTTP/gRPC servers were gracefully shutting down and no longer accepting new connections.

Summary of changes

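  • Sleep for 1 second after closing the connection to the scheduler. Closing that connection triggers an Envoy cluster update, and the pause gives Envoy time to remove the route before the HTTP/gRPC servers stop accepting connections.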

Checklist

  • Added/updated unit tests
  • Added/updated documentation
  • Checked for typos in variable names, comments, etc.
  • Added licences for new files

Testing

@domsolutions domsolutions requested a review from lc525 as a code owner September 23, 2025 21:19
Member

@lc525 lc525 left a comment

lgtm

)

const (
waitForRouteToBeRemovedFromEnvoy = 1 * time.Second
Member

This might not be enough with multiple Envoy replicas, but it's a good starting point. Did you by any chance already run with multiple Envoy replicas?

Contributor Author

Let me test that scenario; I've only tested with 1.

@domsolutions domsolutions force-pushed the fix/pipeline-gw-shutdown-envoy branch from 836fc03 to 7accca8 Compare September 25, 2025 08:24