Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of Fix gateway services cleanup where proxy deregistration happens after service deregistration into release/1.15.x #18856

Merged
merged 1 commit into from
Sep 18, 2023

Conversation

andrewstucki
Copy link
Contributor

Backport

This PR is a backport of #18831 to release/1.15.x

The below text is copied from the body of the original PR.


Description

While trying to debug a customer issue, I noticed that when:

  1. A user specifies an ingress gateway with a target of "*"
  2. They boot up the ingress gateway and a service in the target namespace
  3. They've registered the target and its sidecar proxy individually
  4. They then deregister the target followed by the corresponding sidecar
  5. The gateway still attempts to route to the service

This happens because of a bug in the store that was causing wildcard-targeted gateway services to be pruned from tableGatewayServices only when they don't have existing sidecar proxies. In the above case, when the sidecar is deregistered second, the cleanup routine never triggers for the main service (instead a routine runs that attempts to prune the sidecar service name from tableGatewayServices rather than the service it's acting as a sidecar for).

Testing & Reproduction steps

I added a unit test to exercise this case, but I also threw together a gist that manually recreates this. I did so using an enterprise Consul build, but the steps can be adjusted for non-enterprise as well.

  1. Clone the repro gist
  2. build a local consul-enterprise build with make dev-docker and tag it with docker tag consul:local consul-enterprise:local
  3. shove an enterprise license in the environment variable CONSUL_LICENSE
  4. run ./reset.sh in the gist
  5. run kubectl delete deployment static-server
  6. run kubectl exec -it consul-server-0 -- consul catalog services and see that static-server is not longer in the catalog
  7. run kubectl exec -it consul-server-0 -- curl https://localhost:8501/v1/catalog/gateway-services/consul-dc1-ingress-gateway -k | jq

Without the fix you'll see something like (notice static-server still in the payload):

[
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "service-two",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "ServiceKind": "service",
    "CreateIndex": 76,
    "ModifyIndex": 76
  },
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "static-server",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "ServiceKind": "service",
    "CreateIndex": 57,
    "ModifyIndex": 57
  }
]

With the fix you should see something like:

[
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "service-two",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "CreateIndex": 607,
    "ModifyIndex": 607
  }
]

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

087539f

… service deregistration (#18831)

* Fix gateway services cleanup where proxy deregistration happens after service deregistration

* Add test

* Add changelog

* Fix comment
@andrewstucki andrewstucki requested a review from a team as a code owner September 18, 2023 20:29
@andrewstucki andrewstucki added pr/no-changelog PR does not need a corresponding .changelog entry pr/no-backport labels Sep 18, 2023
@andrewstucki andrewstucki merged commit 4d3d650 into release/1.15.x Sep 18, 2023
94 of 96 checks passed
@andrewstucki andrewstucki deleted the 18831-backport1.15 branch September 18, 2023 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-backport pr/no-changelog PR does not need a corresponding .changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants