connect proxy: x-forwarded-client-cert contains duplicate certificates in chain #15704

Closed
t-davies opened this issue Dec 7, 2022 · 8 comments
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/consul-vault Relating to Consul & Vault interactions type/bug Feature does not function as expected

@t-davies
Contributor

t-davies commented Dec 7, 2022

Overview of the Issue

TLDR
The x-forwarded-client-cert value suddenly contains multiple duplicate certificates.

--

We have a Consul cluster that has been running for several months without issue. The servers were upgraded from 1.12.3 to 1.13.3 approximately 1 month ago and have been running without issue since. Consul agents running on Nomad client nodes were upgraded afterwards over a period of ~2 weeks and, again, have been running without issue since.

On 6th December at approx. 06:09, numerous applications in the cluster whose traffic goes via sidecar proxies started reporting issues with large headers.

We observed that x-forwarded-client-cert was indeed very large. Luckily we don't depend on this header, so we applied the following config entry with PUT /v1/config:

{
  "Kind": "mesh",
  "HTTP": {
    "SanitizeXForwardedClientCert": true
  }
}

This stops Envoy from sending the x-forwarded-client-cert header to our applications and resolves the large headers issues.
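For anyone applying the same mitigation from a script, the write above might be sketched as follows. This is a minimal illustration and not the reporter's actual tooling: the helper name `build_config_put` and the agent address are assumptions, and the token header is included only because ACLs are enabled in the reported cluster.

```python
import json
import urllib.request

# Mesh config entry copied from the issue text above.
MESH_ENTRY = {
    "Kind": "mesh",
    "HTTP": {"SanitizeXForwardedClientCert": True},
}


def build_config_put(consul_addr, entry, token=None):
    """Build (but do not send) a PUT /v1/config request for a config entry.

    `build_config_put` is a hypothetical helper, not part of any Consul SDK.
    """
    req = urllib.request.Request(
        url=consul_addr.rstrip("/") + "/v1/config",
        data=json.dumps(entry).encode("utf-8"),
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    if token:
        # ACL token passed via the X-Consul-Token header.
        req.add_header("X-Consul-Token", token)
    return req


# Assumed local agent address; adjust for your environment.
request = build_config_put("http://127.0.0.1:8500", MESH_ENTRY)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before touching a live cluster.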

Upon further investigation it seems that since 6th December at 06:09, the Chain value within x-forwarded-client-cert contains multiple duplicate issuer certs.

Prior to 6th December, 06:09
Chain contains 2 certs - just listing the subjects here not the full certs.

1. uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul/ns/default/dc/abc/svc/xxx-traefik-xxx
2. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul

After 6th December, 06:09
Chain contains 10 certs, the previously present 2 plus 8 duplicates - just listing the subjects here not the full certs.

1. uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul/ns/default/dc/abc/svc/xxx-traefik-xxx
2. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
3. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
4. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
5. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
6. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
7. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
8. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
9. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul
10. dns:pri-50m8xxx.vault.ca.2dcbc96a.consul,uri:spiffe://2dcbc96a-xxxx-27fb-5286-3855779de5cf.consul

This, obviously, makes the x-forwarded-client-cert header extremely large. Everything is working now, since we are no longer passing this header to services, but I'm assuming that this is not intended behaviour.
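A check like the one described above (counting repeated certificates in the Chain value) could be sketched roughly as below. It assumes the Chain field carries URL-encoded PEM, as Envoy documents for this header, and that the value contains no raw quote, semicolon, or comma characters. The function names are hypothetical, and only the first Chain field in the header is inspected.

```python
import re
from collections import Counter
from urllib.parse import unquote

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def chain_certs(xfcc_value):
    """Return the PEM certificates in the first Chain field of an XFCC header."""
    # Assumption: the value may or may not be wrapped in double quotes.
    match = re.search(r'Chain="?([^";,]+)', xfcc_value)
    if match is None:
        return []
    pem_bundle = unquote(match.group(1))
    # Strip all whitespace so identical certificates compare equal.
    return ["".join(block.split()) for block in PEM_BLOCK.findall(pem_bundle)]


def duplicate_counts(xfcc_value):
    """Map each certificate that appears more than once to its count."""
    counts = Counter(chain_certs(xfcc_value))
    return {cert: n for cert, n in counts.items() if n > 1}
```

With the chain reported after 6th December (one leaf plus nine copies of the same issuer), `duplicate_counts` would return a single entry with a count of 9.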

Reproduction Steps

See the issue description for the recent upgrade path. Nothing specific was done to trigger this; it just started occurring.

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 65
        services = 71
build:
        prerelease =
        revision = b29e5894
        version = 1.13.3
        version_metadata =
consul:
        acl = enabled
        known_servers = 5
        server = false
runtime:
        arch = amd64
        cpu_count = 8
        goroutines = 1526
        max_procs = 8
        os = linux
        version = go1.18.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 19
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 6
        member_time = 212418
        members = 25
        query_queue = 0
        query_time = 2
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = b29e5894
        version = 1.13.3
        version_metadata =
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = xxx.37:8300
        server = true
raft:
        applied_index = 20644423
        commit_index = 20644423
        fsm_pending = 0
        last_contact = 35.448649ms
        last_log_index = 20644424
        last_log_term = 32
        last_snapshot_index = 20633021
        last_snapshot_term = 32
        latest_configuration = [{Suffrage:Voter ID:4db92480-37dd-577e-e3a7-35b50877c382 Address:xxx.30:8300} {Suffrage:Voter ID:17353d45-31bd-ca37-48a3-951f89af2847 Address:xxx.5:8300} {Suffrage:Voter ID:a1ab302c-caa5-8c58-2fdc-33ca05bfba4e Address:xxx.8:8300} {Suffrage:Voter ID:433f0869-3f4e-a0aa-7b01-1a5949abd0d1 Address:xxx.37:8300} {Suffrage:Voter ID:0abdc788-eb41-6c87-8730-e65189dafd96 Address:xxx.27:8300}]
        latest_configuration_index = 0
        num_peers = 4
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 32
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 731
        max_procs = 2
        os = linux
        version = go1.18.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 19
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 4
        member_time = 212423
        members = 23
        query_queue = 0
        query_time = 2
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 4
        member_time = 18585
        members = 9
        query_queue = 0
        query_time = 2

Operating system and Environment details

Deployed to AWS on Amazon Linux 2, single region - 5 Consul servers.
Agents deployed on Nomad cluster nodes, sidecar proxies running as Nomad tasks.
envoy version: edd69583372955fdfa0b8ca3820dd7312c094e46/1.23.1/Clean/RELEASE/BoringSSL

Log Fragments

Nothing stands out as being obviously wrong in the logs.

@jkirschner-hashicorp
Contributor

jkirschner-hashicorp commented Dec 7, 2022

Hi @t-davies,

It looks like you're using the Vault CA provider for Consul. Which Vault version is in use?

If using Vault 1.11+, review this knowledge base article ASAP, including the recommended workaround. You can also follow the related issue on GitHub: #15217.

@jkirschner-hashicorp jkirschner-hashicorp added the theme/consul-vault Relating to Consul & Vault interactions label Dec 7, 2022
@t-davies
Contributor Author

t-davies commented Dec 7, 2022

Hey @jkirschner-hashicorp, it is indeed 1.11 - thanks for the info and the quick reply.
Indeed, GET /v1/connect/ca/roots shows many, many intermediate certs. I will take a look at the KB article and apply the workaround.
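To quantify "many" from the roots response, a small helper along these lines may be useful. It is only a sketch: it assumes the response has already been decoded from JSON and uses the `Roots`, `ID`, and `IntermediateCerts` field names from the Consul API documentation; the function name is hypothetical.

```python
def intermediate_counts(roots_response):
    """Map each root ID to (total, unique) intermediate certificate counts.

    `roots_response` is the decoded JSON body of GET /v1/connect/ca/roots.
    """
    result = {}
    for root in roots_response.get("Roots", []):
        # IntermediateCerts may be missing or null for some roots.
        intermediates = root.get("IntermediateCerts") or []
        result[root["ID"]] = (len(intermediates), len(set(intermediates)))
    return result
```

A large gap between the total and unique counts would point at repeated copies of the same intermediate rather than many distinct ones.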

@jkirschner-hashicorp jkirschner-hashicorp added type/bug Feature does not function as expected theme/certificates Related to creating, distributing, and rotating certificates in Consul labels Dec 7, 2022
@t-davies
Contributor Author

t-davies commented Dec 7, 2022

@jkirschner-hashicorp looks like this is resolved in Consul 1.13.4? Would it be worth us just upgrading?

@jkirschner-hashicorp
Contributor

jkirschner-hashicorp commented Dec 7, 2022

It's resolved in Consul 1.13.4 for primary datacenters, but not secondary datacenters.

Do you have a multi-datacenter Consul deployment (where the datacenters are connected with WAN federation, rather than just being disconnected, independent datacenters)?

More details in this comment: #15217 (comment)

@jkirschner-hashicorp
Contributor

I clarified this in the main changelog, but not in the list on the releases page. This is a good reminder for me to update the information on the releases page as well.

@t-davies
Contributor Author

t-davies commented Dec 7, 2022

Thanks for the info. We're single DC, so sounds like that should be ok for us.

@t-davies
Contributor Author

t-davies commented Dec 7, 2022

Just confirming that applying the workaround from the KB article did fix this issue: the chain in the x-forwarded-client-cert header now only contains a single, renewed intermediate certificate. We have also now upgraded to 1.13.4 and everything appears stable.

The date/time that we experienced this (06/12/2022 06:09:34) aligns with ~50% of the original intermediate CA lifespan (not after: 06/06/2023 10:22:22) too.
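The ~50% figure can be sanity-checked with a few lines of arithmetic. The thread gives only the not-after time and the time of the incident; the not-before time below is an assumption (a one-year validity period), so treat the result as illustrative rather than as a confirmed value.

```python
from datetime import datetime

# Timestamps in the thread use day/month/year order.
FMT = "%d/%m/%Y %H:%M:%S"


def lifespan_fraction(not_before, not_after, observed):
    """Fraction of the certificate's validity period elapsed at `observed`."""
    total = (not_after - not_before).total_seconds()
    return (observed - not_before).total_seconds() / total


not_after = datetime.strptime("06/06/2023 10:22:22", FMT)
observed = datetime.strptime("06/12/2022 06:09:34", FMT)
# ASSUMPTION: one-year validity; the issue date is not stated in the thread.
not_before = not_after.replace(year=not_after.year - 1)

fraction = lifespan_fraction(not_before, not_after, observed)
print(f"{fraction:.1%} of the assumed lifespan had elapsed")
```

Under that assumption the incident falls just past the halfway point of the validity period, which is consistent with the observation above.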

@jkirschner-hashicorp
Contributor

Excellent - I'm glad to hear that upgrading to Consul 1.13.4 resolved the issue for you, and I appreciate the additional confirmation of the ~50% of the original intermediate CA lifespan timing.

The more general issue (#15217) will remain open until we've released the fix for secondary datacenters, but I'll close this particular issue for now given your current status. Feel free to re-open if needed!
