[bug] vault + consul +dnsmasq = unresolvable ping #3710

Open
Justin-DynamicD opened this issue Nov 23, 2017 · 5 comments
Labels
theme/api Relating to the HTTP API interface theme/consul-vault Relating to Consul & Vault interactions type/bug Feature does not function as expected

@Justin-DynamicD

vault 0.9.0
consul 1.0.0

This behavior shows up when emulating the "c1m" ("container 1 million") challenge configuration. I'm posting it under Consul because the Vault team feels it's a Consul problem (I'm not convinced, as the issue seems unique to Vault registration, but see issue #3604).

Setting the Stage
Using Ubuntu 16.04 servers, use Packer to deploy servers with the scripts from c1m (or take existing boxes and just run the various .sh scripts). This results in not only Consul being installed locally, but also dnsmasq bound to 127.0.0.1 and forwarding the consul domain to Consul's DNS port (/consul/127.0.0.1#8600).

Once done, introduce an HA Vault solution in the same Datacenter (not the same server). This will, of course, result in the "vault" service being registered in Consul.

Demonstrating the Issue

The easiest way to demonstrate this issue is to break things down into steps, so bear with me:

  1. ping randomservice.service.consul <-- success!
  2. ping active.vault.service.consul <-- fails!
  3. ping vault.service.consul <-- fails!
  4. service stop dnsmasq
  5. ping randomservice.service.consul <-- success!
  6. ping active.vault.service.consul <-- success!
  7. ping vault.service.consul <-- fails!
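
The steps above all go through the system resolver, which is the path ping uses (getaddrinfo), and on these boxes that presumably means dnsmasq. As a rough sketch, the same lookup can be reproduced without ping. The helper name `resolve_like_ping` is hypothetical and not part of the c1m scripts:

```python
import socket


def resolve_like_ping(name):
    """Resolve a name through the system resolver, the path ping uses.

    Returns a sorted list of unique IPv4 addresses, or an empty list
    when the resolver reports a failure.
    """
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_like_ping("active.vault.service.consul")` with dnsmasq running and again with it stopped should show whether the failure sits in the resolver path rather than in ping itself.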

So the oddity here is that the "active.vault" ping fails when dnsmasq is running but succeeds when it is stopped. It's also worth mentioning that nslookup and dig both work perfectly fine, and that when you dig vault.service.consul, you get CNAMEs instead of A records:

; <<>> DiG 9.10.3-P4-Ubuntu <<>> vault.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13482
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;vault.service.consul.       IN      A

;; ANSWER SECTION:
vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-02[redacted].
vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-01[redacted].
vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-03[redacted].
lv-pd-sdsc-02[redacted]. 1200 IN A   10.10.32.10

;; Query time: 0 msec
;; SERVER: 10.10.12.10#53(10.10.12.10)
;; WHEN: Wed Nov 22 21:06:08 UTC 2017
;; MSG SIZE  rcvd: 172

I'm asking in Consul support because it looks like the culprit might be that Vault is somehow registering CNAMEs instead of A records, which is technically "bad" (it doesn't follow the RFC). I suspect this is why dnsmasq refuses to resolve "active.vault", as it's a sub-record.
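
To make the RFC concern concrete: RFC 2181 (section 10.1) allows only one CNAME for any one alias name, while the answer section above carries three for vault.service.consul. The following is a minimal sketch that flags this in dig-style answer lines. The function names are hypothetical, and the sample lines are copied from the dig output above with the hostnames still redacted:

```python
from collections import Counter


def cname_counts(answer_lines):
    """Count CNAME records per owner name in dig-style answer lines.

    Each line is expected to look like "<owner> <ttl> IN <type> <data>".
    """
    counts = Counter()
    for line in answer_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[2] == "IN" and fields[3] == "CNAME":
            counts[fields[0]] += 1
    return counts


def owners_with_multiple_cnames(answer_lines):
    """Return owner names that carry more than one CNAME record."""
    counts = cname_counts(answer_lines)
    return sorted(owner for owner, n in counts.items() if n > 1)


# Answer section from the dig output in this issue (hostnames redacted there).
ANSWER = [
    "vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-02[redacted].",
    "vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-01[redacted].",
    "vault.service.consul. 0      IN      CNAME   lv-pd-sdsc-03[redacted].",
    "lv-pd-sdsc-02[redacted]. 1200 IN A   10.10.32.10",
]
```

Running `owners_with_multiple_cnames(ANSWER)` on these lines reports vault.service.consul. as the only owner name with more than one CNAME.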

Again, service resolution for anything OTHER THAN vault will work, unless I change the service name of the vault cluster ... then that new name won't work (basically, the vault service fails to ping regardless of name chosen).

Before closing and claiming it's a Vault problem (old API calls or some such), please be aware that they have already closed this and called it a Consul problem. There's some finger-pointing going on here ...

@Justin-DynamicD
Author

Some googling seems to reveal I'm not the only person to notice this behavior:

https://groups.google.com/forum/m/#!topic/consul-tool/IUp5LvUrGDA

@Justin-DynamicD
Author

Update:

Worked with the Vault team and made a discovery: if I set the Vault variables so that it advertises only its IP address to Consul, Consul then appropriately uses A records and all problems are solved. It seems that Consul will indiscriminately return CNAMEs for DNS names, even when that violates the DNS RFC rule that a name should never have more than a single record when that record is a CNAME.
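
The behavior described here can be modeled as a simple rule: a service address that parses as an IP yields an address record, and anything else is treated as a hostname and returned as a CNAME. The sketch below only models that observed behavior. The function name is hypothetical, the AAAA case is an assumption beyond what this thread shows, and none of it is Consul's actual code:

```python
import ipaddress


def expected_record_type(service_address):
    """Guess the DNS record type returned for a registered service address."""
    try:
        ip = ipaddress.ip_address(service_address)
    except ValueError:
        # Not an IP literal, so it is treated as a hostname.
        return "CNAME"
    return "A" if ip.version == 4 else "AAAA"
```

Under this model, registering Vault with 10.10.32.10 gives an A record, while registering it with a hostname gives one CNAME per registered instance.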

It seems there should be a behavior update from Consul, especially as forwarders have been introduced with the clear intent of using Consul as a full-purpose DNS server.

@slackpad slackpad added type/bug Feature does not function as expected theme/api Relating to the HTTP API interface labels Jan 5, 2018
@slackpad slackpad added this to the Unplanned milestone Jan 5, 2018
@slackpad
Contributor

slackpad commented Jan 5, 2018

This does look like we need to limit the number of CNAME responses.
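
As an illustration of what limiting CNAME responses could look like, here is a minimal sketch that keeps at most one CNAME per owner name. The function name and the example hostnames in the usage note are hypothetical, and this is not how Consul implements or plans to implement it:

```python
def limit_cnames(records):
    """Keep at most one CNAME per owner name; pass other records through.

    `records` is a list of (owner, rtype, data) tuples in answer order.
    """
    seen_owners = set()
    kept = []
    for owner, rtype, data in records:
        if rtype == "CNAME":
            if owner in seen_owners:
                continue  # drop every CNAME after the first for this owner
            seen_owners.add(owner)
        kept.append((owner, rtype, data))
    return kept
```

Given three CNAMEs for vault.service.consul. pointing at node-01.example., node-02.example. and node-03.example., only the first CNAME survives, along with any A records in the answer.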

@codyja

codyja commented May 11, 2018

We hit this bug as well.

@lokesp11

lokesp11 commented Apr 3, 2020

Hello Team,

For us, ping and curl are not working at all for any *.service.consul name, though nslookup and dig work fine. Please suggest how I can fix it.
#7587

@jsosulska jsosulska added the theme/consul-vault Relating to Consul & Vault interactions label Apr 20, 2020