-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NS records and A records for each server. Constructs ns host name… #3353
Conversation
Example output
|
agent/dns.go
Outdated
Name: d.domain, | ||
Rrtype: dns.TypeNS, | ||
Class: dns.ClassINET, | ||
Ttl: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open question around TTL for this part. Should it be > 0?
Note that the A record below with the ip address mapped to the ns record name does set a TTL
Reached out to person that reported this first to ask for testing it - |
Still need to read the RFC on what this needs to look like but at a first glance I'd use a different name for the NS name since dots usually denote subdomains. Maybe like this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on style and format. Still need to read RFCs on whether the solution is correct.
agent/consul/servers/router.go
Outdated
} | ||
|
||
var ret []string | ||
// Try each manager until we get a server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop comment or explain why not every manager is a server
agent/dns.go
Outdated
@@ -673,6 +682,40 @@ RPC: | |||
} | |||
} | |||
|
|||
// addNSAndARecordsForDomain uses the agent's advertise address to | |||
func (d *DNSServer) addNSAndARecordsForDomain(msg *dns.Msg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/addNSAndARecordsForDomain/addAuthority/g
Also the comment isn't done yet.
agent/dns.go
Outdated
func (d *DNSServer) addNSAndARecordsForDomain(msg *dns.Msg) { | ||
serverAddrs := d.agent.delegate.ServerAddrs() | ||
for _, addr := range serverAddrs { | ||
ipAddrStr := strings.Split(addr, ":")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See separate comment on nsName
format.
agent/dns.go
Outdated
} | ||
msg.Ns = append(msg.Ns, ns) | ||
|
||
//add an A record for the NS record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add space between //
and comment
TL;DR
I still need to test proper delegation from bind/named. So the last patch is my idea of the correct behavior of consul for this use case after reviewing the RFCs and comparing to other DNS responses. Therefore, additional eyes are welcome. Long versionI've reviewed the RFCs and some additional documents to review the implemented solution. The IANA nameserver guidelines [1] are useful in how to configure an authoritative nameserver but not all requirements apply since consul is usually not deployed as an internet facing DNS server. For example, we will usually not be able to provide name servers in different BGP networks and ip addresses are usually in private networks. The main purpose for consul is to handle DNS resolution for registered services on an internal network. To embed consul into a larger DNS setup it should be possible to delegate a zone authoritavely to consul. For this the responses need to have the The main problem is that consul does not provide Therefore, responding to In the current implementation I've limited the number of servers returned for NS queries to three and also randomized them. I am not sure whether this is a good idea for a delegated zone since the parent server needs to know about the slave servers' ip addresses. Therefore, they cannot change at will since the parent server config would have to be updated as well. Setting the TTLs to zero is a good idea for services since we don't want to cache this. Whether this is also a good idea for I've also dropped the The repsonse for the This needs to be tested by us/the community whether it works with zone delegation. References
Consul 0.9.0 behavior:
Latest patch behavior
|
@magiconair This looks close to being right, except I don't think we need the additional section when its queried for the ns record, as long as the record returned resolves correctly in a subsequent query. Example with google:
Note the lack of an additional section above^. But if I query for one of the name servers like below, I do get an iP back.
|
I pushed another commit, that does the following, I think this is pretty close now: Fixes the tests I would suggest that the other configurability changes you suggested (like ttl and postmaster address being configurable) should be addressed in a followup issue, to keep the scope of this small. The SOA changes looked right to me after I read the specs above, so I left them alone. |
The difference is that the google NS records have a TTL > 0. They are also running a real DNS server on the internet with fixed addresses rather static databases. Consul is a different use case and I don't think that you can compare them one-to-one. Adding the A records to the Extra section means that the caller does not have to make another request to resolve the records. Also, if you are providing more than one NS record the caller has to make multiple calls to resolve them. Adding the glue records reduces latency and traffic. The main point is not to add the NS records and the glue records to all responses. I'd leave them in the SOA and NS responses. |
@magiconair I added the glue records back, and expanded on the test. One note about unit testing - I couldn't come up with a way to create a test agent server that has a ipV6 address, I tried converting 127.0.0.1 into ipv6, but net.ParseIP parses it back so the agent sees it as a iPv4 address (https://play.golang.org/p/7F1wflOlQZ). (This was so that we could verify AAAA records in the NS response). |
@preetapan Wouldn't |
agent/dns_test.go
Outdated
t.Fatalf("err: %v", err) | ||
} | ||
|
||
if len(in.Answer) != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using verify.Values
works better in this case than comparing individual fields. I can have a quick look.
in string | ||
invalid bool | ||
}{ | ||
{"Valid Hostname", "testnode", false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to include edge cases like ""
, "."
, "a.b"
, " "
, ...
I usually try to use generic values like a
, ab
, a-b
and so forth to encode a pattern of what I'm testing. This way you don't have to invent names and you can use the pattern to group certain cases.
Feel free to disregard both suggestions if you feel that this is sufficient.
@magiconair I didn't know about ::1, thanks. That does work, and is a good test case because it exposes that our parsing of the address returned by memberlist doesn't handle the `[ ]' brackets added by the String() method of TCPAddr ( https://golang.org/src/net/ipsock.go#L186). I am working on a IPV6 specific test case now. |
…s using the advertise address of the server.
…with "server" rather than "ns"
This patch changes the behavior of the DNS server as follows: * The SOA response contains the SOA record in the Answer section instead of the Authority section. It also contains NS records in the Authority and the corresponding A glue records in the Extra section. In addition, CNAMEs are added to the Extra section to make the MNAME of the SOA record resolvable. AAAA glue records are not yet supported. * The NS response returns up to three random servers from the consul cluster in the Answer section and the glue A records in the Extra section. AAAA glue records are not yet supported.
…d already resolves correctly. Also fixed all the unit tests, and ignored hostnames that don't meet valid dns hostname criteria
…ed same function used in node lookup for adding A/AAAA records in the extra section of the NS response
…to use verify library and other code review feedback
I've rebased the branch and did a force push. Tests with running a I didn't manage to get a bind server configured with forwarding but the following worked:
@preetapan The changes broke the tests and I didn't get around fixing them yet. |
…rd being returned.
I've completed the tests and delegation via I did not manage to setup a
I ran a I've also found out that he Primary Master in the SOA record ( test setup
bind setup on 192.168.33.11
configure bind (disable DNSSEC for this test)
configure master and consul zone
create master zone
reload changes
test bind setup
consul setup
test consul directquery SOA
query NS
query service
test consul via bindquery SOA
query NS
query service
|
Did this patch ever make it into consul? I'm running v1.0.1 here and desperately need this functionality, but the above functionality is definitely NOT in place. |
@danparsons this should be in 1.0.1 for sure - can you provide some more details about what you are expecting that's not there? |
Did a quick test with a local Consul agent, and definitely see NS records:
|
Hi, Below the results of the dig tests I did
|
@rodbellg Fascinating! Thanks for telling me it works on v0.9.1. I'm going to try that version right now and see if I get better results. |
|
@preetapan Perhaps I have a mistake in my configuration that is causing this problem for me? Here's my config: { I get this output: $ dig @consul-a.my.companys.domain consul.my.companys.domain ; <<>> DiG 9.9.7-P3 <<>> @consul-a.my.companys.domain consul.my.companys.domain ;; OPT PSEUDOSECTION: ;; AUTHORITY SECTION: ;; Query time: 50 msec See where it says IN SOA "ns.consul.my.companys.domain"? Where is it getting that value from? None of my consul servers have that hostname. Their names are consul-a.my.companys.domain etc. In addition to the wrong SOA line, Consul will also not respond to queries for the NS records for consul.my.companys.domain: $ host -t ns consul.my.companys.domain consul-a.my.companys.domain consul.my.companys.domain has no NS record This is completely counter to the RFC for how NS records work, and because of this, Consul breaks NS delegation. The way it's supposed to work: (1) DNS server (Route53) for my.companys.domain has 3 NS record for "consul.my.companys.domain". One for each of my 3 consul servers. This part works correctly because it's Route53. (2) Consul ALSO has to serve (authoritatively) NS records for each of the 3 consul servers in the cluster. Above where it says "has no NS record", it should be showing e.g.: consul.my.companys.domain. 300 IN NS consul-a.my.companys.domain. But it doesn't. And because the two servers don't have identical output, all well-written DNS libraries ignore the NS records entirely, and then delegation to Consul fails. What am I doing wrong here? Thanks for reading!!! |
BTW, I tried on 0.9.1 and saw identical behavior. |
The "ns" in that SOA record is coming from a hardcoded default value. It picks that as a prefix to whatever you set your 'domain' to in the config. We haven't made that part configurable yet For your second question about responding to ns records - Consul currently only responds to node, service and prepared queries as per the documentation here. So it won't understand Consul also does respond with NS records for any prefix of what you set "domain" to. So in your example you should be able to get NS records if you looked up `"consul-a.consul.my.companys.domain." See my examples below, I used a local agent again and configured the agents domain to Example with node lookup:
Example with service lookup:
Example with a random prefix "consul-test", Consul does not return anything for this one
|
No matter what I do, I can't get consul to emit a single NS record, ever. Even following your examples. For reference purposes, here's my 'consul members':
Now, some digs: First, I'd like to demonstrate that consul will at least return A records:
Following your first example in your last post produces no NS records:
Why is this happening? Why is your consul able to emit NS records, but mine isn't? |
Furthermore, how does your consul know to emit an NS record with "preetha-work.node.mydc.consul.my.test.domain."? |
I've just upgraded my Consul to 1.0.2 and I'm not receiving NS answers. Yes I'm using the default .consul domain I do see SOA
But not NS
|
Hi @TheVendiniPhil I'm not sure what's different about your setup, other than my version of
|
From one of the Consul Servers (the cluster leader):
Consul is run from systemd
And the consul configuration is:
|
So I went to another server (in our lab) , stopped the running , configured consul service there. Started consul as a standalone -dev instance, zero configuration as per your example.
And I still get the same lack of NS answer.
|
Further investigation shows that Consul is wanting to provide an NS answer, but is unhappy about something somewhere.
What's invalid about that? It's the actual hostname, that hostname IS the nodename as configured in consul, and it's a valid name in DNS. aargh! apparently "node_name" is NOT allowed to be the FQDN, it needs to be the value from "hostname -s" (sometimes the Consul documentation could be a little more explicit) |
@TheVendiniPhil We validate that the node name has to match RFC1123. We internally use You are right that the documentation does not call this out and the agent accepts node names with dots in them. I have created issue #3854 to track this |
…s using the advertise address of the server.