SRV responses with absolute domain names are treated as relative domain names in v3.5.3 #13948
Comments
This bug exists in master, and was backported to 3.5.3 (and apparently copies a similar bug that already existed on the server side; see line 72 in e814f6f).
|
/cc @ahrtr @spzala @serathius |
Thanks @liggitt for raising this issue, but it isn't correct to me. Firstly, the addrs isn't the cname, instead it's a slice of SRV records. The cname in this case is one of the following two values,
Secondly, no matter whether there is a dot at the end of Let's work with your example. removing the trailing dot at the end of each target, and adding the port 2379,
Afterwards, run command below,
The response is,
Thirdly, removing the trailing dot is correct, because users might also need the second DNS lookup. Fourthly, are you using " |
Sorry to disagree, but this is a bug. Using local dns search paths to resolve an absolute domain name returned in a SRV record is not correct. All the queries in the |
that's because SRV responses return |
@ahrtr please re-add the |
I'm not a DNS expert, but in my opinion:
If the CNAMEs are explicitly excluded, I doubt the intention of standard author was 'allowing second DNS lookup' for local resolution. |
I agree with you if this statement is always true. |
I'm not sure what you mean by that, but if a hostname is returned in the SRV record instead of an IP (which is valid to do), a DNS lookup is required to resolve that hostname to a usable IP. The point of disagreement is whether local DNS search paths should be used in that DNS lookup. When resolving an absolute hostname returned in the SRV record, local search paths should definitely not be used. |
Ah, I think I confused the issue by calling the What I should have called them were "absolute domain names", meaning they explicitly should not have search suffixes appended. from rfc1035:
We should not be stripping a trailing dot from the domain name, turning an absolute domain name into a relative domain name. I edited the title/description/comments to clarify the issue is in the treatment of absolute domain names returned in SRV records. |
Whether it is always true or not, if a particular SRV response includes a dot suffix, that indicates that hostname is absolute, so we have to keep the dot suffix to do the DNS resolution of that hostname correctly. |
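To illustrate the point about search paths, here is a minimal Go sketch of how a stub resolver typically expands a name before querying. It is only an approximation of resolver behavior (loosely modeled on the search/ndots rules in resolv.conf), not etcd's or Go's actual code; candidateNames and its parameters are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames is a hypothetical helper that approximates how a stub
// resolver builds the list of names to query from a search list and ndots.
// An absolute name (trailing dot) is queried as-is and never expanded.
func candidateNames(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	hasNdots := strings.Count(name, ".") >= ndots
	fqdn := name + "."
	var names []string
	// With enough dots, the name itself is tried before the search list.
	if hasNdots {
		names = append(names, fqdn)
	}
	for _, suffix := range search {
		names = append(names, fqdn+strings.TrimSuffix(suffix, ".")+".")
	}
	// With too few dots, the name itself is tried last.
	if !hasNdots {
		names = append(names, fqdn)
	}
	return names
}

func main() {
	search := []string{"example.org", "corp.example.org"}
	// Absolute target, as returned in the SRV record.
	fmt.Println(candidateNames("etcd1.example.com.", search, 5))
	// Same target after the trailing dot has been trimmed.
	fmt.Println(candidateNames("etcd1.example.com", search, 5))
}
```

With ndots:5 and the two search domains from the reproduction, the absolute name yields a single query, while the trimmed name is first tried under example.org and corp.example.org before the real name is queried.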
Technically speaking, I agree with you. But it's a little counterintuitive. Just as I raised previously, usually we access a service using URL something like Usually we define an entry something below into /etc/hosts, so it doesn't matter whether we trim the trailing dot or not in this case from technical perspective. But from users' perspective, a URL without the trailing dot makes more sense? I guess this is the reason why previously the trailing dot is trimmed?
Thanks for raising this interesting discussion, which I am totally open to. |
I agree with you. By "local resolution" I mean taking into consideration local I approved the PR for rollback in 3.5.3, and I think we should release 3.5.4. @liggitt: Has this problem manifested practically (e.g. broken k8s tests), or did you catch it by reading the changelog? |
I just caught it reviewing the code changes. Kubernetes doesn't inherently use the SRV lookup approach, but particular installations certainly could, and the fact that kubernetes sets ndots:5 inside containers could make anyone using SRV lookup inside a kubernetes container more susceptible to this issue. |
It took me a couple of hours to read through the source code of net.LookupSRV. It seems that the returned srv.Target always has a trailing dot, no matter whether it is an absolute name or not; please see message.go;l=2021 and message.go;l=2046. Based on this point and my previous comment issuecomment-1100756920, should we still trim the trailing dot? Or does it mean that the nameserver (to which the golang lib sends the request, see dnsclient_unix.go#L257) will always return the absolute domain name? |
I am not a DNS expert either. If there is a trailing dot, then But from another perspective, the SRV.Target what the In summary, there are two DNS lookups. The first time is to translate one of the following two targets
into a slice of etcd endpoints something like below,
The second time is to translate If the SRV.Target returned by the DNS server always has a trailing dot, even for a domain that isn't absolute, then I think we should trim the trailing dot. Otherwise, golang will not try to append any search suffix. |
We should not.
My understanding is that targets returned in SRV records should always be absolute. It doesn't make sense that a DNS server would return relative names that would be resolved using search paths local to the client which could be inconsistent. |
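As a sketch of what "do not trim" looks like in practice, the Go snippet below builds client endpoints from SRV records while leaving each Target exactly as the DNS server returned it. This is an illustration only, not the code from the etcd repository; endpointsFromSRV and exampleRecords are hypothetical names, and the records are hard-coded stand-ins for what net.LookupSRV would return.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// exampleRecords returns hard-coded SRV records that stand in for the
// result of a real net.LookupSRV call (assumption for illustration).
func exampleRecords() []*net.SRV {
	return []*net.SRV{
		{Target: "etcd1.example.com.", Port: 2379},
		{Target: "etcd2.example.com.", Port: 2379},
	}
}

// endpointsFromSRV is a hypothetical helper that turns SRV records into
// endpoint URLs. The trailing dot on Target is deliberately preserved so
// the follow-up address lookup treats the host as an absolute name.
func endpointsFromSRV(scheme string, srvs []*net.SRV) []string {
	eps := make([]string, 0, len(srvs))
	for _, srv := range srvs {
		hostPort := net.JoinHostPort(srv.Target, strconv.Itoa(int(srv.Port)))
		eps = append(eps, scheme+"://"+hostPort)
	}
	return eps
}

func main() {
	for _, ep := range endpointsFromSRV("https", exampleRecords()) {
		fmt.Println(ep)
	}
}
```

The resulting endpoints keep the form https://etcd1.example.com.:2379, so the resolver has no reason to consult local search paths when it resolves the host.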
It seems that the DNS server just returns what it's configured/told, so we might really need to trim the trailing dot. Please see my experiment (based on yours) below, Step 1: start the dnsmasq. Note that the target for
Step 2: run etcdctl in etcd 3.5.2.
The output is,
The response of dnsmasq is as below. Obviously it only tried to resolve etcd1, and did not append any search suffix.
Step 3: run etcdctl in etcd 3.5.3.
The output is, (Note: I added a "k1/v1" into the etcd server beforehand).
The response of dnsmasq is as below. Since etcd 3.5.3 trims the trailing dot, it tried to append the search suffix ".vmware.com". Accordingly, it could resolve the address successfully.
|
I'm not a DNS expert, but I would not expect DNS to reply with relative hostnames that depend on local search paths to resolve. But whatever we do with relative hostnames coming back from DNS (if those are even possible), I would never expect to re-relativize an absolute hostname that came back from DNS by trimming the trailing /cc @thockin @bowei |
I am AFK, so I can't cite RFC, but it is unfathomable to me that an SRV lookup would intentionally respond with anything other than an absolute name. The fact that Go's implementation agrees gives me even more confidence. |
Maybe also worth pointing out that "search paths" are not actually part of the DNS protocol but are part of the resolver libraries. No DNS server implementation would sanely depend on resolver behavior for correct response handling. The fact that you got a DNS server to return crap (a bare label cannot be a DNS subdomain, I am 99% sure) doesn't mean that's what was intended. Beyond that case, anything could theoretically be a TLD. The trailing period is actually part of the name, we just forgot about it because it is ugly. |
Looking through the RFCs, as far as I can tell, this should be a properly fully-qualified name, not one that should be subject to name aliasing -- which is a concept outside of the DNS protocol itself. |
What happened?
#13712 incorrectly trimmed the trailing "." from SRV responses when constructing client addresses. #13714 then backported the bug to the 3.5 stream, where it was released in v3.5.3. This turns absolute domain names into relative domain names and means local search paths are appended when resolving the addresses.
What did you expect to happen?
Absolute hostname SRV records do not get local DNS search paths appended again
How can we reproduce it (as minimally and precisely as possible)?
Start a local DNS server that knows how to return SRV records for etcd.example.com:
Dockerfile:
docker build -t dnsmasq .
docker run -it --rm --name dnsmasq dnsmasq \
  --user=root \
  --keep-in-foreground \
  --bind-dynamic \
  --conf-file=/dev/null \
  --log-queries \
  --log-facility=- \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd1.example.com. \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd2.example.com. \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd3.example.com.
Capture the IP address of the DNS server and verify it is responding correctly:
See the DNS server observe the query:
And the response:
Run a v3.5.2 client pointing at the custom DNS server with custom DNS search paths, trying to query SRV records:
docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=example.org --dns-search=corp.example.org \
  quay.io/coreos/etcd:v3.5.2 etcdctl -d etcd.example.com get /
And observe the resulting DNS queries are as expected:
Now run the same command with a v3.5.3 client (which includes a backport of #13712)
docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=example.org --dns-search=corp.example.org \
  quay.io/coreos/etcd:v3.5.3 etcdctl -d etcd.example.com get /
Anything else we need to know?
No response
Etcd version (please run commands below)
3.5.3
Etcd configuration (command line flags or environment variables)
No response
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response
Relevant log output
No response