DNSSEC not working when stubby is run as a systemd service. Works fine when stubby is run manually #106

Closed
eccgecko opened this issue Apr 28, 2018 · 50 comments

Comments

@eccgecko

eccgecko commented Apr 28, 2018

I have a strange issue that when I run the stubby daemon manually, DNSSEC seems to be working ok. For example the command dig @127.0.2.2 -p 5353 www.dnssec-failed.org returns the following:

; <<>> DiG 9.10.3-P4-Raspbian <<>> @127.0.2.2 -p 5353 +dnssec www.dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 24774
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.dnssec-failed.org. IN A

;; Query time: 129 msec
;; SERVER: 127.0.2.2#5353(127.0.2.2)
;; WHEN: Sat Apr 28 12:22:10 CEST 2018
;; MSG SIZE rcvd: 39

So www.dnssec-failed.org doesn't resolve. However, once I quit the manually started daemon and start my systemd stubby.service, which starts up OK, I now get a reply for www.dnssec-failed.org:

; <<>> DiG 9.10.3-P4-Raspbian <<>> @127.0.2.2 -p 5353 www.dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16532
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1536
; OPT=12: 00 00 00 ... 00 00 00 (all-zero padding bytes, elided here)

;; QUESTION SECTION:
;www.dnssec-failed.org. IN A

;; ANSWER SECTION:
www.dnssec-failed.org. 2325 IN A 68.87.109.242
www.dnssec-failed.org. 2325 IN A 69.252.193.191
www.dnssec-failed.org. 2325 IN RRSIG A 5 3 7200 20180430172414 20180423141914 44973 dnssec-failed.org. w7tdNJ/YrlNO30y2GuPSJ31388GnzrPrHgJw4vQijlsL5LgkTTg5hzJw Ox5Ra2xSjlLdR7JeA4ZXvKF9rzws+8ys+EFJyps0+KejonIELKuLIqEw b9QS4ITc3mii4hFqVOwMtxj7txv6lKngknqbxiFr2nCpyJX0SOo6UXye YsI=

;; Query time: 167 msec
;; SERVER: 127.0.2.2#5353(127.0.2.2)
;; WHEN: Sat Apr 28 12:29:53 CEST 2018
;; MSG SIZE rcvd: 531

This is strange, as when I run the daemon manually I am using the exact same options as the stubby.service file uses, so I can't work out why it would behave like this.

I have zero-configuration DNSSEC enabled in the stubby.yml config file
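
The two dig runs above differ in the status field of the header line: SERVFAIL when validation is enforced, NOERROR when it is not. A minimal shell sketch for scripting that comparison follows. The function name dig_status is hypothetical (it is not part of dig or stubby), and it only parses the header format shown above.

```shell
# dig_status: read dig output on stdin and print the status field of the
# ";; ->>HEADER<<-" line (for example SERVFAIL or NOERROR).
# The function name is illustrative; it is not part of dig or stubby.
dig_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}
```

With stubby listening as in this report, `dig @127.0.2.2 -p 5353 www.dnssec-failed.org | dig_status` would be expected to print SERVFAIL while DNSSEC validation is active.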

@hanvinke

What does stubby -i say?

@eccgecko
Author

eccgecko commented Apr 28, 2018

stubby -i output is as follows (Apologies, I tried formatting the following in the code, but it was not working very well as none of the line breaks worked and it looked very hard to read):

[20:54:35.422324] STUBBY: Read config from file /etc/stubby.yml
{
"all_context":
{
"add_warning_for_bad_dns": GETDNS_EXTENSION_FALSE,
"appdata_dir": <bindata of "/root/.getdns/">,
"append_name": GETDNS_APPEND_NAME_TO_SINGLE_LABEL_FIRST,
"dns_transport_list":
[
GETDNS_TRANSPORT_TLS
],
"dnssec_allowed_skew": 0,
"dnssec_return_all_statuses": GETDNS_EXTENSION_FALSE,
"dnssec_return_full_validation_chain": GETDNS_EXTENSION_FALSE,
"dnssec_return_only_secure": GETDNS_EXTENSION_FALSE,
"dnssec_return_status": GETDNS_EXTENSION_TRUE,
"dnssec_return_validation_chain": GETDNS_EXTENSION_FALSE,
"edns_client_subnet_private": 1,
"edns_cookies": GETDNS_EXTENSION_FALSE,
"edns_do_bit": 0,
"edns_extended_rcode": 0,
"edns_version": 0,
"follow_redirects": GETDNS_REDIRECTS_FOLLOW,
"hosts": <bindata of "/etc/hosts">,
"idle_timeout": 10000,
"limit_outstanding_queries": 0,
"max_backoff_value": 1000,
"namespaces":
[
GETDNS_NAMESPACE_LOCALNAMES,
GETDNS_NAMESPACE_DNS
],
"resolution_type": GETDNS_RESOLUTION_STUB,
"resolvconf": <bindata of "/etc/resolv.conf">,
"return_both_v4_and_v6": GETDNS_EXTENSION_FALSE,
"return_call_reporting": GETDNS_EXTENSION_FALSE,
"round_robin_upstreams": 1,
"specify_class": 1,
"suffix": [],
"timeout": 5000,
"tls_authentication": GETDNS_AUTHENTICATION_REQUIRED,
"tls_backoff_time": 3600,
"tls_cipher_list": <bindata of "TLS13-AES-256-GCM-SHA384:TLS13-A"...>,
"tls_connection_retries": 2,
"tls_query_padding_blocksize": 256,
"trust_anchors_url": <bindata of "http://data.iana.org/root-anchor"...>,
"trust_anchors_verify_CA": <bindata of 0x2d2d2d2d2d424547494e204345525449...>,
"trust_anchors_verify_email": <bindata of "dnssec@iana.org">,
"upstream_recursive_servers":
[
{
"address_data": <bindata for 1.1.1.1>,
"address_type": <bindata of "IPv4">,
"tls_auth_name": <bindata of "cloudflare-dns.com">,
"tls_pubkey_pinset":
[
{
"digest": <bindata of "sha256">,
"value": <bindata of yioEpqeR4WtDwE9YxNVnCEkTxIjx6EEIwFSQW+lJsbc=>
}
]
},
{
"address_data": <bindata for 1.0.0.1>,
"address_type": <bindata of "IPv4">,
"tls_auth_name": <bindata of "cloudflare-dns.com">,
"tls_pubkey_pinset":
[
{
"digest": <bindata of "sha256">,
"value": <bindata of yioEpqeR4WtDwE9YxNVnCEkTxIjx6EEIwFSQW+lJsbc=>
}
]
}
]
},
"api_version_number": 132058112,
"api_version_string": <bindata of "December 2015">,
"compilation_comment": <bindata of "getdns 1.4.1 configured on 2018-"...>,
"default_hosts_location": <bindata of "/etc/hosts">,
"default_resolvconf_location": <bindata of "/etc/resolv.conf">,
"default_trust_anchor_location": <bindata of "/opt/stubby/etc/unbound/getdns-r"...>,
"implementation_string": <bindata of "https://getdnsapi.net">,
"listen_addresses":
[
{
"address_data": <bindata for 127.0.2.2>,
"address_type": <bindata of "IPv4">,
"port": 5353
}
],
"openssl_build_version_number": 269484143,
"openssl_built_on": <bindata of "built on: reproducible build, da"...>,
"openssl_cflags": <bindata of "compiler: gcc -DDSO_DLFCN -DHAVE"...>,
"openssl_dir": <bindata of "OPENSSLDIR: "/usr/lib/ssl"">,
"openssl_engines_dir": <bindata of "ENGINESDIR: "/usr/lib/arm-linux-"...>,
"openssl_platform": <bindata of "platform: debian-armhf">,
"openssl_version_number": 269484143,
"openssl_version_string": <bindata of "OpenSSL 1.1.0f 25 May 2017">,
"resolution_type": GETDNS_RESOLUTION_STUB,
"version_number": 17039616,
"version_string": <bindata of "1.4.1">
}
Result: Config file syntax is valid.

I’m guessing it has something to do with the DNSSEC settings towards the top? My question is why it makes a difference whether the daemon is started manually or via systemd, when both use the same config file.

@eccgecko eccgecko reopened this Apr 28, 2018
@wtoorop
Contributor

wtoorop commented Apr 30, 2018

@eccgecko Zero configuration DNSSEC needs a writeable appdata_dir directory. When none is configured, it defaults to the home directory of the UID running the stubby process. I noticed that on my Arch Linux system this is the / directory, which is not writeable for the stubby user:

[root@bunker ~]# echo ~stubby
/

I managed to fix it by including a writeable appdata_dir in /etc/stubby/stubby.yml:

[root@bunker ~]# grep appdata_dir /etc/stubby/stubby.yml
appdata_dir: "/run/stubby"

/run/stubby was already writeable for user stubby on my system:

[root@bunker ~]# ls -ld /run/stubby/
drwxrwx--- 2 root stubby 100 Apr 30 17:23 /run/stubby/

After doing a query, I noticed the AD bit in the result, and also that Zero configuration DNSSEC succeeded since it downloaded the root trust-anchor and root DNSKEY rrset to track in the appdata_dir:

[root@bunker ~]# ls -l /run/stubby/
total 12
-rw------- 1 stubby stubby 4095 Apr 30 17:23 root-anchors.p7s
-rw------- 1 stubby stubby  651 Apr 30 17:23 root-anchors.xml
-rw------- 1 stubby stubby 1659 Apr 30 17:23 root.key

@ArchangeGabriel I think it would be good to have that appdata_dir setting in the stubby.yml file by default...
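
To check the point above before starting the service, a small shell sketch can test whether a candidate appdata_dir exists and is writable by whoever runs the check. The helper name dir_is_writable is hypothetical and not part of stubby or getdns. Note that root passes the write test on almost any directory, so the result is only meaningful when the check runs as the service user.

```shell
# dir_is_writable DIR: succeed only if DIR is an existing directory that the
# calling user may write to and enter. Hypothetical helper for illustration.
dir_is_writable() {
    if [ ! -d "$1" ]; then
        echo "missing or not a directory: $1" >&2
        return 1
    fi
    if [ ! -w "$1" ] || [ ! -x "$1" ]; then
        echo "not writable by $(id -un): $1" >&2
        return 1
    fi
    echo "writable: $1"
}
```

One way to run it as the service user, assuming the function has been saved to a file such as /tmp/check.sh (a made-up path that the stubby user can read): `sudo -u stubby sh -c '. /tmp/check.sh; dir_is_writable /run/stubby'`.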

@ArchangeGabriel
Contributor

We are supposed to have DNSSEC working OOTB already on Arch (I build getdns with --with-trust-anchor=/etc/trusted-key.key and the file is supposed to exist since it's provided by a getdns dependency on Arch)… And I currently provide the “upstream” stubby.yml; if that could stay the case, that would be nice. So I would suggest answering #62 and then adding appdata_dir: "/run/stubby" to the default config.

But @eccgecko is not on Arch anyway.

@wtoorop
Contributor

wtoorop commented Apr 30, 2018

@ArchangeGabriel acknowledged. Alternatively you could give the stubby user a writeable home directory...

@wtoorop
Contributor

wtoorop commented Apr 30, 2018

@ArchangeGabriel Oh yes... as a side note, having Zero configuration DNSSEC working would be more robust on systems that haven't been updated when the KSK rolls over.

@hanvinke

hanvinke commented Apr 30, 2018

After adding appdata_dir: "/run/stubby" to /etc/stubby/stubby.yml, running sudo systemctl daemon-reload and restarting the service, I still get this output:

echo ~stubby
/home/han/.getdns

although stubby -i shows "appdata_dir": <bindata of "/run/stubby/">

Any clue?

@ArchangeGabriel
Contributor

That’s normal, you did not change the stubby user home folder (that would require editing /etc/passwd).

@hanvinke

Thanks, learning every day here.. 🙂

@eccgecko
Author

eccgecko commented May 2, 2018

@wtoorop Thanks. That makes sense. @ArchangeGabriel is correct: I am not running Arch but Raspbian, a flavor of Debian. Like you, though, I have the systemd service set to execute as user stubby, whereas when I run the daemon manually I start it with sudo, so I suppose it is then able to download the necessary trust anchor files. Having said that, I am trying to add appdata_dir: "/run/stubby" to my stubby.yml config file, but I am obviously doing something wrong, because when I try to start stubby after adding this line, I get a generic error:

"Generic error" Could not parse config file "/etc/stubby.yml": Generic error

Sorry if I am being dense here - what exactly is the correct method for inserting this line?

@hanvinke

hanvinke commented May 2, 2018

Just adding something like this at the top [at line 24-25 for example] should work:

# Include a writeable appdata_dir for Zero configuration DNSSEC.
appdata_dir: "/run/stubby/"

Watch out for spacing.
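
The spacing advice matters because appdata_dir is a top-level key, and in the stock stubby.yml the top-level keys start in the first column. A rough shell check for that, assuming a config with one key per line (the helper name has_top_level_key is made up for this sketch):

```shell
# has_top_level_key FILE KEY: succeed if FILE has a line that starts in
# column one with "KEY:". An indented key does not count.
has_top_level_key() {
    grep -q "^$2:" "$1"
}
```

After a successful edit, `stubby -C /etc/stubby.yml -i` (as used elsewhere in this thread) should end with "Result: Config file syntax is valid."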

@hanvinke

hanvinke commented May 3, 2018

@wtoorop @ArchangeGabriel
Thank you both for finding out why Zero configuration DNSSEC didn't work for me before; I never saw any files appear.

After adding appdata_dir: "/var/run/stubby" it works fine now. I use a slightly different stubby.service configuration:

[Unit]
Description=stubby DNS resolver

[Service]
ExecStart=/usr/bin/stubby
DynamicUser=yes
RuntimeDirectory=stubby
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

Since systemd 235 this DynamicUser configuration is possible. And it works very well. It is kind of magic to see the folder appear out of nothing creating root.key, root-anchors.p7s and root-anchors.xml.

@wtoorop
Contributor

wtoorop commented May 3, 2018

@hanvinke Glad to hear and glad to be of help :)
@eccgecko We currently also have a bug with string configuration options. I will do a release candidate tomorrow, so you can try that one. You could provide your stubby.yml for us to check, just to be sure it is not something in the syntax..

@eccgecko
Author

eccgecko commented May 3, 2018

@wtoorop I managed to get it working in the end by adding it around line 24-25 like you said (I was adding it on at the end before).

However, unfortunately it hasn't managed to fix my issues with DNSSEC zero-config. In fact, I don't know how exactly, but it even seems to have made it slightly worse, as now dig @127.0.2.2 -p 5353 www.dnssec-failed.org is getting a reply even when I run the daemon manually with sudo, which it wasn't replying to before.

I added both appdata_dir: "/var/run/stubby" and appdata_dir: "/run/stubby" (not at the same time; one at a time when the first didn't work) to my stubby.yml config file. Neither does the trick. Looking in the run/stubby folder, I don't see any trust anchor files being downloaded either, so it's definitely not doing the same thing as @hanvinke's config seems to be achieving :(

@wtoorop
Contributor

wtoorop commented May 3, 2018

@eccgecko Oh that's a pity. Could you do a sudo -u stubby stubby -i and copy paste the output maybe?

@eccgecko
Author

eccgecko commented May 3, 2018

Sure. Again, apologies that I can't seem to get the formatting right.
sudo -u stubby /opt/stubby/bin/stubby -C /etc/stubby.yml -i output is as follows:

[12:51:16.164408] STUBBY: Read config from file /etc/stubby.yml
{
"all_context":
{
"add_warning_for_bad_dns": GETDNS_EXTENSION_FALSE,
"appdata_dir": <bindata of "var/run/stubby">,
"append_name": GETDNS_APPEND_NAME_TO_SINGLE_LABEL_FIRST,
"dns_transport_list":
[
GETDNS_TRANSPORT_TLS
],
"dnssec_allowed_skew": 0,
"dnssec_return_all_statuses": GETDNS_EXTENSION_FALSE,
"dnssec_return_full_validation_chain": GETDNS_EXTENSION_FALSE,
"dnssec_return_only_secure": GETDNS_EXTENSION_FALSE,
"dnssec_return_status": GETDNS_EXTENSION_TRUE,
"dnssec_return_validation_chain": GETDNS_EXTENSION_FALSE,
"edns_client_subnet_private": 1,
"edns_cookies": GETDNS_EXTENSION_FALSE,
"edns_do_bit": 0,
"edns_extended_rcode": 0,
"edns_version": 0,
"follow_redirects": GETDNS_REDIRECTS_FOLLOW,
"hosts": <bindata of "/etc/hosts">,
"idle_timeout": 10000,
"limit_outstanding_queries": 0,
"max_backoff_value": 1000,
"namespaces":
[
GETDNS_NAMESPACE_LOCALNAMES,
GETDNS_NAMESPACE_DNS
],
"resolution_type": GETDNS_RESOLUTION_STUB,
"resolvconf": <bindata of "/etc/resolv.conf">,
"return_both_v4_and_v6": GETDNS_EXTENSION_FALSE,
"return_call_reporting": GETDNS_EXTENSION_FALSE,
"round_robin_upstreams": 1,
"specify_class": 1,
"suffix": [],
"timeout": 5000,
"tls_authentication": GETDNS_AUTHENTICATION_REQUIRED,
"tls_backoff_time": 3600,
"tls_cipher_list": <bindata of "TLS13-AES-256-GCM-SHA384:TLS13-A"...>,
"tls_connection_retries": 2,
"tls_query_padding_blocksize": 256,
"trust_anchors_url": <bindata of "http://data.iana.org/root-anchor"...>,
"trust_anchors_verify_CA": <bindata of 0x2d2d2d2d2d424547494e204345525449...>,
"trust_anchors_verify_email": <bindata of "dnssec@iana.org">,
"upstream_recursive_servers":
[
{
"address_data": <bindata for 1.1.1.1>,
"address_type": <bindata of "IPv4">,
"tls_auth_name": <bindata of "cloudflare-dns.com">,
"tls_pubkey_pinset":
[
{
"digest": <bindata of "sha256">,
"value": <bindata of yioEpqeR4WtDwE9YxNVnCEkTxIjx6EEIwFSQW+lJsbc=>
}
]
},
{
"address_data": <bindata for 1.0.0.1>,
"address_type": <bindata of "IPv4">,
"tls_auth_name": <bindata of "cloudflare-dns.com">,
"tls_pubkey_pinset":
[
{
"digest": <bindata of "sha256">,
"value": <bindata of yioEpqeR4WtDwE9YxNVnCEkTxIjx6EEIwFSQW+lJsbc=>
}
]
}
]
},
"api_version_number": 132058112,
"api_version_string": <bindata of "December 2015">,
"compilation_comment": <bindata of "getdns 1.4.1 configured on 2018-"...>,
"default_hosts_location": <bindata of "/etc/hosts">,
"default_resolvconf_location": <bindata of "/etc/resolv.conf">,
"default_trust_anchor_location": <bindata of "/opt/stubby/etc/unbound/getdns-r"...>,
"implementation_string": <bindata of "https://getdnsapi.net">,
"listen_addresses":
[
{
"address_data": <bindata for 127.0.2.2>,
"address_type": <bindata of "IPv4">,
"port": 5353
}
],
"openssl_build_version_number": 269484143,
"openssl_built_on": <bindata of "built on: reproducible build, da"...>,
"openssl_cflags": <bindata of "compiler: gcc -DDSO_DLFCN -DHAVE"...>,
"openssl_dir": <bindata of "OPENSSLDIR: "/usr/lib/ssl"">,
"openssl_engines_dir": <bindata of "ENGINESDIR: "/usr/lib/arm-linux-"...>,
"openssl_platform": <bindata of "platform: debian-armhf">,
"openssl_version_number": 269484143,
"openssl_version_string": <bindata of "OpenSSL 1.1.0f 25 May 2017">,
"resolution_type": GETDNS_RESOLUTION_STUB,
"version_number": 17039616,
"version_string": <bindata of "1.4.1">
}
Result: Config file syntax is valid.

@ArchangeGabriel
Contributor

You’re missing a leading / in appdata_dir.

@wtoorop
Contributor

wtoorop commented May 3, 2018

Yes that's probably it... and also make sure /var/run/stubby (with leading slash) is writable (and readable) for user stubby.
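
The relative path is visible in the stubby -i dump as `<bindata of "var/run/stubby">`. A sketch that pulls appdata_dir out of such a dump and flags a missing leading slash follows. It assumes the dump format shown in this thread, and both function names are hypothetical.

```shell
# appdata_dir_from_info: read `stubby -i` output on stdin and print the
# appdata_dir value. Assumes the <bindata of "..."> format seen in this thread.
appdata_dir_from_info() {
    sed -n 's/.*"appdata_dir": <bindata of "\(.*\)">.*/\1/p'
}

# is_absolute_path PATH: succeed only if PATH starts with a slash.
is_absolute_path() {
    case $1 in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For example, `sudo -u stubby stubby -i | appdata_dir_from_info` prints the configured value, which can then be passed to is_absolute_path.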

@eccgecko
Author

eccgecko commented May 3, 2018

Thanks, that's pretty much solved it. I think we're very close to completely solving it. Yes, the missing leading / was part of the problem, although that was a mistake I made when I changed it from /run/stubby to var/run/stubby. There had been a leading / when I just used /run/stubby. I have changed it to /var/run/stubby now.

However, the main issue is with permissions on the folder. I believed the folder permissions were already correct, as they had been when I had checked before. However, it seems that they aren't persisting through a reboot, and that's the problem I've been facing, as I hadn't checked again since the first time I checked. Changing the permissions of /var/run/stubby so that stubby group has read and write permissions solves the issue, and www.dnssec-failed.org no longer replies, and there are indeed trust anchor files created within the /var/run/stubby folder 👍 :)

However, when I reboot, the permissions revert to only being read, write, execute for root, and just executable for the stubby group i.e. stubby user.

How can I make permissions for /var/run/stubby persistent?

@ArchangeGabriel
Contributor

@eccgecko What are the permissions before you change them? Do you know how this folder is created on your system?

@wtoorop
Contributor

wtoorop commented May 3, 2018

Acknowledged. This is due to the line d /run/stubby 0750 root stubby - - in /usr/lib/tmpfiles.d/stubby.conf. Change that line to d /run/stubby 0770 root stubby - - and it comes back with correct permissions.
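
The difference between the two tmpfiles.d lines is the group write bit: with owner root and group stubby, mode 0750 gives the group r-x, while 0770 gives it rwx. A small sketch that tests that bit in an octal mode string (group_can_write is a hypothetical name):

```shell
# group_can_write MODE: succeed if the octal MODE (e.g. 0750 or 770) has the
# group write bit (octal 020) set. Hypothetical helper for illustration.
group_can_write() {
    [ $(( 0$1 & 020 )) -ne 0 ]
}
```

On a system with GNU coreutils, the current mode of the directory can be read with `stat -c %a /run/stubby` and passed to the function.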

@eccgecko
Author

eccgecko commented May 3, 2018

@wtoorop That's it! 👍 great! DNSSEC now working and persisting through reboots. Thank you and @ArchangeGabriel for all your help with this :)

@hanvinke

hanvinke commented May 3, 2018

My stubby.service example needs some attention. I was not completely sure about the use of RuntimeDirectory and StateDirectory. Although it works, only one of them is needed as it turns out. Sorry I had to scratch the information together.

With only RuntimeDirectory you will have a volatile directory /var/run/stubby. When the service quits, systemd removes everything, including the directory. Strong advice: if a leftover /var/run/stubby directory from another install is present, remove it first, since it might have the wrong permissions set. Systemd will then take full care of creating the folder and files and setting their permissions. You need to set appdata_dir: /var/run/stubby in stubby.yml.

With only StateDirectory you will have a persistent directory, so after a reboot it keeps the Zero configuration DNSSEC files intact. You need to set appdata_dir: /var/lib/stubby in stubby.yml.
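
For a system service, systemd derives the directory from the directive name and the configured value: RuntimeDirectory= lives under /run (which /var/run usually points to), StateDirectory= under /var/lib, and CacheDirectory= under /var/cache. A sketch of that mapping, meant to help keep appdata_dir in stubby.yml in step with the unit file (the function name service_dir is hypothetical):

```shell
# service_dir DIRECTIVE NAME: print the path systemd uses for a system
# service given e.g. "StateDirectory" and "stubby". Hypothetical helper.
service_dir() {
    case $1 in
        RuntimeDirectory) echo "/run/$2" ;;
        StateDirectory)   echo "/var/lib/$2" ;;
        CacheDirectory)   echo "/var/cache/$2" ;;
        *) echo "unsupported directive: $1" >&2; return 1 ;;
    esac
}
```

The printed path is the value that appdata_dir would need to match for the corresponding directive.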

More information on the benefits of DynamicUser is here: http://0pointer.net/blog/dynamic-users-with-systemd.html

BTW my stubby.yml used for testing is very basic:

stubby.txt

(This one is for use with StateDirectory=stubby )

[I edited my previous stubby.service above]

@hanvinke

hanvinke commented May 4, 2018

Before using DynamicUser, it is important to also remove any existing stubby user and stubby group.
I forgot to do that yesterday, and this morning, with StateDirectory=stubby active, I got an error I had never seen before:
[screenshot from 2018-05-04 07-37-09]

After removing user:stubby and group:stubby and a reboot all was fine again:
[image: same as root]

@wtoorop
Contributor

wtoorop commented May 4, 2018

@hanvinke Thanks for pointing out the DynamicUser configuration of systemd! I like it a lot! I believe the error you had could have been prevented if User=stubby had been left in the [Service] section of stubby.service. In fact, Lennart points out that you should do that when upgrading from a static UID setup, in the 6th note in the Notes section of his blog post.

I'll play with these settings a bit and will include it in the getdns 1.4.2-rc1 release candidate today (which will have a stubby 0.2.3-rc1 release candidate on board).

wtoorop added a commit to getdnsapi/getdns that referenced this issue May 4, 2018
To improve integration with system and service managers like systemd
See also getdnsapi/stubby#106
wtoorop added a commit that referenced this issue May 4, 2018
@hanvinke

hanvinke commented May 4, 2018

Testing right now the new release candidate 😃 !
With many servers enabled I unfortunately got the message:

$ stubby -i
[19:11:47.569013] STUBBY: Read config from file /etc/stubby/stubby.yml
stubby: ./gldns/gbuffer.h:285: gldns_buffer_skip: Assertion `buffer->_position + count <= buffer->_limit || buffer->_vfixed' failed.
Aborted (core dumped)

So I used only the default enabled DNS recursive servers in stubby.yml, and that gave no errors. Buffer overflow of some kind?

@wtoorop
Contributor

wtoorop commented May 4, 2018

Ouch!

That's really bad. I just tried with around 500000. It was really slow to parse, but it did.
Would it be possible for you to provide a core dump from a stubby and libgetdns compiled with CFLAGS="-g"? That would be very helpful to debug the issue. You could also send me the stubby.yml, just to be sure. You can send it to me by e-mail encrypted with my PGP key?

wtoorop added a commit to getdnsapi/getdns that referenced this issue May 11, 2018
printing certain configuration. Thanks Han Vinke
@wtoorop wtoorop closed this as completed May 11, 2018
@eccgecko
Author

eccgecko commented May 12, 2018

Sorry to be the bearer of bad news, but unfortunately, after updating to latest getdns 1.4.2 with the latest commit e0e8576 for the stubby 0.2.3 submodule, this problem has resurfaced for me.

As far as I can tell, I am now using the new default options regarding systemd and the working_app_dir.

In my stubby.service file I have WorkingDirectory=/var/cache/stubby
And in my config stubby.yml file I have appdata_dir: "/var/cache/stubby"

I also have the following in place in my /usr/lib/tmpfiles.d/stubby.conf file:
# tmpfiles.d (5) for use with stubby.service
d /var/cache/stubby 0750 stubby stubby - -
The daemon seems to have successfully created the root-anchors.p7s root-anchors.xml root.key files in /var/cache/stubby, but that is most likely from when I ran the binary as sudo. The permissions on the /var/cache/stubby directory seem to be in order:
drwxr-x--- 2 stubby stubby

However, it's the exact same issue as before, with DNSSEC failing when run as the systemd service, while it succeeds when the binary is run with sudo. Is it to do with it now using /var/cache/stubby? Your advice before was to use /var/run/stubby. I will probably change it back to this to see if that works, but ideally I wanted to use the defaults as much as possible.

@abelbeck

@eccgecko On your system, what does ls -ld /var/cache show?

@ArchangeGabriel
Contributor

Can you paste your full stubby.service? If you use DynamicUser, then you must not have a /usr/lib/tmpfiles.d/stubby.conf at all (so remove it), and you should delete the /var/cache/stubby folder before restarting the service after that.

@hanvinke

Also delete the contents of /var/cache/private/stubby when retesting Zero configuration DNSSEC, since the folder /var/cache/stubby is just a symlink (owned by root) to /var/cache/private/stubby.

@eccgecko
Author

ls -ld /var/cache outputs the following: drwxr-xr-x 11 root root 4096 May 17 17:14 /var/cache

My stubby.service file looks like this:

[Unit]
Description=stubby DNS resolver

[Service]
User=stubby
DynamicUser=yes
CacheDirectory=stubby
WorkingDirectory=/var/cache/stubby
ExecStart=/opt/stubby/bin/stubby -C /etc/stubby.yml
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

This is the default stubby.service file. The only change I made was to add -C /etc/stubby.yml to the ExecStart line.

I did as you said @ArchangeGabriel and deleted both /usr/lib/tmpfiles.d/stubby.conf and /var/cache/stubby and even ran sudo systemctl disable stubby and deleted the /lib/systemd/system/stubby.service file, then started again by adding it back and re-enabling. Unfortunately it's no-go. In fact, the behaviour is slightly different to before. Now systemd fails at starting the service at all (instead of simply starting but dnssec not working, as before) and I believe it is because it cannot create the /var/cache/stubby directory. When I run the stubby binary as sudo, the daemon starts and /var/cache/stubby is created, and dnssec works.

One thing I did notice is that WorkingDirectory=/var/cache/stubby and appdata_dir: "/var/cache/stubby" are different to what @wtoorop recommended before (/var/run/stubby). Could that be related? I want to keep it as close to default config as possible so haven't changed this yet.

@hanvinke my system doesn't seem to have any /var/cache/private/ directory at all. Could that also be related?

@ArchangeGabriel
Contributor

@eccgecko Which systemd version? Doesn’t the status give any output when Stubby fails to start?

@hanvinke

@eccgecko
Maybe the user name selected by systemd (stubby) already exists on your system?
Systemd will not operate in dynamic user mode otherwise. Better to delete any existing stubby user or group first before restarting the service.
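
Whether a static stubby account is already present can be checked before enabling DynamicUser. A minimal sketch using id (the wrapper name user_exists is made up here):

```shell
# user_exists NAME: succeed if a user account called NAME can be resolved.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

For example, `user_exists stubby && echo "static stubby user present"`. On glibc systems, `getent group stubby` answers the same question for the group.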

@eccgecko
Author

@ArchangeGabriel ah...I guess that's the issue. I'm on the default Raspbian systemd package, which at present is 232, unfortunately. I see from the blog @hanvinke referenced, DynamicUser was introduced in 235. I did try upgrading my systemd package by downloading the 238 package from the buster repo, but unfortunately this broke my system and I had to restore from a backup.

I guess it's just a case of removing DynamicUser from the stubby.service file?

@ArchangeGabriel
Contributor

Indeed. Actually DynamicUser was introduced in 232, but Stubby uses related features introduced in 235. So yes, in this case you have to use the tmpfiles.d snippet to create /var/cache/stubby with the right permissions. And you should indeed remove the DynamicUser from the service file.
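
Since the right service file depends on the systemd version, a guard along these lines could decide which variant to install. It is only a sketch: the threshold of 235 is taken from this thread, and both function names are hypothetical.

```shell
# systemd_version: print the numeric version from the first line of
# `systemctl --version`, which looks like "systemd 232" (possibly followed
# by a distribution suffix).
systemd_version() {
    systemctl --version | awk 'NR == 1 { print $2 }'
}

# supports_cache_directory VERSION: succeed if VERSION is at least 235, the
# release this thread names for the directory features stubby.service uses.
supports_cache_directory() {
    [ "$1" -ge 235 ]
}
```

Usage could look like `supports_cache_directory "$(systemd_version)" || echo "use the tmpfiles.d snippet and drop DynamicUser"`.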

@eccgecko
Author

Thanks @ArchangeGabriel @hanvinke for your help. I removed the DynamicUser=yes line from stubby.service and used the tmpfiles.d snippet to create /var/cache/stubby and am now back up and running successfully using the stubby daemon as a systemd service with stubby as the user and with DNSSEC successfully working :) thanks again 👍

Not wanting to push my luck, so apologies if this is the wrong place, but just wanted to ask one additional question relating to stubby and DNSSEC. Is there a reason why, when running a DNSSEC algorithm test here https://rootcanary.org/test.html, ED25519 is not validated as an algorithm, but when using dnsmasq with DNSSEC, it is?

[screen shot 2018-05-18 at 18 11 53]

@hanvinke

hanvinke commented May 20, 2018

@eccgecko
Sorry, I cannot help you with ED25519 support for stubby. I think it is not implemented yet.

For the enthusiasts, I have made a stubby service file optimized for security, because, for instance, ReadWritePaths= was not set in the original file.
Some info: http://0pointer.net/blog/avoiding-cve-2016-8655-with-systemd.html

My stubby.service:
stubby.service.TXT

@ArchangeGabriel
Contributor

Well I’m not sure… ED25519 is supported for the TLS connection, but maybe not for DS signing.

@ArchangeGabriel
Contributor

@hanvinke If you can open a PR with comments on each added line, that would be very welcome I think.

@hanvinke

@ArchangeGabriel
Thank you for your interest!

I edited my previous file a little.
Removed: PrivateTmp=yes and ProtectSystem=strict
Reason: Both are already implied by DynamicUser=yes

I also changed the UMask setting to 077 and ProtectHome to yes, which are much more restrictive.
Maybe someone can tell me whether Stubby also needs @aio [Asynchronous I/O (io_setup(2), io_submit(2), and related calls)] in the SystemCallFilter.
More information about the systemd settings can be found on https://www.mankier.com/5/systemd.exec

@hanvinke

hanvinke commented May 21, 2018

Edited a third time -
decided to remove all whitelisted system calls (the ones with a @ before them), because, for instance:

ptrace: Is already blocked by dropping CAP_SYS_PTRACE under CapabilityBoundingSet
aio: There is no need for stubby to get access to any IO port, and this is already blocked by dropping CAP_SYS_RAWIO under CapabilityBoundingSet

AmbientCapabilities=CAP_NET_BIND_SERVICE can only be omitted if you do not do online banking; otherwise, for instance, payment transactions with iDEAL will fail without it. For now I am just keeping SystemCallFilter= ~madvise
(The tilde after the equal sign indicates that this is a blacklist of syscalls)

@wtoorop
Contributor

wtoorop commented May 22, 2018

Thank you all!

@ArchangeGabriel are you saying systemd 232 will not start the service if it encounters a directive that is unknown to it (i.e. DynamicUser)? I assumed it just would, because it is allowed with systemd version 238...

I suppose we have to provide two stubby.service files (one for systemd before 235 and one for systemd 235 and higher), but perhaps there was something else going on...

@eccgecko I believe most of the groundwork for ED25519 has already been done. I'll see if I can enable the ED25519 and ED448 with newer OpenSSL and let you know (provide patch).

@ArchangeGabriel
Contributor

@wtoorop No, I’m saying that it handled the DynamicUser correctly (since it is a 232 feature), but not the custom directory part. Thus, the service started as DynamicUser, but could not write files, and zero-conf DNSSEC failed.

@wtoorop
Contributor

wtoorop commented May 22, 2018

@ArchangeGabriel Ok, so the CacheDirectory directive wouldn't work, but I supposed (wrongly?) that that wouldn't matter, since that directory would have been created with /usr/lib/tmpfiles.d/stubby.conf anyway...

@ArchangeGabriel
Contributor

No, I think a DynamicUser still doesn’t have the right to write to a standard directory even if it owns it, because the user is in fact not dynamic and the directory is attributed to it. But I might be wrong, and this could also be a bug of some kind.

In any case, problematic systems in this regard would be the one with 232 ≤ systemd < 235.

@eccgecko
Author

@wtoorop it may be necessary to provide 2 different stubby.service files, as the service failed to start at all on my systemd 232 with the DynamicUser line included. Or are you, @ArchangeGabriel, saying that it was the inclusion of both DynamicUser and the custom directory that was causing the issue? Because I only removed DynamicUser and nothing else, and that got it going again.

Stretch, the current stable dist of Debian / Raspbian, only ships systemd 232, so it may be necessary to make some allowances for that user-base.
I for one have now switched from Raspbian to Arch, mostly because of the out-of-date packages that Debian has, and a lot of the issues I've been experiencing with stubby and other projects I'm running on my Pi have been fixed by updated packages. Since my migration to Arch, stubby now runs fine with the default systemd service file :)

@hanvinke

hanvinke commented May 24, 2018

While testing logging with Stubby through 'stubby -v 7' I noticed that dns-tls.bitwiseshift.net has a problem currently, stubby nicely reporting:
STUBBY: 81.187.221.24 : Verify failed : TLS - Failure - (10) "certificate has expired".
Also gnutls-cli --print-cert -p 853 81.187.221.24 shows the same problem. Could the cause be that certbot recently stopped checking whether the certificate is about to expire?
Where can I report that there is a problem with this server?
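
For anyone who wants to check an upstream themselves, here is a rough sketch using only the Python standard library. It connects with normal certificate verification and either reports the verification error (OpenSSL's code 10 is "certificate has expired", matching the Stubby log line above) or the number of days left. The function names are made up for this example, and using dns-tls.bitwiseshift.net as the authentication name is an assumption.

```python
import socket
import ssl
import time


def days_left(not_after: str, now: float) -> float:
    """Days until a certificate's notAfter time, e.g. "May 24 12:00:00 2018 GMT"."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_upstream(address: str, auth_name: str, port: int = 853) -> str:
    """Connect over TLS with verification and summarise the certificate state."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((address, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=auth_name) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as err:
        # verify_code / verify_message come straight from OpenSSL
        return f"verify failed: ({err.verify_code}) {err.verify_message}"
    return f"ok, {days_left(not_after, time.time()):.1f} days left"
```

Calling `check_upstream("81.187.221.24", "dns-tls.bitwiseshift.net")` needs network access to port 853. Connection failures are deliberately not caught here and surface as `OSError`.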

@saradickinson
Contributor

@hanvinke I've pinged the operator directly as they haven't made their contact details public. To double check for issues you can find monitoring of the servers here:
https://dnsprivacy.org/jenkins/job/dnsprivacy-monitoring/

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this issue Jan 28, 2019
Package changes:
 * PLIST adjustment; stubby no longer built by default

Upstream changes:
* 2018-12-21: Version 1.5.0
  * RFE getdnsapi/stubby#121 log re-instantiating TLS
    upstreams (because they reached tls_backoff_time) at
    log level 4 (WARNING)
  * GETDNS_RESPSTATUS_NO_NAME for NODATA answers too
  * ZONEMD rr-type
  * getdns_query queries for addresses when a query name
    without a type is given.
  * RFE #408: Fetching of trust anchors will be retried
    after failure, after a certain backoff time. The time
    can be configured with
    getdns_context_set_trust_anchors_backoff_time().
  * RFE #408: A "dnssec" extension that requires DNSSEC
    verification.  When this extension is set, Indeterminate
    DNSSEC status will not be returned.
  * Issue #410: Unspecified ownership of get_api_information()
  * Fix for DNSSEC bug in finding most specific key when
    trust anchor proves non-existence of one of the labels
    along the authentication chain other than the non-
    existence of a DS record on a zonecut.
  * Enhancement getdnsapi/stubby#56 & getdnsapi/stubby#130:
    Configurable minimum and maximum TLS versions with
    getdns_context_set_tls_min_version() and
    getdns_context_set_tls_max_version() functions and
    tls_min_version and tls_max_version configuration parameters
    for upstreams.
  * Configurable TLS1.3 ciphersuites with the
    getdns_context_set_tls_ciphersuites() function and
    tls_ciphersuites config parameter for upstreams.
  * Bugfix in upstream string configurations: tls_cipher_list and
    tls_curve_list
  * Bugfix finding signer for validating NSEC and NSEC3s, which
    caused trouble with the partly tracing DNSSEC from the root
    up, introduced in 1.4.2.  Thanks Philip Homburg

* 2018-05-11: Version 1.4.2
  * Bugfix getdnsapi/stubby#87: Detect and ignore duplicate certs
    in the Windows root CA store.
  * PR #397: No TCP sendto without TCP_FASTOPEN
    Thanks Emery Hemingway
  * Bugfix getdnsapi/stubby#106: Core dump when printing certain
    configuration. Thanks Han Vinke
  * Bugfix getdnsapi/stubby#99: Partly trace DNSSEC from the root
    up (for tld and sld), to find insecure delegations quicker.
    Thanks UniverseXXX
  * Bugfix: Allow NSEC spans starting from (unexpanded) wildcards
    Bug was introduced when dealing with CVE-2017-15105
  * Bugfix getdnsapi/stubby#46: Don't assume trailing zero with
    string bindata's.  Thanks Lonnie Abelbeck
  * Bugfix #394: Update src/compat/getentropy_linux.c in order to
    handle ENOSYS (not implemented) fallback.
    Thanks Brent Blood
  * Bugfix #395: Clarify that libidn2 dependency is for version 2.0.0
    or higher. Thanks mire3212

* 2018-03-12: Version 1.4.1
  * Bugfix #388: Prevent fallback to an earlier tried upstream within a
    single query.  Thanks Robert Groenenberg
  * PR #387: Compile with OpenSSL with deprecated APIs disabled.
    Thanks Rosen Penev
  * PR #386: UDP failover improvements:
    - When all UDP upstreams fail, retry them (more or less) equally
    - Limit maximum UDP backoff (default to 1000)
      This is configurable with the --with-max-udp-backoff configure
      option.
    Thanks Robert Groenenberg
  * Bugfix: Find zonecut with DS queries (instead of SOA queries).
    Thanks Elmer Lastdrager
  * Bugfix #385: Verifying insecure NODATA answers (broken since 1.2.1).
    Thanks hanvinke
  * PR #384: Fix minor spelling and formatting.  Thanks dkg.
  * Bugfix #382: Parallel install of getdns_query and getdns_server_mon

* 2018-02-21: Version 1.4.0
  * .so revision bump to please fedora packaging system.
    Thanks Paul Wouters
  * Specify the supported curves with getdns_context_set_tls_curves_list()
    An upstream specific list of supported curves may also be given
    with the tls_curves_list setting in the upstream dict with
    getdns_context_set_upstream_recursive_servers()
  * New tool getdns_server_mon for checking upstream recursive
    resolver's capabilities.
  * Improved handling of opportunistic back-off.  If other transports
    are working, don't forcibly promote failed upstreams just wait for
    the re-try timer.
  * Hostname authentication with libressl
    Thanks Norbert Copones
  * Security bugfix in response to CVE-2017-15105.  Although getdns was
    not vulnerable for this specific issue, as a precaution code has been
    adapted so that signatures of DNSKEYs, DSs, NSECs and NSEC3s can not
    be wildcard expansions when used with DNSSEC proofs.  Only direct
    queries for those types are allowed to be wildcard expansions.
  * Bugfix PR#379: Miscellaneous double free or corruption, and corrupted
    memory double linked list detected issue, with serving functionality.
    Thanks maddie and Bruno Pagani
  * Security Bugfix PR#293: Check sha256 pinset's
    with OpenSSL native DANE functions for OpenSSL >= 1.1.0
    with Viktor Dukhovni's danessl library for OpenSSL >= 1.0.0
    don't allow for authentication exceptions (like self-signed
    certificates) otherwise.  Thanks Viktor Dukhovni
  * libidn2 support.  Thanks Paul Wouters

* 2017-12-21: Version 1.3.0
  * Bugfix #300: Detect dnsmasq and skip unit test that fails with it.
    Thanks Tim Rohsen and Konomi Kitten
  * Specify default available cipher suites for authenticated TLS
    upstreams with getdns_context_set_tls_ciphers_list()
    An upstream specific available cipher suite may also be given
    with the tls_cipher_list setting in the upstream dict with
    getdns_context_set_upstream_recursive_servers()
  * PR #366: Add support for TLS 1.3 and Chacha20-Poly1305
    Thanks Pascal Ernster
  * Bugfix #356: Do Zero configuration DNSSEC meta queries over on the
    context configured upstreams.  Thanks Andreas Schulze
  * Report default extension settings with
    getdns_context_get_api_information()
  * Specify locations at which CA certificates for verification purposes
    are located: getdns_context_set_tls_ca_path()
    getdns_context_set_tls_ca_file()
  * getdns_context_set_resolvconf() function to initialize a context's
    upstreams and suffixes with a resolv.conf file.
    getdns_context_get_resolvconf() to get the file used to initialize
    the context's upstreams and suffixes.
    getdns_context_set_hosts() function to initialize a context's
    LOCALNAMES namespace.
    getdns_context_get_hosts() function to get the file used to initialize
    the context's LOCALNAMES namespace.
  * get which version of OpenSSL was used at build time and at run time
    when available with getdns_context_get_api_information()
  * GETDNS_RETURN_IO_ERROR return error code
  * Bugfix #359: edns_client_subnet_private should set family
    Thanks Daniel Areiza & Andreas Schulze
  * Bugfix getdnsapi/stubby#34: Segfault issue with native DNSSEC
    validation.  Thanks Bruno Pagani

* 2017-11-11: Version 1.2.1
  * Handle more I/O error cases.  Also, when an I/O error does occur,
    never stop listening (with servers), and
    never exit (when running the built-in event loop).
  * Bugfix: Tolerate unsigned and unused RRsets in the authority section.
            Fixes DNSSEC with BIND upstream.
  * Bugfix: DNSSEC validation without support records
  * Bugfix: Validation of full recursive DNSKEY lookups
  * Bugfix: Retry to validate full recursion BOGUS replies with zero
    configuration DNSSEC only when DNSSEC was actually requested
  * Bugfix #348: Fix a linking issue in stubby when libbsd is present
    Thanks Remi Gacogne
  * More robust scheduling; Eliminating a segfault with long running
    applications.
  * Miscellaneous Windows portability fixes from Jim Hague.
  * Fix Makefile dependencies for parallel install.
    Thanks ilovezfs

* 2017-09-29: Version 1.2.0
  * Bugfix of rc1: authentication of first query with TLS
    Thanks Travis Burtrum
  * A function to set the location for library specific data,
    like trust-anchors: getdns_context_set_appdata().
  * Zero configuration DNSSEC - built upon the scheme
    described in RFC7958.  The URL from which to fetch
    the trust anchor, the verification CA and email
    can be set with the new getdns_context_set_trust_anchor_url(),
    getdns_context_set_trust_anchor_verify_CA() and
    getdns_context_set_trust_anchor_verify_email() functions.
    The default values are to fetch from IANA and to validate
    with the ICANN CA.
  * Update of Stubby with yaml configuration file and
    logging from a certain severity support.
  * Fix tpkg exit status on test failure. Thanks Jim Hague.
  * Refined logging levels for upstream statistics
  * Reuse (best behaving) backed-off TLS upstreams when none are usable.
  * Let TLS upstreams back-off an incremental amount of time.
    Back-off time starts with 1 second and is doubled each failure, but
    will not exceed the time given by getdns_context_set_tls_backoff_time()
  * Make TLS upstream management more resilient to temporary outages
    (like laptop sleeps)

* 2017-09-04: Version 1.1.3
  * Small bugfixes that came out of static analysis
  * No annotations with the output of getdns_query anymore,
    unless -V option is given to increase verbosity
    Thanks Ollivier Robert
  * getdns_query will now exit with failure status if replies are BOGUS
  * Bugfix: dnssec_return_validation_chain now also works when fallback
    to full recursion was needed with dnssec_roadblock_avoidance
  * More clear build instructions from Paul Hoffman.  Thanks.
  * Bugfix #320.1: Eliminate multiple closing of file descriptors
    Thanks Neil Cook
  * Bugfix #320.2: Array bounds bug in upstream_select
    Thanks Neil Cook
  * Bugfix #318: getdnsapi/getdns/README.md links to nonexistent wiki
    pages.  Thanks James Raftery
  * Bugfix #322: MacOS 10.10 (Yosemite) provides TCP fastopen interface
    but does not have it implemented.  Thanks Joel Purra
  * Compile without Stubby by default.  Stubby now has a git repository
    of its own.  The new Stubby repository is added as a submodule.
    Stubby will still be built alongside getdns with the --with-stubby
    configure option.

* 2017-07-03: Version 1.1.2
  * Bugfix for parallel make install
  * Bugfix to trigger event callbacks on socket errors
  * A getdns_context_set_logfunc() function with which one may
    register a callback log function for certain library subsystems
    at certain levels.  Currently this can only be used for
    upstream statistics subsystem.

* 2017-06-15: Version 1.1.1
  * Bugfix #306 hanging/segfaulting on certain (IPv6) upstream failures
  * Spelling fix s/receive/receive.  Thanks Andreas Schulze.
  * Added stubby-setdns-macos.sh script to support Homebrew formula
  * Include stubby.conf in the distribution tarball
  * Bugfix #286 reschedule reused listening addresses
  * Bugfix #166 Allow parallel builds and unit-tests
  * NSAP-PTR, EID and NIMLOC, TALINK, AVC support
  * Bugfix of TA RR type
  * OPENPGPKEY and SMIMEA support
  * Bugfix TAG rdata type presentation format for CAA RR type
  * Bugfix Zero sized gateways with IPSECKEY gateway_type 0
  * Guidance for integration with systemd
  * Also check for memory leaks with advanced server capabilities.
  * Bugfix convert IP string to IP dict with getdns_str2dict() directly.

ok'ed by root@zta.lk
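
As an aside on the 1.2.0 entry in the changelog above: it says the TLS back-off starts at 1 second, doubles on each failure, and does not exceed the time given by getdns_context_set_tls_backoff_time(). The arithmetic can be illustrated as follows. This is not getdns code; the function name and the cap used in the example are made up.

```python
from typing import List


def tls_backoff_schedule(failures: int, max_backoff: int) -> List[int]:
    """Back-off in seconds after each consecutive failure: 1, 2, 4, ... capped."""
    schedule = []
    delay = 1  # the changelog says the back-off starts at 1 second
    for _ in range(failures):
        schedule.append(min(delay, max_backoff))
        delay *= 2  # doubled on each failure
    return schedule
```

With a hypothetical cap of 10 seconds, five consecutive failures give waits of 1, 2, 4, 8 and 10 seconds.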
@karavan

karavan commented Jul 7, 2021

When DNSSEC is not working, also check the system date.
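
The reason the clock matters: every RRSIG carries an expiration and an inception time, and a validator rejects signatures outside that window. Below is a rough sketch of that comparison, using the two timestamps from the RRSIG in the dig output at the top of this issue. The function names are illustrative, and the plain comparison ignores the serial-number wrap-around handling that real validators implement.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse the YYYYMMDDHHmmSS form that dig prints for RRSIG times (UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def clock_in_window(
    expiration: str, inception: str, now: Optional[datetime] = None
) -> bool:
    """True if `now` (default: the system clock) lies inside the signature window.

    Argument order follows the RRSIG presentation format: expiration first.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_rrsig_time(inception) <= now <= parse_rrsig_time(expiration)
```

For the record above, `clock_in_window("20180430172414", "20180423141914", now)` holds for a clock set to 28 April 2018, but not for a machine whose clock has fallen back to 1970.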
