stubby dies when connectivity is lost #34

Closed
ArchangeGabriel opened this issue Nov 1, 2017 · 34 comments

@ArchangeGabriel
Contributor

My logs are often spammed with:

stubby: Could not schedule query: None of the configured upstreams could be used to send queries on the specified transports

This is generally the case after going out of suspend, before the connectivity is regained.

But when connectivity is completely lost (i.e. when roaming from work to home), stubby apparently even segfaults:

systemd[1]: stubby.service: Main process exited, code=dumped, status=11/SEGV
systemd[1]: stubby.service: Failed with result 'core-dump'.
@saradickinson
Contributor

What platform and what version of stubby are you using?

@ArchangeGabriel
Contributor Author

Linux (ArchLinux), 0.1.4.

@saradickinson
Contributor

Sorry - and what version of getdns?

@ArchangeGabriel
Contributor Author

1.2.0

@saradickinson
Contributor

OK - there are a bunch of stability improvements in the develop branch including a fix for a segfault. You could try building from develop or hang on as we'll be releasing a 1.2.1 version in the next week or 2.

If that doesn't help, it would be great if you could send us the core dump.

@ArchangeGabriel
Contributor Author

I’ll wait for the next release, I’ve kept the coredump just in case.

@ArchangeGabriel
Contributor Author

Hum… Now I get segfaults all the time, I can’t get it to work at all (I had to revert to network provided DNS instead). I will have to try getdns-git then.

@ArchangeGabriel
Contributor Author

ArchangeGabriel commented Nov 7, 2017

Hum by the way, I have lines in dmesg:

stubby[573]: segfault at 3c ip 00007fe59e002e75 sp 00007fff6cafb940 error 4 in libgetdns.so.6.2.0[7fe59dfe4000+67000]

EDIT: Pressed enter while not focused, sorry about that.

@saradickinson
Contributor

The latest release is now available - please try that:
getdns 1.2.1rc, stubby 0.1.5
https://getdnsapi.net/releases/getdns-1-2-1-rc1/

@ArchangeGabriel
Contributor Author

Still coredumping without even answering any request. Here are the coredumps:
https://paste.xinu.at/ohbw6/

@ArchangeGabriel
Contributor Author

So after changing getdns to 1.2.1 final, it seems to be working again. The only commit difference seems to be getdnsapi/getdns@260416a.

I’ll wait until Wednesday to be sure that it works under all my network environments and especially when roaming, keeping you updated.

@ArchangeGabriel
Contributor Author

Should I backport a43be56?

@ArchangeGabriel
Contributor Author

Still getting segfaults. Maybe I should try backporting this commit.

@ArchangeGabriel
Contributor Author

OK, still coredumping anyway. Do you want new coredumps?

@wtoorop
Contributor

wtoorop commented Nov 12, 2017

Hi Bruno,
I am very much interested in the coredumps, thanks for being willing to provide them.
Besides the coredump I also need the stubby binary and possibly also the libgetdns.so to be able to inspect the backtrace. Did you compile stubby and getdns with debugging symbols? (i.e. with -g in CFLAGS).
Thank you very much for being willing to provide all this feedback! This is definitely very valuable for us!

@ArchangeGabriel
Contributor Author

So here are the new coredumps: https://paste.xinu.at/1ux/.

I did not compile with debugging symbols yet (I’ve compiled in release mode for ArchLinux official repos), I’m doing so right now and replacing the one on my system (I’ll upload them here too).

Just in case they could still be useful, the current stubby binary is at https://paste.xinu.at/xdZ/, and /usr/lib/libgetdns.so.6.2.1 is at https://paste.xinu.at/R0K/.

@ArchangeGabriel
Contributor Author

OK I’ve found a way to reliably coredump stubby (drill www.hellobank.fr, not sure why this specific domain does this, the same one without www. works OK), so I already have everything required.

stubby binary with debug symbols: https://paste.xinu.at/oTvQ1w/
libgetdns.so.6.2.1 library with debug symbols: https://paste.xinu.at/RRx/
core.stubby.993.21629c238c924720863f9b509cc73910.28639.1510485185000000.lz4: https://paste.xinu.at/2uELr/

@ArchangeGabriel
Contributor Author

(Well, it does not coredump systematically, but at the very least it never resolves the domain.)

@ArchangeGabriel
Contributor Author

Got a coredump on roaming: I left work, where I’m plugged into the network, for home, where I’m on Wi-Fi, so upon resuming from suspend I had no network. I suspect stubby died somehow because of this.

Binaries are still the same, new coredump at https://paste.xinu.at/OPeZ/.

@ArchangeGabriel
Contributor Author

So as a matter of fact stubby still segfaults even without roaming, and quite often. Do you want more coredumps? I can provide some more.

@ArchangeGabriel
Contributor Author

This is my current boot:

[ 2514.980376] stubby[28639]: segfault at 44 ip 00007f4de0535075 sp 00007ffe72953180 error 4 in libgetdns.so.6.2.1[7f4de0516000+67000]
[28548.167065] stubby[3314]: segfault at 44 ip 00007fe462b00075 sp 00007ffc2fc1b830 error 4 in libgetdns.so.6.2.1[7fe462ae1000+67000]
[73184.214193] stubby[17732]: segfault at 44 ip 00007f0f90f58075 sp 00007ffd3b6f9520 error 4 in libgetdns.so.6.2.1[7f0f90f39000+67000]
[89433.147344] stubby[28216]: segfault at 44 ip 00007f3ab3934075 sp 00007ffd9574d970 error 4 in libgetdns.so.6.2.1[7f3ab3915000+67000]
[141723.762015] stubby[31407]: segfault at 44 ip 00007f4f634be075 sp 00007ffdba1a3c00 error 4 in libgetdns.so.6.2.1[7f4f6349f000+67000]
[230376.219121] stubby[25578]: segfault at 44 ip 00007f502b32b075 sp 00007fffb2787e00 error 4 in libgetdns.so.6.2.1[7f502b30c000+67000]
[230480.599009] stubby[4538]: segfault at 44 ip 00007f2b8f894075 sp 00007ffd2236f570 error 4 in libgetdns.so.6.2.1[7f2b8f875000+67000]
[260869.761059] stubby[4878]: segfault at 44 ip 00007fd5609fa075 sp 00007ffc664e18a0 error 4 in libgetdns.so.6.2.1[7fd5609db000+67000]

The first two are available in previous posts. I think I’ve deleted the next three, but I still have the last three if you want them.
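
A quick way to check whether these crashes share one faulting instruction is to subtract the mapping base (the first number inside the brackets) from the instruction pointer. The sketch below does that for two of the lines above. The regular expression and the helper name library_offset are illustrative assumptions, and it presumes the usual x86 kernel segfault message layout and that the bracketed base is where the library was loaded.

```python
import re

# Matches the x86 kernel segfault report, for example:
# "stubby[28639]: segfault at 44 ip ... sp ... error 4 in libgetdns.so.6.2.1[7f4de0516000+67000]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def library_offset(line):
    """Return (library name, offset of the faulting instruction), or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    # Offset = instruction pointer minus the start of the library mapping.
    return m["lib"], int(m["ip"], 16) - int(m["base"], 16)


# Two of the dmesg lines quoted in this comment.
lines = [
    "[ 2514.980376] stubby[28639]: segfault at 44 ip 00007f4de0535075 "
    "sp 00007ffe72953180 error 4 in libgetdns.so.6.2.1[7f4de0516000+67000]",
    "[260869.761059] stubby[4878]: segfault at 44 ip 00007fd5609fa075 "
    "sp 00007ffc664e18a0 error 4 in libgetdns.so.6.2.1[7fd5609db000+67000]",
]

for line in lines:
    lib, offset = library_offset(line)
    print(lib, hex(offset))  # libgetdns.so.6.2.1 0x1f075 (both lines)
```

Both lines give the same offset, which is consistent with every crash in this boot hitting the same instruction in libgetdns.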

@wtoorop
Contributor

wtoorop commented Nov 20, 2017

Thanks Bruno, and sorry for the late response; last week I was busy at the IETF.
I will dive into the core dumps first thing tomorrow!

@wtoorop
Contributor

wtoorop commented Nov 21, 2017

Yes please. The two dumps you provided via paste.xinu.at with debugging symbols, both pointed to the same location in dnssec.c. They were probably caused by failure to fetch all data to do DNSSEC validation in a timely manner. Indeed this code path needs to be reviewed more carefully.

For now, a patch might be this:
0001-BOGUS-answer-because-unable-to-fetch-root-DNSKEY.patch.txt

I am still interested in the last three core dumps since they might be caused by something else...
Thank you for your support!

@ArchangeGabriel
Contributor Author

You’ve got 4 for the same price: https://paste.xinu.at/QPSpFj/ ;)

I’m applying the patch on my local build, will see if things change.

wtoorop added the bug label Nov 21, 2017
@wtoorop
Contributor

wtoorop commented Nov 21, 2017

Thanks! They all point to line 3167 in dnssec.c:

willem@makaak:~/sink/core-dumps$ gdb stubby core.stubby.993.21629c238c924720863f9b509cc73910.4878.1511186950000000 
GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from stubby...done.
[New LWP 4878]

warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)

warning: Could not load shared library symbols for 14 libraries, e.g. /usr/lib/libyaml-0.so.2.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `/usr/bin/stubby'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd5609fa075 in ?? ()
(gdb) set solib-search-path .
Reading symbols from /home/willem/sink/core-dumps/libgetdns.so.6.2.1...done.
(gdb) bt
#0  0x00007fd5609fa075 in check_chain_complete (chain=0x55d18d1e3360) at ./dnssec.c:3167
#1  0x00007fd560a1d0d9 in poll_timeout_cb (event=<optimized out>) at ./extension/poll_eventloop.c:314
#2  poll_eventloop_run_once (loop=loop@entry=0x55d18d034c20, blocking=blocking@entry=1)
    at ./extension/poll_eventloop.c:346
#3  0x00007fd560a1d67d in poll_eventloop_run_once (blocking=1, loop=0x55d18d034c20)
    at ./extension/poll_eventloop.c:491
#4  poll_eventloop_run (loop=0x55d18d034c20) at ./extension/poll_eventloop.c:499
#5  0x000055d18c5bdca2 in main (argc=<optimized out>, argv=<optimized out>) at stubby.c:838

So the patch might work, but since I haven't reproduced the issue myself yet, it might also introduce new issues. At least I now know where to start. Thanks again for that.

@ArchangeGabriel
Contributor Author

OK, so it has been much more stable lately. I’ve just had my first coredump right now, and the message in dmesg is a bit different:

stubby[599]: segfault at 555fc9a8f34a ip 00007efe5f3998f9 sp 00007ffc5dea3940 error 4 in libgetdns.so.6.2.1[7efe5f37a000+67000]

New coredump: https://paste.xinu.at/BSo8z/
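
For reading such dmesg lines: the number after "error" is the x86 page-fault error code, and the number after "at" is the address that was accessed. The sketch below decodes the low three bits of that code. The helper name decode_page_fault is hypothetical; the bit meanings are the standard x86 ones (bit 0: protection violation rather than a missing page, bit 1: write access, bit 2: fault raised in user mode).

```python
def decode_page_fault(error_code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # Bit 0: set for a protection violation, clear if no page was mapped.
        "protection_violation": bool(error_code & 0x1),
        # Bit 1: set for a write access, clear for a read.
        "write": bool(error_code & 0x2),
        # Bit 2: set if the fault happened in user mode.
        "user_mode": bool(error_code & 0x4),
    }


print(decode_page_fault(4))
# {'protection_violation': False, 'write': False, 'user_mode': True}
```

So "error 4" is a user-mode read of an unmapped page, both here and in the earlier messages. What differs is the accessed address: 44 in the earlier lines and 555fc9a8f34a here. A very small address usually suggests a field access through a NULL pointer, while a full-sized address suggests some other kind of bad pointer, though the core dump is what actually settles it.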

wtoorop added a commit to getdnsapi/getdns that referenced this issue Nov 27, 2017
@wtoorop
Contributor

wtoorop commented Nov 27, 2017

Yes! Thanks for finding this bug too! I was able to reproduce. We should have found this already were it not for broken error detection in our valgrind unit tests. Your report made another issue apparent too!
I have a fix on the getdns develop branch, or you could apply this patch:
0001-Access-of-freed-memory-in-stub-DNSSEC-cleanup-code.patch.txt

@ArchangeGabriel
Contributor Author

Added to my local build, thanks. Note that I’m stressing stubby far less than before because I’m now using Unbound which then relays to stubby, so that I get the benefits of a validating resolver with caching and the advantages of DNS over TLS on port 443 with reuse of connection thanks to stubby.

@wtoorop
Contributor

wtoorop commented Nov 27, 2017

In that case you might want to turn off validation in Stubby as well?
Or perhaps use the dnssec_return_all_statuses extension to be on the safe side...

@ArchangeGabriel
Contributor Author

You mean I should change dnssec_return_status: GETDNS_EXTENSION_TRUE? What are the other options, apart from leaving it out entirely (would that be enough for Unbound)?

@wtoorop
Contributor

wtoorop commented Nov 29, 2017

Yes, leaving it out should be enough for Unbound.

@wtoorop
Contributor

wtoorop commented Dec 13, 2017

Actually stubby would always do DNSSEC validation when behind an unbound forwarder, because it sets the dnssec_return_all_statuses extension. PR #53 combined with getdnsapi/getdns#363 will resolve this.

@wtoorop
Contributor

wtoorop commented Dec 28, 2017

Both aforementioned PRs have been merged, so this issue is resolved.

wtoorop closed this as completed Dec 28, 2017
@ArchangeGabriel
Contributor Author

Yes indeed. It works like a charm now, and I’m starting to spread my Unbound+Stubby configuration around me. We would need more DNS servers answering on port 443, but that’s another topic.

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this issue Jan 28, 2019
Package changes:
 * PLIST adjustment; stubby no longer built by default

Upstream changes:
* 2018-12-21: Version 1.5.0
  * RFE getdnsapi/stubby#121 log re-instantiating TLS
    upstreams (because they reached tls_backoff_time) at
    log level 4 (WARNING)
  * GETDNS_RESPSTATUS_NO_NAME for NODATA answers too
  * ZONEMD rr-type
  * getdns_query queries for addresses when a query name
    without a type is given.
  * RFE #408: Fetching of trust anchors will be retried
    after failure, after a certain backoff time. The time
    can be configured with
    getdns_context_set_trust_anchors_backoff_time().
  * RFE #408: A "dnssec" extension that requires DNSSEC
    verification.  When this extension is set, Indeterminate
    DNSSEC status will not be returned.
  * Issue #410: Unspecified ownership of get_api_information()
  * Fix for DNSSEC bug in finding most specific key when
    trust anchor proves non-existence of one of the labels
    along the authentication chain other than the
    non-existence of a DS record on a zonecut.
  * Enhancement getdnsapi/stubby#56 & getdnsapi/stubby#130:
    Configurable minimum and maximum TLS versions with
    getdns_context_set_tls_min_version() and
    getdns_context_set_tls_max_version() functions and
    tls_min_version and tls_max_version configuration parameters
    for upstreams.
  * Configurable TLS1.3 ciphersuites with the
    getdns_context_set_tls_ciphersuites() function and
    tls_ciphersuites config parameter for upstreams.
  * Bugfix in upstream string configurations: tls_cipher_list and
    tls_curve_list
  * Bugfix finding signer for validating NSEC and NSEC3s, which
    caused trouble with the partly tracing DNSSEC from the root
    up, introduced in 1.4.2.  Thanks Philip Homburg

* 2018-05-11: Version 1.4.2
  * Bugfix getdnsapi/stubby#87: Detect and ignore duplicate certs
    in the Windows root CA store.
  * PR #397: No TCP sendto without TCP_FASTOPEN
    Thanks Emery Hemingway
  * Bugfix getdnsapi/stubby#106: Core dump when printing certain
    configuration. Thanks Han Vinke
  * Bugfix getdnsapi/stubby#99: Partly trace DNSSEC from the root
    up (for tld and sld), to find insecure delegations quicker.
    Thanks UniverseXXX
  * Bugfix: Allow NSEC spans starting from (unexpanded) wildcards
    Bug was introduced when dealing with CVE-2017-15105
  * Bugfix getdnsapi/stubby#46: Don't assume trailing zero with
    string bindata's.  Thanks Lonnie Abelbeck
  * Bugfix #394: Update src/compat/getentropy_linux.c in order to
    handle ENOSYS (not implemented) fallback.
    Thanks Brent Blood
  * Bugfix #395: Clarify that libidn2 dependency is for version 2.0.0
    or higher. Thanks mire3212

* 2018-03-12: Version 1.4.1
  * Bugfix #388: Prevent fallback to an earlier tried upstream within a
    single query.  Thanks Robert Groenenberg
  * PR #387: Compile with OpenSSL with deprecated APIs disabled.
    Thanks Rosen Penev
  * PR #386: UDP failover improvements:
    - When all UDP upstreams fail, retry them (more or less) equally
    - Limit maximum UDP backoff (default to 1000)
      This is configurable with the --with-max-udp-backoff configure
      option.
    Thanks Robert Groenenberg
  * Bugfix: Find zonecut with DS queries (instead of SOA queries).
    Thanks Elmer Lastdrager
  * Bugfix #385: Verifying insecure NODATA answers (broken since 1.2.1).
    Thanks hanvinke
  * PR #384: Fix minor spelling and formatting.  Thanks dkg.
  * Bugfix #382: Parallel install of getdns_query and getdns_server_mon

* 2018-02-21: Version 1.4.0
  * .so revision bump to please fedora packaging system.
    Thanks Paul Wouters
  * Specify the supported curves with getdns_context_set_tls_curves_list()
    An upstream specific list of supported curves may also be given
    with the tls_curves_list setting in the upstream dict with
    getdns_context_set_upstream_recursive_servers()
  * New tool getdns_server_mon for checking upstream recursive
    resolver's capabilities.
  * Improved handling of opportunistic back-off.  If other transports
    are working, don't forcibly promote failed upstreams just wait for
    the re-try timer.
  * Hostname authentication with libressl
    Thanks Norbert Copones
  * Security bugfix in response to CVE-2017-15105.  Although getdns was
    not vulnerable for this specific issue, as a precaution code has been
    adapted so that signatures of DNSKEYs, DSs, NSECs and NSEC3s can not
    be wildcard expansions when used with DNSSEC proofs.  Only direct
    queries for those types are allowed to be wildcard expansions.
  * Bugfix PR#379: Miscellaneous double free or corruption, and corrupted
    memory double linked list detected issue, with serving functionality.
    Thanks maddie and Bruno Pagani
  * Security Bugfix PR#293: Check sha256 pinset's
    with OpenSSL native DANE functions for OpenSSL >= 1.1.0
    with Viktor Dukhovni's danessl library for OpenSSL >= 1.0.0
    don't allow for authentication exceptions (like self-signed
    certificates) otherwise.  Thanks Viktor Dukhovni
  * libidn2 support.  Thanks Paul Wouters

* 2017-12-21: Version 1.3.0
  * Bugfix #300: Detect dnsmasq and skip unit test that fails with it.
    Thanks Tim Rohsen and Konomi Kitten
  * Specify default available cipher suites for authenticated TLS
    upstreams with getdns_context_set_tls_ciphers_list()
    An upstream specific available cipher suite may also be given
    with the tls_cipher_list setting in the upstream dict with
    getdns_context_set_upstream_recursive_servers()
  * PR #366: Add support for TLS 1.3 and Chacha20-Poly1305
    Thanks Pascal Ernster
  * Bugfix #356: Do Zero configuration DNSSEC meta queries over on the
    context configured upstreams.  Thanks Andreas Schulze
  * Report default extension settings with
    getdns_context_get_api_information()
  * Specify locations at which CA certificates for verification purposes
    are located: getdns_context_set_tls_ca_path()
    getdns_context_set_tls_ca_file()
  * getdns_context_set_resolvconf() function to initialize a context
    upstreams and suffixes with a resolv.conf file.
    getdns_context_get_resolvconf() to get the file used to initialize
    the context's upstreams and suffixes.
    getdns_context_set_hosts() function to initialize a context's
    LOCALNAMES namespace.
    getdns_context_get_hosts() function to get the file used to initialize
    the context's LOCALNAMES namespace.
  * get which version of OpenSSL was used at build time and at run time
    when available with getdns_context_get_api_information()
  * GETDNS_RETURN_IO_ERROR return error code
  * Bugfix #359: edns_client_subnet_private should set family
    Thanks Daniel Areiza & Andreas Schulze
  * Bugfix getdnsapi/stubby#34: Segfault issue with native DNSSEC
    validation.  Thanks Bruno Pagani

* 2017-11-11: Version 1.2.1
  * Handle more I/O error cases.  Also, when an I/O error does occur,
    never stop listening (with servers), and
    never exit (when running the built-in event loop).
  * Bugfix: Tolerate unsigned and unused RRsets in the authority section.
            Fixes DNSSEC with BIND upstream.
  * Bugfix: DNSSEC validation without support records
  * Bugfix: Validation of full recursive DNSKEY lookups
  * Bugfix: Retry to validate full recursion BOGUS replies with zero
    configuration DNSSEC only when DNSSEC was actually requested
  * Bugfix #348: Fix a linking issue in stubby when libbsd is present
    Thanks Remi Gacogne
  * More robust scheduling; Eliminating a segfault with long running
    applications.
  * Miscellaneous Windows portability fixes from Jim Hague.
  * Fix Makefile dependencies for parallel install.
    Thanks ilovezfs

* 2017-09-29: Version 1.2.0
  * Bugfix of rc1: authentication of first query with TLS
    Thanks Travis Burtrum
  * A function to set the location for library specific data,
    like trust-anchors: getdns_context_set_appdata().
  * Zero configuration DNSSEC - build upon the scheme
    described in RFC7958.  The URL from which to fetch
    the trust anchor, the verification CA and email
    can be set with the new getdns_context_set_trust_anchor_url(),
    getdns_context_set_trust_anchor_verify_CA() and
    getdns_context_set_trust_anchor_verify_email() functions.
    The default values are to fetch from IANA and to validate
    with the ICANN CA.
  * Update of Stubby with yaml configuration file and
    logging from a certain severity support.
  * Fix tpkg exit status on test failure. Thanks Jim Hague.
  * Refined logging levels for upstream statistics
  * Reuse (best behaving) backed-off TLS upstreams when none are usable.
  * Let TLS upstreams back-off an incremental amount of time.
    Back-off time starts with 1 second and is doubled each failure, but
    will not exceed the time given by getdns_context_set_tls_backoff_time()
  * Make TLS upstream management more resilient to temporary outages
    (like laptop sleeps)
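
The TLS back-off rule described in the 1.2.0 notes above (start at 1 second, double after each failure, never exceed a configured maximum) can be sketched as follows. This is only an illustration of the rule as worded in the changelog, not getdns code: the function name backoff_delays and the 20 second cap are hypothetical.

```python
def backoff_delays(failures, max_backoff):
    """Yield the wait in seconds before each retry, doubling up to a cap."""
    delay = 1  # the changelog says back-off starts at 1 second
    for _ in range(failures):
        # Never wait longer than the configured maximum.
        yield min(delay, max_backoff)
        delay *= 2  # doubled after each failure


print(list(backoff_delays(6, 20)))  # [1, 2, 4, 8, 16, 20]
```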

* 2017-09-04: Version 1.1.3
  * Small bugfixes that came out of static analysis
  * No annotations with the output of getdns_query anymore,
    unless -V option is given to increase verbosity
    Thanks Ollivier Robert
  * getdns_query will now exit with failure status if replies are BOGUS
  * Bugfix: dnssec_return_validation_chain now also works when fallback
    to full recursion was needed with dnssec_roadblock_avoidance
  * More clear build instructions from Paul Hoffman.  Thanks.
  * Bugfix #320.1: Eliminate multiple closing of file descriptors
    Thanks Neil Cook
  * Bugfix #320.2: Array bounds bug in upstream_select
    Thanks Neil Cook
  * Bugfix #318: getdnsapi/getdns/README.md links to nonexistent wiki
    pages.  Thanks James Raftery
  * Bugfix #322: MacOS 10.10 (Yosemite) provides TCP fastopen interface
    but does not have it implemented.  Thanks Joel Purra
  * Compile without Stubby by default.  Stubby now has a git repository
    of its own.  The new Stubby repository is added as a submodule.
    Stubby will still be built alongside getdns with the --with-stubby
    configure option.

* 2017-07-03: Version 1.1.2
  * Bugfix for parallel make install
  * Bugfix to trigger event callbacks on socket errors
  * A getdns_context_set_logfunc() function with which one may
    register a callback log function for certain library subsystems
    at certain levels.  Currently this can only be used for
    the upstream statistics subsystem.

* 2017-06-15: Version 1.1.1
  * Bugfix #306 hanging/segfaulting on certain (IPv6) upstream failures
  * Spelling fix s/receive/receive.  Thanks Andreas Schulze.
  * Added stubby-setdns-macos.sh script to support Homebrew formula
  * Include stubby.conf in the distribution tarball
  * Bugfix #286 reschedule reused listening addresses
  * Bugfix #166 Allow parallel builds and unit-tests
  * NSAP-PTR, EID and NIMLOC, TALINK, AVC support
  * Bugfix of TA RR type
  * OPENPGPKEY and SMIMEA support
  * Bugfix TAG rdata type presentation format for CAA RR type
  * Bugfix Zero sized gateways with IPSECKEY gateway_type 0
  * Guidance for integration with systemd
  * Also check for memory leaks with advanced server capabilities.
  * Bugfix convert IP string to IP dict with getdns_str2dict() directly.

ok'ed by root@zta.lk