hostDNS incompatibility #9143

Closed
jhogendorn opened this issue Aug 9, 2024 · 2 comments · Fixed by #9179

@jhogendorn

Bug Report

Description

I was having issues applying the config: as soon as I applied it, the machine would lose network connectivity. Justin Garrison helped me in Slack, and we set hostDNS: disabled to resolve it, but I'm filing this bug in case further investigation is needed. The upstream DNS servers are .2 (CoreDNS) and .4 (AdGuard). It seems the cache system has some incompatibility with them. There are no errors in the upstream DNS logs.

Logs

support.zip

Environment

  • Talos version: 1.7.5
  • Platform: proxmox 8.2.0
@DmitriyMV DmitriyMV self-assigned this Aug 11, 2024
@smira
Member

smira commented Aug 12, 2024

Unfortunately (it's a bug), the support bundle doesn't contain the log for dns-resolve-cache, which is the one that would hold the clue to the problem.

If you could reproduce the issue and grab talosctl logs dns-resolve-cache, that would be perfect. Thank you!

smira added a commit to smira/go-talos-support that referenced this issue Aug 12, 2024
See siderolabs/talos#9143

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
@jhogendorn
Author

I went to replicate this using the existing bootstrapped cluster, and it worked with hostDNS: true.
I have changed some settings in my CoreDNS instance in the meantime, so that could be a confounding factor.
I made a new Talos cluster config and spun up an instance to apply-config to, and it also worked with hostDNS: true.

The change I think is relevant in the local DNS servers is that previously CoreDNS (.2), the first DNS server in the DHCP list, was not configured to forward requests it didn't match to any upstream, and it now is. This was OK before because clients would get a miss on .2, then try .4 and get a hit. Perhaps some config in the Talos image is unable to do the same?

To test this, I disabled the upstream DNS config in CoreDNS and retried with the new cluster, and got the same failure as before.

I've attached the requested log file.

dns-resolve-cache.log

The CoreDNS configuration difference is just having a block similar to:

. { # this upstream forward rule makes talos dns work
    forward . dns://192.168.0.4
    forward . dns://1.1.1.1
}

DmitriyMV added a commit to DmitriyMV/talos that referenced this issue Aug 14, 2024
Do not return response to the client if we got SERVFAIL or REFUSED,
until we run out of upstreams.

Fixes siderolabs#9143

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
smira pushed a commit to smira/talos that referenced this issue Sep 25, 2024
Do not return response to the client if we got SERVFAIL or REFUSED,
until we run out of upstreams.

Fixes siderolabs#9143

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
(cherry picked from commit a5bd770)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2024