fix(remote): use OS DNS resolver in pyqwest transport #1077
Open
EngHabu wants to merge 6 commits into
pyqwest defaults to use_system_dns=False, which routes all RPCs through
the bundled trust-dns resolver. trust-dns happily returns AAAA records
even on hosts with no usable IPv6 default route (e.g. tethered mobile
hotspots that advertise IPv6 via RA but don't actually route it). The
result is every RPC hangs and eventually fails with:
client error (Connect): dns error: proto error: io error:
No route to host (os error 65)
curl works on the same network because it uses getaddrinfo, which
honors AI_ADDRCONFIG and suppresses AAAA records when there's no v6
default route.
Setting use_system_dns=True on the HTTPTransport routes lookups through
getaddrinfo, matching curl's behavior and eliminating the spurious
EHOSTUNREACH failures on flaky/tethered networks.
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
5ecf678 to cf2cdf9
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
cf2cdf9 to 9286007
…tham/use-system-dns # Conflicts: # src/flyte/remote/_client/auth/_session.py
pingsutw approved these changes on May 23, 2026
Motivation
Every Flyte SDK RPC goes through ConnectRPC's pyqwest HTTP transport.
pyqwest.HTTPTransport defaults to use_system_dns=False, which routes
all name lookups through the bundled Rust trust-dns resolver instead
of the OS's getaddrinfo. This subtly breaks on a common real-world
network condition.
The break
On networks that advertise IPv6 via RA but have no usable v6 default
route (typical of tethered phones, hotel Wi-Fi captive portals after
handoff, some corporate VPNs, and most mobile hotspots), the bundled
resolver can return AAAA records that the kernel then refuses to route.
Every RPC hangs for the connect timeout and eventually fails with:
client error (Connect): dns error: proto error: io error:
No route to host (os error 65)
curl against the same hostname succeeds on the same machine at the
same time, because curl uses getaddrinfo, which honors the OS resolver
policy and address selection. So users see a confusing "Flyte CLI is
broken but my browser/curl works fine" report.
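To check which address families the OS resolver hands back on a given network, a small diagnostic along these lines may help. It is only a sketch and not part of this PR: the helper names resolve_with_os and count_families are hypothetical, and it assumes the platform exposes socket.AI_ADDRCONFIG, the flag the commit message refers to. How strictly that flag suppresses AAAA results varies by platform.

```python
import socket
from collections import Counter


def resolve_with_os(host: str, port: int = 443) -> list[tuple[int, str]]:
    """Resolve host through the OS resolver (getaddrinfo).

    AI_ADDRCONFIG asks the resolver to return only address families that
    are configured on this machine. The getattr fallback is a precaution
    for platforms where the constant is not defined.
    """
    flags = getattr(socket, "AI_ADDRCONFIG", 0)
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM, flags=flags)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    return [(family, sockaddr[0]) for family, _type, _proto, _canon, sockaddr in infos]


def count_families(results: list[tuple[int, str]]) -> dict[str, int]:
    """Count how many resolved addresses belong to each address family."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return dict(Counter(names.get(family, "other") for family, _addr in results))
```

Calling count_families(resolve_with_os(host)) for the affected endpoint, once on the hotspot and once on a known-good network, gives a quick view of whether the OS resolver is returning IPv6 addresses at all.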
Repro on a hotspot:
Change
Pass use_system_dns=True to pyqwest.HTTPTransport in
_build_pyqwest_client by default. This routes lookups through
getaddrinfo, matching curl's and the rest of the OS's behavior, and
eliminates the spurious EHOSTUNREACH failures on flaky / tethered
networks.
Server deployments that prefer application-owned DNS behavior can opt
back into pyqwest's bundled resolver with:
_FLYTE_USE_PYQWEST_DNS_RESOLVER=true
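The test plan names _FLYTE_USE_PYQWEST_DNS_RESOLVER=true as the opt-in. One way such a toggle might be read is sketched below. This is an illustration, not the code in this PR: it assumes the setting is an environment variable, the helper name use_system_dns is hypothetical, and the set of accepted truthy spellings is a guess.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable name taken from the test plan in this PR.
ENV_VAR = "_FLYTE_USE_PYQWEST_DNS_RESOLVER"
# Assumption: the PR does not state which spellings count as "true".
_TRUTHY = {"1", "true", "yes"}


def use_system_dns(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return the value to pass as use_system_dns.

    The default is True (OS resolver via getaddrinfo). Setting the opt-in
    variable to a truthy value flips it to False, which selects pyqwest's
    bundled resolver again.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "")
    return raw.strip().lower() not in _TRUTHY


# Hypothetical call site, mirroring the change described above:
#   transport = pyqwest.HTTPTransport(use_system_dns=use_system_dns())
```

Taking the environment as an optional argument keeps the default path and the opt-in path easy to cover in a unit test without mutating os.environ.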
No new dependencies.
Test plan
(Run.listall fails in ~3s with EHOSTUNREACH while curl to the same host succeeds)
Run.listall(limit=3) completes in ~1s
the _FLYTE_USE_PYQWEST_DNS_RESOLVER=true opt-in
uv run python -m pytest tests/flyte/remote/test_session.py
make fmt
make mypy
9286007d975bed8567e5b073108010c0ca0311a1