Hi 👋
Title: Perf issue with c-ares DNS resolver
Description:
The c-ares DNS resolver calls getifaddrs, which can be slow or CPU intensive (example #19717). If Envoy has many DNS clusters, the performance hit can be considerable.
Repro steps:
This issue happened for us with the following scenario:
- A server runs the latest Envoy version
- The server has 700+ network namespaces
- The XDS server tries to push between 500 and 2k DNS clusters to Envoy
As a result, after connecting to the XDS server, the main thread is saturated initializing c-ares DNS resolvers. For us, it causes high CPU usage and disconnection-reconnection loops to the XDS server, probably because the main thread is busy listing interfaces instead of responding to keep-alives.
Root cause:
Because there are many network namespaces on our server, getifaddrs is CPU intensive.
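To check how expensive the call is on a given host, a minimal timing sketch could look like the following. The `GetifaddrsSample` type and `timeGetifaddrs` helper are hypothetical names made up for illustration; only `getifaddrs` and `freeifaddrs` are real libc functions. It measures wall-clock time for a single call, not CPU time.

```cpp
#include <ifaddrs.h>
#include <sys/types.h>

#include <chrono>
#include <cstddef>

// Hypothetical result type for one timed getifaddrs() call.
struct GetifaddrsSample {
  bool ok;                            // false if getifaddrs() failed
  std::size_t entries;                // number of list entries returned
  std::chrono::microseconds elapsed;  // wall-clock time spent in the call
};

// Times a single getifaddrs() call and counts the entries it returned.
inline GetifaddrsSample timeGetifaddrs() {
  ifaddrs* head = nullptr;
  const auto start = std::chrono::steady_clock::now();
  const int rc = getifaddrs(&head);
  const auto stop = std::chrono::steady_clock::now();

  GetifaddrsSample sample{
      rc == 0, 0,
      std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
  if (rc == 0) {
    for (const ifaddrs* it = head; it != nullptr; it = it->ifa_next) {
      ++sample.entries;
    }
    freeifaddrs(head);
  }
  return sample;
}
```

Multiplying `elapsed` by the number of DNS clusters being pushed gives a rough estimate of the time the main thread would spend listing interfaces, assuming one call per resolver.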
It seems that the c-ares DNS resolver lists the network interfaces of the machine to satisfy filter_unroutable_families (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/network/dns_resolver/cares/v3/cares_dns_resolver.proto#extensions-network-dns-resolver-cares-v3-caresdnsresolverconfig):
(bool) The resolver will query available network interfaces and determine if there are no available interfaces for a given IP family. It will then filter these addresses from the results it presents. e.g., if there are no available IPv4 network interfaces, the resolver will not provide IPv4 addresses.
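The documented behavior can be illustrated with a small sketch. This is not Envoy's implementation; `ResolvedAddress` and `filterUnroutable` are hypothetical names, and the sketch assumes the per-family availability flags have already been computed from the interface list.

```cpp
#include <sys/socket.h>

#include <string>
#include <vector>

// Hypothetical resolved address: the IP family plus its textual form.
struct ResolvedAddress {
  int family;  // AF_INET or AF_INET6
  std::string text;
};

// Drops addresses whose family has no available interface.
inline std::vector<ResolvedAddress> filterUnroutable(
    const std::vector<ResolvedAddress>& in, bool has_v4, bool has_v6) {
  std::vector<ResolvedAddress> out;
  for (const auto& addr : in) {
    if (addr.family == AF_INET && !has_v4) continue;
    if (addr.family == AF_INET6 && !has_v6) continue;
    out.push_back(addr);
  }
  return out;
}
```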
Suggestion:
No matter the value of filter_unroutable_families, interfaces are listed anyway. I would suggest to:
- Either only list interfaces whenever filter_unroutable_families is true (early return here https://github.com/envoyproxy/envoy/blob/main/source/extensions/network/dns_resolver/cares/dns_impl.cc#L508)
- Or only list interfaces once and re-use the result across different DNS resolvers
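A minimal sketch of both options, written against plain `getifaddrs` rather than Envoy's actual code. All names here (`AvailableFamilies`, `listAvailableFamilies`, `cachedAvailableFamilies`, `familiesForResolver`) are hypothetical and do not exist in Envoy; the sketch also assumes that "both families available" is the right answer when filtering is disabled or the listing fails.

```cpp
#include <ifaddrs.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical summary: which IP families have at least one interface address.
struct AvailableFamilies {
  bool v4;
  bool v6;
};

// Walks the interface list once. On failure, reports both families as
// available so that nothing gets filtered (an assumption of this sketch).
inline AvailableFamilies listAvailableFamilies() {
  ifaddrs* head = nullptr;
  if (getifaddrs(&head) != 0) {
    return AvailableFamilies{true, true};
  }
  AvailableFamilies result{false, false};
  for (const ifaddrs* it = head; it != nullptr; it = it->ifa_next) {
    if (it->ifa_addr == nullptr) continue;
    if (it->ifa_addr->sa_family == AF_INET) result.v4 = true;
    if (it->ifa_addr->sa_family == AF_INET6) result.v6 = true;
  }
  freeifaddrs(head);
  return result;
}

// Second option: list interfaces once and share the result. A function-local
// static is initialized exactly once, thread-safely, since C++11.
inline const AvailableFamilies& cachedAvailableFamilies() {
  static const AvailableFamilies cached = listAvailableFamilies();
  return cached;
}

// First option: skip the listing entirely when filtering is disabled.
inline AvailableFamilies familiesForResolver(bool filter_unroutable_families) {
  if (!filter_unroutable_families) {
    return AvailableFamilies{true, true};  // early return, no getifaddrs()
  }
  return cachedAvailableFamilies();
}
```

The trade-off of caching is that interfaces added or removed after the first call are not noticed; a real change would probably need some way to refresh the cached result.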
Other ideas are welcome. I'm happy to provide a PR too.