upstream: reject and gracefully handle invalid bind network namespace#45721
upstream: reject and gracefully handle invalid bind network namespace#45721bpalermo wants to merge 1 commit into
Conversation
Binding an upstream connection to a configured Linux network namespace (`SocketAddress.network_namespace_filepath`) could crash Envoy when the namespace did not exist: `createClientConnection` returned nullptr and the result was dereferenced unconditionally in `HostImplBase::createConnection`. This change addresses the crash from two angles: - A cluster whose `upstream_bind_config` source address references a network namespace that cannot be opened is now rejected at config admission time, mirroring the existing listener (LDS) behavior. This is guarded by the runtime feature `envoy.reloadable_features.reject_invalid_bind_network_namespace` so it can be reverted on a live instance. - If the namespace becomes unavailable at runtime (after admission), the upstream connection now fails gracefully instead of crashing. The host connection factory returns a null connection, and the TCP/HTTP1/HTTP2/ mixed connection pools surface this as a `FailedToCreateConnection` pool failure, matching how the HTTP/3 pool already behaves. Signed-off-by: Bruno Palermo <b@palermo.dev>
|
Hi @bpalermo, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
|
Thanks for this improvement. Would it be possible to split this PR into two to deal with different concerns: namespace validation at config ingestion time and preventing hard crash when socket can not be bound to a namespace. It will make review easier. For config validation, I do not think runtime guard is the right approach, since it is temporary in nature (removed after 6 months). I think there are scenarios where namespace might not exist at the time when Envoy parses cluster config, so NACKing it would be wrong at this point. But may exist later when traffic is flowing. Listener socket bind config is different, since it needs to be bound immediately when config is ingested, so it makes sense to check it right away. I wonder if this should be an bool option in the bind config. /wait-any |
|
@yanavlasov thanks for the review — that all makes sense. I've split this PR in two:
Closing this PR in favor of the two above. |
Commit Message
upstream: reject and gracefully handle invalid bind network namespace
Additional Description
Binding an upstream connection to a configured Linux network namespace
(
SocketAddress.network_namespace_filepath) could crash Envoy when the namespace didnot exist:
DispatcherImpl::createClientConnectionreturnsnullptron netns failure,and the result was dereferenced unconditionally in
HostImplBase::createConnection. Thelistener (LDS) path already rejected such configs gracefully; only the cluster (CDS)
upstream-bind path crashed.
This fixes the crash from two angles:
upstream_bind_configsource addressreferences a network namespace that cannot be opened is now rejected at config
admission time, mirroring the listener behavior. Guarded by the runtime feature
envoy.reloadable_features.reject_invalid_bind_network_namespaceso it can be revertedon a live instance.
host connection factory returns a null connection and the TCP/HTTP1/HTTP2/mixed
connection pools surface this as a
FailedToCreateConnectionpool failure — matching howthe HTTP/3 pool already behaves — instead of crashing.
Risk Level
Low — the new rejection is runtime-guarded; the runtime-safety change converts a crash into
an existing graceful failure path.
Testing
Unit tests added:
Network::Utility::validateNetworkNamespace(success/open-failure), andCDS cluster rejection with the runtime guard enabled and disabled.
Docs Changes
N/A
Release Notes
Added a bug-fix changelog fragment.
Platform Specific Features
Network namespace binding is Linux-only; the new validation and code paths are
#if defined(__linux__)-guarded.