
consul connect: allow "cni/*" network mode #26449


Open
wants to merge 8 commits into
base: main

Member

@gulducat gulducat commented Aug 6, 2025

Currently, to use Consul Connect (service mesh), you must use "bridge" network mode.

We've kept it this way so we can be confident that Connect can actually... connect.

But it's entirely possible to configure your own CNI network (Nomad's "bridge" is just a CNI network, after all) to be compatible with Connect. We just can't make guarantees about it, because CNI is highly configurable.

This PR opens up the validation to allow "cni/*" network modes in a "use at your own risk" capacity.
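The loosened rule can be pictured as a simple prefix check. This is a minimal sketch, not Nomad's actual validation code: the helper name `connectModeAllowed` is hypothetical, and the real check may treat edge cases (such as an empty name after `cni/`) differently.

```go
package main

import (
	"fmt"
	"strings"
)

// connectModeAllowed reports whether a group network mode may be combined
// with Consul Connect under the loosened rule: "bridge" as before, plus
// any "cni/*" mode. Hypothetical helper for illustration only.
func connectModeAllowed(mode string) bool {
	return mode == "bridge" || strings.HasPrefix(mode, "cni/")
}

func main() {
	for _, mode := range []string{"bridge", "cni/my-nomad", "host", "none"} {
		fmt.Printf("%-14s allowed=%v\n", mode, connectModeAllowed(mode))
	}
}
```

Passing validation says nothing about whether the CNI network is actually configured so that Connect works; that is the "at your own risk" part.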

The CLI even presents a warning on job plan/run when your jobspec is so configured:

$ nomad plan connect-cni.nomad.hcl
...
Job Warnings:
3 warnings:

* connect sidecar: use CNI networks with Consul Connect at your own risk: group "api" uses network mode "cni/my-nomad"
* connect gateway: use CNI networks with Consul Connect at your own risk: group "ingress" uses network mode "cni/my-nomad"
* connect expose check: use CNI networks with Consul Connect at your own risk: group "api" uses network mode "cni/my-nomad"
...
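The warning lines above follow a regular shape: the Connect feature, a fixed caution, then the group and its network mode. The sketch below rebuilds that shape from the output shown here. The function `cniConnectWarning` and its parameters are assumptions made for illustration and do not come from the Nomad source.

```go
package main

import (
	"fmt"
	"strings"
)

// cniConnectWarning returns a warning in the format shown in the plan output
// when a group that uses a Connect feature runs on a "cni/*" network.
// The second return value is false when no warning applies.
// Hypothetical helper; the message format is copied from the CLI output.
func cniConnectWarning(feature, group, mode string) (string, bool) {
	if !strings.HasPrefix(mode, "cni/") {
		return "", false
	}
	msg := fmt.Sprintf(
		"%s: use CNI networks with Consul Connect at your own risk: group %q uses network mode %q",
		feature, group, mode,
	)
	return msg, true
}

func main() {
	checks := []struct{ feature, group, mode string }{
		{"connect sidecar", "api", "cni/my-nomad"},
		{"connect gateway", "ingress", "cni/my-nomad"},
		{"connect sidecar", "web", "bridge"}, // bridge mode: no warning
	}
	for _, c := range checks {
		if w, ok := cniConnectWarning(c.feature, c.group, c.mode); ok {
			fmt.Println("* " + w)
		}
	}
}
```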

Closes #8953 (from Sep 2020!)
Internal ref: NMD-585

rather than only allowing "bridge"

the cni network must be configured appropriately,
or the fancy network feature (Connect) won't work
(naturally)

so a warning is shown at job submission time
for groups with cni networks and connect tasks.
@gulducat gulducat added this to the 1.11.0 milestone Aug 6, 2025
@gulducat gulducat added the theme/consul/connect, theme/cni, and theme/docs labels Aug 6, 2025
@gulducat gulducat marked this pull request as ready for review August 6, 2025 21:20
@gulducat gulducat requested review from a team as code owners August 6, 2025 21:20
Member Author

gulducat commented Aug 7, 2025

Welp, I caused a compile error trying to fix something "real quick" and in fixing that, it seems to have broken connect+ipv6, so I'm hunting that down.


Edit: Nope, I just goofed the very thing that we are concerned people might goof when using this. I copied the ipv4-only Nomad "bridge" CNI config, so the ip6tables rules didn't get set up. 🙃

I will follow up with a second PR that updates the docs for the bridge config, so there's a full example of IPv6 there, too. Doing that separately so I can backport it, while this PR only goes to main for release with 1.11

Edit edit: docs PR for this: #26456
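The IPv4-only mistake described above is the kind of thing a small pre-flight check might catch. The sketch below assumes the CNI conflist uses `host-local`-style IPAM with a `ranges` list of subnets, as the standard CNI bridge plugin does. The struct, the function `hasIPv6Range`, and both sample configs are illustrative and not taken from this PR. An IPv6 range is at best a necessary condition: it does not prove the ip6tables rules are set up correctly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// conflist models only the fields this sketch reads from a CNI
// network configuration list. All other fields are ignored.
type conflist struct {
	Plugins []struct {
		Type string `json:"type"`
		IPAM struct {
			Ranges [][]struct {
				Subnet string `json:"subnet"`
			} `json:"ranges"`
		} `json:"ipam"`
	} `json:"plugins"`
}

// hasIPv6Range reports whether any plugin's IPAM ranges include an IPv6 subnet.
func hasIPv6Range(raw []byte) (bool, error) {
	var c conflist
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	for _, p := range c.Plugins {
		for _, set := range p.IPAM.Ranges {
			for _, r := range set {
				prefix, err := netip.ParsePrefix(r.Subnet)
				if err != nil {
					return false, fmt.Errorf("plugin %q: %w", p.Type, err)
				}
				if prefix.Addr().Is6() {
					return true, nil
				}
			}
		}
	}
	return false, nil
}

// Illustrative configs only; the subnets are made up for this example.
const ipv4OnlyConflist = `{"name":"my-nomad","plugins":[
  {"type":"loopback"},
  {"type":"bridge","ipam":{"type":"host-local","ranges":[[{"subnet":"172.26.64.0/20"}]]}}
]}`

const dualStackConflist = `{"name":"my-nomad","plugins":[
  {"type":"loopback"},
  {"type":"bridge","ipam":{"type":"host-local","ranges":[
    [{"subnet":"172.26.64.0/20"}],
    [{"subnet":"fd00:a110:c8::/120"}]
  ]}}
]}`

func main() {
	for name, raw := range map[string]string{
		"ipv4-only":  ipv4OnlyConflist,
		"dual-stack": dualStackConflist,
	} {
		ok, err := hasIPv6Range([]byte(raw))
		fmt.Printf("%s: hasIPv6Range=%v err=%v\n", name, ok, err)
	}
}
```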

strings don't "startswith" around these parts
Contributor

@aimeeu aimeeu left a comment


thanks for updating the docs! Minor nit about Consul Connect name change.

all external traffic is provided by the sidecar.
You must set the group's `network` [`mode`][] to `bridge`, or an appropriately
configured `cni/*` network, for network isolation. Using a `cni/*` network
with Consul Connect requires extra care. You may model your network
Contributor


Suggested change
- with Consul Connect requires extra care. You may model your network
+ with Consul service mesh requires extra care. You may model your network

Consul Connect became Consul service mesh 3-4 years ago. I just recently updated the Nomad docs to reflect the name change.

Member Author


whoa! TIL.

while I have you here, I'll admit I was surprised that this was the only page that mentions the restriction that I'm loosening. If you happen to know of any others, please let me know!

Contributor


Actually, I'm not surprised. Much of the Nomad content RE Consul needs refreshing (several tickets for Nomad IA phase 2). Jeff has scoped out the work, and I think Daniele is going to tackle the refresh since he is a Consul expert.

aimeeu previously approved these changes Aug 7, 2025
aimeeu previously approved these changes Aug 8, 2025
all external traffic is provided by the sidecar.
You must set the group's `network` [`mode`][] to `bridge`, or an appropriately
configured `cni/*` network, for network isolation. Using a `cni/*` network
with Consul service mesh requires extra care. You may model your network
Member


"extra care" feels like it leaves a lot to the imagination. Can we provide specific guidance or otherwise specifically discourage people from doing this here?

Member Author


honestly I hadn't thought through all the nuances to give proper guidance, and I thought we wanted to be pretty hands-off on this, which is why I'd added the warning to do it "at your own risk".

but yeah, a little more guidance is probably worthwhile.

I believe these are the high points?

  1. the alloc network should be isolated, so traffic between the main task and the sidecar can't be intercepted (because it's probably not encrypted)
  2. incoming traffic needs to be able to reach the sidecar at the IP:port which will be advertised on the service
  3. traffic needs to be able to flow from different allocs' sidecars to one another (similar, but different from # 2?)

did I get any of that wrong? am I missing anything big?

there are way more ways to screw it up than to get it right, so I hesitate to give very much more in-depth information. 😅

Member

@tgross tgross Aug 15, 2025


> the alloc network should be isolated

CNI network configs get a network namespace created by Nomad, so that part's given I think?

> incoming traffic needs to be able to reach the sidecar at the IP:port which will be advertised on the service

Where "the service" == "the sidecar service" here, but otherwise 👍

> traffic needs to be able to flow from different allocs' sidecars to one another (similar, but different from # 2?)

👍

> did I get any of that wrong? am I missing anything big?

Hm, what about tproxy mode? In that scenario, we'll create another set of iptables rules inside the network namespace (and these are controlled by the consul-cni plugin, not by Nomad directly). So if you're trying to use cni/* with tproxy you need to account for that.

Member Author


> a network namespace created by Nomad

this should be isolated by default, but surely a CNI network could be (mistakenly) crafted to pry it open

> tproxy mode

good call-out, and maybe best communicated with a new tab in the example configs here? (there's "Default" and "IPv6" there nowadays)

Member


Sounds good!

Member Author


PR documenting tproxy conflist: #26532

I'll follow up next week with a craftily worded description of the "extra care" 👍

Development

Successfully merging this pull request may close these issues.

Support Consul Service Mesh on CNI networks
3 participants