docs(MADR): auto-generate MeshService on Universal post-tags#16507

Open
Automaat wants to merge 7 commits into master from ms-gen

Conversation

@Automaat
Contributor

@Automaat Automaat commented May 8, 2026

Motivation

Capture design discussion for auto-generating MeshService on Universal mode
after planned removal of Dataplane inbound tags in Kuma 3.0. A field report
on a downstream project surfaced two unmet needs: custom Dataplane labels
do not propagate to MeshService.metadata.labels (breaking MMZS selection on
team/env), and a subset of operators (ECS/Fargate behind restricted networks)
cannot reach the zone CP REST API at all.

The MADR documents the conflict between two in-flight changes (remove
auto-generation on Universal, remove inbound tags) and the M:M
Workload-to-MeshService cases (port carve-out, blue/green aggregation) that
constrain the design space.

Implementation information

Documentation only. Presents four options with tradeoffs:

  • A: kuma-dp run --meshservice-template
  • B: operator-authored MeshService, no auto-generation
  • C: workload-only auto-generation
  • D: structured per-inbound meshServices field

Plus a tactical label-propagation patch that ships independently and closes
the field report. The MADR adds a release timeline (2.14 opt-in via
inboundTagsDisabled, 3.0 default), single conflict-resolution policy,
required failure-mode handling table, performance budget, and exit criteria
scoped to the 2.14 cycle.

Supporting documentation

  • Existing Workload generator: pkg/core/resources/apis/workload/
  • Existing MeshService generator: pkg/core/resources/apis/meshservice/generate/
  • Technical Story link inline in the MADR

Changelog: skip

Capture design discussion for auto-generating MeshService on
Universal after planned removal of Dataplane inbound tags in
Kuma 3.0.

Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
@Automaat Automaat added the ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) label May 8, 2026
@Automaat Automaat changed the title docs(MADR): MeshService autogen on Universal docs(MADR): auto-generate MeshService on Universal post-tags May 8, 2026
Mesh operator authors MeshService directly; CP no longer
auto-generates on Universal. Tactical label-propagation patch
still ships in 2.14.

Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
@github-actions
Contributor

Reviewer Checklist

🔍 Each of these sections needs to be checked by the reviewer of the PR 🔍:
If something doesn't apply, please check the box and add a justification if the reason is non-obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It links to relevant issues as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry)
  • IPv6 is taken into account (e.g. no string concatenation of host port)
  • Tests (Unit test, E2E tests, manual test on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label)

Automaat added 4 commits May 11, 2026 09:47
Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>

On Universal, the CP auto-generates `MeshService` from `Dataplane` inbound tags (`pkg/core/resources/apis/meshservice/generate/generator.go`). A field report exposed two unmet needs:

- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MMZS selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
Contributor

Suggested change
- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MMZS selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MeshMultizoneService selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
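
As an illustration of the gap described in this bullet, here is a minimal Go sketch of label propagation and label-based selection. It is not the Kuma generator: `propagateLabels`, `matches`, and the rule that labels under `kuma.io/` are skipped are all assumptions made for the example.

```go
package main

import "strings"

// reservedPrefix is an assumption for this sketch: labels under this prefix
// are treated as system-managed and are never copied from the Dataplane.
const reservedPrefix = "kuma.io/"

// propagateLabels builds the label set for a generated MeshService from the
// custom Dataplane labels plus the labels the generator sets itself.
// Generator-owned labels win when both sides define the same key.
func propagateLabels(dataplaneLabels, generated map[string]string) map[string]string {
	out := make(map[string]string, len(dataplaneLabels)+len(generated))
	for k, v := range dataplaneLabels {
		if strings.HasPrefix(k, reservedPrefix) {
			continue // do not let a Dataplane override system labels
		}
		out[k] = v
	}
	for k, v := range generated {
		out[k] = v
	}
	return out
}

// matches reports whether every key/value pair of selector is present in
// labels, which is how a label selector would pick a MeshService.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}
```

With propagation in place, a selector on `team` and `env` can match the generated resource; without it, the generated label set never contains those keys and the selector matches nothing.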


* Good. Full M:M expressiveness; the multi-valued list fits port carve-out and aggregation.
* Good. The channel is the existing `Dataplane`; restricted-network operators are unblocked.
* Good. Typed and validated; typos fail at registration, not silently at MMZS.
Contributor

By typos you mean typos in the meshServices key?

* Good. The channel is the existing `Dataplane`; restricted-network operators are unblocked.
* Good. Typed and validated; typos fail at registration, not silently at MMZS.
* Good. Composes with the existing `kuma.io/workload` label and `Workload` generator.
* Good. The tactical label-propagation patch ships under it; the field report closes immediately.
Contributor

This point is vague to me; it took me some time to figure out that by "field report" you mean the report that initiated this MADR. I would reword it slightly, or explain what you mean by "label-propagation patch ships under it".

* Good. Typed and validated; typos fail at registration, not silently at MMZS.
* Good. Composes with the existing `kuma.io/workload` label and `Workload` generator.
* Good. The tactical label-propagation patch ships under it; the field report closes immediately.
* Bad. Concedes per-inbound service membership is load-bearing; that's a walk-back from "remove inbound tags entirely."
Contributor

I don't understand this point, can you expand on it?

* Good. The tactical label-propagation patch ships under it; the field report closes immediately.
* Bad. Concedes per-inbound service membership is load-bearing; that's a walk-back from "remove inbound tags entirely."
* Bad. Adds a hard-to-delete field on `Dataplane`. The polling generator and `inboundTagsDisabled` branching stay.
* Bad. `meshServices` (plural, on inbound) vs `MeshService` (resource) creates support confusion.
Contributor

I don't think this is significant enough to list as a downside.

* Bad. Concedes per-inbound service membership is load-bearing; that's a walk-back from "remove inbound tags entirely."
* Bad. Adds a hard-to-delete field on `Dataplane`. The polling generator and `inboundTagsDisabled` branching stay.
* Bad. `meshServices` (plural, on inbound) vs `MeshService` (resource) creates support confusion.
* Bad. First-DP-wins may not match operator intuition for blue/green (newest-wins).
Contributor

I would need more explanation here. What do you mean by that?
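
One possible reading of the "first-DP-wins" versus "newest-wins" bullet, sketched in Go. This is an interpretation for discussion, not the MADR's definition: the `claim` type, its fields, and both functions are hypothetical, and the policy in the MADR may differ in detail.

```go
package main

import "sort"

// claim is a hypothetical record of one Dataplane asserting a value for a
// label on a shared MeshService. None of these names come from Kuma.
type claim struct {
	Dataplane   string
	CreatedUnix int64  // creation time of the Dataplane, in Unix seconds
	Version     string // value the Dataplane wants for a "version" label
}

// byAge returns a copy of claims ordered oldest first; ties are broken by
// Dataplane name so the result does not depend on input order.
func byAge(claims []claim) []claim {
	out := append([]claim(nil), claims...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].CreatedUnix != out[j].CreatedUnix {
			return out[i].CreatedUnix < out[j].CreatedUnix
		}
		return out[i].Dataplane < out[j].Dataplane
	})
	return out
}

// firstWins keeps the value from the oldest Dataplane.
func firstWins(claims []claim) (string, bool) {
	if len(claims) == 0 {
		return "", false
	}
	return byAge(claims)[0].Version, true
}

// newestWins keeps the value from the most recently created Dataplane.
func newestWins(claims []claim) (string, bool) {
	if len(claims) == 0 {
		return "", false
	}
	ordered := byAge(claims)
	return ordered[len(ordered)-1].Version, true
}
```

Under this reading, when a green Dataplane joins a service that a blue Dataplane already backs, `firstWins` keeps reporting the blue value while an operator doing a blue/green rollout may expect the green one, which is what `newestWins` returns.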


#### Migration window behavior

A fleet in transition carries both forms. `checkMeshServicesConsistency` oscillates each tick under split fleets. The chosen option must enforce one of:
Contributor

This whole first sentence is confusing and overly complicated. "A fleet in transition"? "Oscillates each tick under split fleets"? I don't get it.

* Bad. The mitigations (idempotent first-write, ref counting, primary DP) reintroduce CP coordination.
* Bad. Broadens DP token to write a shared resource; security regression.

### Option B: operator-authored MeshService, no auto-generation
Contributor

As I understand it, this is the option you are suggesting we choose, so I would put it as the last option (D).

- `WorkloadStatus.Conditions[PortConflict|LabelConflict]` are set and cleared on every reconcile pass; stale `True` values are unacceptable and must be tested.
- Conflict signals must mirror to `DataplaneInsight` so `kuma-dp` logs surface them locally for restricted-network operators.
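
The "set and cleared on every reconcile pass" requirement in the first bullet can be sketched as below. This is a minimal illustration, not Kuma's `WorkloadStatus` type: `condition`, `reconcileConditions`, and `isTrue` are hypothetical names, and how a conflict is detected is left to the caller.

```go
package main

// condition is a hypothetical stand-in for an entry in
// WorkloadStatus.Conditions; the real type is not shown in this thread.
type condition struct {
	Type   string
	Status bool
}

const (
	portConflict  = "PortConflict"
	labelConflict = "LabelConflict"
)

// reconcileConditions rebuilds both conditions from what the current pass
// observed. Because nothing is carried over from the previous pass, a
// conflict that has been resolved is reported as false instead of lingering.
func reconcileConditions(hasPortConflict, hasLabelConflict bool) []condition {
	return []condition{
		{Type: portConflict, Status: hasPortConflict},
		{Type: labelConflict, Status: hasLabelConflict},
	}
}

// isTrue reports whether the named condition is present and true.
func isTrue(conds []condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return false
}
```

A test for the stale-value case would run two passes, the first with a conflict and the second without, and assert that the condition is false after the second pass.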

## Tactical patch (independent, ships in 2.14)
Contributor

What does "tactical patch" mean?

Comment on lines +126 to +128
## Implications for Kong Mesh

Significant in 3.0. Every downstream policy matching on `kuma.io/service` inbound tags breaks at upgrade unless migrated. The downstream project must audit policies, run the migration tool, and document the 2.14-to-3.0 upgrade.
Contributor

I don't think this point is correct. It applies equally to Kuma, and it describes requirements for mesh operators rather than how the Kong Mesh project needs to be modified or updated for the changes described in the MADR, which is what I would expect here.

Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
@Automaat Automaat marked this pull request as ready for review May 11, 2026 11:42
@Automaat Automaat requested a review from a team as a code owner May 11, 2026 11:42
@Automaat Automaat requested review from Copilot, lukidzi and slonka May 11, 2026 11:42
@Automaat Automaat added this to the 2.14.x milestone May 11, 2026
@Automaat Automaat self-assigned this May 11, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a new MADR documenting the design decision space for Universal-mode MeshService generation after inbound-tag removal (Kuma 3.0), including operator constraints (restricted networks), M:M workload↔service relationships, and an interim “tactical patch” plan.

Changes:

  • Introduces MADR 103 describing the current problem, use cases, and decision drivers around Universal MeshService generation.
  • Documents multiple design options (DP-submitted templates, workload-only generation, typed per-inbound membership field, and operator-authored services) with tradeoffs and migration considerations.
  • Captures intended release timeline, security/reliability implications, and an independently shippable label-propagation/observability patch.

Comment on lines +31 to +32
- Kuma 2.14: tag-free operation supported on Kubernetes (K8s) and Universal, opt-in via `inboundTagsDisabled`. The chosen path (Option D) ships here with a migration tool. The tactical label-propagation patch ships here.
- Kuma 3.0: tags removed by default. Downstream policies matching `kuma.io/service` break unless migrated.
Comment on lines +118 to +122
* Bad. The tactical label-propagation patch cannot ship under it.

## Tactical patch (independent, ships in 2.14)

The "tactical patch" is a small immediate change that ships ahead of the structural decision and closes the user-reported issue (the field report). It is independent of which option (A-D) is ultimately chosen.
Comment on lines +11 to +12
- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MeshMultizoneService selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
- Some operators (ECS/Fargate behind restricted networks) cannot reach the zone CP REST API. Their only channel is the `Dataplane` shipped via `kuma-dp run --dataplane-file`.
Comment on lines +133 to +134
Not generating MeshService on Universal is the cleanest solution. It removes all the ambiguities that come with MeshService generation.
It leaves full control over MeshService to the mesh operator, who can label it as needed for grouping in MeshMultizoneService.
- Kuma 2.14: tag-free operation supported on Kubernetes (K8s) and Universal, opt-in via `inboundTagsDisabled`. The chosen path (Option D) ships here with a migration tool. The tactical label-propagation patch ships here.
- Kuma 3.0: tags removed by default. Downstream policies matching `kuma.io/service` break unless migrated.

## Design
Contributor

Maybe opening a can of worms, but what if we allow applying MeshService on global with kuma.io/zone: target-zone, so it'll be synced only to target-zone? In that case users that can't reach Zone CP API can use Global CP API and have their MeshService synced.

Contributor

Another option could be a MeshServiceTemplate (on global), where the user defines a template that is resolved on the zone into MeshServices based on user-defined properties.

inbound:
  - port: 8080
    name: http
    meshServices: [checkout]
Contributor

It feels a bit like kuma.io/service but in an envelope of MeshService
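
To make the M:M relationship behind the per-inbound `meshServices` list concrete, here is a rough Go sketch that groups inbounds by service name. The struct shapes mirror the YAML fragment quoted above but are otherwise hypothetical; `groupByService` and `endpoint` are names invented for this example.

```go
package main

import "sort"

// inbound mirrors the quoted YAML fragment; it is not Kuma's Dataplane type.
type inbound struct {
	Port         uint32
	Name         string
	MeshServices []string
}

// dataplane is a hypothetical, trimmed-down Dataplane.
type dataplane struct {
	Name     string
	Inbounds []inbound
}

// endpoint is one (Dataplane, port) pair that backs a service.
type endpoint struct {
	Dataplane string
	Port      uint32
}

// groupByService returns, for each service name, the endpoints that declare
// membership in it, sorted by Dataplane name and then port.
func groupByService(dps []dataplane) map[string][]endpoint {
	out := map[string][]endpoint{}
	for _, dp := range dps {
		for _, in := range dp.Inbounds {
			for _, svc := range in.MeshServices {
				out[svc] = append(out[svc], endpoint{Dataplane: dp.Name, Port: in.Port})
			}
		}
	}
	for _, eps := range out {
		sort.Slice(eps, func(i, j int) bool {
			if eps[i].Dataplane != eps[j].Dataplane {
				return eps[i].Dataplane < eps[j].Dataplane
			}
			return eps[i].Port < eps[j].Port
		})
	}
	return out
}
```

Port carve-out shows up as one Dataplane contributing different ports to different services; aggregation shows up as several Dataplanes (for example blue and green) contributing to the same service.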

- `WorkloadStatus.Conditions[PortConflict|LabelConflict]` are set and cleared on every reconcile pass; stale `True` values are unacceptable and must be tested.
- Conflict signals must mirror to `DataplaneInsight` so `kuma-dp` logs surface them locally for restricted-network operators.

### Option D: operator-authored MeshService, no auto-generation
Contributor

It feels like this is the best and safest option, as the user has full control of the selector and won't have problems with conflicts.

Labels

ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change)

5 participants