Capture design discussion for auto-generating MeshService on Universal after planned removal of Dataplane inbound tags in Kuma 3.0. Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
Mesh operator authors MeshService directly; CP no longer auto-generates on Universal. Tactical label-propagation patch still ships in 2.14. Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
On Universal, the CP auto-generates `MeshService` from `Dataplane` inbound tags (`pkg/core/resources/apis/meshservice/generate/generator.go`). A field report exposed two unmet needs:
- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MMZS selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
- Custom `Dataplane.metadata.labels` and inbound tags do not propagate to the auto-generated `MeshService`. MeshMultizoneService selects on `MeshService.metadata.labels`, so multi-zone selection by team/env is impossible for Universal today.
* Good. Full M:M expressiveness; the multi-valued list fits port carve-out and aggregation.
* Good. The channel is the existing `Dataplane`; restricted-network operators are unblocked.
* Good. Typed and validated; typos fail at registration, not silently at MMZS.
By typos you mean typos in the meshServices key?
* Good. Composes with the existing `kuma.io/workload` label and `Workload` generator.
* Good. The tactical label-propagation patch ships under it; the field report closes immediately.
This point is vague to me; it took me some time to figure out that by "field report" you mean the report that initiated this MADR. I would reword it slightly, or explain what you mean by "label-propagation patch ships under it".
* Bad. Concedes per-inbound service membership is load-bearing; that's a walk-back from "remove inbound tags entirely."
I don't understand this point, can you expand on it?
* Bad. Adds a hard-to-delete field on `Dataplane`. The polling generator and `inboundTagsDisabled` branching stay.
* Bad. `meshServices` (plural, on inbound) vs `MeshService` (resource) creates support confusion.
I don't think it is significant enough to list as a downside.
* Bad. First-DP-wins may not match operator intuition for blue/green (newest-wins).
I would need more explanation here; what do you mean by that?
#### Migration window behavior

A fleet in transition carries both forms. `checkMeshServicesConsistency` oscillates each tick under split fleets. The chosen option must enforce one of:
This whole first sentence is confusing and overly complicated. "A fleet in transition"? "Oscillates each tick under split fleets"? I don't get it.
* Bad. The mitigations (idempotent first-write, ref counting, primary DP) reintroduce CP coordination.
* Bad. Broadens DP token to write a shared resource; security regression.
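The coordination cost becomes concrete with the ref-counting mitigation. The CP has to hold shared state keyed by MeshService name and update it as Dataplanes register and leave. Below is a minimal sketch; `refCounter`, `Register`, and `Deregister` are hypothetical names, not Kuma APIs.

```go
package main

import "sync"

// refCounter is a hypothetical CP-side registry tracking which Dataplanes
// reference each template-submitted MeshService.
type refCounter struct {
	mu   sync.Mutex
	refs map[string]map[string]struct{} // MeshService name -> set of Dataplane names
}

func newRefCounter() *refCounter {
	return &refCounter{refs: make(map[string]map[string]struct{})}
}

// Register records that dp references svc. It returns true only for the
// first reference, i.e. when the MeshService would have to be created.
// Registering the same Dataplane twice is idempotent.
func (r *refCounter) Register(svc, dp string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.refs[svc]
	if !ok {
		set = make(map[string]struct{})
		r.refs[svc] = set
	}
	set[dp] = struct{}{}
	return !ok
}

// Deregister removes dp's reference. It returns true when the last
// reference is gone, i.e. when the MeshService would have to be deleted.
func (r *refCounter) Deregister(svc, dp string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.refs[svc]
	if !ok {
		return false
	}
	delete(set, dp)
	if len(set) == 0 {
		delete(r.refs, svc)
		return true
	}
	return false
}
```

This state lives in one process's memory. With several zone CP replicas, it would presumably need to be persisted or owned by a leader, which is the kind of coordination the bullet refers to.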
### Option B: operator-authored MeshService, no auto-generation
As I understand it, this is the option you are suggesting we choose, so I would put it as the last option (D).
- `WorkloadStatus.Conditions[PortConflict|LabelConflict]` are set and cleared on every reconcile pass; stale `True` values are unacceptable and must be tested.
- Conflict signals must mirror to `DataplaneInsight` so `kuma-dp` logs surface them locally for restricted-network operators.
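Setting and clearing conditions on every pass amounts to deriving them from the conflicts detected in that pass, rather than only ever setting them. Below is a sketch with a simplified, hypothetical `Condition` type; the real `WorkloadStatus` shape is not shown in this PR.

```go
package main

import (
	"sort"
	"strings"
)

// Condition is a simplified stand-in for a status condition.
type Condition struct {
	Type   string
	Status bool
	Reason string
}

// conditionFor builds one condition from this pass's conflicts only.
func conditionFor(condType string, conflicts []string) Condition {
	if len(conflicts) == 0 {
		return Condition{Type: condType, Status: false, Reason: "NoConflict"}
	}
	// Sort a copy so the reason string is stable across passes.
	sorted := append([]string(nil), conflicts...)
	sort.Strings(sorted)
	return Condition{Type: condType, Status: true, Reason: strings.Join(sorted, ",")}
}

// reconcileConditions always emits every known condition type, so a
// conflict that disappeared flips back to false instead of staying stale.
func reconcileConditions(portConflicts, labelConflicts []string) []Condition {
	return []Condition{
		conditionFor("PortConflict", portConflicts),
		conditionFor("LabelConflict", labelConflicts),
	}
}
```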
## Tactical patch (independent, ships in 2.14)
What does "tactical patch" mean?
## Implications for Kong Mesh

Significant in 3.0. Every downstream policy matching on `kuma.io/service` inbound tags breaks at upgrade unless migrated. The downstream project must audit policies, run the migration tool, and document the 2.14-to-3.0 upgrade.
I don't think this point is correct: it applies equally to Kuma, and it describes requirements for mesh operators. What I would expect here is how the Kong Mesh project itself needs to be modified or updated for the changes described in this MADR.
Pull request overview
Adds a new MADR documenting the design decision space for Universal-mode MeshService generation after inbound-tag removal (Kuma 3.0), including operator constraints (restricted networks), M:M workload↔service relationships, and an interim “tactical patch” plan.
Changes:
- Introduces MADR 103 describing the current problem, use cases, and decision drivers around Universal `MeshService` generation.
- Documents multiple design options (DP-submitted templates, workload-only generation, typed per-inbound membership field, and operator-authored services) with tradeoffs and migration considerations.
- Captures intended release timeline, security/reliability implications, and an independently shippable label-propagation/observability patch.
- Kuma 2.14: tag-free operation supported on Kubernetes (K8s) and Universal, opt-in via `inboundTagsDisabled`. The chosen path (Option D) ships here with a migration tool. The tactical label-propagation patch ships here.
- Kuma 3.0: tags removed by default. Downstream policies matching `kuma.io/service` break unless migrated.
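Before the 3.0 upgrade, operators need a list of policies that still depend on `kuma.io/service`. The audit sketch below assumes policies have already been exported and flattened into a name plus the tag maps they match on. The `Policy` type and `policiesUsingServiceTag` are hypothetical; this is not the migration tool from the timeline.

```go
package main

import "sort"

const serviceTag = "kuma.io/service"

// Policy is a hypothetical, flattened view of a policy: its name and the
// tag maps it matches on.
type Policy struct {
	Name      string
	Selectors []map[string]string
}

// policiesUsingServiceTag returns, sorted, the names of policies whose
// selectors reference the kuma.io/service tag.
func policiesUsingServiceTag(policies []Policy) []string {
	var hits []string
	for _, p := range policies {
		for _, sel := range p.Selectors {
			if _, ok := sel[serviceTag]; ok {
				hits = append(hits, p.Name)
				break // one hit per policy is enough
			}
		}
	}
	sort.Strings(hits)
	return hits
}
```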
* Bad. The tactical label-propagation patch cannot ship under it.

## Tactical patch (independent, ships in 2.14)
The "tactical patch" is a small immediate change that ships ahead of the structural decision and closes the user-reported issue (the field report). It is independent of which option (A-D) is ultimately chosen.
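The label-propagation part of the patch can be pictured as a merge step in the generator. In this sketch, the function name, the choice to skip `kuma.io/`-prefixed keys, and the rule that generator-set labels are never overwritten are all assumptions. It is not the actual change to `generator.go`, and it covers only `metadata.labels`, not inbound tags.

```go
package main

import "strings"

// reservedPrefix marks labels owned by Kuma itself; this sketch assumes they
// must not be copied from user-controlled Dataplane labels.
const reservedPrefix = "kuma.io/"

// propagateLabels returns labels for a generated MeshService: the
// generator's own labels plus custom Dataplane labels that are neither
// reserved nor already set by the generator.
func propagateLabels(generated, dataplane map[string]string) map[string]string {
	out := make(map[string]string, len(generated)+len(dataplane))
	for k, v := range generated {
		out[k] = v
	}
	for k, v := range dataplane {
		if strings.HasPrefix(k, reservedPrefix) {
			continue // never let a Dataplane override Kuma-owned labels
		}
		if _, exists := out[k]; exists {
			continue // generator-set labels take precedence
		}
		out[k] = v
	}
	return out
}
```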
- Some operators (ECS/Fargate behind restricted networks) cannot reach the zone CP REST API. Their only channel is the `Dataplane` shipped via `kuma-dp run --dataplane-file`.
Not generating MeshService on Universal is the cleanest solution. It removes all the ambiguities that come with MeshService generation.
It leaves full control over MeshService to the mesh operator, who can label it as needed for grouping in MeshMultizoneService.
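With operator-authored MeshServices, grouping for MeshMultizoneService comes down to label matching on `MeshService.metadata.labels`. The sketch below assumes `matchLabels`-style equality semantics: every selector key must be present with the same value. The types are simplified stand-ins, not Kuma's API types.

```go
package main

import "sort"

// MeshService is a simplified stand-in carrying only what selection needs.
type MeshService struct {
	Name   string
	Zone   string
	Labels map[string]string
}

// matches reports whether every selector key is present in labels with the
// same value. In this function, an empty selector matches everything.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectForMultizone returns "zone/name" for each selected MeshService,
// sorted for stable output.
func selectForMultizone(selector map[string]string, services []MeshService) []string {
	var picked []string
	for _, ms := range services {
		if matches(selector, ms.Labels) {
			picked = append(picked, ms.Zone+"/"+ms.Name)
		}
	}
	sort.Strings(picked)
	return picked
}
```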
## Design
Maybe I'm opening a can of worms, but what if we allow applying a MeshService on global with `kuma.io/zone: target-zone`, so that it is synced only to the target zone? Users who can't reach the Zone CP API could then use the Global CP API and have their MeshService synced.
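The global-CP suggestion reduces to a sync filter: a MeshService created on global with a `kuma.io/zone` label is delivered only to the zone that label names. This sketch of the filter assumes, though the thread does not settle it, that MeshServices without the label are not synced to any zone through this path. `shouldSyncToZone` is a hypothetical name.

```go
package main

// zoneLabel is the existing Kuma label the suggestion reuses for targeting.
const zoneLabel = "kuma.io/zone"

// shouldSyncToZone reports whether a globally created MeshService with the
// given labels should be synced to zone. Services without the label are not
// synced by this path (an assumption of this sketch).
func shouldSyncToZone(labels map[string]string, zone string) bool {
	target, ok := labels[zoneLabel]
	return ok && target == zone
}
```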
Another option could be a `MeshServiceTemplate` (on global), where users define a template that is resolved on the zone into MeshServices based on properties they define.
inbound:
  - port: 8080
    name: http
    meshServices: [checkout]
It feels a bit like `kuma.io/service`, just wrapped in a MeshService envelope.
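The per-inbound `meshServices` field in the fragment above expresses M:M membership directly. One inbound can be listed under several MeshServices, and different inbounds of one Dataplane can be listed under different MeshServices (port carve-out). The sketch below shows how a generator might invert that into MeshService -> port names for a single Dataplane. `Inbound` and `membership` are hypothetical, simplified stand-ins for the real `Dataplane` API.

```go
package main

import "sort"

// Inbound mirrors the fields in the fragment above; it is not the real
// Dataplane inbound type.
type Inbound struct {
	Port         uint32
	Name         string
	MeshServices []string
}

// membership inverts inbound declarations into MeshService name ->
// sorted, de-duplicated inbound port names.
func membership(inbounds []Inbound) map[string][]string {
	sets := make(map[string]map[string]struct{})
	for _, in := range inbounds {
		for _, svc := range in.MeshServices {
			if sets[svc] == nil {
				sets[svc] = make(map[string]struct{})
			}
			sets[svc][in.Name] = struct{}{}
		}
	}
	out := make(map[string][]string, len(sets))
	for svc, names := range sets {
		list := make([]string, 0, len(names))
		for n := range names {
			list = append(list, n)
		}
		sort.Strings(list)
		out[svc] = list
	}
	return out
}
```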
### Option D: operator-authored MeshService, no auto-generation
It feels like this is the best and safest option, as the user has full control of the selector and won't have problems with conflicts.
Motivation
Capture design discussion for auto-generating `MeshService` on Universal mode after planned removal of
`Dataplane` inbound tags in Kuma 3.0. A field report on a downstream project surfaced two unmet needs: custom
`Dataplane` labels do not propagate to `MeshService.metadata.labels` (breaking MMZS selection on `team`/`env`),
and a subset of operators (ECS/Fargate behind restricted networks) cannot reach the zone CP REST API at all.
The MADR documents the conflict between two in-flight changes (remove
auto-generation on Universal, remove inbound tags) and the M:M
Workload-to-MeshService cases (port carve-out, blue/green aggregation) that
constrain the design space.
Implementation information
Documentation only. Presents four options with tradeoffs:
- DP-submitted templates via `kuma-dp run --meshservice-template`
- Workload-only generation
- A typed per-inbound `meshServices` field
- Operator-authored `MeshService`, no auto-generation

Plus a tactical label-propagation patch that ships independently and closes
the field report. The MADR adds a release timeline (2.14 opt-in via
`inboundTagsDisabled`, 3.0 default), a single conflict-resolution policy, a required failure-mode handling table,
a performance budget, and exit criteria scoped to the 2.14 cycle.
Supporting documentation
- `pkg/core/resources/apis/workload/`
- `pkg/core/resources/apis/meshservice/generate/`