
Add a new 'workload name' label override #3374

Merged Dec 16, 2024 (2 commits)

Conversation

howardjohn
Member

Today, we have a "workload name" concept that ends up in metrics.
This is defaulted to deriving from deployment metadata (Deployment name,
etc).

This approach is not perfect. We cannot always derive an appropriate
name (for instance, a Pod can be created directly!). WorkloadEntry,
which is often equivalent to a Pod, also uses the WE name -- generally
we would want a higher-order name there.

To fix this, I propose we add a label to explicitly specify the workload
name. This format mirrors the existing canonical-{service,revision}
label.
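For illustration, a minimal sketch of how the proposed label might be set on a Pod. The pod, container, and image names here are hypothetical; only the label key comes from this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-workload-j871d2-da88a   # unstable, generated name
  labels:
    # Explicit override: metrics report this stable name
    # instead of a name derived (or not derivable) from owners.
    service.istio.io/workload-name: my-workload
spec:
  containers:
    - name: app
      image: example.com/app:latest
```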

@howardjohn howardjohn requested a review from a team as a code owner December 2, 2024 18:48
@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label Dec 2, 2024
@istio-policy-bot

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 2, 2024
@@ -49,6 +49,17 @@ labels:
resources:
- Pod

- name: service.istio.io/workload-name
Contributor


Need to think about this more, but we probably need a warning to advise users to consider label cardinality

Member Author


Probably so; the same applies to canonical-{service,revision}, which I can add. Interestingly, usage of this label is often meant to reduce cardinality (where we do not correctly pick an abstract-enough name and end up with a random suffix in the default derivation).

Member

@linsun linsun Dec 16, 2024


@howardjohn any action item from @keithmattix's comment? I wasn't sure from your comment whether the warning should be added.

@costinm
Contributor

costinm commented Dec 2, 2024 via email

@howardjohn
Member Author

> Not sure what is the problem - yes, a pod can be created directly and like WE it has a name.
> Why would a user set a label that is different than the WE name?

My pod/workload entry name is foobar-j871d2-da88a. Workload name is a metric label. Prometheus explodes.

We have heuristics to prevent this, but they are only that -- heuristics. And there is no way (unless this PR lands) to override the heuristics. The intent here is to provide a reliable way to get sane behavior when the heuristic fails.
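A rough sketch of the precedence this PR enables. This is not Istio's actual implementation; the function and parameter names are mine, and only the label key comes from the PR:

```go
package main

import "fmt"

// deriveWorkloadName sketches the derivation order: an explicit label
// override wins, then a derived "parent" (e.g. Deployment) name, and only
// as a last resort the unstable Pod/WorkloadEntry name.
func deriveWorkloadName(labels map[string]string, parentName, objectName string) string {
	if v := labels["service.istio.io/workload-name"]; v != "" {
		return v
	}
	if parentName != "" {
		return parentName
	}
	return objectName
}

func main() {
	// Without an override or a derivable parent, the generated name leaks into metrics.
	fmt.Println(deriveWorkloadName(nil, "", "foobar-j871d2-da88a"))
	// The label pins a stable, low-cardinality name.
	fmt.Println(deriveWorkloadName(map[string]string{
		"service.istio.io/workload-name": "foobar",
	}, "", "foobar-j871d2-da88a"))
}
```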

> Having a separate resource with a selector is better

I disagree:

  • I haven't seen any data/anecdotes/etc. indicating this, aside from you saying it, which doesn't match my intuition on what users prefer
  • It's >10x harder to implement that way
  • It's inconsistent with how the other fields, which have extremely similar functionality, are implemented

> using the OTel mechanisms instead of inventing our own would likely be better too.

There is not one that I know of.

@hzxuzhonghu
Member

Though it may provide some convenience, over-relying on annotations and labels is not good UX. Istio may already have hundreds of labels + annotations. That's a horrible thing.

For this particular case, I would think if no Deployment or Job is found, we can fall back to the service name. In most cases, they may have the same name.

@howardjohn
Member Author

> For this particular case, I would think if no Deployment or Job is found, we can fall back to the service name. In most cases, they may have the same name.

This is injected into the pod, so getting the service name is impossible.

@costinm
Contributor

costinm commented Dec 10, 2024 via email

@howardjohn
Member Author

> And if it doesn't match a service - it is also fine to just not set that particular dimension - we are not breaking anything since the attribute doesn't exist yet.

The attribute does exist. I am not proposing adding a new attribute; I would be strongly against that. The problem is that users today get a label set to the WorkloadEntry/Pod name, or something else that is dynamic, which is extremely unsafe. This is trivial to reproduce: try to use a WorkloadEntry and you will see its name show up in the Prometheus labels.

> Getting the service name is not impossible - pod has labels, service has label selectors.

Even if we hypothetically could, this attribute is derived at startup/injection time, so it's practically infeasible. Not to mention the value is the workload, which is intentionally distinct from "service" as a concept in Istio.

@louiscryan
Contributor

It's pretty clear that OTel can't be a solution for this issue, since it can only operate on data that is in-band to the pipeline, and an alternate semantic name could never be.

The proposal here mimics what kubernetes has done for itself with the app.kubernetes.io/name label per https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/

Could we use that instead?

@howardjohn
Member Author

> The proposal here mimics what kubernetes has done for itself with the app.kubernetes.io/name label per https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/

We currently have these labels in our metadata:

  • workload: what this PR is about. Typically this is the Deployment name, but it is a heuristic. Falls back to WE name or Pod name if we cannot derive a "parent" resource.
  • app: Logical "application" name. Made obsolete by 'canonical-service'.
  • canonical_service: service.istio.io/canonical-name | app.kubernetes.io/name | app | workload name. Meant to be the new, more modern way to reference an "application".
  • version: Logical "application" version. Made obsolete by 'canonical-revision'.
  • canonical_revision: service.istio.io/canonical-revision | app.kubernetes.io/version | version | "latest". Meant to be the more modern way to reference a "version".
  • workload_namespace: namespace
  • principal: spiffe
  • cluster: cluster
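The precedence chains in the list above can be sketched as follows. This is an illustration only; the helper names are mine, not Istio's, and only the label keys and fallbacks come from the list:

```go
package main

import "fmt"

// firstNonEmpty mirrors the "a | b | c | fallback" precedence notation
// used in the list above: the first non-empty value wins.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// canonicalService sketches the documented derivation, ending in the
// workload name as the final fallback.
func canonicalService(labels map[string]string, workloadName string) string {
	return firstNonEmpty(
		labels["service.istio.io/canonical-name"],
		labels["app.kubernetes.io/name"],
		labels["app"],
		workloadName,
	)
}

// canonicalRevision falls back to the literal "latest".
func canonicalRevision(labels map[string]string) string {
	return firstNonEmpty(
		labels["service.istio.io/canonical-revision"],
		labels["app.kubernetes.io/version"],
		labels["version"],
		"latest",
	)
}

func main() {
	l := map[string]string{"app": "shop"}
	fmt.Println(canonicalService(l, "shop-v1")) // legacy "app" label wins over workload name
	fmt.Println(canonicalRevision(l))           // no version labels set, so "latest"
}
```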

So effectively we have 2 concepts:

  • Application - a name + a version, encoded as a legacy and modern label
  • Workload - an infrastructure thing, not the logical application

One may argue we should just have one concept, application. Maybe so, but IMO it's pretty much too late to change it.

Notably, if we used app.kubernetes.io/name we would be making a backwards-incompatible change - users would no longer have a distinction between "application" and "workload". We know users are using this label as well, so they would unexpectedly get this new behavior on upgrade.

Using a new label doesn't have this issue since (1) it's clearly scoped to just the workload (app.kubernetes.io/name impacts canonical service) and (2) it's a new label, so no one is using it yet.

@howardjohn
Member Author

The expectation here is that a user will set this to point to the object type of the highest level of workload orchestration they have.

This could be a WorkloadGroup, a Deployment, an ArgoCD application, etc.

In many of these cases we can auto-detect it. In fact, with WorkloadGroup we could (there is a "TODO" in the codebase to do this, which I will also address). But not all -- sometimes Pods and WorkloadEntries are created in such a way that there is no way to determine their parent automatically. In these cases, it is imperative that we give users a way to avoid placing an unstable ID into a Prometheus label.
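For example, a directly created WorkloadEntry could opt into a stable name like so (the resource name, label value, and address are hypothetical; only the label key comes from this PR):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  name: vm-us-east-7f3a9          # unstable per-instance name
  labels:
    # Without this override, the WE name above would land in
    # Prometheus labels as the workload name.
    service.istio.io/workload-name: payments-vm
spec:
  address: 10.0.0.12
```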

@louiscryan
Contributor

I generally agree with what's said above.

We use the ability of Deployment to copy labels down onto Pods to handle the "provide a semantic name" thing, and we support a plethora of labels via that mechanism to align with common usage patterns.

WorkloadGroup plays the same role as Deployment here, but we don't have an auto-copying mechanism, and WorkloadEntry is allowed to exist without a WorkloadGroup. Assuming we keep using the same label definitions and derivation mechanisms, I see no reason not to support reading those labels off WorkloadEntry just as we would for a directly scheduled Job or Pod.

@costinm
Contributor

costinm commented Dec 14, 2024 via email

@costinm
Contributor

costinm commented Dec 14, 2024 via email

@howardjohn
Member Author

howardjohn commented Dec 14, 2024 via email

howardjohn added a commit to howardjohn/istio that referenced this pull request Dec 16, 2024
@howardjohn
Member Author

Here is the implementation: istio/istio#54340.

WorkloadName is used in 3 places:

  1. Bootstrap for envoy, so the proxy knows how to emit source_workload and send it over metadata exchange
  2. WDS for Ztunnel, so it knows how to emit {source,destination}_workload
  3. EDS for envoy proxies, so it knows how to emit destination_workload as a fallback if metadata exchange does not have it

> Can we set it to the workload group instead of workload entry?

The above PR adds this. It sort of already exists, because istioctl will read the WorkloadGroup and generate a bootstrap config that has ISTIO_META_WORKLOAD_NAME, which hits case (1) and sends it over MX. This doesn't work with HBONE, though. The above PR makes sure all 3 cases fall back to the WorkloadGroup name.

@costinm
Contributor

costinm commented Dec 16, 2024 via email

@costinm
Contributor

costinm commented Dec 16, 2024 via email

Member

@linsun linsun left a comment


lgtm - I don't see unresolved outstanding comments, except Keith's, which I am not exactly sure needs any resolution.

Contributor

@costinm costinm left a comment


No concerns with having the label itself; we have plenty of others.

@istio-testing istio-testing merged commit 8fb86e9 into istio:master Dec 16, 2024
5 checks passed
howardjohn added a commit to howardjohn/istio that referenced this pull request Dec 17, 2024
istio-testing pushed a commit to istio/istio that referenced this pull request Dec 18, 2024
* Implement new workload-name label

Implementation for istio/api#3374

* add release note