Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Change application identifier to allow app names longer than 63 chars #6425

Merged
merged 6 commits into from
Sep 13, 2021
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
397 changes: 397 additions & 0 deletions docs/proposals/application-name-identifier.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,397 @@
---
title: Change the way application resources are identified
authors:
- "@jannfis"
sponsors:
- TBD
reviewers:
- TBD
approvers:
- TBD

creation-date: 2021-06-07
last-updated: 2021-06-07
---

# Change the way application resources are identified

This is a proposal to introduce using a hash value as application identifier
in the application instance label. This will allow application names longer
than 63 characters. As an additional goal, we propose to introduce a GUID as
installation ID that will allow multiple Argo CD instances manage resources
on the same cluster, without the need for manual reconfiguration.

## Open Questions [optional]

The major open questions are:

* Which hashing algorithm to use? The authors think that `SHA-1` and `FNV-1a`
are the best options on the table, with `FNV-1a` being the more performant
one and `SHA-1` being the one most resilient against collisions with a
compromise for speed. There is a pretty good comparison of non cryptographic
hash algorithms on Stack Exchange that suggests that `FNV-1a` might be more
than sufficient for our use-cases:
https://softwareengineering.stackexchange.com/a/145633

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: hash choice -

If we're going to go with SHA-based algorithm, I recommend SHA-256, as SHA-1 is considered broken. It doesn't matter for this use case (we're not using the cryptographic properties of a hash, as you note later), but we will constantly get people telling us it's broken and asking us when we will move to another algorithm 😄 .

Copy link
Member

@jgwest jgwest Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, correct me if I'm wrong, but I don't think hash performance is an issue: for each Application, we only need to calculate this value once, and then store it in memory (eg memoize the result of the hash function for each application).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a good argument there. I favored the SHA-1 variant for producing the shorter hash values, but I think you are right in that we may want to consider usage of SHA-256 instead. Although we do not need to be secure in cryptographic terms here, probably SHA-1 will become utterly deprecated soon enough (maybe up to the point of being removed from any standard libraries?), and even Git will move away from it. So yeah, it leaves a "bad aftertaste" kind of thing :)

Thanks for bringing it up, I will change it in the proposal accordingly.

Copy link
Collaborator

@alexmt alexmt Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we've got two similar discussion threads in this PR. Copying comments from another thread here.

As I understand there are trying to solve three issues:

  • allow app with long names
  • support two argocd managing apps in the same cluster
  • avoid argocd confusion that happened when something else adds app.kubernetes.io/instance label

Also, I think additional requirements are: try to keep it human-friendly and try avoid breaking changes.

To solve I want to propose:

  • use annotations instead of labels for tracking to solve long names issue
  • use three annotations (e.g. argo-cd.argoproj.io/application, argo-cd.argoproj.io/resourceKey, ``argo-cd.argoproj.io/installation` ) that includes app name, resource key and argocd installation id
  • allow the user to switch to annotation-based tracking when necessary using new app.spec.tracking field. (this comment tries to justify why new way of tracking should be optional)

So main changes in my proposal:

  • use annotations + human-readable strings instead of hash + labels.
  • new way of tracking should be explicitly enabled per app

@jannfis , @jgwest what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not provide two different methods for tracking applications, as it will complicate things greatly and probably will cause some confusion among users.

What speaks against using the label for tracking and the annotations for human-readable identifiers together? So the annotations will only serve informational purposes, and the label is used to select and identify resources.

To solve

avoid argocd confusion that happened when something else adds app.kubernetes.io/instance label

I would rather propose to change the default app.kubernetes.io/instance label to something more sensible, Argo CD specific. E.g. argocd.argoproj.io/instance - I think the Argo CD Helm chart already sets this by default.

IMHO, I think we shouldn't sacrifice the label in order to enable longer-than-63-chars application names.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to avoid painful migration for end-users: if Argo CD changes the tracking method then all managed resources suddenly became out-of-sync.

However, I agree that two ways if tracking is not great. In order to make it easier to migrate, we should support using "legacy" way of tracking for 1~2 releases.

* The proposal defines a stretch goal for allowing multiple Argo CD instances
to manage applications on the same cluster, without the need for changing
the _application instance label key_ of each installation. This change may
sound intrusive, but the authors think the benefits would outweigh the
disadvantages.

## Summary

Argo CD identifies resources it manages by setting the _application instance
label_ to the name of the managing `Application` on all resources that are
managed (i.e. reconciled from Git). The default label used is the well-known
label `app.kubernetes.io/instance`.

This proposal suggests to change the _value_ of this label to not use the
literal application name, but instead use a value from a stable and reasonable
collision free hash algorithm.

## Motivation

The main motivation behind this change is the Kubernetes restriction for the
maximum allowed length of label values, which no more than 63 characters.

In large scale installations, in order to build up an easy to understand and
well-formed naming schemes for applications managed by Argo CD, people often
hit the 63 character limit and need to define the naming scheme around this
unnecessary limit.

Furthermore, proposed changes such as described in
https://github.com/argoproj/argo-cd/pull/6409
would require the application names to include more implicit information
(such as the application's source namespace), which will even add more
characters to the application names.

An additional motivation - while we're at touching at application instance
label - is to improve the way how multiple Argo CD instances could manage
applications on the same cluster, without requiring the user to actually
perform instance specific configuration.

### Goals

* Allow application names of more than 63 characters

* Keep using a label to properly _select_ resources managed by an Application
using Kubernetes label selectors

* Keep having a human-readable way to identify resources that belong to a
given Argo CD application

* As a stretch-goal, allow multiple Argo CD instances to manage resources on
the same cluster without the need for configuring application instance label
key (usually `app.kubernetes.io/instance`)

### Non-Goals

* Change the default name of the application instance label

## Proposal

We propose to move from a _human-readable_ value to identify the resources of
an application to a _machine-readable_ value of fixed length. For this purpose
we propose to use a one-way hashing algorithm. The chosen algorithm should be
_collision-free_ to a certain extent, but does not require to be _secure_ in a
cryptographic evaluation.

We further propose to add an _annotation_ to the managed resources, which will
contain the plain text value of the application's name so that humans and other
consumers of the resources will still be able to identify the application that
is managing the resource. Annotations in Kubernetes don't suffer from the same
length restrictions as labels, and can easily store about every possible
application name that users might come up with.

Furthermore, we may want to consider encoding a unique, persistant identifier
of the Argo CD installation into the final hash value, e.g.

```
value = hashFunc(installationIdentifier + "." + applicationName)
Copy link
Contributor

@kshamajain99 kshamajain99 Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be that hashValue can be applicationName+"."+resourceKey. So that we know what the name of original resourceName in case when some kubernetes operator clone the resource.

By that we can also aim to solve another usecase of ignoreExtraneous resources outside target namespace

#6324

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea to bring in resourceKey somewhere. But I think if we put it into the application instance label, we will end up with a different value on each resource. This makes it utterly hard to use the label as a selector, e.g. to select all resources belonging to a given application by running kubectl get somekind -l app.kubernetes.io/instance=<somevalue>

Maybe it would make more sense to add it as an additional annotation on the resource? E.g. argo-cd.argoproj.io/resourceKey instead of trying to condense it.

But at the end, this will be a design decision: Do we want to support above use-case, e.g. using the label as a selector for resources managed by a given application, or not. I clearly see the benefits of using the label as a selector, but I don't have any strong opinions on it. We should just be aware of the consequences.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is very convenient to be compatible with kubectl get -l app.kubernetes.io/instance=<somevalue> . This works most of the time except few edge cases:

  • app name is too long
  • something else add app.kubernetes.io/instance label and confuses argo cd. This is the most annoying issue and we don't have a very good resolution.

So I think it would be great if keep using app.kubernetes.io/instance label by default and allow to switch to new way when necessary.

@jessesuen had a good point about trying to keep app mark human readable. How about we add new app-level field resourceTracking that might be label or annotation. If user select annotation that Argo CD should apply app.kubernetes.io/instance annotation where annotation value is <appName>-<resouceName> ?

This would allow us to upgrade smoothly and takes care about both edge cases. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argo-cd.argoproj.io/resourceKey annotation instead of embedding resource key into app.kubernetes.io/instance works too

```

This way, we could support multiple Argo CD instances managing applications on
the same cluster _without_ the need for those installations to have their
instance label name changed to a unique value in order to prevent a "split
brain" scenario where multiple Argo CD instances claim ownership to given
resources, which could potentially result in severe data loss (unintentinoal
resource pruning).

### Use cases

Add a list of detailed use cases this enhancement intends to take care of.

#### Use case 1: Allow for more than 63 characters in application name

As a user, I would like to be able to give my applications names with arbitrary
length, because I want to include identifiers like target regions and possibly
availability zones, the environment and possibly other identifiers (e.g. a team
name) in the application names. The current restriction of 63 characters is not
sufficient for my naming requirements.

#### Use case 2: Allow for retrieving all resources using Kubernetes

As an administrator, I want to enable my users to use more than 63 characters
in their application names, but I still want to be able to retrieve all of the
resources managed by that particular application using Kubernetes mechanisms,
e.g. a label selector as in the following example:

```
kubectl get deployments -l app.kubernetes.io/instance=<application> --all-namespaces
```

#### Use case 3: Multiple Argo CD instances managing apps on same cluster

I also want to be able to see which application and Argo CD instance is the
one in charge of a given resource.

### Implementation Details/Notes/Constraints [optional]

#### Use a hash value for the application name

* The application controller needs to be adapted to use the computed hash value
for determining whether the inspected resource is being managed by the
current application.

* The used hash algorithm is _not required_ to be cryptographically secure, but
should be _reasonable resilient_ against collisions. It should also be
_reasonable fast_ in computing the hash.

* The _application instance label_ on a managed resource should be set to the
hexa-decimal representation of the computed hash value for the application
name. What becomes the source for the hash's computation has yet to be
decided upon (see above).

* Additionally, a new annotation should be introduced to managed resources.
This annotation will hold the _human readable_ representation of the name
of the application the resource belongs to. This annotation only serves an
informational purpose for humans, or other tools that need to determine
the name of the application that manages a given resource. This may seem
redundant at first, but we believe is a requirement for many users.

For example, a managed resource could look like the following after this
change has been implemented using `SHA-1` algorithm when the application's
name is `some-application` and noted `installationIdentifier` isn't taken
into account yet):

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
labels:
app.kubernetes.io/instance: 3fbe5782d494cc956140bc58386c971bf0e96fad
annotations:
argo-cd.argoproj.io/application-name: some-application
```

The hash-value in the above `app.kubernetes.io/instance` label is simply the
hexa-decimal representation of the SHA-1 value for `some-application`:

```shell
$ echo "some-application" | sha1sum
3fbe5782d494cc956140bc58386c971bf0e96fad -
```

For humans or consumers that need the application name, the application name
is stored in clear text in an annotation. Suggestion for this annotation name
would be `argo-cd.argoproj.io/application-name`.

#### Allow multiple Argo CD instances manage applications on same cluster

As of today, to allow two or more Argo CD instances with a similar set of
permissions (e.g. cluster-wide read access to resources) manage applications
on the same cluster, users would have to configure the _application instance
label key_ in the Argo CD configuration to a unique value. Otherwise, if an
application with the same name exists in two different Argo CD installations,
both would claim ownership of the resources of that application.

We do see the need for preventing such scenarios out-of-the-box in Argo CD.
For this, we do suggest the introduction of an _installation ID_ in the
form of a standard _GUID_.

This GUID would be generated once by Argo CD upon startup, and is persisted in
the Argo CD configuration, e.g. by storing it as `installationID` in the
`argocd-cm` ConfigMap. The GUID of the installation would need to be encoded
in some way in the resources managed by that Argo CD instance, and we do see
two main possibilities to do so:

* Encode the GUID in the application name's hash value, e.g. (as noted above)
by concatenating the GUID with the application name before hashing it:

```yaml
value = hashFunction(GUID + "." + applicationName)
```

So, given a GUID of `61199294-412c-4e78-a237-3ebba6784fcd`, the previous
example for `some-application` would become:

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
labels:
app.kubernetes.io/instance: 1e2927872cdb73ec3c4010b8afeba2c4be1a92cb
annotations:
argo-cd.argoproj.io/application-name: some-application
```

To generate the hash manually (e.g. for selection of resources by label),
the user would have to take the GUID into account:

```shell
$ echo "61199294-412c-4e78-a237-3ebba6784fcd.some-application" | sha1sum
1e2927872cdb73ec3c4010b8afeba2c4be1a92cb -
```

* Use a dedicated label to store the GUID instead of encoding it in the
application name, and modify Argo CD so that it matches _both_, the app
instance key and the GUID to determine whether a resource is managed by
this Argo CD instance. Given above mentioned GUID, this may look like the
following on a resource:

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
labels:
app.kubernetes.io/instance: 3fbe5782d494cc956140bc58386c971bf0e96fad
argo-cd.argoproj.io/installation-id: 61199294-412c-4e78-a237-3ebba6784fcd
annotations:
argo-cd.argoproj.io/application-name: some-application
```

Both methods have their advantages and disadvantages. We would prefer to use
the first approach and encode the GUID into the application name's hash, as
this would not introduce a new label. However, when manually selecting the
resources of any given application, the user would have to take this GUID
into account when constructing the hash to use for selecting resources.

### Detailed examples

### Security Considerations

We think this change will not have a direct impact on the security of Argo CD
or the applications it manages.

One concern however is that when a user intentionally produces an application
name whose hash will collide with another, existing application's hash. That
may lead to the undesired deletion (pruning) of resources that do in fact
belong to another application outside the adversaries control. However, this
risk could be properly mitigated by permissions in the `AppProject`, since
Argo CD will never prune resources that are not allowed for any specific
Application.

### Risks and Mitigations

The main risks from the authors' points of view are collisions in the hashes
of application names. For example, if the strings `some-app` and `other-app`
produce the same hash, this could result in undesired behavior, especially
when unintentional. The mitigation for this is to chose a hashing algorithm
with a high resilience against collisions.

Another risk is that third party tools that need to map a managed resource
in the cluster back to the managing application rely on the clear text value
of the application instance label. These tools would have to be adapted to
read the proposed annotation instead of the application instance label.

We think that this may be a breaking change, however, the advantages outweigh
the risks.

### Upgrade / Downgrade Strategy

Upgrading to a version that implements this proposal should be seamless, as
previously injected labels will not be removed and additional annotations will
be applied to the resource. E.g. consider following resource in Git, that will
be synced as part of an application named `some-application`. In Git, the
resource looks like follows:

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
```

When synced with the current incarnation of Argo CD, Argo CD would inject the
application instance label and once the resource is applied in the cluster, it
would look like follows:

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
labels:
app.kubernetes.io/instance: some-application
```

Once Argo CD is updated to a version implementing this proposal, the resource
would be rewritten to look like the following:

```yaml
apiVersion: v1
Kind: Secret
metadata:
name: some-secret
namespace: some-namespace
labels:
app.kubernetes.io/instance: 3fbe5782d494cc956140bc58386c971bf0e96fad
annotations:
argo-cd.argoproj.io/application-name: some-application
```

This would be very similar to _renaming_ an application, as in the managed
resources metadata would be modified in the cluster.

On a rollback to a previous Argo CD version, this change would be reverted
and the resource would look like the first shown example above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would Argo CD know whether a resource's app.kubernetes.io/instance was applied using the old system or the new system (the one described in this proposal)?

For example:

  1. A cluster is running with the old version of Argo CD
  2. All the application resources are labelled using the old system
  3. Argo CD is upgraded
  4. The new Argo CD looks at all the existing app.kubernetes.io/instance fields of the existing resources, discovers that they don't match any expected hash (and thus don't belong to any Application), and ignores them (recreating new Application resources).

I think we need the ability to look at the app.kubernetes.io/instance field of a resource and know which system it was created with.

One option we could use is look at the app.kubernetes.io/instance value and determine if it is a 32 character hexadecimal value, but this is not foolproof (users may be using Applications with names that are 32-byte hexadecimal values).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point you raise there.

I haven't yet verified, but I think it would be similar to the behavior if you rename an application, e.g. by deleting an Application and re-creating it with a different name, but same parameters (source + destination).

Argo CD would just go and overwrite what's in app.kubernetes.io/instance - at least as long as another application with the previous name does not exist, in which case Argo CD would issue a Shared Resource Warning and prevent the overwrite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if an Argo CD instance encounters a resource with a app.kubernetes.io/instance value that it doesn't recognize (eg it does not correspond to a hash of a known application name+install id value), then Argo CD must assume that that resource is owned by another Argo CD instance (the goal of 'allow multiple Argo CD instances to manage resources on the same cluster').

So, presuming that Argo CD encounters a resource it thinks is owned by someone else, and it has the same name/namespace as a resouce managed by an Application managed by the current Argo CD instance, do we:

  1. Take ownership of it with the current Argo CD instance
  2. Leave it alone

Probably 2), and report an error ('error: Argo CD Application would overwrite a resource that may be owned by another Argo CD instance').

It might be worthwhile to document the algorithm that Argo CD will use to move existing resources from the old style to the new style, in the upgrade scenario.

I'm guessing it would be:

  1. If we encounter a resource that contains a app.kubernetes.io/instance that is not a 32/64 character hexadecimal value, AND it matches the name field of an existing Argo CD Application, then convert it to the new system (ensure the app.kubernetes.io/instance field is updated)
    • As you said, this might already be handled by the rename-style behaviour.
    • Another option would be to delete the resource, and hope that Argo CD recreates it.
  2. If we encounter a resource that contains a app.kubernetes.io/instance that is a 32/64 character hexadecimal value, leave it alone. (Assume it's a valid instance id from another Argo CD instance).
    • Unless it has the same Kubernetes resource name as a resource that this Argo CD instance is trying to create, in which case report the above error ('error: would overwrite a resource that may be owned...')
  3. Else, either delete or leave it alone (orphan it).
    (in very rare cases this might orphan or delete old resources, but it's probably fine.)

## Drawbacks

We do see some drawbacks to this implementation:

* The already mentioned possible backwards incompatibility with existing tools
that rely on reading the `app.kubernetes.io/instance` label to map back a
resource to its managing application.

* This change would trigger a re-sync of each and every managed resource, which
may result in unexpected heavy load on Argo CD and the cluster at upgrade
time.

* People manually selecting resources from an application must create the
value for the label selector instead of just using the application name, e.g.
instead of

```shell
kubectl get secrets -l app.kubernetes.io/instance=some-application --all-namespaces
```

Something like the following needs to be constructed:

```shell
kubectl get secrets -l app.kubernetes.io/instance=$(echo "some-application" | sha1sum | awk '{ print $1; }') --all-namespaces
```

* If we chose to also implement the GUID as application identifier, the GUID
token becomes a viable part of the _state_ and needs to be backed up as
part of any recovery procedures. Without this GUID, resources not anymore
existing in Git will not get pruned upon removal, thus becoming stale in
the cluster. A resource synced with a given GUID will only be ever removed
again if the GUID of the pruning instance is the same as at sync time.

## Alternatives

* Enabling application names longer than 63 characters could also be done
by having the _application instance label_ to become an annotation instead.
This is a much simpler solution, however, this would disable the possibility
to use label selectors for retrieving all resources managed by a given
application.