Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a cluster UUID persisted to a ConfigMap #1805

Merged

Conversation

antoninbas
Copy link
Contributor

@antoninbas antoninbas commented Jan 30, 2021

Antrea Controller persists a randomly-generated cluster UUID to the
antrea-cluster-identity ConfigMap during initialization. The cluster UUID
stays the same across all Antrea Controller restarts and Antrea
upgrades. The UUID can be consumed by other Antrea components when
needed. One current example is the Flow Aggregator, which can use the
cluster UUID to generate a deterministic IPFIX Observation Domain ID to
uniquely identify the exporting process to the external flow collector.

An error during the generation of the cluster UUID will not cause
Controller initialization to fail. Instead the Controller will keep
retrying until it can successfully 1) establish that a cluster UUID
already exists and is correct, or 2) generate and persist a new cluster
UUID. If a component strictly depends on the cluster UUID, it should
fail its own initialization process until it becomes available.

@antoninbas
Copy link
Contributor Author

I'm planning to add more tests (including an e2e test), but I'd like to get some early feedback first

@antoninbas antoninbas force-pushed the persist-cluster-uuid-to-config-map branch 2 times, most recently from 60748b4 to dcf1562 Compare January 30, 2021 03:24
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:

  1. We must use UUID, or can also use a unique string to identify a cluster?
  2. I assume we can support customer specified UUID as well?
  3. Have you considered using the antrea ConfigMap?

@antoninbas
Copy link
Contributor Author

Questions:

  1. We must use UUID, or can also use a unique string to identify a cluster?
  2. I assume we can support customer specified UUID as well?
  3. Have you considered using the antrea ConfigMap?
  1. Right now, I'm using a UUID, but we can change that to arbitrary string (especially if we want to support user-provided ids, as per your second question). Of course, in that case the user is in charge of guaranteeing uniqueness. When I designed this, I mostly had telemetry in mind, in which case user-provided UUID was not the best solution.
  2. This is not something I specifically designed for, but yes the user can provide a UUID by editing the ConfigMap definition in the YAML: the Antrea Controller will use the existing user-provided UUID and will not generate a new one.
  3. IMO, letting Antrea mutate the main "configuration" ConfigMap is not right. If you think about some operator design, the operator should own the ConfigMap, and be the only one making changes. So unless the only case we want to support is "user-provided" UUID, then it's not ideal. And I think we want to support auto-generating an UUID in the Controller to guarantee the existence of the UUID in all cases. What we could do is accept an ID as part of the main antrea ConfigMap. If present the ID will be used directly, as-is, in the antrea-cluster-uuid ConfigMap. If no ID is provided (default), then the Controller generates a UUID and stores it in antrea-cluster-uuid. All consumers should get the ID from antrea-cluster-uuid, but for the user, there is still a single ConfigMap for all configuration (antrea).

If we ever want to use the UUID for telemetry / usage reporting, and if we support user-provided values (arbitrary strings), we should probably hash the UUID in case the user includes some identifiable information in the ID string.

@jianjuns
Copy link
Contributor

jianjuns commented Feb 1, 2021

I assume when used by distros, we might populate some distro ID to identify a cluster.

If present the ID will be used directly, as-is, in the antrea-cluster-uuid ConfigMap. If no ID is provided (default), then the Controller generates a UUID and stores it in antrea-cluster-uuid.

The only thing then it introduces redundancy and potential inconsistency when an ID is configured via the antrea ConfigMap. Might not be a big deal though.

If we ever want to use the UUID for telemetry / usage reporting, and if we support user-provided values (arbitrary strings), we should probably hash the UUID in case the user includes some identifiable information in the ID string.

Sounds good to me.

@abhiraut
Copy link
Contributor

abhiraut commented Feb 2, 2021

I assume when used by distros, we might populate some distro ID to identify a cluster.

If present the ID will be used directly, as-is, in the antrea-cluster-uuid ConfigMap. If no ID is provided (default), then the Controller generates a UUID and stores it in antrea-cluster-uuid.

The only thing then it introduces redundancy and potential inconsistency when an ID is configured via the antrea ConfigMap. Might not be a big deal though.

If we ever want to use the UUID for telemetry / usage reporting, and if we support user-provided values (arbitrary strings), we should probably hash the UUID in case the user includes some identifiable information in the ID string.

Sounds good to me.

Setting in configMap directly could also mean updating the map with new cluster ID, i guess we won't support such a scenario right? Where the clusters'ID can be updated after deployment

@zyiou
Copy link
Contributor

zyiou commented Feb 2, 2021

I have two questions on flow exporter and flow aggregator side:

  1. why we would want users to input the custom observation domain id? We can use the Service name as before if UUID is unavailable.
  2. will controller generate a new UUID each time when Antrea restarts? I'm asking this because if users want to config flow exporter (e.g. poll interval), they will need to restart the Antrea.

@antoninbas
Copy link
Contributor Author

I have two questions on flow exporter and flow aggregator side:

  1. why we would want users to input the custom observation domain id? We can use the Service name as before if UUID is unavailable.
  2. will controller generate a new UUID each time when Antrea restarts? I'm asking this because if users want to config flow exporter (e.g. poll interval), they will need to restart the Antrea.
  1. Using the Service name may yield the same observation domain ID for all aggregators running in all clusters, which may share the same external flow collector. I think that users should have the opportunity to specify their own observation domain ID if they want to be able to map the flow records received by the flow collector to a specific aggregator, i.e. a specific cluster. If they don't, we use the cluster UUID to make sure we have a persistent observation domain ID. If by any chance the cluster UUID is not available, we just generate a random ID (not persistent). That last case is unlikely to occur and we could use the Service name there. It depends on what is more important 1) unique domain IDs across aggregators connected to the same collector, or 2) persistent domain ID. If you think it's 2), then I can use the Service name instead of a random number for the fallback case.
  2. No, the whole point of using a ConfigMap is that the UUID is persistent and stays across all restarts and upgrades

@zyiou
Copy link
Contributor

zyiou commented Feb 3, 2021

I have two questions on flow exporter and flow aggregator side:

  1. why we would want users to input the custom observation domain id? We can use the Service name as before if UUID is unavailable.
  2. will controller generate a new UUID each time when Antrea restarts? I'm asking this because if users want to config flow exporter (e.g. poll interval), they will need to restart the Antrea.
  1. Using the Service name may yield the same observation domain ID for all aggregators running in all clusters, which may share the same external flow collector. I think that users should have the opportunity to specify their own observation domain ID if they want to be able to map the flow records received by the flow collector to a specific aggregator, i.e. a specific cluster. If they don't, we use the cluster UUID to make sure we have a persistent observation domain ID. If by any chance the cluster UUID is not available, we just generate a random ID (not persistent). That last case is unlikely to occur and we could use the Service name there. It depends on what is more important 1) unique domain IDs across aggregators connected to the same collector, or 2) persistent domain ID. If you think it's 2), then I can use the Service name instead of a random number for the fallback case.

Thanks for elaboration. Much clearer to me. I prefer choice 1) as it is now.

  1. No, the whole point of using a ConfigMap is that the UUID is persistent and stays across all restarts and upgrades

Got it. Thanks!

@antoninbas antoninbas force-pushed the persist-cluster-uuid-to-config-map branch from dcf1562 to c371425 Compare February 4, 2021 03:16
@antoninbas
Copy link
Contributor Author

I updated my approach based on review comments and pushed a new commit: c371425

@codecov-io
Copy link

codecov-io commented Feb 4, 2021

Codecov Report

Merging #1805 (28c3e88) into main (38c8c5e) will decrease coverage by 0.93%.
The diff coverage is 63.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1805      +/-   ##
==========================================
- Coverage   62.46%   61.52%   -0.94%     
==========================================
  Files         201      203       +2     
  Lines       17377    17532     +155     
==========================================
- Hits        10854    10787      -67     
- Misses       5365     5580     +215     
- Partials     1158     1165       +7     
Flag Coverage Δ
kind-e2e-tests 51.26% <45.07%> (-1.26%) ⬇️
unit-tests 40.90% <59.15%> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/apiserver/certificate/cacert_controller.go 55.83% <ø> (-0.03%) ⬇️
pkg/flowaggregator/flowaggregator.go 12.71% <0.00%> (+0.28%) ⬆️
pkg/util/env/env.go 48.57% <40.00%> (-1.43%) ⬇️
pkg/clusteridentity/clusteridentity.go 68.33% <68.33%> (ø)
pkg/apiserver/certificate/certificate.go 69.86% <100.00%> (-7.77%) ⬇️
pkg/controller/networkpolicy/validate.go 70.93% <100.00%> (+0.44%) ⬆️
pkg/antctl/transform/ovstracing/transform.go 0.00% <0.00%> (-50.00%) ⬇️
pkg/antctl/transform/addressgroup/transform.go 0.00% <0.00%> (-43.48%) ⬇️
pkg/antctl/transform/appliedtogroup/transform.go 0.00% <0.00%> (-43.48%) ⬇️
pkg/antctl/transform/utils.go 0.00% <0.00%> (-35.72%) ⬇️
... and 23 more

@antoninbas antoninbas added this to the Antrea v0.13.0 release milestone Feb 9, 2021

# UUID used to identify the cluster. It should be globally unique. The UUID will be written to the
# antrea-cluster-id ConfigMap. If no UUID is provided in this configuration, the Antrea Controller
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated name will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated name will be
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated UUID will be

metadata:
labels:
app: flow-aggregator
name: flow-aggregator-uuid-reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this role be a generic cluster-id reader role rather than pinning it to flow aggregator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm waiting for #1649 before I write the final version of this file, since there will be conflicts.

I doubt my PR will make it into v0.13.0 to be honest :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea, even if only the flow aggregator needs that role for now. Pushing a commit with this change now.

# Provide the 32-bit Observation Domain ID which will uniquely identify this instance of the flow
# aggregator to an external flow collector. If omitted, an Observation Domain ID will be generated
# from the persistent cluster UUID generated by Antrea. Failing that (e.g. because the cluster UUID
# is not available), a random value will be randomly generated, which may vary across restarts of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random value randomly generated seems repetitive

// UUID used to identify the cluster. It should be globally unique. The UUID will be written
// to the antrea-cluster-id ConfigMap. If no UUID is provided in this configuration, the
// Antrea Controller will auto-generate a random one (version 4 UUID, as per RFC 4122). The
// auto-generated name will be persistent across Antrea Controller restarts and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@antoninbas antoninbas force-pushed the persist-cluster-uuid-to-config-map branch 2 times, most recently from eced2da to 1f8ad4b Compare February 17, 2021 04:53
abhiraut
abhiraut previously approved these changes Feb 19, 2021
Copy link
Contributor

@abhiraut abhiraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

if userProvidedUUID != "" {
parsedUUID, err := uuid.Parse(userProvidedUUID)
if err != nil {
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err)
return nil, fmt.Errorf("invalid UUID '%s': %v", userProvidedUUID, err)

if err != nil {
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err)
}
userProvidedUUIDParsed = &parsedUUID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a.userProvidedUUID can be uuid.UUID and use uuid.Nil to check nil like how the library itself returns the zero value? It looks like the code would be a little simpler than using a pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point. Originally I wanted to support a user-provided UUID of all zeros, but thinking about it more it doesn't seem necessary. The UUID should ideally be globally unique so setting it to all zeros defeat the purpose anyway... I'll make the change in the upcoming days.

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:
I do not see clusterName is used anywhere. What is its usage?
For clusterID, is that any use case requires a UUID (but not a string ID)?

@antoninbas
Copy link
Contributor Author

Questions:
I do not see clusterName is used anywhere. What is its usage?

It's not used right now. I added it so we have a user-friendly id for applications that may require it, like cross-cluster features. This PR doesn't really try to introduce use cases for clusterName / clusterUUID; I did modify the FlowAggregator code to use the cluster UUID to generate an observation domain ID, because that's one use case I had in mind and it's a good proof-of-concept.

For clusterID, is that any use case requires a UUID (but not a string ID)?

The idea is to have a globally unique cluster id for scenarios where cluster names may be selected by different entities. At this time, the one use case I have in mind is for downstream projects built for Antrea which may want to enable telemetry / usage reporting. Of course it's always possible for a user to override the UUID with a fixed value, but that's not the intent.

Typically I would expect users to set the clusterName and let the clusterUUID be automatically generated.

Copy link
Member

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review.

Seems like the cluster will have both a unique name and UUID. If the user wants to have a unique cluster name to persist across different Kubernetes instances, then he needs to provide the custom cluster name. Is my understanding of this config usage correct?

If so, Flow Aggregator may want to use the provided name instead of UUID, if provided to generate the observation UUID. In that scenario, can we remove the user config for observationDomainID and make use of the cluster name? What do you think?

@@ -36,3 +36,18 @@ featureGates:

# TLS min version from: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13.
#tlsMinVersion:

# A name used to identify the cluster. The name will be written to the antrea-cluster-id ConfigMap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
"A name that can be used to uniquely identify the cluster."

@jianjuns
Copy link
Contributor

For clusterID, is that any use case requires a UUID (but not a string ID)?

The idea is to have a globally unique cluster id for scenarios where cluster names may be selected by different entities. At this time, the one use case I have in mind is for downstream projects built for Antrea which may want to enable telemetry / usage reporting. Of course it's always possible for a user to override the UUID with a fixed value, but that's not the intent.

But does the unique ID must be UUID or it can be any string? Or you like follow K8s resources which have both a name and a UID, but in K8s both names and IDs must be unique?
I am asking because I know some systems we might integrate with do not require UUID but need a unique string ID. But I do not know if any system must require a UUID (but you can still generate UUID from the string ID in that case?).

@antoninbas
Copy link
Contributor Author

For clusterID, is that any use case requires a UUID (but not a string ID)?

The idea is to have a globally unique cluster id for scenarios where cluster names may be selected by different entities. At this time, the one use case I have in mind is for downstream projects built for Antrea which may want to enable telemetry / usage reporting. Of course it's always possible for a user to override the UUID with a fixed value, but that's not the intent.

But does the unique ID must be UUID or it can be any string? Or you like follow K8s resources which have both a name and a UID, but in K8s both names and IDs must be unique?
I am asking because I know some systems we might integrate with do not require UUID but need a unique string ID. But I do not know if any system must require a UUID (but you can still generate UUID from the string ID in that case?).

@jianjuns My concern is with the definition of uniqueness. My intent was to have a high probability of global uniqueness (unique among all clusters from all Antrea users). If we start accepting an arbitrary string id, I'm afraid that we may have increased collisions as users may start providing ids which are unique to them. Of course the risk also exists when a user provides a UUID,
but is lesser IMO based on how UUIDs are typically generated. Generating a UUID from the arbitrary string ID does not solve that problem, at least not if we want to generate it deterministically.

But then maybe my goal to have a "globally" unique ID is not practical...

I'd like to understand more about the systems you mention. So far this PR seeks to provide:

  • a cluster name, which is meant to be unique for a given user; it can be user-provided and human-readable
  • a standard UUID, which is meant to be globally unique and auto-generated by Antrea

For the systems you mention, I imagine using the UUID generated by Antrea is not an option and they want to use Antrea to store / consume their own ID within the scope of a cluster. And is it the case that the "cluster name" cannot be used to store the id in that case, i.e. there is a need for both cluster name and cluster id (as an arbitrary string)?

Of course, I can always add a 3rd entry in the cluster identity ConfigMap (ConfigMap used to persist the name / id values) to have:

  • cluster name
  • cluster id
  • cluster UUID
    And to keep the antrea ConfigMap simple, I can remove the UUID from it and replace it with the arbitrary id. This may be better actually because now it makes it more unlikely for the UUID not to be globally unique (admin would have to overwrite the identify ConfigMap directly). If the user provides a name, but no id, the generated UUID will be used as the id by default. What do you think?

Antrea Controller persists a randomly-generated cluster UUID tp the
antrea-cluster-uuid ConfigMap during initialization. The cluster UUID
stays the same across all Antrea Controller restarts and Antrea
upgrades. The UUID can be consummed by other Antrea components when
needed. One current example is the Flow Aggregator, which can use the
cluster UUID to generate a deterministic IPFIX Observation Domain ID to
uniquely identify the exporting process to the external flow collector.

An error during the generation of the cluster UUID will not cause
Controller initialization to fail. Instead the Controller will keep
retrying until it can successfully 1) establish that a cluster UUID
already exists and is correct, or 2) generate and persist a new cluster
UUID. If a component strictly depends on the cluster UUID, it should
fail its own initialization process until it becomes available.
Cluster has a name in addition to the UUID.
Both name and UUID can be provided by the user in the antrea-config
ConfigMap. When provided, the values will be used in the
antrea-cluster-id ConfigMap. When not provided, values will be generated
by Antrea.
I expect users to provide cluster names, but user-provided cluster
UUIDs, should be rarer. They can be used in different types of
applications.
Ideally both should not change during the lifetime of the cluster. If
user tries to update the name value, the Controller will log a warning
and perform the update. if user tries to update the UUID value, the
Controller will refuse and log an error. The error will not go away
until the user 1) deletes the current UUID value in antrea-cluster-id or
2) reverts the change in the antrea-config ConfigMap.
The ConfigMap is antrea=cluster-identity. At the moment it only stores
a UUID value. The UUID is meant to be generated by the Antrea
Controller, and there is no longer a way for a user to provide it
through config. In theory, a user can populate the ConfigMap directly
before starting the Controller, and Antrea will use that value (it will
not be overridden). There is no longer a cluster name.

Also added an end-to-end test.
@antoninbas
Copy link
Contributor Author

antoninbas commented Mar 2, 2021

Updated the PR based on recent offline discussions with @jianjuns and @edwardbadboy. The PR is much closer now to what it was originally:

  • a cluster UUID is persisted to the antrea-cluster-identity ConfigMap by the Controller
  • users cannot provide their own UUID, except by editing the antrea-cluster-identity ConfigMap before starting the Controller

jianjuns
jianjuns previously approved these changes Mar 3, 2021
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. LGTM.

pkg/clusteridentity/clusteridentity.go Outdated Show resolved Hide resolved
@antoninbas
Copy link
Contributor Author

/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas merged commit b04928b into antrea-io:main Mar 5, 2021
@antoninbas antoninbas deleted the persist-cluster-uuid-to-config-map branch March 5, 2021 17:26
tnqn pushed a commit to tnqn/antrea that referenced this pull request Mar 17, 2021
Antrea Controller persists a randomly-generated cluster UUID to the
antrea-cluster-identity ConfigMap during initialization. The cluster UUID
stays the same across all Antrea Controller restarts and Antrea
upgrades. The UUID can be consumed by other Antrea components when
needed. One current example is the Flow Aggregator, which can use the
cluster UUID to generate a deterministic IPFIX Observation Domain ID to
uniquely identify the exporting process to the external flow collector.

An error during the generation of the cluster UUID will not cause
Controller initialization to fail. Instead the Controller will keep
retrying until it can successfully 1) establish that a cluster UUID
already exists and is correct, or 2) generate and persist a new cluster
UUID. If a component strictly depends on the cluster UUID, it should
fail its own initialization process until it becomes available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants