-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a cluster UUID persisted to a ConfigMap #1805
Introduce a cluster UUID persisted to a ConfigMap #1805
Conversation
I'm planning to add more tests (including an e2e test), but I'd like to get some early feedback first |
60748b4
to
dcf1562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions:
- We must use UUID, or can also use a unique string to identify a cluster?
- I assume we can support customer specified UUID as well?
- Have you considered using the antrea ConfigMap?
If we ever want to use the UUID for telemetry / usage reporting, and if we support user-provided values (arbitrary strings), we should probably hash the UUID in case the user includes some identifiable information in the ID string. |
I assume when used by distros, we might populate some distro ID to identify a cluster.
The only thing then it introduces redundancy and potential inconsistency when an ID is configured via the antrea ConfigMap. Might not be a big deal though.
Sounds good to me. |
Setting in configMap directly could also mean updating the map with new cluster ID, i guess we won't support such a scenario right? Where the clusters'ID can be updated after deployment |
I have two questions on flow exporter and flow aggregator side:
|
|
Thanks for elaboration. Much clearer to me. I prefer choice 1) as it is now.
Got it. Thanks! |
dcf1562
to
c371425
Compare
I updated my approach based on review comments and pushed a new commit: c371425 |
Codecov Report
@@ Coverage Diff @@
## main #1805 +/- ##
==========================================
- Coverage 62.46% 61.52% -0.94%
==========================================
Files 201 203 +2
Lines 17377 17532 +155
==========================================
- Hits 10854 10787 -67
- Misses 5365 5580 +215
- Partials 1158 1165 +7
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
||
# UUID used to identify the cluster. It should be globally unique. The UUID will be written to the | ||
# antrea-cluster-id ConfigMap. If no UUID is provided in this configuration, the Antrea Controller | ||
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated name will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated name will be | |
# will auto-generate a random one (version 4 UUID, as per RFC 4122). The auto-generated UUID will be |
build/yamls/flow-aggregator.yml
Outdated
metadata: | ||
labels: | ||
app: flow-aggregator | ||
name: flow-aggregator-uuid-reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this role be a generic cluster-id reader role rather than pinning it to flow aggregator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm waiting for #1649 before I write the final version of this file, since there will be conflicts.
I doubt my PR will make it into v0.13.0 to be honest :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good idea, even if only the flow aggregator needs that role for now. Pushing a commit with this change now.
build/yamls/flow-aggregator.yml
Outdated
# Provide the 32-bit Observation Domain ID which will uniquely identify this instance of the flow | ||
# aggregator to an external flow collector. If omitted, an Observation Domain ID will be generated | ||
# from the persistent cluster UUID generated by Antrea. Failing that (e.g. because the cluster UUID | ||
# is not available), a random value will be randomly generated, which may vary across restarts of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random value randomly generated seems repetitive
cmd/antrea-controller/config.go
Outdated
// UUID used to identify the cluster. It should be globally unique. The UUID will be written | ||
// to the antrea-cluster-id ConfigMap. If no UUID is provided in this configuration, the | ||
// Antrea Controller will auto-generate a random one (version 4 UUID, as per RFC 4122). The | ||
// auto-generated name will be persistent across Antrea Controller restarts and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
eced2da
to
1f8ad4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
pkg/clusterid/clusterid.go
Outdated
if userProvidedUUID != "" { | ||
parsedUUID, err := uuid.Parse(userProvidedUUID) | ||
if err != nil { | ||
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err) | |
return nil, fmt.Errorf("invalid UUID '%s': %v", userProvidedUUID, err) |
pkg/clusterid/clusterid.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("Invalid UUID '%s': %v", userProvidedUUID, err) | ||
} | ||
userProvidedUUIDParsed = &parsedUUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a.userProvidedUUID
can be uuid.UUID and use uuid.Nil
to check nil like how the library itself returns the zero value? It looks like the code would be a little simpler than using a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point. Originally I wanted to support a user-provided UUID of all zeros, but thinking about it more it doesn't seem necessary. The UUID should ideally be globally unique so setting it to all zeros defeat the purpose anyway... I'll make the change in the upcoming days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions:
I do not see clusterName is used anywhere. What is its usage?
For clusterID, is that any use case requires a UUID (but not a string ID)?
It's not used right now. I added it so we have a user-friendly id for applications that may require it, like cross-cluster features. This PR doesn't really try to introduce use cases for clusterName / clusterUUID; I did modify the FlowAggregator code to use the cluster UUID to generate an observation domain ID, because that's one use case I had in mind and it's a good proof-of-concept.
The idea is to have a globally unique cluster id for scenarios where cluster names may be selected by different entities. At this time, the one use case I have in mind is for downstream projects built for Antrea which may want to enable telemetry / usage reporting. Of course it's always possible for a user to override the UUID with a fixed value, but that's not the intent. Typically I would expect users to set the clusterName and let the clusterUUID be automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review.
Seems like the cluster will have both a unique name and UUID. If the user wants to have a unique cluster name to persist across different Kubernetes instances, then he needs to provide the custom cluster name. Is my understanding of this config usage correct?
If so, Flow Aggregator may want to use the provided name instead of UUID, if provided to generate the observation UUID. In that scenario, can we remove the user config for observationDomainID and make use of the cluster name? What do you think?
@@ -36,3 +36,18 @@ featureGates: | |||
|
|||
# TLS min version from: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13. | |||
#tlsMinVersion: | |||
|
|||
# A name used to identify the cluster. The name will be written to the antrea-cluster-id ConfigMap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
"A name that can be used to uniquely identify the cluster."
But does the unique ID must be UUID or it can be any string? Or you like follow K8s resources which have both a name and a UID, but in K8s both names and IDs must be unique? |
@jianjuns My concern is with the definition of uniqueness. My intent was to have a high probability of global uniqueness (unique among all clusters from all Antrea users). If we start accepting an arbitrary string id, I'm afraid that we may have increased collisions as users may start providing ids which are unique to them. Of course the risk also exists when a user provides a UUID, But then maybe my goal to have a "globally" unique ID is not practical... I'd like to understand more about the systems you mention. So far this PR seeks to provide:
For the systems you mention, I imagine using the UUID generated by Antrea is not an option and they want to use Antrea to store / consume their own ID within the scope of a cluster. And is it the case that the "cluster name" cannot be used to store the id in that case, i.e. there is a need for both cluster name and cluster id (as an arbitrary string)? Of course, I can always add a 3rd entry in the cluster identity ConfigMap (ConfigMap used to persist the name / id values) to have:
|
Antrea Controller persists a randomly-generated cluster UUID tp the antrea-cluster-uuid ConfigMap during initialization. The cluster UUID stays the same across all Antrea Controller restarts and Antrea upgrades. The UUID can be consummed by other Antrea components when needed. One current example is the Flow Aggregator, which can use the cluster UUID to generate a deterministic IPFIX Observation Domain ID to uniquely identify the exporting process to the external flow collector. An error during the generation of the cluster UUID will not cause Controller initialization to fail. Instead the Controller will keep retrying until it can successfully 1) establish that a cluster UUID already exists and is correct, or 2) generate and persist a new cluster UUID. If a component strictly depends on the cluster UUID, it should fail its own initialization process until it becomes available.
Cluster has a name in addition to the UUID. Both name and UUID can be provided by the user in the antrea-config ConfigMap. When provided, the values will be used in the antrea-cluster-id ConfigMap. When not provided, values will be generated by Antrea. I expect users to provide cluster names, but user-provided cluster UUIDs, should be rarer. They can be used in different types of applications. Ideally both should not change during the lifetime of the cluster. If user tries to update the name value, the Controller will log a warning and perform the update. if user tries to update the UUID value, the Controller will refuse and log an error. The error will not go away until the user 1) deletes the current UUID value in antrea-cluster-id or 2) reverts the change in the antrea-config ConfigMap.
The ConfigMap is antrea=cluster-identity. At the moment it only stores a UUID value. The UUID is meant to be generated by the Antrea Controller, and there is no longer a way for a user to provide it through config. In theory, a user can populate the ConfigMap directly before starting the Controller, and Antrea will use that value (it will not be overridden). There is no longer a cluster name. Also added an end-to-end test.
1f8ad4b
to
fca1da9
Compare
Updated the PR based on recent offline discussions with @jianjuns and @edwardbadboy. The PR is much closer now to what it was originally:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. LGTM.
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Antrea Controller persists a randomly-generated cluster UUID to the antrea-cluster-identity ConfigMap during initialization. The cluster UUID stays the same across all Antrea Controller restarts and Antrea upgrades. The UUID can be consumed by other Antrea components when needed. One current example is the Flow Aggregator, which can use the cluster UUID to generate a deterministic IPFIX Observation Domain ID to uniquely identify the exporting process to the external flow collector. An error during the generation of the cluster UUID will not cause Controller initialization to fail. Instead the Controller will keep retrying until it can successfully 1) establish that a cluster UUID already exists and is correct, or 2) generate and persist a new cluster UUID. If a component strictly depends on the cluster UUID, it should fail its own initialization process until it becomes available.
Antrea Controller persists a randomly-generated cluster UUID to the
antrea-cluster-identity ConfigMap during initialization. The cluster UUID
stays the same across all Antrea Controller restarts and Antrea
upgrades. The UUID can be consumed by other Antrea components when
needed. One current example is the Flow Aggregator, which can use the
cluster UUID to generate a deterministic IPFIX Observation Domain ID to
uniquely identify the exporting process to the external flow collector.
An error during the generation of the cluster UUID will not cause
Controller initialization to fail. Instead the Controller will keep
retrying until it can successfully 1) establish that a cluster UUID
already exists and is correct, or 2) generate and persist a new cluster
UUID. If a component strictly depends on the cluster UUID, it should
fail its own initialization process until it becomes available.