
Conversation

s3rj1k (Member) commented Sep 27, 2025

implements: #118

} else {
name = fmt.Sprintf("p--%s-%s-%s", profileName, prefix, clusterName)
}

Member:

This is great. Thank you.

We need to handle the upgrade scenario. Since we are changing the name, if a ClusterSummary instance already exists before the upgrade, we need to handle that. I am thinking we first need to check whether a clusterSummary with the old name already exists; if so, return that name. Otherwise return the new name.

This method, getManagementClusterClient, returns the client to the management cluster.

Also, even when we use the ellipsized name, we need to handle conflicts. So when we generate the ellipsized name we probably need to:

  1. validate that no clusterSummary exists with that name (so it can be used)
  2. if one exists, verify we are talking about the same resource (by making sure the labels refer to the correct profile and cluster)

Member Author:

We only change behavior when we actually hit the edge case of a long object name; at least that was the idea of the change. We don't change names in cases where the input name is valid, and ellipsize only in cases where the object name was already invalid and it would be impossible to create the object.

There shouldn't be any name conflict; the probability of one happening when we add an FNV hash as a suffix is quite low (check out the fuzz test).

The problem with fetching the list of existing clusterSummary objects before generating a unique name is the possibility of a race condition: we would need to add retry logic in every place where we create these objects, which would be a much bigger and more impactful change.

If we do decide to go with the fetch-list approach, please let me know whether a hash suffix is acceptable as the object-name differentiator. In that case we would still need to create unique object names on conflict, and adding something like an index number as a suffix would probably increase the chance of races (at least I think it would).
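As a sketch of the scheme being discussed (the exact truncation format in the PR may differ; `ellipsizeName` and the suffix layout here are illustrative assumptions), a deterministic ellipsizer might look like this:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// maxNameLen is the 63-character limit discussed in the thread.
const maxNameLen = 63

// ellipsizeName leaves valid names untouched and, for over-long names,
// truncates and appends an FNV-32a hash of the full input so the result
// is deterministic: equal inputs always yield equal outputs.
func ellipsizeName(name string) string {
	if len(name) <= maxNameLen {
		return name
	}
	h := fnv.New32a()
	h.Write([]byte(name)) // hash the full, untruncated name
	suffix := fmt.Sprintf("-%08x", h.Sum32())
	return name[:maxNameLen-len(suffix)] + suffix
}

func main() {
	long := "p--" + strings.Repeat("a", 80) + "-cluster"
	fmt.Println(len(ellipsizeName(long))) // 63
	fmt.Println(ellipsizeName("p--short-name"))
}
```

Because the hash is computed over the full input before truncation, two long names that share a 63-character prefix still get different suffixes.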

Member:

Thanks. Sorry, I missed that we are doing this only when the length is over 63 characters. So then yes, you are right, we don't have to worry about the upgrade.

Regarding the conflict, let me think a little more. Maybe instead of fetching, we can keep an in-memory map. The key could be the generated name and the value the profile/cluster names. That would prevent race conditions. And we can rebuild the map on restart and postpone profile reconciliation until that is complete.
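A minimal sketch of that in-memory map (type and method names here are hypothetical, not from the PR): the key is the generated name, the value identifies the owning profile/cluster, and a mutex keeps concurrent reconcilers from racing.

```go
package main

import (
	"fmt"
	"sync"
)

// owner identifies the profile/cluster pair that holds a generated name.
type owner struct {
	profile string
	cluster string
}

// nameRegistry maps a generated name to its owner, guarded by a mutex.
type nameRegistry struct {
	mu    sync.Mutex
	names map[string]owner
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{names: make(map[string]owner)}
}

// claim records name for the given owner; it fails if the name is already
// held by a different profile/cluster pair (a collision).
func (r *nameRegistry) claim(name, profile, cluster string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if o, ok := r.names[name]; ok && (o.profile != profile || o.cluster != cluster) {
		return fmt.Errorf("name %q already owned by %s/%s", name, o.profile, o.cluster)
	}
	r.names[name] = owner{profile: profile, cluster: cluster}
	return nil
}

// release frees a name when its clusterSummary is deleted, so the entry
// does not linger after the object is gone.
func (r *nameRegistry) release(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.names, name)
}

func main() {
	reg := newNameRegistry()
	fmt.Println(reg.claim("p--abc-12345678", "prof-a", "cluster-1"))          // <nil>
	fmt.Println(reg.claim("p--abc-12345678", "prof-b", "cluster-2") != nil)   // true
}
```

Claiming the same name for the same owner is idempotent, which matters because the name functions are called on every reconciliation.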

Even if it is rare, I feel we should cover it. Overwriting a clusterSummary might cause stale resources or missing applications.

What do you think?

Member Author:

we can keep in memory map

I guess the only question is how that would impact the memory footprint in general; probably not noticeably. Generally speaking, an in-memory map would simplify things a lot by avoiding retry logic in a lot of places.

On a side note, if we can say that our clusters won't go back in time, we could use unixtime as the object suffix and completely avoid messing with hashing. The downside is that names would always be unique rather than predictable for a given input. In our use case I think this could be acceptable.

Member:

Thanks. Yes, we would need to store only the names exceeding 63 chars, so it won't have an impact.

But good point; we can safely say clusters won't go back in time. I am not sure that will work, though: those methods to get the name are also invoked after clusterSummary instances are created. For instance, clusterProfile reconciliation invokes them for every matching cluster at every reconciliation.

Member Author:

Those methods to get name are invoked also after clustersummary instances are created. For instance clusterprofile reconciliation invokes it for every matching clusters at every reconciliation

This makes me think that we need everything to be deterministic, so any kind of counter-based suffix won't work (counter, unixtime, ...).

Let's assume we implement an in-memory map for tracking objects and we use a hash function that is very prone to collisions. We can detect a collision via the map key (the same would apply to a direct k8s object lookup), but what we do next is unclear: how do we recover from a collision and still keep names deterministic?

So I am thinking we need to invest a bit more time into that fuzz test. Basically: limit the upper-bound string length to what we actually use (~ 63*4 + 5), ensure it generates reasonable-looking strings, run it against the current hash function for 24h, and see whether we get collisions. If we do, increase the hash size and/or replace the hash function, run again, and repeat until we detect no collisions. Use that as the solution, and document that for longer clusterProfile and cluster names we use hashing to create predictable object names; hashing has a low but non-zero chance of collisions, so if you ever hit that kind of problem, please report it and rename your profile.
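The real run would be against the project's actual name generator, but the collision-counting core of such a test can be sketched like this (FNV-64a is an assumption here, as is `randomName`; the input length is capped at the ~63*4+5 bound mentioned above):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// randomName builds a random DNS-label-style string within the upper
// bound discussed above (~63*4+5 characters).
func randomName(rng *rand.Rand) string {
	const chars = "abcdefghijklmnopqrstuvwxyz0123456789-"
	b := make([]byte, 1+rng.Intn(63*4+5))
	for i := range b {
		b[i] = chars[rng.Intn(len(chars))]
	}
	return string(b)
}

// countCollisions hashes n random names and counts distinct inputs that
// map to the same FNV-64a sum (duplicate inputs do not count as collisions).
func countCollisions(n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	seen := make(map[uint64]string, n)
	collisions := 0
	for i := 0; i < n; i++ {
		s := randomName(rng)
		h := fnv.New64a()
		h.Write([]byte(s))
		sum := h.Sum64()
		if prev, ok := seen[sum]; ok && prev != s {
			collisions++
		}
		seen[sum] = s
	}
	return collisions
}

func main() {
	fmt.Println("collisions:", countCollisions(100000, 1))
}
```

The hash width matters for this plan: by the birthday bound, a 32-bit hash starts colliding around tens of thousands of inputs, while a 64-bit hash makes collisions over a 24h run vanishingly unlikely, so "increase the hash size" is the expected remedy if the test ever fails.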

Member:

thanks. Can we do this:

  1. if the name is less than 63 characters, no changes (you are already doing this)
  2. if the name is longer than 63 characters, check whether we have already allocated one for this profile/cluster; if so, use that one (this is done using an in-memory map)
  3. if the name is longer than 63 characters and we never allocated one before, allocate a new one (using the function you currently have); verify it has not been allocated to anybody else, update the in-memory map, and return it
  4. if the name is longer than 63 characters, we never allocated one before, and the one generated with the current function hits a collision, append an index to it and keep verifying until there is no collision; then update the in-memory map and return it

On restart we can rebuild this in-memory map before we reconcile any clusterProfile/profile.
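Sketched in code, the four numbered steps might look like this (the `allocator` type, its two maps, and `hashedName` are hypothetical stand-ins for whatever the PR actually uses):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const maxNameLen = 63

// allocator tracks allocations for over-long names only:
// byOwner maps "profile/cluster" to the allocated name (step 2),
// byName maps the allocated name back to its owner (collision check).
type allocator struct {
	mu      sync.Mutex
	byOwner map[string]string
	byName  map[string]string
}

func newAllocator() *allocator {
	return &allocator{byOwner: map[string]string{}, byName: map[string]string{}}
}

// hashedName is a stand-in for the PR's current truncate-plus-hash function.
func hashedName(name string) string {
	h := fnv.New32a()
	h.Write([]byte(name))
	suffix := fmt.Sprintf("-%08x", h.Sum32())
	return name[:maxNameLen-len(suffix)] + suffix
}

func (a *allocator) allocate(profile, cluster, name string) string {
	// Step 1: names that already fit are returned unchanged.
	if len(name) <= maxNameLen {
		return name
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	owner := profile + "/" + cluster
	// Step 2: reuse a previously allocated name for this profile/cluster.
	if n, ok := a.byOwner[owner]; ok {
		return n
	}
	// Step 3: try the hash-based candidate first.
	base := hashedName(name)
	candidate := base
	// Step 4: on collision, append an index and keep trying.
	for i := 1; ; i++ {
		if _, taken := a.byName[candidate]; !taken {
			break
		}
		idx := fmt.Sprintf("-%d", i)
		candidate = base[:maxNameLen-len(idx)] + idx
	}
	a.byOwner[owner] = candidate
	a.byName[candidate] = owner
	return candidate
}

func main() {
	a := newAllocator()
	long := "p--" + fmt.Sprintf("%0100d", 0) // a 103-char name
	n1 := a.allocate("prof-a", "cluster-1", long)
	n2 := a.allocate("prof-b", "cluster-2", long) // same input, different owner
	fmt.Println(n1 != n2, len(n1) <= maxNameLen, len(n2) <= maxNameLen) // true true true
}
```

Note that step 4 reintroduces the non-determinism discussed earlier: which owner gets the plain hash and which gets `-1` depends on allocation order, which is why the map must survive (be rebuilt on) restarts.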

I feel your PR already took care of most of it; we just need to handle these corner cases.

Member Author:

I guess we can, but keep in mind that in the collision-plus-index case we can end up in a situation where the other object that caused the collision gets removed, and the new object (the one created with an index) ends up with a non-deterministic name, basically reusing the same name.
(this is what I was referencing before)

And if we are already talking about adding indexes, we might as well do the in-memory map + unixtime and remove the need for the hash and the additional hash+index handling.

Member:

Correct, but when a clusterSummary is deleted, if its name was allocated via this new mechanism, we can remove it from the map.

Also, what we want to solve is making sure the clusterSummary name is always valid. Today, if the ClusterProfile name is really long and the cluster name is also long (exceeding the 253 chars), creation of the clusterSummary will fail.
