
Implement conflict checking for SessionAffinity #1632

Merged

Conversation

tpantelis
Contributor

The MCS spec's conflict resolution policy states that a "conflict will be resolved by assigning precedence based on each ServiceExport's creationTimestamp, from oldest to newest". We don't have a central MCS controller with access to all clusters' ServiceExports, but we can store each cluster's ServiceExport creationTimestamp as an annotation on the aggregated ServiceImport and use those annotations to resolve conflicts. The SessionAffinity and SessionAffinityConfig fields on the aggregated ServiceImport will be set by the cluster with the oldest timestamp. The other clusters will set a ServiceExportConflict condition if their corresponding local service's fields do not match those on the aggregated ServiceImport. If a local service is updated in any cluster, each cluster re-evaluates the updated aggregated ServiceImport and either clears or sets the conflict condition. Also, if the service from the cluster holding precedence is un-exported, the cluster with the next-oldest timestamp will set the fields.
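
A rough sketch of the precedence check described above, assuming for illustration that the per-cluster annotations use keys of the form `timestamp.submariner.io/<cluster>` with the creationTimestamp encoded as Unix nanoseconds (the actual key format and encoding in the PR may differ):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical annotation key prefix; the prefix and value encoding
// used by the PR may differ.
const timestampAnnotationPrefix = "timestamp.submariner.io/"

// findClusterWithOldestTimestamp scans the aggregated ServiceImport's
// annotations and returns the cluster whose ServiceExport has the oldest
// creationTimestamp. That cluster's SessionAffinity settings take precedence.
// This is a sketch, not necessarily the PR's implementation.
func findClusterWithOldestTimestamp(annotations map[string]string) string {
	oldestCluster := ""
	oldest := int64(0)

	for key, value := range annotations {
		if !strings.HasPrefix(key, timestampAnnotationPrefix) {
			continue
		}

		ts, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			continue // ignore malformed values
		}

		if oldestCluster == "" || ts < oldest {
			oldest = ts
			oldestCluster = strings.TrimPrefix(key, timestampAnnotationPrefix)
		}
	}

	return oldestCluster
}

func main() {
	annotations := map[string]string{
		"timestamp.submariner.io/cluster1": "1693900000000000000",
		"timestamp.submariner.io/cluster2": "1693800000000000000", // oldest
		"timestamp.submariner.io/cluster3": "1694000000000000000",
	}

	fmt.Println(findClusterWithOldestTimestamp(annotations)) // prints "cluster2"
}
```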

@submariner-bot
Contributor

🤖 Created branch: z_pr1632/tpantelis/session_affinity_conflict
🚀 Full E2E won't run until the "ready-to-test" label is applied. I will add it automatically once the PR has 2 approvals, or you can add it manually.

Contributor

@vthapar vthapar left a comment

Tested and reviewed code locally. Other than thoughts on how to improve logic for oldest export, looks good.

@@ -442,3 +564,48 @@ func (c *ServiceImportController) localServiceImportLister(transform func(si *mc

return retList
}

func findClusterWithOldestTimestamp(from map[string]string) string {
Contributor

I'm wondering if we can improve this logic. In a scale setup there might be many clusters exporting a service, and looping through all of them may cause issues.

How about we also add an annotation like timestamp.submariner.io/oldest and use that to check whether we're older or not? Only when the oldest one is un-exported would we need to cycle through all of them to find the new oldest.

Also, this may have some issues when we're using a VIP. If two different clusters export the service at the same time, the VIP will end up being allocated by whichever cluster was first to successfully create the aggregated SI. It may happen that its timestamp is newer than that of the one that failed to create it, because that depends on the timing of when the create attempt is made on the broker.

Contributor Author

> I'm wondering if we can improve this logic. In a scale setup there might be many clusters exporting a service, and looping through all of them may cause issues.

I think the time would be negligible with a single loop, even with thousands of entries. If it were a nested loop, then we could have an issue. With a single loop, it would take millions of iterations to start to become significant.

> How about we also add an annotation like timestamp.submariner.io/oldest and use that to check whether we're older or not? Only when the oldest one is un-exported would we need to cycle through all of them to find the new oldest.

Yeah, we could do that, i.e. also remove the annotation if the oldest cluster is un-exported to trigger the other clusters to re-evaluate (a rough sketch is below). But it adds more complexity that I'm not sure we really need. I can try it out...

> Also, this may have some issues when we're using a VIP. If two different clusters export the service at the same time, the VIP will end up being allocated by whichever cluster was first to successfully create the aggregated SI. It may happen that its timestamp is newer than that of the one that failed to create it, because that depends on the timing of when the create attempt is made on the broker.

I wasn't planning on using the oldest timestamp for the VIP because we won't ever re-allocate it once set. The first cluster to successfully create the aggregated SI sets the VIP.
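
The cached-oldest optimization suggested above isn't part of this PR, but building on the findClusterWithOldestTimestamp sketch earlier, it could look roughly like this (the oldest-cluster annotation key and the helper name are made up for illustration):

```go
// Hypothetical annotation caching which cluster currently holds precedence;
// this is only the suggested optimization, not something the PR adds.
const oldestClusterAnnotation = "timestamp.submariner.io/oldest"

// clusterWithPrecedence consults the cached "oldest" annotation first and
// falls back to a full scan only when the cached cluster is no longer
// exporting, i.e. its per-cluster timestamp annotation has been removed.
func clusterWithPrecedence(annotations map[string]string) string {
	if cached, ok := annotations[oldestClusterAnnotation]; ok && cached != "" {
		if _, stillExported := annotations[timestampAnnotationPrefix+cached]; stillExported {
			return cached
		}
	}

	// Cached entry missing or stale: do the O(n) scan from the earlier sketch.
	return findClusterWithOldestTimestamp(annotations)
}
```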

Contributor Author

I did some testing of findClusterWithOldestTimestamp with a varying number of entries in the annotations map:

  • 1,000 entries took 38.793 µs
  • 10,000 took 244.597 µs
  • 1,000,000 took 46.399 ms
  • 10,000,000 took 711.979 ms

I also tried calling findClusterWithOldestTimestamp in a nested loop: calling it 1,000 times with 1,000 map entries took 21.924 ms (simulating 1,000 services on 1,000 clusters). With 10,000 and 10,000 it took 2.16 s, which is still pretty negligible.

So I think we should be fine the way it is.
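
For reference, a micro-benchmark along these lines could be written with Go's built-in benchmarking support. This is only a sketch assuming the findClusterWithOldestTimestamp signature shown in the diff and a hypothetical package name, not the code actually used to produce the numbers above:

```go
package controller // hypothetical package name

import (
	"fmt"
	"testing"
)

// BenchmarkFindClusterWithOldestTimestamp times a single scan over
// annotation maps of a few fixed sizes.
// Run with: go test -bench=FindClusterWithOldestTimestamp
func BenchmarkFindClusterWithOldestTimestamp(b *testing.B) {
	for _, size := range []int{1_000, 10_000, 100_000} {
		// Build an annotations map with `size` per-cluster timestamp entries.
		annotations := make(map[string]string, size)
		for i := 0; i < size; i++ {
			key := fmt.Sprintf("timestamp.submariner.io/cluster%d", i)
			annotations[key] = fmt.Sprintf("%d", 1_700_000_000_000_000_000+int64(i))
		}

		b.Run(fmt.Sprintf("entries-%d", size), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				findClusterWithOldestTimestamp(annotations)
			}
		})
	}
}
```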

Member

Would it be worth adding those benchmarks as a test?

Contributor Author

What would we actually verify as a test?

This is really just normal looping over an array or map (with some fairly simple string manipulation/parsing) that I'm sure we do similarly in other places throughout the code base. These operations are fast on modern computers at the scales we'd be dealing with (in the thousands), so I don't think we need to be concerned. It's only when you get into the millions that it might become a concern.

The MCS spec's conflict resolution policy states that a "conflict will
be resolved by assigning precedence based on each ServiceExport's
creationTimestamp, from oldest to newest". We don't have a central
MCS controller with access to all clusters' ServiceExports but we
can store each cluster's ServiceExport creationTimestamp as
annotations in the aggregated ServiceImport and use them to
resolve conflicts. The SessionAffinity and SessionAffinityConfig
fields on the aggregated ServiceImport will be set by the cluster
with the oldest timestamp. The other clusters will set a
ServiceExportConflict condition if their corresponding local
service's fields do not match those on the aggregated ServiceImport.
If a local service is updated in any cluster, each cluster
re-evaluates the updated aggregated ServiceImport and either clears
or sets the conflict condition. Also if the service from the
precedent cluster is unexported, the next precedent cluster will
set the fields.

Signed-off-by: Tom Pantelis <tompantelis@gmail.com>
@tpantelis
Contributor Author

@aswinsuryan

@submariner-bot submariner-bot added the ready-to-test label Sep 5, 2024
@aswinsuryan aswinsuryan enabled auto-merge (rebase) September 5, 2024 14:07
@aswinsuryan aswinsuryan merged commit 687c11c into submariner-io:devel Sep 5, 2024
26 checks passed
@submariner-bot
Contributor

🤖 Closed branches: [z_pr1632/tpantelis/session_affinity_conflict]

@dfarrell07
Member

The release notes are here: submariner-io/submariner-website#1166

Related: #1621

@tpantelis tpantelis deleted the session_affinity_conflict branch September 19, 2024 11:38