Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip named nodes in the CNR that don't exist #83

Merged
merged 8 commits into from
Aug 19, 2024

Conversation

vincentportella
Copy link
Member

@vincentportella vincentportella commented Jul 19, 2024

  • Updated to skip named nodes in the CNR if they don't exist through a validationOptions.skipNamedNodes option. As a result if none of the named nodes exist the CNR will succeed immediately when it transitions to the Initialized phase.
  • ValidationOptions option added to the CNR and Nodegroup objects.
  • Very minor refactor of the Pending phase and supporting functions.

Copy link
Collaborator

@mwhittington21 mwhittington21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've significantly altered the behaviour of the software, but haven't added tests for the new behaviour. I would expect to see some tests added that have nodes added to the initial request that are then terminated, and the behaviour validated on that. I'm a little uneasy releasing this functionality without this test.

Additionally, there may be other people running Cyclops who are relying on the existing behaviour. I can see two options here:

  1. Cut a new major version of Cyclops (i.e. backwards incompatible changes)
  2. Add a configurable setting for this behaviour and default it to the old behaviour

foundNode = true
// Skips any named node that does not exist in the node group for this CycleNodeRequest.
func (t *CycleNodeRequestTransitioner) addNamedNodesToTerminate(kubeNodes map[string]corev1.Node, nodeGroupInstances map[string]cloudprovider.Instance) {
kubeNodesMap := make(map[string]corev1.Node)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name this after the function of the variable. Something like nodeLookupByName. I kept trying to find it being used for something more than a lookup.

Copy link
Member Author

@vincentportella vincentportella Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, there may be other people running Cyclops who are relying on the existing behaviour.

Good point, I initially intended for this to be a breaking change but now that I think about it making it configurable may be the better approach to cater to different use-cases.

👍 I'll add some tests to validate the intended behaviour.

kubeNodes, err := t.listReadyNodes(true)
if err != nil {
return t.transitionToHealing(err)
}
if len(kubeNodes) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're losing the ability to detect user error here. Is there a way we can keep user error detection, while also not failing at nonexistent nodes?

One strategy could be: if our selector did match some nodes, but none were still in the cluster, we can call it a success. If it matches no nodes, we don't call it a success. The only problem with this approach is nodegroups scaled to zero. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see how that's possible unless we have a way of knowing if a node name in the CNR matches a node that existed prior to the CNR being created.

A flag here would help to toggle strict node validation and enable both use cases but I'm wary of adding more settings to a CNR.

MinyiZ
MinyiZ previously approved these changes Aug 14, 2024
@@ -101,6 +101,13 @@ spec:
- Drain
- Wait
type: string
strictValidation:
Copy link
Collaborator

@awprice awprice Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So by default if not set this will be false?

Added a new StrictValidation option to the CNR to keep backwards compatible functionality.

I guess this does keep backwards compatible functionality, but is technically a breaking change or a behaviour change - we'll need to document this in the release notes.

We'll also need to ensure cluster owners update the CRDs to include this new field.

Copy link
Member Author

@vincentportella vincentportella Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is the intention, I would like this new functionality to be the default rather than the exception because it reduces the number of times a cycle will fail by default.

However, happy to change it if we deem it's better to set the flag to true by default.

// StrictValidation is a boolean which determines whether named nodes selected in a CNR must
// exist and be valid nodes before cycling can begin. If set to true when invalid nodes are
// selected the CNR will be transitioned to the "Failed" phase before cycling can begin again.
StrictValidation bool `json:"strictValidation,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever see StrictValidation being used for other things?

At the moment the description of this field suggests its only for this one thing - when invalid nodes are found, transition to failed. But the naming of the field suggests its to generically enable/disable strict validation.

Naming is hard, but I'd probably suggest renaming the flag to something that describes specifically what it is enabling/disabling

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also make sense to put this field into CycleSettings?

Doing this means it's available in a CycleNodeStatus object - do we need it here?

Should we move it elsewhere within CycleNodeRequest to make it not propagate down?

Could we have a "ValidationOptions" field within CycleNodeRequest for holding other validation/configuration options a user might want to configure?

Copy link
Member Author

@vincentportella vincentportella Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see your point. I initially intended for more validations to be affected by PR but then narrowed the scope to just checking for named nodes. Having said that, having ValidationOptions in the CNR would be better because we can then have more granular settings. 🤔

With this PR, we could make it something like, ValidationOptions.SkipMissingNamedNodes. The existing functionality would remain the same and adding the flag would lead to using this new functionality.

@@ -44,6 +44,8 @@ spec:
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
clusterName:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this field come from? Why is it only in the CycleNodeRequest resource?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was part of the operator-sdk render, make generate-crds

Copy link
Collaborator

@awprice awprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice work! Love the validationOptions field

Copy link
Collaborator

@mwhittington21 mwhittington21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vincentportella vincentportella merged commit 9e73b88 into master Aug 19, 2024
3 checks passed
@vincentportella vincentportella deleted the vportella/skip-named-nodes-that-dont-exist branch August 19, 2024 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Skip named nodes in a CNR if they do not exist
4 participants