-
Notifications
You must be signed in to change notification settings - Fork 39
fix: Avoid error in unpause operation of Cluster resource #1469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.github/workflows/e2e-testing.yaml
Outdated
| # run idividual test allowing for parallel execution. Due to usage of t.Setenv() in test code, t.Parallel() is not possible | ||
| run: | | ||
| cd cfn-resources/test/e2e/cluster | ||
| go test -timeout 90m -v -run '^TestClusterPauseCFN$' . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not ideal this was a solution that did not require a significant refactor to how our CFN e2e are currently executed and allow the new test to run in parallel. Currently e2e tests rely on executing shell scripts and setting env variables during execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug where unpausing a cluster through CloudFormation would fail due to how the Atlas API handles the unpause operation. The fix introduces a separate PATCH request for unpausing that only updates the paused property before applying any other changes.
- Adds special handling for unpause operations before the main update request
- Introduces comprehensive e2e tests for cluster pause/unpause functionality
- Updates CI workflow to run the new pause test independently
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cfn-resources/cluster/cmd/resource/resource.go | Implements unpause-specific handling logic to avoid API errors |
| cfn-resources/test/e2e/cluster/cluster_pause_test.go | Adds e2e test validating create, pause, and unpause operations |
| cfn-resources/test/e2e/cluster/cluster_pause.json.template | Provides CloudFormation template for pause/unpause testing |
| .github/workflows/e2e-testing.yaml | Updates workflow to run cluster pause tests separately |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| go test -timeout 90m -v cluster_test.go | ||
| go test -timeout 90m -v -run '^TestClusterCFN$' . | ||
| cluster-pause: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we are introducing a new workflow as opposed to using the cluster one for new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial motivation is to add the new test but have them run in parallel (sequentially they take 40min + 20min). Due to how are e2e tests are implemented t.Parallel() cannot be used, Go returns an explicit error panic: testing: test using t.Setenv or t.Chdir can not use t.Parallel.
This is the reasoning behind adding a separate test group so both tests can run in parallel, tried to capture this why in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I see that you changed to a single workflow running 2 packages now. Smart! Ty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single workflow running 2 packages now
Was close but did not work 😞 , publishing script modifies specific files in cluster package (for private publishing), need to run the tests in isolated filesystems to avoid race conditions.
| } | ||
|
|
||
| // Unpausing must be handled separately from other updates to avoid errors from the API. | ||
| if pe := handleUnpausingUpdate(client, currentCluster, currentModel); pe != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any need to wait for a stable cluster state? Is it ok we immediately send a different update afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not, was able to verify that once a first PATCH is sent doing the unpause, having the cluster in UPDATING allows sending other PATCH requests modifying the cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let me know if you want to discuss about testing here
| # Run idividual test in separate test group for parallel execution. | ||
| # Due to usage of t.Setenv() in test code t.Parallel() is not possible. | ||
| # Having both tests run in same `go test` execution is not possible due to scripts used for private registry publishing modifying same files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ty for the comment
Proposed changes
Jira ticket: CLOUDP-356110
Problem
Cluster resource does not support unpausing. This use case must be handled by defining a separate request only adjusting
pausedproperty due to how the API behaves.Current implement supports pausing a cluster handling a separate request within update callback logic.
Solution
Before sending an update operation information of the complete model (which would result in an error) a check is made to see if the update operation is doing an unpause. Under this scenario an isolated PATCH request is crafted only sending
pausedproperty with false. Follow up PATCH request can be sent after this without encountering any errors (in case other changes are present).Testing
Type of change:
expected)
Manual QA performed:
Required Checklist:
make fmtand formatted my codeworks in Atlas
Further comments