fix idempotency in kubernetes state module #20
Conversation
@lkubb Can you review the approach I used here to achieve idempotency? I only made changes to one resource type to keep the diff as small as possible for you. I believe this is still going to end up being another big change, but if you agree with the approach, it should be easy to apply across the rest of the resource types. I am not sure that some of these tests, like the integration tests, are needed, but I wanted to at least show that the changes work.
I will have to look at these test failures; they passed locally.
Afaict it's just salt-extensions/salt-extension-copier#199. Sorry, forgot about this issue, will try to get a new template release out soon. |
lkubb left a comment:
If you merge #21 and rebase this branch, CI should pass. 👍
This looks pretty good! I added some comments that hopefully help guide you for the rest of the resources.
Once again thanks for your efforts :)
```python
ret["comment"] = "The deployment is going to be created"
# Simulate creation to show changes to account for using the source argument
try:
    dry_run_result = __salt__["kubernetes.create_deployment"](
```
hint: [Not sure if that's relevant in this specific case, more general guidance]
When running with test=true, dependencies of this object might not exist because previous required states run in test mode as well, which can cause failures in test mode when it would run fine in regular mode.
In those cases, the simplest solution generally is to leave `ret["result"] = None` and add a hint to the comment: `ret["comment"] = f"Dry run failed. If previous states ensure requirements are met, you can ignore this message. The error was: {err}"`
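For illustration, that handling might look roughly like this inside the deployment state. This is a sketch only: the argument list mirrors the upstream `kubernetes.create_deployment` signature, `__salt__` is injected by the Salt loader at runtime, and the helper name is made up here, not taken from the PR.

```python
# Sketch of the suggested test-mode handling - illustrative, not the PR's implementation.
from salt.exceptions import CommandExecutionError


def _simulate_creation(ret, name, namespace, metadata, spec, source, template, saltenv):
    """Simulate the create to report pending changes, but never fail hard in test mode."""
    try:
        dry_run_result = __salt__["kubernetes.create_deployment"](
            name, namespace, metadata, spec, source, template, saltenv
        )
        ret["changes"] = {"new": dry_run_result}
        ret["comment"] = "The deployment is going to be created"
    except CommandExecutionError as err:
        # Requirements managed by earlier states may not exist yet when everything
        # runs with test=true, so flag the failure instead of reporting an error.
        ret["comment"] = (
            "Dry run failed. If previous states ensure requirements are met, "
            f"you can ignore this message. The error was: {err}"
        )
    ret["result"] = None
    return ret
```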
I am not sure I follow what you mean here exactly. The primary reason I added the dry_run here is so that I can get a change result when using source, because at that point the file has not been rendered yet. Using dry_run gets it to render so I can return the resulting changes.
Before, it was only checking and returning on metadata and spec.
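For reference, the idea could be sketched as diffing the rendered dry-run result against the live object, so the reported changes cover the full rendered source rather than just metadata and spec. The helper and variable names below are illustrative, not the PR's code; `deep_diff` is from `salt.utils.dictdiffer`.

```python
# Illustrative helper, not the PR's implementation.
from salt.utils import dictdiffer


def _pending_changes(current, dry_run_result):
    """Return the difference between the live object and the rendered dry-run result."""
    # An empty dict means nothing would change - the idempotency signal the state needs.
    return dictdiffer.deep_diff(current or {}, dry_run_result or {})
```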
Sorry, I'll try to be clearer (without knowing specifics about Kubernetes resources, hence I'll illustrate with hypothetical Docker state modules):

```yaml
Fetch image:
  docker_image.present:
    - name: ghcr.io/foo/bar

Create volume:
  docker_volume.present:
    - name: foo-vol

Create container:
  docker_container.present:
    - name: foo
    - image: ghcr.io/foo/bar
    - mount:
        foo-vol: /app/data
    - require:
      - docker_image: ghcr.io/foo/bar
      - docker_volume: foo-vol

Other stuff:
  file.managed:
    - name: /etc/bar/baz.conf
    - source: salt://foo/bar/baz.conf
    - require:
      - docker_container: foo
```

When applying this state on a fresh node with test=true, both docker_image and docker_volume will report that their resources will be created, but not actually create them. If docker_container.present relied on a hypothetical docker.create_container(..., dry_run=True) API, Docker could rightfully complain that neither the referenced image nor the referenced volume exists, which would trigger an exception in this case.
If docker_container.present did not have specific test mode handling for this situation, the state would report an error in test mode and dependent states (like Other stuff here) would be marked failed as well, even if the state application as a unit would work just fine without test mode.
The simplest solution is to not fail docker_container.present in test mode (since it can have requirements managed by different states that don't actually ensure the requirement is created in test mode), but instead report ret["result"] = None with a comment explaining that the test failed, but it might just be a result of running in test mode. This allows for the file.managed state to still report on changes it would make.
If a state can expect that it should work in test mode (because it can't really have any requirements, like the docker_image example), this is unnecessary.
Since I'm not familiar with Kubernetes, I can't tell if this is an issue here. Even if it is, it would be a minor usability issue. Just wanted to point it out as a guide. Many state modules don't account for this.
TLDR: In general, failing test mode should be restricted to cases where we're reasonably sure there's something wrong (e.g. the ghcr.io/foo/bar image does not exist* or the connection to the repository failed).
* One could argue that in the general case, an image could be created by a previous docker_image.built state, but that would likely be a rare situation, much rarer than a missing image.
Ah, I see what you mean. The good thing about Kubernetes is that most of the time all of that is in the spec; see the example below. Persistent volumes would likely be the cause of an issue here, unless the cluster is set up to create persistent volumes when a claim is made. Thank you for clarifying.
EDIT: Secrets and configmaps would also apply here.
```yaml
spec:
  imagePullSecrets:
    - name: {{ secret }}
  containers:
    - name: {{ name }}
      image: {{ url }}:{{ tag }}
      imagePullPolicy: Always
      ports:
        - containerPort: {{ tgt_port }}
      volumeMounts:
        - name: {{ pvc_name }}
          mountPath: {{ pvc_mount_path }}
        - name: {{ secret_name }}
          mountPath: {{ secret_mount_path }}
          readOnly: true
  volumes:
    - name: {{ pvc_name }}
      persistentVolumeClaim:
        claimName: {{ pvc_name }}
    - name: {{ secret_name }}
      secret:
        secretName: {{ secret_name }}
        defaultMode: 0400
```

```python
    assert deployment_state["spec"]["replicas"] == 3


def test_deployment_present_patch(kubernetes, deployment, kubernetes_exe):
```
suggestion: I'd test in test mode as well and assert that the resource is not changed. It's often neglected when writing a state, but can have surprising consequences if not handled correctly.
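A rough sketch of what such a test-mode case could look like, reusing the fixture names visible in this diff (`kubernetes`, `deployment`, `kubernetes_exe`). The wrapper behaviour and kwargs are assumptions, not taken from this PR.

```python
# Sketch only: assumes the functional `kubernetes` states fixture routes calls through
# state.single (so `test=True` is honoured) and that `kubernetes_exe` exposes the
# execution module's show_deployment. Neither assumption is taken from this PR.
def test_deployment_present_patch_test_mode(kubernetes, deployment, kubernetes_exe):
    # Ask for a spec change, but only simulate it.
    ret = kubernetes.deployment_present(
        name=deployment,
        namespace="default",
        spec={"replicas": 4},
        test=True,
    )
    # Test mode should announce the pending change without applying it ...
    assert ret.result is None
    assert ret.changes
    # ... and the live resource must be untouched.
    live = kubernetes_exe.show_deployment(deployment, namespace="default")
    assert live["spec"]["replicas"] != 4
```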
```python
    assert deployment_state["spec"]["replicas"] == 4


def test_deployment_present_patch_source(
```
suggestion: Test mode, see above
```python
    assert result == {"code": 200}
    assert kubernetes.kubernetes.client.AppsV1Api().delete_namespaced_deployment.called
```
note: This test is an example for my remark on unit tests above (or below if viewed from the conversations tab :)). It only checks that delete_deployment returns the value we made the sanitize_for_serialization mock return and ensures it calls delete_namespaced_deployment. For all this brittle setup, it's quite isolated from reality and redundant since we have the functional tests.
Both assertions should be covered by the functional test. The unit test, however, does not check for example that

- the deployment was actually deleted
- the functions are called with the correct arguments
- the client lib still has the delete_namespaced_deployment/sanitize_for_serialization etc. functions
- the client lib still works with the Kubernetes API
- etc.
Unit tests, especially with comprehensive client libs like the kubernetes one, often need a lot of surgical patching (which can make changes to the functions under test quite annoying and limits their meaningfulness).
I'd reserve unit tests for hard to test behavior like some unhappy paths, e.g. "Does the state module handle situations where an execution module raises a CommandExecutionError" or a specific ApiException.
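For instance, an unhappy-path unit test might look roughly like the sketch below. The import path, the loader-mock fixture wiring, and the asserted error handling are assumptions about the codebase, not taken from this PR.

```python
# Sketch only: module path and error-handling behaviour are assumptions.
from unittest.mock import MagicMock, patch

import pytest
from salt.exceptions import CommandExecutionError

from saltext.kubernetes.states import kubernetesmod as k8s_state  # path assumed


@pytest.fixture
def configure_loader_modules():
    # Standard saltext loader mocking: give the module empty dunders we can patch per test.
    return {k8s_state: {"__salt__": {}, "__opts__": {"test": False}}}


def test_deployment_present_handles_command_execution_error():
    """A failing execution-module call should surface as result=False, not a traceback."""
    mock_show = MagicMock(side_effect=CommandExecutionError("connection refused"))
    with patch.dict(k8s_state.__salt__, {"kubernetes.show_deployment": mock_show}):
        ret = k8s_state.deployment_present(name="nginx", namespace="default", spec={})
    assert ret["result"] is False
    assert "connection refused" in ret["comment"]
```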
note: Imo many of the current unit tests don't make sense since they test the same happy-path behavior as the functional (and integration) ones. I'd scrap the existing redundant ones instead of rewriting them and would advise focusing on comprehensive functional ones instead.
[See related comment on test_delete_deployments]
suggestion: Same comment regarding unit tests here. I'd scrap most of the existing (redundant) ones instead of rewriting them/adding new ones.
Thank you for taking a look. Most of this should be pretty easy to fix. I will look closer at the unit tests; I am still wrapping my head around what they are actually for. This helps a lot. Functional and integration tests make more sense to me.
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   75.25%   75.49%   +0.23%
==========================================
  Files          17       17
  Lines        4130     4284     +154
  Branches      432      448      +16
==========================================
+ Hits         3108     3234     +126
- Misses        883      906      +23
- Partials      139      144       +5
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
What does this PR do?
What issues does this PR fix or reference?
Fixes: This PR addresses idempotency issues in the saltext-kubernetes extension by implementing proper patch functionality for Kubernetes resources. Previously, the extension lacked the ability to intelligently update existing resources, leading to unnecessary recreation operations that could disrupt running workloads. This enhancement enables Salt to detect differences between desired and current state, applying only the necessary changes through Kubernetes patch operations, ensuring idempotent resource management.
closes #16
Previous Behavior
The state module was not idempotent for most present and absent functions.
New Behavior
- This ensures that running `salt '*' state.apply` multiple times produces consistent results without unintended side effects, which is fundamental to Salt's declarative infrastructure management approach.
- New `patch_<resource>()` functions support both dictionary patches and source file rendering.
- Functions that previously returned `to_dict()` responses now return `sanitize_for_serialization()` responses. The structure is similar but may have slight differences in nested object handling.

Merge requirements satisfied?
[NOTICE] Bug fixes or new features require tests.
Commits signed with GPG?
Yes
Please review Salt's Contributing Guide for best practices.
See GitHub's page on GPG signing for more information about signing commits with GPG.