
Conversation

@djivey
Collaborator

@djivey djivey commented Sep 18, 2025

What does this PR do?

What issues does this PR fix or reference?

Fixes: This PR addresses idempotency issues in the saltext-kubernetes extension by implementing proper patch functionality for Kubernetes resources. Previously, the extension could not intelligently update existing resources, leading to unnecessary recreation operations that could disrupt running workloads. This enhancement enables Salt to detect differences between the desired and current state and apply only the necessary changes through Kubernetes patch operations, ensuring idempotent resource management.

closes #16

Previous Behavior

The state module was not idempotent for most present and absent functions.

New Behavior

This ensures that running salt '*' state.apply multiple times produces consistent results without unintended side effects, which is fundamental to Salt's declarative infrastructure management approach.

  • Added patch_<resource>() functions to support both dictionary patches and source file rendering
  • Functions that previously returned .to_dict() responses now return sanitize_for_serialization() responses. The structure is similar but may have slight differences in nested object handling.
  • Resources are now patched rather than replaced when differences are detected (a rough sketch of the resulting state flow follows this list)
  • State operations correctly report "no changes" when resources already match the desired state
  • Eliminated unnecessary resource recreation that could cause service disruption
  • Enhanced the state logic to properly handle both creation and modification scenarios
  • Added replace as an argument in case someone wants to force a replacement of a resource
  • Added dry_run to help with determining changes
  • Updated tests to reflect the changes
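
For reference, below is a minimal sketch of the patch-first flow described in this list, as it could look inside a deployment_present state function. It is an illustration under assumptions: the exact signatures of kubernetes.show_deployment, kubernetes.create_deployment, kubernetes.replace_deployment, and the new kubernetes.patch_deployment, as well as the changes/comment wording, may differ from the actual implementation in this PR.

# Illustrative sketch only; real signatures and error handling differ in the PR.
def deployment_present(
    name, namespace="default", metadata=None, spec=None,
    source="", template="", replace=False, **kwargs
):
    ret = {"name": name, "changes": {}, "result": True, "comment": ""}

    existing = __salt__["kubernetes.show_deployment"](name, namespace, **kwargs)

    if existing is None:
        # Resource does not exist yet: create it (or only report in test mode).
        if __opts__["test"]:
            ret["result"] = None
            ret["comment"] = "The deployment is going to be created"
            return ret
        res = __salt__["kubernetes.create_deployment"](
            name, namespace, metadata, spec, source, template, __env__, **kwargs
        )
        ret["changes"] = {"deployment": {"old": {}, "new": res}}
        return ret

    if replace:
        # Explicit opt-in: replace the whole resource instead of patching it.
        res = __salt__["kubernetes.replace_deployment"](
            name, metadata, spec, source, template, __env__, namespace, **kwargs
        )
        ret["changes"] = {"deployment": {"old": existing, "new": res}}
        return ret

    # Patch path: use a dry run to render/normalize the desired object
    # (including source files) and compare it to the live resource.
    desired = __salt__["kubernetes.patch_deployment"](
        name, namespace, metadata=metadata, spec=spec, source=source,
        template=template, dry_run=True, **kwargs
    )
    if desired == existing:
        ret["comment"] = "The deployment is already in the desired state"
        return ret

    if __opts__["test"]:
        ret["result"] = None
        ret["changes"] = {"deployment": {"old": existing, "new": desired}}
        ret["comment"] = "The deployment is going to be patched"
        return ret

    res = __salt__["kubernetes.patch_deployment"](
        name, namespace, metadata=metadata, spec=spec, source=source,
        template=template, **kwargs
    )
    ret["changes"] = {"deployment": {"old": existing, "new": res}}
    return ret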

Merge requirements satisfied?

[NOTICE] Bug fixes or new features require tests.

Commits signed with GPG?

Yes

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@djivey
Collaborator Author

djivey commented Sep 18, 2025

@lkubb Can you review the approach I used here to achieve idempotency? I only made changes to one resource type to keep the diff as small as possible for you.

This is going to end up being another big change, I believe, but if you agree with the approach, it should be easy for me to apply across the rest of the resource types.

I am not sure that some of these tests, like the integration tests, are needed, but I wanted to at least show that the changes work.

@djivey
Collaborator Author

djivey commented Sep 18, 2025

I will have to look at these test failures; they worked locally.

@djivey djivey marked this pull request as draft September 18, 2025 00:57
@lkubb
Member

lkubb commented Sep 18, 2025

Afaict it's just salt-extensions/salt-extension-copier#199. Sorry, forgot about this issue, will try to get a new template release out soon.

Member

@lkubb lkubb left a comment

If you merge #21 and rebase this branch, CI should pass. 👍

This looks pretty good! I added some comments that hopefully help guide you for the rest of the resources.

Once again thanks for your efforts :)

ret["comment"] = "The deployment is going to be created"
# Simulate creation to show changes to account for using the source argument
try:
dry_run_result = __salt__["kubernetes.create_deployment"](
Member

@lkubb lkubb Sep 19, 2025

hint: [Not sure if that's relevant in this specific case, more general guidance]

When running with test=true, dependencies of this object might not exist because previous required states run in test mode as well, which can cause failures in test mode when it would run fine in regular mode.

In those cases, the simplest solution generally is to leave ret["result"] = None and add a hint to the comment: ret["comment"] = f"Dry run failed. If previous states ensure requirements are met, you can ignore this message. The error was: {err}"
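
For illustration, a rough sketch of that pattern (the simulated creation call and the CommandExecutionError type are assumptions here; adapt it to whatever the dry-run code actually raises):

from salt.exceptions import CommandExecutionError

try:
    dry_run_result = __salt__["kubernetes.create_deployment"](
        name, namespace, metadata, spec, source, template, __env__, **kwargs
    )
    ret["result"] = None
    ret["changes"] = {"deployment": {"old": {}, "new": dry_run_result}}
    ret["comment"] = "The deployment is going to be created"
except CommandExecutionError as err:
    # Don't hard-fail in test mode: required objects (secrets, PVCs, ...) may
    # simply not exist yet because earlier states also ran with test=true.
    ret["result"] = None
    ret["comment"] = (
        "Dry run failed. If previous states ensure requirements are met, "
        f"you can ignore this message. The error was: {err}"
    )
return ret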

Collaborator Author

I am not sure I follow what you mean here exactly. The primary reason I added the dry_run here is so that I can get a change result if using source because at this point that file has not been rendered yet. Using dry_run gets that to render and I can return results on the changes.

Before it was only checking and returning on metadata and spec.

Member

@lkubb lkubb Sep 20, 2025

Sorry, I'll try to be clearer (without knowing the specifics of Kubernetes resources, I'll illustrate with hypothetical docker state modules):

Fetch image:
  docker_image.present:
    - name: ghcr.io/foo/bar

Create volume:
  docker_volume.present:
    - name: foo-vol

Create container:
  docker_container.present:
    - name: foo
    - image: ghcr.io/foo/bar
    - mount:
        foo-vol: /app/data
    - require:
      - docker_image: ghcr.io/foo/bar
      - docker_volume: foo-vol

Other stuff:
  file.managed:
    - name: /etc/bar/baz.conf
    - source: salt://foo/bar/baz.conf
    - require:
      - docker_container: foo

When applying this state on a fresh node with test=true, both docker_image and docker_volume will report that their resources will be created, but not actually create them. If docker_container.present relied on a hypothetical docker.create_container(..., dry_run=True) API, Docker could rightfully complain that neither the referenced image nor the referenced volume exist, which would trigger an exception in this case.

If docker_container.present did not have specific test mode handling for this situation, the state would report an error in test mode and dependent states (like Other stuff here) would be marked failed as well, even if the state application as a unit would work just fine without test mode.

The simplest solution is to not fail docker_container.present in test mode (since it can have requirements managed by different states that don't actually ensure the requirement is created in test mode), but instead report ret["result"] = None with a comment explaining that the test failed, but it might just be a result of running in test mode. This allows for the file.managed state to still report on changes it would make.

If a state can expect that it should work in test mode (because it can't really have any requirements, like the docker_image example), this is unnecessary.

Since I'm not familiar with Kubernetes, I can't tell if this is an issue here. Even if it is, it would be a minor usability issue. Just wanted to point it out as a guide. Many state modules don't account for this.

TLDR: In general, failing test mode should be restricted to cases where we're reasonably sure there's something wrong (e.g. the ghcr.io/foo/bar image does not exist* or the connection to the repository failed).

* One could argue that in the general case, an image could be created by a previous docker_image.built state, but that would likely be a rare situation, much rarer than a missing image.

Collaborator Author

@djivey djivey Sep 22, 2025

Ah, I see what you mean. The good thing about Kubernetes is that most of the time all of that is in the spec - see the example below. Persistent volumes would likely be the cause of an issue here unless the cluster is set up to provision persistent volumes when a claim is made. Thank you for clarifying.

EDIT: Secrets and configmaps would also apply here.

spec:
  imagePullSecrets:
    - name: {{ secret }}
  containers:
  - name: {{ name }}
    image: {{ url }}:{{ tag }}
    imagePullPolicy: Always
    ports:
    - containerPort: {{ tgt_port }}
    volumeMounts:
    - name: {{ pvc_name }}
      mountPath: {{ pvc_mount_path }}
    - name: {{ secret_name }}
      mountPath: {{ secret_mount_path }}
      readOnly: true
  volumes:
  - name: {{ pvc_name }}
    persistentVolumeClaim:
      claimName: {{ pvc_name }}
  - name: {{ secret_name }}
    secret:
      secretName: {{ secret_name }}
      defaultMode: 0400

assert deployment_state["spec"]["replicas"] == 3


def test_deployment_present_patch(kubernetes, deployment, kubernetes_exe):
Member

suggestion: I'd test in test mode as well and assert that the resource is not changed. It's often neglected when writing a state, but can have surprising consequences if not handled correctly.
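
A hypothetical sketch of such a test, mirroring the fixtures used by test_deployment_present_patch; the fixture values, the exact way to enable test mode, and the return shape of the states wrapper are assumptions and may differ in this suite:

def test_deployment_present_patch_test_mode(kubernetes, deployment, kubernetes_exe):
    # Apply the same change in test mode: the state should report the pending
    # patch without touching the live resource.
    ret = kubernetes.deployment_present(
        name=deployment,
        spec={"replicas": 4},
        test=True,
    )
    assert ret.result is None
    assert ret.changes

    # The live deployment must be unchanged.
    deployment_state = kubernetes_exe.show_deployment(deployment)
    assert deployment_state["spec"]["replicas"] == 3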

assert deployment_state["spec"]["replicas"] == 4


def test_deployment_present_patch_source(
Member

suggestion: Test mode, see above

Comment on lines +156 to +157
assert result == {"code": 200}
assert kubernetes.kubernetes.client.AppsV1Api().delete_namespaced_deployment.called
Member

@lkubb lkubb Sep 19, 2025

note: This test is an example for my remark on unit tests above (or below if viewed from the conversations tab :)). It only checks that delete_deployment returns the value we made the sanitize_for_serialization mock return and ensures it calls delete_namespaced_deployment. For all this brittle setup, it's quite isolated from reality and redundant since we have the functional tests.

Both assertions should be covered by the functional test. The unit test, however, does not check, for example, that

  • the deployment was actually deleted
  • the functions are called with the correct arguments
  • the client lib still has the delete_namespaced_deployment/sanitize_for_serialization etc. functions
  • the client lib still works with the Kubernetes API
  • etc.

Unit tests, especially with comprehensive client libs like the kubernetes one, often need a lot of surgical patching (which can make changes to the functions under test quite annoying and limits their meaningfulness).

I'd reserve unit tests for hard-to-test behavior like some unhappy paths, e.g. "Does the state module handle situations where an execution module raises a CommandExecutionError" or a specific ApiException.
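
For example, a hypothetical unit test along these lines (the error-handling behavior and the presence of the loader dunders via the suite's existing loader setup are assumptions):

from unittest.mock import MagicMock, patch

from salt.exceptions import CommandExecutionError

from saltext.kubernetes.states import kubernetes as kubernetes_state


def test_deployment_present_reports_failure_on_execution_error():
    # Simulate the execution module failing, e.g. due to a broken API connection.
    show = MagicMock(side_effect=CommandExecutionError("connection refused"))
    with patch.dict(kubernetes_state.__salt__, {"kubernetes.show_deployment": show}):
        ret = kubernetes_state.deployment_present(name="nginx", namespace="default")

    # The state should fail cleanly and surface the error instead of raising.
    assert ret["result"] is False
    assert "connection refused" in ret["comment"]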

Member

note: Imo many of the current unit tests don't make sense since they test the same happy-path behavior as the functional (and integration) ones. I'd scrap the existing redundant ones instead of rewriting them and would advise focusing on comprehensive functional ones instead.

[See related comment on test_delete_deployments]

Member

suggestion: Same comment regarding unit tests here. I'd scrap most of the existing (redundant) ones instead of rewriting them/adding new ones.

@djivey
Collaborator Author

djivey commented Sep 19, 2025

If you merge #21 and rebase this branch, CI should pass. 👍

This looks pretty good! I added some comments that hopefully help guide you for the rest of the resources.

Once again thanks for your efforts :)

Thank you for taking a look. Most of this should be pretty easy to fix.

I will look closer at the unit tests; I am still wrapping my head around what they are actually for. This helps a lot. Functional and integration tests make more sense to me.

@codecov

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 82.90155% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.49%. Comparing base (70bc313) to head (6b02962).

Files with missing lines Patch % Lines
src/saltext/kubernetes/states/kubernetes.py 61.53% 17 Missing and 3 partials ⚠️
src/saltext/kubernetes/modules/kubernetesmod.py 64.86% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   75.25%   75.49%   +0.23%     
==========================================
  Files          17       17              
  Lines        4130     4284     +154     
  Branches      432      448      +16     
==========================================
+ Hits         3108     3234     +126     
- Misses        883      906      +23     
- Partials      139      144       +5     
Flag Coverage Δ
Linux 75.49% <82.90%> (+0.23%) ⬆️
macOS 42.57% <43.52%> (+0.03%) ⬆️
project 54.85% <62.92%> (+0.22%) ⬆️
py310 75.32% <82.90%> (+0.24%) ⬆️
py39 75.49% <82.90%> (+0.23%) ⬆️
salt_3006_15 75.49% <82.90%> (+0.23%) ⬆️
salt_3007_7 75.32% <82.90%> (+0.24%) ⬆️
tests 94.40% <100.00%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


Development

Successfully merging this pull request may close these issues.

Refactor the state module kubernetes.py present functions to be idempotent
