backports: for v1.10.6 #11471

Merged: 7 commits, Jul 31, 2025

@smira smira added this to the v1.10 milestone Jul 30, 2025
@github-project-automation github-project-automation bot moved this to To Do in Planning Jul 30, 2025
@github-project-automation github-project-automation bot moved this from To Do to Approved in Planning Jul 30, 2025
smira and others added 7 commits July 30, 2025 21:26
Also updates Go to 1.24.5 and the Kubernetes default to 1.33.3.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This issue is especially bad with single-node control planes, and is
masked by retries when there are multiple control plane nodes.

The core issue is that Talos had a race between creating the `talos`
`Service` and its `Endpoint`: the `Endpoint` might be created before the
`Service` exists, and the Kubernetes EndpointController cleans up
orphaned endpoints.

With multiple control planes, or when this is enabled at a time other
than bootstrap, the race might be masked by multiple machines
re-creating the endpoints, so the issue is not seen.
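
The race described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `cluster`, `gcOrphanedEndpoints`, and `ensureEndpoints` are hypothetical names, not Talos or client-go APIs, and the commit message does not say how the actual fix is implemented. The sketch shows one plausible way to avoid the orphan, which is to refuse to create the endpoints until the `Service` exists.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for the Kubernetes API.
// It is not client-go and not Talos code.
type cluster struct {
	services  map[string]bool
	endpoints map[string]bool
}

func newCluster() *cluster {
	return &cluster{services: map[string]bool{}, endpoints: map[string]bool{}}
}

// gcOrphanedEndpoints mimics the cleanup described above: any endpoints
// object without a Service of the same name is deleted.
func (c *cluster) gcOrphanedEndpoints() {
	for name := range c.endpoints {
		if !c.services[name] {
			delete(c.endpoints, name)
		}
	}
}

// ensureEndpoints refuses to create endpoints until the Service exists,
// so the cleanup pass never sees an orphan created by this code path.
func (c *cluster) ensureEndpoints(name string) error {
	if !c.services[name] {
		return fmt.Errorf("service %q not found, retry later", name)
	}
	c.endpoints[name] = true
	return nil
}

func main() {
	// Racy order: endpoints first, then a cleanup pass before the Service appears.
	racy := newCluster()
	racy.endpoints["talos"] = true
	racy.gcOrphanedEndpoints()
	racy.services["talos"] = true
	fmt.Println("racy order, endpoints present:", racy.endpoints["talos"])

	// Ordered: Service first, endpoints second.
	ordered := newCluster()
	fmt.Println("before service:", ordered.ensureEndpoints("talos"))
	ordered.services["talos"] = true
	fmt.Println("after service:", ordered.ensureEndpoints("talos"))
	ordered.gcOrphanedEndpoints()
	fmt.Println("ordered, endpoints present:", ordered.endpoints["talos"])
}
```

With a single control plane node there is nobody else to re-create the deleted endpoints, which is consistent with the issue being most visible there.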

Fixes siderolabs#11311

Co-authored-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 54bd50b)
See siderolabs#11210

This doesn't fix anything, but the logs will be more helpful to
understand what exactly is wrong.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit a966321)
See siderolabs/pkgs#1277

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 506212a)
This is to make them more easily identifiable for clean up if required.

Signed-off-by: Andrew Longwill <andrew.longwill@siderolabs.com>
(cherry picked from commit bfc57fb)
This showed up in docker runs (not sure why only docker), but the issue
is the following:

* a service is running which has some volume requirements
* `VolumeMountRequests` are created, and `VolumeMountStatus` resources
  are established
* the service puts finalizers on `VolumeMountStatus`
* now the service is going to be restarted - so at first it's going to
  be shut down
* on shutdown, the service will remove `VolumeMountRequest`, and remove
  finalizers on `VolumeMountStatus`
* now it's the job of other controllers to tear down and remove the mounts
* as the service starts back up after restart, it will re-create
  `VolumeMountRequest`, and will try to wait and put finalizers on
  `VolumeMountStatus`
* here comes the race condition: the service may see a
  `VolumeMountStatus` that is still tearing down, left over from the
  shutdown, and put a finalizer on it; that blocks the proper teardown
  of the previous "generation" of the mount request/status, leading to
  a deadlock

So the fix is to wait until a new status is created that is not
tearing down.
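
The idea behind the fix can be sketched as follows. This is a minimal illustration, not the Talos implementation: `mountStatus`, `phase`, and `waitForLiveStatus` are hypothetical simplifications, and the stream of observed statuses is modeled as a plain channel rather than a resource watch.

```go
package main

import (
	"errors"
	"fmt"
)

type phase int

const (
	phaseRunning phase = iota
	phaseTearingDown
)

// mountStatus is a hypothetical, simplified stand-in for VolumeMountStatus.
type mountStatus struct {
	generation int
	phase      phase
	finalizers []string
}

var errNoLiveStatus = errors.New("no live mount status observed")

// waitForLiveStatus consumes observed statuses and returns the first one
// that is not tearing down. Statuses left over from the previous
// generation are skipped and never receive a finalizer.
func waitForLiveStatus(events <-chan *mountStatus, finalizer string) (*mountStatus, error) {
	for st := range events {
		if st.phase == phaseTearingDown {
			continue // stale status from before the restart; let teardown finish
		}
		st.finalizers = append(st.finalizers, finalizer)
		return st, nil
	}
	return nil, errNoLiveStatus
}

func main() {
	stale := &mountStatus{generation: 1, phase: phaseTearingDown}
	fresh := &mountStatus{generation: 2, phase: phaseRunning}

	events := make(chan *mountStatus, 2)
	events <- stale
	events <- fresh
	close(events)

	st, err := waitForLiveStatus(events, "service")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("finalizer placed on generation", st.generation)
	fmt.Println("finalizers on stale status:", len(stale.finalizers))
}
```

Because the stale status never gains a new finalizer, nothing in this sketch holds up the teardown of the previous generation.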

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 06ef710)
Import the fix from siderolabs/go-blockdevice#135

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit d62e255)
@smira smira force-pushed the backports/v1.10.6 branch from d154bfe to 7553089 Compare July 30, 2025 17:26
@smira smira added the integration/release-gate Builds required to pass for a release label Jul 30, 2025

smira commented Jul 31, 2025

/m

@talos-bot talos-bot merged commit 7553089 into siderolabs:release-1.10 Jul 31, 2025
240 of 246 checks passed
@github-project-automation github-project-automation bot moved this from Approved to Done in Planning Jul 31, 2025
Labels
integration/release-gate Builds required to pass for a release status/ok-to-merge
Projects
Status: Done
4 participants