Disable grab download resume #5034

jnummelin · 2024-09-24T19:25:30Z

Description

By default grab tries to resume the download if the file name determined from either the url or from content-type headers already exists. This makes things go side ways, if the existing file is smaller than the new one, the old content would still be there and only the "extra" new bytes would get written. I.e. the download would be "resumed". 🤦

So this change disables resuming altogether fro grab.

To be able to download and replace k0s without resuming, i.e. "truncate" the file in-between. We need to force the download to download it as k0s.tmp first. Only after that we can safely move the file as truncating a running binary is not allowed and will error with text file busy.

This is a minimal possible fix that we can easily backport. @twz123 is already working on bigger refactoring of autopilot download functionality that gets rid of grab. Grab seems to bring more (bad) surprises than real benefits. In the end, we just download files and we should pretty much always just replace them. No need for full library dependecy for that.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

Manual test
Auto test added
Existing tests cover this

Checklist:

https://github.com/k3s-io/kine/releases/tag/v0.11.9 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 67bd4dc)

…elease-1.30

The default port is 8080. This conflicts with kube-router, which since 2.x fails if it cannot bind to its metrics port. This leads to a situation where the default configuration for a single node k0s cluster is broken, as kube-router will go into a crash loop. Kube-router 1.x will not be able to provide metrics, but will simply log the error and move on. The integration tests did not cover this case properly, as the kube-router manifests did not specify a readiness probe that the inttests need to detect problems with daemon sets. Port 2380 was chosen because it is normally used by etcd. Kine also uses port 2379 for the client API, so why not use 2380 as well? Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 179989e)

…elease-1.30 [Backport release-1.30] Change kine metrics port from 8080 to 2380

Use the JSON API and look for the push_time_seconds metric. Collect all job labels that had successful pushes. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2de1b0f)

This is a variant of check-metricsscraper that executes the inttest on a single-node cluster. This mainly means that, in addition to etcd, kine scraping is also tested. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 62beca8)

The worker config ConfigMaps have the Kubernetes version in their names. This is to ensure a seamless upgrade path when upgrading minor k0s versions. However, the RBAC resources and the stack name itself didn't have the version in their names. This means that as soon as a controller with the new minor k0s version takes over, it removes all previous worker ConfigMaps from the cluster, effectively defeating the purpose of the versioned names. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 90e952b)

…elease-1.30 [Backport release-1.30] Use a versioned worker config stack

…elease-1.30 [Backport release-1.30] Introduce `check-metricsscraper-singlenode`

Currently the RestartSec is 120, which mean that systemd waits 2mins before it restarts k0s. This makes updates very slow with autopilot, i.e. with 3 controllers updating the controllers will take 6mins. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit 225de58)

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1300 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 15df9f3)

…elease-1.30 [Backport release-1.30] Lessen the systemd RestartSec to 10 secs

The watch for CRD getting ready used wrong error when checking if it is retriable. This results in bogus logging and a "busyloop": ``` time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" ... ``` It actually hides the transient errors encountered. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit d67b442)

With this change the first controller on new version can apply the needed versioned resources as it will always be guaranteed to become the leader. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit e5a271b)

…elease-1.30 [Backport release-1.30] Bump Kubernetes to v1.30.1

…elease-1.30 [Backport release-1.30] Fix error handling in EtcdMemberReconciler

https://github.com/containerd/containerd/releases/v1.7.17 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2bea0b8)

This allows for better feedback of kube-router health via the DaemonSet resource. Without those, it's possible to observe a "healthy" DaemonSet, even if it's not. This affects e.g. rolling updates, and, most notably k0s's own integration tests. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 65fcf39)

Otherwise the defaulting could pick up `kube-bridge` address as api/etcd address when restarting k0s if it sees it before any "real" address. This happens on e.g. bootloose containers where kube-bridge is at index 11 and eth0 on index 18. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit dd2fdca)

…elease-1.30 [Backport release-1.30] Skip `kube-bridge` interface for api/etcd address

…elease-1.30 [Backport release-1.30] Use dedicated leasepool for worker config component

…elease-1.30 [Backport release-1.30] Add readinessProbe/minReadySeconds to kube-router

…elease-1.30 [Backport release-1.30] Bump containerd to v1.7.17

So that they are ignored by the Go tooling. Otherwise, when building in parallel, Go may try to treat these temporary directories as Go package folders and fail utterly. https://pkg.go.dev/cmd/go#hdr-Package_lists_and_patterns > Directory and file names that begin with "." or "_" are ignored by the > go tool, as are directories named "testdata". Fixes: d68dc34 ("Improve Docker export in embedded-binaries Makefile") Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit db51280)

Signed-off-by: Alexey Makhov <amakhov@mirantis.com> (cherry picked from commit 5ead418)

…elease-1.30 [Backport release-1.30] Adding CLI args reference to the docs

We may run k0s on arm/v8 but only have images for arm/v7. Use Only() instead of OnlyStrinct() to also match arm/v7 on arm/v8. Fixes commit 9dd47b4 (k0sproject#1939 Fix airgap arm64 build. Strictly use target platform only) ref: https://pkg.go.dev/github.com/containerd/containerd/platforms#Only Signed-off-by: Natanael Copa <ncopa@mirantis.com> (cherry picked from commit 87d6609)

Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit 29c7544)

Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit c2fb3e0)

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com> Signed-off-by: Jussi Nummelin <jnummelin@users.noreply.github.com> (cherry picked from commit e53d0fc)

…se-1.30 [Backport release-1.30] Bump runc to 1.1.14

…se-1.30 [Backport release-1.30] Bump etcd to 3.5.16

https://github.com/containerd/containerd/releases/tag/v1.7.22 Signed-off-by: Natanael Copa <ncopa@mirantis.com>

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1304 Signed-off-by: Natanael Copa <ncopa@mirantis.com>

[release-1.30] Bump kubernetes to 1.30.5

…se-1.30 [Backport release-1.30] Bump metrics-server to v0.7.2

…-1.7.22 [Backport release-1.30] Bump containerd to 1.7.22

…te description Signed-off-by: SebPlv <53788114+SebPlv@users.noreply.github.com> (cherry picked from commit c334f22)

…elease-1.30 [Backport release-1.30] [doc] Remove non-existant 'targets' field in Autopilot airgapupdate description

@twz123

This is to mitigate cases like k0sproject#4296 By default grab tries to resume the download if the file name determined from either the url or from content-type headers already exists. This makes things go side ways, if the existing file is smaller than the new one, the old content would still be there and only the "extra" new bytes would get written. I.e. the download would be "resumed". 🤦 So this change disables resuming altogether fro grab. To be able to download and replace k0s without resuming, i.e. "truncate" the file in-between. We need to force the download to download it as k0s.tmp first. Only after that we can safely move the file as truncating a running binary is not allowed and will error with `text file busy`. This is a minimal possible fix that we can easily backport. @twz123 is already working on bigger refactoring of autopilot download functionality that gets rid of grab. Grab seems to bring more (bad) surprises than real benefits. In the end, we just download files and we should pretty much always just replace them. No need for full library dependecy for that. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com>

twz123 · 2024-09-26T09:41:40Z

Can we retarget this to main and then do a backport? I think there are some changes in here that I'd like to rebase #5020 on.

github-actions · 2024-09-26T09:52:54Z

This pull request has merge conflicts that need to be resolved.

jnummelin · 2024-09-26T09:53:41Z

See #5042, targeting main and then we'll backport it

twz123 and others added 30 commits May 13, 2024 18:53

Bump kine to v0.11.9

651a768

https://github.com/k3s-io/kine/releases/tag/v0.11.9 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 67bd4dc)

Merge pull request k0sproject#4414 from k0sproject/backport-4405-to-r…

d5d4709

…elease-1.30

Merge pull request k0sproject#4423 from k0sproject/backport-4421-to-r…

7c74745

…elease-1.30 [Backport release-1.30] Change kine metrics port from 8080 to 2380

Parse metrics in metricsscraper job

ba588a6

Use the JSON API and look for the push_time_seconds metric. Collect all job labels that had successful pushes. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2de1b0f)

Merge pull request k0sproject#4440 from k0sproject/backport-4416-to-r…

bdcc5ac

…elease-1.30 [Backport release-1.30] Use a versioned worker config stack

Merge pull request k0sproject#4439 from k0sproject/backport-4419-to-r…

3bbf383

…elease-1.30 [Backport release-1.30] Introduce `check-metricsscraper-singlenode`

Bump Kubernetes to v1.30.1

b120904

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1300 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 15df9f3)

Merge pull request k0sproject#4446 from k0sproject/backport-4432-to-r…

77fcb37

…elease-1.30 [Backport release-1.30] Lessen the systemd RestartSec to 10 secs

Merge pull request k0sproject#4448 from k0sproject/backport-4424-to-r…

7e4dbee

…elease-1.30 [Backport release-1.30] Bump Kubernetes to v1.30.1

Merge pull request k0sproject#4449 from k0sproject/backport-4435-to-r…

718a304

…elease-1.30 [Backport release-1.30] Fix error handling in EtcdMemberReconciler

Bump containerd to v1.7.17

e15b275

https://github.com/containerd/containerd/releases/v1.7.17 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2bea0b8)

Merge pull request k0sproject#4475 from k0sproject/backport-4467-to-r…

30c44ba

…elease-1.30 [Backport release-1.30] Skip `kube-bridge` interface for api/etcd address

Merge pull request k0sproject#4463 from k0sproject/backport-4457-to-r…

2ae1931

…elease-1.30 [Backport release-1.30] Use dedicated leasepool for worker config component

Merge pull request k0sproject#4471 from k0sproject/backport-4420-to-r…

d357f38

…elease-1.30 [Backport release-1.30] Add readinessProbe/minReadySeconds to kube-router

Merge pull request k0sproject#4465 from k0sproject/backport-4447-to-r…

77a9ab2

…elease-1.30 [Backport release-1.30] Bump containerd to v1.7.17

Adding CLI args reference to the docs

6702597

Signed-off-by: Alexey Makhov <amakhov@mirantis.com> (cherry picked from commit 5ead418)

Merge pull request k0sproject#4504 from k0sproject/backport-4501-to-r…

245d5b8

…elease-1.30 [Backport release-1.30] Adding CLI args reference to the docs

Skip missing blobs in import

1cea49a

Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit 29c7544)

Update airgap docs relating to multiarch images import

48357a5

Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit c2fb3e0)

Fix typo

0eb8548

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com> Signed-off-by: Jussi Nummelin <jnummelin@users.noreply.github.com> (cherry picked from commit e53d0fc)

twz123 and others added 9 commits September 17, 2024 13:20

Merge pull request k0sproject#4988 from twz123/backport-4967-to-relea…

1fe2634

…se-1.30 [Backport release-1.30] Bump runc to 1.1.14

Merge pull request k0sproject#4987 from twz123/backport-4982-to-relea…

d8c7047

…se-1.30 [Backport release-1.30] Bump etcd to 3.5.16

Bump containerd to 1.7.22

e3c6a87

https://github.com/containerd/containerd/releases/tag/v1.7.22 Signed-off-by: Natanael Copa <ncopa@mirantis.com>

Bump kubernetes to 1.30.5

be2c3c2

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1304 Signed-off-by: Natanael Copa <ncopa@mirantis.com>

Merge pull request k0sproject#4999 from ncopa/kube-1.30.5

2e353fb

[release-1.30] Bump kubernetes to 1.30.5

Merge pull request k0sproject#4986 from twz123/backport-4913-to-relea…

eb37f49

…se-1.30 [Backport release-1.30] Bump metrics-server to v0.7.2

Merge pull request k0sproject#4997 from ncopa/release-1.30-containerd…

f7a8856

…-1.7.22 [Backport release-1.30] Bump containerd to 1.7.22

fix(doc): remove non-existant 'targets' field in Autopilot airgapupda…

4f56330

…te description Signed-off-by: SebPlv <53788114+SebPlv@users.noreply.github.com> (cherry picked from commit c334f22)

Merge pull request k0sproject#5008 from k0sproject/backport-4951-to-r…

db0b308

…elease-1.30 [Backport release-1.30] [doc] Remove non-existant 'targets' field in Autopilot airgapupdate description

jnummelin requested a review from a team as a code owner September 24, 2024 19:25

jnummelin requested review from ncopa and twz123 September 24, 2024 19:25

jnummelin added backport/release-1.28 PR that needs to be backported/cherrypicked to release-1.28 branch backport/release-1.29 PR that needs to be backported/cherrypicked to the release-1.29 branch labels Sep 24, 2024

This was referenced Sep 24, 2024

Issues running repeated updates with autopilot #4296

Closed

Disable grab download resume #5019

Closed

jnummelin force-pushed the 1.30/fix-grab-downloader branch from f72ac25 to f0c58df Compare September 24, 2024 19:40

jnummelin force-pushed the 1.30/fix-grab-downloader branch from f0c58df to d5b216b Compare September 24, 2024 19:52

twz123 changed the title ~~[Release 1.30] Disable grab download resume~~ [release-1.30] Disable grab download resume Sep 25, 2024

jnummelin changed the base branch from release-1.30 to main September 26, 2024 09:46

jnummelin changed the title ~~[release-1.30] Disable grab download resume~~ Disable grab download resume Sep 26, 2024

twz123 added bug Something isn't working component/autopilot backport/release-1.30 PR that needs to be backported/cherrypicked to the release-1.30 branch labels Sep 26, 2024

github-actions bot added the merge-conflict label Sep 26, 2024

jnummelin closed this Sep 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable grab download resume #5034

Disable grab download resume #5034

jnummelin commented Sep 24, 2024 •

edited

Loading

twz123 commented Sep 26, 2024

github-actions bot commented Sep 26, 2024

jnummelin commented Sep 26, 2024

Disable grab download resume #5034

Disable grab download resume #5034

Conversation

jnummelin commented Sep 24, 2024 • edited Loading

Description

Type of change

How Has This Been Tested?

Checklist:

twz123 commented Sep 26, 2024

github-actions bot commented Sep 26, 2024

jnummelin commented Sep 26, 2024

jnummelin commented Sep 24, 2024 •

edited

Loading