-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable grab download resume #5034
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
https://github.com/k3s-io/kine/releases/tag/v0.11.9 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 67bd4dc)
The default port is 8080. This conflicts with kube-router, which since 2.x fails if it cannot bind to its metrics port. This leads to a situation where the default configuration for a single node k0s cluster is broken, as kube-router will go into a crash loop. Kube-router 1.x will not be able to provide metrics, but will simply log the error and move on. The integration tests did not cover this case properly, as the kube-router manifests did not specify a readiness probe that the inttests need to detect problems with daemon sets. Port 2380 was chosen because it is normally used by etcd. Kine also uses port 2379 for the client API, so why not use 2380 as well? Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 179989e)
…elease-1.30 [Backport release-1.30] Change kine metrics port from 8080 to 2380
Use the JSON API and look for the push_time_seconds metric. Collect all job labels that had successful pushes. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2de1b0f)
This is a variant of check-metricsscraper that executes the inttest on a single-node cluster. This mainly means that, in addition to etcd, kine scraping is also tested. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 62beca8)
The worker config ConfigMaps have the Kubernetes version in their names. This is to ensure a seamless upgrade path when upgrading minor k0s versions. However, the RBAC resources and the stack name itself didn't have the version in their names. This means that as soon as a controller with the new minor k0s version takes over, it removes all previous worker ConfigMaps from the cluster, effectively defeating the purpose of the versioned names. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 90e952b)
…elease-1.30 [Backport release-1.30] Use a versioned worker config stack
…elease-1.30 [Backport release-1.30] Introduce `check-metricsscraper-singlenode`
Currently the RestartSec is 120, which mean that systemd waits 2mins before it restarts k0s. This makes updates very slow with autopilot, i.e. with 3 controllers updating the controllers will take 6mins. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit 225de58)
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1300 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 15df9f3)
…elease-1.30 [Backport release-1.30] Lessen the systemd RestartSec to 10 secs
The watch for CRD getting ready used wrong error when checking if it is retriable. This results in bogus logging and a "busyloop": ``` time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" time="2024-05-16 10:05:45" level=info msg="Transient error while watching etcdmember CRD, last observed version is \"\", starting over after 0s ..." component=etcdMemberReconciler error="<nil>" ... ``` It actually hides the transient errors encountered. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit d67b442)
With this change the first controller on new version can apply the needed versioned resources as it will always be guaranteed to become the leader. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit e5a271b)
…elease-1.30 [Backport release-1.30] Bump Kubernetes to v1.30.1
…elease-1.30 [Backport release-1.30] Fix error handling in EtcdMemberReconciler
https://github.com/containerd/containerd/releases/v1.7.17 Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 2bea0b8)
This allows for better feedback of kube-router health via the DaemonSet resource. Without those, it's possible to observe a "healthy" DaemonSet, even if it's not. This affects e.g. rolling updates, and, most notably k0s's own integration tests. Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit 65fcf39)
Otherwise the defaulting could pick up `kube-bridge` address as api/etcd address when restarting k0s if it sees it before any "real" address. This happens on e.g. bootloose containers where kube-bridge is at index 11 and eth0 on index 18. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit dd2fdca)
…elease-1.30 [Backport release-1.30] Skip `kube-bridge` interface for api/etcd address
…elease-1.30 [Backport release-1.30] Use dedicated leasepool for worker config component
…elease-1.30 [Backport release-1.30] Add readinessProbe/minReadySeconds to kube-router
…elease-1.30 [Backport release-1.30] Bump containerd to v1.7.17
So that they are ignored by the Go tooling. Otherwise, when building in parallel, Go may try to treat these temporary directories as Go package folders and fail utterly. https://pkg.go.dev/cmd/go#hdr-Package_lists_and_patterns > Directory and file names that begin with "." or "_" are ignored by the > go tool, as are directories named "testdata". Fixes: d68dc34 ("Improve Docker export in embedded-binaries Makefile") Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com> (cherry picked from commit db51280)
Signed-off-by: Alexey Makhov <amakhov@mirantis.com> (cherry picked from commit 5ead418)
…elease-1.30 [Backport release-1.30] Adding CLI args reference to the docs
We may run k0s on arm/v8 but only have images for arm/v7. Use Only() instead of OnlyStrinct() to also match arm/v7 on arm/v8. Fixes commit 9dd47b4 (k0sproject#1939 Fix airgap arm64 build. Strictly use target platform only) ref: https://pkg.go.dev/github.com/containerd/containerd/platforms#Only Signed-off-by: Natanael Copa <ncopa@mirantis.com> (cherry picked from commit 87d6609)
Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit 29c7544)
Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com> (cherry picked from commit c2fb3e0)
…se-1.30 [Backport release-1.30] Bump runc to 1.1.14
…se-1.30 [Backport release-1.30] Bump etcd to 3.5.16
https://github.com/containerd/containerd/releases/tag/v1.7.22 Signed-off-by: Natanael Copa <ncopa@mirantis.com>
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1304 Signed-off-by: Natanael Copa <ncopa@mirantis.com>
[release-1.30] Bump kubernetes to 1.30.5
…se-1.30 [Backport release-1.30] Bump metrics-server to v0.7.2
…-1.7.22 [Backport release-1.30] Bump containerd to 1.7.22
…te description Signed-off-by: SebPlv <53788114+SebPlv@users.noreply.github.com> (cherry picked from commit c334f22)
…elease-1.30 [Backport release-1.30] [doc] Remove non-existant 'targets' field in Autopilot airgapupdate description
jnummelin
added
backport/release-1.28
PR that needs to be backported/cherrypicked to release-1.28 branch
backport/release-1.29
PR that needs to be backported/cherrypicked to the release-1.29 branch
labels
Sep 24, 2024
This was referenced Sep 24, 2024
jnummelin
force-pushed
the
1.30/fix-grab-downloader
branch
from
September 24, 2024 19:40
f72ac25
to
f0c58df
Compare
This is to mitigate cases like k0sproject#4296 By default grab tries to resume the download if the file name determined from either the url or from content-type headers already exists. This makes things go side ways, if the existing file is smaller than the new one, the old content would still be there and only the "extra" new bytes would get written. I.e. the download would be "resumed". 🤦 So this change disables resuming altogether fro grab. To be able to download and replace k0s without resuming, i.e. "truncate" the file in-between. We need to force the download to download it as k0s.tmp first. Only after that we can safely move the file as truncating a running binary is not allowed and will error with `text file busy`. This is a minimal possible fix that we can easily backport. @twz123 is already working on bigger refactoring of autopilot download functionality that gets rid of grab. Grab seems to bring more (bad) surprises than real benefits. In the end, we just download files and we should pretty much always just replace them. No need for full library dependecy for that. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com>
jnummelin
force-pushed
the
1.30/fix-grab-downloader
branch
from
September 24, 2024 19:52
f0c58df
to
d5b216b
Compare
twz123
changed the title
[Release 1.30] Disable grab download resume
[release-1.30] Disable grab download resume
Sep 25, 2024
Can we retarget this to main and then do a backport? I think there are some changes in here that I'd like to rebase #5020 on. |
jnummelin
changed the title
[release-1.30] Disable grab download resume
Disable grab download resume
Sep 26, 2024
twz123
added
bug
Something isn't working
component/autopilot
backport/release-1.30
PR that needs to be backported/cherrypicked to the release-1.30 branch
labels
Sep 26, 2024
This pull request has merge conflicts that need to be resolved. |
See #5042, targeting main and then we'll backport it |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/release-1.28
PR that needs to be backported/cherrypicked to release-1.28 branch
backport/release-1.29
PR that needs to be backported/cherrypicked to the release-1.29 branch
backport/release-1.30
PR that needs to be backported/cherrypicked to the release-1.30 branch
bug
Something isn't working
component/autopilot
merge-conflict
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Fixes #4296
By default grab tries to resume the download if the file name determined from either the url or from content-type headers already exists. This makes things go side ways, if the existing file is smaller than the new one, the old content would still be there and only the "extra" new bytes would get written. I.e. the download would be "resumed". 🤦
So this change disables resuming altogether fro grab.
To be able to download and replace k0s without resuming, i.e. "truncate" the file in-between. We need to force the download to download it as k0s.tmp first. Only after that we can safely move the file as truncating a running binary is not allowed and will error with
text file busy
.This is a minimal possible fix that we can easily backport. @twz123 is already working on bigger refactoring of autopilot download functionality that gets rid of grab. Grab seems to bring more (bad) surprises than real benefits. In the end, we just download files and we should pretty much always just replace them. No need for full library dependecy for that.
Type of change
How Has This Been Tested?
Checklist: