Localkube crashing: "Connection reset by peer" #1252
Can you provide the output of

And any steps to reproduce? I haven't seen localkube panicking like this before.
Hi @r2d4 — see attached for the logs. With respect to recreating this issue, it's quite tricky because of how sporadic it is, and my understanding of minikube internals is probably too limited to know how what I'm doing might interact with them. However, maybe I can describe my environment a bit more and what's going on when the crash happens. First, to list out our services and pods:
The cluster is run with 4GB of memory, and when we're running it we're generally forwarding three ports to localhost (nginx -> 8443, postgres -> 5432, redis -> 6379). We're mounting a couple of volumes based on the host path, and then streaming the logs for one of the pods. Beyond this, everything (I think) is fairly standard. We run nodejs for several of the containers, and have nodemon polling the hostMount to look for changes in files and restart the server accordingly. This bug seems to happen irrespective of whether anything is really happening within the cluster, and doesn't seem to be related to resource usage. If there's any other detail I can provide, or troubleshooting I can do, please let me know. Many thanks!
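For concreteness, here is a rough sketch of the commands we keep running against the cluster (the pod names below are placeholders, not our real ones):

```sh
# Forward the three service ports to localhost (pod names are hypothetical).
kubectl port-forward nginx-1234567890-abcde 8443:443 &
kubectl port-forward postgres-1234567890-abcde 5432:5432 &
kubectl port-forward redis-1234567890-abcde 6379:6379 &

# Stream logs from the app pod while nodemon watches the mounted source tree.
kubectl logs -f app-1234567890-abcde
```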
Thanks for the detailed report. One more thing - can you output
This error from the logs was the result of an old bug in the image that has since been fixed. If you see a bunch of zombie processes, you might want to make sure you're using the latest version of the minikube ISO with
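For reference, one way to look for zombie processes inside the VM (a rough sketch, not a command from the thread):

```sh
# SSH into the minikube VM, then look for defunct (zombie) processes.
minikube ssh
ps aux | grep -i defunct
```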
Thanks @r2d4 — I just realized that those logs were from a colleague's machine on an older build of minikube (0.14); unfortunately I'm still getting this behavior in 0.17.1. I ran

Interestingly, when I run

[before crash] minikube logs.txt — this was taken when everything was up and running and seemingly stable; it crashed 15 minutes later without me touching anything.

[after crash] minikube logs.txt
^ @r2d4 Just pinging on this, seeing if you have any guidance about how I might best debug. It's definitely hampering our team quite a bit, and I'm happy to sink a fair amount of time into investigating, but any pointers would be incredibly useful and save me a lot of time. Many thanks!
FYI, on my Linux machine I'm seeing the same behavior, where I
@imathews I haven't been able to tackle the root of the issue - although this repeated line from the logs might be worth investigating:

It comes from some of the cron job code. It looks like there are still a few issues with cronjobs keeping around deleted pods (kubernetes/kubernetes#28977). Maybe they aren't getting deleted and that's causing issues? A shot in the dark, though. You might want to try our kubernetes 1.6 branch from CI. (We'll be merging this into master once kubernetes 1.6 goes GA.)
And @stevesloka, any additional logs would help!
^ Awesome, thanks for the pointers. I'll investigate and let you know what I find.
@r2d4 I am seeing similar behavior with v0.17.1; it seems very unstable, and indeed localkube seems to crash. OSX 10.12.3, no cron jobs ever.
minikube logs
@sebgoa @r2d4 After much trial and error, I've been able to keep minikube running consistently by using the k8s 1.6 branch (pull 1266) and the xhyve driver. I was never able to get it stable when using VirtualBox, despite trying several permutations of removing jobs, other services, etc., but I'm very glad to have a solution that works.
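Roughly, the workaround looks like this (a sketch; the localkube built from the 1.6 branch in pull 1266 is obtained separately and isn't shown here):

```sh
# Recreate the VM using the xhyve driver instead of VirtualBox.
minikube delete
minikube start --vm-driver=xhyve --memory=4096
```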
And my VirtualBox version is 5.1.18 r114002. I'm pretty sure it is a problem in v0.17.1; I never had that problem before, and it started appearing with this version.
I'm also running into the same issue. I started minikube as such:

My minikube version is also v0.17.1.
I'm hitting this issue on 0.17.1 as well. My logs look similar to @sebgoa's. My localkube never recovers even with
I wonder if it's related to an addon with a ReplicationController that is crashing? Here's my
And a
My localkube flaps up and down as systemd attempts to restart it continuously.
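For anyone who wants to watch the flapping directly, here is a sketch of how to inspect the unit from inside the VM (this assumes the systemd unit is named localkube, as in the minikube ISO):

```sh
# SSH into the VM, then check the localkube unit and its recent journal.
minikube ssh
sudo systemctl status localkube
sudo journalctl -u localkube --no-pager | tail -n 50
```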
@donspaulding Are you running kubernetes 1.6? There might be some weird behavior with some of the addons (since they haven't all been upgraded to the latest versions). There also might be weirdness if you upgrade your cluster from 1.5 -> 1.6 without deleting. We don't guarantee in-place upgrades right now, but it's something we would really like to have in the future.
@r2d4 Nope, 1.5.3.
FWIW, without knowing what's really happening, I can't shake the feeling that the panic log is pointing right at the error. Here's the last line in the codebase which spits a message out to the log:

I suspect one of the two goroutines in that file to be the problem, since we never get to the "Shutting down RC Manager" logging call at the bottom. I don't know golang or the k8s codebase, so I could be way off, and of course there's been a huge amount of churn in those files between 1.5.3 and 1.6. I'll try to see in the next couple of days whether running k8s 1.6.0 makes a difference.
This seems related to minikube #1090 and perhaps even kubernetes #43430.
Thanks for the additional debugging @donspaulding. Were you able to test this with 1.6?
I'm recreating my minikube VM with 1.6.0 now. I've had this issue off and on, so I don't know that I'll know very quickly how successful the version bump is. But I'm willing to give it a go for a couple of days. That being said, it will probably be a while before we upgrade our production clusters to 1.6.0, and one of the main reasons to use minikube is to achieve dev/prod parity. I'm happy to do my part with debugging this, but if upgrading to 1.6.0 fixes the issue, would you expect this issue to be closed as "wontfix-pending-1.6.0-adoption"? If not, what would the next steps be? How can I help get a fix for this on 1.5.X? Thanks for your help with this @r2d4!
I just ran into this issue also. minikube version: v0.17.1. Can confirm it only happens after
@r2d4 this is also happening with v1.6.0. Here is the user experience (with 1.5.3), in the span of 20 seconds:
This should become a blocker; it is a big user-facing issue (whatever is happening).
I was experiencing this too, intermittently, with minikube v0.17.1 and kube 1.5.3. Note that the panicking code path (kube-controller-manager --> |
Following up on this, it seems that when I start my minikube with

I think for now, I'll just plan on running k8s v1.6.0 in my minikube dev environment, but my previous questions/misgivings about this as a solution remain. Anything else I can do to figure out the exact nature of the issue?
For reference, my minikube version...
... and kubernetes version...
Yeah, I'm keeping this open until we figure out exactly what's causing the issue in 1.5.3. Although for those reading, it seems like this was fixed in 1.6. I haven't been able to reproduce this yet. Have you been able to reproduce it on a vanilla minikube cluster, @donspaulding? If not, then maybe you could share some more information on the types of resources you're running on minikube (TPRs, etc.).
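For anyone checking their own cluster, the registered TPRs can be listed with something like:

```sh
# List ThirdPartyResources and dump their full definitions.
kubectl get thirdpartyresources
kubectl get thirdpartyresources -o yaml
```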
I'm experiencing this behavior with VirtualBox v5.1.18 and these versions:
Here are my logs:
@r2d4 I can't say that I've ever experienced this on a "vanilla minikube" because we script the setup of minikube to get our dev clusters in a deploy-ready state, so I'm always running a number of pods even on an idle cluster. Also, I've just recently started experiencing this with version 1.6.0, or maybe not. I get a different traceback now, which is perhaps not surprising, but maybe it's still related? Regarding TPRs, here's the only one I have:
That resource in particular is created upon installation of kube-cert-manager. I'm about to delete/recreate my cluster, and this time I'll see if just deploying k-c-m is enough to trigger the behavior. For reference, here are the logs I'm getting when I hit this issue on
I've figured out a way to reproduce this. Basic steps:
For reference, here are my vital statistics:
Notice the
Here's the PVC:
It seems as though the problem doesn't show up until the first restart of minikube after the problematic resources have been deployed. I doubt anybody else is experiencing this issue for the same reason that I am (i.e. because they're using the mirusresearch/stable/kube-cert-manager helm chart). Still, it would seem that all it takes is some combination of resources to trip up localkube on initial startup, and then you get these same symptoms?
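In outline, the cycle that reproduces it for me looks something like this (a sketch, not the exact steps above; the chart reference is approximate):

```sh
# Deploy the resources that later trip up localkube (in my case kube-cert-manager),
# then bounce the VM and watch the status.
helm install mirusresearch/kube-cert-manager   # chart reference approximate
minikube stop
minikube start
minikube status        # localkube reported as stopped / crash-looping
minikube logs          # shows the panic on startup
```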
Hey @donspaulding, thanks for the detailed notes. I was able to reproduce this. Taking a deeper look into it now.
cc @aaron-prindle @dlorenc: it seems a lot of people have been having this issue.
I'm able to reproduce it with just a
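For illustration only, here is a hypothetical minimal ThirdPartyResource of the kind that triggers it (names invented; this is not the manifest from the comment above):

```sh
# Create a throwaway TPR, then restart the VM; localkube panics on the way back up.
kubectl create -f - <<EOF
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: example-resource.example.com
description: "Purely illustrative TPR"
versions:
- name: v1
EOF

minikube stop && minikube start
```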
I sent a PR to fix this issue upstream |
Reference: kubernetes/kubernetes#44771. Fixes kubernetes#1252. TPRs are incorrectly coupled with the RestMapper right now. The real solution is for TPRs not to register themselves with the RestMapper; this is a short-term patch for minikube until the work is done upstream. On start/stop, the namespace controller and the garbage collector controller both call this code and panic, since TPRs have registered themselves with enabled versions but have no group metadata.
I'm sorry, but why is this issue closed? It still happens with 0.19.0 running kube 1.6.3.
Sorry, there was a slight copy-paste error with my patch; the fix will be in the next release.
Is this fixed in minikube v0.20.0 with kubernetes v1.6.4? I'm experiencing this and it seems like it could be the same issue. I had to restart (
I'm on Windows 10 running in Hyper-V. Happy to provide any other info if it would help.
I am also having a "similar" issue - Win 10, Hyper-V, minikube 0.20, kube 1.6.4. It seems it has something to do with the ingress addon for me; I keep getting the localkube service crashing without a trace on the minikube VM. I was using helm & draft - now that I disabled ingress it seems to work fine.
I don't think this is related. I believe my issue is caused when dynamic memory is turned on (Hyper-V). If I turn off dynamic memory then I don't seem to have a problem. I noticed the following in the event viewer:

The VM automatically restarts, after which localkube is not running. This happens at around the 1'45" mark after

** Disclaimer - I don't yet use minikube / k8s in anger, as I'm still learning how to use it.
I too am seeing the issue - Win 10, Hyper-V, minikube v0.20 with kubernetes 1.7.0. I disabled the ingress addon and that seems to have fixed the problem.
I've documented the Hyper-V dynamic memory issue here: https://github.com/kubernetes/minikube/blob/master/docs/drivers.md#hyperv-driver
Thank you @DenisBiondic. Disabling dynamic memory on Hyper-V seems to have fixed the problem, as I can now use the ingress addon. I did have an issue where localkube had stopped, but I had shut down my laptop and turned it back on while plugged into a docking station that has Ethernet, and the Primary Virtual Switch was set up to point to the WiFi adapter.
@dsanders1234 the funny thing is, it has nothing to do with ingress or any other addon... the problem is simply that when you have something active going on, Hyper-V tends to fail with dynamic memory (because it tries to allocate more). I managed to crash it with draft & helm as well. Perhaps there is a better fix than turning dynamic memory completely off, but I don't have it at the moment. What is to be noted, though, is that after minikube delete / start, the machine in Hyper-V will again be in dynamic memory mode...
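For reference, here is a sketch of how one might turn dynamic memory back off after recreating the VM (PowerShell, assuming the Hyper-V VM is named minikube):

```powershell
# Stop the VM, disable dynamic memory, then start it again.
Stop-VM minikube
Set-VMMemory minikube -DynamicMemoryEnabled $false -StartupBytes 4GB
Start-VM minikube
```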
This is a BUG REPORT
Minikube version: v0.17.1
Environment:
What happened:
When running minikube, my node.js application is failing fairly regularly (roughly every 15-30 minutes), printing the error:
When I then run, for example, kubectl get pods, I get the message

minikube status prints:

In order to get things back up and running, I need to run minikube start (which for some reason takes several minutes) — though at this point the networking and name resolution between different services is broken (e.g., nginx can't discover the nodejs app), and the only practical resolution is to restart all of my kubernetes services.

What you expected to happen:
Minikube and localkube should persist until they are explicitly stopped.
How to reproduce it (as minimally and precisely as possible):
This is the hardest part — sometimes I get crashes every 5 minutes, sometimes it goes for hours without any problem, and crashes seem to be independent of my development behavior. This is affecting all four developers on our team, who all have fairly similar setups. I've tried downgrading all the way to v0.13 with no luck.