
emit more status output during 'Preparing Kubernetes' #12082

Open
drewpca opened this issue Jul 30, 2021 · 5 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

drewpca commented Jul 30, 2021

I'm testing gcloud beta code dev in Cloud Shell. A fairly reproducible problem is that minikube emits a few JSON status lines and then goes quiet for 90+ seconds. 90+ seconds between steps is too long, and gcloud decides minikube is hung. I'd rather not raise that timeout, since it's bad UX for the status to stall even for 10 seconds. Please consider emitting more status lines; per our earlier discussions, you could use fractional step numbers when reporting a long single step.
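As a rough illustration of the fractional-step idea (a Go sketch only; the currentstep/totalsteps/message field names are assumptions for this example, not necessarily minikube's exact JSON schema), a single long step such as 'Preparing Kubernetes' could be reported as several sub-steps so the output stream never goes quiet for long:

// Sketch: report one long step as fractional sub-steps so a wrapper
// watching stdout never sees a long silent gap. Field names are
// illustrative, not minikube's confirmed output schema.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type statusEvent struct {
	CurrentStep string `json:"currentstep"`
	TotalSteps  string `json:"totalsteps"`
	Message     string `json:"message"`
}

func emit(step, total, msg string) {
	b, err := json.Marshal(statusEvent{CurrentStep: step, TotalSteps: total, Message: msg})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}

func main() {
	// Step 8 of 10 ("Preparing Kubernetes") broken into 8.1, 8.2, 8.3, ...
	emit("8.1", "10", "Preparing Kubernetes: pulling images ...")
	time.Sleep(time.Second) // stand-in for real work
	emit("8.2", "10", "Preparing Kubernetes: generating certificates ...")
	time.Sleep(time.Second)
	emit("8.3", "10", "Preparing Kubernetes: waiting for the API server ...")
}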

I'm attaching the log of a broken run. Because of buffering, the --output=json status lines are at the bottom. However, we write some status-bar ASCII to the output, so you can see that the last status display appeared here at 23:01:03:

I0729 23:01:03.956740   95466 command_runner.go:124] > 20.10.7
═══════════════════I0729 23:01:03.964335   95466 cli_runner.go:115] Run: docker network inspect gcloud-local-dev --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}"
I0729 23:01:04.041476   95466 ssh_runner.go:149] Run: grep 192.168.49.1 host.minikube.internal$ /etc/hosts

...and then 90 seconds later, at about 23:02:33, we can see gcloud killing minikube:

I0729 23:02:33.867230   95466 command_runner.go:124] ! 2021-07-29 22:47:10.211439 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0729 23:02:33.867245   95466 command_runner.go:124] ! 2021-07-29 22:47:20.211369 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0729 23:02:33.867257   95466 command_runner.go:124] ! 2021-07-29 22:47:22.389239 N | pkg/osutil: received terminated signal, shutting down...
I0729 23:02:33.867284   95466 command_runner.go:124] ! WARNING: 2021/07/29 22:47:22 grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0729 23:02:33.867301   95466 command_runner.go:124] ! 2021-07-29 22:47:22.401725 I | etcdserver: skipped leadership transfer for single voting member cluster
I0729 23:02:33.874358   95466 logs.go:123] Gathering logs for coredns [466ae792127f] ...
I0729 23:02:33.874419   95466 ssh_runner.go:149] Run: /bin/bash -c "docker logs --tail 400 466ae792127f"
I0729 23:02:33.940799   95466 command_runner.go:124] > .:53
I0729 23:02:33.940839   95466 command_runner.go:124] > [INFO] plugin/reload: Running configuration MD5 = cec3c60eb1cc4909fd4579a8d79ea031
I0729 23:02:33.940847   95466 command_runner.go:124] > CoreDNS-1.7.0
I0729 23:02:33.940858   95466 command_runner.go:124] > linux/amd64, go1.14.4, f59c03d
I0729 23:02:33.940869   95466 command_runner.go:124] > [INFO] SIGTERM: Shutting down servers then terminating
I0729 23:02:33.940880   95466 command_runner.go:124] > [INFO] plugin/health: Going into lameduck mode for 5s
I0729 23:02:33.943494   95466 logs.go:123] Gathering logs for kube-controller-manager [a031b3f623ff] ...
I0729 23:02:33.943553   95466 ssh_runner.go:149] Run: /bin/bash -c "docker logs --tail 400 a031b3f623ff"
══════════════════════════╝
DEBUG: (gcloud.beta.code.dev) No subprocess output for 90.0 seconds

run2.txt
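For context, the DEBUG line above is what a simple output watchdog produces: a timer is reset on every line the subprocess prints, and the child is killed after 90 seconds of silence. A minimal Go sketch of that pattern (an illustration only, not gcloud's actual implementation, which is in Python and differs):

// Sketch of an output watchdog: run a child process, reset a timer on
// every line it prints, and kill it after 90 seconds with no output.
package main

import (
	"bufio"
	"log"
	"os/exec"
	"time"
)

func main() {
	cmd := exec.Command("minikube", "start", "--output=json")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	lines := make(chan string)
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		close(lines)
	}()

	const timeout = 90 * time.Second
	timer := time.NewTimer(timeout)
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				log.Println("subprocess finished")
				_ = cmd.Wait()
				return
			}
			log.Println(line)
			timer.Reset(timeout) // any output counts as progress
		case <-timer.C:
			log.Printf("no subprocess output for %s, killing it", timeout)
			_ = cmd.Process.Kill()
			return
		}
	}
}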

spowelljr added the kind/support label on Jul 30, 2021
spowelljr (Member) commented

@drewpca minikube not reporting a status line for more than 90 seconds would definitely be an issue under normal operation, as the entire process itself should finish within 90 seconds.

That said, looking at the logs, minikube was not in a healthy state, as evidenced by the large number of retries being executed.

A question about this run in particular: was it run after gcloud was cancelled via Ctrl+C? And if so, gcloud automatically runs 'minikube delete', correct?

drewpca (Author) commented Aug 2, 2021

Yes, there was a Ctrl+C (that's part of our repro). I believe we run 'minikube stop' in that case.

We run 'minikube delete' if you use 'gcloud code clean-up'.

drewpca (Author) commented Aug 17, 2021

See #12232 for a fix to the cloudshell/cgroups issue.

spowelljr added the kind/feature and priority/backlog labels and removed the kind/support label on Oct 6, 2021
k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 4, 2022
k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 3, 2022
sharifelgamal added the lifecycle/frozen label and removed the lifecycle/rotten label on Feb 9, 2022