
emit more status output during 'Preparing Kubernetes' #12082

Open
drewpca opened this issue Jul 30, 2021 · 5 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

drewpca commented Jul 30, 2021

I'm testing gcloud beta code dev in Cloud Shell. A fairly reproducible problem is that minikube emits a few JSON status lines and then goes quiet for 90+ seconds. 90+ seconds between steps is too long, and gcloud decides minikube is hung. I'd rather not raise that timeout, since it's bad UX for the status to stall even for 10 seconds. Please consider emitting more status lines; per our earlier discussions, you could use fractional step numbers when reporting a long single step.
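As a rough illustration of the fractional-step idea (a Go sketch only; the currentstep/totalsteps/message field names are assumptions for this example, not necessarily minikube's exact JSON schema), a single long step such as 'Preparing Kubernetes' could be reported as several sub-steps so the output stream never goes quiet for long:

// Sketch: report one long step as fractional sub-steps so a wrapper
// watching stdout never sees a long silent gap. Field names are
// illustrative, not minikube's confirmed output schema.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type statusEvent struct {
	CurrentStep string `json:"currentstep"`
	TotalSteps  string `json:"totalsteps"`
	Message     string `json:"message"`
}

func emit(step, total, msg string) {
	b, err := json.Marshal(statusEvent{CurrentStep: step, TotalSteps: total, Message: msg})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}

func main() {
	// Step 8 of 10 ("Preparing Kubernetes") broken into 8.1, 8.2, 8.3, ...
	emit("8.1", "10", "Preparing Kubernetes: pulling images ...")
	time.Sleep(time.Second) // stand-in for real work
	emit("8.2", "10", "Preparing Kubernetes: generating certificates ...")
	time.Sleep(time.Second)
	emit("8.3", "10", "Preparing Kubernetes: waiting for the API server ...")
}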

I'm attaching the log of a broken run. Because of buffering, the --output=json status lines are at the bottom. However, we write some status-bar ASCII to the output, so you can see that the last status display appeared here at 23:01:03:

I0729 23:01:03.956740   95466 command_runner.go:124] > 20.10.7
═══════════════════I0729 23:01:03.964335   95466 cli_runner.go:115] Run: docker network inspect gcloud-local-dev --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}"
I0729 23:01:04.041476   95466 ssh_runner.go:149] Run: grep 192.168.49.1 host.minikube.internal$ /etc/hosts

...and then 90 seconds later, at about 23:02:33, we can see gcloud killing minikube:

I0729 23:02:33.867230   95466 command_runner.go:124] ! 2021-07-29 22:47:10.211439 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0729 23:02:33.867245   95466 command_runner.go:124] ! 2021-07-29 22:47:20.211369 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0729 23:02:33.867257   95466 command_runner.go:124] ! 2021-07-29 22:47:22.389239 N | pkg/osutil: received terminated signal, shutting down...
I0729 23:02:33.867284   95466 command_runner.go:124] ! WARNING: 2021/07/29 22:47:22 grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0729 23:02:33.867301   95466 command_runner.go:124] ! 2021-07-29 22:47:22.401725 I | etcdserver: skipped leadership transfer for single voting member cluster
I0729 23:02:33.874358   95466 logs.go:123] Gathering logs for coredns [466ae792127f] ...
I0729 23:02:33.874419   95466 ssh_runner.go:149] Run: /bin/bash -c "docker logs --tail 400 466ae792127f"
I0729 23:02:33.940799   95466 command_runner.go:124] > .:53
I0729 23:02:33.940839   95466 command_runner.go:124] > [INFO] plugin/reload: Running configuration MD5 = cec3c60eb1cc4909fd4579a8d79ea031
I0729 23:02:33.940847   95466 command_runner.go:124] > CoreDNS-1.7.0
I0729 23:02:33.940858   95466 command_runner.go:124] > linux/amd64, go1.14.4, f59c03d
I0729 23:02:33.940869   95466 command_runner.go:124] > [INFO] SIGTERM: Shutting down servers then terminating
I0729 23:02:33.940880   95466 command_runner.go:124] > [INFO] plugin/health: Going into lameduck mode for 5s
I0729 23:02:33.943494   95466 logs.go:123] Gathering logs for kube-controller-manager [a031b3f623ff] ...
I0729 23:02:33.943553   95466 ssh_runner.go:149] Run: /bin/bash -c "docker logs --tail 400 a031b3f623ff"
══════════════════════════╝
DEBUG: (gcloud.beta.code.dev) No subprocess output for 90.0 seconds

run2.txt
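For context, the DEBUG line above is what a simple output watchdog produces: a timer is reset on every line the subprocess prints, and the child is killed after 90 seconds of silence. A minimal Go sketch of that pattern (an illustration only, not gcloud's actual implementation, which is in Python and differs):

// Sketch of an output watchdog: run a child process, reset a timer on
// every line it prints, and kill it after 90 seconds with no output.
package main

import (
	"bufio"
	"log"
	"os/exec"
	"time"
)

func main() {
	cmd := exec.Command("minikube", "start", "--output=json")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	lines := make(chan string)
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		close(lines)
	}()

	const timeout = 90 * time.Second
	timer := time.NewTimer(timeout)
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				log.Println("subprocess finished")
				_ = cmd.Wait()
				return
			}
			log.Println(line)
			timer.Reset(timeout) // any output counts as progress
		case <-timer.C:
			log.Printf("no subprocess output for %s, killing it", timeout)
			_ = cmd.Process.Kill()
			return
		}
	}
}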

spowelljr added the kind/support label on Jul 30, 2021
spowelljr (Member) commented

@drewpca minikube not reporting a status line for more than 90 seconds would definitely be an issue under normal operation, as the entire process itself should finish within 90 seconds.

That said, looking at the logs, minikube was not in a healthy state, as evidenced by the large number of retries being executed.

A question about this run in particular: was it run after gcloud was cancelled via Ctrl+C? And if so, gcloud automatically runs 'minikube delete', correct?

drewpca (Author) commented Aug 2, 2021

Yes, there was a Ctrl+C (that's part of our repro). I believe we run 'minikube stop' in that case.

We run 'minikube delete' if you use 'gcloud code clean-up'.

drewpca (Author) commented Aug 17, 2021

See #12232 for a fix to the cloudshell/cgroups issue.

spowelljr added the kind/feature and priority/backlog labels and removed the kind/support label on Oct 6, 2021
k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 4, 2022
k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 3, 2022
sharifelgamal added the lifecycle/frozen label and removed the lifecycle/rotten label on Feb 9, 2022