Unable to Update Containers - Timeout #361

Closed
IwanSE opened this issue Aug 6, 2019 · 14 comments

@IwanSE

IwanSE commented Aug 6, 2019

Hi, I have the same problem as in #192.
We use Nexus OSS 3.17.0-01.

time="2019-08-06T04:15:04Z" level=debug msg="Error pulling image 172.22.130.59:8083/docker-repository/frontend:1.0.1-SNAPSHOT, Error response from daemon: Get http://172.22.130.59:8083/v2/docker-repository/frontend/manifests/1.0.1-SNAPSHOT: dial tcp 172.22.130.59:8083: i/o timeout"
time="2019-08-06T04:15:04Z" level=info msg="Unable to update container /frontend. Proceeding to next."
time="2019-08-06T04:15:04Z" level=debug msg="Error response from daemon: Get http://172.22.130.59:8083/v2/docker-repository/frontend/manifests/1.0.1-SNAPSHOT: dial tcp 172.22.130.59:8083: i/o timeout"
time="2019-08-06T04:15:34Z" level=debug msg="Got image name: 172.22.130.59:8083/docker-repository/master-data:1.0.1-SNAPSHOT"
time="2019-08-06T04:16:04Z" level=debug msg="Error pulling image 172.22.130.59:8083/docker-repository/master-data:1.0.1-SNAPSHOT, Error response from daemon: Get http://172.22.130.59:8083/v2/docker-repository/master-data/manifests/1.0.1-SNAPSHOT: dial tcp 172.22.130.59:8083: i/o timeout"
time="2019-08-06T04:35:49Z" level=debug msg="Error pulling image 172.22.130.59:8083/ts-docker-repository/ts-auth-service:1.0.1-SNAPSHOT, Error response from daemon: Get http://172.22.130.59:8083/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

My Dockerfile and docker-compose.yml

FROM containrrr/watchtower

ARG webhook

ENV WATCHTOWER_NOTIFICATIONS=slack
ENV WATCHTOWER_NOTIFICATION_SLACK_HOOK_URL=${webhook}
ENV WATCHTOWER_NOTIFICATION_SLACK_IDENTIFIER=watchtower
ENV WATCHTOWER_NOTIFICATION_SLACK_CHANNEL=#my-channel

ADD config.json /

  watchtower:
    image: 172.22.130.59:8083/docker-repository/watchtower
    container_name: watchtower
    networks:
      - net
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: frontend master-data --cleanup --interval 60

I changed the timeout to 180 in Nexus, but this did not solve the problem:
time="2019-08-06T04:15:04Z" level=info msg="Unable to update container /frontend. Proceeding to next."
Any ideas on how to solve this problem? Many messages arrive in Slack:

watchtowerAPP 9:23 AM
Unable to update container /api-gateway. Proceeding to next.
Unable to update container /authentication-service. Proceeding to next.
Unable to update container /frontend. Proceeding to next.
@welcome

welcome bot commented Aug 6, 2019

Hi there!
Thanks a bunch for opening your first issue! 🙏 As you're new to this repo, we'd like to suggest that you read our code of conduct

@unstephenk

unstephenk commented Aug 7, 2019

I am also seeing this error. I do believe watchtower also updated recently. This should be high priority. What broke?

@simskij
Member

simskij commented Aug 7, 2019

@iwanhsky thank you for your report! if you run watchtower directly without using your own Dockerfile - do you still get the same error? That is, using containrrr/watchtower:latest instead.

I am also seeing this error. I do believe watchtower also updated recently. This should be high priority. What broke?

The last release was July 2nd. Have you been experiencing this since then? Any more details?

@unstephenk

unstephenk commented Aug 7, 2019

@simskij I am not currently using a Dockerfile to run WT. I am using the docker run command from the docs with the restart policy set to always.

@simskij
Member

simskij commented Aug 7, 2019

@unstephenk yeah, that part was not aimed at you. Still curious when it started happening for you, though.

@unstephenk

It started over the weekend, at 2am, when no one was in the office. It had been running great for the 5 days prior. All of a sudden, I got a ton of messages on Slack, one for each container, saying "Unable to update container /watchtower. Proceeding to next.", so there was still an internet connection present.

@simskij
Member

simskij commented Aug 7, 2019

I've just run through a couple of complete runs, both with and without having any updates available and sadly, I can't provoke any error messages to appear. Everything just seems to work. The config I'm using in my local lab:

watchtower:
    <<: *keep-up
    container_name: watchtower
    image: index.docker.io/containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /srv/watchtower/config.json:/config.json
    command: --interval 30 --label-enable --cleanup
    environment:
      - WATCHTOWER_NOTIFICATIONS=slack
      - WATCHTOWER_NOTIFICATION_SLACK_HOOK_URL=https://hooks.slack.com/services/x/y/z
      - WATCHTOWER_NOTIFICATION_SLACK_IDENTIFIER=watchtower

where keep-up is:

  x-common: &keep-up
    restart: always
    labels:
      com.centurylinklabs.watchtower.enable: "true"

One thing that comes to mind is that the image pulls are actually performed by the Docker daemon rather than by watchtower, while the notifications are sent directly from watchtower. Are you able to pull images directly from your Docker host?
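
Since the timeouts in the logs above are reported by the Docker daemon at the TCP level ("dial tcp ... i/o timeout"), one quick way to approach that question is a plain TCP connect from the Docker host to the registry. The following is a minimal Python sketch, not part of watchtower; the function name registry_reachable and the 5-second default timeout are assumptions made for illustration.

```python
import socket


def registry_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unroutable hosts.
        return False
```

For the registry in this thread the call would be registry_reachable("172.22.130.59", 8083). A False result from the host would point at networking between the daemon and the registry rather than at watchtower itself.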

Restarting the host might resolve this, though I understand if that isn't feasible when you run watchtower in a server environment. Also, have there been any other changes or updates to the Docker host itself?

One thing to note is that Docker Engine 19.03.1 was released a little more than a week ago. One of the known issues of that release (source here) is described like this:

Traffic cannot egress the HOST because of missing Iptables rules in the FORWARD chain The missing rules are :

/sbin/iptables --wait -C FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

/sbin/iptables --wait -C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Workaround: Add these rules back using a script and cron definitions. The script must contain ‘-C’ commands to check for the presence of a rule and ‘-A’ commands to add rules back. Run the script on a cron in regular intervals, for example, every minutes.

And I'm still on 18.09.6, which might explain why it works flawlessly for me.
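
The quoted workaround (check with -C, add back with -A) could be sketched as follows. This is a hedged Python illustration, not the script Docker has in mind; the names RULES, ensure_rule and ensure_all are made up here, and the injectable run parameter exists only so the logic can be tried without root.

```python
import subprocess

IPTABLES = "/sbin/iptables"

# The two FORWARD-chain rules from the quoted known issue, without the -C/-A action flag.
RULES = [
    ["FORWARD", "-o", "docker_gwbridge", "-m", "conntrack",
     "--ctstate", "RELATED,ESTABLISHED", "-j", "ACCEPT"],
    ["FORWARD", "-o", "docker0", "-m", "conntrack",
     "--ctstate", "RELATED,ESTABLISHED", "-j", "ACCEPT"],
]


def ensure_rule(rule, run=subprocess.run):
    """Append `rule` only if `iptables -C` reports it missing.

    Returns True when the rule had to be added, False when it already existed.
    """
    check = run([IPTABLES, "--wait", "-C", *rule])
    if check.returncode == 0:
        return False  # rule is present, nothing to do
    run([IPTABLES, "--wait", "-A", *rule], check=True)
    return True


def ensure_all(rules=RULES, run=subprocess.run):
    """Check every rule and return the ones that were re-added."""
    return [rule for rule in rules if ensure_rule(rule, run=run)]
```

Called as root from a cron job, ensure_all() would re-add whichever of the two rules is missing and leave existing ones alone.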

@IwanSE
Author

IwanSE commented Aug 8, 2019

thank you for your report! if you run watchtower directly without using your own Dockerfile - do you still get the same error? That is, using containrrr/watchtower:latest instead.

@simskij
Yes used:

containrrr/watchtower:latest 84803293c0e3 5 weeks ago 14.5MB

The problem remains.
It helps to completely recreate the container, but doing it every time is not practical.

time="2019-08-08T07:16:07Z" level=debug msg="Got image name: 172.22.130.59:8083/docker-repository/frontend:1.0.1"
time="2019-08-08T07:16:22Z" level=debug msg="Error pulling image 172.22.130.59:8083/docker-repository/frontend:1.0.1, Error response from daemon: Get http://172.22.130.59:8083/v2/docker-repository/frontend/manifests/1.0.1: Get http://172.22.130.59:8083/v2/token?account=admin&scope=repository%3Adocker-repository%2Ffrontend%3Apull&service=http%3A%2F%2F172.22.130.59%3A8083%2Fv2%2Ftoken: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
time="2019-08-08T07:16:22Z" level=info msg="Unable to update container /frontend. Proceeding to next."
time="2019-08-08T07:16:22Z" level=debug msg="Error response from daemon: Get http://172.22.130.59:8083/v2/docker-repository/frontend/manifests/1.0.1: Get http://172.22.130.59:8083/v2/token?account=admin&scope=repository%3Adocker-repository%2Ffrontend%3Apull&service=http%3A%2F%2F172.22.130.59%3A8083%2Fv2%2Ftoken: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

I have to use my own Dockerfile and copy the credentials in through config.json.
I use Docker for Windows with Linux containers.
Mounting config.json as a volume fails on Windows: config.json gets mounted as a folder.
Docker info:

Containers: 7
 Running: 7
 Paused: 0
 Stopped: 0
Images: 7
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.125-linuxkit
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.63GiB
Name: linuxkit-00155d823a0b
ID: URTU:B4FJ:OUQL:3I44:TPJO:CYKD:6U2I:LJZG:WVNY:DZGW:RERU:X6PP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 81
 Goroutines: 105
 System Time: 2019-08-08T07:23:27.3834919Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 172.22.130.59:8083
 127.0.0.0/8
Registry Mirrors:
 http://172.22.130.59:8083/
Live Restore Enabled: false
Product License: Community Engine

Docker version:

Client: Docker Engine - Community
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        6247962
 Built:             Sun Feb 10 04:12:31 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 04:13:06 2019
  OS/Arch:          linux/amd64
  Experimental:     true

@IwanSE
Author

IwanSE commented Aug 21, 2019

Any ideas?

@simskij
Member

simskij commented Aug 24, 2019

I have very limited knowledge of Docker for Windows. If I remember correctly it has a lot of weird quirks that are yet to be ironed out. Let me have a look next time I have access to a Windows computer and get back to you. Thanks :)

@christoph-kluge

This is what I have discovered so far: I have a couple of Raspberry Pis running in completely different geographic locations with a short update interval of 30s.

Today I saw a recurring pattern that seems to be related to the registry (Hub, ECR and GitLab Registry). Almost all of the Pis reported "Unable to update container {...}. Proceeding to next." at roughly the same time for the same container.

My idea is to lower the severity of this message to debug and add an exponential backoff threshold after a timeout before reporting it. Once the threshold is reached, we should report it as info or higher. I was thinking of something like --timeout-threshold 3600 / env WATCHTOWER_TIMEOUT_THRESHOLD=3600.

Let's imagine something like --interval 30 --timeout-threshold 3600. At some point the registry goes down. We check after 30, 60, 120, 240, 480, 960 and 1920 seconds, and after the 7th check (cumulative wait >3600) it reports an info/warn/error that the container could not be updated.
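
To make the arithmetic concrete, here is a small Python sketch of such a doubling schedule. It only illustrates the proposal: --timeout-threshold does not exist in watchtower, and the function name backoff_schedule is hypothetical.

```python
def backoff_schedule(interval, threshold):
    """Return the doubling wait times (in seconds) used before escalating.

    Waits start at `interval` and double after each failed check. The list
    ends with the first wait that pushes the cumulative time past
    `threshold`; from then on the failure would be reported at info level
    or higher instead of debug.
    """
    if interval <= 0 or threshold < 0:
        raise ValueError("interval must be positive and threshold non-negative")
    waits, elapsed, wait = [], 0, interval
    while elapsed <= threshold:
        waits.append(wait)
        elapsed += wait
        wait *= 2
    return waits
```

With interval=30 and threshold=3600 this returns [30, 60, 120, 240, 480, 960, 1920]. The cumulative wait is then 3810 seconds, so the failure would be escalated after the 7th check.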

Should I file this as a feature request? Does it make sense? Or do you think this belongs in a logging/monitoring configuration?

@simskij
Member

simskij commented Apr 27, 2020

@christoph-kluge
It certainly makes sense! Would be great if you wanted to provide either a feature- or pull request implementing said feature. Thanks! 🙏🏼

@simskij
Member

simskij commented Apr 27, 2020

Other than that, I think we can safely arrive at the conclusion that this happens because watchtower is unable to contact the registry. This might be caused by:

  1. The registry being down or unavailable for some reason. The fact that you've been experiencing this during weekends suggests that it could be during a Docker Hub maintenance window.

  2. Local networking issues, which do not seem to be the case for most of you, as watchtower has still been able to send notifications to Slack or similar.

If you are experiencing an issue you think does not fall into these two categories, feel free to open a new issue. Closing this.

@simskij simskij closed this as completed Apr 27, 2020
@ghost

ghost commented Apr 27, 2020

To avoid important communication getting lost in a closed issue that no one monitors, I'll go ahead and lock this issue. If you want to continue the discussion, please open a new issue. Thank you! 🙏🏼

@ghost ghost locked and limited conversation to collaborators Apr 27, 2020