
Long downtime during restart of multiple containers that are based on the same image #272

Open

Rush opened this issue Apr 7, 2019 · 21 comments · May be fixed by #622

Comments

@Rush commented Apr 7, 2019

Let's say we have 10 containers based on the same image. Upon update watchtower will:

  • stop and remove all containers
  • re-create all containers

This causes downtime of N * (time to stop and start a container) - where N is the number of containers.

It would be nice if watchtower had an algorithm to:

  • For each container:
    • stop and remove the container
    • re-create the container

Is it possible? Is it a planned feature? Is it a known issue?
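
A rough sketch of what such a per-container loop could look like (not watchtower's actual code; Container, stopAndRemove, and recreate are hypothetical stand-ins for the real client API):

    package main

    import "fmt"

    // Container is a minimal stand-in for watchtower's container type.
    type Container struct{ Name string }

    // Hypothetical helpers standing in for watchtower's Docker client calls.
    func stopAndRemove(c Container) error { fmt.Println("stop+remove", c.Name); return nil }
    func recreate(c Container) error      { fmt.Println("re-create", c.Name); return nil }

    // rollingUpdate re-creates containers one at a time, so at most one
    // instance is down at any moment instead of all N at once.
    func rollingUpdate(containers []Container) error {
        for _, c := range containers {
            if err := stopAndRemove(c); err != nil {
                return err
            }
            // Bring the replacement up before touching the next container,
            // so the remaining N-1 instances keep serving traffic.
            if err := recreate(c); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        _ = rollingUpdate([]Container{{"t1"}, {"t2"}, {"t3"}})
    }

With that ordering, total downtime per container is just its own stop+start time, rather than every container waiting for all the others.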

@SmallFriendlyKiwi commented

I've noticed the same thing and if the above could be implemented, that would be awesome :-)

@simskij (Member) commented Apr 7, 2019

Thanks for your issue! This is definitely something we should take a look at. If you feel up for it, feel free to submit a pull request and I'll have a look. 👍

@stale bot commented Jun 2, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Jun 2, 2019
@Rush (Author) commented Jun 2, 2019

Why would a real issue be closed due to inactivity?

stale bot removed the Status: Stale label on Jun 2, 2019
@simskij (Member) commented Jun 2, 2019

Still fine-tuning the stale bot. Some false positives remain to be ironed out.

@simskij (Member) commented Jun 2, 2019

And just to elaborate a bit on why we do this: I think the stale issues repo explains it all very well:

In an ideal world with infinite resources, there would be no need for this app.

But in any successful software project, there's always more work to do than people to do it. As more and more work piles up, it becomes paralyzing. Just making decisions about what work should and shouldn't get done can exhaust all available resources. In the experience of the maintainers of this app—and the hundreds of other projects and organizations that use it—focusing on issues that are actively affecting humans is an effective method for prioritizing work.

To some, a robot trying to close stale issues may seem inhospitable or offensive to contributors. But the alternative is to disrespect them by setting false expectations and implicitly ignoring their work. This app makes it explicit: if work is not progressing, then it's stale. A comment is all it takes to keep the conversation alive.

With that said, your issue has been added to a milestone since this might become an actual problem, and as such it won't be marked as stale.

Thanks for understanding. 🙏

@demyxco commented Sep 24, 2019

I stopped using watchtower because of this issue.

@smallswan399 commented

I am looking for a way to instruct watchtower not to stop all my containers at the same time. This is a real problem! Let's say you have 3 instances behind a load balancer: watchtower will stop them all at once.

@donce commented Dec 15, 2019

As a workaround, you might run multiple watchtower instances, one for each container you want to monitor.
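
For example, watchtower accepts container names as arguments to limit what it watches, so something like this should work (a sketch; app1 and app2 are placeholder container names):

    docker run -d --name watchtower-app1 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower app1

    docker run -d --name watchtower-app2 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower app2

Each watchtower instance then only ever stops and re-creates its own container, so the others keep serving while one restarts. The trade-off is one extra watchtower container per monitored container.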

@matheuscmpm commented

Is this still an issue? I'm thinking about deploying watchtower, but with this behavior it won't work for my scenario. I have more than one hundred containers using the same image on the same server. I really need something closer to what the OP described.

@simskij (Member) commented Mar 21, 2020

Yes, this is still how it works. However, I'd be more than open to changing this behavior, although it would require some help from the community since, to be honest, I lack the time at this point.

@vrajashkr commented

Greetings @simskij !

Is this issue open to be worked on? I'd love to have a go at it if it's available.

Thank you!

@simskij (Member) commented Aug 15, 2020

For sure, go for it! 🙏🏼

@vrajashkr commented

Thank you!

@simskij I ran into some issues while trying out the application. Should I mention them here or on Gitter?

@simskij (Member) commented Aug 15, 2020

Here is better if someone else wants to assist, but Gitter works just as well! 👌

@vrajashkr commented Aug 15, 2020

Awesome!

Here is the issue I ran into:

DEBU[0100] Got image name: altariax0x01/mybuntu:latest  
INFO[0100] Found new altariax0x01/mybuntu:latest image (sha256:77e1d6c5b9c0f022928f1732791ccd12fcb6029baf686b4cfcebafe7dbce6ec7) 
INFO[0100] Stopping /t1 (bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6) with SIGTERM 
DEBU[0101] Removing container bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6 
2020/08/15 15:46:46 cron: panic running job: runtime error: invalid memory address or nil pointer dereference
goroutine 13 [running]:
github.com/robfig/cron.(*Cron).runWithRecovery.func1(0xc0002c8500)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:161 +0x9e
panic(0xae3ba0, 0x1021190)
        /home/ubuntu/go/src/runtime/panic.go:969 +0x175
github.com/containrrr/watchtower/pkg/container.Container.runtimeConfig(0x100, 0xc000485d40, 0x0, 0xc000392480)
        /home/ubuntu/watchtower/pkg/container/container.go:169 +0x4e
github.com/containrrr/watchtower/pkg/container.dockerClient.StartContainer(0xc89b40, 0xc00030c700, 0x1, 0x920100, 0xc000485d40, 0x0, 0x1, 0xc000020100, 0xc000485d40, 0x0)
        /home/ubuntu/watchtower/pkg/container/client.go:163 +0x86
github.com/containrrr/watchtower/internal/actions.restartStaleContainer(0x7faf5b8d0100, 0xc000485d40, 0x0, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:121 +0xdd
github.com/containrrr/watchtower/internal/actions.restartContainersInSortedOrder(0xc0003e2420, 0x1, 0x1, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:96 +0x255
github.com/containrrr/watchtower/internal/actions.Update(0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0, 0x1abab3a6, 0x2000000030001)
        /home/ubuntu/watchtower/internal/actions/update.go:53 +0x369
github.com/containrrr/watchtower/cmd.runUpdatesWithNotifications(0xc00002f960)
        /home/ubuntu/watchtower/cmd/root.go:211 +0xb3
github.com/containrrr/watchtower/cmd.runUpgradesOnSchedule.func1()
        /home/ubuntu/watchtower/cmd/root.go:168 +0xb6
github.com/robfig/cron.FuncJob.Run(0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:92 +0x25
github.com/robfig/cron.(*Cron).runWithRecovery(0xc0002c8500, 0xc6dde0, 0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:165 +0x59
created by github.com/robfig/cron.(*Cron).run
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:199 +0x76a
Steps to reproduce:
  1. Clone the repo
  2. Build watchtower
  3. Create a test container from a test image
  4. Start watchtower
  5. Update the test image
  6. Push the image to Docker Hub
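
Roughly, on the command line (a sketch; the flags are watchtower's standard --debug and --interval, and the image/container names match the log above):

    git clone https://github.com/containrrr/watchtower.git
    cd watchtower
    go build .                                  # produces ./watchtower
    docker run -d --name t1 altariax0x01/mybuntu:latest sleep infinity
    ./watchtower --debug --interval 30          # poll for new images every 30s
    # then push an updated altariax0x01/mybuntu:latest and wait for the next poll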
Expected:

The container is stopped and restarted with the new version of the base image.

What actually happened:

The container is stopped, but the program panics while trying to restart it, so the container is never re-created.

Environment:

Ubuntu 20.04.1 LTS running on an AWS EC2 instance.
Docker server version: 19.03.12
Golang version: go1.15 linux/amd64

Any advice?

Thank you!

@piksel (Member) commented Aug 15, 2020

Yeah, this is because of #612.

You can base it on that branch to get started, or I will get it merged to master tomorrow!

@vrajashkr commented

@piksel Thank you for the information! I'll get started with that branch to test my changes. I can make a PR for the changes once that branch is merged into master.

@Rush (Author) commented Feb 3, 2024

I know it's been a while. :) Likely there has been no progress but it doesn't hurt to ask.

@Codelica commented

This can really be a tough issue. We have cloud hosts where a service container may have 50+ instances, so downtime can be very long while waiting for all of them to shut down first. We're not a Go shop or we'd jump in, but hopefully someone has the skills. We would absolutely help test.

@matheuscmpm commented

Replying to @Codelica above:

We started using ouroboros, another container update tool, to avoid this same problem. It is working as intended for us. I haven't tried watchtower in a couple of years, so I don't know whether this behavior has been fixed or changed.
