
Long downtime during restart of multiple containers that are based on the same image #272

Open

Rush opened this issue Apr 7, 2019 · 21 comments · May be fixed by #622

Comments

@Rush commented Apr 7, 2019

Let's say we have 10 containers based on the same image. Upon update watchtower will:

  • stop and remove all containers
  • re-create all containers

This causes downtime of N * (time to stop and start a container) - where N is the number of containers.

It would be nice if watchtower had an algorithm to:

  • For each container:
    • stop and remove the container
    • re-create the container

Is it possible? Is it a planned feature? Is it a known issue?
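
A rough sketch of what such a per-container loop could look like (not watchtower's actual code; Container, stopAndRemove, and recreate are hypothetical stand-ins for the real client API):

    package main

    import "fmt"

    // Container is a minimal stand-in for watchtower's container type.
    type Container struct{ Name string }

    // Hypothetical helpers standing in for watchtower's Docker client calls.
    func stopAndRemove(c Container) error { fmt.Println("stop+remove", c.Name); return nil }
    func recreate(c Container) error      { fmt.Println("re-create", c.Name); return nil }

    // rollingUpdate re-creates containers one at a time, so at most one
    // instance is down at any moment instead of all N at once.
    func rollingUpdate(containers []Container) error {
        for _, c := range containers {
            if err := stopAndRemove(c); err != nil {
                return err
            }
            // Bring the replacement up before touching the next container,
            // so the remaining N-1 instances keep serving traffic.
            if err := recreate(c); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        _ = rollingUpdate([]Container{{"t1"}, {"t2"}, {"t3"}})
    }

With that ordering, total downtime per container is just its own stop+start time, rather than every container waiting for all the others.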

@SmallFriendlyKiwi commented

I've noticed the same thing and if the above could be implemented, that would be awesome :-)

@simskij (Member) commented Apr 7, 2019

Thanks for your issue! This is definitely something we should take a look at. If you feel up for it, feel free to submit a pull request and I'll have a look. 👍

@stale bot commented Jun 2, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Jun 2, 2019
@Rush (Author) commented Jun 2, 2019

Why would a real issue be closed due to inactivity?

stale bot removed the Status: Stale label on Jun 2, 2019
@simskij (Member) commented Jun 2, 2019

Still fine-tuning the stale bot. Some false positives remain to be ironed out.

@simskij (Member) commented Jun 2, 2019

And just to elaborate a bit on why we do this: I think the stale issues repo explains it all very well:

In an ideal world with infinite resources, there would be no need for this app.

But in any successful software project, there's always more work to do than people to do it. As more and more work piles up, it becomes paralyzing. Just making decisions about what work should and shouldn't get done can exhaust all available resources. In the experience of the maintainers of this app—and the hundreds of other projects and organizations that use it—focusing on issues that are actively affecting humans is an effective method for prioritizing work.

To some, a robot trying to close stale issues may seem inhospitable or offensive to contributors. But the alternative is to disrespect them by setting false expectations and implicitly ignoring their work. This app makes it explicit: if work is not progressing, then it's stale. A comment is all it takes to keep the conversation alive.

With that said, your issue has been added to a milestone since this might become an actual problem, and as such it won't be marked as stale.

Thanks for understanding. 🙏

@demyxco commented Sep 24, 2019

I stopped using watchtower because of this issue.

@smallswan399 commented

I am looking for a way to instruct watchtower not to stop all my containers at the same time. This is a real problem! Let's say you have 3 instances behind a load balancer: watchtower will stop them all at once.

@donce commented Dec 15, 2019

As a workaround, you might run multiple watchtower instances, one for each container you want to monitor.
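
For example, watchtower accepts container names as arguments to limit what it watches, so something like this should work (a sketch; app1 and app2 are placeholder container names):

    docker run -d --name watchtower-app1 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower app1

    docker run -d --name watchtower-app2 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower app2

Each watchtower instance then only ever stops and re-creates its own container, so the others keep serving while one restarts. The trade-off is one extra watchtower container per monitored container.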

@matheuscmpm commented

Is this still an issue? I'm thinking about deploying watchtower, but with this behavior it won't work for my scenario. I have more than one hundred containers using the same image on the same server. I really need something closer to what the OP described.

@simskij (Member) commented Mar 21, 2020

Yes, this is still how it works. However, I'd be more than open to changing this behavior, although it would require some help from the community since, to be honest, I lack the time at this point.

@vrajashkr commented

Greetings @simskij !

Is this issue open to be worked on? I'd love to have a go at it if it's available.

Thank you!

@simskij (Member) commented Aug 15, 2020

For sure, go for it! 🙏🏼

@vrajashkr commented

Thank you!

@simskij I ran into some issues while trying out the application. Should I mention them here or on Gitter?

@simskij (Member) commented Aug 15, 2020

Here is better if someone else wants to assist, but Gitter works just as well! 👌

@vrajashkr commented Aug 15, 2020

Awesome!

Here is the issue I ran into:

DEBU[0100] Got image name: altariax0x01/mybuntu:latest  
INFO[0100] Found new altariax0x01/mybuntu:latest image (sha256:77e1d6c5b9c0f022928f1732791ccd12fcb6029baf686b4cfcebafe7dbce6ec7) 
INFO[0100] Stopping /t1 (bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6) with SIGTERM 
DEBU[0101] Removing container bbd9ce79fad7737c0fa0c9512d526d286ad38565004dcbfd123adfbed11ff0d6 
2020/08/15 15:46:46 cron: panic running job: runtime error: invalid memory address or nil pointer dereference
goroutine 13 [running]:
github.com/robfig/cron.(*Cron).runWithRecovery.func1(0xc0002c8500)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:161 +0x9e
panic(0xae3ba0, 0x1021190)
        /home/ubuntu/go/src/runtime/panic.go:969 +0x175
github.com/containrrr/watchtower/pkg/container.Container.runtimeConfig(0x100, 0xc000485d40, 0x0, 0xc000392480)
        /home/ubuntu/watchtower/pkg/container/container.go:169 +0x4e
github.com/containrrr/watchtower/pkg/container.dockerClient.StartContainer(0xc89b40, 0xc00030c700, 0x1, 0x920100, 0xc000485d40, 0x0, 0x1, 0xc000020100, 0xc000485d40, 0x0)
        /home/ubuntu/watchtower/pkg/container/client.go:163 +0x86
github.com/containrrr/watchtower/internal/actions.restartStaleContainer(0x7faf5b8d0100, 0xc000485d40, 0x0, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:121 +0xdd
github.com/containrrr/watchtower/internal/actions.restartContainersInSortedOrder(0xc0003e2420, 0x1, 0x1, 0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0)
        /home/ubuntu/watchtower/internal/actions/update.go:96 +0x255
github.com/containrrr/watchtower/internal/actions.Update(0xc836e0, 0xc00000ee40, 0xc00002f960, 0x0, 0x2540be400, 0x0, 0x1abab3a6, 0x2000000030001)
        /home/ubuntu/watchtower/internal/actions/update.go:53 +0x369
github.com/containrrr/watchtower/cmd.runUpdatesWithNotifications(0xc00002f960)
        /home/ubuntu/watchtower/cmd/root.go:211 +0xb3
github.com/containrrr/watchtower/cmd.runUpgradesOnSchedule.func1()
        /home/ubuntu/watchtower/cmd/root.go:168 +0xb6
github.com/robfig/cron.FuncJob.Run(0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:92 +0x25
github.com/robfig/cron.(*Cron).runWithRecovery(0xc0002c8500, 0xc6dde0, 0xc000448100)
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:165 +0x59
created by github.com/robfig/cron.(*Cron).run
        /home/ubuntu/go/pkg/mod/github.com/robfig/cron@v0.0.0-20180505203441-b41be1df6967/cron.go:199 +0x76a
Steps to reproduce:
  1. Clone the repo
  2. Build watchtower
  3. Create a test container from a test image
  4. Start watchtower
  5. Update the test image
  6. Push the image to Docker Hub
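
Roughly, on the command line (a sketch; the flags are watchtower's standard --debug and --interval, and the image/container names match the log above):

    git clone https://github.com/containrrr/watchtower.git
    cd watchtower
    go build .                                  # produces ./watchtower
    docker run -d --name t1 altariax0x01/mybuntu:latest sleep infinity
    ./watchtower --debug --interval 30          # poll for new images every 30s
    # then push an updated altariax0x01/mybuntu:latest and wait for the next poll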
Expected:

The container is stopped and restarted with the new version of the base image.

What actually happened:

The container is stopped, but the program panics while trying to restart it, so the container is never re-created.

Environment:

Ubuntu 20.04.1 LTS running on an AWS EC2 instance.
Docker server version: 19.03.12
Golang version: go1.15 linux/amd64

Any advice?

Thank you!

@piksel (Member) commented Aug 15, 2020

Yeah, this is because of #612.

You can base it on that branch to get started, or I will get it merged to master tomorrow!

@vrajashkr commented

@piksel Thank you for the information! I'll get started with that branch to test my changes. I can make a PR for the changes once that branch is merged into master.

@Rush (Author) commented Feb 3, 2024

I know it's been a while. :) Likely there has been no progress but it doesn't hurt to ask.

@Codelica commented

This can really be a tough issue. We have cloud hosts where a service container may have 50+ instances, so downtime can be very long while waiting for all of them to shut down first. We're not a Go shop or we'd jump in, but hopefully someone has the skills. We would absolutely help test.

@matheuscmpm commented

Replying to @Codelica above:

We started using ouroboros, another container update tool, to avoid this same problem. It is working as intended for us. I haven't tried watchtower in a couple of years, so I don't know whether this behavior has been fixed or changed.
