Docker Healthcheck support on Portainer Container #3572

This issue was moved to a discussion. You can continue the conversation there. Go to discussion →

Comments
This is indeed a very useful suggestion. I have also been thinking about how to do this for some time. Please find a couple of comments from my own experience. First, I wouldn't advise relying on external tools such as curl or wget inside the container. Instead, I propose implementing a simple healthcheck routine in the Portainer binary itself that Docker can then use during healthchecks. In this case, Portainer can dial itself to request a status update and return the appropriate result and exit code: success if the HTTP code is 2XX, failure otherwise. Luckily, Portainer already implements a status API endpoint that can be leveraged for this proposal. Therefore we just need to implement a simple flag, e.g. `--healthcheck`, invoked as `portainer --healthcheck`.
With the above in place, healthchecks can then be enabled in a Portainer stack with the following:

```yaml
healthcheck:
  test: ['CMD', 'portainer', '--healthcheck']
```

For reference, this is how the Kong API Gateway does its healthcheck, i.e. via its own `kong health` command. Moreover, this same approach can also be implemented for the Portainer Agent binary. @itsconquest if you and the Portainer team agree on this idea, I can work on it relatively quickly as it doesn't involve working with UI elements and I can easily test it on my side.
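As a rough illustration of the proposed semantics (the flag, port, and endpoint path here are assumptions drawn from this thread, not an existing Portainer feature), the routine boils down to classifying the HTTP status code returned by the local status endpoint:

```shell
#!/bin/sh
# Sketch of the proposed self-healthcheck: exit 0 when the local status
# endpoint answers with HTTP 2XX, exit 1 otherwise. The classifier is a
# separate function so the exit-code logic can be exercised on its own.
is_healthy_code() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *)           return 1 ;;
  esac
}

# Against a live instance one might do (port 9000 and the path are
# assumptions; the real status API may differ):
#   code=$(wget -q -S --spider http://localhost:9000/api/status 2>&1 \
#          | awk '/HTTP\// {c=$2} END {print c}')
#   is_healthy_code "$code"; exit $?
is_healthy_code 200 && echo healthy
```

The split between probing and classifying keeps the exit-code contract (2XX vs. everything else) explicit, which is exactly what Docker's healthcheck machinery consumes.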
@ElleshaHackett @hhromic This would indeed be a nice way to go.
@Ornias1993 no I have not started working on this :) @deviantony @itsconquest now that I've got more familiar with the Portainer codebase, perhaps I can code a prototype and submit it as a PR for review?
@hhromic Ahh, okay... Happens to the best of us :) I read through most of the previous discussions about it. I think the fastest way of getting feedback is indeed throwing in a prototype and working from there. 👍
Alright then, I'll put a prototype together this week and see how it goes!
Sounds like a good idea! I look forward to reviewing your work @hhromic :)
It could also be good to have control over the healthcheck of the image, or even to disable the healthcheck, according to https://docs.docker.com/engine/reference/run/#healthcheck
@rhuanbarreto You can always overrule it in Docker. So that's a given.
Yes. But is it possible to do it in Portainer?
That's not the scope of this issue; there is another issue for handling healthchecks inside Portainer though.
Actually this was already implemented way before this issue... and got reverted just because it isn't compatible with the --ssl flag (which makes it unsuitable to add to the Dockerfile).
Hey guys, just stumbled across this - was there any movement on the --healthcheck? I understand there were a few issues with the previous solution. Thanks!
Maintainers are not interested, it seems.
Would really like this feature also; it's a little odd that a platform designed for managing and monitoring your Docker containers doesn't include the option to monitor itself. 🤷♂️
@hhromic were there any updates on your end?
@modem7, all,
Sorry for the silence on that one - we're interested in that feature, it's just that we have a lot of other stuff to deal with as well. We've been giving it more thought, and we're thinking about bringing support for this feature along with #821; this should work around the potential issue we had so far with HTTP/HTTPS and the healthcheck. We have #821 in our backlog at the moment and we'll start thinking about this one based on the existing implementations that have been provided by contributors.
For those that are just using a pure `docker run` setup:

```shell
docker run \
  -d \
  --name portainer \
  --restart always \
  --health-cmd='wget --no-verbose --tries=1 --spider http://localhost:9000 || exit 1' \
  --health-interval=60s \
  --health-retries=3 \
  --health-timeout=5s \
  --health-start-period=20s \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /path/to/docker/portainer/data:/data \
  -v /path/to/docker/portainer/ssl:/ssl \
  portainer/portainer-ce:alpine \
  --bind-https ":443" \
  --sslcert /ssl/portainer.crt \
  --sslkey /ssl/portainer.key
```

where

```shell
--health-cmd='wget --no-verbose --tries=1 --spider http://localhost:9000 || exit 1' \
--health-interval=60s \
--health-retries=3 \
--health-timeout=5s \
--health-start-period=20s \
```

are the major health check configurations.
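Once the container is running with those flags, the health state Docker records can be read back via `docker inspect`'s `.State.Health`. A small sketch that parses the `Status` field - shown against a sample JSON document so it runs without a live daemon; the container name `portainer` comes from the command above:

```shell
#!/bin/sh
# Pull the "Status" value out of the JSON that `docker inspect` emits for
# .State.Health (values are "starting", "healthy" or "unhealthy").
health_status() {
  sed -nE 's/.*"Status":"([^"]*)".*/\1/p'
}

# With a live container you would pipe the real thing:
#   docker inspect --format '{{json .State.Health}}' portainer | health_status
sample='{"Status":"healthy","FailingStreak":0,"Log":[]}'
printf '%s\n' "$sample" | health_status
```

This is handy in scripts that want to act on the health state without depending on `jq` being installed.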
That doesn't seem to work, since there is no shell or wget in the container as far as I can tell.
It does - make sure you're using the alpine image.
Currently, when the healthcheck is enabled for an agent in swarm mode, the agent becomes unreachable. Ideally this is an issue with the agent that Portainer should fix so that healthchecks can be enabled on the agents. Even if you are running on a Raspberry Pi or a congested/busy swarm, these work:
Portainer UI - using a wget check against port 9000.
Portainer Agents - using an nc check against port 9001.
No need for "too much" hackery! :)
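A compose-style sketch of the two checks described above (service names, ports, and timings are assembled from this thread and may need adjusting for your stack):

```yaml
services:
  portainer:
    image: portainer/portainer-ce:alpine
    healthcheck:
      # UI check: wget against the web UI on port 9000
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:9000 || exit 1"]
      interval: 60s
      timeout: 5s
      retries: 3
      start_period: 20s
  agent:
    image: portainer/agent
    healthcheck:
      # Agent check: nc against the agent port 9001
      test: ["CMD-SHELL", "nc -z localhost 9001 || exit 1"]
      interval: 60s
      timeout: 5s
      retries: 3
```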
@t0mtaylor the image names seem to be reversed... Please double check so I can test asap :-)
This definitely does not work for me.
Can you post your compose file so we can see what you're trying to do? As wget etc. 100% works on the alpine images.
When I run the command in the container, I get that it is open:

```
localhost (127.0.0.1:9001) open
```

But for some reason setting it as a healthcheck makes the container not connectable.
He is saying that in your example you have the nc command for the Portainer UI and the wget command for your agent.
@sgtcoder see the updated comment - #3572 (comment), I've added the fixes there. Just make sure you're using the latest docker compose version. Also added a screenshot of it working to the main comment too 🕺
Did that work for you @sgtcoder with the updated healthchecks?
@t0mtaylor I just booted my computer and am ssh'ing in and checking now. Thank you for the updates. I will let you know.
It's strange because I am still getting "Environment is unreachable."
I know the command works in the container itself. Literally no matter what healthcheck I put on the Portainer agent, it becomes unreachable.
Only difference with mine is I have a separate network for the agents (which is defined for both UI and agents, then a separate network for the UI only which is accessible via the load balancer), but you are also missing the AGENT_CLUSTER_ADDR environment setting below your image declaration on the agent service.
Thank you for that information. I will dig deeper. I did try the environment setting and still the same issue. Definitely strange. And I never saw that environment line in the code sample Portainer provided us, since it's also set in the command section per the Portainer Swarm Setup.
Setting

```yaml
AGENT_CLUSTER_ADDR: localhost
```

seemed to work. For some reason DNS doesn't resolve properly in the healthcheck.
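The DNS symptom above can be probed from inside the container; a sketch, assuming the default swarm service DNS name `tasks.agent` (adjust to your actual service name):

```shell
#!/bin/sh
# Probe whether a hostname resolves, the way AGENT_CLUSTER_ADDR needs to.
resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# In a working swarm, the service DNS name should resolve; when it does
# not, falling back to AGENT_CLUSTER_ADDR: localhost is the workaround
# described in this thread.
if resolves "tasks.agent"; then
  echo "swarm dns ok"
else
  echo "swarm dns unavailable - consider the localhost fallback"
fi
```

Running this via `docker exec <agent-container> sh` helps separate "healthcheck is wrong" from "swarm DNS is broken".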
After a while I had this issue on the agents - I think the agents got restarted but then couldn't start due to a DNS problem, and the UI reported errors as well.
I tried something similar but it doesn't work in a Docker swarm, although for a single-node swarm or plain services it should be OK. What I'm looking at now is how to trigger all the Portainer containers to restart if one of the agents fails the healthcheck - maybe with a separate Docker container monitoring them, or updating the healthcheck to trigger the parent Docker host to relaunch the containers. FYI - I've also updated the comment with a healthcheck API call so you know it's up and running for the UI.
With the wget check above this works for me. (Portainer still needs a proper healthcheck endpoint.)
@lonix1 I prefer to call the status endpoint - it's pretty much doing what a healthcheck endpoint would do, just giving more info about the status 🚀
@t0mtaylor I didn't consider the log. Good idea. The response includes a Version field. So to be complete, in a script, I'd do something like this:

```shell
[ $(wget --quiet -O- --tries=1 http://localhost:9000/api/system/status | sed -nE 's/.*Version":"([^"]*)".*/\1/p' | wc -l) = 1 ] \
  && echo up || echo down
```

That not only checks that the page exists, but that it is returning expected data - I've extracted the Version field. However, in a compose file I'd do something simpler:

```yaml
healthcheck:
  # ...
  test: wget --no-verbose --tries=1 --spider http://localhost:9000/api/system/status || exit 1
```
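To see what that `sed` extraction does, here it is run against a canned status response (the body is a made-up example shaped like the real `{"Version":...}` payload, which has more fields):

```shell
#!/bin/sh
# The same Version extraction as above, fed a sample response body so the
# pipeline can be exercised without a running Portainer instance.
sample='{"Version":"2.19.4","InstanceID":"example-instance"}'
version=$(printf '%s\n' "$sample" | sed -nE 's/.*Version":"([^"]*)".*/\1/p')
echo "$version"
# The healthcheck's test then just asserts exactly one line came back:
printf '%s\n' "$version" | wc -l
```

The `wc -l = 1` trick works because `sed -n ... p` prints nothing at all when the pattern is absent, so a missing or malformed response yields zero lines.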
@lonix1 Yeah, I would keep it simple for the healthcheck as it's giving you enough to determine it's healthy. I do something similar, checking the version in a bash script that checks services are running every 5 minutes and also checks how many containers are running per service, as Docker can still be a bit flaky and services vanish from the swarm! I've updated the main comment #3572 (comment) as there's an issue with the healthcheck for agents when running in swarm mode - but running single-node, on a Raspberry Pi for example, both healthchecks for UI and agents work, as @sgtcoder has confirmed on his setup 👍
Thank you guys for all the updates. I applied a bunch of the suggestions. I still had to use localhost on a single swarm node, but it seems to work aside from the TLS handshake log errors. I had issues in general with using more than one Docker swarm node when trying to replicate storage, with both performance issues and overhead, so I just stick with one node for now. A start period of 5 seconds seems to be fine for me. Running on a dedicated HPE DL380 Gen9 server with the Docker VM configured with 32GB RAM and 32 vCPUs. Here is what I have now.
@sgtcoder try the wget for the agent healthcheck and that will remove the TLS handshake errors :)
These healthchecks work on a single node, but not for the agents in swarm mode. As a workaround, I have a separate bash script checking with Docker that the agent containers are up and running on each server, and I've exposed port 9001 so I can wget that as well on each server - not ideal, but a way forward until @tamarahenson and team improve the agent - ideally they add an HTTP healthcheck endpoint.
I tried the wget again, but for whatever reason that causes the check to fail, whereas the nc command works.
@sgtcoder Have you tried the wget via sh in the container whilst the agent is running? What's the output? Does it have an error? Running sh in the agent container returned a shell ready to use. My output is an error 400, but that's good as it hit the agent on port 9001.
Describe the feature
Being able to see a "health status" of the Portainer Docker container.
Describe the solution you'd like
I would like support for the Docker Healthcheck (that is also shown in Portainer.io's own dashboard and probably other Docker management software).
Describe alternatives you've considered
An alternative is setting up something similar without the use of the already existing tools within Docker.
Additional context
The Dockerfile could contain something like this:

```
HEALTHCHECK --interval=60s --timeout=10s --retries=3 CMD curl -sS http://localhost:9000 || exit 1
```

For debugging and testing purposes you can use:

```shell
docker inspect --format "{{json .State.Health}}" containername
```