Issue while connecting to docker daemon from containerized Airflow #19516
Replies: 6 comments 1 reply
-
I have tried a few things through which I will try to summarize my findings:
Making these changes allows the DockerOperator to run with the default docker URL unix://var/run/docker.sock, but it still doesn't work with a TCP-hosted URL. I also tried this on my personal laptop with Ubuntu 20.04.3 LTS.
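For reference, the socket-based setup that does work here is usually a bind mount of the host's docker socket into the Airflow containers. A minimal docker-compose sketch (the service name, and the `999` group id, are assumptions — check the actual socket group on your host with `getent group docker`):

```yaml
services:
  airflow-worker:            # placeholder service name
    image: apache/airflow:2.2.1
    volumes:
      # Expose the host daemon's unix socket inside the container.
      - /var/run/docker.sock:/var/run/docker.sock
    group_add:
      # The airflow user must be in the group that owns the socket on
      # the host; 999 is a common docker GID but is NOT guaranteed.
      - "999"
```

With this in place the default `docker_url=unix://var/run/docker.sock` works without any TCP exposure of the daemon.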
-
This is generally a problem with "how do I access the host's docker daemon/socket from within a container" and isn't anything particular to Airflow, so I'm going to close out this issue.
-
@ashb isn't the TCP-hosted URL being inaccessible still an issue?
-
@kaustubhharapanahalli Nothing about that is specific to running Airflow in a docker container -- it would apply equally as much if you ran
-
I still think this thread deserves a response with leads on the recommended way of deploying a DockerOperator task on Kubernetes. I have understood that you are meant to use the KubernetesPodOperator (https://stackoverflow.com/questions/65497763/how-can-i-run-tasks-using-the-dockeroperator-from-within-a-kubernetes-deployment). But we want our developers to be able to test their tasks locally, so the KubernetesPodOperator is really something we want to avoid, because it means asking everyone to set up, for example, a minikube. In my case I think I'm going to end up writing an Operator wrapper that uses DockerOperator if Airflow is running locally, and KubernetesPodOperator on the deployed version. But I would really like some insight into how other people deal with this problem, or whether I'm missing a simple trick that lets the same code work both locally and once deployed on k8s.
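The wrapper idea above can be sketched in a few lines. Note the `RUNTIME_ENV` variable and the `container_operator_path` helper are assumptions invented for illustration, not Airflow conventions; the dotted operator paths match the provider packages listed later in this thread but should be verified against your installed versions:

```python
import os


def container_operator_path(env=None):
    """Return the dotted import path of the operator for this deployment.

    RUNTIME_ENV is a hypothetical deployment flag: "k8s" on the deployed
    cluster, anything else (or unset) means a local developer machine.
    """
    env = env or os.environ.get("RUNTIME_ENV", "local")
    if env == "k8s":
        # From apache-airflow-providers-cncf-kubernetes
        return ("airflow.providers.cncf.kubernetes"
                ".operators.kubernetes_pod.KubernetesPodOperator")
    # From apache-airflow-providers-docker
    return "airflow.providers.docker.operators.docker.DockerOperator"
```

In a DAG file you would then import the chosen class (e.g. via `importlib`) and instantiate it. The real work of such a wrapper is translating kwargs, since the two operators take different arguments (`command` vs. `cmds`/`arguments`, volumes vs. pod specs, etc.).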
-
This is nothing Airflow-specific. Running docker workloads from within a Kubernetes Pod is a generic Kubernetes question, and it has plenty of answers; depending on your deployment needs, limitations, etc., you might choose the solution that is best for you. If you search Google/Stack Overflow, you will find plenty of tutorials on how to run Docker in Kubernetes. I just googled it and found this https://applatix.com/case-docker-docker-kubernetes-part-2/ - but I can neither endorse nor confirm that it is good. There is no "recommended" way from Airflow. Adding one to the Airflow docs, when there are already plenty of recommendations and pros/cons discussions available for how to do it on K8s, makes very little sense from our side: there is nothing Airflow-specific there, it adds no value, and any recommendation we could make would simply duplicate information that exists elsewhere. Additionally, that "elsewhere" information gets updated and evolves over time, so we would have to keep up and monitor whether the recommendations change, when new ways appear, or when people find other reasons. You are managing your deployments, and you should choose the way of running docker from Kubernetes that fits your expectations, rules, etc.
-
Apache Airflow Provider(s)
docker
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==2.3.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==2.0.3
apache-airflow-providers-docker==2.2.0
apache-airflow-providers-elasticsearch==2.0.3
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.0.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-microsoft-azure==3.2.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==2.3.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.1.1
apache-airflow-providers-slack==4.1.0
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.2.0
Apache Airflow version
2.2.1 (latest released)
Operating System
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
Deployment
Docker-Compose
Deployment details
What happened
Hello, I was trying to connect to the docker daemon from a containerized Airflow setup. I saw issue #16803; following it, I tried to migrate my Airflow version to 2.2.1 and set mount_tmp_dir=False.
I tried to run this using two approaches.
In the first approach, I got this error:
In the second approach, I got this error:
In both cases, the error was similar: Error while fetching server API version: HTTPConnectionPool.
I am not able to debug further. If you can guide me on how I can fix this issue, that would be of great help!
Thanks.
What you expected to happen
No response
How to reproduce
Set up Ubuntu on an EC2 instance. Create an Airflow container using Airflow version 2.2.1.
Set up docker and docker-compose.
Set up TCP access to the docker daemon using the following commands:
Create a simple DockerOperator task in a DAG and run it.
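The daemon-TCP commands mentioned in the third step were not captured in this report. A common sketch for a systemd-managed Ubuntu host (an assumption; this exposes an unauthenticated API on port 2375 -- lab use only, never on a publicly reachable interface) looks like:

```shell
# Assumption: docker is managed by systemd on the EC2 host.
# Add a TCP listener alongside the default unix socket.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

# Sanity check from the host:
curl -s http://localhost:2375/version
```

One pitfall relevant to this thread: from inside a container, `tcp://localhost:2375` points at the container itself, not the host. The Airflow container has to use an address that actually reaches the host daemon (for example the docker bridge gateway IP, or `host.docker.internal` on Docker Desktop).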
Anything else
No response
Are you willing to submit PR?
Code of Conduct