My airflow environment was so slow; after this, it runs like a charm. Thank you!
I think airflow installation is getting complicated nowadays, even for a developer with sound knowledge of infra. This video made my day.
Very clear and helpful tutorial so far, really appreciate it!
Thank you!
Thanks man, you saved my life. Love from india
thank you so much sir, finally i got airflow installed well
You are welcome. 🤗
Thank you so much for the amazing explanation.
Thank you!
I like the way you say "excutable"
Hello coder2j...
Thanks for the clear explanation, I'm going to try this at home tonight. Gotta learn fast.
Looking forward to more content! ^^
Have fun! :-)
Great tutorial, amazing explanation. Thank you so much!
You are welcome.
Came to learn airflow, stayed for boom 💥
🙌🙌
This is soo amazing the best tutorial by far!! Thank you so very much!!!! amazing!!
Glad it helped!
Hello coder2j, that was super! Thank you very much!
Hey and thanks for the tutorial! It is great! It also would be nice to see the terminal commands that you use in the videos. :)
Do you mean certain terminal commands are not visible in the video, or are you suggesting having them in the video description?
Great content. Please keep doing it!
Great vid! How can I properly remove its postgresql service and volume? I am trying to just compose up an airflow service and then hook it to my own postgresql container, but I keep getting errors when composing.
You can remove the postgres service definition and its volume in the docker compose yaml file.
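Roughly, the edit looks like this (a sketch only; "my-postgres" is a placeholder for your own container's name, and the connection key is AIRFLOW__CORE__SQL_ALCHEMY_CONN in the 2.x compose file used in the video, AIRFLOW__DATABASE__SQL_ALCHEMY_CONN in newer ones):
# 1. delete the whole postgres: service block and its entry under volumes:
# 2. point airflow at your own container instead:
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@my-postgres/airflow
# 3. remove postgres from every depends_on: list, and attach my-postgres to the same docker network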
Need the same for installing Kafka. Would a tutorial be possible? Thanks a lot!
How can we set this up for multiple environments like Dev and Prod? Can you please guide us through it?
You can use the same docker compose config and deploy it to different virtual machines or EC2 instances for the staging and production environments.
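One common pattern is to keep a single docker-compose.yaml and swap per-environment settings via env files. A sketch, where .env.dev and .env.prod are hypothetical file names:
docker-compose --env-file .env.dev up -d     # on the staging machine
docker-compose --env-file .env.prod up -d    # on the production machine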
THAT WAS GREAT! SUBBED!
Docker Desktop is stuck on "starting...". I've tried pretty much everything suggested on Stack Overflow to fix it (wsl --update). Any ideas? I'm on Windows 10.
very helpful, thanks
You are welcome!
If you want to keep it running on CeleryExecutor, what is the difference in effect between that and LocalExecutor?
Using CeleryExecutor, you have the possibility to scale out with more workers. But if you are running it on a single machine, there is not much difference from LocalExecutor.
@@coder2j Ok, thanks very much!
Can I ask why, in the install-airflow step of airflow_tutorial, I can open the web UI, but with the airflow_docker video I cannot open the web UI, although I have done exactly as you instructed? Please give me some help, I have been stuck on this for 2 days.
Please check the airflow webserver log and see what the error is.
Thanks for the video. It can't get clearer than this. I was wondering: What if I decide not to edit the docker-compose.yaml file? Does it really matter?
The only difference is that you will be using CeleryExecutor instead of LocalExecutor if you don't change anything.
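For reference, the change in the video boils down to one environment variable in docker-compose.yaml, plus dropping the celery-only pieces (a sketch; service names follow the official compose file):
AIRFLOW__CORE__EXECUTOR: LocalExecutor    # was CeleryExecutor
# also remove the redis, airflow-worker and flower services and the
# AIRFLOW__CELERY__* settings, since LocalExecutor doesn't need them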
thanks for sharing
Thanks for watching!
Hi coder2j, just want to ask if there is any big difference between running airflow on Kubernetes and on Docker. I know that Kubernetes can reallocate resources to other Pods when some Pods are done. Would airflow on docker do the same? Thank you so much!
They are different. Airflow on docker means running airflow in a docker container runtime. Kubernetes is a tool to orchestrate container-based applications running on a cluster of servers. Running airflow in a (docker) container doesn't mean it has auto-scaling out of the box, but it is a prerequisite for tools like kubernetes to manage it at scale.
@@coder2j Thank you so much for your reply!
In practice, do we commonly use Kubernetes to manage airflow in docker? I found it fairly complicated to do, even using the Helm chart. 😅
It depends on the way you use airflow. If you outsource the heavy computation, e.g. to a spark cluster, airflow only does the basic scheduling and management jobs, which don't need a lot of resources. Otherwise, you need to scale airflow using either CeleryExecutor or kubernetes.
@@coder2j Thank you for your reply!
I got your points, they really make sense. Thank you.
finally!!
Boom!! I did it
Nice to hear that.
Hi coder2j, is the password "airflow" in the yaml file different from the password of the postgres running on the machine?
No, they are the same. Check if you already have a postgres instance running locally on port 5432.
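A quick way to check for a clash (lsof is available on macOS/Linux; the docker ps filter works anywhere):
lsof -i :5432                      # anything on the host already bound to 5432?
docker ps --filter publish=5432    # any container already publishing that port?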
I was wondering why you deleted the airflow worker in docker compose, and what the reasons were. Is it fine to run airflow without an airflow worker?
If we use the local executor, all the airflow jobs run in the scheduler container. Workers are only needed if you use a distributed setup, with the celery executor for example.
Thanks
Coder2j, thanks for this great video... Please, I am having problems with docker-compose up airflow-init. I'm getting this error consistently:
docker-compose up airflow-init
[+] Running 0/15
⠹ postgres Pulling 10.1s
⠸ f1f26f570256 Pulling fs layer 1.2s
⠸ 1c04f8741265 Pulling fs layer 1.2s
⠸ dffc353b86eb Pulling fs layer 1.2s
⠸ 18c4a9e6c414 Waiting 1.2s
⠸ 81f47e7b3852 Waiting 1.2s
⠸ 5e26c947960d Waiting 1.2s
⠸ a2c3dc85e8c3 Waiting 1.2s
⠸ 17df73636f01 Waiting 1.2s
⠸ 124bb42a3852 Waiting 1.2s
⠸ dfb19482a052 Waiting 1.2s
⠸ bbb12a596105 Waiting 1.2s
⠸ aa8960c4e383 Waiting 1.2s
⠸ fdbdb6eba8dc Waiting 1.2s
⠿ airflow-init Error 10.1s
Error response from daemon: pull access denied for extending_airflow, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Any ideas please?
If you are running with source code from GitHub repo, make sure check this commit. github.com/coder2j/airflow-docker/commit/576fb2f78549c62d554e1675af0045956f7f0d69
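That error usually means the compose file references a locally-built image (extending_airflow) that hasn't been built yet, so docker tries to pull it from Docker Hub and fails. A sketch of the fix, run from the directory containing the Dockerfile and assuming the tag is latest as in the repo:
docker build . --tag extending_airflow:latest    # build the image the compose file expects
docker-compose up airflow-init
docker-compose up -d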
Hey, I used your exact steps but my scheduler and webserver containers keep restarting, so I cannot visualize anything!! Please help.
What about your postgres container? Does it also keep restarting? Check and compare your docker-compose.yaml file with this github.com/coder2j/airflow-docker/commit/576fb2f78549c62d554e1675af0045956f7f0d69
How do I install a new python module in airflow installed via docker?
You can check out this video ua-cam.com/video/0UepvC9X4HY/v-deo.html
@@coder2j I get the issue ModuleNotFoundError: No module named 'pymysql',
even though I have added pymysql to the requirements.txt file:
PyMySQL==1.0.2
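A likely cause, though only a guess from the info given: the image wasn't rebuilt, or the containers weren't recreated, after editing requirements.txt. A sketch, assuming the extended image is tagged extending_airflow:latest and <webserver-container> is your container's name:
docker build . --tag extending_airflow:latest    # rebuild with the new requirements.txt
docker-compose up -d --force-recreate airflow-webserver airflow-scheduler
docker exec -it <webserver-container> python -c "import pymysql"    # verify inside the container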
I am trying to add a new dag to the dags folder, but I am getting an "Import airflow could not be resolved" error in my vscode. What's the best way to fix this? Thanks in advance.
If you are running airflow in docker, your airflow package dependency is installed in the docker container, which is not visible to VSCode. Therefore, you can either ignore the error, or create a python environment in VSCode, install airflow, and tell VSCode the path of your Python environment. That should resolve the issue!
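If you go the environment route, a minimal sketch (the version pin is just an example to match the tutorial; for editor linting it doesn't need constraints):
python -m venv .venv
source .venv/bin/activate          # .venv\Scripts\activate on Windows
pip install "apache-airflow==2.0.1"
# then in VSCode: Python: Select Interpreter -> ./.venv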
How can I add some python packages, e.g. pyspark, s3, and so on?
Check out this video: ua-cam.com/video/0UepvC9X4HY/v-deo.html
Good to start with 2.0! I have a question: how do I add python libraries into the image, like we usually do with RUN pip install?
The easiest way to do that is to extend the official apache airflow docker image. So basically you create a Dockerfile as follows:
# start from the official image
FROM apache/airflow:2.0.1
# copy your extra dependencies into the image
COPY requirements.txt /requirements.txt
# upgrade pip, then install the dependencies
RUN pip install --user --upgrade pip
RUN pip install --no-cache-dir --user -r /requirements.txt
You will have to create a requirements.txt file in the same directory as the Dockerfile; it will be copied into the image and installed.
Then you use the docker build command to build the extended image:
docker build . --tag my_airflow:latest
After that, you need to replace the airflow docker image name in the docker-compose.yaml file, from the official image to your extended image my_airflow:latest. That's it; the remaining steps are the same: you run docker-compose up airflow-init and then docker-compose up to launch the airflow webserver and scheduler.
@@coder2j Yes, I figured that out the same day, just after posting my comment :-) . We have airflow 1.x setups in our project with everything in "requirements.txt", which is executed by "entrypoint.sh" during container initialization (refer to the 1.x git and entrypoint.sh), and we were struggling to add things that way in the 2.x POC environment. Later we found all these details in the 2.x git (refer to the Dockerfile of 2.x)... but thanks for replying. Looking forward to seeing more videos on task chaining, dag chaining, and dynamic task creation on the fly to leverage multi-processing in parallel. I am reading about those in the 2.x documents, but it would be good to have them in videos. Thanks again.
You are welcome! I am glad to hear that you found a solution. :-)
@cookie you are welcome!
Is it necessary to install docker?
Theoretically you can run it locally to follow the tutorial, but it is recommended to install docker, as the following videos run airflow in docker.
I've been following your guidance, but when I try to run a dag manually, it is always running and never finishes... When I look at the .log file, it seems to be looping... Do you know why? Thanks for the reply.
It's hard to tell what exactly went wrong from the info provided. Try checking your dag implementation; it might have some loop logic that never stops.
@@coder2j I'm running the example_bash_operator dag.
I'm sorry, my bad: I didn't turn the dag on, and just found out it won't run even if you trigger it manually... sorry, beginner error.
Installing airflow on docker gives a message to upgrade the airflow db. But when I try airflow db upgrade I get the error: airflow command not found. Please help.
Can you share your docker compose yaml file?
@@coder2j I was able to start the airflow webserver; I had to fix the permissions on my path. But now I ran into different errors: "latest-test-repo-airflow-webserver-1 | error: option --workers not recognized" and "latest-test-repo-airflow-scheduler-1 | error: invalid command 'scheduler'". Also, can you please let me know how and where I can share the docker-compose yaml file with you?
When I run the 'airflow webserver -p 8080' command, I get an import error for pwd:
ModuleNotFoundError: No module named 'pwd'
I need some help!!! Thanks.
Why do you need this command if you are running airflow in docker?
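For context, pwd is a Unix-only Python module, so the airflow CLI fails like this when run directly on Windows. If airflow is already up in docker, run CLI commands inside a container instead, e.g. (service name as in the official compose file):
docker-compose exec airflow-webserver airflow version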
Unable to install docker.
The username is airflow, but what is the password?
The password is also airflow in the demo I showed.
I forgot to input -d.
Without -d, the container will run in the foreground.
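For reference:
docker-compose up -d    # -d (detached) gives your terminal back; without it, Ctrl+C stops the containers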