Airflow for Beginners - Run Spotify ETL Job in 15 minutes!

  • Published 11 Nov 2020
  • In this long-awaited Airflow for Beginners video I'm showing you how to install Airflow from scratch, and how to schedule your first ETL job in Airflow!
    We will run the Spotify ETL job that we created in my Data Engineering Course for Beginners. But if you have your own code to schedule in Airflow - feel free to use it instead!
    In this Airflow tutorial I just wanted to get you started with Airflow, because I know that configuring Airflow can be non-trivial. I won't be going in-depth into all of Airflow's features, so hopefully this tutorial will be as beginner-friendly as it can get!
    I'll show you how to fix some installation errors, how to set up your project folders, where the Airflow config file sits and how to edit it, and how to write your first DAG file!
    Get ready for a ride!
    Oh, but grab some useful links first:
    -github code repo: github.com/karolina-sowinska/...
    -Airflow quick start guide: airflow.apache.org/docs/stabl...
    -Spotify access token: developer.spotify.com/console...
    Don't forget to subscribe if you're new here, more content is coming!
    :)
    ---------
    If you'd like to learn data engineering, I recommend following the 4 simple steps below to land your first job interview:
    1. Learn Python
    I recommend following the Python for Everybody specialization course on Coursera, which is one of the most popular courses there:
    imp.i384100.net/x9gVO3
    2. Learn SQL
    SQL is still the lingua franca of data. I recommend going with the Learn SQL Basics for Data Science course, because it contains some chapters which are very relevant to data engineering in particular, e.g. distributed computing with Spark
    imp.i384100.net/QOMZ09
    3. Learn Bash scripting/Linux
    I wouldn't take a full course on it, but at least read a good article.
    If you do prefer to take a course/guided project, I think this one is short and good:
    www.coursera.org/projects/com...
    4. Learn how to develop on the cloud, e.g. on AWS
    There are a few good courses out there, but I think the Coursera one is the most comprehensive
    imp.i384100.net/P0MJBM
    Disclosure: The above contains affiliate links, meaning that when you click the links and make a purchase, I receive a commission!
    Find me on Instagram:
    @karo_sowinska
    You can also warm my heart with a cup of coffee as a thanks!... :)
    ko-fi.com/karolina_sowinska
  • Science & Technology

COMMENTS • 286

  • @ElPapelMan
    @ElPapelMan 2 years ago +4

    One of the best ETL series I've ever watched on YouTube... thank you.

  • @tejagoud4871
    @tejagoud4871 3 years ago +40

    Watching this is really worth the time. Not like other YouTube channels where they run promotions for a minute or two. Above all, it is a really good video on getting started with Airflow. Great work Karolina. You are an amazing instructor.

    • @karolinasowinska
      @karolinasowinska 3 years ago

      I really appreciate this comment, thanks so much Teja!

    • @ricardoarbois2839
      @ricardoarbois2839 3 years ago

      @karolinasowinska Nice video. I really love your installation. Hope you don't mind if I post my YouTube video about installing Airflow on Heroku here... thanks, and more Airflow videos please... ua-cam.com/video/Xa4Rw-cbKIQ/v-deo.html

  • @chrish.4734
    @chrish.4734 3 years ago +3

    Great video, thanks a lot Karolina! I really like your clear way of explaining, which is straight to the point, and your great energy!

  • @Marc_B.4
    @Marc_B.4 3 years ago +5

    The video I was waiting for! I'm happy to see it, it's very well presented.
    It was really useful, I now have a good feeling about how Airflow works. Can't wait to see what's next on your channel :)

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      Aw I'm super glad to hear that it met your expectations! Thanks! :)

  • @zma314125
    @zma314125 2 years ago +2

    Thank you for explaining such a complicated topic in a simple way. This will definitely be a help as a foundation to data engineering. Keep up the great work!

  • @geetanshkumar1854
    @geetanshkumar1854 3 years ago +4

    Amazing work ma'am. I am new to all this and this tutorial was so simple and clear. Your way of explaining is also unique because you talk about errors as well which very few people do.

  • @brosnandegenaar4273
    @brosnandegenaar4273 3 years ago +8

    So helpful! Thank you so much for this mini series, I've learnt a lot.

  • @datexland
    @datexland 3 years ago +2

    You're really good at explaining the basic concepts. Thanks so much for sharing, you're definitely going to win a lot of subscribers

  • @camilastenico2299
    @camilastenico2299 3 years ago +3

    Thank you! I loved the videos. You explained core concepts in a clear and simple way, well done :)

  • @josecarlossilva3670
    @josecarlossilva3670 2 years ago +1

    Awesome content!! I've been struggling with that stuff for a few months. Thanks for sharing

  • @MrDavisv
    @MrDavisv 3 years ago +1

    Best explanation I’ve seen about DAGs and a super helpful intro to Airflow. Makes complete sense. Thank you!

  • @aureliusnt
    @aureliusnt 2 years ago

    Amazing work, Karolina. You teach very well. Thank you so much.

  • @Paperwood360
    @Paperwood360 1 year ago

    What a great tutorial, the best I've seen so far for Airflow! Thank you very much

  • @BestevertechBlogspot
    @BestevertechBlogspot 3 years ago +3

    Thank you for posting such relevant content. These are really worth it!

  • @yuriershov6530
    @yuriershov6530 3 years ago +4

    Great content! Just what I needed before starting my data engineering courses

  • @ChernobylPizza
    @ChernobylPizza 2 years ago +3

    I like your tutorials because they are simple. I kinda got stuck biting off more than I can chew and I was going in circles for a while. Data careers are about so many things it's easy to get lost (python, hadoop/spark, airflow, ML, math/stats, visualization, cloud).... I just needed something simple I can do easily to get going.

  • @josesebastiancolaneri7125
    @josesebastiancolaneri7125 3 years ago +1

    Excellent video! Thank you very much Karolina!

  • @Nedwin
    @Nedwin 3 years ago +1

    Love the extract, transform, load part! ❤️

  • @itsrainingcatsanddogs
    @itsrainingcatsanddogs 3 years ago +7

    I'll be needing more of these Airflow tutorials

  • @pranoygowda4595
    @pranoygowda4595 3 years ago +1

    I could grasp the concepts and get working with Airflow just by watching the video. A very helpful video to get started with. Amazing!

  • @HenriqueP
    @HenriqueP 3 years ago +1

    Karolina, I was enlightened by your explanation/methodology, helped me a lot to get started with Apache Airflow, mad props for this! Keep up with the work!
    Cheers from Brazil

    • @karolinasowinska
      @karolinasowinska 3 years ago

      I'm so glad that my effort didn't go to waste! Thanks for your comment! :)

  • @Shagysami
    @Shagysami 3 years ago +1

    Thank you for the neat video !
    I'm new to data engineering and may be nailing my upcoming job interview thanks to you

  • @dnrocha1
    @dnrocha1 3 years ago +1

    Great content! Thanks, it was very helpful!

  • @akshayshenoy6827
    @akshayshenoy6827 3 years ago +1

    Thank you so much Karolina! I learnt a lot today by watching your video. :)

  • @nataindata
    @nataindata 3 years ago +4

    Karolina, thank you a lot for your efforts and for making these videos! You've sparked genuine interest in me to try the project out. Plus, it's really great to know that the Data Engineering community is empowered by women. I'm only starting my way in DE, so it's great to follow you and learn.
    Love ❤️

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      Thanks so much for this lovely comment! Good luck on your DE journey and I hope I'll see you around here! :)

  • @McMurchie
    @McMurchie 3 years ago +10

    Great video Karolina, for those struggling with pip install - I suggest doing a quick learn of conda so you can create a quick conda environment to install airflow without messing up your primary python/pip libs and versions. I agree though, Airflow is so tricky to set up.

  • @alexanderbenavides1887
    @alexanderbenavides1887 1 year ago

    Wow, an amazing video tutorial. I love your videos :)

  • @DDAN48LIFE
    @DDAN48LIFE 2 years ago

    I love Karolina, you are the best

  • @AlexAcostaB
    @AlexAcostaB 3 years ago +2

    This is such a great introduction to Airflow. I already designed one pipeline and I am ready to implement it. Thank you so much.

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      That's fantastic, how did it go? :)

    • @AlexAcostaB
      @AlexAcostaB 3 years ago

      @karolinasowinska It works well. I’m getting data from one customer’s FTP and it is failing with Python 🐍. I'm working on a solution, and then it will be ready for deployment

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      @AlexAcostaB Awesome stuff :)

  • @karinaahmedova5221
    @karinaahmedova5221 2 years ago

    superb explanation! thanks

  • @AndresHernandez-mz3xh
    @AndresHernandez-mz3xh 3 years ago +2

    Wow, thank you so much Karolina! It helped me a lot with my project!

  • @gcmmartinsleme
    @gcmmartinsleme 1 year ago

    Thanks. It was very helpful for me!

  • @McCallumClips
    @McCallumClips 3 years ago +1

    Your video was exactly what you said it would be. An introduction. VERY GOOD JOB! Thank you.

    • @karolinasowinska
      @karolinasowinska 3 years ago

      Yes indeed, it'd be very hard to discuss details in a 15-minute video! I'm glad you liked it! :)

  • @tylersnard
    @tylersnard 3 years ago +1

    You are a clear communicator. Thank you.

  • @imdadood5705
    @imdadood5705 2 years ago

    Something that I was looking for. I know Python, SQL, R and a good amount of machine learning. But I didn’t know what to do next. I just searched for Apache Airflow and I got this! Thank you!

  • @srh1034
    @srh1034 2 months ago

    Wow, simply amazing!

  • @hisky74
    @hisky74 3 years ago +1

    The nano tip is very useful!! Very good content! Thank you!

  • @AthanasiouApostolos
    @AthanasiouApostolos 3 years ago +1

    Love your tutorials. Well done :)

  • @desarrollojava
    @desarrollojava 3 years ago +1

    So much help here. You have wonderful skills for teaching.

  • @edragon1412
    @edragon1412 3 years ago +1

    This is really helpful for Airflow beginners like me. I appreciate your work a lot. Keep working on topics like this, girl ;)

  • @heikokraemer2735
    @heikokraemer2735 3 years ago +1

    Thank you Karolina, very useful, totally no waste of time.

  • @troymann5115
    @troymann5115 2 years ago +3

    Nice video! One thing to think about concerning running Docker containers from the DAG: Airflow 1.x apparently has an issue which leaves containers in a non-started state. (At least it was a problem in our environment.) Airflow 2.0 seems to have resolved it. Thank you for making this video.

  • @nskeip
    @nskeip 2 years ago +7

    Nice video. And about directed acyclic graphs - actually, you could draw an arrow from 3 to 2 in the graph you showed as an example ^_^ (because there was no way to go back from 2 to 3, so it would not make a cycle)
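
    The point above can be checked with a small sketch. The graph from the video is not reproduced on this page, so the edge list below is a made-up four-node example (an assumption), chosen so that there is no path from 2 to 3. The helper name is_acyclic is also hypothetical; it relies on the standard-library graphlib module (Python 3.9+), which raises CycleError when a cycle exists.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed graph given as (source, target) pairs has no cycle."""
    # TopologicalSorter expects a mapping of node -> set of predecessor nodes.
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    try:
        # prepare() raises CycleError if the graph contains a cycle.
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4 (no path from 2 to 3).
base_edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

# Adding an arrow from 3 to 2 keeps the graph acyclic...
with_3_to_2 = base_edges + [(3, 2)]

# ...but adding 2 -> 3 on top of that creates the cycle 2 -> 3 -> 2.
with_cycle = with_3_to_2 + [(2, 3)]

print(is_acyclic(base_edges), is_acyclic(with_3_to_2), is_acyclic(with_cycle))
```

    In other words, what makes a graph a DAG is the absence of any path that leads back to its starting node, not the direction in which the arrows happen to be drawn.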

  • @buithanhlam3726
    @buithanhlam3726 2 years ago +1

    I found it very hard in the documentation, the book... then I found your video. Thanks a lot!

  • @xtopy9145
    @xtopy9145 2 years ago +1

    You are the best, Thank you!

  • @Tien-Tjie
    @Tien-Tjie 3 years ago +1

    oh my god I was waiting for this

    • @karolinasowinska
      @karolinasowinska 3 years ago

      Aw, I'm super glad to hear that! ;) I hope you enjoyed it!

  • @rodrigoamoedo8523
    @rodrigoamoedo8523 3 years ago +1

    your content is getting better every time

  • @sirosala
    @sirosala 1 year ago +1

    Excellent Karo !!!! 💪💪💪

  • @migueldias1292
    @migueldias1292 3 years ago +1

    very cool video! well done!

  • @mehdiyahiacherif2326
    @mehdiyahiacherif2326 3 years ago +2

    Well, I had a BI (business intelligence) project this year and had no idea what these ETL and reporting tools were. I searched for about 10 days and tested a lot of software; some of it was useful and some of it was... meh.
    I actually liked Airflow and DBeaver, and both are in this video, what a surprise.
    For people who want to test some BI tools, here are some free and open-source ones:
    ETL: Airflow, KNIME, Pentaho DI
    Reporting: Superset, plus some cool dashboards in Pentaho Server
    DB RAT and GUI tools: DBeaver and also DbSchema
    Data mining: Tanagra and Weka
    Also take a look at Apache Kylin (I couldn't figure out how to set it up with Postgres as a data source, so...)
    Good luck guys, and great video, lady

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      Awesome, thanks so much for the tips! :)

    • @mehdiyahiacherif2326
      @mehdiyahiacherif2326 3 years ago +1

      @karolinasowinska Thanks for your reply. I would be very happy if you made a video on how to connect Apache Kylin to new data sources (it is painful lol, I searched and it is not well documented)
      A series about BI tools and data manipulation could be a great idea since not a lot of people do it on YouTube
      Good luck

  • @avinandanbanerjee9568
    @avinandanbanerjee9568 3 years ago +7

    Important note - You won't find the airflow directory until you run something on the CLI using airflow
    Just type in airflow once and hit enter to find the config file

    • @kelvin5685
      @kelvin5685 1 year ago

      Thank you! Also, you need to run "source airflow-venv/bin/activate" before running the "airflow" command. That way you don't get an error saying the "airflow" command is not found
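
      For readers hunting for the config file mentioned in this thread, here is a minimal sketch using only the Python standard library. It assumes the documented Airflow default of keeping airflow.cfg in the directory named by the AIRFLOW_HOME environment variable, falling back to ~/airflow, and that the DAG folder is stored under the dags_folder key of the [core] section. The function names are hypothetical, and the sketch does not use Airflow's own configuration API.

```python
import configparser
import os
from pathlib import Path


def airflow_cfg_path(env=None):
    """Return where airflow.cfg is expected: $AIRFLOW_HOME/airflow.cfg or ~/airflow/airflow.cfg."""
    env = os.environ if env is None else env
    home = env.get("AIRFLOW_HOME") or str(Path.home() / "airflow")
    return Path(home).expanduser() / "airflow.cfg"


def read_dags_folder(cfg_path):
    """Return the [core] dags_folder value, or None if the file or key is missing."""
    # interpolation=None so that '%' characters in the file are not treated as placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        # File does not exist yet, e.g. the airflow command has never been run.
        return None
    return parser.get("core", "dags_folder", fallback=None)


cfg = airflow_cfg_path()
print("Expected config file:", cfg)
print("dags_folder:", read_dags_folder(cfg))
```

      If the second line prints None, the file has probably not been generated yet, which matches the note above about running the airflow command once first.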

  • @OPopoola
    @OPopoola 3 years ago +1

    Thanks for this introduction. I have been wondering what the big deal is with Airflow. Now I see the potential.

  • @samuelabolo
    @samuelabolo 2 years ago +2

    "back to downgrading our future" got me cracked up

  • @kadourkadouri3505
    @kadourkadouri3505 2 years ago

    Very concise nice tutorial ! Thanks

  • @ahmedtremo
    @ahmedtremo 3 years ago +1

    I love you, thanks for the video

  • @andizvlogs
    @andizvlogs 3 years ago +1

    I love your channel, thank you so much, I am learning a lot.

  • @marvhan888
    @marvhan888 5 months ago +1

    what a well done mentor's job you are doing.

  • @remi954
    @remi954 2 years ago +1

    Thank you very much!

  • @blakegirardot5813
    @blakegirardot5813 2 years ago

    Thank you very much, this style of quick start works really well for me, thank you for making it and sharing it!

    • @karolinasowinska
      @karolinasowinska 2 years ago

      Glad it helped!

    • @blakegirardot5813
      @blakegirardot5813 2 years ago

      Also, after installation, I had to run `airflow db init` before the scheduler would run; then the scheduler started right up.

  • @arjunjadhav3062
    @arjunjadhav3062 3 years ago +1

    Love you and thanks.

  • @marianosin93
    @marianosin93 4 months ago +1

    Great job!!!

  • @Paulo-oy1gs
    @Paulo-oy1gs 3 years ago +1

    great stuff! thanks

  • @marcosmel1087
    @marcosmel1087 2 years ago

    YOU ARE AMAZING!

  • @manoharannadar9499
    @manoharannadar9499 2 years ago +1

    Best.... keep posting 🙏🙏

  • @jparrax1
    @jparrax1 3 years ago +1

    AMAZING VIDEOOOOOOO! Thank you for your time!

  • @xmagcx1
    @xmagcx1 2 years ago +1

    I Love it!

  • @ankushojha5089
    @ankushojha5089 3 years ago +1

    Thanks for each and every video 👍

  • @ashishk81
    @ashishk81 3 years ago +1

    I am a data scientist by profession and wanted to learn data engineering in detail. I couldn't find a single free online resource to learn all the data engineering skills... You are doing a great job... waiting for your videos

    • @karolinasowinska
      @karolinasowinska 3 years ago

      I'm super glad my videos are useful! :)

    • @ashishk81
      @ashishk81 3 years ago

      @karolinasowinska
      Can you please suggest free online resources to learn end-to-end data engineering?

  • @resap.9128
    @resap.9128 3 years ago +1

    Great one again, Thank you

  • @PenStab
    @PenStab 2 years ago +4

    Really impressive explanations and teaching approach. You were concise but covered so many small in-between points that I would have otherwise missed. I'm definitely subscribing and going to watch other videos!
    My only complaint would be the resolution of the capture of the VS Code window - can it be a 16:9 ratio? It was so small on my phone.

    • @karolinasowinska
      @karolinasowinska 2 years ago

      Thanks! I'll try to improve resolution going forward ;)

  • @foxohnfire
    @foxohnfire 2 years ago +1

    Thanks Karol :)

  • @quinnluong114
    @quinnluong114 2 years ago

    She's a real one, you can tell because she's showing all the problems she's running into

  • @SASUKEUCHIHA-yc6er
    @SASUKEUCHIHA-yc6er 3 years ago +32

    The moment I saw the video
    Thought she has over a million subscribers
    She deserves more subscribers and also more views....

  • @randolphralph8322
    @randolphralph8322 2 years ago +1

    This is a great tutorial. I am having difficulties setting this up in Windows 10 environment. I was able to setup the virtual environment, but the install process for Airflow differs.

  • @baotran4175
    @baotran4175 3 years ago +1

    Thanks very much. I come from Vietnam and I'm currently an intern Data Engineer. I hope you can do more topics on data engineering in the near future

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      Hello there! Nice to hear from a fellow techie. I will do for sure! ;)

  • @rawcodes
    @rawcodes 2 years ago

    Thank you 💕

  • @AbdurrahmanKocukcu
    @AbdurrahmanKocukcu 3 years ago +1

    Great video 🤗

  • @NewyJimmy
    @NewyJimmy 3 years ago +3

    More on airflow please! This was great!

    • @karolinasowinska
      @karolinasowinska 3 years ago

      I'll see what I can do! I'm glad you enjoyed it! ;)

  • @prafulmaka7710
    @prafulmaka7710 3 years ago +1

    Very nice!!

  • @IppoCrypto
    @IppoCrypto 3 years ago +1

    love your work

  • @TJ-hs1qm
    @TJ-hs1qm 2 years ago

    I wasn't looking for Airflow, nor did I know what Airflow even is, but I stayed for the whole video.... now I know 🤣

  • @robetatisgmail
    @robetatisgmail 10 months ago +2

    Thanks a million! Very clear video, I really appreciated the transparency regarding errors. One note: At least in my case, the file ~/airflow/airflow.cfg only appeared after running airflow webserver -p 8080 for the first time, not after installation. Thanks!

  • @jae11011
    @jae11011 3 years ago

    Great video for Airflow beginners!! I have tried to run Airflow so many times and always got stuck even before starting the webserver.
    This is the first time I've successfully run it!!
    For anyone who is also new to Airflow: I hit some small issues while following the video.
    Here is how I solved them, in case anyone encounters the same issues.
    To start Airflow:
    1. After installing Airflow, you need to run airflow once to create airflow.cfg in the home directory (if you haven't run it before).
    Simply typing "airflow" will do the job. I didn't run it first, so I couldn't find the cfg file anywhere.
    2. I also needed to run "airflow db init" to create the db for logs.
    3. Last, I needed to create a user before using the webserver, otherwise there would be no user for me to log in with.
    These steps are also in the Airflow documentation's quick start.
    To run the DAG as in the video:
    4. I switched the toggle to ON in the DAG view, otherwise the task would remain in the running state forever.
    5. To run extract.py or run_spotify.py, I needed to put extract.py in the dags folder first.
    I just put the file directly in the dags folder, but I saw others put the whole Python package (a subfolder with __init__.py) in the dags folder.
    The latter approach is no doubt better for bigger projects. But I'd still like to know: does everyone put packages directly in the dags folder in the real world?
    It still feels a little messy to me to keep DAG files together with the scripts themselves.
    A few questions I have, though: should I terminate and restart the Airflow scheduler every time I change my script, or will it pick up the change automatically?
    I am still having a token-expired issue when running the script in Airflow, even though I updated my token in the script and it ran fine on my local machine.
    But it is an awesome video to me! Thanks to Karolina!
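
    Steps 2 and 3 of the list above can be scripted. The sketch below is only an illustration for Airflow 2.x, where the "airflow db init" and "airflow users create" commands exist (newer releases may prefer other commands). The function names and the example user details are made up. The script only builds the command lists; nothing is executed until run_setup is called inside an environment where the airflow command is on the PATH.

```python
import subprocess


def build_setup_commands(username, firstname, lastname, email):
    """Return the first-time setup commands as argument lists (Airflow 2.x CLI assumed)."""
    return [
        # Step 2 in the comment: initialise the metadata database.
        ["airflow", "db", "init"],
        # Step 3 in the comment: create a user for logging in to the web UI.
        [
            "airflow", "users", "create",
            "--username", username,
            "--firstname", firstname,
            "--lastname", lastname,
            "--role", "Admin",
            "--email", email,
        ],
    ]


def run_setup(commands):
    """Run each command in order, stopping at the first one that fails."""
    for command in commands:
        subprocess.run(command, check=True)


# Placeholder user details, not taken from the video.
commands = build_setup_commands("admin", "Ada", "Example", "ada@example.com")
for command in commands:
    print(" ".join(command))
```

    Calling run_setup(commands) from the activated virtual environment would then perform both steps; no password is passed on the command line here, so the user-creation step is expected to ask for one interactively.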

  • @fajarabdulkarim7672
    @fajarabdulkarim7672 3 years ago

    Hi Karolina, thanks for the nice video. Easy to understand. By the way, do you know how to test the DAG? Something like a unit test, or the testing that QA does in software development

  • @avinashbasetty
    @avinashbasetty 3 years ago +1

    Thank you for the Spotify API demo. Luckily, I installed Airflow 2.0 today on my Ubuntu 18.04 with no errors.

    • @karolinasowinska
      @karolinasowinska 3 years ago +1

      Oh wow, looks like I was unlucky with my environment setup, good to hear that your installation went smoothly!

  • @MrBrykin
    @MrBrykin 2 years ago

    Great video! Very helpful. Do you plan to make new videos about data engineering and Airflow?

  • @mahammadnabizade9408
    @mahammadnabizade9408 1 year ago

    Thanks for the amazing tutorials, just curious: why did you create a separate virtual env for Airflow?

  • @ernestogomez6199
    @ernestogomez6199 3 years ago

    Hi!! I really like your vids, I've been learning a lot. During this one I encountered an issue with the dags_folder location. I've changed it a lot of times, but I get 'dev/null'. I've looked on Stack Overflow, but there is not enough info. Do you have any idea what I should do? I've tried everything with sudo and it's still the same

  • @TaylorNelson1
    @TaylorNelson1 3 years ago +2

    Ah the joys of finding new errors when you try to install things for a new production environment... this is such an accurate depiction of real engineering life.

  • @davidlr97
    @davidlr97 3 years ago +1

    Just watched your short data engineering course, a very helpful intro to the topic. Thought I should mention that the playlist is out of order, though.

  • @romanlukichev4971
    @romanlukichev4971 1 year ago

    This is the first time I've heard about Airflow. Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
    Airflow is written in Python, and workflows are created via Python scripts.

  • @sanchesrfl
    @sanchesrfl 3 years ago +1

    DAG = a directed collection of tasks without going back. Thanks!!!!!!!!!

  • @nachoggz
    @nachoggz 3 years ago

    Excellent video!
    Does anyone know how I can import a Postgres certificate for my DB connection? I'm trying, but Airflow cannot find the file. I'm running Airflow on Docker

  • @ilia_meysak
    @ilia_meysak 2 years ago

    Thank you!

  • @veronicalima5664
    @veronicalima5664 3 years ago +1

    You're so funny! I like your video soooooo much! Thanks!!! 😍😍😍😍❤️❤️

  • @GustavoLeig
    @GustavoLeig 2 years ago

    Thanks for this awesome tutorial Karolina, one question, the first time I run, the Airflow starts lots of jobs, and the final table gets 20 songs, is that correct? Why so many jobs to get 20 rows?

  • @Indianvloggerinfinland
    @Indianvloggerinfinland 3 years ago +1

    Hi Karolina, thank you for the video. I need some help as I'm kinda stuck at the point where you edit the airflow.cfg file. I use a MacBook. I couldn't find the file even after installing Airflow. I don't see the airflow folder at all, in spite of running the command "export AIRFLOW_HOME=/airflow". Need your help on this.

  • @prod.bythisjustin8449
    @prod.bythisjustin8449 3 years ago

    Can you talk about the new MacBook Pro with the M1 chip and what it means for developers and machine learning engineers?

  • @severlight
    @severlight 3 years ago

    Karolina, can you suggest your favorite YouTube accounts on the data engineering theme? Or any other sources?

  • @thepakcolapcar
    @thepakcolapcar 9 months ago

    Great video.
    Is there a way to pass configurations to the DAG so that they can also be accessed by different tasks within the DAG?
    I am aware of XCom, Variables, etc. But is there a way to pass a config file in JSON or YAML form to the DAG? And without using XCom or Variables from the Admin menu, is there any other way to set and get values across different tasks within a DAG?