Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project | Part 1

  • Published 18 Sep 2024
  • This is part 1 of the Zillow data analytics end-to-end data engineering project.
    In this data engineering project, we will learn how to build and automate a Python ETL process that extracts real estate property data from the Zillow Rapid API and loads it into an Amazon S3 bucket. That upload triggers a series of Lambda functions, which transform the data, convert it to CSV format and load it into another S3 bucket. The pipeline is orchestrated with Apache Airflow, which uses an S3KeySensor to check that the transformed data has been uploaded to the S3 bucket before attempting to load it into Amazon Redshift.
    After the data is loaded into Amazon Redshift, we will connect Amazon QuickSight to the Redshift cluster to visualize the Zillow (Rapid API) data.
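As a rough illustration of the transform step described above, the sketch below converts a JSON payload into CSV text using only the standard library. It is not the code from the video: the `results` key, the column names and the sample records are assumptions made for the example, and the real Zillow response may be shaped differently.

```python
import csv
import io
import json

# Hypothetical column names; the real Zillow payload may use different fields.
COLUMNS = ["zpid", "city", "price", "bedrooms", "bathrooms"]


def records_to_csv(raw_json, columns=COLUMNS):
    """Convert a JSON payload with a top-level "results" list into CSV text."""
    payload = json.loads(raw_json)
    records = payload.get("results", [])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the selected columns; missing keys become empty cells.
        writer.writerow({name: record.get(name, "") for name in columns})
    return buffer.getvalue()


# Made-up sample data standing in for an API response stored in S3.
sample = json.dumps({"results": [
    {"zpid": 1, "city": "Houston", "price": 250000, "bedrooms": 3, "bathrooms": 2, "imgSrc": "x"},
    {"zpid": 2, "city": "Houston", "price": 310000, "bedrooms": 4},
]})
csv_text = records_to_csv(sample)
print(csv_text)
```

In a Lambda function, the same conversion would sit between reading the raw object from one bucket and writing the CSV to the other.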
    Apache Airflow is an open-source platform for orchestrating and scheduling workflows and data pipelines. This project will be carried out entirely on the AWS cloud platform.
    In this video I will show you how to install Apache Airflow from scratch and schedule your ETL pipeline. I will also show you how to use a sensor in your ETL pipeline. In addition, I will show you how to set up an AWS Lambda function from scratch, and how to set up Amazon Redshift and Amazon QuickSight.
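To make the idea of a sensor concrete, here is a minimal sketch of the polling pattern a sensor follows: check a condition, sleep, and give up after a timeout. This is plain Python, not Airflow's implementation. The parameter names `poke_interval` and `timeout` mirror the ones Airflow sensors accept, while `wait_for`, `csv_has_arrived` and the key name are hypothetical stand-ins for a real S3 check.

```python
import time


def wait_for(condition, poke_interval=5.0, timeout=60.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Stand-in for "has the transformed CSV been uploaded to the bucket yet?"
uploaded_keys = set()
checks = {"count": 0}


def csv_has_arrived():
    checks["count"] += 1
    if checks["count"] == 3:
        # Simulate the file showing up just before the third check.
        uploaded_keys.add("transformed/zillow.csv")
    return "transformed/zillow.csv" in uploaded_keys


arrived = wait_for(csv_has_arrived, poke_interval=0.01, timeout=1.0)
print(arrived, checks["count"])
```

The downstream load into Redshift only makes sense once the check succeeds, which is why the sensor sits between the transform and the load.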
    As this is a hands-on project, I highly encourage you to first watch the video in its entirety without typing along, so that you can better understand the concepts and the workflow. After that, either try to replicate the example without the video, consulting it only when you are stuck, or watch the video a second time in its entirety while typing along.
    Remember the best way to learn is by doing it yourself - Get your hands dirty!
    If you have any questions or comments, please leave them in the comment section below.
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
    **************** Commands used in this video ****************
    sudo apt update
    sudo apt install python3-pip
    sudo apt install python3.10-venv
    python3 -m venv endtoendyoutube_venv
    source endtoendyoutube_venv/bin/activate
    pip install --upgrade awscli
    sudo pip install apache-airflow
    airflow standalone
    pip install apache-airflow-providers-amazon
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** USEFUL LINKS ****************
    How to remotely SSH (connect) Visual Studio Code to AWS EC2: • How to remotely SSH (c...
    Extract current weather data from Open Weather Map API using python on AWS EC2: • Extract current weathe...
    How to send out email alert ON RETRY and ON FAILURE in Apache airflow | Airflow Tutorial • How to send out email ...
    Monitor workflow with slack alert upon DAG failure | Airflow Tutorial • Monitor workflow with ...
    How to build and automate a python ETL pipeline and slack alert with airflow | Airflow Tutorial • How to build and autom...
    PostgreSQL Playlist: • Tutorial 1 - What is D...
    Rapid API: rapidapi.com/hub
    AWS Lambda function - Create your first Lambda Function | Lambda Function Tutorial for beginners • AWS Lambda function - ...
    Github Repo: github.com/Yem...
    airflow.apache...
    airflow.apache...
    airflow.apache...
    airflow.apache...
    Part 2: • Zillow Data Analytics ...
    Part 3: • Zillow Data Analytics ...
    DISCLAIMER: This video and description contain affiliate links. This means that when you buy through one of these links, we receive a small commission at no cost to you. This helps support us in continuing to make awesome and valuable content for you.
    #dataengineering #airflow

COMMENTS • 84

  • @GeAsita
    @GeAsita 3 days ago

    Loves from Argentina mate!! Amazing tutorial! Not only you clearly show having worked with these tools (not like most "gurus'). Plus you also explain the theory at the start, this is just golden content and God bless you for making it!

  • @i_am_out_of_office_
    @i_am_out_of_office_ 7 months ago +6

    Awesome Tutorial
    Your dedication to teaching end-to-end data engineering pipelines is truly inspiring. Your guidance has not only deepened my understanding of complex concepts but also empowered me to navigate the intricacies of building robust data pipelines. Thank you for your unwavering support and commitment to fostering knowledge in this dynamic field. Love from India 🚩

    • @krishnakumarkumar5710
      @krishnakumarkumar5710 7 months ago

      Being out of the office you learn all??

    • @tuplespectra
      @tuplespectra 7 months ago

      Thank you so much for your comment. It really means a lot to me.

  • @seth_king_codes
    @seth_king_codes 10 months ago +3

    BEST channel on youtube for learning about data engineering...thank you man
    your content inspires me

    • @tuplespectra
      @tuplespectra 10 months ago

      Thanks so much for this comment. It really means a lot to me

  • @bodzio7843
    @bodzio7843 3 days ago

    You are great mate. Such amount of knowledge

  • @gyungyoonpark
    @gyungyoonpark 7 months ago +1

    thank you for the awesome tutorial!!! can't wait to start part 2.
    just one correction. in the "commands used", please add "sudo apt install awscli" as well.

  • @nicholasmageto6110
    @nicholasmageto6110 3 months ago

    The best ETL video I have ever come across. Thank you sir ❤‍🔥❤‍🔥❤‍🔥💯💯

  • @dudee420
    @dudee420 2 months ago +1

    Bro, your explanation is really amazing. Nobody explain at that level. if possible can you start some videos on GCP cloud data engineering projects also. Thank you for great learning

    • @avinash390
      @avinash390 2 months ago

      Hey bro .... Did you complete this project on AWS , how much was the total cost or it was within the free tier limit

  • @tuananhdo6006
    @tuananhdo6006 3 months ago

    This is just what I have been searching for, thank you good sir, please kindly post more videos, you are awesome

  • @R_SinghRajput
    @R_SinghRajput 4 months ago

    Since I’m a mech engineer coding is almost like mandarin to me but u sir the Great explanation 🙏🏻🔥🫡🫡 really loved it n totally understood ❤❤

  • @nameisnani5573
    @nameisnani5573 9 months ago

    Awesome Brother, This is the Best channel i have ever seen in youtube to learn something real. Great work, Nobody can explain like you did, Thank you soo much, Lots of love for you. Keep doing this. Thanks a lott again.

    • @tuplespectra
      @tuplespectra 9 months ago

      Thanks so much for your comment. I really appreciate it, and it means a lot to me and motivates me to do more.

  • @srinivasrepala1
    @srinivasrepala1 10 days ago

    ❤ good content

  • @pareekshitgaddam9912
    @pareekshitgaddam9912 2 months ago

    Amazing content! Thank you brother. Please do upload more such videos!!

  • @zuesbenz
    @zuesbenz 5 months ago

    another good video from you. keep it going, keep it simple to the point and let it flow together end to end. just as you have been doing.

    • @tuplespectra
      @tuplespectra 5 months ago

      Thanks so much for your comment.

  • @nayanroy13
    @nayanroy13 1 year ago +1

    Your content is very useful!

    • @tuplespectra
      @tuplespectra 1 year ago

      Thanks so much. Your comment means a lot to us and I'm glad that you find our contents useful and valuable.

  • @tolu_datacation
    @tolu_datacation 1 year ago +2

    Very explanatory!

  • @rajkumardubey5486
    @rajkumardubey5486 1 month ago

    We can also use the .env file for encryption of api key and use envloader
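A small sketch of the idea in this comment: read the API key from an environment variable so it never appears in the DAG source. Note that a `.env` file keeps the key out of the code and out of version control but does not encrypt it. The variable name `RAPIDAPI_KEY`, the helper `build_rapidapi_headers` and the host value are assumptions for illustration; `X-RapidAPI-Key` and `X-RapidAPI-Host` are the header names RapidAPI expects.

```python
import os


def build_rapidapi_headers(host, env_var="RAPIDAPI_KEY"):
    """Build request headers with the key read from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": host}


# Demo only: a real key would be exported in the shell or loaded from a
# .env file, never written into the code like this placeholder.
os.environ.setdefault("RAPIDAPI_KEY", "dummy-key-for-demo")
headers = build_rapidapi_headers("example.p.rapidapi.com")  # hypothetical host
print(sorted(headers))
```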

  • @kandoras.guzman6705
    @kandoras.guzman6705 10 months ago

    This was awesome! Thank you for this resource.

  • @Friendsforever-rg2bq
    @Friendsforever-rg2bq 1 month ago +1

    Amazing man..!

  • @assieneolivier5560
    @assieneolivier5560 8 months ago

    Great and explicative video guys!! Amazing!!!

    • @tuplespectra
      @tuplespectra 8 months ago

      Thanks so much! I'm glad you like it.

  • @shivanshhedaoo7268
    @shivanshhedaoo7268 10 months ago +1

    Hi after airflow standalone i am getting error:
    ModuleNotFoundError: No module named 'connexion.decorators.validation'
    How do I fix this?

  • @sibisuriyanarayantiruchira2302

    Very helpful! Thank you so much :)

    • @tuplespectra
      @tuplespectra 1 year ago

      Thank you so much. I'm glad you find it helpful.

  • @joshuaroberts3987
    @joshuaroberts3987 10 months ago +1

    My ip address refuses to connect after i established port 8080. It showed airflow login and i put in credentials then show a refused to connect screen

  • @akj3344
    @akj3344 1 year ago

    At 19 seconds, already liked and subscribed.

    • @tuplespectra
      @tuplespectra 1 year ago

      Awesome. Thanks so much. And thanks for finding our video valuable.

  • @omkarmore2198
    @omkarmore2198 1 year ago

    Excellent ...

  • @shumengshi5925
    @shumengshi5925 5 months ago

    Thank you for the wonderful tutorial! It's been incredibly helpful, and I've already subscribed to your YouTube channel!
    I have a question about the necessity of using EC2 in this project. Would it be possible to achieve the same results by simply installing Apache Airflow locally within a Python virtual environment? I followed your steps closely, but when I run a DAG with tasks to extract Zillow data via the Rapid API, the DAG seems to get stuck in the running state indefinitely without completing, and it doesn't generate any logs.
    Interestingly, when I test the Rapid API locally in a plain Python file, it works perfectly fine. Additionally, when I create a DAG without making requests to the API, it also works without any issues. The problem only arises when the DAG task attempts to access Zillow data via the Rapid API.
    I'm curious if this is why EC2 is used in the project. Any insights you could provide would be greatly appreciated! Thanks again for putting out great Data Engineering content!!

  • @QuanNguyen-z2g
    @QuanNguyen-z2g 10 months ago

    i just wonder this data pipeline using Lambda for loading and transforming data instead of Glue spark jobs?

  • @himanshupatil6661
    @himanshupatil6661 7 months ago

    I am getting an error while executing apache standalone
    TypeError: SqlAlchemySessionInterface.__init__() missing 6 required positional arguments: 'sequence', 'schema', 'bind_key', 'use_signer', 'permanent', and 'sid_length'

  • @salmanshikalgar4482
    @salmanshikalgar4482 1 month ago

    Pip install --upgrade awscli command not running in virtual environment

  • @sophialawal7306
    @sophialawal7306 3 months ago

    which app did you use to create the data pipeline visualization?

  • @HarrisKeith-r5x
    @HarrisKeith-r5x 10 months ago

    Hey, let say I do this end to end mapping myself how much will it cost me to use their services? Can I do this in free tier plus additional cost I may incur using ec2 instance that is not free like you mentioned?

  • @maxubani9219
    @maxubani9219 8 months ago

    GOD BLESS YOU!❤

  • @AlDamara-x8j
    @AlDamara-x8j 1 year ago

    Thanks for this great tutorial! Questions: Is it possible to use Cloud 9 as our IDE and from there access our EC2, or viceversa?

    • @tuplespectra
      @tuplespectra 1 year ago

      I believe you should be able to use it. Although I have not used it for my airflow project before. You will have to provision a cloud9 IDE and use it but you will have to pay for it except if there is a free-tier that you can use.

  • @abduljaweed8131
    @abduljaweed8131 1 year ago +2

    Make one ETL project with Apache airflow without using any cloud

    • @akj3344
      @akj3344 1 year ago +2

      Why through? In your job, youll be expected to work with cloud.

    • @abduljaweed8131
      @abduljaweed8131 1 year ago +1

      @@akj3344 yes but cloud is expensive so to understand the technology doing on local machine I think its good then after know the tech doing experimentation with cloud

    • @Edbwalz
      @Edbwalz 1 year ago

      ​​@@abduljaweed8131Let me take you through an overview of a project that you can do without using cloud:
      First start by working with a CSV file. What you do is upload that file to an s3 bucket, and then load the data from the s3 bucket and basically transform it to parquet data type, and then write to another s3 bucket. After that you can use airflow to orchestrate the tasks.
      Now instead of using s3, you can use minio. It's an open source tool that works exactly like s3. Infact, the airflow operators for s3 can be used on minio as well.
      You can use pandas dataframe to do the transformation to parquet and write the file to the minio bucket. If you want to get a bit more fancy, you can use spark to do the same thing(it leverages the use of dataframes)
      After working with a file, then you easily change the data source to api endpoint.
      I can help if you want. Just ask if you need more clarification. I just gave an overview basically.

    • @cOnfidentialcOrp
      @cOnfidentialcOrp 10 months ago

      @@abduljaweed8131
      Main reason why cloud is used among big companies because its cheap vs building your own data center
      Also , aws and azure have free tier plans , enough for you to learn aswell

  • @navaneethur5466
    @navaneethur5466 6 months ago

    Hi sir,airflow option is not visibile in the vs code interface even after installing it in the ubuntu instance

    • @nikhitabiradar2146
      @nikhitabiradar2146 2 months ago

      Hi, I'm facing the same issue. Were you able to resolve it?

  • @darshan9340
    @darshan9340 10 months ago +1

    Hi,
    The project is really good, got to learn so much.
    I have an error while I am trying to transfer my file from ec2 to s3 bucket.
    File "/usr/local/lib/python3.10/dist-packages/airflow/operators/bash.py", line 210, in execute
    raise AirflowException(
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 127.
    I have checked my bash code, it is perfectly fine. My first dag, python operator is running and creating the file but when it comes to bash operator task, it is failing.

    • @pranalidarekar_5852
      @pranalidarekar_5852 10 months ago

      this happened with me too..what is the solution?

    • @darshan9340
      @darshan9340 10 months ago

      ​@@pranalidarekar_5852 I had a spelling mistake in my code, that's the reason why it was not running.

    • @pranalidarekar_5852
      @pranalidarekar_5852 10 months ago

      where exactly did you make the mistake
      @@darshan9340

    • @gyungyoonpark
      @gyungyoonpark 7 months ago

      @@darshan9340 I have the same error. can you tell me where you were wrong?
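For readers hitting the same error: in POSIX shells, exit code 127 means the shell could not find the command, which fits both a misspelled command and a tool (such as the AWS CLI) that is not installed or not on the PATH. The sketch below reproduces the exit code with a deliberately nonexistent command and shows one way to check for missing tools up front. The command name and the `missing_tools` helper are made up for the demonstration, and this does not claim to identify the exact typo in this thread.

```python
import shutil
import subprocess

# Run a deliberately nonexistent command through the shell, much as a
# BashOperator would run a bash_command string.
result = subprocess.run(
    ["sh", "-c", "no_such_command_for_demo_xyz"],
    capture_output=True,
    text=True,
)
print(result.returncode)


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Checking for "aws" the same way would show whether the CLI is visible
# to the process that runs the task.
print(missing_tools(["sh", "no_such_command_for_demo_xyz"]))
```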

  • @pranalidarekar_5852
    @pranalidarekar_5852 10 months ago

    Thanks for the tutorial, I am trying to connect VSC with the same EC2 instance we created in this project but it showing that permission is denied due to public key. I followed each steps from your other video which is 'How to remotely SSH (connect) Visual Studio Code to AWS EC2'. Please help me with this. I have tired everything but showing me same issue. Iam using Macbook. Thankyou for your time!

    • @tuplespectra
      @tuplespectra 10 months ago

      May be you need to grant permission to the .pem file such as writing "chmod 400 path/to/filename". Another issue might also be the syntax in your config file. You need to make sure you write it the way it should with lower case where it supposed to be etc.
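To show what `chmod 400` does to a key file, here is a hedged Python equivalent run against a temporary file rather than a real `.pem` key. SSH clients generally refuse private keys that other users can read, so the goal is a mode of owner-read only. The helper name `restrict_to_owner_read` is hypothetical, and the result assumes a POSIX system such as macOS or Linux.

```python
import os
import stat
import tempfile


def restrict_to_owner_read(path):
    """Set the file mode to 0o400 (owner read only) and return the new mode."""
    os.chmod(path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Use a throwaway file as a stand-in for the downloaded key pair.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    key_path = handle.name

mode = restrict_to_owner_read(key_path)
print(oct(mode))

# Clean up the temporary stand-in file.
os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(key_path)
```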

  • @inadaldaldaldal8231
    @inadaldaldaldal8231 16 days ago

    can you Azure platform

  • @amanpirjade9
    @amanpirjade9 1 year ago

    Make video on AWS data analytics services project

  • @lesa7p2lmansion
    @lesa7p2lmansion 11 months ago

    guys anybody can help with timestamps on the videos ? it will be really helpful
    I am doing the project and putting it on github and linkedin when I finish...Thanks

  • @Mehtre108
    @Mehtre108 7 months ago

    Domain name pls

  • @kanchandendge1517
    @kanchandendge1517 11 months ago +1

    Airflow Standalone command getting stuck. not creating user and password . @tuplespectra, could you please help

    • @tuplespectra
      @tuplespectra 11 months ago

      Can you kill the server(CTR + C) and then restart it?

    • @kartikeymishra2673
      @kartikeymishra2673 11 months ago

      hey were you able to fix this error?
      I also faced the same issue !

    • @kartikeymishra2673
      @kartikeymishra2673 11 months ago

      @@tuplespectra well this really helped , thanks :)

    • @tuplespectra
      @tuplespectra 10 months ago

      @@kartikeymishra2673 You are welcome.

    • @nikkim94nikhil
      @nikkim94nikhil 8 months ago

      @@tuplespectra Hey, i'm getting a typeerror and not getting stuck but not creating user and password either! Can you help please

  • @Nari_Nizar
    @Nari_Nizar 1 year ago

    Thank you so much for such and awesone tutorial. I wanted to run these codes and I am getting this error:
    WARNING - Error when trying to pre-import module 'airflow.providers.amazon.aws.sensors.s3' found in /home/ubuntu/airflow/dags/zillowanalytics.py: No module named 'airflow.providers.amazon'
    Please help!

    • @Nari_Nizar
      @Nari_Nizar 1 year ago

      @tuplespectra could you please help?

    • @tuplespectra
      @tuplespectra 1 year ago

      @@Nari_Nizar did you remember to do a "pip install apache-airflow-providers-amazon"?

    • @Nari_Nizar
      @Nari_Nizar 1 year ago

      @@tuplespectra it worked! Thank you very much, this is an excellent project!

    • @tuplespectra
      @tuplespectra 1 year ago

      @@Nari_Nizar Thanks. I'm glad it worked and you found the project valuable. Please help Like our videos and Share with your friends, team mates, colleagues so more people can benefit. Thanks so much.
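A quick way to confirm whether the provider package from this thread is visible to the Python interpreter that runs Airflow is to ask the import system directly. The sketch below is an assumption-level helper (`is_installed` is not part of Airflow) and should be run with the same interpreter or virtual environment that starts `airflow standalone`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "airflow") is itself missing.
        return False


# "json" ships with Python; the second name is the module from the warning.
for name in ("json", "airflow.providers.amazon"):
    print(name, is_installed(name))
```

If the second line reports False, installing `apache-airflow-providers-amazon` into that same environment is the fix suggested above.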