Customer Churn Data Analytics|Data Pipeline using Apache Airflow, Glue, S3, Redshift, PowerBI|Part 2

  • Published Sep 15, 2024
  • This is part 2 of the customer churn Python ETL data engineering project using Apache Airflow and various AWS services.
    In this customer churn data engineering project, we learn how to build and automate a Python ETL pipeline that uses AWS Glue to load data from an AWS S3 bucket into an Amazon Redshift data warehouse, and then connects Power BI to the Redshift cluster to visualize the data and obtain insights. The Glue stage uses a Glue crawler to crawl the S3 bucket, infer schemas, and build a data catalog from them. We can then use Amazon Athena to write SQL queries on top of the data catalog to gain insight from the data. AWS Glue also loads the crawled data into the Redshift cluster. Apache Airflow orchestrates and automates this whole process.
    Apache Airflow is an open-source platform for orchestrating and scheduling workflows of tasks and data pipelines. This project is carried out entirely on the AWS cloud platform.
    In this video I show you how to install Apache Airflow from scratch and schedule your ETL pipeline.
    Remember, the best way to learn data engineering is by doing data engineering - get your hands dirty!
    Customer Churn Data Analytics|Data Pipeline using Apache Airflow, Glue, S3, Redshift, PowerBI|Part 1 • Customer Churn Data An...
    If you have any questions or comments, please leave them in the comment section below.
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** Commands used in this video ****************
    sudo apt update
    sudo apt install python3-pip
    sudo apt install python3.10-venv
    python3 -m venv customer_churn_youtube_venv
    source customer_churn_youtube_venv/bin/activate
    pip install apache-airflow
    pip install apache-airflow-providers-amazon
    airflow standalone
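    The commands above get Airflow standalone running; the DAG file itself is not shown in the description. Below is a stdlib-only Python sketch of the task ordering the video describes (crawl the S3 bucket with Glue, then load into Redshift). The function names, bucket, and table are hypothetical stand-ins; in the real DAG each step would be an operator from apache-airflow-providers-amazon (e.g. GlueCrawlerOperator, S3ToRedshiftOperator).

    ```python
    # Stdlib-only sketch of the pipeline's task order; all names are
    # hypothetical placeholders, not values from the video. In the real
    # DAG each function would be an Airflow operator from
    # apache-airflow-providers-amazon.

    def crawl_s3_bucket(bucket: str) -> str:
        """Stand-in for triggering the Glue crawler that infers the schema."""
        return f"glue_catalog_for_{bucket}"

    def load_to_redshift(catalog: str, table: str) -> str:
        """Stand-in for the Glue job / COPY that loads data into Redshift."""
        return f"loaded {catalog} into {table}"

    def run_pipeline() -> list[str]:
        # Ordering mirrors the video: crawl first, then load, so the
        # data catalog exists before the Redshift load references it.
        catalog = crawl_s3_bucket("customer-churn-bucket")
        return [catalog, load_to_redshift(catalog, "customer_churn")]

    print(run_pipeline())
    ```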
    CREATE TABLE IF NOT EXISTS customer_churn(
    CustomerID VARCHAR(255),
    City VARCHAR(255),
    Zip_Code INTEGER,
    Gender VARCHAR(255),
    Senior_Citizen VARCHAR(255),
    Partner VARCHAR(255),
    Dependents VARCHAR(255),
    Tenure_Months INTEGER,
    Phone_Service VARCHAR(255),
    Multiple_Lines VARCHAR(255),
    Internet_Service VARCHAR(255),
    Online_Security VARCHAR(255),
    Online_Backup VARCHAR(255),
    Device_Protection VARCHAR(255),
    Tech_Support VARCHAR(255),
    Streaming_TV VARCHAR(255),
    Streaming_Movies VARCHAR(255),
    Contract VARCHAR(255),
    Paperless_Billing VARCHAR(255),
    Payment_Method VARCHAR(255),
    Monthly_Charges FLOAT,
    Total_Charges FLOAT,
    Churn_Label VARCHAR(255),
    Churn_Value INTEGER,
    Churn_Score INTEGER,
    Churn_Reason TEXT
    );
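    Once the table above exists on the Redshift cluster, the crawled S3 data can be loaded into it. The sketch below builds a plausible Redshift COPY statement for that load; the bucket, key, and IAM role ARN are placeholders, not values from the video.

    ```python
    # Build a Redshift COPY statement for the customer_churn table.
    # Bucket, key, and IAM role are hypothetical placeholders.
    def copy_statement(bucket: str, key: str, iam_role: str) -> str:
        return (
            f"COPY customer_churn "
            f"FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' "
            f"CSV IGNOREHEADER 1;"
        )

    print(copy_statement(
        "my-churn-bucket",
        "customer_churn.csv",
        "arn:aws:iam::123456789012:role/redshift-s3-role",
    ))
    ```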
    **************** USEFUL LINKS ****************
    How to remotely SSH (connect) Visual Studio Code to AWS EC2: • How to remotely SSH (c...
    PostgreSQL Playlist: • Tutorial 1 - What is D...
    Github Repo: github.com/Yem...
    www.kaggle.com...
    Apache Airflow Playlist • How to build and autom...
    How to build and automate a python ETL pipeline with airflow on AWS EC2 | Data Engineering Project • How to build and autom...
    airflow.apache...
    airflow.apache...
    registry.astro...
    Download PowerBI www.microsoft....
    DISCLAIMER: This video and description contain affiliate links. This means when you buy through one of these links, we receive a small commission at no cost to you. This helps support us in continuing to make awesome and valuable content for you.

COMMENTS • 27

  • @vaibhavtyagi1588
    @vaibhavtyagi1588 6 months ago

    THANKS A LOT FOR YOUR VIDEOS, YOU DESERVE MILLIONS OF FOLLOWERS.

    • @tuplespectra
      @tuplespectra  6 months ago

      I appreciate your comment. Thx

  • @camila92500
    @camila92500 5 months ago +1

    Thank you for sharing!

  • @jalisarenee55
    @jalisarenee55 5 months ago +1

    Thank you for this tutorial!

  • @ataimebenson
    @ataimebenson 3 months ago +1

    Thanks a lot for this video, I learnt a lot from it

  • @maverick6111
    @maverick6111 4 months ago

    Awesome Tutorial. Helped me a lot. Thank you..!!

  • @kenneth1691
    @kenneth1691 11 months ago +2

    Thanks a lot for your videos

  • @NgoVanKhanh-n4c
    @NgoVanKhanh-n4c 11 months ago

    Thanks for your videos; they help me a lot

    • @tuplespectra
      @tuplespectra  11 months ago +1

      I'm glad our videos have been helpful to you. Your comment means a lot to me. Thank you!

  • @manojkumaar8221
    @manojkumaar8221 19 days ago

    Where can we add PySpark transformations?

  • @basavarajpn4801
    @basavarajpn4801 2 months ago

    Where can I find the Airflow scripts?

  • @tulas11
    @tulas11 7 months ago

    Best explanation so far.
    I need some more projects I could do with free AWS services; being a student, I can't afford paid services.

  • @olaoluwaolaniyi2241
    @olaoluwaolaniyi2241 11 months ago

    Thank you once again for your videos. I just got onto part 2, and I was wondering if using Amazon Redshift Serverless is a possible alternative to Clusters in this video?

    • @tuplespectra
      @tuplespectra  11 months ago

      Thanks. I'm glad you are taking the time to work through the project. Yes, I believe Redshift Serverless could be used, but note that it all depends on the requirements. I will explore using it in another project.

  • @sulthonmajid8111
    @sulthonmajid8111 11 months ago

    My airflow providers don't have Amazon Web Services installed, so Amazon Web Services isn't available in that type of connection. Why is that?

    • @tuplespectra
      @tuplespectra  11 months ago

      Did you remember to do a "pip install apache-airflow-providers-amazon"?

    • @sulthonmajid8111
      @sulthonmajid8111 11 months ago

      @@tuplespectra Yes, I did. But there is still no connection type for AWS.

    • @user-ft7fq6ts1f
      @user-ft7fq6ts1f 11 months ago

      @@sulthonmajid8111 Have you solved this issue?

    • @kobe1187
      @kobe1187 11 months ago

      @@sulthonmajid8111 I had the same problem; I only had to restart Airflow and then the providers were available.

    • @JathinSuryaTeja
      @JathinSuryaTeja 4 months ago

      Amazon provides the MWAA (Managed Workflows for Apache Airflow) service; you can work with that.