End to End Data Analytics Project (Python + SQL)

Поділитися
Вставка
  • Опубліковано 12 кві 2024
  • In this video we will do an end to end data analytics project using python and SQL. We will use Kaggle API to download the dataset and to data processing and cleaning using pandas and load the data into sql server. Lastly we will answer some interesting questions using SQL.
    github link:
    github.com/ankitbansal6/data_...
    Data Analytics high quality content: www.namastesql.com/
    Zero to hero(Advance) SQL Aggregation:
    • All About SQL Aggregat...
    Most Asked Join Based Interview Question:
    • Most Asked SQL JOIN ba...
    Solving 4 Trick SQL problems:
    • Solving 4 Tricky SQL P...
    Data Analyst Spotify Case Study:
    • Data Analyst Spotify C...
    Top 10 SQL interview Questions:
    • Top 10 SQL interview Q...
    Interview Question based on FULL OUTER JOIN:
    • SQL Interview Question...
    Playlist to master SQL :
    • Complex SQL Questions ...
    Rank, Dense_Rank and Row_Number:
    • RANK, DENSE_RANK, ROW_...
    #sql #dataengineer #datanalysis #dataanalytics

КОМЕНТАРІ • 135

  • @ankitbansal6
    @ankitbansal6  Місяць тому +12

    Please like the video as it takes lots of effort to record these videos.
    Checkout my high quality data analytics courses :
    www.namastesql.com/

    • @revathigangisetty9065
      @revathigangisetty9065 Місяць тому +1

      Hi Ankit, can you please kindly help the below requirement.
      create table #temp ( DepartmentId int, Name varchar(255),Hiredate date,Sal float,Ruleid int)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sai','2021-10-23',5500)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sairam','1999-10-23',6000)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Saikrishna','2002-10-23',3000)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sair','2021-10-23',5000)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Raj','2015-10-23',8000)
      Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'SRK','2021-10-23',5000)
      Create table #Rules (Departmentid int,Ruleid int, Condition Varchar(2000))
      Insert into #Rules values ( 10,1,'Name like ''Sai%''')
      Insert into #Rules values ( 10,2,'Hiredate >= ''2000-01-01''')
      Insert into #Rules values ( 10,3,'sal >= 5000')
      output
      Departmentid Name Hiredate sal ruleid
      10 Sai 10/23/2021 5500 2,10,30
      10 Sairam 10/23/1999 6000 2,30
      10 Saikrishna 10/23/2002 3000 2,10
      10 Sair 10/23/2021 5000 2,10,30
      10 Raj 10/23/2015 8000 10,30
      10 SRK 10/23/2021 5000 10,30

    • @revathigangisetty9065
      @revathigangisetty9065 Місяць тому +1

      DepartmentId is used to join #temp and #Rules.
      check each condition dynamically from #Rules against the corresponding rows in #temp.
      If a condition is satisfied, concatenate the Ruleid value from #Rules to the existing Ruleid in #temp

    • @ankitbansal6
      @ankitbansal6  Місяць тому +2

      ​@@revathigangisetty9065 send the problem on sql.namaste@gmail.com

    • @revathigangisetty9065
      @revathigangisetty9065 Місяць тому

      @@ankitbansal6 i shared the details in the email ankit. its urgent requirement ankit please kindly help me. i will be eagerly waiting for your input.

    • @revathigangisetty5734
      @revathigangisetty5734 Місяць тому +1

      i shared the problem to you via email and tried multiple times but i didnot get the exact solution. please kindly help ankit

  • @HarshGupta-rn6db
    @HarshGupta-rn6db Місяць тому +3

    You are the torchbearer for many Ankit, trust me!
    Cheers!

  • @piyushsharma8294
    @piyushsharma8294 Місяць тому +1

    Thank you bro for the best explanation i’ve ever seen about this topic.

  • @joydeepsarkar3092
    @joydeepsarkar3092 10 днів тому

    Very Helpful video Ankit! Thanks a lot for the efforts you have put for creating this guided project. Looking forward for many more guided projects :)

  • @satishwalaki
    @satishwalaki Місяць тому

    This was really helpful as a beginner me. thanks a lot & we need more n more videos like this.

  • @jeniyakulshreshtha7555
    @jeniyakulshreshtha7555 Місяць тому

    Today only i was seraching for data analytics ..and just saw your vedio ...thank you so much..pls post more such content on data analytics and any course on this as well please 😊

  • @ankushjain4128
    @ankushjain4128 Місяць тому +2

    Great video! Loved the easy explanation of the full ETL process and data analysis. Keep up the good work!

    • @HaroutBlack
      @HaroutBlack 8 днів тому

      Jesus is the only way to healing, restoration and salvation to all souls. Please turn to him and he will change your life, depression into delight, soul heading from hell to heaven all because of what he did on the cross
      “Whoever calls upon the name of the Lord shall be saved” Romans 10:13

  • @bodybuildingmotivation5438
    @bodybuildingmotivation5438 Місяць тому

    really love this kind of content, please make more video like this.

  • @nikhilreddy4582
    @nikhilreddy4582 Місяць тому +1

    Thankyou so much Ankit sir For this project👍

  • @macx8360
    @macx8360 Місяць тому +1

    thanks , i was waiting for this ....honestly

  • @gunaravi6224
    @gunaravi6224 3 дні тому

    Hello Ankit Sir, am a fresher and am looking forward to start my career in data analytics or data engineering field. This video is awesome and very helpful for me. Thank you so much for making the End ot End videos like this. And the way you explaining each and everything is very nice and clear. Keep it up Sir

  • @divyanshisharma4054
    @divyanshisharma4054 Місяць тому

    I thoroughly enjoyed this video and followed along with you. Thanks for this. Please keep posting more of such end to end analysis problems. Thanks a ton for taking the effort to make these videos so that we keep learning :)

  • @randyawuri1664
    @randyawuri1664 12 днів тому +1

    You sir, are a wonderful teacher! I am currently learning Data Engineering. This video has enabled me to catch up and understand some core concenpts that I found challenging because I missed live classes. Thank you!

  • @reddysekhar459
    @reddysekhar459 Місяць тому

    Thank you so much for the video! great video👍

  • @abolisabne4887
    @abolisabne4887 Місяць тому

    Thankyou so much for this Vedio Ankit.

  • @deviprasadgharpehai3315
    @deviprasadgharpehai3315 Місяць тому

    Thanks a lot bhai love nd support from berhampur❤

  • @ShekoniTaofeek
    @ShekoniTaofeek 2 дні тому

    Thank you very much. Never believed I could really joy it as much as I did. I really really appreciate your effort

  • @proud_indian0161
    @proud_indian0161 20 днів тому

    Very nice explanation. Thanks

  • @user-de7mr2uv2t
    @user-de7mr2uv2t Місяць тому

    Oh my God Amazing project

  • @jaybharat3949
    @jaybharat3949 6 днів тому

    First of all, thank you very much for the project. But there’re couple of concerns which you may have overlooked -
    1. In every query where we’re doing analysis based on sales, I guess it’s better to consider sale_price*quantity as sales rather than sale_price only
    2. In the last problem, profit should have been taken into consideration rather than sale_price
    But again I’m repeating that, above are some modification we can make to get the best business answers, otherwise in the video you’ve shown the approach how to handle an end to end project - from that pov it’s absolutely fine. We students can make necessary changes.

    • @ankitbansal6
      @ankitbansal6  6 днів тому +1

      Thanks for the feedback. Appreciate it 🙂

  • @vivekkumar6546
    @vivekkumar6546 Місяць тому

    Ankit bhai no words.. new things learned today..❤

  • @ravitejatavva7396
    @ravitejatavva7396 Місяць тому

    Wow, Appreciate your effort and work! 😍

  • @gurutech7398
    @gurutech7398 Місяць тому

    It's really great bro❤

  • @aniket9595
    @aniket9595 19 днів тому

    Thanks a lot Ankit for creating such an interesting project

  • @rakeshkoli4384
    @rakeshkoli4384 Місяць тому

    thanks bro and kindly make video on ETL process

  • @bira-the_billi
    @bira-the_billi 11 днів тому

    create table df_orders(
    [order_id] int primary key
    ,[order_date] date
    ,[ship_mode] varchar(20)
    ,[segment] varchar(20)
    ,[country] varchar(20)
    ,[city] varchar(20)
    ,[state] varchar(20)
    ,[postal_code] varchar(20)
    ,[region] varchar(20)
    ,[category] varchar(20)
    ,[sub_category] varchar(20)
    ,[product_id] varchar(50)
    ,[quantity] int
    ,[discount] decimal(7,2)
    ,[sale_price] decimal(7,2)
    ,[profit] decimal(7,2))

  • @akashjha7277
    @akashjha7277 Місяць тому

    Great video❤

  • @brishtisaha4785
    @brishtisaha4785 Місяць тому

    Sir please upload more SQL projects with datasets from kaggle.. Thank you for this..

  • @Azkaban819
    @Azkaban819 12 днів тому

    Finallyy, thanks!

  • @PravinMane-el5fo
    @PravinMane-el5fo 7 днів тому

    Nice one

  • @kuntalchowdhury5336
    @kuntalchowdhury5336 25 днів тому

    Excellent.

  • @Abhilash_Jose
    @Abhilash_Jose Місяць тому +7

    Hi Ankit, You have made this video at the correct time, I was looking for something like this. Could you also make a project on end to end problems while loading data in CSV to MySQL. I recently had a issue where I was not able to fully load all the data , only partially even though I did null treatment. Please make a video on it and also post it in your LinkedIn once you have done it so i may get notified about it

  • @tenzinchoepheldev
    @tenzinchoepheldev 9 днів тому

    Thanks for the effort 🎉😂

  • @vijaypratap8173
    @vijaypratap8173 Місяць тому

    Awesome😍😍😍😍

  • @WiseCoder-rp2zn
    @WiseCoder-rp2zn 11 днів тому

    Good Video, Keep it up

  • @beaver499
    @beaver499 7 днів тому +1

    Hi Sir! I am not able to connect the SQL Server at 22:10. I have used the same syntax because the driver name is same still getting errors. Please help

  • @anoopdube9581
    @anoopdube9581 Місяць тому +1

    Liked your video, I am from Australia.

  • @AKAK-nn5gy
    @AKAK-nn5gy Місяць тому

    You rock bro

  • @navaneeth6414
    @navaneeth6414 Місяць тому

    Hi Ankit, Wonderful Explanation. Can I have those DDL statement in SqL Server. Not able to find in Github

  • @mohitjoshi8984
    @mohitjoshi8984 Місяць тому +2

    Thank you ankit it's really helpfull.
    Can you make a practical video on A/B Testing plzz

  • @sowmya6471
    @sowmya6471 Місяць тому

    V useful

  • @ShivamGupta-wn9mo
    @ShivamGupta-wn9mo 20 днів тому

    brilliant

  • @amithvalentine1479
    @amithvalentine1479 8 днів тому

    hey Ankit, great content, thanks fro the video! How did you obtain the list the columns names with the datatypes and memory allocation to create the new empty table?

    • @ankitbansal6
      @ankitbansal6  8 днів тому

      You can right click on the table name in the browser and choose create to

  • @ActiveLifePicks
    @ActiveLifePicks 9 днів тому

    Hello Ankit, all these queries can be made while using pandas library, can you tell why your made sql server database and used sql for queries?

  • @rahulagrawal622
    @rahulagrawal622 Місяць тому

    Hi Ankit
    Can you please also show us
    How to add triggers
    To update the data every month on database
    Thanks

  • @srikrishnanr8155
    @srikrishnanr8155 Місяць тому

    To insert or load data frames into sql what is the maximum number of rows that can be inserted

  • @SunilKumar_67
    @SunilKumar_67 Місяць тому +1

    Can't add the ODBC Sql Server even after having the MySQL that was installed for working on your SQL Course. I am getting an error message saying TEST FAILED. Can you please share any link that can help us to get that server name present in the list of servers available to connect or better a follow up video showing how to do it on your system. Thanks in advance

  • @MayankGadiya-uq1el
    @MayankGadiya-uq1el 26 днів тому

    "for each category which month had highest sales"
    above question has different output from your youtube video and sql queries that u have provided in sql file.

  • @g-ij9km
    @g-ij9km Місяць тому

    yaaaaaay!!

  • @kantipudisaisowmith9628
    @kantipudisaisowmith9628 19 днів тому

    Can you explain how to connect sql server on mac with jupytor

  • @satyajeet7633
    @satyajeet7633 16 днів тому

    In the last question of sql
    Should we calculate growth by profit change or sale change because the question is for profit change
    Also in total revenue quantity of each order is not included

  • @ShashankVeerabomma
    @ShashankVeerabomma 28 днів тому

    with t1 as (select month(order_date) as month1,sum(sale_price) as sales1 from orders where year(order_date)=2022 group by 1),
    t2 as
    (select month(order_date) as month2,sum(sale_price) as sales2 from orders where year(order_date)=2023 group by 1)
    select month1 as months,sales1 as 22sales, sales2 as 23sales from t1 inner join t2 on t1.month1=t2.month2 order by 1 asc;
    another way for the year on year query

  • @saiteja-gb8ho
    @saiteja-gb8ho Місяць тому

    Thank you so much Ankit bro

  • @rajvardhan465
    @rajvardhan465 12 днів тому

    Your a goat 🐐 in data filed sir

  • @yadavikasAnalyst
    @yadavikasAnalyst 8 днів тому +1

    Sir when you solved the first question why didnt you multiplied the quantity column with the sale price?? For total revenue by product I'd?

    • @jaybharat3949
      @jaybharat3949 6 днів тому

      Yes, I think likewise. Revenue should have been qty*sale_price

  • @PANDURANG99
    @PANDURANG99 5 днів тому

    We can use power bi

  • @asrarahmad1691
    @asrarahmad1691 8 днів тому

    Hello Ankit when I am Importing date from python to SQL and running code in my Jupyter Notebook I am getting below error
    AttributeError: 'Connection' object has no attribute 'cursor'
    Please let me know

  • @user-jq7de2nf9z
    @user-jq7de2nf9z 3 дні тому

    Anaconda3\lib\site-packages\sqlalchemy\exc.py", line 258 except Exception, e:
    how to fix sqlalchemy error?

  • @randyawuri1664
    @randyawuri1664 12 днів тому

    Can this be done using VScode?

  • @chintanshah35
    @chintanshah35 Місяць тому

    Hi all, i am not able to see my odbc drivers details for sql.can anybody please help me out

  • @75hard_profitable
    @75hard_profitable 12 днів тому

    ❤❤❤

  • @sulaimans4720
    @sulaimans4720 12 днів тому

    where can i find dataset ankit ?

  • @professorx5823
    @professorx5823 11 днів тому

    Hi sir how can I add this to my resume and what are the exact project description should I write for this project..please help ..So that I can write the same thing by myself for other kind of projects

    • @ankitbansal6
      @ankitbansal6  10 днів тому

      Create a GitHub profile and put it there and use that link in your resume .

  • @milindzuge906
    @milindzuge906 Місяць тому

  • @vijaypradeep8315
    @vijaypradeep8315 Місяць тому

    bro can you show us an project where you have done the analysis in python why did you do that and store in s3 bucket and so on .Can you make an vedio on pyspark as well

  • @subodhthore6454
    @subodhthore6454 Місяць тому +1

    For first question, Don't we need to multiply sale_price with the quantity for revenue generated for each product?

    • @ankitbansal6
      @ankitbansal6  Місяць тому +1

      We can do that. I assumed it was total sales in the sale price.

    • @subodhthore6454
      @subodhthore6454 Місяць тому

      @@ankitbansal6 ok sir

    • @yousrasaadaoui1238
      @yousrasaadaoui1238 21 день тому

      same question. my answer is
      SELECT top 10 product_id, SUM(sale_price * quantity) AS total_revenue
      FROM df_order
      GROUP BY product_id
      ORDER BY total_revenue DESC;

  • @user-wi7mt5st2s
    @user-wi7mt5st2s 3 дні тому

    hi , kaggle.json file is not found in home directory . please advise on this

  • @vivekbhatt7357
    @vivekbhatt7357 Місяць тому

    Hello ankit i followed your video its great to learn . I have one scenario where i have two database one is postgrey and second mysql. what i need is there are 25 lakh product in mysql product table which has pricing and inventory which i want to update in postgrey database based on sku column which will be same. I used query and api its taking 4 to 5 hours to update can we do something with database to update pricing and inventory using procedure database to database. Please suggest

    • @ankitbansal6
      @ankitbansal6  Місяць тому

      Using python you can move data from MySQL to postgres and then run update on postgres

  • @gameply347
    @gameply347 11 днів тому

    Sir will we have a PowerBI course like tableau ?

  • @hritikapal683
    @hritikapal683 Місяць тому

    Can we expect something related to cloud as well?

    • @ankitbansal6
      @ankitbansal6  Місяць тому +1

      Here you go
      ua-cam.com/video/52CWagk3-jw/v-deo.html

  • @rohansharma3770
    @rohansharma3770 9 днів тому

    I dont have .kaggle folder in my directory... What to do?

  • @shivam_kumar1994
    @shivam_kumar1994 Місяць тому

    Not able to kaggle API understand

  • @devendrabarasker4987
    @devendrabarasker4987 Місяць тому

    Sir will not we multiply the quantity with sales price to get the overall sales price of particular product and then sum it????

  • @ushihaya
    @ushihaya Місяць тому

    How to find length of a feature in df to create identical column in database table

    • @ankitbansal6
      @ankitbansal6  Місяць тому

      df.dtypes

    • @ushihaya
      @ushihaya Місяць тому

      Ji how to know what is the maximum length of a feature with decimal postions so that there will not be any data loss while uploading data to table

  • @sarihaddu
    @sarihaddu Місяць тому +1

    I'm 40 and I am trying to switch career to Data Analytics from a completely different background. I learned PowerBI + SQL. Can I get entry into this field with these two modules? Can I learn Python later have a good career? That is my plan over the next 5 years.

    • @ankitbansal6
      @ankitbansal6  Місяць тому +1

      Yes that works. Make sure your SQL is strong .

    • @sarihaddu
      @sarihaddu Місяць тому

      @@ankitbansal6 Definitely. I was recommended your channel for SQL by a youtuber and actually made a lot of progress in the past two months from your channel. Thanks is very small word for the effort you put into your content. But Thanks anyways 👍😊

    • @akbarkhan-nq3xl
      @akbarkhan-nq3xl 14 днів тому

      Brother I'm 30 and trying to get into data analytics from the Mechanical domain.
      Many times I get frustrated and lose hope but your comment restored my faith in the process.
      I wish you all the best for your endeavours 👍.

  • @NickMaverick4
    @NickMaverick4 8 днів тому

    Hey Ankit I'm my PC isn't showing any .kaggle icon in my users - username- file how to get that.. I have installed kaggle pip install kaggle.

  • @macx8360
    @macx8360 Місяць тому

    i didnt understand why did we use kaggle Api. We could have directly downloaded the dateset from kaggle and read the file using pandas ,right?

    • @ankitbansal6
      @ankitbansal6  Місяць тому +6

      Yes but how we will learn to use API 😄

    • @prashantjaiswal5260
      @prashantjaiswal5260 Місяць тому

      @@ankitbansal6Can you make small video on setup of mysql server how to make server in it

    • @iqtech6065
      @iqtech6065 Місяць тому

      😂😅​@@ankitbansal6

  • @maduguyamini6805
    @maduguyamini6805 Місяць тому

    Hi sir
    I have a doubt
    Top 5 highest selling products should be in terms of quantity right?

  • @nikitanaik-xj5pm
    @nikitanaik-xj5pm Місяць тому

    hi i am not getting .kaggle folder in my pc why so

  • @user-lq9dx7uu8m
    @user-lq9dx7uu8m Місяць тому

    I am unable to extract zip file 😢

  • @VarunArora14
    @VarunArora14 11 днів тому

    For the query to find each category which month has highest sales, you can do this as well in mysql which looks easier -
    with cte as
    (select category, sum(sale_price) as sales, date_format(order_date, '%m%Y') as order_month from df_orders group by category,order_month order by sales desc)
    select category, order_month, sales from cte c1 where sales = (select max(sales) from cte c2 where c1.category = c2.category);

  • @saikatofficial420
    @saikatofficial420 Місяць тому

    sir in the last question I am getting a little bit confused it tells about but it is solved on total sales growth can you clarify for me, please
    with cte as(SELECT sub_category,year(order_date) as order_year,sum(profit) as total_profit
    FROM df_orders
    group by sub_category,year(order_date)
    ),
    --order by sub_category,year(order_date)
    cte1 as(
    select sub_category,
    sum(case when order_year=2022 then total_profit else 0 end) as profit_2022 ,
    sum(case when order_year=2023 then total_profit else 0 end) as profit_2023
    from cte
    group by sub_category
    --order by sub_category
    )
    select cte1.sub_category,((profit_2023-profit_2022)*100/profit_2022) as profit_growth
    from cte1
    order by profit_growth desc
    this is my solution. please rectify if I made any mistake.

  • @HaroutBlack
    @HaroutBlack 8 днів тому

    Jesus is the only way to healing, restoration and salvation to all souls. Please turn to him and he will change your life, depression into delight, soul heading from hell to heaven all because of what he did on the cross
    “Whoever calls upon the name of the Lord shall be saved” Romans 10:13

  • @SIDDHANTBHOWATE
    @SIDDHANTBHOWATE Місяць тому

    bhai tuze padhana nhi aata bilkul, sorry to say

  • @naveenvjdandhrudu5141
    @naveenvjdandhrudu5141 Місяць тому

    hi ankit can you please explain this below hierarchy query (or) make a video on it with your own style.........
    I was unable to crack this need your support thanks in advance......❤❤✌✌
    CREATE TABLE company
    (
    employee varchar(10) primary key,
    manager varchar(10)
    );
    INSERT INTO company values ('Elon', null);
    INSERT INTO company values ('Ira', 'Elon');
    INSERT INTO company values ('Bret', 'Elon');
    INSERT INTO company values ('Earl', 'Elon');
    INSERT INTO company values ('James', 'Ira');
    INSERT INTO company values ('Drew', 'Ira');
    INSERT INTO company values ('Mark', 'Bret');
    INSERT INTO company values ('Phil', 'Mark');
    INSERT INTO company values ('Jon', 'Mark');
    INSERT INTO company values ('Omid', 'Earl');
    SELECT * FROM company;
    /* Given graph shows the hierarchy of employees in a company.
    Write an SQL query to split the hierarchy and show the employees corresponding to their team.*/
    WITH RECURSIVE cte_teams AS (
    SELECT mng.employee,
    CONCAT('Team', ROW_NUMBER() OVER (ORDER BY mng.employee)) AS teams
    FROM company root
    JOIN company mng ON root.employee = mng.manager
    WHERE root.manager IS NULL
    ),
    cte AS (
    SELECT c.employee, c.manager, t.teams
    FROM company c CROSS JOIN cte_teams t
    WHERE c.manager IS NULL
    UNION
    SELECT c.employee, c.manager, COALESCE(t.teams, cte.teams) AS teams
    FROM company c JOIN cte ON cte.employee = c.manager
    LEFT JOIN cte_teams t ON t.employee = c.employee )
    SELECT teams, GROUP_CONCAT(employee, ', ') AS members
    FROM cte
    GROUP BY teams
    ORDER BY teams;

    • @ankitbansal6
      @ankitbansal6  Місяць тому +1

      Please send the question and expected output on SQL.namaste@gmail.com

    • @naveenvjdandhrudu5141
      @naveenvjdandhrudu5141 Місяць тому

      Hi Ankit hope you received my mail waiting for your response

  • @aniket4627
    @aniket4627 Місяць тому

    if u use google colab instead of jupyter
    use this command before import kaggle
    from google.colab import files
    uploaded = files.upload()
    for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
    name=fn, length=len(uploaded[fn])))

    # Move kaggle.json to the expected location
    !mv kaggle.json ~/.kaggle/

    # Change the file permissions
    !chmod 600 /root/.kaggle/kaggle.json