Exploratory Data Analysis (comment your best insight on the data)

  • Published 28 Sep 2024

COMMENTS • 152

  • @savant_logics
    @savant_logics 2 роки тому +188

    The first 15 minutes are pure gold. Great insight into what to search for and how. I'm so tired of other YouTubers showing what time they wake up, work out, eat, and do some work (without actually showing what they do) and calling the video "A Day in the Life of a Data Scientist".

    • @hypnyx
      @hypnyx 2 роки тому +8

      I hope you realize he made a mistake there. If he had used parse_dates=['date_added'] instead of date_parser, this situation would never have arisen. Pandas can identify most date/time formats on its own.
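The point about `parse_dates` can be sketched as follows. This is a minimal example, assuming a tiny inline CSV (hypothetical titles and dates) in place of the Netflix file used in the video; only the `date_added` column name comes from the thread.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the CSV used in the video.
csv_text = io.StringIO(
    "title,date_added\n"
    'Show A,"September 25, 2021"\n'
    'Show B,"January 10, 2020"\n'
)

# parse_dates takes a list of column names to convert while reading.
data = pd.read_csv(csv_text, parse_dates=["date_added"])

print(data["date_added"].dtype)
```

If the real file has inconsistent date strings, the column may still come back as plain text, in which case converting it after loading is the usual fallback.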

    • @cocoarecords
      @cocoarecords 2 роки тому +1

      yeah hes very practical and unique

    • @savant_logics
      @savant_logics 2 роки тому +2

      @@hypnyx yes I noticed that but since he shows his work unlike the others it can be forgiven.

    • @sebastianc09
      @sebastianc09 2 роки тому

      and actually I was struggling with the SAME dates problem before watching his video...so helpful

    • @guitarrocksX21
      @guitarrocksX21 2 роки тому +4

      Oh man lmao. I get those recommended all the time. It's always some fancy, over-produced cringeworthy clips of them pouring coffee, turning off their alarm, and taking nap breaks. Oooohhh so cool and unique, a 9to5 job! Wow!.

  • @AMFIT93
    @AMFIT93 2 роки тому +18

    Really loved the first 20 minutes, as others mentioned. Nice to see you forget things as well. Knowing how to find the answers you're looking for is an underrated skill!

  • @harrryyy9975
    @harrryyy9975 2 роки тому +5

    As a data analyst, I can tell this is a great video from watching the first 20 mins. Awesome demonstration of basic EDA!!

    • @nishushroff9656
      @nishushroff9656 2 роки тому

      I am a beginner. How does the video help you get insights? I mean, what comes after this?

  • @konstantinostzaferis5318
    @konstantinostzaferis5318 2 роки тому +1

    Just noticed your channel and you are becoming my favorite creator!
    I'm learning data science myself, and I have my first job interview two days from now. Your videos make my anxiety go away!

  • @BinaryBrainbow
    @BinaryBrainbow 2 роки тому +6

    I'm really excited that I found your channel! I actually start a MS Business Analytics and Data Science degree this January! Focusing on marketing analytics! Can't wait to watch this channel continue to grow.

  • @DhimanRoy
    @DhimanRoy 2 роки тому +1

    Hi Shashank
    This has been of great help in understanding the process of data cleaning and eda.
    I was stuck with a bit of multi index column data for a couple of days but your enthusiasm with this was inspiring and helped me push forward.
    Thank you.

  • @guoyitang4001
    @guoyitang4001 2 роки тому +5

    Love how you guide us step by step. Keep up the good work, man. Really appreciate it.

    • @ShashankData
      @ShashankData  2 роки тому +3

      Of course! I want to show people that the process takes a lot of looking up and going back, it’s not linear progress.

  • @tanvirahmed7727
    @tanvirahmed7727 2 роки тому

    amazingly good, honestly saying this is the perfect channel I have been looking for a few months

  • @leonhumbug149
    @leonhumbug149 2 роки тому +49

    I enjoy these videos a lot! Watching you fail (and find a solution!) makes me feel confident that it's just normal to research even small things, and I actually learn a lot from it. That's something I miss in other videos, where everything just works perfectly fine.
    Thanks and keep up the great work

    • @ShashankData
      @ShashankData  2 роки тому +5

      Thanks so much Leon! I try and keep as much of that in to show people the process.

  • @valiuddinqureshi6492
    @valiuddinqureshi6492 2 роки тому

    Thank you so much for putting this work over here. This channel is so different from all other walkthroughs. The real scenario.

  • @thandavakrishnam830
    @thandavakrishnam830 2 роки тому +1

    I have watched some EDA sessions on YouTube, and almost everyone made it look tough and boring, but after watching your session I feel like EDA is the more fun and exciting part of the data science process.

  • @aditya730
    @aditya730 2 роки тому

    I love how you Google the problems you run into and show us how to do it!!!

  • @shubhamchoudhary5461
    @shubhamchoudhary5461 2 роки тому +6

    for getting the month:
    data['month_name'] = data["date_added"].dt.month_name()
    it'll give us the month name, in English by default, like January, February, March, April, ..., December (the optional keyword is locale, not local)
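The month-name idea can be sketched like this. It is a minimal example with made-up dates; the `date_added` and `month_name` column names follow the comment, and the column is assumed to already be a datetime column.

```python
import pandas as pd

# Made-up dates; the column must already have a datetime dtype for .dt to work.
data = pd.DataFrame(
    {"date_added": pd.to_datetime(["2021-01-15", "2021-03-02", "2020-12-31"])}
)

# With no locale argument, month_name() returns English month names.
data["month_name"] = data["date_added"].dt.month_name()

print(data["month_name"].tolist())  # ['January', 'March', 'December']
```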

  • @kanikajaswal6363
    @kanikajaswal6363 2 роки тому +1

    Really liked how you are solving this... unlike other videos where every query works perfectly fine.
    Whenever I start coding after watching those videos, I feel like I'm the only one who can't write a single line of code perfectly the first time. But after this I realized that this is every coder's struggle and that Stack Overflow is the ultimate destination for solving problems. 😂

  • @madrerik
    @madrerik 2 роки тому

    Literal gold mine of a channel.

  • @reidjennings7832
    @reidjennings7832 2 роки тому

    This is the content i've been looking for

  • @samk1707
    @samk1707 2 роки тому +2

    You keep bringing the best videos like GOD for me 🙇🏾‍♀️

  • @oyindamolaoronigbagbe13
    @oyindamolaoronigbagbe13 2 роки тому

    I enjoy watching every bit of this video. gives a little more confidence in my analysis. feels like even well established data scientists go through almost the same problems.

  • @soto036
    @soto036 2 роки тому

    I love this video, such interesting information from the dataset. Thanks for sharing this. Awesome work and a nice example of a quick EDA.

  • @ape853
    @ape853 2 роки тому

    Dude you rock. Thanks a lot for the videos.

  • @stevenrivera876
    @stevenrivera876 2 роки тому

    This was such a fantastic video of the process. Can't thank you enough for this view into your world!

  • @SpacecowboyGeo
    @SpacecowboyGeo 2 роки тому

    Hey Shashank, great vid! Very interesting and valuable EDA examples. Can’t wait to see more!

  • @theforester_
    @theforester_ 2 роки тому

    awesome video man! you showed how the real thing works... as the guy below commented i'm tired of these ppl who knows everything... anyways. awesome video again, greetings from brazil

    • @ShashankData
      @ShashankData  2 роки тому +1

      Hahaha hey thank you so much for commenting! Yes this channel is all about showing you how the real work of a data analyst is. I have a bunch of free tutorials and other videos about this as well if you’re interested

    • @theforester_
      @theforester_ 2 роки тому

      @@ShashankData sure. I'll be watching them. Thanks

  • @smellypunks
    @smellypunks 2 роки тому +4

    Love these cold coding videos they are much like my life 5-9. Another thing have you considered switching to a 4K monitor, I think you will like it. 16:9 makes life easier and the extra pixel height with 2160px mean less scrolling code. 31/32" seems to give a comfortable size at full res.

  • @andydataguy
    @andydataguy 2 роки тому

    Amazing video. Going to check out your patreon. Keep it up brother 🙏🏾

  • @d.ia.s
    @d.ia.s 2 роки тому

    keep up the amazing work, you are a great teacher and I'm sure you'll get bigger and bigger in no time! Congrats on the channel and the video.

  • @hardikacharya2664
    @hardikacharya2664 2 роки тому +1

    Enjoying it with coffee :)
    P.S. Great tips on your setup.

  • @manuelatienzo9764
    @manuelatienzo9764 2 роки тому

    Awesome video!.. Very clear explanation and how to search what you need. Keep up with the series...

  • @dolfinho87
    @dolfinho87 2 роки тому

    Very, very nice! I love how you search some answers! Thanks for the video!

    • @ShashankData
      @ShashankData  2 роки тому

      Ofc! I want to show everyone that even experienced analysts are always looking stuff up

  • @neilthomas5026
    @neilthomas5026 2 роки тому

    Good shit man !love this stuff

  • @dariashatilo6994
    @dariashatilo6994 2 роки тому

    My favorite type of the videos 😍😍 thanks Shashank

  • @chadgregory9037
    @chadgregory9037 2 роки тому

    Awesome video man.... tbh I think you're a great actor... going thru this process to teach people was done well, because obviously I know you know how to define the style of a parsed date in a dataframe without looking it up lol

  • @H99x2
    @H99x2 2 роки тому

    These types of videos are of great value! It would be cool if you started a series and labeled the videos as you go. For example, one big advanced dataset from start to finish, beginning with the most basic steps and ending with the most advanced ways of analysing. We could all save it locally and resume working on the file every time a new video in the series drops :)

    • @ShashankData
      @ShashankData  2 роки тому

      This is an absolutely amazing idea, I think this will become the next hit series on the channel. I'll see if I can get a video on this out ASAP

    • @ShashankData
      @ShashankData  2 роки тому

      Might use this Dataset
      www.kaggle.com/rohanrao/formula-1-world-championship-1950-2020?select=constructor_standings.csv

    • @H99x2
      @H99x2 2 роки тому

      @@ShashankData Damn that's a huge one. Could be a potential candidate i'd say. Thanks bhai!

  • @hukunamutata
    @hukunamutata 2 роки тому +1

    Loved watching this

  • @voggo
    @voggo 2 роки тому

    did visual studio throw a tantrum when you said "the beauty of jupyter notebook" while using it xD

  • @onajiteewhrudjakpor8913
    @onajiteewhrudjakpor8913 2 роки тому

    Majority of Rajiv Chilaka's Movies were added on this date 22nd July, 2021

  • @jake_runs_the_world
    @jake_runs_the_world 2 роки тому

    Man this is so good

  • @tuckercomm
    @tuckercomm 2 роки тому

    Great info.

  • @matheusm6786
    @matheusm6786 2 роки тому +3

    Shas, I saw your old video doing a data analysis with similar data. To fix the date format problem you just wrote:
    pd.to_datetime(data['date_added'])
    and the problem was fixed. I got the same dtype with this code, or am I missing something?
    P.S. Thank you very much for your videos! I am watching all of them; they are hugely important, since practice is essential for learning this kind of stuff! Merry Christmas!
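One detail worth showing for the `pd.to_datetime` approach: it returns a new Series, so the result has to be assigned back to the column. The sketch below assumes made-up sample values, including a leading space and a missing date; the `.str.strip()` and `errors="coerce"` steps are an assumption about how messy the real column is, not something confirmed in the thread.

```python
import pandas as pd

# Made-up values: one clean date, one with a stray leading space, one missing.
data = pd.DataFrame(
    {"date_added": ["September 25, 2021", " January 10, 2020", None]}
)

# to_datetime does not modify the column in place; assign the result back.
# errors="coerce" turns anything unparseable into NaT instead of raising.
data["date_added"] = pd.to_datetime(
    data["date_added"].str.strip(), errors="coerce"
)

print(data["date_added"].dtype)
```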

  • @shubhamchoudhary5461
    @shubhamchoudhary5461 2 роки тому

    we can use the following line to get the description column:
    data[data['release_year'] == 1925].description

  • @krgoutham8852
    @krgoutham8852 2 роки тому

    At 47 Min, you can sort by Movie to show if Multiple countries have correctly been assigned to the same movie and correct number of rows are added. Eg: Sankofa has 6 Languages, so Sankofa needs to have 6 rows

  • @oleksandrarsentiev7152
    @oleksandrarsentiev7152 2 роки тому

    Anupam Kher seems to be the actor with the most number of movies on Netflix (39), pretty impressive EDA!

  • @mohamed.montaser
    @mohamed.montaser 2 роки тому +1

    When you split the country column by ",", all the movies with only one country became NaN values, so all those records weren't counted in your analysis.

    • @boyzone5000
      @boyzone5000 2 роки тому

      Could you suggest a solution? Ty
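One possible answer to the question above, as a sketch rather than what the video does: split the `country` strings into lists and use `explode`, which keeps single-country rows as one row each. The titles and countries below are made up for illustration.

```python
import pandas as pd

# Made-up rows: one country, several countries, and a missing value.
data = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "country": ["India", "United States, Ghana, Germany", None],
    }
)

# Split each string into a list, then give every list element its own row.
exploded = data.assign(country=data["country"].str.split(",")).explode(
    "country", ignore_index=True
)
# Remove the space left over after each comma.
exploded["country"] = exploded["country"].str.strip()

# value_counts skips the missing country but counts single-country titles.
country_counts = exploded["country"].value_counts()
print(country_counts)
```

Sorting or grouping `exploded` by `title` is a quick way to check that a title listed with several countries ends up with that many rows.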

  • @yonas8212
    @yonas8212 2 роки тому

    The first 15 minutes :) the parse_dates=['date_added'] argument would have done the job (parse_dates=True on its own only tries to parse the index)

  • @NyiTun-zr5qw
    @NyiTun-zr5qw Рік тому

    Thanks!

  • @onajiteewhrudjakpor8913
    @onajiteewhrudjakpor8913 2 роки тому

    For Duration Rajiv Chilaka's longest movie is 87 minutes that is an hour and 27 minutes

  • @EtrnalDeath
    @EtrnalDeath 2 роки тому

    where was this video before my Blackstone Interview

  • @marcioandre6469
    @marcioandre6469 2 роки тому

    Shashank you are the best man 👏👏👏👏👏👏👏👏👏👏👏

    • @ShashankData
      @ShashankData  2 роки тому +2

      Thanks so much! Let me know if there’s any other content you’d like to see

  • @olegsobadov4967
    @olegsobadov4967 2 роки тому

    At 57:40 there will be an error ('same type float'). It took me an hour to find another way (remember the basics).

  • @onajiteewhrudjakpor8913
    @onajiteewhrudjakpor8913 2 роки тому

    Director Rajiv Chilaka has the most number of movies
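A finding like this one can be checked with `value_counts`. The sketch below uses hypothetical director names, not the real dataset, and assumes the column is called `director`.

```python
import pandas as pd

# Hypothetical rows; missing directors are ignored by value_counts.
data = pd.DataFrame(
    {
        "title": ["T1", "T2", "T3", "T4"],
        "director": ["Director X", "Director Y", "Director X", None],
    }
)

# Count titles per director, most frequent first.
director_counts = data["director"].value_counts()
top_director = director_counts.idxmax()

print(top_director, director_counts[top_director])
```

Filtering the frame to `data[data["director"] == top_director]` then allows the follow-up questions raised in the thread, such as release year or genre.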

  • @davidlotito2815
    @davidlotito2815 2 роки тому

    At 26:55 you can do data_import[data_import['release_year'] == 1925][['description']] and that will pull up the data in a DataFrame format. I think that's what you were trying to do?
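The difference between getting a Series and getting a DataFrame back can be sketched with `.loc`. The rows here are made up; `data_import`, `release_year` and `description` are the names used in the comment.

```python
import pandas as pd

# Made-up rows using the column names from the comment.
data_import = pd.DataFrame(
    {
        "title": ["Old Film", "New Film"],
        "release_year": [1925, 2020],
        "description": ["A silent-era drama.", "A modern thriller."],
    }
)

# Boolean mask: True for rows released in 1925.
mask = data_import["release_year"] == 1925

# A single column label returns a Series.
as_series = data_import.loc[mask, "description"]

# A list of labels returns a DataFrame, even with one column.
as_frame = data_import.loc[mask, ["description"]]

print(type(as_series).__name__, type(as_frame).__name__)
```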

  • @shubhamchoudhary5461
    @shubhamchoudhary5461 2 роки тому

    we can plot a bar chart on the 'type' column to find which type of title they released most
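The numbers behind such a chart come from `value_counts`. This is a minimal sketch with made-up rows, assuming the `type` column holds labels like "Movie" and "TV Show"; the plotting call itself is left out so the example only needs pandas.

```python
import pandas as pd

# Made-up rows; the label values are an assumption about the dataset.
data = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})

# Absolute counts per type, suitable as bar heights.
type_counts = data["type"].value_counts()

# Shares that sum to 1, suitable for a pie chart.
type_share = data["type"].value_counts(normalize=True)

print(type_counts)
print(type_share)
```

Either result can be handed to whichever plotting library is in use to draw the bar or pie chart.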

  • @onajiteewhrudjakpor8913
    @onajiteewhrudjakpor8913 2 роки тому

    Rajiv Chilaka has movies mostly listed as Children & Family Movies

  • @Luis-kd2te
    @Luis-kd2te 2 роки тому

    Great Video!

  • @ritikajaiswal3824
    @ritikajaiswal3824 2 роки тому

    where is the link of the video where you have done data cleaning like you mentioned in the beginning of this video?

  • @lauterix223
    @lauterix223 2 роки тому +1

    Nice video! Do you have a tutorial on how to setup an IDE for data Science/Analytics in VSCode?

    • @ShashankData
      @ShashankData  2 роки тому +3

      Yes here it is: ua-cam.com/video/LwazHUkU5IQ/v-deo.html

    • @lauterix223
      @lauterix223 2 роки тому +1

      @@ShashankData Thank you so much!

  • @palashtiwari7980
    @palashtiwari7980 2 роки тому

    Hi Shashank, can you please help with how to add a plotly scatter_geo chart for the country_count dataframe? Thanks in advance!!

  • @Mrroy08657
    @Mrroy08657 2 роки тому

    Pls , Suggest Such : Exact Job Roles in Data Science where in Every Day after working Hrs , I'll get Enough Time for UPSC IAS Govt. Exam Preperation .
    Pls Suggest & Guide me 🙏🙏.

  • @ragavendrabharathi
    @ragavendrabharathi 2 роки тому

    I see lot of null values in the dataset. Please upload a video on handling null values

  • @nivedc8478
    @nivedc8478 2 роки тому +1

    Hey, do you have Kerala roots?

  • @Adinasa2
    @Adinasa2 2 роки тому +1

    Jupyter notebook link

  • @seondeon6204
    @seondeon6204 2 роки тому

    You are one of my mentor and I look up to you, please can I interview you on a project for my class in pace111? Its an assignment for informational interview pleaseee? Thank you

  • @nelohenriq
    @nelohenriq 2 роки тому

    Just use pd.to_datetime on the date_added column like this: data_import['date_added'] = pd.to_datetime(data_import['date_added']). It does the conversion to datetime format (pd.to_datetime has no inplace argument, so assign the result back).

  • @RobotIsaac12
    @RobotIsaac12 2 роки тому

    I wasn't able to find the video that you used to show how you setup the VS code environment. Could you send a link for that please?

    • @ShashankData
      @ShashankData  2 роки тому +1

      Here it is: ua-cam.com/video/LwazHUkU5IQ/v-deo.html

    • @RobotIsaac12
      @RobotIsaac12 2 роки тому

      @@ShashankData Thanks!

  • @thrashshorts1703
    @thrashshorts1703 2 роки тому

    Hey guys, at exactly 4:47 my Jupyter notebook doesn't show the little box with all the info about the pd.read_csv() function. I think I messed something up. Anyone having the same issue?

  • @Lnd2345
    @Lnd2345 2 роки тому

    Wouldn’t the argument parse_dates=['date_added'] do the trick in the beginning!? That would have saved you 15 mins.

  • @kayquedemorais
    @kayquedemorais 2 роки тому

    Simply the "dataframe with no dirt" of data science.
    (those who get it will get it)

  • @vishalmane3139
    @vishalmane3139 2 роки тому

    Free resources for data analysis?

  • @Akash_158
    @Akash_158 2 роки тому

    Can i make my Final project on "EDA on covid" ??

    • @ShashankData
      @ShashankData  2 роки тому

      100%! Do you have a dataset you’re interested in?

    • @Akash_158
      @Akash_158 2 роки тому

      @@ShashankData yes

    • @ShashankData
      @ShashankData  2 роки тому

      @@Akash_158 what’s the dataset you’re looking at?

    • @Akash_158
      @Akash_158 2 роки тому

      @@ShashankData Kaggle COVID-19 in India

  • @nelsonrajd9237
    @nelsonrajd9237 2 роки тому

    bro r u Tamil origin, if yes just put a video about yourself

  • @ixternal9295
    @ixternal9295 2 роки тому +2

    coulda saved 10 minutes if you changed date_parser to parse_dates in the beginning

    • @ShashankData
      @ShashankData  2 роки тому +3

      YouTube held your comment for review

  • @sirbootylord6880
    @sirbootylord6880 2 роки тому

    Thank you sir, this video helped me develop a better process while doing my own EDA

  • @fakerrain
    @fakerrain 2 роки тому +19

    Thanks for the video. I learned so much from it. Watching someone do the actual work is so helpful. I especially like how you show researching the different function you need.

  • @ishandandekar1808
    @ishandandekar1808 2 роки тому +11

    Great vid shashank :D, please continue the practical statistics for data scientists book, been waiting for part 4

    • @ShashankData
      @ShashankData  2 роки тому +6

      Thanks so much for the support! I was planning on putting the book away because a lot of the content after chapter 3 is covered in my Hands on Machine Learning guide. I might take a look at it again thanks for the suggestion and feedback.

  • @shubhamchoudhary5461
    @shubhamchoudhary5461 2 роки тому +2

    Thanks sir, from this video I got a clear idea about EDA. Every time I was thinking, "where do I start from?" Now I get it: EDA is nothing but digging into the data as far as we can.
    Thank you sir for your efforts for us ...!! 🙏 ❤️
    waiting for next video...

    • @ShashankData
      @ShashankData  2 роки тому +2

      Yeah! There’s no real set process, just start somewhere and keep asking questions

  • @ivetastripeikaite307
    @ivetastripeikaite307 2 роки тому +2

    Wow, amazing, amazing video! So helpful! Cannot wait to see your data cleaning video. My entire master's is based on R, unfortunately, but the concepts of data analytics are useful beyond belief! Thank you so so so so so so so so much!

    • @Traumatised311
      @Traumatised311 2 роки тому

      What country are you studying in ?
      Can you email me your syllabus

  • @AmiDenni
    @AmiDenni 2 роки тому +1

    Funny. Today at work I also did an exploratory data analysis and needed all the functions you showed :D But I had to read in a messy log file and it was necessary to split all the information for certain columns, which can be very annoying... I taught myself to code with Python and pandas, and your video gave me the feedback that I am on the right path! Thanks :)

  • @JH-py9wf
    @JH-py9wf 2 роки тому +1

    This is some quality content and is realistic in terms of how a data analyst tackles a data problem. Can’t wait to check out your other videos

  • @marionagi2914
    @marionagi2914 2 роки тому +1

    Amazing as usual, really learned something new like the use of plotly and melt()
    but i think this dataset is lacking more needed info like (the number of views, or user rating, genre of the show) so you can draw more conclusions like - what is the most popular genre or is user rating (1 to 5) for most viewed shows -
    maybe a bigger dataset from IMDB will give more insights to the movie/TV industry

  • @onajiteewhrudjakpor8913
    @onajiteewhrudjakpor8913 2 роки тому

    Majority of Rajiv Chilaka's Movies were released in the year 2013

  • @rohanbhatt7691
    @rohanbhatt7691 2 роки тому

    Hello guys, I need to ask something. When I do analysis, I am often not able to make beautiful graphs, so I look at others' Kaggle notebooks (like yours in this case) to see how they approach the analysis for storytelling, and then apply those analyses to different datasets. Is this a good method? Kindly reply to my question, sir.

  • @prajjwalsingh5884
    @prajjwalsingh5884 2 роки тому

    Actually this is real. Not for likes but it is for pure intent to teach.

  • @naineshrathod2392
    @naineshrathod2392 2 роки тому

    Aaaahh !!!! no wait, that's exactly the same thing. 36:54

  • @tuckercomm
    @tuckercomm 2 роки тому

    Thank you! You're clear and very thorough.

  • @shukritadicha6144
    @shukritadicha6144 5 місяців тому

    pure Gold. Keep going sir

  • @haiderrizvi1710
    @haiderrizvi1710 2 роки тому

    pd.options.display.max_colwidth = 200
    to display full data while printing dataframe

    • @haiderrizvi1710
      @haiderrizvi1710 2 роки тому

      to print titles with certain ratings.
      rated_NC17 = data_import.set_index('title').eq("NC-17")
      rated_NC17.index[rated_NC17['rating']]

  • @MusicaParaCod
    @MusicaParaCod 2 роки тому

    I was looking for a data analysis with real problems, like the first 10 minutes of the video. Every time I see videos about data analysis, it's always a perfect dataset with clear variables and no real problems in it. Thanks for the video.

  • @yonas8212
    @yonas8212 2 роки тому

    Nice tutorial. I wonder what cast count colored by rating would be.

  • @gauravpatwal4204
    @gauravpatwal4204 2 роки тому

    How do I install anaconda on Mac? Please guide me. Also, great video!

    • @ShashankData
      @ShashankData  2 роки тому

      Thanks so much for watching. I go over that here: ua-cam.com/video/LwazHUkU5IQ/v-deo.html

  • @ishandandekar1808
    @ishandandekar1808 2 роки тому

    Hi shashank, just have one request, can you show how you made the anaconda env of python 3.9.7 and don't know why but for the date histogram I was getting all the dates as labels in y-axis, any workaround for that?

  • @rizz_z1380
    @rizz_z1380 2 роки тому

    This is so interesting. This definitely helped me gain more interest in Data Analysis. Thank you.

  • @ishandandekar1808
    @ishandandekar1808 2 роки тому

    Hi shashank, for the type column we could make a pie chart?

  • @tirumaleshn8504
    @tirumaleshn8504 2 роки тому

    This is what aspiring data scientists and data analysts need🙌 Awsm bro

  • @elioenai
    @elioenai 2 роки тому

    Loved this content! Great video, Shashank!

  • @nikitaandriievskyi3448
    @nikitaandriievskyi3448 2 роки тому

    Great video! Please, do more of these:)

  • @shubhamdandekar20
    @shubhamdandekar20 2 роки тому

    Great content, looking forward to it.

  • @TheMISBlog
    @TheMISBlog 2 роки тому

    I love these videos , Thanks Shashank, Good Luck Bro

  • @fernandourrutia2566
    @fernandourrutia2566 2 роки тому

    Great video man !

  • @nelsontous667
    @nelsontous667 2 роки тому

    Thanks for the video..very helpful!.. I found that we can read full strings by setting pandas to pd.set_option('display.max_colwidth', None)
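The display option mentioned here can also be applied temporarily. The sketch below is an illustration with a made-up 120-character description; it uses `pd.option_context` so the setting is restored after the `with` block.

```python
import pandas as pd

# Made-up long text that exceeds the default column display width.
data = pd.DataFrame({"description": ["x" * 120]})

# With default settings, long strings are cut off with "..." when displayed.
truncated_text = repr(data)

# Inside the context manager, max_colwidth=None disables truncation.
with pd.option_context("display.max_colwidth", None):
    full_text = repr(data)

print(truncated_text)
print(full_text)
```

Calling `pd.set_option('display.max_colwidth', None)` instead, as in the comment, makes the change stick for the rest of the session.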

  • @nishushroff9656
    @nishushroff9656 2 роки тому

    What is the insight of this data

    • @Akash_158
      @Akash_158 2 роки тому

      what do u mean???

    • @ShashankData
      @ShashankData  2 роки тому +1

      Hey Nishu great question! This video was really supposed to provide people with some ideas of how to start an EDA on the data. Oftentimes just knowing where to start can be very difficult

    • @nishushroff9656
      @nishushroff9656 2 роки тому

      @@ShashankData thank u so much sir for clearing my doubts...

    • @Riael
      @Riael 6 місяців тому

      I'm here 2 years later with the same question