The first 15 minutes are pure gold. Great insight into what to search for and how. I'm so tired of other YouTubers showing what time they wake up, work out, eat, and do some work (without actually showing what they do) and calling the video "A Day in the Life of a Data Scientist"
I hope you realize he made a mistake there. If he had used parse_dates=['date_added'] instead of date_parser, this situation would never have arisen. Pandas can identify most date/time formats on its own.
Yeah, he's very practical and unique
@@hypnyx yes I noticed that but since he shows his work unlike the others it can be forgiven.
and actually I was struggling with the SAME dates problem before watching his video...so helpful
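For reference, here is a minimal sketch of the parse_dates approach mentioned above. It assumes a tiny inline CSV as a stand-in for the Netflix file (the rows are made up). parse_dates takes a list of column names, and pandas infers a "Month D, YYYY" format on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; only the date_added format matters here.
csv_text = """title,date_added
Movie A,"September 25, 2021"
Show B,"July 22, 2021"
"""

# parse_dates is a keyword argument that takes a list of column names to parse.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_added"])

print(pd.api.types.is_datetime64_any_dtype(data["date_added"]))  # True
```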
Oh man lmao. I get those recommended all the time. It's always some fancy, over-produced, cringeworthy clips of them pouring coffee, turning off their alarm, and taking nap breaks. Oooohhh, so cool and unique, a 9-to-5 job! Wow!
Really loved the first 20 minutes, as others mentioned; nice to see you forget things as well. Knowing how to find the answers you're looking for is an underrated skill!
As a data analyst, I can tell this is a great video just from watching the first 20 mins. Awesome demonstration of basic EDA!!
I am a beginner. How did the video help to get insights? I mean, what comes after this?
Just noticed your channel and you are becoming my favorite creator!
Learning data science myself, and I have my first job interview two days from now. Your videos make my anxiety go away!
I'm really excited that I found your channel! I actually start an MS Business Analytics and Data Science degree this January! Focusing on marketing analytics! Can't wait to watch this channel continue to grow.
Hi Shashank
This has been of great help in understanding the process of data cleaning and EDA.
I was stuck with a bit of multi-index column data for a couple of days, but your enthusiasm with this was inspiring and helped me push forward.
Thank you.
Love how you guide us step by step. Keep up the good work, man. Really appreciate it
Of course! I want to show people that the process takes a lot of looking up and going back, it’s not linear progress.
Amazingly good. Honestly, this is the perfect channel; I have been looking for it for a few months
I enjoy these videos a lot! Watching you fail (and find a solution!) makes me feel confident that it's normal to research even small things, and I actually learn a lot from it. Something I miss from other videos, where everything just works perfectly fine.
Thanks and keep up the great work
Thanks so much Leon! I try and keep as much of that in to show people the process.
Thank you so much for putting this work over here. This channel is so different from all other walkthroughs. The real scenario.
I have watched some EDA sessions on YouTube, and almost everyone made it look tough and boring, but after watching your session I feel like EDA is the more fun and exciting part of the data science process.
I love the way you Google the problems you get and show us how to solve them!!!
For getting the month name:
data['month_name'] = data['date_added'].dt.month_name()
It'll give us month names like January, February, March, April, ..., December
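Here is a small sketch of that idea on made-up dates. One caveat: in pandas the optional keyword is `locale` (a system locale string), not `local`, and English names are returned when it is left out. That is why the call below passes no argument.

```python
import pandas as pd

# Hypothetical dates standing in for the real date_added column.
data = pd.DataFrame({"date_added": pd.to_datetime(["2021-01-15", "2021-03-02", "2021-12-31"])})

# .dt.month_name() maps each datetime to its English month name by default.
data["month_name"] = data["date_added"].dt.month_name()

print(data["month_name"].tolist())  # ['January', 'March', 'December']
```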
Really liked how you are solving this... unlike other videos where every query works perfectly fine.
Whenever I start coding after watching those videos, I feel like I'm the only one not able to write any line of code perfectly at once. But after this, I got to know that this is every coder's struggle, and Stack Overflow is the ultimate destination for solving problems. 😂
🤣
Literal gold mine of a channel.
This is the content i've been looking for
You keep bringing the best videos like GOD for me 🙇🏾♀️
Thanks for the support Sam!
I enjoy watching every bit of this video. It gives me a little more confidence in my analysis. It feels like even well-established data scientists go through almost the same problems.
I love this video, such interesting information from the dataset. Thanks for sharing this, awesome work and a nice example of a quick EDA.
Dude you rock. Thanks a lot for the videos.
This was such a fantastic video of the process. Can't thank you enough for this view into your world!
Hey Shashank, great vid! Very interesting and valuable EDA examples. Can’t wait to see more!
Awesome video, man! You showed how the real thing works... As the guy below commented, I'm tired of these people who know everything... Anyways, awesome video again, greetings from Brazil
Hahaha hey thank you so much for commenting! Yes this channel is all about showing you how the real work of a data analyst is. I have a bunch of free tutorials and other videos about this as well if you’re interested
@@ShashankData sure. I'll be watching them. Thanks
Love these cold coding videos; they are much like my life 5-9. Another thing: have you considered switching to a 4K monitor? I think you will like it. 16:9 makes life easier, and the extra pixel height at 2160px means less scrolling through code. 31/32" seems to give a comfortable size at full res.
Amazing video. Going to check out your Patreon. Keep it up, brother 🙏🏾
keep up the amazing work, you are a great teacher and I'm sure you'll get bigger and bigger in no time! Congrats on the channel and the video.
Enjoying it with coffee :)
P.S. Great tips on your setup.
Awesome video! Very clear explanation of how to search for what you need. Keep up the series...
Very, very nice! I love how you search for answers! Thanks for the video!
Ofc! I want to show everyone that even experienced analysts are always looking stuff up
Good shit, man! Love this stuff
My favorite type of video 😍😍 thanks Shashank
Awesome video man.... tbh I think you're a great actor... going thru this process to teach people was done well, because obviously I know you know how to define the style of a parsed date in a dataframe without looking it up lol
These types of videos are of great value! It would be cool if you started a series and labeled the videos along the way. For example, 1 big advanced dataset from start to finish, beginning with the most basic steps and ending with the most advanced analysis. We could all save it locally and resume working on the file every time a new video in the series drops :)
This is an absolutely amazing idea, I think this will become the next hit series on the channel. I'll see if I can get a video on this out ASAP
Might use this Dataset
www.kaggle.com/rohanrao/formula-1-world-championship-1950-2020?select=constructor_standings.csv
@@ShashankData Damn that's a huge one. Could be a potential candidate i'd say. Thanks bhai!
Loved watching this
Thank you so much for watching
Did Visual Studio throw a tantrum when you said "the beauty of Jupyter Notebook" while using it xD
The majority of Rajiv Chilaka's movies were added on 22nd July, 2021
Man this is so good
Great info.
Shas, I saw your old video doing a data analysis with similar data. To fix the date format problem you just wrote:
data['date_added'] = pd.to_datetime(data['date_added'])
and the problem was fixed. I got the same dtype with this code, or am I missing something?
PS: Thank you very much for your videos! I am watching all of them, and they are hugely important, since practice is essential for learning this kind of stuff! Merry Christmas!
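Here is a minimal sketch of why the assignment matters. It uses two made-up values and assumes date strings like "September 25, 2021". pd.to_datetime returns a new Series and leaves the original column alone.

```python
import pandas as pd

# Hypothetical raw strings in a "Month D, YYYY" style.
data = pd.DataFrame({"date_added": ["September 25, 2021", "July 22, 2021"]})

# Calling pd.to_datetime without assigning the result leaves the column unchanged.
pd.to_datetime(data["date_added"])
print(pd.api.types.is_datetime64_any_dtype(data["date_added"]))  # False

# Assign the result back. format= is optional: it spells out the assumed style
# and raises an error if a value does not match it.
data["date_added"] = pd.to_datetime(data["date_added"], format="%B %d, %Y")
print(pd.api.types.is_datetime64_any_dtype(data["date_added"]))  # True
```

pd.to_datetime has no inplace parameter, so passing inplace=True raises a TypeError. Assigning the result back is the way to update the column.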
We can use the following line to get the description column:
data[data['release_year'] == 1925].description
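Here is a short sketch of the difference between getting the description back as a Series and getting it as a one-column DataFrame. The rows are hypothetical stand-ins for the dataset.

```python
import pandas as pd

# Hypothetical rows standing in for the real data.
data = pd.DataFrame({
    "title": ["Old Film", "New Film"],
    "release_year": [1925, 2021],
    "description": ["A silent-era story.", "A modern story."],
})

is_1925 = data["release_year"] == 1925  # boolean mask, one value per row

desc_series = data.loc[is_1925, "description"]   # single label -> Series
desc_frame = data.loc[is_1925, ["description"]]  # list of labels -> DataFrame

print(desc_series.iloc[0])  # A silent-era story.
```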
At 47 min, you can sort by movie to check whether multiple countries have been correctly assigned to the same movie and the correct number of rows were added. E.g., Sankofa has 6 countries, so Sankofa needs to have 6 rows
Anupam Kher seems to be the actor with the most movies on Netflix (39). Pretty impressive EDA!
When you split the country column by ",", all the movies with only one country became NaN values, so none of those records were counted in your analysis.
Could you suggest a solution? Ty
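Here is one possible fix, sketched on hypothetical rows rather than the real dataset. It splits each string into a list and explodes the lists, so a title with a single country keeps its row instead of turning into NaN. It also runs the per-title row check suggested in the 47-minute comment. This uses explode() as a swapped-in technique and is not necessarily what the video did.

```python
import pandas as pd

# Hypothetical titles; "country" holds comma-separated names, or is missing.
data = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "country": ["United States, Ghana, Burkina Faso", "India", None],
})

# str.split turns each value into a list; explode gives one row per list element.
# A single-country title becomes a one-element list, so it keeps its row.
countries = data.assign(country=data["country"].str.split(",")).explode("country")
countries["country"] = countries["country"].str.strip()

# value_counts skips the missing country by default.
country_count = countries["country"].value_counts()

# Sanity check: rows per title should equal that title's number of countries.
rows_per_title = countries.groupby("title").size()
print(country_count)
print(rows_per_title)
```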
The first 15 minutes :) The parse_dates=['date_added'] argument would have done the job
Thanks!
By duration, Rajiv Chilaka's longest movie is 87 minutes, i.e., 1 hour and 27 minutes
Where was this video before my Blackstone interview?
Shashank you are the best man 👏👏👏👏👏👏👏👏👏👏👏
Thanks so much! Let me know if there’s any other content you’d like to see
At 57:40 there will be an error ('same type float'). Thanks for the hour it took me to find another way (remember the basics)
Director Rajiv Chilaka has the most movies
At 26:55 you can do data_import[data_import['release_year'] == 1925][['description']] and that will pull up the data in DataFrame format. I think that's what you were trying to do?
We can plot a bar chart of the 'type' column to find which type of title they released most
That’s an excellent idea!
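Here is a quick sketch of that bar chart. It uses pandas' built-in matplotlib plotting as a swapped-in choice (the same counts could be fed to plotly) and assumes type labels like "Movie" and "TV Show".

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical "type" column.
data = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"]})

type_counts = data["type"].value_counts()  # sorted from most to least common

ax = type_counts.plot.bar(rot=0, title="Titles by type")
ax.set_xlabel("type")
ax.set_ylabel("number of titles")
plt.tight_layout()
```

For a pie chart instead, type_counts.plot.pie() takes the same counts.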
Rajiv Chilaka has movies mostly listed as Children & Family Movies
These are some great insights! Amazing job
@@ShashankData Thanks
Great Video!
Where is the link to the video where you did the data cleaning, as you mentioned at the beginning of this video?
Nice video! Do you have a tutorial on how to set up an IDE for data science/analytics in VS Code?
Yes here it is: ua-cam.com/video/LwazHUkU5IQ/v-deo.html
@@ShashankData Thank you so much!
Hi Shashank, can you please help with adding a plotly scatter_geo chart for the country_count dataframe? Thanks in advance!!
Please suggest exact job roles in data science where, after working hours every day, I'll get enough time for UPSC IAS government exam preparation.
Please suggest & guide me 🙏🙏.
I see a lot of null values in the dataset. Please upload a video on handling null values
Hey, do you have Kerala roots?
Jupyter notebook link
You are one of my mentors and I look up to you. Please, can I interview you for a project for my class in pace111? It's an assignment for an informational interview, pleaseee? Thank you
Just use pd.to_datetime on the date_added column, like this: data_import['date_added'] = pd.to_datetime(data_import['date_added']), and it does the conversion to datetime format
I wasn't able to find the video where you show how you set up the VS Code environment. Could you send a link to it, please?
Here it is: ua-cam.com/video/LwazHUkU5IQ/v-deo.html
@@ShashankData Thanks!
Hey guys, at exactly 4:47 my Jupyter notebook doesn't show the little box with all the info about the pd.read_csv() function. I think I messed something up; is anyone else having the same issue?
Wouldn't the argument parse_dates=['date_added'] have done the trick at the beginning? That would have saved you 15 mins.
Simply put, the "dirt-free dataframe" of data science.
(Those who get it, get it.)
Free resources for data analysis?
Can I make my final project on "EDA on COVID"?
100%! Do you have a dataset you’re interested in?
@@ShashankData yes
@@Akash_158 what’s the dataset you’re looking at?
@@ShashankData Kaggle COVID-19 in India
Bro, are you of Tamil origin? If yes, just put up a video about yourself
Coulda saved 10 minutes if you had changed date_parser to parse_dates at the beginning
YouTube held your comment for review
Thank you sir, this video helped me develop a better process while doing my own EDA
Thanks for the video. I learned so much from it. Watching someone do the actual work is so helpful. I especially like how you show researching the different functions you need.
Thanks for the support!
Great vid Shashank :D, please continue the Practical Statistics for Data Scientists book, been waiting for part 4
Thanks so much for the support! I was planning on putting the book away because a lot of the content after chapter 3 is covered in my Hands on Machine Learning guide. I might take a look at it again thanks for the suggestion and feedback.
Thanks, sir. From this video I got a clear idea about EDA. Every time, I was thinking, "Where do I start?" Now I've got a clear idea: EDA is nothing but digging into the data as much as we can.
Thank you sir for your efforts for us ...!! 🙏 ❤️
waiting for next video...
Yeah! There’s no real set process, just start somewhere and keep asking questions
Wow, amazing amazing video! So helpful! Cannot wait to see your data cleaning video. My entire master's is based on R, unfortunately, but the concepts of data analytics are so useful beyond belief! Thank you so so so so so so so so much!
What country are you studying in ?
Can you email me your syllabus
Funny. Today at work I also did an exploratory data analysis and needed all the functions you showed :D But I had to read in a messy log file, and it was necessary to split all the information for certain columns, which can be very annoying... I taught myself to code with Python and pandas, and your video gave me the feedback that I am on the right path! Thanks :)
This is some quality content and is realistic in terms of how a data analyst tackles a data problem. Can’t wait to check out your other videos
Amazing as usual, really learned something new like the use of plotly and melt()
But I think this dataset is lacking some needed info (like the number of views, user rating, or genre of the show) that would let you draw more conclusions, like what the most popular genre is, or what the user rating (1 to 5) is for the most viewed shows.
Maybe a bigger dataset from IMDB would give more insights into the movie/TV industry.
The majority of Rajiv Chilaka's movies were released in 2013
Hello guys, I need to ask something. When I do analysis, I'm often not able to make beautiful graphs, so I look at other people's Kaggle notebooks (like yours in this case) to see their approach to analysis for storytelling, and then apply those analyses to different datasets. Is this a good method? Kindly reply to my question, sir.
Actually, this is real. Not for likes, but purely with the intent to teach.
Aaaahh!!!! No wait, that's exactly the same thing. 36:54
Thank you! You're clear and very thorough.
pure Gold. Keep going sir
pd.options.display.max_colwidth = 200
to display up to 200 characters per cell when printing a DataFrame
to print titles with certain ratings.
rated_NC17 = data_import.set_index('title').eq("NC-17")
rated_NC17.index[rated_NC17['rating']]
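Here is a more direct alternative to the set_index/eq approach, sketched on hypothetical rows. It filters with a boolean mask on the rating column and selects the title column with .loc; isin() covers several ratings at once.

```python
import pandas as pd

# Hypothetical rows standing in for data_import.
data_import = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C"],
    "rating": ["NC-17", "PG", "NC-17"],
})

# Rows where rating equals "NC-17", keeping only the title column.
nc17_titles = data_import.loc[data_import["rating"] == "NC-17", "title"]

# Several ratings at once.
selected = data_import.loc[data_import["rating"].isin(["NC-17", "PG"]), "title"]

print(nc17_titles.tolist())  # ['Title A', 'Title C']
```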
I was looking for a data analysis with real problems, like the first 10 minutes of this video. Every time I see videos about data analysis, it's always a perfect dataset with clear variables and no real problems in it. Thanks for the video.
Nice tutorial. I wonder what cast count colored by rating would be.
How do I install anaconda on Mac? Please guide me. Also, great video!
Thanks so much for watching. I go over that here: ua-cam.com/video/LwazHUkU5IQ/v-deo.html
Hi Shashank, I just have one request: can you show how you made the Anaconda env with Python 3.9.7? Also, I don't know why, but for the date histogram I was getting all the dates as labels on the y-axis. Any workaround for that?
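On the histogram labels, one common cause is that date_added is still text, so every distinct date string becomes its own label. That is an assumption, since the notebook isn't shown. The sketch below converts the column first and then counts titles per year, using made-up dates.

```python
import pandas as pd

# Hypothetical raw date strings.
data = pd.DataFrame({"date_added": ["September 25, 2021", "July 22, 2021", "January 1, 2019"]})

# Convert text to real datetimes before plotting.
data["date_added"] = pd.to_datetime(data["date_added"], format="%B %d, %Y")

# Aggregate to one count per year instead of one label per raw date string.
per_year = data["date_added"].dt.year.value_counts().sort_index()
print(per_year)
```

per_year.plot.bar(), or any bar chart, then shows one bar per year.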
This is so interesting. This definitely helped me gain more interest in Data Analysis. Thank you.
Hi Shashank, could we make a pie chart for the type column?
This is what aspiring data scientists and data analysts need 🙌 Awesome, bro
Loved this content! Great video, Shashank!
Great video! Please, do more of these:)
Great content, looking forward to it.
I love these videos , Thanks Shashank, Good Luck Bro
Thank you so much !
Great video man !
Thanks for the video, very helpful! I found that we can read full strings by setting pd.set_option('display.max_colwidth', None)
pd.set_option("display.max_columns", None)
Or "display.max_rows"
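Here is a sketch of the same display settings applied only temporarily with pd.option_context, so the pandas defaults come back once the block ends.

```python
import pandas as pd

df = pd.DataFrame({"description": ["x" * 120]})  # longer than the default 50-character limit

# option_context lifts the limits only inside the with-block, then restores them.
with pd.option_context(
    "display.max_colwidth", None,
    "display.max_columns", None,
    "display.max_rows", None,
):
    print(df)  # the cell is printed without "..." truncation

print(pd.get_option("display.max_colwidth"))  # 50, the pandas default
```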
What is the insight from this data?
what do u mean???
Hey Nishu great question! This video was really supposed to provide people with some ideas of how to start an EDA on the data. Oftentimes just knowing where to start can be very difficult
@@ShashankData thank u so much sir for clearing my doubts...
I'm here 2 years later with the same question