Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation)
- Published Jun 15, 2024
- Data is everywhere! Enhance your career and acquire new skills by taking a course on DataCamp! Click here to take the first chapter of any course for FREE: bit.ly/36lKg44 (you’ll be supporting my channel too!)
In this video we scrape Wikipedia pages to create a dataset on Disney movies.
The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.
We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with Pytest
- Pattern matching with regular expressions (re library)
- Working with dates (datetime library)
- Saving & loading data with Pickle library
- Accessing data from an API using Requests library
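The Pickle and datetime topics above interact in a way that is easy to trip over: Python's json module cannot serialize datetime objects on its own, while pickle can. The following is a minimal sketch of that difference, not the code from the video. The sample record and the helper names (`save_pickle`, `load_pickle`, `to_json_text`) are made up for illustration.

```python
import json
import os
import pickle
import tempfile
from datetime import datetime

# Hypothetical record for illustration only (not from the video's dataset).
movies = [{"title": "Example Movie", "Release date": datetime(2020, 1, 31)}]


def save_pickle(data, path):
    """Write any picklable object to disk in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_json_text(data):
    """Serialize to JSON, turning datetimes into ISO 8601 strings."""
    def encode(value):
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")
    return json.dumps(data, default=encode, indent=2)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "movies.pickle")
    save_pickle(movies, path)
    print(load_pickle(path) == movies)  # datetimes survive the pickle round trip
    print(to_json_text(movies))         # datetimes become plain strings in JSON
```

Pickle keeps the Python types intact but is Python-only, and you should only unpickle files you trust. JSON is portable, at the cost of turning dates into strings.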
Link to code & datasets: github.com/KeithGalli/disney-...
Previous tutorial on Beautiful Soup: • Comprehensive Python B...
If you enjoyed this video, make sure to like & subscribe :)
This video was sponsored by DataCamp
---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup
Task #1: Scrape the infobox from Toy Story 3 wiki page (save in python dictionary) (4:24)
Link: en.wikipedia.org/wiki/Toy_Sto...
Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
Link: en.wikipedia.org/wiki/List_of...
30:30 - Robots.txt (Are you allowed to scrape a site?)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1],[2],etc) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
Task #4: Attach IMDB, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)
Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
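Several of the Task #3 steps in the timeline above (3.1, 3.4, and 3.6) come down to small string-cleaning helpers. Here is a rough sketch of what such helpers could look like using only the standard library. It is not the code written in the video: the function names and the two date formats are assumptions, and real infobox values will have more edge cases than this handles.

```python
import re
from datetime import datetime


def strip_references(text):
    """Remove bracketed citation markers such as [1] or [12]."""
    return re.sub(r"\[\d+\]", "", text).strip()


def minutes_to_int(running_time):
    """Return the first integer in a string like '103 minutes', or None."""
    match = re.search(r"\d+", strip_references(running_time))
    return int(match.group()) if match else None


def parse_date(text, formats=("%B %d, %Y", "%d %B %Y")):
    """Try each assumed format in turn; return None when nothing matches."""
    cleaned = strip_references(text)
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(strip_references("103 minutes[1]"))
    print(minutes_to_int("103 minutes[1]"))
    print(parse_date("June 18, 2010[2]"))
    print(parse_date("To be announced"))
```

Returning None instead of raising keeps a single odd value from stopping a loop over hundreds of movies, and makes the failures easy to count afterwards.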
---------------------
Extra resources!
Setup Jupyter notebook: jupyter.readthedocs.io/en/lat...
Google Colab (cloud-based notebook): colab.research.google.com/
Learn regular expressions: • Python Tutorial: re Mo...
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli
---------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
Hey everyone! Been a while, but happy to be back 😊. Spent a while putting this video together so hope you enjoy it!
For more great learning resources (and to support my channel in the process), be sure to check out DataCamp. Click here to take the first chapter of any course for FREE: bit.ly/36lKg44
As always if you have any questions or suggestions for future videos, feel free to let me know here in the comments!
Welcome back, I enjoyed your video and learned a lot, thank youuu 💛keep up the good work💛
@@spiritedaway99 glad to hear it! Thanks for the kind words :)
Hi Keith, I've been on this bs4 project but navigating through "load more" button has been a great challenge. Checked stack overflow and web, more solutions ain't working while others suggested selenium which I don't wanna use (if that's the only solution, I will).
How have you been able to get more data embedded in load more button?
Could anyone help me write a description of this project, mentioning all the libraries used, so that I can put it on my resume?
Hi! :) Just want to say a huge thank you! I have found your videos as a total beginner, so thanks to this comment, I found DataCamp. Courses there are just great, and students have 3 months trial for free with GitHub!
I am 24 and unemployed and desperately applying for jobs while daily watching Keith's videos and working on projects to shape up my CV. I don't know when will I get the breakthrough but I will definitely appreciate the honest work by Keith. I am falling in love with Data Science with each passing day. I hope someday I will be able to enter a job in Data Science. Much Love and Peace.
I wish you the best of luck! Keep at it, you'll get the breakthrough soon enough.
Same here. I'm 24 and I can't get a job with the course I studied at university. Then I stumbled into data science and analytics. Now I know how to use several software tools, but I have yet to apply them in a real-life job.
Keep that attitude, you'll get there. I'm a marketer which means i need to work with data everyday and keith's videos teach me a lot.
and how it's going so far bruh?
@@ukasz8631 he's busy, buddy, he got a job
Taking a data science bootcamp. I learned so much more from you than my program. They only explain concepts. You explain your thought process which is much more valuable. You’re a true blessing! Thank you so much!
Indeed
Probably one of the best data cleaning and analysis tutorials on youtube. Clean and concise, straight to the point.
Heyy, I really love the way you also show the behind the scenes process, it really teaches a lot and separates your tutorials from typical tutorials :)
You have seriously pushed me to a whole new level with Python. Thank you for your great videos and resources, man. Writing my thesis this spring, and your help has made me efficient enough to actually rely on Python for my data crunching.
Awesome Keith! I can't thank you enough for sharing your knowledge and skills to us at no cost. God bless you and more power!
Thanks for taking the time to make a scraping step by step video. Many other python videos exist but I love how you take the time to stop and explain things step by step. That has helped my learning journey so much! Really appreciate your hard work. Keep it up !
for months i have been struggling in how to structure my learning journey through projects and ive finally found you, THANKS MAN
Best instructor ever.
Dude your lecturing skills are priceless.
Amazing content ❤
Very long, very educational video here, Keith. I have watched and followed along over a period of 2 weeks and it's been worth my while. As a newbie to the Data Science and Engineering world, I truly appreciate the work you put into these videos. Thanks a lot, Keith.
Hi Keith,
Thanks for making this video. It made it much easier to understand the flow and practical use of bs4 for gathering data.
Great work.
Brilliant content and presentation style, Keith. I got everything working except extracting the API key from the Environment variable (ended up hard coding which worked). Thanks again!
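For anyone else who gets stuck reading an API key from an environment variable, here is a small sketch of one way to do it with the standard library. The variable name `MOVIE_API_KEY` and the helper `get_api_key` are hypothetical, so substitute whatever name you actually exported.

```python
import os


def get_api_key(var_name="MOVIE_API_KEY"):
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(var_name)  # returns None when the variable is not set
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set. "
            "Set it in the shell that launches Python, then restart."
        )
    return key
```

One common cause of a missing key is that the variable was set after the notebook server or editor had already started. A process only sees the environment it was launched with, so restarting it from a shell where the variable is set is worth trying before hard-coding the key.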
i just want to say all ur vids has been sooo helpful. honestly. thank YOU
Hi Keith,
I am a final semester MS Data Science student and I really loved how well you explained complex tasks. You searching for code help in google to solve the problem made the videos more realistic and relatable. Thank you for sharing. Subscribed!
Love to watch this video and have learned a lot from you. Thank you so much for your kind work. You shine! 🙌
Bro , Love the way you Talk to yourself when figuring out a problem because that gives me an insight on how a professional thinks! Secondly the thing that you dont crop out the parts when you run into problems is veryyy helpful . Thanks man!
Can't think of any video better than this! Keep it up, man, we need these videos!
So much value in each video. You definitely found your niche.
Oh great. Thank you very much for another great video. I was just about to stop learning Data Science when I watched your videos on Pandas and NumPy. Thank you for bringing me back to Data Science.
Such a great job Keith, really appreciate how you explain things in such a cool and practical manner, including the very regular Google searches, so that novices like myself can understand and relate very well. Your command of Python is commendable. Good job, keep it up and God bless.
Well structured tutorial with live bug solving !! This is actually what anybody should refer to !!
Thank you so much. This is a very big help in understanding the step-by-step data creation.
Thank you Keith. I am a beginner and I really enjoyed this. Keep up the good work.
Amazing! thank you for putting this out there, cannot wait to follow along.
Phew!! That was one incredible tutorial!! A BIG THUMBS UP!!
Hi Keith, I am a student currently in college studying geophysics. I just want to say thank you very much for your videos it helps me a lot. Big love from Indonesia. Again, thank you. (also I know this is really specific to my range of study, but I would love to learn 3D data modelling with python)
Thanks. Following you through the whole process is great learning experience, especially when you stop solving the little bugs underway which always will be there in webscraping. Well explained, structured, realistic and a gift for people who want to learn the topic - keep up the good work!
Thank you for the kind words! Glad you enjoyed :)
I don't deserve this. Thank you, Keith. I'll work on a little each day and finish by end of October lol.
Haha that seems like a good plan to me! The video isn't going anywhere :)
First time got to know Regular expression is such a powerful tool !! Thanks a ton!!
Hi! Thanks for this great video, i've been looking at your python tutorials and they are great!, thank You so much. Regards from Colombia Latam.
This is amazing magic, Keith. Your tutorials are helping me become a pro in Python and keeping me ahead of many of my friends. Thanks a ton SIR...
1. Get a head start
2. Flex on your friends by sharing this video and your results
3. Increase Keith's ad revenue
4. Becoming elite data scientists with your friends
Thanks for such an easy-to-follow video. It's my first lesson in Beautiful Soup.
Dude your tutorials are awesome, keep it up!!!
An absolutely amazing video. Followed it till the end and learned a lot. Keep them coming please
Thank you so much for this. I took my time on this and have learned so much from this project. I really appreciate it!
Hey Keith.....Thanks for putting in this effort. It is truly appreciated.
A/B Testing! This would be a great next project, please do this
Will look into it!
Keith Galli yo thanks a ton! Been learning through you for a couple months now and I’ve improved so much.
@@MashiroRedo You're very welcome! Happy to hear the videos have been helpful :)
@@KeithGalli please do A/B testing project bro
@@akshaykumarsingh9770 yes I would love to see a real world A/B test project. Also the Seaborn library would be a nice tutorial.
Thank you so much! Now I have the confidence to do projects on my own, you changed my life. It would be great if you could do videos on Tableau! :)
Thank you for these videos Keith. Just finished my masters in physics and wanted to brush up on some python, your video style, coding expertise and enthusiastic approach is top-notch. SUBSCRIBED! **
Thanks for subscribing! I appreciate the kind words :)
I really learn so much from you, please keep making all this amazing video
Thanks a lot Mr Keith Galli.......Your videos are extremely good.
Thanks for this fantastic lesson. A long video focused on solving a real problem is so useful alongside the library tutorial videos. Thanks again!
Watching him feels like pair programming with a good friend. 👍🏻👍🏻👍🏻
Thanks for sharing. Your videos have indeed been adding value to my data science career...
What perfect timing. I was thinking of the same thing, but instead of movies I wanted to scrape game information. I was searching YouTube to find something, and you uploaded this at the perfect time. Thanks dude 😀
Haha that's awesome! Hope you enjoy it :)
Hey Keith, I've learned more from your project tutorials in a month than from any other online course. They are really, really helpful. Can you make an intro tutorial on SQL and project tutorials too...?
I love learning from you, please keep it up. Your videos are very fruitful, thanks a lot!!
a couple of years ago (5 years) i did a pokemon pandas tutorial from you and it totally got me into the world of data science. i came back to say thanks for that tutorial. It really helped me. now am a python instructor.
Love it!! Thanks for the message. Glad I could play a small part in your journey. Congrats on being an instructor now.
Great tutorial man, love how you find content_key and use it as the key in the dict.
Dude your channel is a mine of gold
A very big thumbs up for you! Totally enjoyed it! Thank you so much for your effort!
You're very welcome! Glad you enjoyed :)
Really like this messy, real world, non-toy example. All my scraping projects get messy quick with dirty looking edge case handling. Glad to know I’m not actually doing it wrong, it’s apparently just part of the process.
Good project...Keith. Loved it very much. More expected.
This is great! I was testing my scraping skills on some wikipedia pages, but I couldn't find anything as rich as this. Thanks, I will enjoy watching this.
Glad it was helpful!
Excellent video. I enjoyed every minute of it and look forward to the next one. Thanks for your hard work!
Glad you enjoyed and you are very welcome!!
oh, man! you made my day - though I've just completed Task 1!
Thanks man! Learned a lot! Keep it up!
Hey! I love the videos they are super helpful in giving me the info to start my own projects. What would you think about doing regression videos using sklearn library (or a better library)? I can't find anything good on the internet that actually helps me learn how use it for myself later on.
Edit: I finally figured it out and it was surprisingly simple to do linear regression with just a few lines of code. Some regression analysis videos would still be awesome though.
yessssssss!!!
Glad to see your videos man, they have really helped me a lot. Keep it going!
Happy to hear that :). More to come!
Awesome
Thanks for sharing something good which help me to improve skills 🙂
So informative, just absolutely love to see it
Thank bro!! 🙏
Thank you for sharing! You are amazing. It would be great if you make videos about docker, and spark.
I am happy that you're back. I watched all the new videos, and I was waiting especially for this dataset analysis.....
Thanks for such a wonderful video. This is very helpful
Data automation & Scheduling ! This would be a great next project, please do this keith
Yeah I agree something to do with automation/scheduling is a great idea. I'll look into it!
Very good video! Thanks a lot. Greetings from Chile!
Very clever what you did with the currency conversion. Had to twist my head around it a few times before I got the hang of it 🙂
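For readers who want to experiment with the "Budget" and "Box office" conversion themselves, here is one possible sketch using regular expressions. It is not necessarily how the video does it: the names `money_to_float`, `WORD_PATTERN`, and `PLAIN_PATTERN` are made up, it only handles dollar amounts, and for a range such as "$3-4 million" it simply takes the lower bound.

```python
import re

# Multipliers for the magnitude words assumed to appear in the strings.
WORD_MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# A number such as 200, 1.067 or 3,500,000.
NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"

# "$200 million" or "$3-4 million" (the upper end of a range is ignored).
WORD_PATTERN = re.compile(
    rf"\$\s*({NUMBER})(?:\s*[-–]\s*{NUMBER})?\s*(thousand|million|billion)",
    re.IGNORECASE,
)
# "$3,500,000" with no magnitude word.
PLAIN_PATTERN = re.compile(rf"\$\s*({NUMBER})")


def money_to_float(text):
    """Convert a dollar string to a float, or return None if nothing matches."""
    word_match = WORD_PATTERN.search(text)
    if word_match:
        value = float(word_match.group(1).replace(",", ""))
        return value * WORD_MULTIPLIERS[word_match.group(2).lower()]
    plain_match = PLAIN_PATTERN.search(text)
    if plain_match:
        return float(plain_match.group(1).replace(",", ""))
    return None
```

Checking the pattern with the magnitude word first matters: otherwise "$200 million" would match the plain pattern and come back as 200.0.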
Hey Keith, great tuto !
I once tried to web scrape a book database with a web browser add-on, and it was awful. I'll check out your other videos, no doubt!
Man you are gold! Thank you!
Hey Keith, please do SQL tutorial!
Wow, great video. Thank you.
Exactly what I was looking for.
Really loved your videos! Helped a lot! Thank you so much. And btw interested in making any videos about ab testing or data engineering? 😊
You are amazing man keep up the great work
Awesome video!! Thanks for the effort.
Bro Really Superb, I really learned pandas because of you your realtime problem solving helped me a lot , put lot more videos , great job continue it ❤️❤️❤️❤️
awesome content keith , learned a lot ! Keep such videos coming !
We are infinitely indebted to you.
Thanks for sharing this wonderful content 🙏🏽🔥.
It would be very exciting if you did an analysis report on Marketing Analytics. Thank you for making this video
Great video, excellent explanation. Could you make more video in solving real world data science projects? I would like to learn more from you. And could you make another video in cleaning data in Python? Thank you so much. Wish you all the best.
Keith, you are the GOAT
Wonderful work in putting together this very easy to understand tutorial. A big thank you from me.
would be great for a follow up is to put this on a schedule and load this onto a Heroku server or in a docker file for a home NAS to run. this would be great to have a periodic scraping of news or Covid update data to be sent to our own phones via Telegram or Slack or Discord.
Hi Keith, thanks for the explanations you do, they help a lot.
I want to ask a question:
I am following this playlist in your channel (data science), do you think just following and understanding the steps is enough or should i repeat them in order to learn better ?
Awesome tutorial thank you!
Hi Keith and Thank you for your great videos. Since you are interested in data science do you think that you should have gone for a master in data science or do you think that going for msc in AI really has many similarities?
3 hours well spent, thank you @Keith Galli
Extremely 🥇 wonderful, Go ahead…
🙌🏻 looking forward to more of these data science works
Glad to hear it! :)
Keith Galli your sales analysis video is one of my favs. 💯❤️
kudos to your efforts!
Hey Brother I just love you for the amazing work....keep it up
I appreciate the support 😊
Nice , Your work is really great ,i want more videos like this!!!!!!!
Thank you! Hope to make more videos like this fairly soon
Sir your explanations are great and i heavily depend on your videos. please consider making a SciPy tutorial also as i'm not able to understand what my teachers are teaching. thank you sir
I don't know , but I learned a lot from your videos .....
Thanks keith
Happy to hear you learned a lot from the videos!
good it's a very real python web scraping. nice job
Thank you!
Thanks for this!👏🏻
Hi Keith,
First of all CONGRATS for all your great work, not only in this video but in general explaining all the path of the Data Science section within your profile.
I got one question regarding to the exercise #2:
I am using a different wikipedia link (similar to the Disney movies one) and when I try to print the info_list, this one appears empty. Do you know the reason? I guess it´s because the link format is slightly different but still I should be able to print the list.
Thanks in advance!
Why I didn’t find this channel earlier.❤️❤️❤️❤️❤️
you a blessing bro, thank you
Thank you for the kind words!