Comprehensive Python Beautiful Soup Web Scraping Tutorial! (find/find_all, css select, scrape table)

  • Published 31 May 2024
  • Practice your Python Pandas data science skills with problems on StrataScratch!
    stratascratch.com/?via=keith
    In this video we walk through web scraping in Python using the Beautiful Soup library. We start with a brief introduction to HTML & CSS and discuss what web scraping is. Next we get into the basics of the Beautiful Soup library. This includes how to load a webpage, the basic commands you need to know such as find & find_all, grabbing strings from an HTML element, etc. The final section of this tutorial is a series of exercises where you can practice your skills. In this section we scrape a webpage for links, we learn how to scrape a table and load it into a pandas DataFrame, and we see how you can scrape & download a web image. Hope you enjoy!
    I’m looking into making future videos on more complex things you can do with web scraping as well as other helpful libraries such as Selenium & Scrapy. Subscribe to not miss those.
    Join the Python Army to get access to perks!
    YouTube - / @keithgalli
    Patreon - / keithgalli
    ---------------------
    Resources used in this video
    Simple webpage: keithgalli.github.io/web-scra...
    Example webpage: keithgalli.github.io/web-scra...
    Link to source code: github.com/KeithGalli/web-scr...
    Beautiful Soup Documentation: www.crummy.com/software/Beaut...
    CSS Selector Reference: www.w3schools.com/cssref/css_...
    ---------------------
    Learn more about HTML/CSS
    @Traversy Media HTML Crash Course: • HTML Crash Course For ...
    @Traversy Media CSS Crash Course: • CSS Crash Course For A...
    Codecademy: www.codecademy.com/catalog/la...
    ---------------------
    Video timeline!
    0:00 - Intro & Video Overview
    1:09 - What is web scraping?
    3:51 - Introduction to HTML
    Using the beautiful soup library (5:29)
    6:31 - Loading in a webpage (requests library)
    8:21 - Starting to scrape
    9:18 - find & find_all methods
    16:00 - Finding specific text/strings in our HTML (regex)
    18:38 - Select method (CSS path selections)
    25:55 - Grabbing the string/text from an HTML element
    28:17 - Getting a property of HTML element (href, src, id, class, etc)
    29:41 - Code navigation (parents, children, siblings)
    Let’s practice our skills! (33:57)
    35:53 - Exercise #1: Grab all social links on webpage in 3 different ways
    42:09 - Exercise #2: Scrape an HTML table into a Pandas Dataframe
    53:09 - Exercise #3: Grab all fun facts that contain the word “is”
    57:59 - Exercise #4: Use beautiful soup to help download an image from a webpage
    1:04:20 - Exercise #5: Solve the mystery challenge!!!
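
The core calls named in the timeline above (find, find_all, select, grabbing text, reading an attribute, navigating to a parent) can be sketched as follows. This is a minimal illustration, not the code from the video: the HTML string is a made-up stand-in for the page the video loads with requests, and it uses the built-in "html.parser" so nothing is fetched over the network.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in page (an assumption); the video loads a real page instead.
HTML = """
<html><body>
  <h1 id="title">HTML Webpage</h1>
  <p>Link to more interesting example:
     <a href="https://example.com/more">More</a></p>
  <h2>A Header</h2>
  <p><i>Some italicized text</i></p>
  <h2>Another header</h2>
  <p id="paragraph-id"><b>Some bold text</b></p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find returns the first match (or None); find_all returns every match.
first_h2 = soup.find("h2")
all_h2 = soup.find_all("h2")

# Narrow a search by attribute, or by the tag's text with a regex.
by_id = soup.find("p", attrs={"id": "paragraph-id"})
by_text = soup.find_all("h2", string=re.compile("header", re.IGNORECASE))

# select takes a CSS selector path and always returns a list.
via_css = soup.select("p#paragraph-id b")

# Text, attributes, and tree navigation.
header_texts = [h.get_text() for h in all_h2]
link_target = soup.find("a")["href"]
parent_name = via_css[0].parent.name
```

With a live page, the only change would be building the soup from the body of a requests response instead of the inline string.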
    ---------------------
    Follow me on social media!
    Instagram | / keithgalli
    Twitter | / keithgalli
    ---------------------
    If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
    *I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

COMMENTS • 448

  • @BennyHarassi
    @BennyHarassi 3 years ago +429

    Shouts to Keith for giving us all an MIT education without the MIT debt

    • @KeithGalli
      @KeithGalli 3 years ago +130

      Haha I took one for the team xD

    • @Viralvlogvideos
      @Viralvlogvideos 3 years ago

      haha

    • @hkemal2743
      @hkemal2743 3 years ago

      Haha. That was a good one.

    • @krishnahare3638
      @krishnahare3638 3 years ago

      @KeithGalli How do I start? I've never been good at math, I'm 50 years old and sitting at home. Thanks ;-)

    • @tanmaytiwari2450
      @tanmaytiwari2450 2 years ago

      @KeithGalli since Breaking Bad minivans are you know swag 😉

  • @apsilal
    @apsilal 3 years ago +37

    I paid for a bootcamp to learn this. But Keith, you are way above all that. I only understood the concepts from your video. I owe you man!! Keep going and please don't stop putting up such videos.

    • @KeithGalli
      @KeithGalli 3 years ago +5

      I appreciate the support! Happy that the videos are helpful

  • @TheFearlessGoat
    @TheFearlessGoat 3 years ago +6

    I love that you have exercises for us to do in the videos! Learned so much from this.

  • @doomimic315
    @doomimic315 3 years ago +16

    This tutorial was incredibly helpful! Web scraping is something I've always found interesting but just hadn't been bothered to start learning, yet this video made it easy to understand and covered a huge range of ways to deal with potential problems. Seriously can't thank you enough for this video and will certainly be sticking around for any new tutorials you upload.

  • @symnshah
    @symnshah 3 years ago +5

    I have watched a couple of other videos on BeautifulSoup but believe me this one from Keith is the best one. Keith will take you from scratch to a decent level. Thank you so much.

  • @dusty6193
    @dusty6193 1 month ago

    Only a third of the way through this video and I already feel like I understand this better. Thank you, brand new at this

  • @MarceloSantos-nc9wq
    @MarceloSantos-nc9wq 3 years ago +3

    Keith, many thanks for giving us so much excellent information about hard topics. You make things seem totally simple to do. Sincerely, your tutorials are the best. Again, thank you so much for sharing all of this with us.

  • @lefu7812
    @lefu7812 3 years ago +63

    Your tutorials are the best, honestly. Thank you so much for doing this.

    • @KeithGalli
      @KeithGalli 3 years ago +7

      Glad you enjoy them!! You're very welcome :)

  • @santoshvaidya3752
    @santoshvaidya3752 3 years ago +3

    This is one of the finest training videos I have ever seen. You are an amazing trainer and, most importantly, you explain things in very simple English, with examples and exercises that give viewers hands-on experience. Thanks.

  • @soesevenonesix
    @soesevenonesix 11 months ago

    Keith, your videos are excellent. You are totally getting me through grad school just watching your tutorials. Keep it up!

  • @OgoidRei
    @OgoidRei 3 years ago +1

    Thank you Keith, amazing content, easy to follow, clear explanation, great exercises (with walkthrough) and love the funny breaks/comments during the video. Followed and liked

  • @LoganNinefingers
    @LoganNinefingers 3 years ago +3

    Keith, you'll be the first one I cite when I write my Nobel prize winning book or whatever it is Nobel prize winners write. Golden content. Gracias!

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 2 years ago

    The best thing about your tutorials is that you start from scratch, teach the basics, and explain each fragment of code along with the concept. Love from India.

  • @chiranthchangappa6231
    @chiranthchangappa6231 3 years ago +1

    One of the best web scraping contents I've seen to date. But the ending was hilarious!

  • @rogerwprice
    @rogerwprice 3 years ago +1

    Another fabulous real-world tutorial. Thanks for the Google and Stack Overflow searches and the errors with recovery!

  • @investandcyclecheap4890
    @investandcyclecheap4890 3 years ago +10

    This is such a great tutorial! I loved being able to pause and figure out the problems on my own. I really learned a lot! Thanks Keith, you rock!

  • @ranveersharma1666
    @ranveersharma1666 3 years ago +9

    I am from India. We really don't get this quality stuff here, so thanks to YouTube and you for spreading wonderful knowledge. Keep rocking!

  • @lokotock
    @lokotock 3 years ago

    Thanks a lot! Your videos are clear and pretty useful! And it’s a joy watching them! I’m glad that I found your channel ✨

  • @khinekhinezaw65
    @khinekhinezaw65 3 years ago +4

    This is the best web scraping tutorial. Thank you so much!

  • @wahaha108
    @wahaha108 2 years ago

    The best Python video I have ever seen. No wasted words, dives right into the important topics. Lol, great!

  • @gavreleric3493
    @gavreleric3493 3 years ago

    Wow, really impressive. One of the best channels! Keith, you are very clear with your explanations.
    Thank you for sharing your knowledge :)

  • @nasser_omar
    @nasser_omar 2 years ago +1

    Hi Keith,
    I'm really excited to watch this video. Actually, I have watched all of your Python-related videos, especially the Pandas one.
    Keep going, and I hope to meet you one day.
    THANKS A LOT

  • @alic
    @alic 1 year ago

    Brilliant, amazing channel. Major kudos to you Keith!!

  • @kinwong6383
    @kinwong6383 3 years ago +3

    CV Update: Web Scraping expert.
    Joke aside what an awesome tutorial. Felt so satisfying to get the secret message with what you taught!!
    Brilliant work!

    • @KeithGalli
      @KeithGalli 3 years ago +3

      Haha love to hear it! I had a lot of fun putting this one together, so I'm happy to hear that you enjoyed it :)

    • @andyn6053
      @andyn6053 3 years ago

      @KeithGalli your tutorials are really appreciated! thanks man :)

  • @user-ke5gm4sf8c
    @user-ke5gm4sf8c 7 months ago

    One of the best Beautiful Soup videos, and I really want to say thanks, Keith!

  • @modernmistyk4341
    @modernmistyk4341 2 years ago

    You saved my life, I hope you're getting all the beautiful things in life you deserve

  • @dhruvrathore2022
    @dhruvrathore2022 3 years ago +32

    Please do a Seaborn tutorial like you did with Pandas, Matplotlib, etc.! I watched all of them, really glad I found your channel. Simple, informative & on point.

    • @andyn6053
      @andyn6053 3 years ago

      @Lucas agree, Derek Banas has a great Seaborn tutorial at his channel!

    • @fardinahsan2069
      @fardinahsan2069 2 years ago

      If you know matplotlib you know most of seaborn. It's a wrapper around matplotlib, so matplotlib calls still work on the figures and axes that seaborn produces.

  • @nallym82
    @nallym82 3 years ago +1

    I am very glad that I found your videos. I learnt more from you than all other tutorials combined. Please do a tutorial on xlwings. Thank you

  • @lalitsharma-gl4kr
    @lalitsharma-gl4kr 3 years ago

    Value for time invested in watching your videos. Along with the subject knowledge, we understand how to practically approach a problem. Thanks a ton for sharing your knowledge.

  • @andrewp319
    @andrewp319 3 years ago +1

    This is by far the best tutorial I have found after searching through the internet for hours. I subscribed just because of this one great video. Please keep doing videos of practical applications of Python. Project tutorials are the best.

  • @adrianapetrova196
    @adrianapetrova196 3 years ago +8

    The last time I tried to understand BeautifulSoup I gave up. You explain it in a way that is so easy to understand. Thanks for the hard work and the time you spend on teaching us :)

    • @KeithGalli
      @KeithGalli 3 years ago +2

      Love to hear it! You are very welcome :)

    • @rahuldavid4831
      @rahuldavid4831 3 years ago +1

      Me too! It's almost like Keith is a godsend

    • @andyn6053
      @andyn6053 3 years ago

      @rahuldavid4831 he sure is :)

  • @adrianobavaresco76
    @adrianobavaresco76 1 year ago

    Thank you Keith! This is the best video that I have watched about bs4. 👏👏

  • @Amulya7
    @Amulya7 1 year ago

    Beautiful video Keith. Loved it.

  • @pablomora7880
    @pablomora7880 3 years ago

    Well done! First class of Web Scraping! Awesome

  • @sarahburkhardt2037
    @sarahburkhardt2037 3 years ago

    Thanks for sharing this! I am mostly just popping in to learn, but this is helping me know how to think about data & see that there are a lot of options.

  • @shin-mg7hn
    @shin-mg7hn 1 year ago

    Your video really helped a lot in understanding Beautiful Soup, thank you, Keith!

  • @mohitupadhayay1439
    @mohitupadhayay1439 2 years ago

    As someone earlier said, Big SHOUT OUT to Keith for getting the community such amazing content!

  • @rodrigomonteiro8780
    @rodrigomonteiro8780 3 years ago

    Man, you saved my life. Your tutorials are amazing.

  • @Dee-bk3gk
    @Dee-bk3gk 3 years ago

    You have a lifetime sub from me. Been looking for videos like this for a long time. Keep up with the great content!

  • @rahuldavid4831
    @rahuldavid4831 3 years ago +8

    Thank you so much for this wonderful tutorial Keith! Words cannot describe how much I am grateful to you for making this gem of a video that covers everything you need to successfully scrape a webpage! Trust me when I tell you that NOBODY HAS MADE A BETTER VIDEO ON BEAUTIFULSOUP than you!!! If I could have the liberty of suggesting future videos, I would love if you made a video about "Regular Expressions". Keep up the good work and God bless!!!

    • @KeithGalli
      @KeithGalli 3 years ago

      Very happy to hear you enjoyed!! A regex video is a great idea :)

  • @ikki411
    @ikki411 3 years ago +1

    This tutorial was incredible. I've done 2 Python courses that touched the 'Web Scraping' subject, but I wasn't able to fully understand it. This video was one of the two videos that made me fully understand it, and I couldn't be more happy about it. And finding out the secret message was amazing too :D

    • @h4zmeister
      @h4zmeister 10 months ago

      Wanna share the other video you found helpful? :)

  • @futuregootecks
    @futuregootecks 1 year ago

    Wow path navigation is so powerful! Thanks for this!

  • @Some_random_guy_16
    @Some_random_guy_16 3 years ago

    Oh man, your tasks are excellent. They helped me gain more confidence in working with soup.

  • @irfanshaikh262
    @irfanshaikh262 3 years ago +1

    Subscribing, coz I loved it.
    Glad I found you @keith.
    Exploring your channel now.
    Appreciate the way you did it so perfectly making it simpler to understand for me.

  • @PrielCohen1
    @PrielCohen1 11 months ago

    Thank you for the video!
    You explain things so clearly

  • @bhupindersingh4347
    @bhupindersingh4347 2 years ago +1

    This is a very well organized web scraping tutorial. Thanks for sharing.

  • @manu93ize
    @manu93ize 3 years ago +1

    By far the best tutorial on YouTube for web scraping. You are very good at dumbing it down; even a total beginner can understand.
    Waiting for an NLTK tutorial.
    Thank you

  • @esspi9
    @esspi9 3 years ago +11

    Amazing.
    Thanks Keith!
    Looking forward to the Selenium and scrapy series.

    • @KeithGalli
      @KeithGalli 3 years ago +3

      You're welcome!

    • @esspi9
      @esspi9 2 years ago

      @theduck3126 Try John Watson Rooney's channel.
      He's got everything covered.

  • @kaustubhgupta12
    @kaustubhgupta12 3 years ago +17

    When Keith does it, it's perfect 🤩

    • @KeithGalli
      @KeithGalli 3 years ago +1

      Aww I appreciate that 😊

  • @kallenmulilonalyanya4181
    @kallenmulilonalyanya4181 1 year ago

    I like how you make really scary stuff simple. Bravo man.

  • @armandoacevedoluna3393
    @armandoacevedoluna3393 3 years ago

    Yes! Awesome tutorial dude. Looking forward to your next web scraping video. Cheers!

  • @ClaireCodesStuff
    @ClaireCodesStuff 3 years ago +6

    This is a fantastic tutorial. When I last tried to learn Beautiful Soup, we were in the awkward transition phase between Python 2 and 3 and every tutorial was in Python 2 because they hadn't released code for 3 yet. I learned 3 because it was "the future". Of course, I then wanted to use BS so I had to try and figure out what I wanted to do in Python 2. I gave up in total frustration. This is a crystal clear guide and now I actually understand how it works and how to use it. Thanks Keith!

    • @KeithGalli
      @KeithGalli 3 years ago +1

      Happy that this tutorial could clarify the details and remove some of that frustration! :)

  • @jamesdavies5386
    @jamesdavies5386 1 year ago

    Hey this tutorial is great! I've been looking for a decent one like it for some time now and I can't believe it took the algorithm this long to show this on my recommended page

  • @abdoooooo8583
    @abdoooooo8583 3 years ago

    Great video... and I've watched A LOT of videos about Beautiful Soup. Keep going with the series

  • @iklintsov
    @iklintsov 3 years ago

    best most concise and detailed tutorial on bs

  • @fabianrestrepo82
    @fabianrestrepo82 3 years ago +2

    Man watching that ending was almost like watching Jack sink, beautiful ending!! keep it up man, great content

  • @muthonigathage263
    @muthonigathage263 1 year ago

    This was a fun video! Thank you Keith Galli.

  • @amranazad4540
    @amranazad4540 3 years ago

    This guy deserves the world

  • @benlucke7763
    @benlucke7763 2 years ago

    Thanks for the tutorial Keith! Keep up the great work

  • @ivm6878
    @ivm6878 3 years ago

    Thank you Keith, love your tutorials! I was able to solve the last exercise :D

  • @vatsdimri3675
    @vatsdimri3675 2 years ago +1

    Really learned a lot. Loved the exercises.

  • @carlmerrigan5403
    @carlmerrigan5403 2 years ago

    Thanks for the great tutorial, Keith!

  • @gyugyugyu.1
    @gyugyugyu.1 3 years ago

    Love your videos, I'm watching them nonstop... thank you ❤️❤️

  • @muhammadkazimraza3456
    @muhammadkazimraza3456 1 year ago +1

    Very very good video and great exercises, especially the last one.
    Thanks for such videos

  • @WondererSeeker
    @WondererSeeker 3 years ago

    Very good video Keith! Very clear and useful. Thank you.

  • @luchoargentina1
    @luchoargentina1 3 months ago

    Your videos are incredible!! Thanks Keith

  • @unsignedperson476
    @unsignedperson476 3 years ago

    You are perfect! You know how to teach. Thank you so much man. Liked your style, and finally got the subject I had been struggling with. Liked and subbed.

  • @zainabkhan5859
    @zainabkhan5859 2 years ago

    This is exactly what I was looking for. Thumbs up Keith for this awesome video :-)

  • @soumyaranjandash3597
    @soumyaranjandash3597 2 years ago +1

    Amazing Lecture. Here we understood Everything. Thanks a lot Broo 🔥👍🙂

  • @tralfazy
    @tralfazy 9 months ago

    Great video and well done. I learned a lot from it. Thanks Keith!

  • @danniliu2544
    @danniliu2544 2 years ago +1

    Hi Keith, I second many of the viewers' comments. Your tutorial on web scraping is by far one of the best ones out there. Thank you so much for producing this. I do have a question though. Hope you can help clarify, I've not had much success googling. Can you clarify the difference between the select function and the find_all function? When would you use one over the other?
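
One way to look at the select versus find_all question above, offered as a hedged sketch and not as an answer from the video: both return the matching tags, but find_all filters by tag name and attributes, while select takes a CSS selector, which can express structure such as "direct child of". The HTML below is a made-up example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption), chosen so the two approaches can be compared.
HTML = """
<div class="box"><p class="note">inside one</p><span><p>nested</p></span></div>
<p class="note">outside</p>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Same result two ways: every <p> with class "note".
with_find_all = soup.find_all("p", class_="note")
with_select = soup.select("p.note")

# A structural query: <p> tags that are direct children of div.box.
direct_children = soup.select("div.box > p")

# The find_all route needs an explicit second step for the same result.
box = soup.find("div", class_="box")
direct_children_manual = box.find_all("p", recursive=False)
```

For simple tag and attribute filters the two are interchangeable, so the choice is mostly taste; select tends to read more compactly once the query depends on nesting.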

  • @andvad6475
    @andvad6475 3 years ago

    Thanks Keith. A really great video. Keep them coming, really useful videos I am learning a great deal from you. Many thanks.

  • @pratiksarani4947
    @pratiksarani4947 1 year ago

    Wow, a fun exercise!! Had great fun. Next one is the Pandas one

  • @chineduezeofor2481
    @chineduezeofor2481 3 years ago

    Wow! This is just too good. Thanks for the video Keith

  • @sagebaram5951
    @sagebaram5951 2 years ago

    How do you know you’ve learned something?
    Completing the challenge within 1 minute with no hints. Thank you so much for all your efforts :)!

  • @sunnywen9483
    @sunnywen9483 3 years ago +1

    So surprised to find a treasure of a YouTuber here. I will go through all your tutorials in my summer holiday, and I hope you gain more and more subscribers~

  • @aagambakliwal3654
    @aagambakliwal3654 3 years ago

    Thanks a lot for the comprehensive tutorial! Really appreciate it

  • @narcwatcher
    @narcwatcher 1 year ago

    SEMRUSH For Beginners!! Excellent Video.

  • @victordias8899
    @victordias8899 3 years ago

    Bro you're great at these videos. Keep it up. I'm very glad I found your channel and I'm learning a lot from you.
    Regarding the task of getting the "is" from fun-facts, you can get them by this simple one liner:
    [li.get_text() for li in webpage.select('ul.fun-facts li') if 'is' in li.get_text()]
    no regex, no extra loops... just plain string methods with list comprehension!
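
A runnable version of the one-liner above, as a sketch under stated assumptions: the list items are invented placeholders, not the facts on the tutorial page, and the "ul.fun-facts li" selector is taken from the comment. It also shows one caveat: a plain substring test matches "is" inside longer words such as "This" or "list", so a word-boundary regex gives a stricter reading of "contains the word is".

```python
import re
from bs4 import BeautifulSoup

# Placeholder facts (assumptions), picked to show the difference between the two tests.
HTML = """
<ul class="fun-facts">
  <li>Owned my dream car in high school</li>
  <li>Middle name is Ronald</li>
  <li>This fact mentions a list</li>
  <li>Favorite dessert is ice cream</li>
</ul>
"""
webpage = BeautifulSoup(HTML, "html.parser")
items = webpage.select("ul.fun-facts li")

# The comment's approach: substring membership, no regex.
substring_hits = [li.get_text() for li in items if "is" in li.get_text()]

# Stricter variant: "is" must appear as a whole word.
whole_word = re.compile(r"\bis\b")
word_hits = [li.get_text() for li in items if whole_word.search(li.get_text())]
```

Which variant is right depends on how the exercise is read; the substring form is shorter, the regex form avoids false positives.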

  • @bernardobritto8352
    @bernardobritto8352 3 years ago

    LOL, loved the secret message. Great work, thanks for the video

  • @jatinkumar4410
    @jatinkumar4410 3 years ago

    As usual.... Awesome Tutorial!!!

  • @vargabh8180
    @vargabh8180 3 years ago

    It was my first time learning from you and I must say it was pretty awesome:-)

  • @julianaaguiar6375
    @julianaaguiar6375 3 years ago

    The best videos! Love your videos and the way you present the ideas.

  • @schoolstudentarea4199
    @schoolstudentarea4199 3 years ago

    I wish I had a cool teacher like you

  • @shrutipancholi3544
    @shrutipancholi3544 3 years ago

    One spot for all my Python needs. Thanks Keith! ; )

  • @carlosroquesuarezgurruchag8681
    @carlosroquesuarezgurruchag8681 2 years ago

    Of course I will smash that button!! You're a star, my friend, thanks for the good vibes and dedication!

  • @AndyRhye
    @AndyRhye 3 years ago

    The idea with the secret message was super cool!) You've got that like! Well deserved.

    • @KeithGalli
      @KeithGalli 3 years ago

      Glad you enjoyed it! I had fun setting that up :)

  • @monagulapa3022
    @monagulapa3022 3 years ago

    Thank you so much for your tutorials. You are doing great!

  • @andyn6053
    @andyn6053 3 years ago +1

    Awesome video as always! Would love to see tutorials for Selenium and Scrapy as well. Also PyTorch and Seaborn would be very interesting to learn more about! Your videos are so easy to follow and learn from :)

  • @sardai33
    @sardai33 3 years ago

    Thanks for another great tutorial Keith :)

  • @ranveersharma1666
    @ranveersharma1666 3 years ago +1

    we just love your content.. u taught me pandas very well....!!!

  • @Beyond..Horizon
    @Beyond..Horizon 2 years ago

    That's the data scientist's way to tell "Like and Subscribe ". Thanks for sharing knowledge!!

  • @sreeragmsudheesh
    @sreeragmsudheesh 2 years ago

    52:30 Not sure if this was posted before but this works for the duplicates. Thanks for all the help Keith!!!
    import pandas as pd
    table = soup.select("table.hockey-stats")
    df = pd.read_html(str(table))[0]
    df
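
For comparison with the read_html snippet above, here is a hedged sketch of pulling the same kind of table apart by hand with Beautiful Soup alone. The table contents and column names are invented for illustration; only the "table.hockey-stats" selector comes from the comment. Note that select returns a list of matches, while select_one returns a single tag.

```python
from bs4 import BeautifulSoup

# Invented sample table (an assumption); the real page has more columns and rows.
HTML = """
<table class="hockey-stats">
  <thead><tr><th>Season</th><th>Team</th><th>GP</th></tr></thead>
  <tbody>
    <tr><td>2014-15</td><td>Team A</td><td>17</td></tr>
    <tr><td>2015-16</td><td>Team B</td><td>9</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")
table = soup.select_one("table.hockey-stats")  # a single Tag, not a list

# Header cells become column names; each body row becomes a list of cell strings.
columns = [th.get_text(strip=True) for th in table.select("thead th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.select("tbody tr")
]
```

The result can be handed to pandas with pandas.DataFrame(rows, columns=columns). Keeping rows as lists instead of dicts means repeated header names would not overwrite each other.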

  • @hemanthkumaar3681
    @hemanthkumaar3681 3 years ago +1

    I learned NumPy, pandas and other things from your playlist. I was stuck for the past 3 days on web scraping. I watched a lot of YouTube videos but I couldn't understand them the way I understand your content... Thank you so much brother :D. Now I hit (smashed) the bell icon too...

    • @KeithGalli
      @KeithGalli 3 years ago +1

      Awesome glad this video could help clarify some of the confusion you had. Thanks for smashing the bell icon! xD

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago

    Thank you so much. This is very helpful.

  • @WysteriaGuitar
    @WysteriaGuitar 3 years ago

    Great tutorial, thanks!

  • @MrBeezy514
    @MrBeezy514 3 years ago

    Dang! that was a good tutorial. I love you Keith, sincerely.

  • @sinaabbasi7670
    @sinaabbasi7670 1 year ago

    Thank you so much for doing this.😍😍

  • @fahad203
    @fahad203 3 years ago

    Are you god? I have a simple approach to your videos. I like them first, then I watch the video. Thanks a lot man, you and a few other YouTubers are going to put universities out of business

  • @marloscruzeiro5687
    @marloscruzeiro5687 3 years ago

    Amazing video! Really helped me! Thank you!