Introduction to Web Scraping with Python and Beautiful Soup

  • Published Sep 15, 2024

COMMENTS • 1.8K

  • @muhammadisrarulhaq9052
    @muhammadisrarulhaq9052 5 років тому +508

    I was able to make a program for my client that I never thought was possible. I got paid real money for this.
    Blessings, so much learned; this is like magic.

    • @GamingTechSnips
      @GamingTechSnips 4 роки тому +2

      Can you tell me how much time it took? And is it recommended for a uni student to do as a semester project?

    • @brandonhirdler
      @brandonhirdler 4 роки тому

      @@GamingTechSnips Depends on your skill as a programmer

    • @utkarsh1874
      @utkarsh1874 4 роки тому +17

      @@GamingTechSnips Less than a week, even with zero background knowledge

    • @maniafranzio3023
      @maniafranzio3023 4 роки тому +1

      Inbox me if you can!!!
      I need some tips from you..

    • @burinome
      @burinome 4 роки тому +10

      Damn you're lucky, my client paid me fake money. smh

  • @harsh3305
    @harsh3305 5 років тому +727

    MINOR SUGGESTION
    As of 10/03/2019, if you are following along with this tutorial, "container.div" won't give you the div with the "item-info" class. Instead it will give you the div with the "item-badges" class, because the latter occurs before the former. When you access any tag with the dot (.) operator, it just returns the first instance of that tag. I had a problem following along until I figured this out. To solve it, use the "find()" method to find exactly the div that contains the information you want, e.g. divWithInfo = containers[0].find("div","item-info")
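
    A minimal sketch of that workaround, assuming the Newegg class names from the video ("item-container", "item-info") and that the brand title still sits on an img inside the info div:

        # target the info div explicitly instead of relying on .div, which only returns the first div
        containers = page_soup.find_all("div", {"class": "item-container"})
        div_with_info = containers[0].find("div", {"class": "item-info"})
        brand = div_with_info.a.img["title"]   # dot notation then walks to the first matching child tag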

    • @vincentn2059
      @vincentn2059 5 років тому +17

      Thank you. Can't express how helpful this was; it unlocked everything for me. This was the only part I wasn't understanding. Thank you

    • @johntimo8570
      @johntimo8570 5 років тому +2

      Thanks for the tip!

    • @DR-fn3yv
      @DR-fn3yv 5 років тому +12

      Okay, but how do we navigate further into more deeply nested items? I'm trying to pull 'title' out of the a tag within item-branding, but it doesn't work.

    • @vincentn2059
      @vincentn2059 5 років тому

      D R have you used find and/or the findall method? Doing a couple of searches on google and stack overflow helped me get further into methods.
      Also, do you know basic html?

    • @DR-fn3yv
      @DR-fn3yv 5 років тому +3

      @@vincentn2059 Yeah, I'm using the find Method. I've looked in quite a few places but can't find the information I need.
      I'm at the part in the video where he uses container.div.div.a.img. I've used containers[0].find('div', 'item-info') which works correctly but now I am stuck at the part where I have to navigate further to pull out the information I need.

  • @Tocy777isback0414
    @Tocy777isback0414 4 роки тому +205

    It's weird to think about it like that, but this video started my whole Python learning back in 2017 and I am SO SO SO much thankful for it.

    • @chinzzz388
      @chinzzz388 4 роки тому +6

      How good are you at python now? Just wondering how much progress one can make in 3 years

    • @DragonRazor9283
      @DragonRazor9283 3 роки тому +1

      yes, please update us now!

    • @Tocy777isback0414
      @Tocy777isback0414 3 роки тому +7

      @@DragonRazor9283 So the projects I have done by now are: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display them as well. That is to say, I have learned all this in my free time after work.

    • @Tocy777isback0414
      @Tocy777isback0414 3 роки тому +5

      @@chinzzz388 Sorry, I didn't see your comment somehow. So the projects I have done by now are: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display them as well. That is to say, I have learned all this in my free time after work. This earned me a new position at my company which doubled my pay.

    • @chinzzz388
      @chinzzz388 3 роки тому +1

      @@Tocy777isback0414 that is amazing my man!! Congrats and keep grinding :)

  • @evanzhao3887
    @evanzhao3887 5 років тому +17

    If you have some prior experience with web crawling, this video can take your crawling skills to a whole new level. It allows you to crawl a website containing complicated info about multiple items into a very organized dataset. The various tools introduced in the video are fantastically helpful as well. A BIG THANK YOU

  • @Datasciencedojo
    @Datasciencedojo  5 років тому +225

    Table of Contents:
    0:00 - Introduction
    1:28 - Setting up Anaconda
    3:00 - Installing Beautiful Soup
    3:43 - Setting up urllib
    6:07 - Retrieving the Web Page
    10:47 - Evaluating Web Page
    11:27 - Converting Listings into Line Items
    16:13 - Using jsbeautiful
    16:31 - Reading Raw HTML for Items to Scrape
    18:34 - Building the Scraper
    22:11 - Using the "findAll" Function
    27:26 - Testing the Scraper
    29:07 - Creating the .csv File
    32:18 - End Result
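
    For reference, a compressed sketch of the pipeline those timestamps walk through; the URL is a placeholder and the Newegg class names are the ones used in the video, which may have changed since:

        from urllib.request import urlopen as uReq
        from bs4 import BeautifulSoup as soup
        import csv

        my_url = "https://www.newegg.com/..."   # placeholder for the graphics-card search URL shown in the video
        uClient = uReq(my_url)                  # open the connection and download the page
        page_html = uClient.read()
        uClient.close()

        page_soup = soup(page_html, "html.parser")
        containers = page_soup.findAll("div", {"class": "item-container"})

        with open("products.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["brand", "product_name", "shipping"])
            for container in containers:
                brand = container.div.div.a.img["title"]
                product_name = container.findAll("a", {"class": "item-title"})[0].text
                shipping = container.findAll("li", {"class": "price-ship"})[0].text.strip()
                writer.writerow([brand, product_name, shipping])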

    • @prem.sagar.m
      @prem.sagar.m 5 років тому

      Hi, how can we scrape if the web page is a single-page app?

    • @alexzhang5816
      @alexzhang5816 4 роки тому

      Thank you for the tutorial; however, I am not able to get the whole list. It only prints one result, so it's not looping over all the containers. Can you please help me out?
      containers = page_soup.findAll("div",{"class":"item-container"})
      for container in containers:
      brand_description = container.a.img["title"]
      price_box = container.findAll("li",{"class":"price-current"})
      price = price_box[0].strong.text
      print("brand_description:" + brand_description)
      print("price:" + price)

    • @ninananou7603
      @ninananou7603 4 роки тому

      @Data Science Dojo Please share a PDF document or website.

    • @greysonnewton6284
      @greysonnewton6284 4 роки тому

      I am trying to scrape the prices off of Newegg's website. The price is nested within
      price = container.findAll('ul', {'class' : 'price'}), where I call:
      price[0].li.span ---> I don't get an output. When I call:
      price[0].li.span.text ---> I get an attribute non-existent error.
      How would I scrape the price in this Newegg example?
      Also, the current price is wrapped within a 'strong' tag that is inside a span class. How would we scrape this?
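
      One hedged way to reach a price wrapped in a strong tag, using the "price-current" list item from the video's Newegg markup (the class name is an assumption about the current page):

        price_container = container.findAll("li", {"class": "price-current"})
        if price_container:
            strong_tag = price_container[0].find("strong")   # dollar portion of the price
            if strong_tag is not None:                       # guard against listings with no price shown
                print(strong_tag.text)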

    • @syaifullaffandi
      @syaifullaffandi 3 роки тому +1

      Thx

  • @YasarHabib
    @YasarHabib 5 років тому +25

    This was by far the best introduction to web scraping I've found online. Clear, concise, and easy to digest. Thank YOU!

  • @Jack258jfisodjfjc
    @Jack258jfisodjfjc 5 років тому +89

    You look like a god when you're writing multiple lines at the same time.

  • @adamhemeon734
    @adamhemeon734 2 роки тому +2

    Two years into a web program and a year working in the field and never bothered to learn how to do this. Great video, I followed along 5 years later in 2022 with Python 3.7.8 and it still works.

  • @arjoon
    @arjoon 7 років тому +68

    This was really good content, definitely the best intro to web scraping I've seen. You don't go through it as though you're reading from the documentation, there's more of a flow.

  • @viveksuman9600
    @viveksuman9600 4 роки тому +4

    I saw this video and then successfully wrote the entire code without looking at the video. Not even once. This is because I understood every line of it. Thank you, man. Your explanation is very beginner friendly.

    • @wendikinglopez8842
      @wendikinglopez8842 4 роки тому

      Yes. It helped me UNDERSTAND finally, I think because he taught it with respect for the viewer.

  • @EustaceKirstein
    @EustaceKirstein 5 років тому +7

    32:30, I started cheesing at how awesome the end result of this whole project was. Definitely inspiring - thank you for the excellent guide!

  • @Winterbear009
    @Winterbear009 2 роки тому +1

    I am from a commerce background. I have zero knowledge of any programming language. I found your video and explanation so good that at least now I can start my journey into scraping and coding. I am so thankful at the moment. Love your channel. Thank you so much.

    • @Datasciencedojo
      @Datasciencedojo  2 роки тому +1

      Hello Ella, glad to help you. Stay tuned with us for more tutorials!

    • @Winterbear009
      @Winterbear009 2 роки тому

      @@Datasciencedojo Yes Chief👍 Have subscribed already. 🤗

  • @pdubocho
    @pdubocho 6 років тому +6

    The man, the myth, the legend.
    You have no idea how much stress and lost time you have prevented. THANK YOU!

  • @saadiyafourie
    @saadiyafourie 5 років тому +33

    Absolute champion, quite possibly the best code tutorial I've ever watched. Oh the possibilities! Thank you :)

  • @frozy3155
    @frozy3155 4 роки тому +3

    Wow, even almost 3 years later this video helped me so much; it helped me make a program that picks a random Steam game. This was so hard, but I figured it out. Big props to you and this video.

  • @devendravijay1303
    @devendravijay1303 5 років тому

    One of the best teachers I have come across on YouTube. Web scraping explained so well that even a layman can follow and understand the basic concepts. I wish, in life, I had a teacher/mentor/friend like the one teaching in this video.

  • @christophedamour6919
    @christophedamour6919 6 років тому +6

    A BIG BIG THANK YOU: the most understandable tutorial I've ever seen on how to scrape a web page (and I have watched like 100 of them)

  • @brendanp9415
    @brendanp9415 4 роки тому +1

    This is the best web scraping tutorial that I’ve found. I’ve been frustrated for hours trying to use other resources. Thank you for making this, your explanations are thorough and great!

  • @delt19
    @delt19 5 років тому +8

    Coming from an R user, this is a very well done introductory tutorial into web scraping in Python. I like the real world example with Newegg and troubleshooting along the way.

  • @SnehilSinghsl
    @SnehilSinghsl 6 років тому

    I can't believe I actually sat through 33 minutes learning web scraping, something completely new to me. I was looking for a shortcut but your tutorial was just perfect! :D Thanks for this!

  • @ThatGuyDownInThe
    @ThatGuyDownInThe 4 роки тому +4

    This is actually the coolest thing I've seen in my entire life. Wow. Thank you so much I love you man.

  • @andreabtahi9519
    @andreabtahi9519 5 років тому +1

    I am just starting web scraping and I can honestly say that this video clearly explained everything. I watched this at 1.5 speed and it made sense. I would love more videos like this. I loved how you made it generic so it can apply to more than one website!

  • @lydialim2964
    @lydialim2964 6 років тому +7

    THIS IS AMAZING!!! Everything was very well-explained and instructed, I managed to get my first webscrape off an E-commerce site! Thanks so much, you have a loyal subscriber in me!
    Perhaps you could cover using time sleeps to avoid getting blacklisted by the websites we are scraping? And also how to scrape multiple pages in one go?
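
    A rough sketch of both ideas, assuming the site exposes the page number as a URL query parameter (the base URL and parameter are placeholders):

        import time
        from urllib.request import urlopen as uReq
        from bs4 import BeautifulSoup as soup

        base_url = "https://www.example.com/search?page="   # placeholder, not a real listing URL
        for page_number in range(1, 6):                      # pages 1 through 5
            uClient = uReq(base_url + str(page_number))
            page_soup = soup(uClient.read(), "html.parser")
            uClient.close()
            # ... pull out the containers and fields exactly as in the video ...
            time.sleep(5)                                    # pause between requests to stay polite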

  • @sacroultima
    @sacroultima 3 роки тому

    You are sooooo comfortable to listen to. Not because you have perfect pronunciation or a seamless script you are gliding through; you are just talking, not constantly jumping back and forth. Accurate tempo and personality in your voice.
    New subscription

    • @Datasciencedojo
      @Datasciencedojo  3 роки тому

      This makes us feel really motivated, Law! Thanks a lot :)

  • @edenhoward2053
    @edenhoward2053 3 роки тому +22

    UPDATE/SUGGESTION
    The findAll function is named find_all in bs4 (find_all is the documented spelling as of version 4.9.3; findAll still works as an alias)
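
    Both spellings currently work; find_all is simply the name the bs4 documentation uses:

        # equivalent calls
        containers = page_soup.find_all("div", {"class": "item-container"})
        containers = page_soup.findAll("div", {"class": "item-container"})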

  • @theworkflow19
    @theworkflow19 4 роки тому +1

    You are a blessing seriously! The first tutorial that actually made sense from start to finish. I was able to understand so much from this! Please Please Please Please upload more videos on Python Web Scraping with BeautifulSoup.
    Thank you again for this blessing!

  • @bokabosiljcic8694
    @bokabosiljcic8694 5 років тому +4

    This was fast, precise and beautiful! By saying beautiful I didn't mean to state the obvious :) Thanks

  • @VIK2000GEV
    @VIK2000GEV 4 роки тому

    Very high-quality tutorial.
    How to set up everything before running any code is very nice to include, and timestamping it so people who already know it can quickly skip is just much appreciated.
    Keeping the tutorial example diverse is very welcome.
    Writing it from scratch just makes it sooo useful for remembering what was where.
    I wish other people made tutorials like this... Timestamping is so useful when you just want to look up that one thing and don't really remember when it appeared.

  • @LePnen
    @LePnen 7 років тому +7

    Thank you very much for this video!
    I hope you do a second one on this subject. I'd like to know how to scrape several pages, as you mentioned at the end of the video. This was just what I was hoping for. Thanks!

    • @funny_buddy_official2712
      @funny_buddy_official2712 6 років тому +1

      Hey, please help me: when I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?

  • @Strajaize
    @Strajaize 4 роки тому

    I expected this video to take me 30 minutes to do, because it takes 30 minutes. 10 hours later I HAVE MY FIRST WEBSCRAPER, THANK YOU VERY MUCH! I still did not manage to get it to be a csv, but I made a .txt and it is fine for now. Thank you so much again! The tutorial from dataquest.io came in very handy as well!

    • @Strajaize
      @Strajaize 4 роки тому

      made another one today and it is working with csv

  • @adrianramos2989
    @adrianramos2989 5 років тому +4

    This material is just amazing. Thank you! Have you considered making an intro to Web Scraping using R?

  • @felixkimutai8478
    @felixkimutai8478 4 роки тому

    I have watched all the web scraping videos on YouTube, but this one is the best; I learned a lot. Thank you.

  • @sarvagyaan1097
    @sarvagyaan1097 6 років тому +14

    Enjoyed it, Data Science Dojo! Need more like this one

  • @h1ghpower
    @h1ghpower 2 роки тому

    Dude, you are literally saving lives with this type of video... I can't wait to digest all this precious info..... You save people so much time with this!! You are magic!!

    • @Datasciencedojo
      @Datasciencedojo  2 роки тому

      Thank you, John, for such kind words. Keep following us for more content!

  • @syomantakchaudhuri9935
    @syomantakchaudhuri9935 5 років тому +6

    Looks like they added another div at the very beginning of each item-container. The brand name can now be extracted with a little more effort-
    brand_container = x.findAll("div",{"class":"item-info"})
    print(brand_container[0].div.a.img["title"])

    • @rramey5597
      @rramey5597 5 років тому +1

      Try using a simpler one liner - print(container.a.img["title"].split(" ")[0])

    • @mikez9898
      @mikez9898 5 років тому +2

      Great, thank you. it worked!
      brand_container = container.findAll("div", {"class": "item-info"})
      brand = brand_container[0].div.a.img["title"]

    • @matthias1312
      @matthias1312 5 років тому

      Thank you! Took me forever to figure it out before I read this comment!

  • @redfeather22sa
    @redfeather22sa 3 роки тому

    It must have been a magic day when I saw this for the first time 1.5 years ago!!! It's where it all started!!! Thanks! Best video & intro into web scraping for absolute beginners!! Thanks (notable mention to Corey Schafer, who I was watching a few weeks earlier and who gave me a taste of it & how easy it could be to use/do). Thank you, friends!! An amazing tool!!

  • @learnmandarinwithkaili1102
    @learnmandarinwithkaili1102 4 роки тому +14

    When I watched this tutorial, it seemed easy to scrape, until I got stuck a thousand times while actually scraping a webpage. Happy Coding for dummies lollll

  • @robertnichols2673
    @robertnichols2673 3 роки тому +1

    Everyone learns in a different way, and absorbs information through different methods! This informal, laid back 'talk & walkthrough' (almost like sitting together with a mate) fits my style sooo much!! for me probably the best python lesson ever!! Will be looking for many more - thanks :P legendary !!

  • @schlongmasterlol2724
    @schlongmasterlol2724 5 років тому +7

    16:04 The command for it on Windows is CTRL + SHIFT + P :)

  • @rolandszirmai3922
    @rolandszirmai3922 3 роки тому +1

    Mate, this is just perfect! I learned so much by doing this with you. Now I'm ready to tackle other websites!!! You're a legend!

  • @dragoxofield
    @dragoxofield 7 років тому +20

    Nice! I was wondering if you could do a page monitor where it tells you exactly where the website has changed?

    • @iloveanime9226
      @iloveanime9226 7 років тому

      Yeah, that would be interesting. Basically you would save all the variables, then check and save them into new variables, compare with the old ones, and flag a change if there is a difference?

    • @Niccolatorres
      @Niccolatorres 6 років тому +10

      An easy way to do this is to download the HTML from the desired page and store its MD5 hash. Fetch the same HTML periodically and compare the stored and current MD5.
      This is an easy and less CPU-consuming way to check whether the website has changed.
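
      A minimal sketch of that idea (the URL is a placeholder; the stored hash would normally live in a file or database between runs):

        import hashlib
        from urllib.request import urlopen

        def page_hash(url):
            # hash the raw HTML bytes of the page
            return hashlib.md5(urlopen(url).read()).hexdigest()

        old_hash = page_hash("https://www.example.com")
        # ... later, on the next scheduled check ...
        if page_hash("https://www.example.com") != old_hash:
            print("page changed")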

    • @iloveanime9226
      @iloveanime9226 6 років тому +9

      yeah, that seems like a better way to do it but you would need to clean up the ad containers since they always change although the page content did not change.

  • @clivestephenson2793
    @clivestephenson2793 4 роки тому

    You are the most concise teacher of python I have come across
    Thanks
    I will definitely give your other videos a view

  • @ScremoSam1
    @ScremoSam1 5 років тому +3

    This has been so useful. Thanks so much. What I need to know now, is how I can get the scraper to continue working when there's a 'Load More' button, which doesn't take you to another page. If anyone knows anything about this please let me know.

    • @brandonhirdler
      @brandonhirdler 4 роки тому

      This is a really good question. Maybe click the load more button and then copy the URL? Or define how many results you want for that page, then copy the URL. I'm pretty sure when you hit load more it's actually altering the URL path?

    • @BrianGlaze
      @BrianGlaze 4 роки тому

      Maybe you can program in a click to load more function into your code.

    • @barodrinksbeer7484
      @barodrinksbeer7484 2 роки тому

      Late answer, but the solution is coding a click on the load more button, similarly to how you can code a click on the next page button for your script to continue onwards.
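
      Beautiful Soup on its own can't click anything, so this usually means driving a real browser. A sketch with Selenium, assuming chromedriver is installed; the URL and button selector are placeholders:

        import time
        from selenium import webdriver
        from selenium.webdriver.common.by import By

        driver = webdriver.Chrome()
        driver.get("https://www.example.com/products")                      # placeholder URL
        driver.find_element(By.CSS_SELECTOR, "button.load-more").click()    # placeholder selector
        time.sleep(2)                                # give the extra items time to load
        html_after_click = driver.page_source        # hand this string to BeautifulSoup as usual
        driver.quit()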

  • @Dynamite_mohit
    @Dynamite_mohit 4 роки тому

    Awesome, Good, Excellent, Nice, Best.
    Hope YouTube's algorithm recommends this to every scraper enthusiast.

  • @freediugh416
    @freediugh416 7 років тому +13

    wow this was great! I am completely new to this and still could follow perfectly fine and loved the explanations of everything. Would love to know how to run this script every day automatically and send results to phone or create alerts for changes and send those to a phone. Again, awesome job!

    • @96hugoS
      @96hugoS 7 років тому +2

      This is what I'm looking for as well, but I'm not getting any further unfortunately

    • @iloveanime9226
      @iloveanime9226 7 років тому

      So you would need to host a server online to run it continuously; you could also link it to an app that checks the changes and alerts you. Just some ideas; you can search more on Stack Overflow :)

    • @DonGass
      @DonGass 6 років тому +1

      Twilio is a good service for sending text messages via API... you could combine it with the scraping functionality and some sort of compare logic to text you the changes...
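
      One low-tech way to get the "run every day" part on Linux/macOS is to let cron launch the script and have the script diff its output against the previous run; the notification step (Twilio, email, etc.) plugs in where the comment indicates:

        # crontab entry (edit with `crontab -e`):  0 9 * * * /usr/bin/python3 /path/to/scraper.py
        import os

        new_results = run_scraper()          # assumed helper that returns the CSV text produced this run
        old_results = ""
        if os.path.exists("last_run.csv"):
            with open("last_run.csv") as f:
                old_results = f.read()
        if new_results != old_results:
            pass                             # send the alert here: Twilio SMS, email via smtplib, etc.
        with open("last_run.csv", "w") as f:
            f.write(new_results)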

    • @jackjackattack4384
      @jackjackattack4384 6 років тому

      The only solution is to be constantly updating your code. There's not really a good way outside of intelligently analyzing the picture, description, & brand.

  • @alexrobert4614
    @alexrobert4614 6 років тому

    This is one of the only clear and fun Python tutorials out there. Congrats

  • @adammarsono8908
    @adammarsono8908 5 років тому +6

    Hello, at 20:14, the tag (in my case) jumps to a tag inside a tag. How do we choose which tag to grab if there is more than one tag with the same name?

    • @hieudao428
      @hieudao428 5 років тому +1

      I ran into a similar problem. You can use the "find()" method in python to find a specific tag.
      you can either have it in the following:
      a) container.find("a","item-brand")
      b) container.find("div","item-branding")
      once you are in a specific tag, you can just go with . notation to get to the next sub-tag.
      so for example, I had container.find("div", "item-branding").a.img["title"]
      You can just skip directly by searching for the "a" tag instead of the "div" tag or maybe even the "img" tag.

    • @DhirajShah
      @DhirajShah 5 років тому

      I was also stuck there, but I found the solution. Just find the div with class "item-branding" directly, and from there you can get the image, which will give you the title.

    • @felipeabarcaguzman1057
      @felipeabarcaguzman1057 4 роки тому

      @@hieudao428 Thankss!!

  • @TheStrikerHD
    @TheStrikerHD 6 років тому +1

    I don’t usually comment on videos but this was phenomenal. Thank you.

  • @ahmedalthagafi4492
    @ahmedalthagafi4492 7 років тому +6

    Great video, very easy to follow. Hope you do more of this kind. Thanks.

    • @Datasciencedojo
      @Datasciencedojo  7 років тому +12

      Glad you enjoyed it! Did you mean more videos about web scraping, programming, data science, or data acquisition?

    • @siddhartha8886
      @siddhartha8886 7 років тому +3

      yes , I need more videos on web scraping. Thank you :)

    • @maxiewawa
      @maxiewawa 7 років тому

      Number 1, I realise this was 5 months ago but still thought I'd make a suggestion.
      If you get good at data scraping you end up with enormous CSV files... how do you manipulate them? Like if I was looking for a certain price at a certain date in the past, putting all your data in a Python list and iterating through it usually crashes my computer...
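
      One way to query a huge CSV without building a giant Python list is pandas with chunked reads; the file name and column names here are placeholders:

        import pandas as pd

        matches = []
        # read the big CSV one slice at a time instead of loading it all at once
        for chunk in pd.read_csv("big_scrape.csv", chunksize=100_000):
            matches.append(chunk[(chunk["date"] == "2018-06-01") & (chunk["price"] < 400)])
        result = pd.concat(matches)
        print(result.head())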

  • @snpranay
    @snpranay 6 років тому +1

    BY FAR the best tutorial I've watched for web scraping.
    But could someone just help me out with scraping through multiple pages? I know the guy mentioned something about it in the very end but still

  • @jdsr4c
    @jdsr4c 5 років тому +5

    I'm getting this error when I try to run it:
    File "", line 2, in
    NameError: name 'page' is not defined

    • @bmxng33
      @bmxng33 5 років тому

      he set it as page_soup, not page

  • @cwhizkid420
    @cwhizkid420 4 роки тому

    This is one of the most useful web scraping videos I have ever come across. I could learn it from scratch. Thanks.

  • @anonyme103
    @anonyme103 5 років тому +4

    This is very well explained and I enjoyed every second of it ! please do more ^^

  • @Pulits
    @Pulits 4 роки тому +2

    I did a web scraper not so long ago with another set of tools. This video has motivated me to create one, too!

  • @gauravsharma-mi2er
    @gauravsharma-mi2er 7 років тому +45

    Wow, great video. Can you make a video on scraping data from multiple pages?

  • @sajedayeasmin9003
    @sajedayeasmin9003 5 років тому +1

    We want more data scraping videos! This was awesome!

  • @haxxorlord7327
    @haxxorlord7327 5 років тому +5

    this soup is very beautiful, goddamn

  • @paulprice5860
    @paulprice5860 4 роки тому +1

    Thanks. I have a basic understanding of python and html and I found this tutorial very easy to follow. You do a great job of clearly explaining things in the code which is what I need at my current skill level. Much appreciated.

  • @yuriipidlisnyi2248
    @yuriipidlisnyi2248 6 років тому +6

    Maybe it's better to use find() instead of findAll() to get the product's name? The code will be less complex, like this:
    title = container.find("a",{"class" : "item-title"}).text

  • @tntcaptain9
    @tntcaptain9 4 роки тому

    Saw many videos on web scraping but yours was probably the best one.

  • @arshdeepsinghahuja
    @arshdeepsinghahuja 7 років тому +17

    shipping_container = container.findAll("li",{"class":"price-ship"})
    GETTING THIS ERROR
    Traceback (most recent call last):
    File "", line 1, in
    TypeError: 'tuple' object is not callable

    • @cihansariyildiz1748
      @cihansariyildiz1748 4 роки тому +2

      Try find instead of findAll

    • @neilaybhalerao8373
      @neilaybhalerao8373 4 роки тому +1

      Same!!! I didn't understand when he said "oh I need to close this function".... Can anyone explain?

    • @cameroncrawley2217
      @cameroncrawley2217 4 роки тому +3

      @@neilaybhalerao8373 shipping_container = container.findAll("li", {"class":"price-ship"} is what he typed originally. He forgot to add the ending ) to close the function. So he should've typed shipping_container = container.findAll("li", {"class":"price-ship"})

  • @brendensong8000
    @brendensong8000 3 роки тому

    As of Nov 2020, I went through the whole thing without any issue! I used a different product name, but everything worked so well! Everything worked so perfectly! I learned so much from this video! this is awesome!!!! Thank you!!!!

  • @sk_4142
    @sk_4142 4 роки тому +3

    brand = make_rating_sp[0].img["title"].title()
    TypeError: 'NoneType' object is not subscriptable
    [Finished in 3.074s]
    anyone know why this is happening? or how to fix this?

    • @SourPickle-bv9gd
      @SourPickle-bv9gd 3 роки тому

      Did you get an answer? I'm having this problem as well
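
      That error usually means one of the lookups returned None, i.e. the tag simply isn't present in that particular container. A defensive sketch (the class name in the findAll call is a placeholder for whatever make_rating_sp was built from):

        for container in containers:
            make_rating_sp = container.findAll("a", {"class": "item-rating"})   # placeholder class
            if make_rating_sp and make_rating_sp[0].img is not None:
                brand = make_rating_sp[0].img["title"].title()
            else:
                brand = "N/A"   # listing has no rating/brand image, so skip or use a default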

  • @Billsethtoalson
    @Billsethtoalson 6 років тому +1

    DUDE! High Quality Content!! You are very good at walking through the logical steps for breaking down a page! Other tutorials are great but are always geared toward the specific task at hand. With this it felt like I also learned how to tackle a page!
    This helped a bunch!

  • @linuxit5869
    @linuxit5869 7 років тому +14

    Awesome tutorial. Please add how to scrape multiple pages :)

    • @johannbauer2863
      @johannbauer2863 6 років тому +6

      Linux IT make a list and a for loop?

    • @petersilie9504
      @petersilie9504 5 років тому

      Use multithreading for this

    • @diegugawa
      @diegugawa 5 років тому

      @@petersilie9504 Can you do this in Python 3? I don't think it's possible (apparently the multithreading module is not recommended). Sounds like a job for a compiled language.

    • @BackwardshturT
      @BackwardshturT 4 роки тому

      @@johannbauer2863 can you please explain? Thanks

  • @Jack258jfisodjfjc
    @Jack258jfisodjfjc 5 років тому

    You're such a great teacher! Just because you can code doesn't mean you can teach. Awesome!

  • @chriswashingtonbeats
    @chriswashingtonbeats 5 років тому +2

    The first div that it showed was item-badges; how do I navigate to the other divs?

  • @arujbudhraja
    @arujbudhraja 4 роки тому +2

    Awestruck! It's amazingly simple to follow along! Thank you, sir, for adding to the community of self-learners!

  • @WestSideLausanne1
    @WestSideLausanne1 4 роки тому +3

    when I try to follow, it gives me the following error message:
    brand = container.div.div.a.img["title"]
    AttributeError: 'NoneType' object has no attribute 'a'

    • @siddharthkrishna8365
      @siddharthkrishna8365 4 роки тому

      Hey Helmut, I am also getting the same error. Have you fixed it?

    • @anwowie
      @anwowie 4 роки тому +1

      I got the same error so I changed to code a bit which follows the same method as finding product name:
      brand_container = container.findAll("a", {"class":"item-brand"})
      brand_name = brand_container[0].img["title"]
      product_container = container.findAll("a", {"class":"item-title"})
      product_name = product_container[0].text

    • @siddharthkrishna8365
      @siddharthkrishna8365 4 роки тому

      yeah it worked

    • @siddharthkrishna8365
      @siddharthkrishna8365 4 роки тому

      Can you tell me how to send requests to 3 different websites at the same time without getting an HTTP timeout error? I tried different ways to get rid of this error but had no success.

    • @siddharthkrishna8365
      @siddharthkrishna8365 4 роки тому

      @@anwowie Can you tell me how to send requests to 3 different websites at the same time without getting an HTTP error? I am working on a project; I tried many ways but had no success.

  • @df6148
    @df6148 5 років тому

    Senior Data Scientist, Senior Database Engineer... I know a fellow gamer when I see one! Thanks for the tutorial. All this time... all I ever wanted from most of the internet was the ability to "scrape" (new term for me) what I wanted so that I can do something with that data. I like to organize things and categorize them. I always thought RSS was okay... Twitter okay... Reddit okay... but I just want specific feeds from those sites, and this is exactly what I was looking for! Better than paying a monthly fee to somebody who won't even teach you how to do it. Maybe it's from collecting cards as a kid or playing video games that had really in-depth inventory systems (RPGs), but it is enjoyable when you can get the exact bit of information you want and then do something cool with it. This is helpful! Where were you when I needed to organize my bank in World of Warcraft!!!

  • @edwardadams3727
    @edwardadams3727 4 роки тому +3

    brand = container.find("a", {"class":"item-brand"}).img.get('title')
    You're welcome

    • @hanzenpeter3917
      @hanzenpeter3917 4 роки тому +1

      'NoneType' object has no attribute 'img' :D could you please send me your code?

  • @ilyamaldini
    @ilyamaldini 5 років тому

    The best tutorial. Thank you. Much better than all the videos in Russian.

  • @shankargs7685
    @shankargs7685 7 років тому +8

    Hi Dojo, really nice video. I have one doubt: recent e-commerce sites don't keep class names constant; they have alphanumeric values like class="_3Hjcsab". How do you scrape when the site keeps changing?

    • @ahmedramadan8153
      @ahmedramadan8153 7 років тому +3

      Try the XPath way!! I don't think they will change all the attributes and the path of the element periodically.

    • @Datasciencedojo
      @Datasciencedojo  7 років тому +3

      Then it gets harder! It's an adversarial problem. The time of development greatly increases because you have to build functions to check if the tag has all the features you are looking for before grabbing it. It's not as straightforward as grabbing by the div or id. In this case it might not be practical to scrape these sites because they clearly do not want to be scraped. Even if you scraped them successfully, they would be aware and change their code again accordingly.
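
      When class names are randomized, one option is to anchor on something more stable than the class, e.g. another attribute, a class-name pattern, or visible text; the attribute name and patterns below are placeholders:

        import re

        # match on a non-class attribute
        prices = page_soup.find_all("span", attrs={"data-price": True})
        # match class names by pattern instead of exact value
        cards = page_soup.find_all("div", class_=re.compile("^product"))
        # anchor on visible text, then walk up to the enclosing card
        label = page_soup.find(string=re.compile("Add to cart"))
        card = label.find_parent("div") if label else None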

    • @vidhishah5484
      @vidhishah5484 7 років тому

      Yeah, ran into the same problem, tried a lot to get around it but couldn't :/

    • @das250250
      @das250250 7 років тому

      Yes, scraping may be a limited toolset as websites use more sophisticated formats

    • @jasonrobinette4486
      @jasonrobinette4486 7 років тому

      Thanks great vid-easy to follow for a rookie

  • @DincerHoca
    @DincerHoca 5 років тому +1

    Thanks for the video. This was the best web scraping tutorial I have seen on YouTube.

  • @wadephz
    @wadephz 7 років тому +4

    Hi, thanks for the video! How do you get to the second div tag in "container"?

    • @SpenderBara
      @SpenderBara 7 років тому +1

      I have the same question here. I've tried different notations, i.e. div[2], div{2}, div(2) and others, but still don't get the second or third div.
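
      Dot notation only ever returns the first match; indexing into find_all gets the rest:

        divs = container.find_all("div", recursive=False)    # direct children only
        second_div = divs[1] if len(divs) > 1 else None       # [0] is the first div, [1] the second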

  • @BeTu4856
    @BeTu4856 5 років тому +1

    Truly enjoyed your simple step by step explanation on why each command or function is needed, and what it does. Your Python knowledge and skills are evident, as you are able to provide immediate solutions to errors and or challenges to the problem you are attempting to solve. Followed along with the tools and enjoyed the session. Thank you.

  • @blackalk9420
    @blackalk9420 5 років тому +6

    def Data Science Dojo():
    Data Science Dojo = ("like", "share", "sub")
    good job = (input.comment("Thanks you very much ! "))
    if good job in Data Science Dojo :
    print("love and respect from Kuwait")
    else:
    print("sorry maybe next time")
    Data Science Dojo()
    -------
    Output :-
    peace out and happy basic coding :D

  • @annatinaschnegg5936
    @annatinaschnegg5936 3 роки тому

    I really liked the tone, rhythm and clarity of this tutorial! I'm not a total beginner with Python anymore, so I was able to listen and (mostly?) understand while preparing lunch for my kids. (I'll rewatch to try and do it later.)

  • @alibee6232
    @alibee6232 7 років тому +6

    When I type uclient = ureq(my_url) it gives me a 403 Forbidden error and a bunch of timeouts. Does this mean that it works but it crashed, or that it will crash if it runs?

    • @DDay_8
      @DDay_8 7 років тому

      4K Bahrami same here

    • @changleo4417
      @changleo4417 6 років тому

      DeeganCraft how does that work?

    • @lSh0x
      @lSh0x 6 років тому

      you guys are using pages instead of pages ... :)

    • @franciszekszombara8881
      @franciszekszombara8881 6 років тому +2

      this helped me:
      stackoverflow.com/questions/41214965/python-3-5-urllib-request-403-forbidden-error
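
      The usual fix from that thread is to send a browser-style User-Agent header, which a bare urlopen call does not; the URL below is a placeholder:

        from urllib.request import Request, urlopen

        req = Request(
            "https://www.example.com",
            headers={"User-Agent": "Mozilla/5.0"}   # many sites reject the default Python user agent
        )
        page_html = urlopen(req).read()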

  • @npithia
    @npithia 4 роки тому

    This is gold for someone learning python and seeing its application.

  • @WisdomSeller
    @WisdomSeller 7 років тому +12

    could you upload the script?

  • @BikeshBudhathoki
    @BikeshBudhathoki 4 роки тому

    Loved it. Though I'm a complete beginner in data science and have zero knowledge of it, I watched the entire video and tried to grab everything discussed here.

  • @jaromtollefson3127
    @jaromtollefson3127 5 років тому +4

    I keep getting 0 when I call len(containers)

  • @denisshmelev4990
    @denisshmelev4990 5 років тому

    I thought web scraping was hard until I found your video. Huge thanks man, you saved so much time for me!

  • @makedredd299
    @makedredd299 7 років тому +4

    Hi, I'm getting stuck at 28:50 when running the script. How do I solve this problem?
    $ python Dojo.py
    Traceback (most recent call last) :
    File "Dojo.py", line 18, in
    brand = container.div.img["title"]
    TypeError: 'NoneType' object is not subscriptable
    Best Regards

    • @jayadrathas169
      @jayadrathas169 7 років тому

      That is a corner case error...your best bet is to apply a try or if else statement.

    • @pyhna-lol2625
      @pyhna-lol2625 7 років тому +4

      Hey, I got it too. It seems to come when they don't have the "3VGA" or whatever.
      I fixed it by taking the first word out from the output "title_container[0].text".
      So I tossed the original second part of "brand = xxx" and replaced it with "brand = title_container[0].text.split(' ', 1)[0]".
      Hope it helps.

    • @makedredd299
      @makedredd299 7 років тому

      Thanks, then I'm not going crazy; it's the website changing that causes this kind of error ☺

    • @DavidDreesYT
      @DavidDreesYT 6 років тому

      Looks like you need to add another "div" tag. --> brand = container.div.div.a.img["title"]

    • @funny_buddy_official2712
      @funny_buddy_official2712 6 років тому

      Hey, please help me: when I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?

  • @ShawneeUnion
    @ShawneeUnion 4 роки тому +2

    So thankful for this, I was able to run it and scrape similar information off of a coding website. I had some trouble with installing BS4. Tip, I used pip3 to install BS4 to keep everything clean.
    sudo pip3 install bs4

  • @Orokusaki1986
    @Orokusaki1986 5 років тому +5

    Just use pycharm, man :-P

    • @travisw5076
      @travisw5076 5 років тому +2

      Just use vim and then go native Linux; you're set. Now you can throw the desktop away and get a tiling WM.

  • @sebastianpeters2296
    @sebastianpeters2296 4 роки тому

    Hey there! This guide really helped me to create a tailored scraper for a pilot project. Even though I am at the very beginning stage of learning Python, I managed to create the entire script, and even learned along the way. Amazing, really appreciate this!

  • @Datasciencedojo
    @Datasciencedojo  2 роки тому +5

    Hello everyone, find the updated version of this tutorial here: ua-cam.com/video/rlR0f4zZKvc/v-deo.html

  • @insigpilot
    @insigpilot 5 років тому

    This was very good. I'm a beginner to Python and this web scraping tutorial left me with very few questions.

  • @Victor-dt1uq
    @Victor-dt1uq 2 роки тому +1

    6:28 - The good old times when a mid-to-upper graphics card (GeForce GTX 1070) could be bought for under $400 :')
    Great video, thx!

  • @PanamaSoftwash
    @PanamaSoftwash 5 років тому

    I don't know much about coding but the way you explained this made perfect sense. I hope to learn a lot from your channel.

  • @idealsketch3778
    @idealsketch3778 2 роки тому +1

    Fantastic video dude, much more helpful than others I've seen on YouTube

  • @samundraregmi8593
    @samundraregmi8593 4 роки тому

    I don't have words to explain how helpful this video was. I hope to put it to use soon.

  • @alancoates
    @alancoates 6 років тому +1

    Your presentation and explanation are awesome! You have opened my eyes to the uses of Python and Beautiful Soup.

  • @rupertrussell1
    @rupertrussell1 5 років тому +2

    Fantastic tutorial! Gave me 95% of what I needed for my first screen scraping project.

  • @souravmahanty7025
    @souravmahanty7025 6 років тому

    This is the first tutorial on this that actually makes sense. THANK YOU. You earned a subscriber.

  • @jonpotter5776
    @jonpotter5776 6 років тому

    As someone self learning Python (my first programming language) with a web scraping script in mind, this was great!

  • @pnocti
    @pnocti 4 роки тому

    One of the best scraping tutorials, good job

  • @reginaldowusu3156
    @reginaldowusu3156 4 роки тому

    Loving your coding skills. Was just about to give up on web scraping. Then BOOM!!! I found this. :)