Beginners Guide To Web Scraping with Python - All You Need To Know

  • Published 1 Oct 2024

COMMENTS • 190

  • @michaelmagill5466
    @michaelmagill5466 2 years ago +135

    This editing is fantastic, the explanations are clear and concise and completely without obfuscation. You, sir, are a gentleman.

    • @chanson8508
      @chanson8508 7 months ago +1

      Big faxxx! so many nonsense intro to scraping vids, but not this one : ))

    • @Greshma123
      @Greshma123 5 months ago

      I’m sorry 😢 I’m not going

    • @SonicFusedWith_Goku
      @SonicFusedWith_Goku 4 months ago

      Bro this is crazy

    • @SonicFusedWith_Goku
      @SonicFusedWith_Goku 4 months ago

      I was trying to make a code to get stuff from my math homework website

  • @Sivarajansam931
    @Sivarajansam931 2 years ago +74

    When the world needed him the most, he returned.

  • @desecrated.eviscerated
    @desecrated.eviscerated 11 months ago +4

    If you get an error, try replacing the line that opens the file with: file = open('scrapped_quotes.csv', 'w', encoding='utf-8', newline='')
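
    A minimal sketch of the fix suggested in this comment, using only the standard library. The sample rows, the header names and the `write_quotes` helper are hypothetical stand-ins, not the video's code; the point is the two `open()` arguments. `encoding="utf-8"` lets curly quotes be written regardless of the platform default, and `newline=""` is what the `csv` module expects, which also avoids blank rows between records on Windows.

```python
import csv

# Hypothetical sample rows standing in for scraped (quote, author) pairs.
rows = [
    ("\u201cAn example quote.\u201d", "Example Author"),
    ("\u201cAnother example quote.\u201d", "Second Author"),
]

def write_quotes(path, rows):
    """Write a header plus (quote, author) rows to a UTF-8 CSV file."""
    # encoding="utf-8": curly quotes such as \u201c can be encoded.
    # newline="": the csv module writes its own line endings, so the
    # file object must not translate them a second time.
    with open(path, "w", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["QUOTES", "AUTHORS"])  # assumed header names
        writer.writerows(rows)

write_quotes("scraped_quotes.csv", rows)
```

    Reading the file back should use the same `encoding="utf-8"`, otherwise the curly quotes may come out garbled.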

  • @fearlessAx
    @fearlessAx 2 years ago +3

    Hey, I'm getting "NameError: name 'page_to_scrape' is not defined"

  • @Jj-qx1cj
    @Jj-qx1cj 3 months ago +2

    Lost me when you said and a raspberry pie

  • @JoaquinRoibal
    @JoaquinRoibal 1 year ago +28

    Great introduction. Clear, concise and covered related topics without being distracting. I look forward to your other videos on Python.

  • @DTMPro
    @DTMPro 2 years ago +13

    Where can we find out if we are allowed to scrape data from a specific website so that eventually we don't end up in trouble?
    Does the scraping code/process work the same way for scraping product prices, e.g. trying to replicate camel for Amazon, or does that take additional authorization from Amazon?

    • @Tinkernut
      @Tinkernut 2 years ago +13

      Excellent question! Most popular websites have a scraping/crawling text file called "robots.txt". This tells what can and can't be scraped from a website. Here is an example of Amazon's robots.txt file (spoiler, you can't scrape much): www.amazon.com/robots.txt

    • @jimavictor6022
      @jimavictor6022 2 years ago +1

      @@Tinkernut what about those non-popular websites with no robots.txt file?

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago +2

      @@jimavictor6022 As long as you don't scrape things like other people's documents from governmental sites or usernames plus passwords, you should be fine with the rest.
      What website owners are really worried about is their website's availability (whether it is online or offline) and bandwidth usage, as they pay for each gigabyte they send to and receive from users.
      So as long as you don't consciously/unconsciously take down their site you're fine.

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago +3

      @@jimavictor6022 On top of that, they have their automated ways to detect bots. The worst that can happen is getting your IP "banned" or simply restricted from viewing their webpages; that will happen way, way, way... before you get sued by them.

    • @jimavictor6022
      @jimavictor6022 2 years ago +2

      @@JoaoPedro-ki7ct I really appreciate the reply. Thank you..
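
      The robots.txt check discussed in this thread can be sketched with Python's standard `urllib.robotparser`. The rules below are a made-up sample, and `is_allowed` is a hypothetical helper; for a real site one would point the parser at the site's own file with `set_url(...)` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/quotes/"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data"))
```

      With these sample rules the first URL is permitted and the second is not. A robots.txt file only states the site's crawling preferences, so a site's terms of use are still worth reading.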

  • @Syndesi
    @Syndesi 2 years ago +13

    cool tutorial :D
    for more complicated data I use xpath, although its syntax is a bit weird at first.
    furthermore: validate, validate and validate your data. you do not want a program which crashes randomly, only because a value is missing, empty or malformed :)
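
    One way to read the "validate your data" advice, as a minimal sketch: filter scraped rows before using them. The `clean_rows` helper and the sample data are hypothetical, and real validation rules depend on the site being scraped.

```python
def clean_rows(raw_rows):
    """Keep only rows whose quote and author are both non-empty strings."""
    cleaned = []
    for quote, author in raw_rows:
        # A missing value (for example None when a tag was not found) is skipped.
        if not isinstance(quote, str) or not isinstance(author, str):
            continue
        quote, author = quote.strip(), author.strip()
        # Empty or whitespace-only values are skipped as well.
        if not quote or not author:
            continue
        cleaned.append((quote, author))
    return cleaned

raw = [
    ("  An example quote.  ", "Example Author"),
    (None, "No Quote"),
    ("Orphan quote", ""),
]
print(clean_rows(raw))
```

    Only the first sample row survives, with its surrounding whitespace removed.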

  • @AirmanKolberg
    @AirmanKolberg 2 years ago +13

    Web scraping is to copying and pasting manually, as copying and pasting manually is to using your eyeballs, memorising, then typing it into a file. There is no difference between surfing the web and web scraping. One is just faster. Like how copy/pasting something from Wikipedia is faster than reading and re-writing it.

    • @jalanmcrae
      @jalanmcrae 1 year ago +1

      Yes, automation is a huge time saver 👍🏾

  • @sauceboss38
    @sauceboss38 2 years ago +15

    This is exactly what I was looking for. Very concise and helpful, thank you!

  • @sagarnewpane8549
    @sagarnewpane8549 2 years ago +4

    I need more content on the Raspberry Pi Pico!!

  • @myriadtechrepair1191
    @myriadtechrepair1191 2 years ago +6

    Our lord has returned.

  • @TheJoyOfGaming
    @TheJoyOfGaming 2 years ago +5

    haha awesome man. I don't even do coding but couldn't resist following along just to try it! Cheers!

  • @kedrovasuma2857
    @kedrovasuma2857 2 years ago +17

    This smart man is still alive

    • @ten132
      @ten132 2 years ago

      I was about to comment the same lmao.

  • @Flying_turnip187
    @Flying_turnip187 1 month ago +3

    Very cool project ! I am a beginner in Python and this was right up my alley. I think Data science is going to be my forte. Thanks so much for this !!

  • @InspiredInsights4U
    @InspiredInsights4U 2 years ago +4

    A savvy businessman could use web scraping to scrape a competitor's website for product pricing, including product numbers, photos and prices, and then use this to monitor their price changes and/or adjust their own prices on their website to stay just a slight bit more competitive.

  • @OtherDalfite
    @OtherDalfite 2 years ago +2

    Halloween intro? At the end of November? This video's been a while in the making, huh?😂

  • @mmuneebahmed
    @mmuneebahmed 2 years ago +2

    Thanks for sharing the expertise! However, I get the following error when running the code.
    writer.writerow([quote.text, author.text])
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)

  • @webslinger2011
    @webslinger2011 2 years ago +24

    Your technological code geniusness shall be added to my own. Seriously looking for this. Thanks!

  • @arjunaudupi7956
    @arjunaudupi7956 2 years ago +4

    @tinkernut you are the reason for me being a software developer..
    Thanks dude. Keep up the good work..

  • @reghawkins73
    @reghawkins73 2 years ago +1

    I had to add encoding to the line--- file = open("scraped_quotes.csv", "w", encoding='utf-8')

  • @jacknobles8272
    @jacknobles8272 1 year ago +1

    Not for beginners - immediately starts the tutorial with crap an experienced person would need to know.

  • @benjaminofurhie8178
    @benjaminofurhie8178 4 months ago +9

    I have searched for scraping tutorials for the last month, but this is the BEST. Thanks so much

    • @japhethmutuku8508
      @japhethmutuku8508 2 months ago

      I can teach you web scraping from the basics to advanced... if that may help, you can reach out to me

    • @LrjkoDghhfhh
      @LrjkoDghhfhh 1 month ago

      @@japhethmutuku8508 could you help me please bro

    • @dongvu2530
      @dongvu2530 21 days ago

      @@japhethmutuku8508 please help me

  • @algj
    @algj 2 years ago +4

    This is crazy to see your videos again being recommended :o
    it has been years since I saw your last video!

  • @lucasn0tch
    @lucasn0tch 2 years ago +3

    Long time no see.
    This may be useful for tracking stock for a PS5/Xbox/Switch/GPU in these times.

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago

      Even a Switch is being scalped?
      I heard about PS5, Xbox Series X|S, GPUs but not about the Switch itself.

  • @teomanefe
    @teomanefe 2 years ago +5

    I actually needed this!

  • @lemonbread378
    @lemonbread378 1 year ago +7

    currently planning for my computer science A level project and wanted to learn what this web scraping thingamejiggy was all about
    this video was an amazing introduction! simple, clear, but not over professional
    didn't leave me feeling overwhelmed, and i'm going to watch more of your tuts now, cheers mate!

  • @mrmxyzptlk8175
    @mrmxyzptlk8175 1 year ago +2

    Error: "No module named bs4"

    • @recursion.
      @recursion. 1 year ago +1

      Facing the same, were you able to fix it?
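
      For the "No module named bs4" error in this thread: the `bs4` module is provided by the `beautifulsoup4` package, and a common cause (an assumption here, not confirmed by the commenters) is that the package was installed for a different Python interpreter than the one running the script. A small sketch that checks the current interpreter and prints a matching install command; `missing_module_hint` is a hypothetical helper.

```python
import importlib.util
import sys

def missing_module_hint(module_name, package_name):
    """Return an install command if module_name cannot be imported, else None."""
    if importlib.util.find_spec(module_name) is None:
        # Using sys.executable ties pip to the interpreter running this script.
        return f"{sys.executable} -m pip install {package_name}"
    return None

hint = missing_module_hint("bs4", "beautifulsoup4")
print(hint or "bs4 is importable from this interpreter")
```

      Running the printed command in a terminal installs the package for that exact interpreter, which matters on machines with several Python installations.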

  • @DroidEagle
    @DroidEagle 2 years ago +2

    dude where were u?

  • @JccChanco
    @JccChanco 4 months ago +2

    So far in my life, this has been the smoothest learning process I have ever experienced. Thank you kind sir!

  • @Raxer_th
    @Raxer_th 2 years ago +8

    This channel used to have like 100k views. Now it's down to just less than 10k. Idk why. When I was around 13, I wanted to make an FPS game and found his video to be very interesting. I have followed this channel since then. Tinkernut was the reason I started learning programming, after watching his HTML tutorial (create a website from scratch). Even though I neither have a comp-sci degree nor work as a programmer, I'm still learning Python during my free time. Thank you Daniel.

    • @toniphillips9269
      @toniphillips9269 2 years ago

      Yeah poops yeah lol iaooapaoopp lol oowss d’s aIA

  • @bng3832
    @bng3832 2 years ago +2

    I swear to god you are the best!
    I now see why YouTube doesn't recommend great videos. It's because YouTube doesn't want people to study tech!!

  • @jpsl5281
    @jpsl5281 1 year ago +1

    It's not working with OpenTable

  • @RigzoTV
    @RigzoTV 2 years ago +2

    Need more advanced lessons on scraping.

  • @ArqitectTV
    @ArqitectTV 1 year ago +1

    What if the data you are searching for is obtainable but is on separate pages within a given site?

  • @Geeksmithing
    @Geeksmithing 2 years ago +2

    Hey man, this is great!! Happy to see another video from ya!

  • @KowboyUSA
    @KowboyUSA 2 years ago +2

    Just the inexpensive project I needed.

  • @bodaciouschad
    @bodaciouschad 1 year ago

    Your guide amounts to "download this library that does it for you", which doesn't really teach how the process at hand works. Simplicity at the cost of utility and educational value.

  • @lundebc
    @lundebc 2 years ago +2

    Thanks for this tutorial, Looking forward to the next part.

  • @craftedpixel
    @craftedpixel 2 years ago +2

    The legend is back!

  • @proxyscrape
    @proxyscrape 1 year ago +2

    I love that you used a Raspberry Pi in this tutorial. It's amazing to mess around on and do little experiments.

  • @donsurlylyte
    @donsurlylyte 2 years ago +1

    dude, that intro proves you have a bright future in infomercials!

  • @HayCorvus
    @HayCorvus 6 months ago +1

    I grew up in the early YouTube days. I was enamored by the computer knowledge that I could only get from channels like Tinkernut. There really were no schools that offered nuanced coding/web lessons when I was growing up. It wasn't until I went to college and got my degree in Computer Science that I'd be able to build a foundation in computational theory and all sorts of other fun subjects related to computers.
    Thanks for helping me along the way on that journey, Tinker!

  • @wrzq
    @wrzq 8 months ago +1

    Beautiful tutorial, exactly what I've been looking for. Thanks a lot, Man!

  • @royalhermit
    @royalhermit 2 years ago +1

    What is line 10 "w"? I am getting NameError: name 'scraped_quotes' is not defined

    • @ashrude1071
      @ashrude1071 2 years ago +1

      You probably have a typo

    • @Tinkernut
      @Tinkernut 2 years ago +2

      Running it with my code from github works fine github.com/gigafide/basic_python_scraping/blob/main/basic_scrape_csv_export.py

  • @JayD-jn9or
    @JayD-jn9or 5 months ago

    Thanks for the vid! After a VERY VERY long time I'm getting back into casual coding and looking to casually make some scraping info programs for games with the option to select which info the person wants to see.
    So if the site allows scraping, would it be better to have my app in progress be independent and have checks done once a minute or every five minutes? Or have the info scraped, processed and posted on a site I create and retrieved for people using the app? That is, if I start sharing the app. My concern is annoying the site owners by checking too often. Forgive me if it's a silly question, I'm not experienced with scraping.

  • @mr.mcloremcclure2522
    @mr.mcloremcclure2522 1 month ago

    This is not so easy on Windows. I'm a beginner at this, but it keeps giving me the "ModuleNotFoundError: No module named bs4". I have spent hours online trying to figure this out.

  • @hussainmahady5295
    @hussainmahady5295 2 years ago +1

    Awesome 🔥 bro. Can you make a tutorial about tunnelling and vpns

    • @Tinkernut
      @Tinkernut 2 years ago

      Sure can! I made them both a few years ago ;-) Just search my channel

  • @justanotherguy6359
    @justanotherguy6359 1 month ago

    cant call scraping illegal, thats like saying you cant film in a public place, if they dont want it to be interacted with they can pull it from the web....where it is....in the view of the public...

  • @martinmcbrown6437
    @martinmcbrown6437 2 months ago

    Ok, so this is amazing, thank you! How would you generalize a scraper like I want to scrape all the news sites in the world and extract the main articles?

  • @santiagoSosaH
    @santiagoSosaH 2 years ago +1

    wooooow it's been years that i didn't see a video about tinkernut. i think about 10 years ago i learned sql and php with your tutorial about making a webpage with users passwords etc.
    man so nice to see a video of you.

  • @santoshpandey23
    @santoshpandey23 7 months ago

    Thanks, this was very good. Can you share a link where you have done the same for a website that requires a username and password? Thanks a ton

  • @KontrolStyle
    @KontrolStyle 2 years ago +1

    well explained, ty

  • @Pixilmb12
    @Pixilmb12 10 months ago

    I use IDLE, but for soup reason in the 'soup.findAll' function it says 'nameerror - name 'soup' not defined' :(

    • @Pixilmb12
      @Pixilmb12 10 months ago

      Fixed 🤦‍♂

  • @martinrages
    @martinrages 2 years ago +1

    Can websites detect scraping? If so, how do i escape the dutch AIVD

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago

      Yes, they have their ways to detect automated requests, but what they do when they detect "bots" is up to each website.

    • @LiEnby
      @LiEnby 2 years ago +1

      Yes and no. You can check for things like the user agent string or try to run JavaScript or something like that; however, it's actually a really hard problem to solve because a scraping script can look indistinguishable from a browser.

  • @benjaminblack8653
    @benjaminblack8653 2 years ago +8

    So glad to see you posting again! I missed your videos so much. I believe my first video of yours was either How to Setup a Webserver or How to Make an Operating System. Both excellent videos!

  • @eduardolz12
    @eduardolz12 2 years ago

    I would give you 2 likes if i could

  • @Yunghokage18
    @Yunghokage18 1 month ago

    I’m so sorry but I used vscode and I can’t find the csv file please how do I go about this?

  • @almutabbil-jn2pt
    @almutabbil-jn2pt 4 months ago

    The code didn't create any csv file although I didn't get any error ! why is that?

  • @gokulkanna-fj9rr
    @gokulkanna-fj9rr 6 months ago +1

    Start from 1:17

  • @nikro7239
    @nikro7239 7 months ago

    when I write to csv file for some reason there is always one free row (with literally nothing) between the actual rows with data

  • @6in602
    @6in602 21 days ago

    Are you still gonna make the next video showing how to access sites that require a login?

  • @santiagorodriguezrodriguez3704
    @santiagorodriguezrodriguez3704 1 month ago

    This is nice! Now, I just want to know: how do I know if the page I want to scrape allows it?

  • @Squid666
    @Squid666 6 months ago +1

    I always end up back here when I need a refresher on scraping ❤ thank you!

  • @codingmaster24
    @codingmaster24 2 years ago +1

    Best YouTuber.

  • @wlatol6512
    @wlatol6512 1 month ago

    Any idea on how to identify whether website owners allow data scraping or not?

  • @kingofcastlechaos
    @kingofcastlechaos 1 year ago

    Great content, thank you. How do we know if a website has a problem with scraping without actually going to jail and having to explain why I destroyed everything in our life to my wife? (Please your honor, I NEED the death penalty for this heinous crime. DO NOT LET THAT WOMAN BAIL ME OUT!)

  • @nostalgicnow6001
    @nostalgicnow6001 5 months ago

    It feels like api requesting for JavaScript

  • @paaao
    @paaao 2 years ago

    Every python coding video is just some dood typing out a bunch of random gobbledygook and never explaining how to find the libraries, keywords, and functions in the first place... Why? Is this maybe intentional? Keep everything mysteriously vague.. I dunno 🤷🏼‍♂️

  • @user-wp8mk3yg4o
    @user-wp8mk3yg4o 8 months ago

    Which sites are you NOT allowed to scrape?

  • @RENO_K
    @RENO_K 6 months ago

    I'm only giving a good comment because my gf told me to.
    Good video👍

  • @AllanYacaman
    @AllanYacaman 3 months ago

    this seems so refreshing? Why did he stop uploading?

  • @Web.Scraping
    @Web.Scraping 2 months ago

    Fantastic video. Short and useful 👍

  • @industrialdonut7681
    @industrialdonut7681 1 year ago

    This is not all you need to know. If the page you're scraping dynamically loads content through javascript, you should use selenium to render that before parsing the HTML.

  • @kyrianrahimatulla1561
    @kyrianrahimatulla1561 2 years ago

    I had no clue it was this easy, but how do I find out which websites I'm not allowed to scrape? All I get from Google is ways to prevent scraping on my own website (which I don't have, but that's beyond the point).

  • @OrianaVerity
    @OrianaVerity 1 year ago

    The really dry jokes are surprisingly pleasant.. who could scrape the web without a web? What do you think all the spiders think about that?

  • @DarthJeep
    @DarthJeep 2 years ago

    Davy504 fan? "Scrape it..." Just kinda reminded me of the ol' "SLAP IT!" line. lol

  • @window_eye_when_do_i
    @window_eye_when_do_i 2 years ago +1

    Awesome video! the code didn't run for me using findALL, but it worked with this...
    quotes = soup.find_all("span", attrs={"class":"text"})
    authors = soup.find_all("small", attrs={"class":"author"})

    • @dehnzel1
      @dehnzel1 1 year ago +1

      Thank you so much for this!

  • @liamhughes7093
    @liamhughes7093 1 year ago

    Great video. With the phrase "web scraper", I can't help but picture a function that returns a digital box chevy with candy paint, 26" chrome rims, tinted windows, and triple 15" subs in the trunk with some Too $hort going. I hope someone else from Northern California is thinking the same thing, and cracks up seeing this.
    But thank you for your fantastic educational video! cheers.

  • @jenschristiannrgaard4878
    @jenschristiannrgaard4878 9 months ago

    how much more difficult is it if I want all sub-pages where you would normally find more information?

  • @IamTheHolypumpkin
    @IamTheHolypumpkin 2 years ago +3

    I just checked a website I want to scrape in the future, but this will be significantly more difficult. I want to get live train schedules, but the live data is inside a JavaScript pop-up window.

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago +1

      You might need to use dedicated tools for that, maybe things like Selenium or something related could help you with that.

  • @EdgarPauloVerchez
    @EdgarPauloVerchez 22 days ago

    OMG! your channel is still alive! i remember 8yrs ago i made a keylogger with the help of one of your videos

  • @MagnusFernby
    @MagnusFernby 10 months ago

    Thanks a lot for this clear video! How would I retrieve more information associated with the quote? For instance I would like to receive and print both the author and the associated tags.

  • @jackrider798
    @jackrider798 2 years ago +1

    Love your videos, I don’t understand much of the content, but what’s the difference between taking these quotes via code and just copy pasting into a excel sheet? I’m a noob sorry

    • @JoaoPedro-ki7ct
      @JoaoPedro-ki7ct 2 years ago +1

      You can do it automatically every X amount of time.
      You can use a "bot" to do something with the data you scraped.
      I don't use Excel, but if you're talking about what I am thinking, Excel is doing exactly what was talked about in this video: web scraping.
      The thing is that Excel is doing it for you without the need for you to program it first, but the web scraping it does is very, very limited compared to what tools made for scraping can do.

    • @Ryan1456100
      @Ryan1456100 2 years ago +2

      In practice? Nothing is different, you get the same result. However, let's say you have a website with 2000 quotes and you need to keep a sheet up to date. That's where a scraper would be useful, as it's time you really only need to spend once. Plus, at that kind of scale it would be faster to write the code than do it manually.

    • @jackrider798
      @jackrider798 2 years ago +1

      @@JoaoPedro-ki7ct thank you!

  • @renaaaa05
    @renaaaa05 2 months ago

    I was given a task in my internship that involved web scraping and this was very helpful, thank you!

  • @SarahGamigbigboss
    @SarahGamigbigboss 1 year ago +2

    Funny how it's titled Beginners Guide to Scraping and once he's done with the introduction and starts typing a bunch of codes that " beginners" have absolutely no clue how to do... Thanks, man great help!

  • @lolkek6807
    @lolkek6807 7 months ago

    what if I want just the first quote?not all

  • @havenurmom5375
    @havenurmom5375 2 months ago

    this is entertaining the first thirty seconds lol

  • @gamerguy9533
    @gamerguy9533 6 months ago

    Thanks! Super basic but it was what I needed to make my code start working!

  • @WassupCarlton
    @WassupCarlton 2 months ago

    Is it `quote.text` because in the html, we see itemprop = "text"? If (for example) the html were instead `“The end is only the beginning.”`, would we rock with `quote.banana`?

  • @mrklean0292
    @mrklean0292 5 months ago

    Man... I've seen other web scraping tutorials and they take you ten miles down the road and throw all types of advanced garbage at you. Granted, I know what you have shown here is the quick and easy way, but that's all I have wanted: to get an understanding of what it is and how it basically works. Thank you.

  • @slattbizz22
    @slattbizz22 1 month ago

    Honestly this is just what I needed 😭

  • @colinbrown6629
    @colinbrown6629 4 months ago

    Amazing video to get you started with scraping, thanks!

  • @Code_Play_com
    @Code_Play_com 7 months ago

    Very practical and helpful video with very detailed explanation!

  • @deepvoyager01
    @deepvoyager01 8 months ago

    Thank you for the video
    it helped me to understand how a scraper works

  • @htstube1
    @htstube1 1 year ago +1

    great video! seems very straight forward and easy to follow. I will be trying it out in the next day or two

  • @kenjohnsiosan9707
    @kenjohnsiosan9707 1 year ago

    it's a coincidence that I have a task to scrape data and format it to CSV then send it to email. thank you for this tutorial, sir.

  • @silversurfer3837
    @silversurfer3837 2 months ago

    Helpful indeed, thanks!

  • @dillkhalifa
    @dillkhalifa 9 months ago

    you owe me bro. i just subscribed to your channel😂😂

  • @flobbie87
    @flobbie87 2 years ago

    Last time i did something like that i used a line mode browser to flatten the webpage.