Browsers are Essential now? Scraping Amazon in 2023

  • Published 22 Jul 2023
  • Scraping Amazon in 2023 using Playwright and Python.
    Scraper API www.scrapingbee.com/?fpr=jhnwr
    Proxies I use: nodemaven.com/?a_aid=JohnWats...
    Patreon: / johnwatsonrooney
    Donations: www.paypal.com/donate/?hosted...
    Hosting: Digital Ocean: m.do.co/c/c7c90f161ff6
    Gear I use: www.amazon.co.uk/shop/johnwat...
  • Science & Technology

COMMENTS • 76

  • @AliceShisori · 10 months ago · +8

    John, I'm not exaggerating at all when I say your channel is one of the absolute best for hands-on tutorials of anything I've watched on YouTube. Thank you so much for this!

    • @JohnWatsonRooney · 10 months ago · +2

      Thank you, that's very kind!

    • @AliceShisori · 10 months ago · +3

      @@JohnWatsonRooney If possible, could you create a longer, beginner-level video about web automation? I searched your playlists, but they aren't updated consistently.
      Or, if you have created a course somewhere (Udemy/Coursera), I think many of us would consider buying it.

  • @eddie_2542 · 11 months ago · +11

    Your tips and tricks have helped me a lot, and I always look forward to your videos. God bless you, John.

  • @giannisnik5295 · 11 months ago · +2

    Excellent video, John!! Thank you!

  • @valuetraveler2026 · 11 months ago · +1

    I always watch your videos when it comes to scraping.

  • @tonytiger6874 · 10 months ago · +1

    Perfect video length. No fat to trim, great work.

  • @pldvs · 11 months ago · +1

    Nice, thanks.

  • @FilPill · 11 months ago · +1

    Thanks John, works like a charm :)

  • @edo8647 · 6 months ago

    Amazing video sir, thank you!
    Subscribed!

  • @JesFinkJensen · 11 months ago · +1

    Thanks!

  • @tigerbojiteol · 11 months ago · +2

    Thanks for the video. Really useful and helpful! Btw, loved seeing you get distracted by ridiculously expensive cameras 😂

    • @JohnWatsonRooney · 11 months ago · +2

      Haha yeah. I’m drawn in by them always!

  • @bagobullets4188 · 3 months ago · +2

    11:19 - this is exactly why it's so difficult to learn Python. I keep getting distracted every second like "Oh, this dude did it this way, maybe I need that"

  • @sujitbiswas1995 · 11 months ago · +1

    You are the boss for a reason. Take love ❤ boss

  • @panagiotisfessas3709 · 11 months ago

    Great content! Master of web scraping out there! By any chance, would you consider making a video on how to scrape article data from Medium, given the URL? It would be very much appreciated 😊

  • @ekkyarmandi · 11 months ago · +2

    You can do `:!python3 .py` to execute script directly from nvim

  • @Zer0G101 · 11 months ago · +7

    I applied this method to scrape about 6 Amazon pages every 15 minutes. After 2 days I got a CAPTCHA and can't get around it 😟 (using Raspberry Pi OS)

    • @AwesomeCameras · 24 days ago

      I'm getting the same problem, any luck solving it?
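A fixed schedule (every 15 minutes, around the clock) is easy to spot. One gentler pattern, sketched here as an illustration rather than a fix, is to check each response for a block page and back off exponentially with random jitter before trying again. The marker strings in `BLOCK_MARKERS` are assumptions about what a CAPTCHA page contains, and `fetch_html` stands for whatever function you already use to load a page; nothing here guarantees the CAPTCHA goes away.

```python
import random
import time

# ASSUMPTION: illustrative markers for a CAPTCHA/block page; inspect the HTML
# you actually receive and adjust these before relying on them.
BLOCK_MARKERS = ("validatecaptcha", "enter the characters you see below")


def looks_blocked(html: str) -> bool:
    """Return True if the page contains any of the assumed block-page markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def backoff_delay(attempt: int, base: float = 15 * 60, cap: float = 6 * 60 * 60) -> float:
    """Exponential backoff with full jitter: random seconds in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def poll(fetch_html, max_attempts: int = 5, sleep=time.sleep):
    """Call fetch_html() until it returns a non-blocked page, waiting longer after each block.

    Returns the HTML, or None if every attempt looked blocked.
    """
    for attempt in range(max_attempts):
        html = fetch_html()
        if not looks_blocked(html):
            return html
        if attempt < max_attempts - 1:  # no point sleeping after the final attempt
            sleep(backoff_delay(attempt))
    return None
```

Usage: `poll(lambda: my_fetch(url))`, where `my_fetch` is your own (hypothetical here) page loader. `sleep` is a parameter only so the waits can be stubbed out when testing.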

  • @ydvkuldeep5246 · 10 months ago

    When performing web scraping, if you encounter websites where certain HTML elements, like divs, lack consistent information (for example, one company's div has all details while another company's div is missing revenue information), it can lead to issues when converting the data into a CSV file.
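One way to keep the CSV columns aligned when some divs lack a field: build one dict per div from whatever keys you managed to extract, and let `csv.DictWriter` fill the missing ones with its `restval` value. A minimal sketch; the field names are hypothetical.

```python
import csv
import io

# Hypothetical column names; use whatever fields your divs can contain.
FIELDS = ["name", "location", "revenue"]


def rows_to_csv(records, fieldnames=FIELDS, missing=""):
    """Serialise a list of dicts to CSV text.

    A record may omit keys (e.g. a company div with no revenue); DictWriter
    writes `missing` in those cells, so every row keeps the same columns.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval=missing)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

To write a file instead, open it with `open(path, "w", newline="")` and pass that to the same `DictWriter`. Keys that appear in a record but not in `fieldnames` raise `ValueError` by default (`extrasaction="raise"`), which is a handy check that extraction and header agree.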

  • @ericxls93 · 11 months ago

    Thanks John, very good indeed.
    Not sure I liked selectolax; I'll stick to bs4.
    Also, in your main() loop, for asin in asins... I think you are launching a new browser for every ASIN/run...
    Also, are headers needed, or will Playwright's Chromium generate some?
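If the loop does start a fresh browser for every ASIN, as this commenter suspects (the page alone doesn't confirm it), one option is to launch once and reuse a single page. A minimal sketch with Playwright's sync API; `BASE_URL`, the title-only scrape and the optional `user_agent` override are illustrative assumptions, not the video's code. On headers: Playwright's Chromium sends its own default request headers, so setting them isn't strictly required, though `browser.new_context(user_agent=...)` lets you override the user agent.

```python
# Hypothetical marketplace URL pattern; swap in the domain you actually scrape.
BASE_URL = "https://www.amazon.co.uk/dp/{asin}"


def product_url(asin: str) -> str:
    """Build a product page URL from an ASIN."""
    return BASE_URL.format(asin=asin)


def scrape_titles(asins, user_agent=None):
    """Launch ONE browser and reuse ONE page for every ASIN; return {asin: page title}."""
    # Imported inside the function so product_url() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # started once, outside the loop
        # Chromium sends its own default headers; only override the user agent if you choose to.
        context = browser.new_context(user_agent=user_agent) if user_agent else browser.new_context()
        page = context.new_page()
        for asin in asins:
            page.goto(product_url(asin))
            results[asin] = page.title()
        browser.close()
    return results
```

Usage: `scrape_titles(["B0EXAMPLE1", "B0EXAMPLE2"])` with placeholder ASINs, after `pip install playwright` and `playwright install chromium`.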

  • @ItsWork-web · 11 months ago · +1

    👏👏

  • @rick-hoekman · 11 months ago

    Love this. Quick, no-nonsense and to the point! And you are progressing at warp speed :) Quick question: when writing out .csv files in Python I normally have to use 'import os'. Do some of the libraries you use include that already? And if so, how can I check?

    • @JohnWatsonRooney · 11 months ago · +1

      Thanks! I don't think you need the os module unless you are moving around the filesystem. As I typically dump my outputs to the same folder, I just use the csv module and save the file.

    • @rick-hoekman · 11 months ago · +1

      @@JohnWatsonRooney Yeah, usually it's the current directory. It happened when I imported some other library: it suddenly shifted to the root. Thanks for the swift feedback! :D
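As John says, the csv module alone is enough when the file goes in the current folder. The catch rick-hoekman hit is that a relative filename resolves against the current working directory, which other code can change. A small sketch using only `csv` and `pathlib`; the function name and default filename are hypothetical.

```python
import csv
from pathlib import Path


def save_rows(rows, path="results.csv"):
    """Write rows (a list of lists) to CSV and return the absolute path written.

    A relative path is resolved against the current working directory, so no
    `import os` is needed; if something may change the working directory,
    pass an absolute path instead.
    """
    target = Path(path).resolve()
    with target.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return target
```

To pin the output next to the script regardless of the working directory, call `save_rows(rows, Path(__file__).parent / "results.csv")`.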

  • @ottomanasina1254 · 11 months ago · +1

    Thanks for a nice video. Quick question: how would you scrape around 50k ASINs on Amazon? What async methods would you use, and generally what would your approach be? Thanks!
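For tens of thousands of ASINs, one general approach (not one shown in the video) is a fixed pool of async workers pulling from a queue, so concurrency stays bounded, plus a list of failures to retry later. `fetch_asin` is a hypothetical placeholder; in practice it would wrap an async HTTP client, a scraper API, or a Playwright async page. `workers=10` is an arbitrary default.

```python
import asyncio


async def fetch_asin(asin: str) -> dict:
    """HYPOTHETICAL placeholder: replace with a real fetch (HTTP client, scraper API, or Playwright page)."""
    await asyncio.sleep(0)  # stands in for network I/O
    return {"asin": asin}


async def scrape_all(asins, fetch=fetch_asin, workers: int = 10):
    """Process ASINs with a fixed pool of workers pulling from a shared queue.

    Returns (results, failures); failures are (asin, error) pairs you can retry later.
    """
    queue = asyncio.Queue()
    for asin in asins:
        queue.put_nowait(asin)
    results, failures = [], []

    async def worker():
        while True:
            try:
                asin = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            try:
                results.append(await fetch(asin))
            except Exception as exc:  # record and carry on; one bad ASIN shouldn't stop the run
                failures.append((asin, repr(exc)))

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results, failures
```

Usage: `results, failures = asyncio.run(scrape_all(asin_list, workers=10))`. Raising `workers` speeds things up but also raises the request rate the site sees.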

  • @bakasenpaidesu · 11 months ago · +2

    Long time no see.
    YouTube wasn't recommending your videos to me :(.
    Do you have a Neovim code editor tutorial?

  • @alessiogarau7948 · 8 months ago · +1

    Hi, and thank you! Is there a reason why I get this error when I run this code in the Spyder IDE using Anaconda?
    Error: It looks like you are using Playwright Sync API inside the asyncio loop.
    Please use the Async API instead.
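That error appears when the Sync API is called from code that is already running inside an asyncio event loop, which IPython-based consoles such as Spyder's typically are. A sketch of the async alternative, assuming Playwright and its Chromium build are installed; `get_title` and `in_running_loop` are hypothetical helpers, the latter only showing the condition the error is about.

```python
import asyncio


def in_running_loop() -> bool:
    """True if called from code already running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def get_title(url: str) -> str:
    """Open a page with Playwright's Async API and return its title."""
    # Imported here so the module loads even where Playwright isn't installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
    return title
```

In a plain script, run it with `asyncio.run(get_title("https://example.com"))`. In a console that already runs a loop and supports top-level await, `asyncio.run` itself fails, so use `await get_title("https://example.com")` instead.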

  • @joj0ee · 11 months ago · +1

    Thanks!
    What is your first check when scraping a site? Look for an API in the network tab, then try to recreate the request… and if you can't find anything, resort to using a browser to load the HTML?

    • @JohnWatsonRooney · 11 months ago · +1

      First I check whether the data I want is in the HTML, then the network tab, and then decide from there whether I need browser automation.
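A minimal version of that first check, using only the standard library: fetch the raw HTML without a browser and look for the value you want, including inside any embedded JSON-LD blocks. If it isn't there, the next step John describes is the network tab, and only then browser automation. The regex-based JSON-LD extraction and the default User-Agent string are simplifying assumptions; a real HTML parser is more robust.

```python
import json
import re
import urllib.request

# Matches <script type="application/ld+json"> ... </script> blocks (simplified, regex-based).
_JSON_LD = re.compile(
    r"""<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>""",
    re.IGNORECASE | re.DOTALL,
)


def fetch_html(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Download raw HTML without a browser (no JavaScript is executed)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=20) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def value_in_html(html: str, value: str) -> bool:
    """Step 1: is the value already present in the un-rendered HTML?"""
    return value in html


def json_ld_blocks(html: str) -> list:
    """Return parsed JSON-LD blocks, skipping any that aren't valid JSON."""
    blocks = []
    for raw in _JSON_LD.findall(html):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return blocks
```

Usage: `html = fetch_html(url)`, then `value_in_html(html, "a price you can see in the browser")`. A `False` here suggests the value is loaded by JavaScript.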

  • @malwaredev33 · 11 months ago

    Hi bro, hope you are fine.

  • @omyele9315 · 11 months ago

    Sir, I am working on a web scraping project with 2 inputs that we provide: a news headline and a link to the details of that headline. Normally in web scraping we have to specify the class/id. Instead, I want to create one function that takes the news headline and the details link and automatically returns the image from that website and the article matching the headline. The headline and link will be different every time, so it should extract the news details automatically, whatever the website. Can you make a web scraping video like that?

  • @MrAmrmnabil74 · 11 months ago · +1

    Hi, thank you for your awesome work.
    What IDE are you using?

    • @JohnWatsonRooney · 11 months ago

      It's Neovim with oxocarbon theme, and @teej_dv 's starter config

  • @yamani3882 · 8 months ago · +2

    Does this keep you undetected as a bot? I don't want to get blocked. By the way, it would be great to demo things at the beginning so we know what to expect.

  • @peteralexander4892 · 11 months ago · +1

    Thanks for the video, do you happen to know when Amazon implemented the login requirement?

    • @JohnWatsonRooney · 11 months ago · +1

      Last few months I think. I’m sure there’s a better way around it but for now I’m ok with this version

    • @peteralexander4892 · 11 months ago · +1

      @@JohnWatsonRooney Oh, I see. I have a small setup on Zyte that uses the proxy API for retries, and there has been quite an uptick in failed requests; this may explain it.

  • @lom2086 · 7 months ago · +1

    Is it possible to deploy a web app that involves Playwright code? Need help.

  • @divyanshugogna6152 · 13 days ago

    Any suggestions on how to scrape Amazon now, in 2024, John?
    Amazon now only passes the visible region of the page into the HTML, so we need to scroll to get the initially non-visible parts of the page into the HTML (but this randomly duplicates previously stored variables).
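When content only appears after scrolling, one common pattern is to scroll in steps, re-parse the page each time, and deduplicate on a stable key, stopping once a scroll adds nothing new. A sketch for Playwright's sync `page` object; `parse_items` is a hypothetical function you supply that turns HTML into a list of dicts, and the `asin` key, scroll distance and pause are assumptions.

```python
def merge_unique(seen: dict, items, key: str = "asin") -> int:
    """Add items whose key hasn't been seen yet; return how many were new."""
    added = 0
    for item in items:
        k = item.get(key)
        if k is not None and k not in seen:
            seen[k] = item
            added += 1
    return added


def scroll_and_collect(page, parse_items, max_rounds: int = 20, step_px: int = 2000, pause_ms: int = 800):
    """Scroll a Playwright page step by step, keeping only unseen items.

    Stops when a scroll round adds nothing new (or after max_rounds).
    """
    seen: dict = {}
    for _ in range(max_rounds):
        new = merge_unique(seen, parse_items(page.content()))
        if new == 0 and seen:
            break  # the last scroll revealed nothing new
        page.mouse.wheel(0, step_px)
        page.wait_for_timeout(pause_ms)  # give lazy-loaded content time to render
    return list(seen.values())
```

Because every item is keyed by the same identifier, re-parsing the whole page after each scroll can't create duplicates; items already stored are simply skipped.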

  • @mecrayavcin · 11 months ago · +2

    What is the difference between scrapy-playwright on GitHub and Playwright for Python?
    Scrapy-playwright wasn't working on Windows! Playwright for Python does. Can we scrape JavaScript-based pages with Playwright for Python?
    Thanks

    • @JohnWatsonRooney · 11 months ago · +2

      Yes, you can. Scrapy-playwright is the integration between Scrapy and Playwright. Playwright itself is the way to control the browser; you can use it by itself with Python or JavaScript to scrape data.

  • @Textras · 11 months ago

    Do you ever use Puppeteer in lieu of Playwright now? BiDi looks exciting but still isn't supported by Safari.

  • @samy_crash · 11 months ago · +2

    Hi John, what IDE are you using in this video?

    • @JohnWatsonRooney · 11 months ago · +2

      This is Neovim, the oxocarbon theme and teej_dv's starter config

  • @block_hacks · 8 months ago

    Is it possible to get the source code from the video?

  • @cosmicblack · 11 months ago · +1

    Do you have a video on tuning Neovim?

    • @JohnWatsonRooney · 11 months ago

      I don't, but essentially it's just @teej_dv's starter config from his repo. Super easy to set up and use. I just added the Oxocarbon theme.

  • @FabioRBelotto · 7 months ago

    Did you share the code on GitHub?

  • @quanghieuvu1012 · 11 months ago

    Can we bypass Cloudflare? This is a hard problem, but do you have any technique? T_T

  • @wicked9299 · 9 months ago · +1

    can you do this to get reviews?

    • @JohnWatsonRooney · 9 months ago

      Yes, video for that is coming in a week or so!

  • @ffgaming-fe3cx · 4 months ago · +1

    what does asin mean?

    • @JohnWatsonRooney · 4 months ago

      It's the Amazon product code: Amazon Standard Identification Number
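Since ASINs come up throughout this thread: they are 10-character uppercase alphanumeric codes (for books the ASIN is usually the ISBN-10), and product URLs commonly carry them after `/dp/` or `/gp/product/`. A small sketch for pulling one out of a URL; the patterns are assumptions covering only those two common URL shapes, and the example ASIN is made up.

```python
import re

# ASSUMPTION: covers the two common URL shapes, /dp/<ASIN> and /gp/product/<ASIN>.
_ASIN_IN_URL = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")
_ASIN = re.compile(r"[A-Z0-9]{10}")


def is_asin(value: str) -> bool:
    """True for a 10-character uppercase alphanumeric string."""
    return _ASIN.fullmatch(value) is not None


def asin_from_url(url: str):
    """Return the ASIN in an Amazon product URL, or None if none is found."""
    match = _ASIN_IN_URL.search(url)
    return match.group(1) if match else None
```

Usage: `asin_from_url("https://www.amazon.co.uk/Some-Camera/dp/B0EXAMPLE1/ref=sr_1_1")` gives `"B0EXAMPLE1"`. As a commenter below notes, the same ASIN can appear on several European marketplaces, though some differ between countries.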

  • @kawsarlog · 11 months ago · +1

    Why Playwright instead of selenium?

    • @JohnWatsonRooney · 11 months ago · +2

      They are both good; I just prefer Playwright's API and find it easier to set up and get running.

    • @kawsarlog · 11 months ago · +1

      @@JohnWatsonRooney great, that's fair enough 😁

  • @jan-davidwiederstein1543 · 11 months ago

    is this legal?

  • @Laughtube.01 · 5 months ago

    You always start the code in the middle and never show what you wrote first, like at 2:42

  • @malwaredev33 · 11 months ago

    Bro, your content quality is awesome, but your accent is not clear. Please make sure your speech is clear and improve it as much as you can.👍

    • @ramelox · 9 months ago

      It's the British accent. It's a feature not a bug. Nothing to improve.

  • @sforjgoom5661 · 7 months ago

    Test... my comments will be deleted automatically? Is that true? Why? ...

    • @JohnWatsonRooney · 7 months ago

      Did you post a link ?

    • @sforjgoom5661 · 7 months ago

      The YouTube algorithm obviously recognized a*s*i-n as a forbidden word, so I had to change it in the comment above. So try the German part ('de') of that shopping empire. It's much cheaper there today, and I think they deliver to the UK too for a small fee or even free. (.fr is even cheaper, with the same ASIN as in Germany: you pay one-seven-eight-nine instead of three-four-one-zero.) And thanks a lot for your interesting videos! Thanks to you I scrape this shopping empire across several European countries. Some ASINs are the same, others differ.