Browsers are Essential now? Scraping Amazon in 2023
- Published 22 Jul 2023
- Scraping Amazon in 2023 using Playwright and Python.
Scraper API www.scrapingbee.com/?fpr=jhnwr
Proxies I use: nodemaven.com/?a_aid=JohnWats...
Patreon: / johnwatsonrooney
Donations: www.paypal.com/donate/?hosted...
Hosting: Digital Ocean: m.do.co/c/c7c90f161ff6
Gear I use: www.amazon.co.uk/shop/johnwat... - Science & Technology
John, I'm not exaggerating at all when I say your channel is one of the absolute best when it comes to hands-on tutorials of anything I've watched on YouTube. Thank you so much for this!
Thank you, that's very kind!
@JohnWatsonRooney if possible, could you create a longer video about web automation from beginner level? I searched your playlist but it's not being updated consistently.
Or if you have created a course somewhere (Udemy/Coursera), I think many of us would consider buying it.
Your tips and tricks have helped me a lot and I always look forward to your videos. God bless you John
Awesome, thank you!
Excellent video, John! Thank you!
Many thanks!
I always watch your videos when it comes to scraping.
Perfect amount of time for videos. No fat to trim, great work.
Thank you, very kind!
Nice, thanks.
Thanks John, works like a charm :)
Nice!
Amazing video sir, thank you!
Subbed!
Thanks!
Thanks for the video. Really useful and helpful! Btw, loved seeing you get distracted by ridiculously expensive cameras 😂
Haha yeah. I'm always drawn in by them!
11:19 - this is exactly why it's so difficult to learn Python. I just keep getting distracted every second, like "Oh, this dude did it this way, maybe I need that"
You are the boss for a reason. Take love ❤ boss
Great content! Master of web scraping out there! By any chance, would you consider making a video on how to scrape article data from Medium, given the URL? Would be very much appreciated 😊
You can do `:!python3 .py` to execute script directly from nvim
I applied this method to scrape about 6 Amazon pages every 15 minutes. After 2 days I got a captcha and can't get around it 😟 Using Raspberry Pi OS.
I'm getting the same problem, any luck solving it?
When performing web scraping, if you encounter websites where certain HTML elements, like divs, lack consistent information (for example, one company's div has all details while another company's div is missing revenue information), it can lead to issues when converting the data into a CSV file.
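One way to handle the uneven records described in the comment above is `csv.DictWriter` with a fixed column list, which fills any field a record lacks with a placeholder instead of raising or shifting columns. This is only a minimal sketch: the column names and sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
FIELDS = ["company", "revenue", "employees"]


def rows_to_csv(rows, fields=FIELDS):
    """Serialise dicts to CSV text, writing "" for any field a row is missing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        restval="",             # value used when a row lacks a field
        extrasaction="ignore",  # silently drop keys that are not in `fields`
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records: the second one has no revenue, as in the comment.
sample = [
    {"company": "Acme", "revenue": "10M", "employees": "50"},
    {"company": "Globex", "employees": "20"},
]
csv_text = rows_to_csv(sample)
```

The same writer works on a real file opened with `newline=""`; the in-memory buffer just keeps the sketch easy to inspect.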
Thanks John, very good indeed.
Not sure I liked the selectolax - will stick to bs4
Also, in your main() loop, for asin in asins... I think you are launching a new browser for every asin/run...
Also, are headers needed? Or will Playwright's Chromium generate some?
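The point about launching a browser per ASIN could be addressed roughly as below: launch Chromium once, then reuse one page for the whole list. This is a sketch, not the code from the video. The function names, the `/dp/<ASIN>` URL pattern and the default domain are assumptions.

```python
def product_url(asin, domain="www.amazon.co.uk"):
    """Build a product page URL. The /dp/<ASIN> pattern is assumed, not from the video."""
    return f"https://{domain}/dp/{asin}"


def fetch_pages(asins):
    """Launch Chromium once and reuse the same page for every ASIN."""
    # Imported lazily so the URL helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    html_by_asin = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # single launch for the whole run
        page = browser.new_page()
        for asin in asins:
            page.goto(product_url(asin))
            html_by_asin[asin] = page.content()  # rendered HTML, ready for parsing
        browser.close()
    return html_by_asin
```

On headers: a real browser sends its own default request headers, so setting them by hand is optional. If you do want to override them, `browser.new_context(extra_http_headers={...})` is one place to do it.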
👏👏
Love this. Quick, no-nonsense and to the point! And you are also progressing at warp speed :) Quick question: when writing out .csv files in Python, I normally have to use 'import os'. Do some of the libraries you use include that already? And if so, how can I check that?
Thanks! I don't think you need the os module unless you are moving around the filesystem, and as I typically dump my outputs to the same folder, I just use the csv module and save the file.
@JohnWatsonRooney Yeah, usually it's the current directory. It happened when I imported some other library: it suddenly shifted to the root. Thanks for the swift feedback! :D
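A relative filename is resolved against the current working directory, which is why the output can land somewhere unexpected if that directory changes. A minimal sketch of writing to an explicit folder with only `csv` and `pathlib` follows. The function name, filename and sample rows are made up.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, out_dir, filename="results.csv"):
    """Write rows to out_dir/filename and return the full path that was written."""
    out_path = Path(out_dir) / filename
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return out_path


# Demo in a throwaway directory; the rows are invented.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_rows([["asin", "price"], ["B0EXAMPLE1", "9.99"]], tmp)
    written = saved.read_text(encoding="utf-8")
```

In a script, passing `Path(__file__).parent` as `out_dir` keeps the file next to the script regardless of where it is run from.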
Thanks for a nice video. Quick question: how would you scrape around 50k ASINs on Amazon? What async methods would you use, and what would your general approach be? Can you suggest anything? Thanks!
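One common pattern for a large list like this is bounded concurrency with `asyncio`: a semaphore caps how many requests are in flight, and failures are recorded instead of stopping the run. This is a sketch under assumptions, not an answer from the video. `fake_fetch` is a stand-in for whatever real request or browser call you use, and the limit of 10 is arbitrary.

```python
import asyncio


async def scrape_all(asins, fetch, max_concurrency=10):
    """Run fetch(asin) for every ASIN with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(asin):
        async with semaphore:
            try:
                return asin, await fetch(asin)
            except Exception as exc:  # keep going; store the error for a retry pass
                return asin, exc

    pairs = await asyncio.gather(*(bounded(asin) for asin in asins))
    return dict(pairs)


async def fake_fetch(asin):
    """Hypothetical stand-in for a real HTTP or browser fetch."""
    await asyncio.sleep(0)
    return f"<html>{asin}</html>"


results = asyncio.run(scrape_all(["A1", "A2", "A3"], fake_fetch, max_concurrency=2))
```

For tens of thousands of items you would likely also want batching, retries and proxies, which this sketch leaves out.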
Long time no see.
YouTube was not recommending me your vids :(
Do you have a Neovim code editor tutorial?
Welcome back. Not yet, sorry!
Hi, and thank you! Is there a reason why I get this error when I run this code in the Spyder IDE using Anaconda?
Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
Thanks!
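This error usually means the code is being run somewhere an asyncio event loop is already active, which is likely the case in an IPython-based console such as Spyder's. Following the error's own advice, a sketch using Playwright's Async API could look like this. The function names are made up, and the helper only illustrates how to detect a running loop.

```python
import asyncio


def in_running_loop():
    """Return True when called from code that is already inside an asyncio loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def fetch_html(url):
    """Load a page with Playwright's Async API and return its rendered HTML."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
    return html
```

From a plain script you would call `asyncio.run(fetch_html(url))`. In a console where a loop is already running, `html = await fetch_html(url)` is the usual alternative, assuming the console supports top-level `await`.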
What is your first check when scraping a site? Look for API in network tab then try to recreate the request… if can’t find anything then resort to using a browser to load the html?
First I check to see if the data I want is in the HTML, then the network tab, then decide whether I need browser automation from there.
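The first step of that order can be sketched as a plain HTTP request followed by a text search for a value you can see on the rendered page. This is illustrative only: the function names and the User-Agent string are assumptions, and some sites will block a bare request like this.

```python
import urllib.request


def fetch_raw_html(url, timeout=10):
    """Plain HTTP GET: returns the HTML as served, with no JavaScript executed."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def data_in_html(html, expected_text):
    """True if a value visible in the browser is already present in the raw HTML."""
    return expected_text.lower() in html.lower()
```

If `data_in_html(fetch_raw_html(url), "some visible value")` is true, a simple request plus an HTML parser may be enough. If not, the network tab and then browser automation are the next things to try.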
Hi bro, hope you are fine.
Sir, I am working on a web scraping project with two inputs that we will provide: a news headline and a link to the article for that headline. Normally in web scraping we have to specify a class or id. Instead, I want to create one function that takes the headline and the article link and automatically returns the image on that page and the article text for that headline. The headline and link will be different every time, so it should extract the article details automatically whatever the website. Can you make a video on this kind of web scraping?
Hi, thank you for your awesome work.
What IDE are you using?
It's Neovim with the oxocarbon theme and @teej_dv's starter config.
Does this make you undetectable as a bot? Because I don't want to get blocked. By the way, it would be great to demo things at the beginning so we know what to expect.
Thanks for the video, do you happen to know when Amazon implemented the login requirement?
Last few months I think. I’m sure there’s a better way around it but for now I’m ok with this version
@JohnWatsonRooney Oh, I see. I have a small setup on Zyte which uses the proxy API for retries. There has been quite an uptick in failed requests, and this may explain it.
Is it possible to deploy a Webapp which involves Playwright code? Need help
Any suggestions on how to scrape Amazon now in 2024, John?
Amazon now only puts the visible region of the page into the HTML, so we need to scroll to bring the initially non-visible parts of the page into the HTML (but this randomly duplicates previously stored values).
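If scrolling does re-yield items that were already collected, one simple mitigation is to de-duplicate on a stable key such as the ASIN while keeping first-seen order. This is a minimal sketch: the item shape and the `asin` key are assumptions about how the scraped records are stored.

```python
def dedupe_by_key(items, key="asin"):
    """Keep the first occurrence of each item, identified by item[key], in order."""
    seen = set()
    unique = []
    for item in items:
        identifier = item[key]
        if identifier in seen:
            continue  # already collected on an earlier scroll pass
        seen.add(identifier)
        unique.append(item)
    return unique


# Made-up records as they might accumulate over several scroll passes.
collected = [
    {"asin": "A1", "title": "First"},
    {"asin": "A2", "title": "Second"},
    {"asin": "A1", "title": "First"},
]
unique_items = dedupe_by_key(collected)
```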
What is the difference between github scrapy-playwright and Playwright for Python?
Scrapy-playwright was not working on Windows! Playwright for Python does. Can we scrape JavaScript-based pages with Playwright for Python?
Thanks
Yes you can. Scrapy-playwright is the integration between Scrapy and Playwright. Playwright itself is the way to control the browser; you can use it by itself with Python or JavaScript to scrape data.
Do you ever use Puppeteer in lieu of Playwright now? BiDi looks exciting but is still not supported by Safari.
Hi John, what IDE are you using in this video?
This is Neovim, oxocarbon theme and teej_dv's starter config.
Is it possible to get the source code from the video?
Do you have a video on tuning Neovim?
I don't, but essentially it's just @teej_dv's starter config from his repo. Super easy to set up and use. I just added the Oxocarbon theme.
Did you share the code on GitHub?
Can we bypass cloudflare? This is a hard problem but do you have any technique. T_T
Can you do this to get reviews?
Yes, video for that is coming in a week or so!
What does ASIN mean?
It's the Amazon product code: Amazon Standard Identification Number.
Why Playwright instead of Selenium?
They are both good, I just prefer Playwright's API and I find it easier to set up and get running.
@JohnWatsonRooney great, that's fair enough 😁
Is this legal?
You always start the code in the middle and never show what you wrote first, like at 2:42.
Bro, your content quality is awesome but your accent is not clear. Make sure your speaking is clear. Please improve it if you can. 👍
It's the British accent. It's a feature not a bug. Nothing to improve.
Test... my comments will be deleted automatically? Is that true? Why? ...
Did you post a link?
The YouTube algorithm obviously recognized the a*s*i-n as a forbidden word, so I had to change it in the above comment. So try it on the German site ('de') of that shopping empire. It's much cheaper there today, and I think they deliver to the UK too for a small fee or even for free. (.fr is even cheaper, with the same ASIN as in Germany: you pay one-seven-eight-nine instead of three-four-one-zero.) And thanks a lot for your interesting videos! Thanks to you I scrape this shopping empire across several European countries. Some ASINs are the same, others differ.