I like that image blocking tip!
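For anyone who wants to try the image blocking idea, here is a minimal sketch of how it could look as a Playwright route handler. The handler name and the extra blocked types ("media", "font") are my own assumptions for illustration, not necessarily what the video uses.

```python
# Resource types to skip. "image" is the one the tip is about;
# "media" and "font" are assumptions added for illustration.
BLOCKED_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True when a request of this resource type should be aborted."""
    return resource_type in BLOCKED_TYPES


def handle_route(route) -> None:
    """Route handler: abort blocked resource types, let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


# Usage with Playwright's sync API (hypothetical page object):
#   page.route("**/*", handle_route)
```

Keeping the decision in a small separate function makes it easy to tweak the blocked types without touching the handler.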
This is awesome!!
As an API Security Specialist, I always start by looking at the HTTP calls, searching for an API call that might have that same info, which saves me the time of scraping the page. Most of the time that approach works, especially when dealing with solid companies/websites/platforms.
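A rough sketch of that "look at the HTTP calls first" approach, assuming Playwright's sync API: log every response whose content type is JSON, then check those URLs by hand. The helper names are hypothetical, and the JSON content-type check is only a heuristic.

```python
def looks_like_api_call(content_type: str) -> bool:
    """Heuristic: JSON responses are usually the data endpoints worth a look."""
    return "application/json" in content_type.lower()


def make_collector(found: list):
    """Build a response handler that records the URLs of JSON responses."""
    def on_response(response) -> None:
        # Playwright exposes response headers as a dict with lowercase names.
        content_type = response.headers.get("content-type", "")
        if looks_like_api_call(content_type):
            found.append(response.url)
    return on_response


# Usage with Playwright's sync API (hypothetical page object and URL):
#   found = []
#   page.on("response", make_collector(found))
#   page.goto("https://example.com/products")
#   print(found)
```

If one of the recorded URLs returns the data you need, you can often call it directly and skip parsing the page.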
Great one!
I think that using the pytest-playwright package can save several lines of code in the initialization part, because you can just use the page: Page fixture.
Nooooo waaaay, I just found schema on other websites. Nice trick, but I find it more efficient to read the info from the category pages. Thanks for your videos, they always inspire me!!!
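Assuming "schema" here means schema.org data embedded as JSON-LD script tags, a minimal standard-library sketch for pulling it out of a page could look like this. The class and function names are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD block found in the HTML as parsed Python objects."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items
```

You would pass it the page source (for example from `page.content()` in Playwright) and look for entries whose `@type` is `Product`.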
Good content as always. Enjoy your Easter break 😉👍
Really great tutorial! Thanks, John!
That split move was nice
Thank you John for the teaching. I seem to have an issue with Xvfb when running 'headless'. Any suggestions or resources that I can learn from?
Hey John, can you please continue the scraping livestream with your test site? 😃
Would love to see how to handle the drop-down menus, JavaScript, and stricter Cloudflare rules.
Would be happy to hear about some news! Enjoy Easter :)
On Cloudflare: one idea is to use undetected-chromedriver to avoid Cloudflare. You can add a delay while logging in so you can solve the captchas the first time, then save the cookies. After that you no longer need to solve captchas; it will be automatic.
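The "save the cookies" part could be sketched like this, assuming a Selenium-style driver (undetected-chromedriver exposes the same `get_cookies` / `add_cookie` methods). The function names and file path are hypothetical, and whether the saved session keeps getting past the checks depends on the site.

```python
import json
from pathlib import Path


def save_cookies(driver, path: str) -> None:
    """Write the driver's current cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path: str) -> None:
    """Add previously saved cookies back to the driver.

    The driver should already be on the cookies' domain, otherwise
    Selenium rejects them.
    """
    for cookie in json.loads(Path(path).read_text()):
        driver.add_cookie(cookie)


# Usage (hypothetical driver, URL and file name):
#   driver.get("https://example.com/login")
#   ... solve the captcha by hand once ...
#   save_cookies(driver, "cookies.json")
#   # on a later run:
#   driver.get("https://example.com")
#   load_cookies(driver, "cookies.json")
#   driver.refresh()
```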
Really well explained! Is there a way to run the loop in the original browser? Say, if we're only interested in the first page of the pagination and the products on page 1 only.
I'm following this exact code in VS Code and only the initial page is opened; it doesn't open the subsequent pages for each of the products. No idea how to fix this...
nvm, fixed it, turns out the data-selenium=...GridView... has been changed to [data-selenium='miniProductPageProductNameLink']
Sir, can you make a video on how to deploy a Playwright script on Google Cloud Functions / VPC, please?
Thank you John, I've been really enjoying your videos recently and applying everything at work where it comes in really handy. Would you consider creating a python/scraping course on Udemy or a similar platform?
Thanks for watching. I have thought about creating a course, but no serious plans yet, I'm afraid.
@@JohnWatsonRooney thanks for the reply, if you change your mind you got my money 😂
Can’t you just set a viewport for the screen size, add a header, and run it headless with no issue?
Omg why the white editor??
Exactly. When I saw it I immediately remembered this video: ua-cam.com/video/XlgqZeeoOtI/v-deo.html 😂
For some it's easier on the eyes. My eyes can't stand the dark themes.
@@рнт exactly how I felt. And especially since John usually has amazing videos and everything is so perfectly balanced in terms of theme and ease on the eyes.
It was a super shock
Thanks John, but nowadays most websites don't allow you to open links like you do; they will block you after 3 or 4 pages are open at the same time.
Another question: if you could make a video on how to use Playwright inside Docker with a proxy to make many requests at the same time, that would be very nice.
Sorry for my English, I'm not a native speaker.
Can you please start talking about some difficult cases:
- scraping a website that has Cloudflare protection against bots (even proxy rotation didn't work)
- scraping websites that have captcha protection
Thank you
One idea is to use undetected-chromedriver to avoid Cloudflare. You can add a delay while logging in so you can solve the captchas the first time, then save the cookies. After that you no longer need to solve captchas; it will be automatic.
why not headless?
can this work with amazon ? 🤔
Your content is good, but I think you should engage with your audience more instead of speaking as if you are talking to yourself. You will see that you get many more views. Take the GothamChess channel, for example: he is not a chess Grandmaster, but his channel has more views and subscribers than Hikaru's and Magnus's because of his communication skills.
Fair point, thanks for the advice.