Scraping with Playwright 101 - Easy Mode

  • Published 25 Nov 2024

COMMENTS • 32

  • @robertramirez2167 7 months ago +6

    I like that image blocking tip!
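For anyone who wants to try the image-blocking tip, here is a minimal sketch using Playwright's request routing. The helper names `should_block` and `block_images`, the extra blocked types, and the URL are illustrative assumptions, not code taken from the video.

```python
# Resource types to skip. "image" is the tip mentioned above;
# "media" and "font" are optional extras assumed for this sketch.
BLOCKED_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True for request types that are not needed to read page text."""
    return resource_type in BLOCKED_TYPES


def block_images(route) -> None:
    """Route handler: abort blocked requests and let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def main() -> None:
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", block_images)  # run the handler for every request
        page.goto("https://example.com/")  # placeholder URL
        print(page.title())
        browser.close()
```

Calling `main()` loads the page without downloading images, which usually makes each page load lighter.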

  • @bigoper 5 months ago +1

    This is awesome!!
    As an API security specialist, I always start by looking at the HTTP calls, searching for an API call that might return the same info, which saves me from scraping the page. Most of the time that approach works, especially with solid companies/websites/platforms.
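One way to look for such API calls from within Playwright itself is to log JSON responses while the page loads. This is only a sketch under assumptions: the helper names and the "XHR/fetch plus JSON content type" heuristic are illustrative, and many people do the same job in the browser's network tab.

```python
def looks_like_json_api(resource_type: str, content_type: str) -> bool:
    """Heuristic: XHR/fetch calls that return JSON are candidate data endpoints."""
    return resource_type in {"xhr", "fetch"} and "json" in content_type.lower()


def log_api_calls(url: str) -> list[str]:
    """Open `url` and collect the URLs of JSON responses seen while it loads."""
    # Imported lazily so the heuristic above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    found: list[str] = []

    def on_response(response) -> None:
        content_type = response.headers.get("content-type", "")
        if looks_like_json_api(response.request.resource_type, content_type):
            found.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)  # called once per response
        page.goto(url, wait_until="networkidle")
        browser.close()
    return found
```

If one of the returned URLs carries the product data, it can often be requested directly instead of rendering the page.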

  • @alexanderkomanov4151 8 months ago +2

    Great one!
    I think using the pytest-playwright package can save several lines of code in the initialization part, because you can just use the page: Page fixture.

  • @Extrey 8 months ago +1

    Nooooo waaaay, I just found the schema on other websites. Nice trick anyway, but I find it more efficient to read the info from the category pages. Thanks for your videos, they always inspire me!!!
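Assuming the "schema" here means schema.org data embedded as JSON-LD in `<script type="application/ld+json">` tags, a minimal standard-library sketch for pulling it out of a page's HTML could look like this. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of every JSON-LD <script> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._chunks: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_json_ld(html: str) -> list:
    """Return a list of decoded JSON-LD objects found in `html`."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

With Playwright, the input could come from `page.content()`; whether a given site exposes product data this way varies from site to site.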

  • @graczew 8 months ago +1

    Good content as always. Enjoy your Easter break 😉👍

  • @NomadicDmitry 3 months ago +1

    Really great tutorial! Thanks, John!

  • @bgriffin5447 4 months ago +1

    That split move was nice

  • @elu1 8 months ago

    Thank you, John, for the teaching. I seem to have an issue with Xvfb when running 'headless'. Any suggestions or resources I can learn from?

  • @fredde7356 8 months ago +1

    Hey John, can you please continue the scraping livestream with your test site? 😃
    Would love to see how to handle drop-down menus, JavaScript, and stricter Cloudflare rules.
    Would be happy to hear some news! Enjoy Easter :)

    • @munchcup 7 months ago

      On Cloudflare: one idea is to use undetected-chromedriver to avoid detection, add a delay while logging in so you can solve the captchas the first time, and then save the cookies. After that you no longer need to solve captchas; it will be automatic.
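The comment above describes this with undetected-chromedriver (a Selenium-based tool). The cookie-saving half of the idea can be sketched in Playwright with `storage_state`, which is a different technique from the one the commenter names. The file name and helper names are hypothetical, and reusing a session is not guaranteed to satisfy any particular site's checks.

```python
from pathlib import Path

STATE_FILE = Path("state.json")  # hypothetical file name for the saved session


def context_kwargs(state_file: Path) -> dict:
    """Reuse saved cookies and local storage if a state file already exists."""
    return {"storage_state": str(state_file)} if state_file.exists() else {}


def run(url: str, state_file: Path = STATE_FILE) -> None:
    # Imported lazily so context_kwargs works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(**context_kwargs(state_file))
        page = context.new_page()
        page.goto(url)
        if not state_file.exists():
            # First run: give the user time to log in or pass a manual check.
            input("Finish logging in, then press Enter to save the session...")
        context.storage_state(path=str(state_file))  # save cookies for next run
        browser.close()
```

On the first run the session is saved after the manual step; later runs start from the saved cookies.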

  • @donaldandmijung 3 months ago

    Really well explained! Is there a way to run the loop in the original browser? Say, if we're only interested in the first page of the pagination and the products on page 1 only.

  • @user-wu4ip7mp3z 6 months ago +1

    I'm following this exact code in VS Code and only the initial web page opens; it doesn't open the subsequent pages for each product. No idea how to fix this...

    • @user-wu4ip7mp3z 6 months ago

      Never mind, fixed it. Turns out the data-selenium=...GridView... selector has been changed to [data-selenium='miniProductPageProductNameLink']
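Since selectors like this change over time, one defensive option is to try a short list of candidates and use the first that matches. A minimal sketch: the first selector comes from the comment above, while the fallback selector and the function name are hypothetical.

```python
CANDIDATE_SELECTORS = [
    "[data-selenium='miniProductPageProductNameLink']",  # from the comment above
    "a[href*='/product/']",  # hypothetical fallback, adjust for the real site
]


def first_matching_selector(page, selectors=CANDIDATE_SELECTORS):
    """Return the first selector that matches at least one element, else None."""
    for selector in selectors:
        if page.locator(selector).count() > 0:
            return selector
    return None
```

Given a Playwright `page`, a `None` result is a clear signal that the markup has changed again, which is easier to debug than a loop that silently visits zero product links.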

  • @IshaqKhan010 7 months ago +1

    Sir, can you please make a video on how to deploy a Playwright script on Google Cloud Functions / VPC?

  • @carloiurcovici 8 months ago +1

    Thank you John, I've been really enjoying your videos recently and applying everything at work where it comes in really handy. Would you consider creating a python/scraping course on Udemy or a similar platform?

    • @JohnWatsonRooney 8 months ago

      Thanks for watching. I have thought about creating a course, but no serious plans yet, I'm afraid.

    • @carloiurcovici 8 months ago

      @JohnWatsonRooney thanks for the reply. If you change your mind, you've got my money 😂

  • @s6yx 7 months ago

    Can't you just set a viewport for the screen size, plus a header, and run it headless with no issue?
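What this question describes can be sketched with options that Playwright's `browser.new_context()` accepts: `viewport`, `user_agent`, and `extra_http_headers`. The helper name, the sizes, and the header values below are illustrative assumptions, and a site may still detect a headless browser by other means.

```python
def context_options(width: int = 1920, height: int = 1080) -> dict:
    """Build keyword arguments for browser.new_context()."""
    return {
        "viewport": {"width": width, "height": height},
        # Placeholder user agent string; substitute one that matches your browser.
        "user_agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "extra_http_headers": {"Accept-Language": "en-US,en;q=0.9"},
    }


def open_headless(url: str) -> str:
    """Load `url` in a headless browser with the options above; return its title."""
    # Imported lazily so context_options works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**context_options())
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```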

  • @mohsinhassan88 8 months ago +3

    Omg why the white editor??

    • @рнт 8 months ago

      Exactly. When I saw it I immediately remembered this video: ua-cam.com/video/XlgqZeeoOtI/v-deo.html 😂

    • @tendosingh5682 8 months ago +2

      For some it's easier on the eyes. My eyes can't stand the dark themes.

    • @mohsinhassan88 8 months ago

      @рнт Exactly how I felt, especially since John usually has amazing videos where everything is so perfectly balanced in terms of theme and ease on the eyes.
      It was a super shock.

  • @alexdin1565 8 months ago

    Thanks, John, but nowadays most websites don't allow you to open links like you do; they will block you after 3 or 4 pages are open at the same time.
    Another question: it would be very nice if you could make a video on how to use Playwright inside Docker with a proxy to make many requests at the same time.
    Sorry for my English, I'm not a native speaker.
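One common way to avoid having too many pages open at once is to cap concurrency with a semaphore. The sketch below uses Playwright's async API with an optional proxy. The function names, the default limit of 3, and the proxy address format are assumptions for illustration, and a lower limit does not by itself prevent blocking.

```python
import asyncio


async def gather_limited(urls, fetch, limit: int = 3):
    """Run `fetch(url)` for every URL with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(worker(u) for u in urls))


async def scrape_titles(urls, limit: int = 3, proxy_server=None):
    """Return page titles for `urls`, opening no more than `limit` pages at a time."""
    # Imported lazily so gather_limited works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # proxy_server is a hypothetical address such as "http://host:port".
        launch_kwargs = {"proxy": {"server": proxy_server}} if proxy_server else {}
        browser = await p.chromium.launch(**launch_kwargs)

        async def fetch(url):
            page = await browser.new_page()
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

        try:
            return await gather_limited(urls, fetch, limit)
        finally:
            await browser.close()
```

The same script can run inside a Docker container that has Playwright's browsers installed; `asyncio.run(scrape_titles(urls))` is the entry point.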

  • @badrenanna3961 8 months ago +4

    Can you please start covering some difficult cases:
    - scraping a website that has Cloudflare protection against bots (even proxy rotation didn't work)
    - scraping websites that have captcha protection
    ..
    Thank you

    • @munchcup 7 months ago +3

      One idea is to use undetected-chromedriver to avoid Cloudflare detection, add a delay while logging in so you can solve the captchas the first time, and then save the cookies. After that you no longer need to solve captchas; it will be automatic.

  • @archiee1337 5 months ago

    why not headless?

  • @danueecitizen 7 months ago

    Can this work with Amazon? 🤔

  • @pkavenger9990 4 months ago +2

    Your content is good, but I think you should engage with your audience more instead of speaking as if you are talking to yourself. You will see that you get many more views. Take the GothamChess channel, for example: he is not a chess grandmaster, but his channel has more views and subscribers than Hikaru's and Magnus's because of his communication skills.