Login and Scrape Data with Playwright and Python

  • Published 29 Jan 2025

COMMENTS • 178

  • @AsfandSaleem
    @AsfandSaleem 2 years ago +33

    This is the first time I got introduced to Playwright, so much more elegant than Selenium. Thanks for sharing!

  • @martynclarke8400
    @martynclarke8400 14 days ago +1

    Great content man, keep it up. One of the best channels on scraping and automation

  • @wanderingfool7136
    @wanderingfool7136 2 years ago +8

    You have the absolute best videos on YouTube!!! I'm resisting the urge to type in all caps right now lol but seriously, this video just helped me finish a $200 project!! Thanks again for all you do for the community 🙏🙏🙏

  • @BartVanLandschoot
    @BartVanLandschoot 2 years ago +9

    After seeing many videos and trials to do web scraping on secured websites, this has finally brought the solution. Thank you so much!
    Attention on cookies: Playwright acts as a new/clean browser. So opening a website from the script is like visiting it for the first time. I discovered that the website I wanted to scrape, started with a cookie banner that you have to click. So before filling in the username and password, I had to do a page.click('button#btn-accept-cookies')
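
The cookie-banner step described above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: only `button#btn-accept-cookies` comes from the comment, while the function names and the `input#username`, `input#password`, and `button[type=submit]` selectors are hypothetical placeholders for whatever the target site uses.

```python
def accept_cookies_then_login(page, username, password):
    """Dismiss a cookie banner if one is present, then submit the login form.

    `page` is a Playwright sync-API Page. Every selector except the cookie
    button is hypothetical; replace them with the real site's selectors.
    """
    # Selector taken from the comment above; count() is 0 when no banner is in the DOM.
    banner = page.locator("button#btn-accept-cookies")
    if banner.count() > 0:
        banner.click()

    page.fill("input#username", username)
    page.fill("input#password", password)
    page.click("button[type=submit]")


def run(url, username, password):
    """Open a fresh (cookie-less) browser, log in, and return the page HTML."""
    # Imported here so the helper above can be exercised without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        accept_cookies_then_login(page, username, password)
        html = page.content()
        browser.close()
    return html
```

Note that `count()` checks the page as it is at that instant, so a banner that renders late would probably need an explicit wait before the check.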

  • @Zale370
    @Zale370 3 years ago +35

    Great video John, as usual! I started using Playwright a few months ago and prefer it to Selenium or Helium; it is much faster, way less error-prone, and it is being updated constantly.

  • @dimaua1830
    @dimaua1830 3 years ago +7

    Hi John. Just wanted to say thank you and please keep making these videos. I have been studying Data Analytics online and just got a job offer for an analytics position. Even though it does not directly require programming skills, your videos helped me to stay motivated, opened up opportunities for automation and inspired me to do some interesting projects.
    Thanks again and keep it up!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      Thank you, very kind! I’m glad you have enjoyed my videos

  • @nch77884
    @nch77884 2 years ago +4

    no nonsense and to the point short video. thanks John

  • @beware5159
    @beware5159 2 years ago +9

    Great Work Man!
    Always right to the point and no fillers. Thanks for your hard work 🙌!

  • @guestvil
    @guestvil 2 months ago

    Thanks! How to retrieve the html using Playwright and pass it to Bs4 was exactly what I needed to know. Great video.

  • @Analyse_US
    @Analyse_US 3 years ago +3

    Great stuff, as usual. I am using stuff I learn from your channel almost everyday.

  • @franky12
    @franky12 2 years ago +4

    Great video and practical example! Would like to see more advanced stuff with playwright.

  • @danlee1027
    @danlee1027 1 year ago +1

    Very helpful as usual. I look forward to your Playwright series

  • @obi1998
    @obi1998 6 months ago

    Thank you for a useful and concise tutorial. So many Playwright videos are all tied up with Pytest, which I don't need for my usecases.

  • @andro_id
    @andro_id 2 years ago +1

    My first intro into Playwright :)
    It's awesome, thank you!

  • @KendaBeatMaker
    @KendaBeatMaker 2 years ago +2

    Getting Selenium running on Google Cloud Platform was so much stress.
    Last night I setup Playwright with no stress, no extra work was needed.

  • @LucLev
    @LucLev 1 year ago +2

    I've been religiously watching your videos for the last week or so.
    Such a great source of information, you're a great teacher, very direct and to the point!
    I've successfully set up a project scraping data from betting sites to find arbitrage opportunities - mainly via hidden APIs.
    But some pesky websites seem to restrict their APIs - hoping to solve this with Playwright :).

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Thanks a lot I appreciate it! Good luck with your projects!

  • @jenozenB
    @jenozenB 2 months ago +1

    Wow! I absolutely love your videos! New subscriber coming in 🎉

  • @David-mj9st
    @David-mj9st 2 years ago +1

    It's my luck to find your video, it makes learning Python much easier. THANKS!

  • @andres777video
    @andres777video 1 year ago +1

    I like this, very useful, and it can be combined with Selenium if desired...

  • @PaulDenman
    @PaulDenman 2 years ago +1

    Excellent tute John, thank you for such clarity

  • @Cheenaah-tw8xx
    @Cheenaah-tw8xx 7 months ago +1

    2:38
    bro thought we couldnt see "bye"???
    btw your video helped greatly!

  • @haideralihassan5053
    @haideralihassan5053 3 years ago +1

    Very informative video.
    Looking for more videos on playwright.

  • @dennistanui7085
    @dennistanui7085 3 years ago +2

    Thanks for sharing these awesome tutorials. You sir, are a gem

  • @AshleyMush
    @AshleyMush 8 months ago

    Wow, this is much cleaner than selenium

  • @Saeed-ko9wp
    @Saeed-ko9wp 1 year ago +1

    thanks john useful as usual

  • @kacheck855
    @kacheck855 3 years ago +1

    Thank you John, please make more videos of playwright.

  • @youandainews
    @youandainews 1 year ago +1

    Mate you're the best for this stuff. Your deadpan style also makes me laugh. I bet you have a wicked sense of humour. Remind me of the russians. Dry as anything, and wizards with code!

  • @bozok1903
    @bozok1903 2 years ago +1

    It looks much easier and cleaner than Selenium. Thanks for the great video.

  • @ferilukmansyah_dev
    @ferilukmansyah_dev 3 years ago +1

    Thanks for sharing john, this is a great tutorial ever

  • @Abdul_Rafay_Pal
    @Abdul_Rafay_Pal 2 years ago +1

    Thank you very much. You made things so much simple, easy Thanks a lot

  • @muhammadazmulhaq
    @muhammadazmulhaq 2 years ago

    Good competition between Cypress vs Playwright vs Selenium.
    Thanks for this video. Love from Pakistan 🇵🇰

  • @tonymudau3005
    @tonymudau3005 2 years ago +1

    Thank you my brother!

  • @fabpx
    @fabpx 9 months ago +1

    Thank you so much. It helped me a lot..

  • @marcoalmeida2136
    @marcoalmeida2136 1 year ago +1

    Dropping by to say thank you for this tutorial!!!! And also ask which theme did you use?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      Thanks- it’s one of the GitHub themes, my favourite for vs code at the moment

  • @Mreto17
    @Mreto17 1 year ago

    Thanks John for sharing. How can I reuse the login session?
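
One way to approach the session-reuse question is Playwright's storage state, which saves a context's cookies and local storage to a file and restores them into a new context. The sketch below is an assumption about how that could be wired up, not something shown in the video; the helper names and the `state.json` file name are hypothetical.

```python
from pathlib import Path

STATE_FILE = "state.json"  # hypothetical file name


def context_with_saved_session(browser, state_file=STATE_FILE):
    """Return a browser context, restoring cookies/local storage if a saved state exists."""
    if Path(state_file).exists():
        return browser.new_context(storage_state=str(state_file))
    # No saved state yet: start clean, log in, then call save_session().
    return browser.new_context()


def save_session(context, state_file=STATE_FILE):
    """Write the context's cookies and local storage to disk after logging in."""
    context.storage_state(path=str(state_file))
```

With this, the first run would log in and call `save_session(context)`, and later runs would get an already-authenticated context from `context_with_saved_session(browser)` for as long as the site keeps the session valid.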

  • @priyankarajput2208
    @priyankarajput2208 1 year ago

    Great video!!....is there more tutorials/videos on playwright (for scraping)?

  • @Ardassali
    @Ardassali 3 years ago +1

    Thanks! Master Rooney.

  • @baghdadiabdellatif1581
    @baghdadiabdellatif1581 4 months ago

    Great work 👌👏💯

  • @GrantNaylor-b8l
    @GrantNaylor-b8l 9 months ago

    Finally my Selenium webdriver headache has gone :D

    • @GrantNaylor-b8l
      @GrantNaylor-b8l 9 months ago

      Any advice on getting Chromium driver to work? Webkit will work ;-)

    • @GrantNaylor-b8l
      @GrantNaylor-b8l 9 months ago

      An afternoon, cookie session, soup, json, all pushed to Google sheets webapp.. Loving this!

  • @gauravpainuly1800
    @gauravpainuly1800 2 years ago +1

    subscribed....... please keep on making videos like that ...thanks

  • @androidmod183
    @androidmod183 3 years ago +1

    Thank you for sharing, i like your channel. Keep it up mate.

  • @truonganhuynh9161
    @truonganhuynh9161 3 years ago

    I got trouble: RuntimeError: Event loop is closed, when I ran the code at 7:28. What are these errors?

  • @ghabcdef
    @ghabcdef 2 years ago +1

    Thanks for the tutorial... I think the demo site has changed though, the last part of the script does not work. In particular the html output of page.inner_html('#content') looks nothing like the demo and the subsequent steps do not return the results in the tutorial.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Thanks. Unfortunately this is often the case, things change - this is why I try to demo the methods rather than specific sites. But it just furthers the need for me to build my own web scraping test site!

    • @garymichalske2274
      @garymichalske2274 1 year ago

      @JohnWatsonRooney I have the same issue. Although it doesn't make sense because I can see the h2 tags in the html enclosed in online. It seems like Playwright is ignoring the h2 tags. When I print(html) after the line html = page.inner_html('#content'), the result in the editor does not show any h2 tags. It doesn't come close to the section of code I see online.

  • @enamils
    @enamils 1 year ago

    Really helped me, thanks a lot. I need to hide the browser page after connecting, in a for loop.

  • @junivensaavedra882
    @junivensaavedra882 2 years ago +1

    Hi John, I would like to ask if you still use Playwright, or do you have a new favorite? Like the tools I learned from you, httpx and selectolax.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      I sure do. If I need to run a headless browser I use Playwright; the other tools like httpx and selectolax do different things and are my go-to for making requests and parsing HTML

    • @junivensaavedra882
      @junivensaavedra882 2 years ago

      @JohnWatsonRooney thank you very much for responding. :)

  • @Nabilh17
    @Nabilh17 1 year ago +1

    very interesting, thank you

  • @joeb.1163
    @joeb.1163 10 months ago +1

    Can Playwright be pointed to the browser installed on the machine instead of the one that Playwright installs?

    • @JohnWatsonRooney
      @JohnWatsonRooney 10 months ago

      Yes you can connect via cdp(?) protocol to a running browser - it’s in the docs somewhere I’m sure
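
As a rough illustration of the reply above, Playwright's Chromium driver offers `connect_over_cdp` for attaching to an already-running browser and a `channel` launch option for starting the locally installed Chrome. The helper name and the `http://localhost:9222` endpoint in the usage note are assumptions for the sketch, not details from the thread.

```python
def open_installed_browser(p, cdp_url=None):
    """Use a browser already on the machine instead of Playwright's bundled build.

    `p` is the object yielded by `sync_playwright()`.
    """
    if cdp_url:
        # Attach to a Chromium-based browser that is already running with
        # remote debugging enabled, e.g. started with --remote-debugging-port=9222.
        return p.chromium.connect_over_cdp(cdp_url)
    # Otherwise launch the locally installed Google Chrome via its channel name.
    return p.chromium.launch(channel="chrome", headless=False)
```

Inside `with sync_playwright() as p:`, calling `open_installed_browser(p, "http://localhost:9222")` would attach to a running browser, while `open_installed_browser(p)` would start the installed Chrome.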

  • @stewart5136
    @stewart5136 2 years ago +1

    10 out of 10 again!
    Haven't installed Playwright yet and wondered how you found it for speed vs Selenium?
    In an earlier reply you mention that you prefer PyCharm now over VS Code. Will the community version work for most or do we need the Pro version?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      I think it’s about the same speed wise I just find it much easier to work with - yes PyCharm community is good it’s what I use!

  • @kthehatter
    @kthehatter 2 years ago

    hey great video i loved it just wondering if it's possible to open multiple tabs simultaneously ?

  • @irfanshaikh262
    @irfanshaikh262 2 years ago

    Hey John,
    I was looking for precisely a technique like this for an upcoming project I'm aligned to, where we need to log in to one of our company's internal web tools, scrape the lead-generation table that appears after logging in, and write it to an Excel file; the resulting file will be attached to a BI dashboard for automatic updates and publishing.
    Will this technique of yours work, or would you care to give some more of your expert advice?
    Thanks for being there.
    As a self-taught Pythoneer new to programming, you give me a lot of hope with your content.
    Thanks for being there for people like me. ❤❤❤❤

  • @locopollo666
    @locopollo666 3 years ago +1

    Great video! Thanks

  • @seyproductions
    @seyproductions 2 years ago

    Hi, does Playwright or its browser(s) need to be updated when a newer version of the browser that we are using for the scraping gets released?

  • @ruthlessmarketresearch4957
    @ruthlessmarketresearch4957 2 years ago +1

    what code editor do you use?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      This is VS Code with one of the Github Themes - Dark Dimmed I believe. Honestly I prefer PyCharm nowadays

  • @jarekmor
    @jarekmor 2 years ago +1

    Beautiful!!! 🙂

    • @jarekmor
      @jarekmor 2 years ago

      John, no words to thank you for all your tutorials! I hope you will show us more cases based on playwright. Thank you!!!

  • @marius35mm
    @marius35mm 2 years ago +1

    Thanks a lot!

  • @cassiolacaz
    @cassiolacaz 3 years ago

    As always, a great video John! As an expert in Requests, do you know if it is possible to use Playwright together with Requests? Tks

  • @Wassilvideos
    @Wassilvideos 3 years ago +1

    thanks bro, do you have any idea how to bypass a captcha with playwright ?

  • @AnimationLook
    @AnimationLook 1 year ago

    Hello, please tell me, is it possible to somehow get the har file of a browser page without browsermob proxy?

  • @MeMonarch
    @MeMonarch 1 year ago

    i write the exact same code but it doesnt seems to be working.can you help me out,John?

  • @satwikawasthi2002
    @satwikawasthi2002 1 year ago

    I am facing a huge gap when loading a second page after the first page with this method, and still nothing is printing in the console. Please help, what should I do?

  • @fernandomdcn2920
    @fernandomdcn2920 2 years ago

    Thanks, Koushig. I have a question: When I log into google I get the following message: "This browser or app may not be secure" error when trying to sign in with Google on desktop apps

  • @zwykyziomek2570
    @zwykyziomek2570 2 years ago +1

    wow how do you run headed browser just like that? on my wsl it wants me to xvfb-run (whatever this is) for some reason

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      I think it works differently in WSL - if you did it on windows or Linux it would work just like this

    • @zwykyziomek2570
      @zwykyziomek2570 2 years ago

      @JohnWatsonRooney I must try that then, thx

  • @KendaBeatMaker
    @KendaBeatMaker 2 years ago +1

    Thanks 😁

  • @danielrosas2240
    @danielrosas2240 3 years ago +1

    AWESOME!!!! 🙌🙌🙌

  • @dontwanttojoingoogle1799
    @dontwanttojoingoogle1799 6 days ago

    I'm trying to follow along with Spyder, but I'm getting this error. What's wrong?
    Error: It looks like you are using Playwright Sync API inside the asyncio loop.
    Please use the Async API instead.

  • @martpagente7587
    @martpagente7587 3 years ago +1

    Well done John, thank you so much . How fast is it Vs selenium?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      it's still got to open and run a browser but so far I think its faster

  • @binhole
    @binhole 6 months ago

    Bro, how do I keep the browser open to collect dynamic data?

  • @onapmek8763
    @onapmek8763 2 years ago +1

    Using jupyter notebook, I get: "It looks like you are using Playwright Sync API inside the asyncio loop.
    Please use the Async API instead."

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Hey so I don’t think you can use this with jupyter due to the way the internal loop is used, you’d have to write your own script instead in a py file

    • @onapmek8763
      @onapmek8763 2 years ago

      @JohnWatsonRooney Thanks and thanks for your video! I've read the same thing online, I'll try and see if it works in a py file
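
The error quoted in this thread appears when the sync API is called from code that already runs inside an asyncio event loop, which is the case in Jupyter (and, judging by other comments here, Spyder). Besides moving to a plain .py file, the route the message itself suggests is the async API. A minimal sketch, assuming the function names below (both hypothetical) and any URL you pass in:

```python
import asyncio


def running_inside_event_loop():
    """True when an asyncio loop is already running in this thread (as in Jupyter)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def fetch_title(url):
    """Open `url` with Playwright's async API and return the page title."""
    # Imported lazily so the loop check above works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
    return title
```

In a notebook cell this would be called as `await fetch_title("https://example.com")`; in a script, as `asyncio.run(fetch_title("https://example.com"))`. Forgetting an `await` on calls such as `page.goto` is one likely source of the "coroutine 'Page.goto' was never awaited" warning reported elsewhere in these comments.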

  • @jobinnelson
    @jobinnelson 3 years ago +1

    which theme are you running on vs code?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      It’s one of the GitHub official themes, i quite like it

  • @engineerbaaniya4846
    @engineerbaaniya4846 2 years ago +1

    Amazing content Thanks sir

  • @tolulopeayemobola1446
    @tolulopeayemobola1446 3 years ago +1

    Nice video. Is there a java equivalent for this? I also would like to have a word or two with you

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Playwright works with Java - playwright.dev/java/
      You can email me if you like my email is on my main YT page.

    • @tolulopeayemobola1446
      @tolulopeayemobola1446 3 years ago

      @JohnWatsonRooney Thank you John. I emailed but I am yet to get a response from you. Just thought to reach out and let you know

  • @statsnow3354
    @statsnow3354 3 years ago

    Hi, John can you make a video about asynchronous playwright to scrape multiple URLs?

  • @melih.a
    @melih.a 3 years ago

    I'm wondering how we could scrape multiple pages, I've watched the crawl and follow links with scrapy video but I don't know if FormRequest is the way to go instead of playwright.

  • @demircan9464
    @demircan9464 2 years ago +1

    total_orders = soup.find('h2', {'class': 'pull-right'}).text
    AttributeError: 'NoneType' object has no attribute 'text'
    what's the reason of this ?

    • @_manasikara
      @_manasikara 1 year ago

      @abel4776 same here, but after removing the 'text' I got as a result: "total orders = None". The code is exactly the same as shown in the video.
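
The `AttributeError` in this thread means `soup.find(...)` returned `None`, i.e. no `h2` with class `pull-right` was in the HTML that was parsed, which fits the earlier reports that the demo site has changed. A minimal sketch of guarding against that, assuming BeautifulSoup is installed; the two helper names are hypothetical, while the selector is the one quoted above:

```python
def text_or_none(tag):
    """Return a tag's stripped text, or None when find() matched nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def total_orders_from_html(html):
    """Parse HTML (e.g. from page.inner_html('#content')) and read the h2.pull-right text."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h2", {"class": "pull-right"})
    if heading is None:
        # The selector no longer matches: show what was actually fetched.
        print(html[:500])
    return text_or_none(heading)
```

Printing the start of the fetched HTML when the lookup fails makes it easier to see whether the page structure changed or the content had not loaded yet.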

  • @saadachab8425
    @saadachab8425 3 years ago +1

    Hello John, please can Playwright scrape Angular web pages?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Yes it should easily be able to - Angular is a JavaScript framework so Playwright is a good option.

  • @vt2788
    @vt2788 3 years ago +1

    Great! How do you decide whether to use playwright+bs or scrapy?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +3

      Thanks! Depends what I am trying to achieve. If it’s something like this I wouldn’t bother with Scrapy. One of the videos coming up will be Scrapy + Playwright

    • @vt2788
      @vt2788 3 years ago +1

      @JohnWatsonRooney cool! Looking forward to that! I dug through your videos and got a bit confused with ItemLoader. Should I use it if I just have to get very static job info? I don't really need to process the data

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      @vt2788 if your project is working for you without it then no, don't worry. It does make it easier when adding to a database etc. as you can use it to clean your data properly and structure it

    • @vt2788
      @vt2788 3 years ago +1

      @JohnWatsonRooney Brilliant! Okay, I see how I could customize my scraping that way. Thanks so much!

  • @rajkumarguptafx3907
    @rajkumarguptafx3907 2 years ago

    Hii..Mr. John,
    I'm working on a playwright Python project where I want to print the response.json() of a particular response. Kindly make a video on the request-response in the playwright.
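
For the request above, one possible approach is `page.expect_response`, which waits for a response matching a URL or predicate while some action runs. The sketch below is an assumption about how that could look, not something shown in the video; the helper name, the `url_fragment` argument, and the `button#load-orders` selector in the comment are all hypothetical.

```python
def json_from_response(page, url_fragment, trigger):
    """Run `trigger()` and return the parsed JSON body of the first matching response.

    `page` is a Playwright sync-API Page; `url_fragment` is any substring of
    the request URL you are interested in (site-specific).
    """
    with page.expect_response(lambda r: url_fragment in r.url) as response_info:
        trigger()  # e.g. lambda: page.click("button#load-orders")
    return response_info.value.json()
```

The same pattern may also help with data that only arrives via an AJAX call after a button click, since the JSON can be read directly instead of re-parsing the page.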

  • @jacoby8934
    @jacoby8934 2 years ago

    Great video mate! really helpful!
    Question - any idea how page content can be displayed while using pytest with bs4? my tests passing successfully but i can't "scrape" data from websites so i can't see all the information in the inner_html. I'm using vscode both as IDE and terminal and besides passed tests in terminal, there is no other information. any ideas?

  • @luongvanhuy5365
    @luongvanhuy5365 2 years ago

    Thanks for your great video. I have 2 problems can you help me about it:
    1. Use playwright to crawl website. But after click on button --> ajax call --> how i can reload data from ajax response.
    2. After use playwright to login, can we use scrapy to send new request and crawl data.

  • @gorilaz0n
    @gorilaz0n 2 years ago

    Hi John. When I got to the line with sync_playwright, I got the error saying that I was using the sync API inside the asyncio loop. Do you know how to resolve it?

    • @gorilaz0n
      @gorilaz0n 2 years ago

      That and the AttributeError: “PlaywrightContextManager” object has no attribute “_playwright”

  • @ClarkWu-mx3zk
    @ClarkWu-mx3zk 2 years ago

    Great video! BTW, can it log in website with recaptcha?

    • @ho0k17
      @ho0k17 10 months ago

      No, recaptcha is not supported by selenium

  • @MancePax
    @MancePax 2 years ago +1

    Guys, please help me! Let's take a simple scenario: open a browser, go to Google, search for 'word', press search, and the script ends. In Selenium, after the search, the browser is still open and usable, I can browse through the search results. In Playwright, the browser closes, even if I did not use browser.close(). How can I keep my browser open and analyze the search results of my Google query?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Hey! It’s being run in the context manager which automatically closes the browser when the code is finished. In the docs there is a bit about running it without the context manager this is what you want

    • @MancePax
      @MancePax 2 years ago

      @JohnWatsonRooney I have no idea what you mean :) but I will dig around!

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      @MancePax no problem, look here: playwright.dev/python/docs/intro and go to the section "Interactive mode (REPL)". This code will work in your code editor too and you should be able to take it from there
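
Running Playwright without the context manager, as suggested in this thread, can be sketched as below: `sync_playwright().start()` hands back the Playwright object, and nothing is closed until you do it yourself. The function names are hypothetical and the URL is whatever you pass in.

```python
def open_page(url):
    """Start Playwright without `with`, so the browser is not closed automatically."""
    from playwright.sync_api import sync_playwright

    playwright = sync_playwright().start()
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto(url)
    return playwright, browser, page


def shut_down(playwright, browser):
    """Close the browser first, then stop the Playwright driver."""
    browser.close()
    playwright.stop()
```

In an interactive session the window should stay open until `shut_down(playwright, browser)` is called. In a plain script the browser most likely still goes away when the Python process exits, so something like `input("Press Enter to close")` before `shut_down` would be needed to keep it around.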

  • @anonymoususer8849
    @anonymoususer8849 3 years ago +1

    Which theme you use?

  • @__-tq3lm
    @__-tq3lm 1 year ago

    thank you so much

  • @rajeevmenon1975
    @rajeevmenon1975 3 years ago

    I am logging in to a site with 2F authentication. First there is a captcha, and then after keying in the captcha there is an OTP. How do we code to accept user input of the captcha and OTP (Selenium or Playwright)?
    Help will be appreciated

  • @scientificapproach6578
    @scientificapproach6578 2 years ago

    Using code in video I get this error, how do I fix, thanks.
    page = browser.new_page()
    ^
    SyntaxError: invalid syntax

  • @TheTruepikvic
    @TheTruepikvic 8 months ago

    What about captcha?

  • @horus4862
    @horus4862 3 years ago +1

    Nice!

  • @mdarifurrahmananik3973
    @mdarifurrahmananik3973 1 year ago

    you are the magician boss :)

  • @breal1460
    @breal1460 2 years ago +1

    Unfortunately, this project is not working for my purposes... The sites I need to log in to say these browsers are outdated... :(

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Really? That’s odd, try reinstalling playwright it should fetch the latest version of chrome

  • @karthikshaindia
    @karthikshaindia 3 years ago +1

    Nice... How unique with helium? Seems bit lazy code ;) compared your tutorial. I'll try my end.

  • @kgztn
    @kgztn 2 years ago

    I keep getting No module named 'playwright'

  • @leventbozkurt9796
    @leventbozkurt9796 3 years ago +1

    John you are a great teacher. Thanks for your efforts. Can you please make a video for Amazon and Playwright. you know Black days are coming. Thanks

  • @zakyvids6566
    @zakyvids6566 3 years ago +1

    Thanks for sharing this video I was wondering can you please make a short maybe an hour long python crash course
    Thanks a lot

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      I’ve thought about it but I wasn’t sure as there are lots of good videos out there already like that. You’ve given me some good idea though

  • @Islamallam99
    @Islamallam99 2 years ago

    I have faced this error while applying the code, any suggestion?
    Looks like Playwright was just installed or updated.
    Please run the following command to download new browsers

  • @oleksandrbondarenko9632
    @oleksandrbondarenko9632 2 years ago

    I get error RuntimeError: Event loop is closed
    sys:1: RuntimeWarning: coroutine 'Page.goto' was never awaited
    Can you help me?

  • @maxpenfold8699
    @maxpenfold8699 3 years ago +1

    Nice Video

  • @ollie_the_wandererli7523
    @ollie_the_wandererli7523 2 years ago +1

    wow! cooooool

  • @pratikshagarkar986
    @pratikshagarkar986 3 years ago +1

    This video is also awesome. Thanks for sharing your knowledge with us. But I got the following error. Can you please help me for solving the error?

    File "D:\Project\My_Py\untitled2.py", line 10, in <module>
    with sync_playwright() as p:
    File "C:\Users\user\Anaconda3\lib\site-packages\playwright\sync_api\_context_manager.py", line 45, in __enter__
    raise Error(
    Error: It looks like you are using Playwright Sync API inside the asyncio loop.
    Please use the Async API instead.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Hmm are you running this in a Jupyter notebook or similar?

  • @abdulwali4920
    @abdulwali4920 2 years ago

    Traceback (most recent call last):
    File "c:\Users\Sellitrage\Desktop\playwright test1\main.py", line 1, in <module>
    from playwright.sync_api import sync_playwright
    File "C:\Users\Sellitrage\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\__init__.py", line 25, in <module>
    Facing this error while running, don't know how to solve this... please guide me.