HELIUM for simple DYNAMIC web scraping with Python

  • Published 19 Oct 2024

COMMENTS • 75

  • @farmersneed
    @farmersneed 4 years ago +6

    I used Selenium to automate testcases in an old school commerce website. Navigating s was like navigating the circles of hell. This looks very promising and I will use this for present and future prospects. Very good presentation. Thank you.

  • @0xbitbybit
    @0xbitbybit 4 years ago +5

    Awesome, great video, love how easy it is! Just tried scraping with Javascript, took me all day yesterday to achieve what I just did with Python + Helium in about 5 minutes lol. You've gained a subscriber here :) Thanks

  • @EmanueleCannizzaro
    @EmanueleCannizzaro 1 year ago

    John
    Thank you for the introduction to this tool.
    I tried it and I spotted that it has a strict dependency on Selenium 3.
    I played with the code for a few minutes and I got it working with Selenium 4 and the latest version of Chrome.
    I have also submitted a pull request to Michael.
    This is the power of open source and knowledge sharing. 😀

    • @robinwang6399
      @robinwang6399 11 months ago

      Hello, was this pull request successful? I just installed helium and pip is telling me that my selenium version is too new. What are some of the changes you made?

  • @xilllllix
    @xilllllix 2 years ago +1

    I've ignored Helium for a long time because I didn't see the need for it, but this vid convinced me to give it a few tries

  • @Klausi-uq4xq
    @Klausi-uq4xq 4 years ago +2

    I just started with BS4... but Helium looks nicer. Thank you for the good tuts

  • @coyoteden8111
    @coyoteden8111 1 year ago +1

    Wow. So simple. Is this your preferred library for dynamic scraping?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      I’ve moved on from this: for browser automation I use Playwright, and for other scraping httpx & selectolax

  • @dzeykop
    @dzeykop 3 years ago +1

    Thank you John, great video again. Easy to understand.

  • @CompThatHouse
    @CompThatHouse 1 year ago

    John, I have been watching many of your videos as you have a great way of explaining web scraping. I upgraded from requests and Selenium and started using Helium about 6 months ago. I also found the XPath explanations very helpful. Thank you so much.
    I also like how easy it is to understand as you speak very clear English which is helpful to me.
    Today I upgraded my Helium and Chrome and it all came crashing down; it won't go onto the web. Wondering if you or anybody else has this problem, and what to do about it.
    Tarlton John

  • @JohnMusicbr
    @JohnMusicbr 3 years ago +1

    Excellent again. Saved my day. Thank you.

  • @tejasgalande4557
    @tejasgalande4557 10 months ago

    Thanks buddy. I got the new corner to explore.

  • @martpagente7587
    @martpagente7587 4 years ago +1

    I hope you can also talk about freelancing web scraping services, e.g. how to price your service, scope, etc.

  • @nomoreospf
    @nomoreospf 3 years ago +1

    Very good tip, thanks I'll try it!

  • @Neil4Speed
    @Neil4Speed 4 years ago +2

    Really like this approach, this seems super valuable. Any ideas on how to get "everything", i.e. keep paging down until it won't let you page down any further?

  • @mattmovesmountains1443
    @mattmovesmountains1443 3 years ago +1

    Wow - didn't know about helium until this video, but I may start using this moving forward. Have you noticed any shortcomings since you've been working with helium? Any tradeoffs to the ease of use? Seems like even if there were any, they could be mitigated by adding standard selenium commands where needed.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      I haven't noticed anything, no - I am not a heavy Selenium user though, so perhaps someone like that would notice something, but for what I use it for (like my videos) it's great. I have used it for placing test orders on a webstore and it worked wonders

    • @mattmovesmountains1443
      @mattmovesmountains1443 3 years ago +1

      @@JohnWatsonRooney amazing; I'm an instant fan haha

  • @jonathanfriz4410
    @jonathanfriz4410 4 years ago +1

    Thanks man, way faster and easier than with Selenium alone. Thanks for sharing. Part 2?

  • @gregkan3964
    @gregkan3964 3 years ago +1

    Very nice video, I learned something useful! However, since this depends on Selenium, if I had problems with detecting elements in Selenium, will these cascade to Helium too?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Yes probably. The benefit of helium is I think it makes finding the elements simpler

  • @diegomairena
    @diegomairena 3 years ago +2

    Hey John, great video, thanks a lot. Can you mix BeautifulSoup and Helium? So use bs4 to output a series of links and for Helium to go into each link, find a button, press it and output data?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      Yes that would work fine, just import both and create your list of links and loop through using helium commands to open each one up

    • @diegomairena
      @diegomairena 3 years ago

      @@JohnWatsonRooney Thanks a lot mate
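A minimal sketch of the bs4 + Helium combination John describes: fetch the listing with requests, let bs4 collect the links, then open each one with Helium. The link clean-up helper (resolving relative hrefs, dropping mailto:/javascript: links, #fragments and duplicates) is plain standard library. The listing URL, the a.product selector and the "Show details" button text are hypothetical, and requests, bs4 and helium are assumed to be installed.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url and return unique http(s) URLs in order."""
    seen, links = set(), []
    for href in hrefs:
        if not href:
            continue  # <a> tags without an href
        url, _fragment = urldefrag(urljoin(base_url, href))  # drop "#section" parts
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def scrape_with_bs4_and_helium(listing="https://example.com/products"):
    """Hypothetical end-to-end run; URL, selector and button text are placeholders."""
    import requests
    from bs4 import BeautifulSoup
    from helium import click, get_driver, go_to, kill_browser, start_chrome

    html = requests.get(listing, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = absolute_links(listing, (a.get("href") for a in soup.select("a.product")))

    start_chrome(headless=True)
    try:
        for url in links:
            go_to(url)
            click("Show details")  # the (hypothetical) button that reveals the data
            print(url, get_driver().title)
    finally:
        kill_browser()
```

Splitting it this way keeps the fast, cheap part (collecting links over plain HTTP) out of the browser, so Helium is only used for the pages that actually need clicking.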

  • @dishydez
    @dishydez 3 years ago +1

    Thanks a lot! This was great - got it working!

  • @Robls501510
    @Robls501510 3 years ago

    Subscribed to your channel. This channel offers priceless content. Thank you sir for posting all these awesome videos.

  • @ns5575-j2w
    @ns5575-j2w 2 years ago +1

    Excellent tutorial! Do I need to scroll to the bottom of a dynamic website to get all results? If yes, do I loop press(PAGE_DOWN)? Thanks!

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Yes! You can do the page down command as many times as you like - I've actually got a new series on Playwright on my channel which you might find interesting, it's browser automation like this
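A minimal sketch of repeating press(PAGE_DOWN) until the page stops growing, which also answers the "get everything" question in this thread. It assumes new results appear as extra elements matching a CSS selector (".result" is a placeholder) and that a one-second pause is enough for each batch to load. The counting and scrolling are passed in as functions, so the loop itself does not depend on Helium.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    max_rounds: int = 50,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the number of loaded items stops growing.

    Returns the last item count seen. max_rounds caps the loop so a
    truly infinite feed cannot keep the script scrolling forever.
    """
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        sleep(pause)  # give the page time to fetch and render the next batch
        current = count_items()
        if current == previous:
            break  # nothing new appeared, so assume we hit the bottom
        previous = current
    return previous


def demo_with_helium():
    """Hypothetical usage: the URL and the ".result" selector are placeholders."""
    from helium import PAGE_DOWN, S, find_all, kill_browser, press, start_chrome

    start_chrome("https://example.com/search?q=helium")
    try:
        total = scroll_until_stable(
            count_items=lambda: len(find_all(S(".result"))),
            scroll_once=lambda: press(PAGE_DOWN),
        )
        print(f"{total} results loaded")
    finally:
        kill_browser()
```

Comparing counts instead of pressing Page Down a fixed number of times lets short pages stop early, while max_rounds protects against feeds that never end.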

  • @fatimaelmansouri9338
    @fatimaelmansouri9338 3 years ago +3

    Awesome! Could you please do a video on how to webscrape using APIs ? (when neither BeautifulSoup nor Selenium can detect the html)

    • @daddyofalltrades
      @daddyofalltrades 3 years ago

      Do you have an example where selenium can't detect the html ? I'm sure selenium works for every website !

  • @rock11ification
    @rock11ification 4 years ago +1

    Great video, I was looking for this.

  • @sarfarajansari9181
    @sarfarajansari9181 2 years ago

    Thank you John, Helium looks easier. But is there a way to run multiple browser instances at the same time? In Selenium I was using threading: I created different drivers, each doing different tasks, to speed things up. How can I do that when we are not creating a browser object?
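On the threading question: Helium keeps track of the current browser in module-level state (start_chrome() sets it; get_driver() and set_driver() read and swap it), so several threads driving Helium at once would likely step on each other. One workaround, sketched here as an assumption and not something shown in the video, is to give each worker process its own browser. The URLs are placeholders; the chunking helper is plain Python.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n):
    """Split items into at most n roughly equal, order-preserving chunks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        if start < end:
            chunks.append(items[start:end])
        start = end
    return chunks


def scrape_chunk(urls):
    """Runs in a worker process, which has its own copy of Helium's state and browser."""
    from helium import get_driver, go_to, kill_browser, start_chrome

    start_chrome(headless=True)
    try:
        results = []
        for url in urls:
            go_to(url)
            results.append((url, get_driver().title))
        return results
    finally:
        kill_browser()


def scrape_in_parallel(urls, workers=3):
    """Hypothetical run: one headless Chrome per worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(scrape_chunk, split_into_chunks(urls, workers))
        return [row for batch in batches for row in batch]
```

Each browser costs real memory and CPU, so a small number of workers (one per core or fewer) is usually the practical limit.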

  • @vlada_janjanin
    @vlada_janjanin 3 years ago +1

    would you choose helium or scrapy? if they are even comparable? i'm new to this, and i just tried out helium for something i need, and it worked great, but finding elements i needed took kind of long. i heard scrapy is optimized (multithreaded etc), and i'm not sure what to do.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      They are very different, I did a video on it here - ua-cam.com/video/J82SxHP5SWY/v-deo.html Basically it depends on what sort of project you are working on and what you want out of it! Helium (Selenium) is best as a last resort and generally isn't used as a web scraper. Scrapy is a full framework with all the toys!

    • @vlada_janjanin
      @vlada_janjanin 3 years ago

      @@JohnWatsonRooney i'm basically scraping only two dynamic websites where i first need to click a few buttons, do some scrolling and then take out the data. it felt like learning scrapy only for that was unnecessary and i've heard that splash isn't that reliable (and i can't do anything without splash)

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      @@vlada_janjanin sure, sounds like Helium would be a great choice for your project! Scrapy is definitely for bigger scraper projects - although it might be worth learning to use it with Splash too if you have time

    • @vlada_janjanin
      @vlada_janjanin 3 years ago

      @@JohnWatsonRooney thanks man! :)

  • @dnetvaggos4443
    @dnetvaggos4443 4 years ago +1

    Great job!

  • @amitmalur3620
    @amitmalur3620 4 years ago +1

    Hi,
    Have you noticed any difference in the speed of extracting data between Helium vs Selenium?
    Thanks
    Amiy

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago

      Hi Amit. Helium is built on Selenium so they should be the same speed. Helium saves me time in writing out a script due to its ease of use. Unfortunately in large scale scraping they can both be quite slow

    • @amitmalur3620
      @amitmalur3620 4 years ago

      @@JohnWatsonRooney Thanks John,
      I am exploring Splash (Lua) along with Scrapy, as Selenium is causing speed issues. Any way you could help me with a Splash Lua-based implementation?
      Website has login, and captcha behind the login.

  • @Mfbzai
    @Mfbzai 2 years ago +1

    Does Helium support dynamic content/JavaScript rendering?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Yes it does. it’s a wrapper around selenium so can do mostly whatever selenium can. I’d also recommend looking at playwright too as another newer alternative

  • @sriramkasu5286
    @sriramkasu5286 3 years ago +1

    Nice video sir

  • @p4r4d0x41x
    @p4r4d0x41x 2 years ago

    Thank you so much!!

  • @azizaalkuatova9527
    @azizaalkuatova9527 2 years ago +1

    Hello! Could u please help with the issue "helium can't open the latest version of chrome 89.0.4389.____ ." ?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      hmm maybe it didn't install properly. I'd suggest using Playwright instead now, as I have shown in one of my latest videos: ua-cam.com/video/H2-5ecFwHHQ/v-deo.html

  • @FirstnameLastname-ys1up
    @FirstnameLastname-ys1up 4 years ago +1

    Perfect.

  • @-Giuseppe
    @-Giuseppe 3 years ago +1

    Hey John, how do I manage the selection of an item in a dropdown menu?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      hey! That depends on how the dropdown works: you can automate the click, or find the element of the dropdown selection in the source/DOM and use the XPath or CSS selector to find and click it
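For a native HTML select element, Helium's select(combo_box, value) picks an option by its label. For the XPath route John mentions, here is a small helper that builds the XPath of an option by its visible text; the only subtle part is quoting, because XPath 1.0 string literals have no escape character. The "country" id and the "Country"/"Canada" labels are made up. A JavaScript dropdown built from divs has no option elements, so there you would click to open it and then click the entry.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal (XPath has no escape character)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: split on ' and glue the pieces back with concat().
    pieces = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def option_xpath(select_id, option_text):
    """XPath of the <option> with the given visible text inside <select id=...>."""
    return (
        f"//select[@id={xpath_literal(select_id)}]"
        f"/option[normalize-space()={xpath_literal(option_text)}]"
    )


def choose_country(url="https://example.com/form"):
    """Hypothetical Helium run; the URL, label, id and option are placeholders."""
    from helium import S, click, kill_browser, select, start_chrome

    start_chrome(url)
    try:
        select("Country", "Canada")  # Helium: combo box label, option text
        # Equivalent XPath route for the same native <select id="country">:
        click(S(option_xpath("country", "Canada")))
    finally:
        kill_browser()
```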

  • @anirudhsharma2697
    @anirudhsharma2697 4 years ago

    Hey, I was trying to use the press function to save some PDFs automatically with press(CONTROL + 's'). It doesn't throw any error, but nothing happens.
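A likely reason nothing happens: WebDriver delivers key presses to the web page, not to the browser's own menus, so shortcuts such as Ctrl+S usually never reach the "Save as" dialog, and native OS dialogs cannot be driven through WebDriver anyway. If the PDFs are ordinary links, downloading them directly sidesteps the browser. This sketch assumes the files are reachable without the browser's login cookies (urllib does not share Helium's session) and that the links end in .pdf; the page URL is a placeholder.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def pdf_filename(url, default="download.pdf"):
    """Derive a local file name from the last path segment of a URL."""
    name = os.path.basename(unquote(urlparse(url).path))
    if not name:
        return default
    return name if name.lower().endswith(".pdf") else name + ".pdf"


def download_pdf(url, folder="."):
    """Save the response body of url into folder and return the file path."""
    path = os.path.join(folder, pdf_filename(url))
    with urllib.request.urlopen(url, timeout=30) as response, open(path, "wb") as fh:
        fh.write(response.read())
    return path


def download_linked_pdfs(page="https://example.com/reports"):
    """Hypothetical Helium run: collect .pdf links on a page, then download them."""
    from helium import S, find_all, kill_browser, start_chrome

    start_chrome(page, headless=True)
    try:
        links = [a.web_element.get_attribute("href") for a in find_all(S("a[href$='.pdf']"))]
    finally:
        kill_browser()
    return [download_pdf(url) for url in links]
```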

  • @0xbitbybit
    @0xbitbybit 4 years ago +1

    Hey, me again. I've written a script, no errors, but doesn't work, but when I type it out line by line EXACTLY the same in the interpreter, it works perfectly. Any ideas? I thought maybe the content I'm trying to scrape hadn't loaded yet before the script was trying to access, so added a sleep(10) but nope.

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      That’s odd, does it give you an error message? Deciphering that may help - or email me your code if you like I can have a look

    • @0xbitbybit
      @0xbitbybit 4 years ago +1

      @@JohnWatsonRooney Yeah it's an index error, as in it's an empty list and I'm trying to pull data from it, its bizarre, I type the exact same thing out word for word in the interpreter, step through it, and it works fine. Where can I find your email? UPDATE: Nevermind, found it :)

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      @@0xbitbybit I've replied!
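One common cause of an empty list (and an IndexError) in a script that works when typed line by line is that the script reads the page before the content has rendered; typing by hand adds that delay for free. Since a fixed sleep(10) didn't help here, the cause may be something else (for example a selector that only matches after some interaction), but polling until the elements exist is still more robust than a fixed sleep. Helium ships wait_until() for this; the helper below is a standard-library equivalent that also returns what it found. The ".row" selector and the URL are placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], T],
    timeout: float = 10.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it returns something truthy, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result:  # e.g. a non-empty list of elements
            return result
        if clock() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} s")
        sleep(interval)


def first_row_text(url="https://example.com/results"):
    """Hypothetical Helium run; the URL and ".row" selector are placeholders."""
    from helium import S, find_all, kill_browser, start_chrome, wait_until

    start_chrome(url)
    try:
        # Helium's built-in option: block until a matching element exists.
        wait_until(S(".row").exists, timeout_secs=15)
        # Or the helper above, which also hands back the elements it found.
        rows = wait_for(lambda: find_all(S(".row")), timeout=15)
        return rows[0].web_element.text
    finally:
        kill_browser()
```

A TimeoutError that names the problem is easier to debug than an IndexError on an empty list several lines later.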

  • @gitgosc7075
    @gitgosc7075 2 years ago

    thank you

  • @roykimson391
    @roykimson391 2 years ago +1

    Is helium library still working? It seems that it does need chromeDriver

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Honestly I haven’t used it for a while, I use Playwright almost exclusively now

  • @thatolebethe3238
    @thatolebethe3238 3 years ago

    How do I grab the entire page after paging down, to parse it with bs4?
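After scrolling, get_driver().page_source returns the browser's current DOM serialized as HTML, including whatever the scrolling loaded, and that string can go straight into BeautifulSoup(html, "html.parser"). To keep this sketch free of third-party parsers it uses the standard library's html.parser instead, collecting the text of every element with a given class (roughly what soup.select(".result") plus get_text() would give). The ".result" class, the URL and the ten Page Down presses are placeholders. The depth tracking relies on every non-void element having an end tag, which holds for a browser's serialization of the DOM.

```python
import time
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the whitespace-normalised text of each element with a CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get an end tag, so don't count them
        if self._depth:
            self._depth += 1  # nested element inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.items.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def scroll_and_parse(url="https://example.com/feed", presses=10):
    """Hypothetical Helium run: scroll, then parse the rendered page."""
    from helium import PAGE_DOWN, get_driver, kill_browser, press, start_chrome

    start_chrome(url)
    try:
        for _ in range(presses):
            press(PAGE_DOWN)
            time.sleep(1)  # let each batch load
        html = get_driver().page_source  # the DOM as it is now, after scrolling
    finally:
        kill_browser()
    collector = ClassTextCollector("result")
    collector.feed(html)
    collector.close()
    return collector.items
```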

  • @tejaswiniparepalli2689
    @tejaswiniparepalli2689 4 years ago

    how to run it automatically after a certain period of time like you mentioned it in previous video
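The previous video isn't part of this page, so this is just one generic way to do it: a loop that runs a scraping function at a fixed interval, with the clock and sleep injectable so the schedule can be checked without actually waiting. scrape_once in the usage line is hypothetical. For anything long-running, the operating system's scheduler (cron on Linux/macOS, Task Scheduler on Windows) is usually the sturdier choice.

```python
import time
from typing import Callable, Optional


def run_every(
    interval_s: float,
    job: Callable[[], None],
    runs: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call job() every interval_s seconds, forever or for `runs` runs.

    Each start time is planned from the previous planned start, so a slow
    job doesn't make the schedule drift; if a run overruns the interval,
    the next one starts immediately. Returns how many runs were made.
    """
    next_start = clock()
    done = 0
    while runs is None or done < runs:
        try:
            job()
        except Exception as exc:  # one bad run shouldn't stop the schedule
            print(f"run {done + 1} failed: {exc!r}")
        done += 1
        if runs is not None and done >= runs:
            break
        next_start += interval_s
        delay = next_start - clock()
        if delay > 0:
            sleep(delay)
        else:
            next_start = clock()  # job overran the interval: restart from now
    return done
```

Usage: run_every(60 * 60, scrape_once) would call your own (hypothetical) scrape_once() function once an hour until the script is stopped.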

  • @TheWuzyy
    @TheWuzyy 8 months ago

    How can i start firefox with my current profile?

  • @barneyharper8749
    @barneyharper8749 1 year ago

    Nice

  • @yatishkarkera1934
    @yatishkarkera1934 4 years ago

    Hey, nice video. How do i get the href value using helium

    • @manafbargash9495
      @manafbargash9495 3 years ago

      Hi! Did you get an answer for your question? I have the same issue.
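Helium's find_all() returns Helium elements, and each one exposes the underlying Selenium WebElement as .web_element, whose get_attribute("href") gives the link target (Selenium typically returns the resolved, absolute URL here). A minimal sketch; the page URL is a placeholder.

```python
def hrefs(elements):
    """Read the href of each Helium element via its Selenium WebElement."""
    result = []
    for el in elements:
        href = el.web_element.get_attribute("href")
        if href:  # anchors without an href return None
            result.append(href)
    return result


def all_links(url="https://example.com"):
    """Hypothetical Helium run: list every link target on a page."""
    from helium import S, find_all, kill_browser, start_chrome

    start_chrome(url, headless=True)
    try:
        return hrefs(find_all(S("a")))
    finally:
        kill_browser()
```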

  • @hello.devzzz
    @hello.devzzz 1 year ago

    Yo, their documentation was like "Helium is a lighter element than Selenium", damnnnn the disrespect

  • @somethingwithbryan
    @somethingwithbryan 2 years ago

    why cant i ctrl + f with helium?

  • @sivaarwin8816
    @sivaarwin8816 4 years ago

    Bro can u pls do a video on get and post response

  • @hello.devzzz
    @hello.devzzz 1 year ago +1

    Take note tho, u still have to know enough Selenium