Web Scraping in Node.js using Cheerio, Puppeteer, and Fetch

  • Published 7 Nov 2024

COMMENTS • 70

  • @andreacappuccio58
    @andreacappuccio58 4 years ago +13

    Still at minute 1, but wanted to chime in and remind you that you're still my favourite teacher on the tube and you're doing great!

    • @leighhalliday
      @leighhalliday 4 years ago

      Thanks Andrea :) I hope you enjoy this video! Thanks for your support!

  • @kizhissery
    @kizhissery 3 years ago

    13:22 "rendered by React" is written on the left side as the description!! Awesome vid 😀

  • @krishnapavanvaidyula2246
    @krishnapavanvaidyula2246 4 years ago +3

    Leigh, every tutorial of yours teaches me something new. Thank you for posting great content.

    • @leighhalliday
      @leighhalliday 4 years ago

      Thank you Krishna! Glad you're enjoying these videos!

  • @ninjarogue
    @ninjarogue 3 years ago +1

    Awesome tutorial, and a special thanks for sharing the copy selector tip in chrome devtools!

  • @christopher9638
    @christopher9638 3 years ago

    Wow! The best explanation of web scraping of all I've seen on Yt or Udemy. Thank you :)

  • @tririfandani1876
    @tririfandani1876 3 years ago

    I always learn new things by watching your videos. Thanks

  • @emee__
    @emee__ 2 years ago +1

    Hello, I am new to web scraping and I am using axios and cheerio to scrape. After I pass the HTML response from axios to cheerio.load, nothing seems to happen when I console.log. I need help.

  • @Yrplmr
    @Yrplmr 3 years ago

    Great examples, clear explanations, thanks ! Greetings from France

  • @tyu3456
    @tyu3456 3 years ago +1

    Great video, Leigh! For the last one, why not change the GraphQL query to only return the description field? Is that possible? That might make it even faster 😎
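
    On the question above: in GraphQL the client names the fields it wants, so a trimmed query is possible in principle, provided the server accepts ad hoc queries. The following is only a sketch under assumptions: the `stock(symbol:)` field, its `description` subfield, and the endpoint URL are hypothetical placeholders, since the real schema is not shown in this thread.

```typescript
// Hypothetical schema: the `stock` field, its `symbol` argument, and the
// `description` subfield are assumptions for illustration only.
const DESCRIPTION_QUERY = `
  query Description($symbol: String!) {
    stock(symbol: $symbol) {
      description
    }
  }
`;

interface GraphQLRequest {
  query: string;
  variables: Record<string, string>;
}

// Build the JSON body for a GraphQL POST that asks for one field only.
function buildDescriptionRequest(symbol: string): GraphQLRequest {
  return { query: DESCRIPTION_QUERY, variables: { symbol } };
}

// Send the request with the global fetch available in Node 18+.
// `endpoint` is a placeholder; pass the URL you see in the Network tab.
async function fetchDescription(endpoint: string, symbol: string): Promise<unknown> {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildDescriptionRequest(symbol)),
  });
  if (!response.ok) {
    throw new Error(`GraphQL request failed with status ${response.status}`);
  }
  return response.json();
}
```

    Whether this is noticeably faster depends on how expensive the omitted fields are for the server to resolve; the main saving is usually a smaller response.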

  • @Tibeb
    @Tibeb 4 years ago +1

    Great explanation, thank you!! How can I then serve the data I got from the web scraping and display it in a React app? I tried to create a function that does the scraping, export it, and import it in my React app, but I failed. It would be great if you could tell me how to do it. Thanks again.

    • @leighhalliday
      @leighhalliday 4 years ago

      Hey Tibeb! Scraped data needs to be stored in a database and then served through a backend... you could store it in Postgres, and then use api routes in Next.js with Prisma to load the data and expose it in the frontend, so that React can then display it.

    • @Tibeb
      @Tibeb 4 years ago

      @@leighhalliday Thank you for the reply!! I have been trying to learn the technologies you mentioned above :) It would be great if you made a video about it in the near future. Thank you again.

  • @ukaszzbrozek6470
    @ukaszzbrozek6470 3 years ago

    Great video!
    It would also be interesting to see a use case when we have to log in on the page or navigate on the page to get data.

  • @4541047
    @4541047 4 years ago +1

    Hi,
    Why are you doing for await (symbol of symbols)? Isn't the await redundant here?

    • @leighhalliday
      @leighhalliday 4 years ago +1

      I wanted to process the symbols one at a time (sequentially)... the reason is that if I had 1000 symbols, I didn't want to bombard their website with 1000 concurrent requests... so the await here basically says: finish up each symbol before moving on to the next.
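
      The one-at-a-time idea in this reply can be sketched as follows. This is a minimal illustration, not the code from the video: `scrape` is a hypothetical stand-in for the per-symbol function, and `delayMs` is an optional pause added here as an assumption.

```typescript
// `scrape` stands in for whatever per-symbol function you already have
// (hypothetical here); it only needs to return a promise.
async function scrapeSequentially<T>(
  symbols: string[],
  scrape: (symbol: string) => Promise<T>,
  delayMs = 0,
): Promise<T[]> {
  const results: T[] = [];
  for (const symbol of symbols) {
    // The next iteration does not start until this request has settled.
    results.push(await scrape(symbol));
    if (delayMs > 0) {
      // Optional pause after each request to be gentler on the target site.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

      For a plain array of strings, an ordinary for...of loop with an await inside the body gives the same one-by-one behaviour as for await. By contrast, Promise.all(symbols.map(scrape)) would start every request at once.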

    • @4541047
      @4541047 4 years ago

      @@leighhalliday Thanks for your reply!

  • @constantinecodes6388
    @constantinecodes6388 4 years ago

    Great job Leigh !!. I quite enjoy your videos.

    • @leighhalliday
      @leighhalliday 4 years ago

      Thanks Constantine! Glad you enjoyed it!

  • @mrkhoros
    @mrkhoros 4 years ago

    Thank you for covering this. I was looking forward to this

  • @Grving
    @Grving 4 years ago

    Wow this is awesome thank you! .. also loved that you used stocks for the example

    • @leighhalliday
      @leighhalliday 4 years ago

      Thanks Irving! :D I was actually needing that data for an ebook I am working on (about ruby on rails + postgres searching), so I thought might as well turn this into a video and do it in node!

    • @Grving
      @Grving 4 years ago

      @@leighhalliday this is perfect i have been wanting to build an app that keeps track of stocks .. also I'd like to use this idea for my ecommerce site so I can keep track of usps and ups shipping right on my website

  • @rezahosseini7851
    @rezahosseini7851 4 years ago

    Thank you for this great video! Just wondering if the last approach is possible for all SPAs? I mean, can you find all the dynamic data inside the app in the Network tab?

    • @leighhalliday
      @leighhalliday 4 years ago

      Hey Reza! That should be possible! Most SPA apps grab their data from a GraphQL or REST api... use the same APIs as their frontend app does!

  • @heykike
    @heykike 3 years ago

    great tips, very well explained - great video! Thanks!
    Do you have a paid course or channel ?

    • @leighhalliday
      @leighhalliday 3 years ago

      Thank you Enrique! I do have a course: next.leighhalliday.com check it out!

  • @aliedfurdich
    @aliedfurdich 4 years ago

    Great tutorial, Leigh! Thank you thank you

    • @leighhalliday
      @leighhalliday 4 years ago

      Thank you Luke! Glad you enjoyed it :)

  • @ridl27
    @ridl27 4 years ago +1

    wow, another subject that I am really interested in :D ty.

    • @leighhalliday
      @leighhalliday 4 years ago

      Thanks Alex! Hope you enjoy it!

    • @ridl27
      @ridl27 4 years ago

      @@leighhalliday I hope you will continue to make tutorials on it. Maybe with some auth stuff, getting past Google reCAPTCHA, and other slightly more advanced scraping things :)

  • @incarnateTheGreat
    @incarnateTheGreat 4 years ago +1

    Thanks for this, Leigh.
    Hot tip: have a look at the thumbnail for this YT clip. :P

    • @leighhalliday
      @leighhalliday 4 years ago

      Hehe... not sure if that is a hot tip, but I'll take it!

    • @incarnateTheGreat
      @incarnateTheGreat 4 years ago

      All good! I used to scrape data by just tapping into the API and building out my own JSON -- something similar to your SPA. I probably wanna try Cheerio now. Thanks!

  • @igor_cojocaru
    @igor_cojocaru 4 years ago

    Thank you man! As always great video.
    Recently I was trying to scrape the URL of an embedded video. First it waits for the user to click the play button, and after a few seconds of advertisement the player calls the server to request the video itself. I tried to use Puppeteer but failed. Do you have any idea how it could be done?
    And just to mention, I used xpath to access the elements. In devtools there is an option to copy it.
    Cheers

    • @leighhalliday
      @leighhalliday 4 years ago

      Hey Igor! I'd love to help but I don't think I really have a good answer :)

    • @DarkSideChess
      @DarkSideChess 4 years ago

      Does the video tag have a src attribute? You could wait for the ad to finish and then grab what's in the src attribute afterwards...

  • @proribrajokproribrajok7789
    @proribrajokproribrajok7789 3 years ago

    Is Puppeteer scraping like Python's Scrapy framework? Which one performs better?

  • @marklim454
    @marklim454 3 years ago

    How do you bypass CORS?
    has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

  • @yuvrajagarkar8942
    @yuvrajagarkar8942 3 years ago

    What's the difference between Axios and Puppeteer?

  • @MaxOnMaxxer90
    @MaxOnMaxxer90 4 years ago

    If I actually run the Cheerio app in the browser or in a function, I get a CORB error. Do you have a solution to this error? I've been racking my brain over it for a few days :) Great video, thumbs up!

    • @leighhalliday
      @leighhalliday 4 years ago

      Hey Max! Sorry, I don't... I think if it's a CORS issue, you'll have to run this code in the backend somewhere rather than in browser.
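
      CORS is enforced by browsers, so a request made from server-side code is not subject to it. A minimal sketch of the "run it in the backend" idea, assuming Node 18+ (global `fetch`, `Request`, `Response`); the `/scrape?url=` route and the regex-based title extraction are illustrative assumptions, not code from the video.

```typescript
// Function that downloads a page; defaults to the global fetch in Node 18+.
type FetchHtml = (url: string) => Promise<string>;

const defaultFetchHtml: FetchHtml = async (url) => {
  const response = await fetch(url);
  return response.text();
};

const JSON_HEADERS = { "Content-Type": "application/json" };

// Handle GET /scrape?url=... on the server and return the page <title>.
// The route name and the title extraction are illustrative assumptions.
async function handleScrape(
  request: Request,
  fetchHtml: FetchHtml = defaultFetchHtml,
): Promise<Response> {
  const target = new URL(request.url).searchParams.get("url");
  if (!target) {
    const body = JSON.stringify({ error: "missing url parameter" });
    return new Response(body, { status: 400, headers: JSON_HEADERS });
  }
  // This download happens on the server, so the browser's CORS check
  // never applies to it.
  const html = await fetchHtml(target);
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  const title = match ? match[1].trim() : null;
  return new Response(JSON.stringify({ title }), { status: 200, headers: JSON_HEADERS });
}
```

      The browser then calls your own endpoint instead of the third-party site. In a real project you would likely parse the HTML with Cheerio rather than a regex, and restrict which target URLs are allowed so the endpoint is not an open proxy.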

  • @deliriumcode
    @deliriumcode 3 years ago

    Nice, thanks! Going to create a project which will scrape Lorem Ipsum content. :D

  • @emsdy6741
    @emsdy6741 3 years ago

    Thanks for the video. It really helps a lot. I hope you can make a video on how to scrape posts in a FB page or a group where I am not an admin.

  • @kizhissery
    @kizhissery 3 years ago

    6/07/2021
    6/07/2021
    123
    123
    123
    How can I select only the numbers and exclude the repeated date values in Cheerio?
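
    One way to approach the question above is to extract the text of every candidate element first and filter afterwards. A minimal sketch, assuming the texts have already been pulled out of the page (for example with Cheerio's `.text()`); the page markup is not shown in the comment, so the input array below simply mirrors the sample values.

```typescript
// Keep only values made up entirely of digits; dates such as "6/07/2021"
// contain slashes and are dropped.
function onlyNumbers(cellTexts: string[]): string[] {
  return cellTexts
    .map((text) => text.trim())
    .filter((text) => /^\d+$/.test(text));
}

// Sample values copied from the comment above.
const cells = ["6/07/2021", "6/07/2021", "123", "123", "123"];
console.log(onlyNumbers(cells)); // → ["123", "123", "123"]
```

    If the numbers can contain decimals or thousands separators, the regular expression would need to be widened accordingly.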

  • @DarkSideChess
    @DarkSideChess 4 years ago

    Love your videos. I've been looking for a headless approach to make my scraping faster. I've been using Python + Selenium / Chromedriver. You can make it click buttons and fill out forms with the sendKeys function. For any element you want, just right-click in devtools and "get element by xpath"... You can also look for elements by innerText. If I encounter a site that has fluctuating numbers of divs and table rows or columns, I sometimes pinpoint where the title / heading / data label of that element is, and then the actual data to grab is usually 1 or n elements over.

    • @leighhalliday
      @leighhalliday 4 years ago

      Thank you ghjikhl! That sounds like a great technique :D

  • @bilal-khan
    @bilal-khan 3 years ago

    Thank you .. really awesome video. It would be great if you could make an advance next js with typescript video . (Puppy eyes) :)

    • @leighhalliday
      @leighhalliday 3 years ago +1

      Hey Bilal! Thank you :) You're in luck... that's basically my Next.js course! next.leighhalliday.com if you'd like to check it out!

    • @bilal-khan
      @bilal-khan 3 years ago

      @@leighhalliday Yay!! Amazing

  • @tech4028
    @tech4028 4 years ago

    Hey Leigh,
    Can you make a video on how to combine next.js with puppeteer?

    • @leighhalliday
      @leighhalliday 4 years ago

      Hey tech! Are you talking about for the purposes of testing? Maybe! I'll keep it in mind :)

    • @tech4028
      @tech4028 4 years ago

      @@leighhalliday No, to display things I web scraped from other websites!

  • @dileepdilraj5254
    @dileepdilraj5254 3 years ago

    Can we scrape data with Puppeteer from behind a login, for example from an Instagram account? I am making an app that pulls all the data from a user's Instagram profile into my app. Is that possible?

    • @leighhalliday
      @leighhalliday 3 years ago

      You can try! They may try to stop you though, so I'm not sure if it is possible... Instagram doesn't really want you scraping their content :D

  • @chriswwweb
    @chriswwweb 4 years ago

    Oh great video, have been using nodejs and cheerio too ( github.com/chrisweb/universal-nodejs-scraper ) to build a scraper that I updated this weekend and now your video pops up in my Subscriptions feed :), I love your use of Symbols, I like seeing more practical examples of symbols in js code

    • @leighhalliday
      @leighhalliday 4 years ago +1

      Hey Chris! This isn't an actual JS Symbol, it's just a variable called symbol which represents a stock ticker symbol as a string.

    • @chriswwweb
      @chriswwweb 4 years ago

      Oh OK. At first you sparked my interest in symbols because, as I said, all I know is that they exist, and I need more practical examples to fully understand what they are good for. Now that I know it is actually not a Symbol, I have a second reason to learn more about them, so that next time I can immediately tell the difference between a variable called symbol and an actual Symbol 😉

  • @pp.uta7
    @pp.uta7 3 years ago

    How do you get past Cloudflare?

  • @ГенаПетров-н5ы
    @ГенаПетров-н5ы 4 years ago +2

    Opening and closing Puppeteer on every request is a bad idea.
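
    The point above can be sketched as a lazily created, shared browser that is launched once and reused. This is a generic illustration rather than code from the video: `createSharedBrowser` is a hypothetical helper, and with Puppeteer the `launch` argument would be something like `() => puppeteer.launch()`.

```typescript
// Anything with an async close(), such as a Puppeteer Browser.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: launch on first use, then hand out the same instance.
function createSharedBrowser<B extends Closable>(launch: () => Promise<B>) {
  let pending: Promise<B> | null = null;
  return {
    // Returns the shared instance, launching it only on the first call.
    get(): Promise<B> {
      if (!pending) {
        pending = launch();
      }
      return pending;
    },
    // Close once at shutdown instead of after every request.
    async close(): Promise<void> {
      if (pending) {
        const browser = await pending;
        pending = null;
        await browser.close();
      }
    },
  };
}
```

    Each scrape would then open and close only a page (tab) on the shared browser, which is much cheaper than starting a new browser process every time.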

  • @dugby6466
    @dugby6466 3 years ago

    12:01

  • @stevejobs5919
    @stevejobs5919 4 years ago

    Is it possible to scrape all the heroes and their skills from m.mobilelegends.com/en/?

    • @leighhalliday
      @leighhalliday 4 years ago +1

      Probably with puppeteer, or the graphql/rest api approach, not with Cheerio.

    • @stevejobs5919
      @stevejobs5919 4 years ago

      @@leighhalliday I hope that works, or I'll be copying the details of 100 heroes one by one hahaha