The Best Tools to Scrape Data in 2024

  • Published 25 Nov 2024

COMMENTS • 39

  • @itzcallmepro4963 7 months ago +24

    I have learnt so much from you in the past 2 years. Keep it up, the best channel in terms of scraping.

    • @JohnWatsonRooney 7 months ago +4

      Thanks, I appreciate it!

    • @christinel8347 7 months ago

      Thank you for your videos. I have learned a lot from you for my own projects, and I only started this year.
      It is very easy to learn from your experience, and it helps me write my own code for my personal web scraping projects. You are also a skilled teacher, so congratulations on your videos and your work.

  • @lucasmoratoaraujo8433 7 months ago +2

    Best channel on web scraping I have found so far! Hope things are going well for you and, if not, that they get back on track soon! Your content is fire! ❤

  • @ALXStrikers 7 months ago +5

    Great videos. I love how you make scraping seem so easy. I'm going to learn Scrapy. Thank you.

  • @CodePhiles 7 months ago +2

    Thank you, John, for sharing all your experience. I hope it comes back to you in the form of good deeds and happiness.

  • @heroe1486 7 months ago +4

    Thanks for the video. Could you add the relevant links to the description? Especially at 6:34, you mentioned hrequests and ??
    Also, isn't your concern about those "half-baked" frameworks, that they have their own ways of doing things and make it harder to figure out what is done under the hood, also applicable to paid solutions? At least with the former, in the worst-case scenario you can still read the source code and even modify it, which is most likely not possible with paid, closed-source solutions.

    • @JohnWatsonRooney 7 months ago +1

      Yes, of course, I will add the links. Regarding the paid options: yes, it's similar, but with one big difference. You're paying for a service, and if it's not working you get them to fix it, or you go somewhere else.

  • @robertramirez2167 7 months ago +1

    John, I was thinking about your video recently and have a question; maybe you can follow up with an answer or even a video. In what scenarios do you think Scrapy is a better tool than the others you mentioned?

  • @EmilyAllan 6 months ago +1

    Love this. It is so helpful. Thank you.

  • @eziola 7 months ago

    Would you consider a video that demonstrates your "go-tos" that you describe at the end of the video in an end-to-end example?

  • @yoelcruz6254 7 months ago

    Hey guys, I started scraping last year and I've improved a lot since I discovered this channel, but I'm struggling to know where I can sell my skills. Can you give me a hand by telling me what kind of jobs I can look for, or what kind of companies are looking for scraping?

  • @rexbreunsbach1552 7 months ago +1

    Have you posted any videos that feature the curl_cffi library?

  • @nageshnaik5343 7 months ago +2

    Yup, it's Scrapy, I am using it.

  • @rexsybimatrimawahyu3292 7 months ago +2

    Hey Mr Rooney, can you help me out with scraping? I get a 403 Forbidden error, which happens due to too many requests. I tried adding a 5-second time.sleep, but it still doesn't work. What is the best solution in this case?
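
For the rate-limiting question above, one common alternative to a fixed `time.sleep(5)` is exponential backoff with jitter. The block below is only a minimal sketch: `fetch` is a hypothetical callable returning `(status_code, body)`, and treating both 403 and 429 as "slow down" signals is an assumption. Backoff only helps if the block really is rate-based; if the site rejects the client for other reasons (headers, fingerprint, IP reputation), waiting longer will not fix it.

```python
import random
import time

# Status codes treated as "slow down" signals. This set is an assumption;
# a 403 can also mean the client itself is blocked, which backoff won't fix.
RETRY_STATUSES = {403, 429}


def backoff_delays(base=1.0, factor=2.0, retries=5, cap=60.0):
    """Return the un-jittered wait in seconds before each retry."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def fetch_with_backoff(fetch, url, retries=5, base=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a status outside RETRY_STATUSES.

    `fetch` is a hypothetical callable returning (status_code, body).
    After `retries` failed retries, the last response is returned as-is.
    """
    delays = backoff_delays(base=base, retries=retries)
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status not in RETRY_STATUSES:
            return status, body
        if attempt < retries:
            # Random jitter keeps retries from arriving in a fixed rhythm.
            delay = delays[attempt]
            sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

With the defaults the waits grow as roughly 1, 2, 4, 8 and 16 seconds plus jitter, so a short burst of blocks is ridden out without hammering the server at a constant interval.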

  • @macktheknife2037 7 months ago +1

    What about Puppeteer? I know it's JS, but does it compare in performance?

    • @JohnWatsonRooney 7 months ago +1

      If I were using JS I would have included Puppeteer. The Python port's GitHub page recommends using Playwright instead.

  • @saadatmakki1412 7 months ago

    Are there any videos going over curl_cffi?

  • @Septumsempra8818 7 months ago

    Modularity is important

  • @Tudorabil 7 months ago

    Can you make a video about Scrapoxy?

  • @abdulrahmanharoon3165 7 months ago

    Hi, I'm trying to scrape a website that uses JS. SeleniumBase works fine but is slow. Can you please recommend something like requests? Plain requests didn't work in my case.

    • @Saeed-ko9wp 7 months ago

      Playwright is a good choice

  • @DeveloperMan_ 7 months ago

    Why have you never talked about scraping with JS? I started scraping with JS and it's been the smoothest sailing so far.

    • @heroe1486 7 months ago +1

      Probably because tons of people want to stay as far as possible from JS/Node unless they are doing frontend work.
      Python is just way more convenient and elegant, and has more quality libraries (not just for scraping but for the rest of your backend toolchain), a helpful standard library, and so on. All of that comes with little inconvenience besides speed, and the relative slowness of the language would probably never be the bottleneck in a web-related environment, where everything from request response times to the database itself is orders of magnitude slower anyway, and the logic of critical libraries is most likely implemented in C.
      The only argument for Node is that you can use the same language for both backend and frontend.

  • @Steliosgiannatos 7 months ago +1

    Make a neovim config video!

    • @heroe1486 7 months ago

      Just use kickstart and expand from there; it already has LSP, formatting, snippets, and a package manager configured. Or use LazyVim if you don't mind deactivating what you don't like rather than adding what you like. For Python you just need pyright and ruff for LSP and linting/formatting.

  • @vuufke4327 5 months ago

    5:12 selector what??

  • @bakasenpaidesu 7 months ago +2

    ....

  • @ricardodelacrvz1400 7 months ago +1

    What about Go? I can't seem to use Bright Data proxies with Golang to scrape Amazon result pages. I'll be trying Selenium to see if it works.

    • @JohnWatsonRooney 7 months ago +1

      I love Go, but it's not as good as Python for scraping in my opinion, though it should work perfectly well. I've scraped with Go and proxies with no issues before. Selenium definitely works for Amazon, but I think you can get away with just using curl_cffi to impersonate a browser.

    • @ricardodelacrvz1400 7 months ago

      I ended up migrating to Python Playwright thanks to one of your videos and it worked perfectly. It's my first time using Python, so it will be great for my learning, especially with Django afterwards. Keep posting, you're the best in the scraping field on YouTube for sure!
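
The curl_cffi suggestion earlier in this thread could look roughly like the sketch below. It assumes curl_cffi is installed (`pip install curl_cffi`) and uses its requests-style `get` with the `impersonate` argument; the helper name `fetch_impersonated`, the default profile `"chrome"`, and the timeout are illustrative choices, and the set of accepted profile names depends on the installed curl_cffi version. Whether this is enough for any particular site is not something the thread confirms.

```python
def fetch_impersonated(url, browser="chrome", timeout=30.0):
    """Fetch `url` while presenting a browser-like TLS/HTTP fingerprint.

    Hypothetical helper: `browser` is passed straight to curl_cffi's
    `impersonate` argument, so it must be a profile name that the
    installed curl_cffi version supports.
    """
    # Imported lazily so this module loads even without curl_cffi installed.
    from curl_cffi import requests

    response = requests.get(url, impersonate=browser, timeout=timeout)
    return response.status_code, response.text
```

A call such as `status, html = fetch_impersonated("https://example.com")` returns the status code and page text. Unlike Selenium or Playwright, no browser is launched, so pages that need JavaScript to render will still come back incomplete.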