Stop reinventing the web scraping wheel

  • Published Jan 29, 2025

COMMENTS • 11

  • @zvnman 22 hours ago

    Thanks John, I'm looking forward to new publications.

  • @graczew 1 day ago

    Hehe 😉 that's what I like. Have you ever used scrapyd to schedule spiders?

  • @MarcB-n4k 3 hours ago +1

    Why did you use 2 spiders with the CSV rather than just add a third parse method, parse_product, to the one spider?

    • @JohnWatsonRooney 3 hours ago

      Because I’m expanding this project in the next video to include Redis as a queue, with multiple spiders pulling from that queue.
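
      For context, the commenter's suggestion would look something like the minimal sketch below: one Scrapy spider whose first callback discovers product links and hands them to a second callback, instead of writing them to a CSV for a separate spider. All class names, URLs, and CSS selectors here are hypothetical, not taken from the video.

      ```python
      import scrapy

      class ProductSpider(scrapy.Spider):
          name = "products"
          # Hypothetical listing page to start from
          start_urls = ["https://example.com/categories"]

          def parse(self, response):
              # First pass: collect product links from the listing page...
              for href in response.css("a.product::attr(href)").getall():
                  # ...and chain each one into a second callback, rather
                  # than writing it to a CSV for another spider to read.
                  yield response.follow(href, callback=self.parse_product)

          def parse_product(self, response):
              # Second pass: parse the product page itself into an item.
              yield {
                  "title": response.css("h1::text").get(),
                  "price": response.css(".price::text").get(),
              }
      ```

      Chaining callbacks keeps everything in one process; passing URLs through a CSV (or, as the reply says, a Redis queue) decouples discovery from parsing so several spiders can consume the same queue.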

  • @ghulam-e-mustafapatel4894 3 hours ago

    Please make a course going from basic Scrapy to advanced

  • @RBC-KING0093 1 day ago +1

    Can you scrape Cloudflare?

  • @Corsa. 1 day ago

    9:14 Is it a good idea to use a .env file for importing proxies instead of zsh?

    • @JohnWatsonRooney 11 hours ago +1

      Either is fine; just keep them out of any git repo and not as plain text in your code
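
      Either way, the pattern the reply describes is the same: the secret lives in the environment, not in the source. A minimal sketch, assuming a hypothetical PROXY_URL variable (set in an untracked .env file loaded by python-dotenv, or exported from ~/.zshrc):

      ```python
      import os

      # Set outside the repo, e.g. in a .env file or ~/.zshrc:
      #   export PROXY_URL="http://user:pass@proxy.example.com:8000"
      proxy = os.environ.get("PROXY_URL")  # hypothetical variable name

      # Scrapy applies a proxy per request via the "proxy" meta key,
      # so the URL never appears as text in the code or the repo.
      request_meta = {"proxy": proxy} if proxy else {}
      ```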

  • @tumejoryopodcast 22 hours ago

    It’d be awesome if you made a video on how to scrape newspapers. They’re not using JSON to fill the content, nor the schema… So it’s very rudimentary, unless there’s a technique better than XPaths which I am not aware of… 😅

    • @mintydevdaz 18 hours ago

      If you mean scraping the article text, this would enter a legal grey area, especially if you're trying to bypass a paywall. News websites are very different from each other. If they have a paywall, some will only render the page fully when you're logged in, meaning it takes a separate server-side request to hydrate the HTML. Some have the article hidden where it won't show in dev tools but will when you're parsing locally. This is case-specific, and I doubt most YouTubers will show it on their channels for fear of being sued.

  • @Aidas_Li 1 day ago +1

    Another Nice. 😊