It’d be awesome if you made a video on how to scrape newspapers. They’re not using JSON to fill the content, nor the schema… So it’s very rudimentary, unless there’s a technique better than XPath that I am not aware of… 😅
If you mean scraping the article text, that enters a legal grey area, especially if you're trying to bypass a paywall. News websites are very different from each other. If they have a paywall, some will only render the page fully when you're logged in, meaning a separate server-side request is needed to hydrate the HTML. Some have the article hidden so that it won't show in dev tools, but it will when you parse the page locally. This is case-specific, and I doubt most YouTubers will show it on their channels for fear of being sued.
Thanks John, I'm looking forward to new publications.
Hehe 😉 that's what I like. Have you ever used scrapyd to schedule spiders?
Why did you use 2 spiders with the CSV rather than just adding a third parse method (parse_product) to the one spider?
Because I’m expanding this project in the next video to include Redis as a queue and multiple spiders to pull from that queue.
Please make a Scrapy course, from basic to advanced.
Can you scrape Cloudflare?
9:14 Is it a good idea to use a .env file for importing proxies instead of zsh?
Either is fine; just keep them out of any git repo and don't put them as plain text in your code.
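For anyone wondering what that looks like in practice, here is a minimal sketch, not the code from the video. It assumes the proxies live in a comma-separated environment variable; the name `PROXY_LIST` and both helper functions are made up for illustration. The variable can come from an `export` in your zsh config or from a `.env` file, and the code that reads it is the same either way.

```python
import os


def load_dotenv(path=".env"):
    """Tiny .env reader (hypothetical helper): KEY=VALUE lines, '#' comments.

    Values already set in the shell (e.g. exported from zsh) are not overridden.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def load_proxies(env_var="PROXY_LIST"):
    """Return the proxies from a comma-separated env var (name is an assumption)."""
    raw = os.environ.get(env_var, "")
    return [p.strip() for p in raw.split(",") if p.strip()]


if __name__ == "__main__":
    load_dotenv()          # does nothing if there is no .env file
    print(load_proxies())  # empty list if PROXY_LIST is not set
```

If you go the `.env` route, add `.env` to your `.gitignore` so the file never ends up in the repo.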
Another nice one. 😊