Oh my! This channel deserves more subscribers. I scrape a lot of tables in my job but never knew I could use pandas (I had never heard of it) until I saw one of your videos on it. I look forward to more videos on Scrapy now that I have the motivation to move away from BS4 and give it a try.
Thanks for your kind words I’m glad it’s helped you!
Really lucid, well-judged in terms of content, and excellent videography. Timely too, given what I happen to be doing this week! Thankyou!
The best scrapy basic tutorial I`ve seen. Thanks a lot!!
Glad it was helpful!
Something you note at the 9:23 mark is that you can close the space with a dot (or period). To add a little more to that: regardless of the number of spaces, you only need one period, so close the gap completely and put a single dot. I struggled with this for a while, as I had a custom class with five spaces in the name (no idea why the coder would do that), and it just never occurred to me that I could just use one dot. None of the Scrapy documentation indicated that, and I spent quite a while trying to figure it out.
It's good to learn some CSS if you're going to use CSS selectors. The space is closed with a dot because in CSS, when you want to select an element based on a shared class, you write it like "class1.class2". If you were to write "class1 class2", it would mean you want to select an element that has class2 inside an element that has class1.
To make it clear, we could think of real HTML elements: "p a" would select any link (a) inside a paragraph (p).
Nice one as always! Hope you would continue this as a series.
Thanks John for your tutorial. Really liked how easy and approachable you made it.
A useful start. I followed along and got this working myself (which doesn't often happen when following Python tutorials on YouTube). Looking forward to finding out how to get the stuff from page two, and then hopefully how to follow links.
Thank you very much your content is awesome
Man you're awesome! These videos are so informative and easy to understand, wish you all the success in this world
Thank you!
Dude, I am loving your videos!! Opening up the wonderful world of web scraping with these excellent Python tools. Thank you for the content ;]
Thanks and nice work John 👌🏻 I was waiting for this for a long time 🙏🏻
I'm a big fan of your work. Thanks, John.
Thank you
The educational content is explained very well.
You're Great.
Thanks John!
Great job again John! I've never used Scrapy but now I feel it may be something really useful and powerful. It'd be great if you could do a video comparing the different scraping approaches you've introduced and their scenarios. Thx.
Thank you so much for this, John. I hope this will become a series.
Thanks Mart, it will!
@@JohnWatsonRooney, I hope you can also make a video on the Scrapy-Splash approach for scraping dynamic websites, with a project or sample under this series. Thanks!
Thanks John! Great video.
As usual, great content... keep up the good work!
Thanks!
Thanks John, please upload more videos on Scrapy.
You are one of the best!!
Yay! It worked!
Nice presentation!
Great intro to Scrapy! Everywhere I've looked, people say Scrapy is hard to learn, but frankly this seems more straightforward to me than BS4. Maybe that's not the case when things get more complex, but that's just my two cents; maybe you're just better at explaining it?
I'm trying to scrape products and prices from Newegg and running into a roadblock: I can get the item name and such, but the price is nested in a tag inside a list inside a div. Any tips on selecting that?
Appreciated your work!
Thank you very much! With your help I did my first web scrape.
That’s great!
Thank you for this fantastic training. Now I understand what Scrapy is all about :)
Thank You!
You are so down to earth. Salute to you for providing this type of content for free.
Thanks! Very useful!
Just subscribed, thank you sir.
Awesome, thank you!
What is the reason for the venv? Are you using a different version of Python?
Thank you!! Almost there, but the spider doesn't return the right output. What could be wrong? I do see the 200 and scraped items via the shell. I'm on Windows.
Did you check through the shell response for the items you are after? A 200 can also be something like a CAPTCHA page or a blocking page.
@@JohnWatsonRooney Yes, it's only returning the menu tabs, and below them the services and contact tabs.
Hey brother,
thanks for the tutorials! Can you make a tutorial on the other files,
e.g. middlewares.py, items.py, settings.py?
And second, how to use a database in Scrapy for reading and writing data?
Yes, I will be doing videos on those too.
@@JohnWatsonRooney Thanks man
@@JohnWatsonRooney An implementation in the Scrapy framework of all the scenarios we use in requests, like proxies, user agents, etc., would be awesome!! Nice tutorial as always!
I really didn't understand the 11:33 part and how you do it. Btw, I'm new to Scrapy. Can you explain it?
I want to scrape the product features, but it doesn't work properly. I want to get the 4 or 5 features, but I get 1 feature or all the features on the page instead; no idea how it's behaving.
I used this code:
*response.css("div.f-grid.prod-row ul.f-list.j-list li::text").get()*
The code above will print one feature.
*response.css("div.f-grid.prod-row ul.f-list.j-list li::text").getall()*
The code above will print all the features on the page, while I want to print 4 or 5 depending on the product.
I only recently found your channel, but all in all, great content! I am however running into problems with POST requests, and Selenium is sadly not an option for my project.
Could you show running Scrapy from a Python script rather than from the shell?
Yes, you can run Scrapy from a script. I have a video on it, see my channel.
Thanks a lot for the video. I was able to scrape a website on my first try.
I had a problem though; I get this error:
raise ExpressionError(
cssselect.xpath.ExpressionError: The pseudo-class :text is unknown ...
When I changed 'a::text' into 'a::attr(href)' it worked. '::text' was also working in the shell, just not in the .py file. So how can I get the text in the file?
Great
Can I use scrapy to scrape JavaScript generated content?
You can but you need to use the splash extension. I will be covering this soon when I release more scrapy content
@@JohnWatsonRooney 👍👍👍
What is the difference between Scrapy and BeautifulSoup?
How do I scrape links from level-3 or level-4 drop-down menus and get output in a tree format of all child nodes?
I'm loving these web scraping tutorials. I did get an error though as soon as I tried to use the products variable, such as products.css('h3').
I get the error: AttributeError: 'str' object has no attribute 'css'
Useful video, thanks! You're handsome too.
Thanks..
Hi John
How can I clear the screen while I'm in the Scrapy shell? (I use PowerShell)
Sure, I think typing clear works?
@@JohnWatsonRooney It works before I run the 'scrapy shell' command, but after I enter the shell it doesn't work.
But how can I put this data into HTML?
Scrapy seems so intimidating.
It is when you first look at it, but once you dive in and break it down into parts it will click
Yessssss, I'm the 500th like!
Wow thank you!
scrapy shell 'URL' with single quotes doesn't work for me,
but scrapy shell "URL" with double quotes does.
I never use quotes
Same for me. Thanks for this comment!
Try with XPath, please.
Sure, I'll use XPath next time.
@@JohnWatsonRooney thanks ✅👍
Sir, make a video on how to scrape Google search results.
You should pass a user agent in the headers.
Why does he use a virtual environment?
Python scraping often involves the use of modules and packages. Once you have multiple Python projects, if you don't use a virtual environment, different projects end up sharing some of the same packages and modules. If you then update a package for one project, you can break a different project that relies on a previous version of that same package. A virtual environment isolates the packages and modules associated with a single project, so that no matter what other projects use the same packages or modules, they don't interfere with each other. At least that's my understanding.
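A minimal sketch of that workflow (paths are illustrative; on Windows the activate script lives in venv\Scripts instead of venv/bin):

```shell
# Create an isolated environment in ./venv for this project only.
python3 -m venv venv

# Activate it; "python" and "pip" now point inside ./venv.
. venv/bin/activate

# Packages installed now go into ./venv, not system-wide, e.g.:
#   pip install scrapy

# Confirm we are inside the venv (prints True), then leave it when done.
python -c "import sys; print(sys.prefix != sys.base_prefix)"
deactivate
```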
After the command scrapy shell 'jessops.com/drones'
I got this as a prompt: In [1]: instead of >>>
I don't know what I've done wrong...
Nevermind, it works fine anyway.
Also found out the hard way that indentation matters!!
It works without quotes.
Thank you very much.