Scrapy Basics - How to Get Started with Python's Web Scraping Framework

  • Published 4 Dec 2024

COMMENTS • 88

  • @pythonantole9892
    @pythonantole9892 4 years ago +5

    Oh my! This channel deserves more subscribers. I scrape a lot of tables in my job but never knew I could use pandas (I had never heard of it) until I saw one of your videos on Pandas. I look forward to more videos on Scrapy now that I have the motivation to move away from BS4 and try Scrapy.

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      Thanks for your kind words, I’m glad it’s helped you!

  • @sampatankar1977
    @sampatankar1977 4 years ago +3

    Really lucid, well-judged in terms of content, and excellent videography. Timely too, given what I happen to be doing this week! Thank you!

  • @nadyamoscow2461
    @nadyamoscow2461 3 years ago +2

    The best Scrapy basics tutorial I've seen. Thanks a lot!!

  • @user8ZAKC1X6KC
    @user8ZAKC1X6KC 2 years ago +3

    Something you note at the 9:23 mark is that you can close the space with a dot (or period). To add a little bit more to that: regardless of the number of spaces, you only need one period. So close the gap completely and put one dot. I struggled with this for a while, as I had a custom class with 5 spaces (no idea why the coder would do that) in the name and it just never occurred to me that I could use one dot. None of the documentation in Scrapy indicated that. I spent quite a while trying to figure that out.

    • @AmodeusR
      @AmodeusR 1 year ago +1

      It's good to learn about CSS if you're going to use CSS selectors. The space is closed with a dot because in CSS, when you want to select an element based on a shared class, you write it like "class1.class2". If you were to write "class1 class2" it would mean you want to select an element that has class2 and is inside an element that has class1.
      To make it clear, we could think of real HTML elements: "p a" would select any link (a) inside a paragraph (p).
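
    A minimal sketch of the compound vs. descendant selector distinction discussed in this thread, assuming the usual Scrapy shell response object; the class names ("card", "featured", "title") are hypothetical:

        # "div.card.featured" -> a <div> carrying BOTH classes, joined by a dot with no space
        # "div.card .title"   -> any element with class "title" nested inside a div with class "card" (the space means descendant)
        featured = response.css("div.card.featured")
        titles = response.css("div.card .title::text").getall()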

  • @irfankalam509
    @irfankalam509 4 years ago +3

    Nice one as always! Hope you'll continue this as a series.

  • @hardwaregenie
    @hardwaregenie 2 years ago +1

    Thanks John for your tutorial. Really liked how easy and approachable you made it.

  • @WildRover1964
    @WildRover1964 2 years ago +1

    A useful start. Followed along and got this working myself (which doesn't often happen when following Python tutorials on YT). Looking forward to finding out now how to get the stuff from page two and then hopefully finding out how to follow links.

  • @mahdi132
    @mahdi132 1 year ago +1

    Thank you very much, your content is awesome.

  • @sagar318
    @sagar318 3 years ago +1

    Man you're awesome! These videos are so informative and easy to understand. Wish you all the success in this world.

  • @julz2020
    @julz2020 2 years ago +1

    Dude I am loving your videos!! Opening up the wonderful world of web scraping with these excellent Python tools. Thank you for the content ;]

  • @sinamobasheri3632
    @sinamobasheri3632 4 years ago +2

    Thanks and nice work John 👌🏻 I was waiting for this for a long time 🙏🏻

  • @JohnMusicbr
    @JohnMusicbr 3 years ago +1

    I'm a big fan of your work. Thanks, John.

  • @kavehmoradkhani8018
    @kavehmoradkhani8018 2 years ago +1

    You present the educational content very well.
    You're great.
    Thanks John!

  • @celerystalk390
    @celerystalk390 4 years ago +8

    Great job again John! I've never used Scrapy but now I feel it may be something really useful and powerful. It'd be great if you could do a video comparing the different scraping approaches you've introduced and the scenarios each suits. Thx.

  • @martpagente7587
    @martpagente7587 4 years ago +1

    Thank you so much for this John, I hope this will become a series.

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Thanks Mart, it will

    • @martpagente7587
      @martpagente7587 4 years ago +1

      @@JohnWatsonRooney, I hope you can also make a video on the Scrapy-Splash approach for scraping dynamic websites, with a project or sample under this series, thanks!

  • @susannegelarehamiri4497
    @susannegelarehamiri4497 3 years ago +1

    Thanks John! Great video.

  • @daniel76900
    @daniel76900 3 years ago +1

    As usual... great content... keep up the good work!

  • @engineerbaaniya4846
    @engineerbaaniya4846 4 years ago +2

    Thanks John, please upload all the videos for Scrapy.

  • @litodemesa9699
    @litodemesa9699 2 years ago +1

    You are one of the best!!

  • @theinstigatorr
    @theinstigatorr 3 years ago +2

    Yay! It worked!

  • @chrissenanayake9891
    @chrissenanayake9891 3 years ago +1

    Nice presentation!

  • @stephenwilson0386
    @stephenwilson0386 2 years ago

    Great intro to Scrapy! Everywhere I've looked people say Scrapy is hard to learn, but frankly this seems more straightforward to me than BS. Maybe that's not the case when things get more complex, but that's just my two cents - maybe you're just better at explaining it?
    I'm trying to scrape products and prices from Newegg and running into a road bump - I can get the item name and such, but the price is nested in a tag inside a list and finally a div. Any tips on selecting that?
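
    A minimal sketch of drilling into nested markup like the price described above; the container and class names here are hypothetical placeholders, not Newegg's actual markup:

        # inside a spider's parse callback; all selectors below are assumptions
        def parse(self, response):
            for item in response.css("div.item-cell"):                  # hypothetical product container
                name = item.css("a.item-title::text").get()
                # chain a descendant selector to reach the price nested in the list
                price = item.css("ul.price li.price-current strong::text").get()
                yield {"name": name, "price": price}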

  • @RenatoEsquarcit
    @RenatoEsquarcit 3 years ago +1

    Appreciated your work!

  • @kimodataworld5092
    @kimodataworld5092 2 years ago

    Thank you very much, with your help I did my first web scrape.

  • @edbull4891
    @edbull4891 2 years ago +1

    Thank you for this fantastic training. Now I understand what Scrapy is all about :)

  • @mrindia4178
    @mrindia4178 3 years ago +2

    Thank You!

    • @mrindia4178
      @mrindia4178 3 years ago +1

      You are so down to earth, salute to you for providing this type of content for free

  • @NXTTutorials
    @NXTTutorials 4 years ago +1

    Thanks! Very useful!

  • @BChok420
    @BChok420 4 years ago +1

    Just subscribed, thank you sir.

  • @Modey3
    @Modey3 1 year ago

    What is the reason for the venv? Are you using a different version of Python?

  • @Hugo-pw5ud
    @Hugo-pw5ud 1 year ago +1

    Thank you!! Almost there, but the spider doesn't return the right output. What could be wrong? I do see the 200 scraped items via the shell. I'm on Windows.

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Did you check through the shell response for the items you are after? A 200 can also be something like a captcha page or a blocking page.

    • @arpitakar3384
      @arpitakar3384 3 months ago

      @@JohnWatsonRooney Yes, it's only returning the menu tabs and, further down, the services and contact tabs.

  • @ajayyadav-us8hd
    @ajayyadav-us8hd 4 years ago +3

    Hey brother,
    thanks for the tutorials. Can you make a tutorial on the other files,
    e.g. middleware.py, items.py, settings.py?
    And the second thing: how to use a database in Scrapy for reading and writing the data?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Yes, will be doing videos on those too

    • @ajayyadav-us8hd
      @ajayyadav-us8hd 4 years ago

      @@JohnWatsonRooney Thanks man

    • @greis790
      @greis790 4 years ago

      @@JohnWatsonRooney An implementation in the Scrapy framework of all the scenarios we use in requests, like proxies, user agents, etc., would be awesome!! Nice tutorial as always!
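
    As a hedged sketch of what the files asked about in this thread can contain, here is a hypothetical item in items.py plus a simple pipeline that writes to SQLite; the names (ProductItem, SQLitePipeline, products.db, myproject) are made up, and the pipeline still has to be enabled in settings.py via ITEM_PIPELINES:

        # items.py
        import scrapy

        class ProductItem(scrapy.Item):
            name = scrapy.Field()
            price = scrapy.Field()

        # pipelines.py
        import sqlite3

        class SQLitePipeline:
            def open_spider(self, spider):
                self.conn = sqlite3.connect("products.db")
                self.conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price TEXT)")

            def process_item(self, item, spider):
                self.conn.execute("INSERT INTO products VALUES (?, ?)", (item["name"], item["price"]))
                self.conn.commit()
                return item

            def close_spider(self, spider):
                self.conn.close()

        # settings.py (also where a custom USER_AGENT or proxy middleware would be configured)
        ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}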

  • @hardeepbhatti8619
    @hardeepbhatti8619 2 years ago

    I really didn't understand the 11:33 part and how you did it. By the way, I'm new to Scrapy. Can you explain it?

  • @MohAmuza
    @MohAmuza 3 years ago +1

    I want to scrape the product features but it doesn't work properly. I want to get 4 or 5 features but I get 1 feature or all the features on the page instead; no idea how it's behaving.
    I used this code:
    *response.css("div.f-grid.prod-row ul.f-list.j-list li::text").get()*
    The code above will print one feature.
    *response.css("div.f-grid.prod-row ul.f-list.j-list li::text").getall()*
    The code above will print all the features on the page, while I want to print 4 or 5 depending on the product.
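
    A minimal sketch of two ways to approach the question above, reusing the selectors from the comment; everything else is an assumption about the page's structure:

        # getall() returns a plain Python list, so slicing keeps only the first few features
        features = response.css("div.f-grid.prod-row ul.f-list.j-list li::text").getall()[:5]

        # or scope to one product container first, so only that product's <li> items are collected
        for product in response.css("div.f-grid.prod-row"):
            product_features = product.css("ul.f-list.j-list li::text").getall()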

  • @stephennardone5437
    @stephennardone5437 4 years ago

    I only recently found your channel, but all in all great content! I am however coming across problems with POST requests, and Selenium is sadly not an option for my project.

  • @eldarmammadov7872
    @eldarmammadov7872 1 year ago +1

    Could you show running Scrapy from a Python script rather than from the shell?

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      Yes, you can run Scrapy from a script, I have a video on it, see my channel
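
    For reference, a minimal sketch of running a spider from a plain Python script with Scrapy's CrawlerProcess; the spider below is a made-up example against the quotes.toscrape.com practice site:

        import scrapy
        from scrapy.crawler import CrawlerProcess

        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            start_urls = ["https://quotes.toscrape.com"]

            def parse(self, response):
                for quote in response.css("div.quote"):
                    yield {"text": quote.css("span.text::text").get()}

        # the FEEDS setting writes the scraped items straight to a JSON file
        process = CrawlerProcess(settings={"FEEDS": {"quotes.json": {"format": "json"}}})
        process.crawl(QuotesSpider)
        process.start()  # blocks until the crawl finishes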

  • @SabriCanOkyay
    @SabriCanOkyay 3 years ago

    Thanks a lot for the video. I could scrape a website on my first try.
    I had a problem though. I get this error:
    raise ExpressionError(
    cssselect.xpath.ExpressionError: The pseudo-class :text is unknown ...
    When I changed 'a::text' into 'a::attr(href)' it worked. 'text' was also working in the shell but not in the .py file. So, how can I get the text in the file then?
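
    For what it's worth, ::text and ::attr() are Scrapy/parsel extensions to CSS rather than standard pseudo-classes, so an error like the one above typically means the selector was translated by plain cssselect, which does not know them. Inside a spider callback the pattern below normally works; the spider name and URL are placeholders:

        import scrapy

        class LinksSpider(scrapy.Spider):
            name = "links"
            start_urls = ["https://example.com"]

            def parse(self, response):
                for link in response.css("a"):
                    yield {
                        "text": link.css("::text").get(),   # ::text goes through Scrapy's own selector handling
                        "href": link.attrib.get("href"),
                    }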

  • @SecurityTalent
    @SecurityTalent 2 years ago +1

    Great

  • @samcamus3000
    @samcamus3000 4 years ago +1

    Can I use Scrapy to scrape JavaScript-generated content?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +2

      You can, but you need to use the Splash extension. I will be covering this soon when I release more Scrapy content.

    • @samcamus3000
      @samcamus3000 4 years ago

      @@JohnWatsonRooney 👍👍👍
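
    For anyone curious about the Splash route mentioned in this thread, a rough sketch of a scrapy-splash spider, assuming a Splash instance is running and the scrapy-splash settings from its README (SPLASH_URL plus its middlewares and dupefilter) are configured; the spider name and URL are placeholders:

        import scrapy
        from scrapy_splash import SplashRequest

        class JsSpider(scrapy.Spider):
            name = "js_example"

            def start_requests(self):
                # render the page in Splash and wait briefly for JavaScript to run
                yield SplashRequest("https://example.com", self.parse, args={"wait": 2})

            def parse(self, response):
                # the response body now contains the JavaScript-rendered HTML
                yield {"title": response.css("title::text").get()}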

  • @artabra1019
    @artabra1019 4 years ago

    What is the difference between Scrapy and BeautifulSoup?

  • @pahehepaa4182
    @pahehepaa4182 4 years ago

    How do I scrape links from level 3 or level 4 drop-down menus and get output in a tree format of all child nodes?

  • @SunDevilThor
    @SunDevilThor 3 years ago

    I'm loving these web scraping tutorials. I did get an error though as soon as I tried to use the products variable, such as products.css('h3').
    I get the error: AttributeError: 'str' object has no attribute 'css'
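
    The error above usually means .get() (which returns a plain string) was called before chaining another .css(); keeping the intermediate result as a selector list lets it be chained. A minimal sketch with a hypothetical container selector:

        products = response.css("div.product")        # a SelectorList, so .css() can still be chained
        for product in products:
            title = product.css("h3::text").get()     # call .get() only at the final step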

  • @victory9654
    @victory9654 4 years ago +3

    Useful video, thanks! You're handsome too..

  • @mohamad5005
    @mohamad5005 2 years ago

    Hi John,
    how can I clear the screen while I am in the Scrapy shell? (I use PowerShell)

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Sure, I think typing clear works?

    • @mohamad5005
      @mohamad5005 2 years ago

      @@JohnWatsonRooney It works before I run the 'scrapy shell' command, but after I enter the shell it doesn't work

  • @haithemamir223
    @haithemamir223 2 years ago

    But how can I put this data in HTML?

  • @igordc16
    @igordc16 2 years ago +1

    Scrapy seems so intimidating.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      It is when you first look at it, but once you dive in and break it down into parts it will click.

  • @d.developer
    @d.developer 2 years ago +1

    Yessssss, I'm the 500th person to like!

  • @Pistonheaddd
    @Pistonheaddd 3 years ago +2

    scrapy shell 'URL' with single quotes doesn't work.
    scrapy shell "URL" with double quotes works.

  • @-__--__aaaa
    @-__--__aaaa 4 years ago +1

    Try it with XPath please.

  • @muhammadhananasghar3102
    @muhammadhananasghar3102 4 years ago

    Sir, make a video on how to scrape Google search results.

    • @-__--__aaaa
      @-__--__aaaa 4 years ago

      You should pass a user agent in the headers.

  • @Don_ron666
    @Don_ron666 2 years ago

    Why does he use a virtual environment?

    • @nateTheNomad23
      @nateTheNomad23 1 year ago

      Python scraping often involves the use of modules and packages. Once you have multiple Python projects, if you don't use a virtual environment, you would have different projects using some of the same packages and modules. If you go to update a package for one project, you would break a different project relying on a previous version of the same package to work properly. A virtual environment isolates the packages and modules associated with only one project, so that no matter what other projects use the same packages or modules, they don't interfere with each other. At least that's my understanding.

  • @angelesc2479
    @angelesc2479 4 years ago +1

    After the command scrapy shell 'jessops.com/drones'
    I got this as the prompt: In [1]: instead of >>>
    I don't know what I've done wrong...

    • @angelesc2479
      @angelesc2479 4 years ago +1

      Never mind, it works fine anyway.
      Also found out the hard way that indentation matters!!

    • @MohAmuza
      @MohAmuza 3 years ago

      It works without quotes.

  • @ALVINMAN452
    @ALVINMAN452 1 year ago +1

    Thank you very much.