Python Automation Series #7: How to scrape newspapers and retrieve data using the newspaper module

  • Published 19 Jun 2024
  • Newspaper is a Python module for extracting and parsing newspaper articles. It was inspired by the well-known requests library,
    one of the most downloaded Python packages today, pulling in around 14M downloads per week according to GitHub.
    Lucas Ou-Yang, the creator of newspaper3k, a popular journalism NLP library, has built products at Facebook and Snap,
    and he is currently working at Facebook Reality Labs.
    References :
    github.com/codelucas/newspaper
    pypi.org/project/newspaper3k/
    Github link for the code : github.com/BekBrace/newspaper...
    Scraping Multiple URLS : github.com/BekBrace/Scraping-...
    DEV profile : dev.to/bekbrace
    Github profile : github.com/BekBrace

COMMENTS • 26

  • @lusinedavtyan872 3 years ago +2

    Thank you! Best Python video so far. Will watch all your tutorials!

    • @BekBrace 3 years ago

      Hello Lusine, thanks a lot for your kind words! 🙏 It means a lot to me 🙂

  • @BrandonCastillo-eo1or 2 years ago

    Thank you very much. It was very useful 👍🏽

    • @BekBrace 2 years ago +1

      Great to hear, my friend.

  • @funnyvideos-zl9pu 3 years ago +2

    Is there any way to scrape data from a site that requires a login? It blocks any user who scrapes with Selenium, so I need to scrape from my normal browser without having to log in every time.

  • @brandonma4539 1 year ago +1

    Hi, I see that you've used the Download function, however, where is it downloaded exactly? I cannot find the file after running the code

    • @BekBrace 1 year ago

      Hey Brandon.
      Nothing is saved to a physical location on your hard drive. download() only fetches the article's HTML into memory, which is a necessary step before you can parse the article and print it. Here is another example from Lucas' GitHub repo:
      >>> from newspaper import Article
      >>> url = 'http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml'
      >>> a = Article(url, language='zh') # Chinese
      >>> a.download()
      >>> a.parse()
      >>> print(a.text[:150])
      香港行政长官梁振英在各方压力下就其大宅的违章建
      筑(僭建)问题到立法会接受质询,并向香港民众道歉。
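To make the reply above concrete: download() and parse() keep everything in memory, so a file only appears if you write one yourself. The following is a minimal sketch, assuming newspaper3k is installed; the helper names fetch_article_text and save_text are hypothetical and not part of the library.

```python
from pathlib import Path


def fetch_article_text(url, language="en"):
    """Download and parse one article, returning its plain text."""
    # Imported lazily so save_text() below also works without newspaper3k.
    from newspaper import Article

    article = Article(url, language=language)
    article.download()  # fetches the page HTML into memory (article.html)
    article.parse()     # extracts the title, authors, text and so on
    return article.text


def save_text(text, path):
    """Write text to a UTF-8 file. This is the only step that touches the disk."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling save_text(fetch_article_text(url), "article.txt") with a real article URL would leave article.txt in the current working directory.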

  • @supercompilations7211 2 years ago +1

    Hi, thanks for your tutorial.
    I have a question: is there any way to build a WordPress plugin using Python?

    • @BekBrace 2 years ago +1

      Thank you friend!
      Honestly, I don't know, but if I happen to find an answer to your question, I'll let you know. Peace ✌️

  • @metakokalj1207 1 year ago +1

    Hello. Do you happen to know a way to scrape comments spread over multiple pages (around 450 comments) from a newspaper article? Thank you so much for your help.

    • @BekBrace 1 year ago +1

      Thank you.
      I don't have a ready answer for this question, but if I happen to find a way, I'll let you know.

  • @solomon9846 3 years ago +1

    Hello. Thank you for the explanation. May I know how to scrape multiple URLs?

    • @BekBrace 3 years ago +2

      Hello Solomon, thank you for watching and for your question.
      I have added a link in the description to a piece of code that can help you scrape multiple URLs. Tell me later if it went okay.

    • @singasik 2 years ago

      @BekBrace Hey, I tried your link but it doesn't seem to work. I'm not sure why; I have a feeling I am doing it wrong.
      ua-cam.com/video/9KZwRBg4-P0/v-deo.html
      I was trying to build a chatbot following the link above, and I tried to combine it with the link you provided. I needed help from your video because the link above doesn't show how to download multiple articles as text.
      If you can show or explain to me how to do so for the example above, I would appreciate it greatly. Thank you for reading this far.
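For the multiple-URL question in this thread, one possible approach is a plain loop over Article objects. This is a minimal sketch, assuming newspaper3k is installed; it is not the code linked in the description, and the names newspaper_fetch and scrape_many are hypothetical.

```python
def newspaper_fetch(url):
    """Download and parse one article with newspaper3k and return its text."""
    # Imported lazily so scrape_many() can be tried with any fetch function.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def scrape_many(urls, fetch=newspaper_fetch):
    """Return ({url: text}, {url: error message}) for a list of URLs."""
    texts, errors = {}, {}
    for url in urls:
        try:
            texts[url] = fetch(url)
        except Exception as exc:  # one failing URL should not stop the rest
            errors[url] = str(exc)
    return texts, errors
```

Collecting failures in a separate dictionary makes it easier to see which URLs could not be downloaded or parsed instead of having the whole run stop at the first error.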

  • @ronaldoleoni7170 1 year ago +1

    How can I run NLP on links that I have already collected by other methods and stored in a list?

    • @BekBrace 1 year ago +1

      Not sure if I understand.

    • @ronaldoleoni7170 1 year ago

      @BekBrace I'm scraping Google Scholar. I managed to collect many links to articles, but when I tried the newspaper module, I realized that it only seems to fetch articles from the web itself. Can I feed it the data I have already collected?
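For links that are already stored in a list, a minimal sketch could look like the following, assuming newspaper3k is installed and the NLTK tokenizer data that nlp() relies on has been downloaded (for example with nltk.download('punkt')). The names analyse and analyse_all are hypothetical. The optional html argument passes already collected HTML through download(input_html=...), so nothing is fetched again. The library targets news pages, so extraction from academic pages or PDF links may be poor.

```python
def analyse(url, html=None):
    """Run newspaper3k's parse and nlp steps for one URL.

    If html is given, it is used instead of fetching the page again.
    """
    # Imported lazily so analyse_all() can be tried without newspaper3k.
    from newspaper import Article

    article = Article(url)
    if html is None:
        article.download()
    else:
        article.download(input_html=html)
    article.parse()
    article.nlp()  # fills article.keywords and article.summary
    return {
        "url": url,
        "title": article.title,
        "keywords": article.keywords,
        "summary": article.summary,
    }


def analyse_all(urls, analyse_one=analyse):
    """Apply analyse_one to every URL in an existing list."""
    return [analyse_one(url) for url in urls]
```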

  • @Programmingwithranawaqas 2 years ago

    How can I crawl multiple URLs?

  • @karanahuja7467 10 months ago

    How do I filter out certain keywords so that they are not included?
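The question above can be read in two ways, so this minimal sketch covers both with plain Python and no newspaper-specific API: dropping unwanted entries from a keyword list (such as the list found in article.keywords after nlp()), and checking whether a text mentions an unwanted word so that the article can be skipped. Both function names are hypothetical.

```python
def drop_keywords(keywords, unwanted):
    """Return keywords without the unwanted ones (case-insensitive match)."""
    blocked = {word.lower() for word in unwanted}
    return [k for k in keywords if k.lower() not in blocked]


def mentions_any(text, unwanted):
    """Return True if the text contains any unwanted word as a substring."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in unwanted)
```

The substring check in mentions_any is deliberately simple and would also match inside longer words; a word-boundary regular expression would be stricter.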

  • @user-gp7py5fe1m 3 years ago +1

    When I tried "pip install newspaper" I got this: "ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output."

    • @BekBrace 3 years ago

      Татьяна Тренихина, could you please check your Python version using: python --version? If it's not Python 3, consider upgrading it. If it is Python 3 or later, try the following command in your terminal: pip install -U setuptools (this is just a permission problem), and let me know how it went.

    • @user-gp7py5fe1m 3 years ago

      @BekBrace Now it works! It was my mistake: I first used "pip install newspaper". The program works correctly with "pip3 install newspaper3k". Thanks!

  • @prakashkafle454 3 years ago

    It does not work properly for Nepali news.