Python Automation Series #7: How to scrape newspapers and retrieve data using the newspaper module
- Published 19 Jun 2024
- Newspaper is a Python module for extracting and parsing newspaper articles. It was inspired by the famous "requests" library,
one of the most downloaded Python packages today, pulling in around 14M downloads per week according to GitHub.
Lucas Ou-Yang, the creator of newspaper3k, a popular journalism NLP library, has built products at Facebook and Snap,
and is currently working at Facebook Reality Labs.
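For readers new to the library, here is a minimal sketch of the typical newspaper3k workflow: download, parse, and optionally run NLP. It assumes the package was installed with `pip3 install newspaper3k`; the function name `describe_article` and its `with_nlp` flag are hypothetical, and calling `nlp()` also needs NLTK's tokenizer data (for example via `nltk.download("punkt")`).

```python
from typing import Any, Dict


def describe_article(url: str, with_nlp: bool = False) -> Dict[str, Any]:
    """Fetch one article and return the fields newspaper3k extracts.

    Hypothetical helper. newspaper3k is imported inside the function so that
    merely defining it does not require the package to be installed.
    """
    from newspaper import Article  # provided by: pip3 install newspaper3k

    article = Article(url)
    article.download()  # fetch the page HTML (kept in memory as article.html)
    article.parse()     # extract title, authors, publish date and body text

    info: Dict[str, Any] = {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,  # a datetime, or None if not found
        "top_image": article.top_image,
        "text": article.text,
    }
    if with_nlp:
        article.nlp()  # needs NLTK tokenizer data to be downloaded first
        info["keywords"] = article.keywords
        info["summary"] = article.summary
    return info


# Example (requires network access; the URL is a placeholder):
# info = describe_article("https://example.com/some-news-story", with_nlp=True)
# print(info["title"], info["keywords"])
```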
References:
github.com/codelucas/newspaper
pypi.org/project/newspaper3k/
GitHub link for the code: github.com/BekBrace/newspaper...
Scraping multiple URLs: github.com/BekBrace/Scraping-...
DEV profile: dev.to/bekbrace
GitHub profile: github.com/BekBrace
Thank you! Best Python video so far. I will watch all your tutorials!
Hello Lusine, thanks a lot for your kind words! 🙏 It means a lot to me 🙂
Thank you very much. It was very useful 👍🏽
Great to hear, my friend!
Is there any way to scrape data from a site that requires login? The site blocks any user who tries to scrape it with Selenium, so I need to scrape from my normal browser without having to log in every time.
Hi, I see that you've used the download function. However, where exactly is it downloaded? I can't find the file after running the code.
Hey Brandon.
It isn't saved to a physical location on your hard drive. download() is a necessary step that fetches the article's HTML into memory so it can be parsed and then printed. Here is another example from Lucas's GitHub repo:
>>> from newspaper import Article
>>> url = 'http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml'
>>> a = Article(url, language='zh') # Chinese
>>> a.download()
>>> a.parse()
>>> print(a.text[:150])
香港行政长官梁振英在各方压力下就其大宅的违章建
筑(僭建)问题到立法会接受质询,并向香港民众道歉。
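If you do want the downloaded page on disk, you can write it out yourself: after `download()`, newspaper3k keeps the raw HTML in the article's `html` attribute. The helper below is a hypothetical sketch, and the output file name in the usage comment is an assumption.

```python
from pathlib import Path


def save_article_html(article, path):
    """Write the HTML held in memory by Article.download() to a file.

    `article` is anything with an `html` attribute, such as a
    newspaper.Article after download() has been called.
    """
    if not article.html:
        raise ValueError("article.html is empty; call article.download() first")
    out = Path(path)
    out.write_text(article.html, encoding="utf-8")
    return out


# Usage with newspaper3k (requires network access):
# from newspaper import Article
# a = Article("http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml", language="zh")
# a.download()
# save_article_html(a, "hongkong_politics.html")
```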
Hi sir, thanks for your tutorial.
I have a question: is there any way to build a WordPress plugin using Python?
Thank you, friend!
Honestly, I don't know, but if I happen to find an answer to your question, I'll let you know. Peace ✌️
Hello. Do you know a way to scrape comments spread across multiple pages (around 450 comments) from a newspaper article? Thank you so much for your help.
Thank you.
I don't have a ready answer for this question, but if I happen to find a way, I'll let you know.
Hello. Thank you for your explanation. Could you tell me how to scrape multiple URLs?
Hello Solomon, thank you for watching and for your question.
I have added a link in the description to a piece of code that can help you scrape multiple URLs. Let me know later if it went okay.
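The linked code is not reproduced here. As an independent, minimal sketch of one way to scrape a list of URLs with newspaper3k: loop over them, skip duplicates, and keep going when one article fails. The function names and the record fields are hypothetical.

```python
from typing import Dict, Iterable, List


def unique_urls(urls: Iterable[str]) -> List[str]:
    """Drop blank entries and duplicates while keeping the original order."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def scrape_urls(urls: Iterable[str]) -> List[Dict[str, str]]:
    """Download and parse each URL; a failed article is recorded, not fatal."""
    from newspaper import Article
    from newspaper.article import ArticleException

    records = []
    for url in unique_urls(urls):
        article = Article(url)
        try:
            article.download()
            article.parse()  # raises ArticleException if the download failed
        except ArticleException as exc:
            records.append({"url": url, "error": str(exc)})
            continue
        records.append({"url": url, "title": article.title, "text": article.text})
    return records


# Example (requires network access; the URLs are placeholders):
# for rec in scrape_urls(["https://example.com/a", "https://example.com/b"]):
#     print(rec["url"], rec.get("title") or rec["error"])
```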
@BekBrace Hey, I tried your link, but it doesn't seem to work. I'm not sure why; I have a feeling I'm doing something wrong.
ua-cam.com/video/9KZwRBg4-P0/v-deo.html
I was trying to build a chatbot by following the link above, and I tried to combine it with the code you provided. I needed help from your video because the link above didn't show how to download the text of multiple articles.
If you could show or explain to me how to do so with regard to the example above, I would greatly appreciate it. Thank you for reading this far.
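For the chatbot case, one simple approach is to fetch each article's text and join everything into a single corpus file that the bot loads afterwards. This is a sketch under that assumption, not the code from the linked video or repository; `fetch_texts`, `build_corpus`, `write_corpus`, and the `corpus.txt` file name are hypothetical, and newspaper3k is assumed to be installed with `pip3 install newspaper3k`.

```python
from pathlib import Path
from typing import Iterable, List


def build_corpus(texts: Iterable[str]) -> str:
    """Join non-empty article texts, separated by a blank line."""
    return "\n\n".join(t.strip() for t in texts if t and t.strip())


def fetch_texts(urls: Iterable[str]) -> List[str]:
    """Download and parse each URL with newspaper3k, returning body texts."""
    from newspaper import Article
    from newspaper.article import ArticleException

    texts = []
    for url in urls:
        article = Article(url)
        try:
            article.download()
            article.parse()
        except ArticleException:
            continue  # skip pages that could not be fetched
        texts.append(article.text)
    return texts


def write_corpus(urls: Iterable[str], path: str = "corpus.txt") -> Path:
    """Fetch every URL and save the combined text for the chatbot to load."""
    out = Path(path)
    out.write_text(build_corpus(fetch_texts(urls)), encoding="utf-8")
    return out
```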
How can I run NLP on links that I have already collected with other methods and stored as a list?
Not sure I understand.
@BekBrace I'm web scraping Google Scholar. I managed to collect many links to articles, but when I went to use the newspaper module, I realized that I can only fetch articles from the web with it. Can I use it with the data I have already collected?
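newspaper3k is not limited to fetching pages itself: in newspaper3k, `Article.download()` accepts an optional `input_html` argument, so HTML you have already collected with your own scraper can be parsed and run through `nlp()` without another request. A minimal sketch, assuming you saved the page HTML alongside each link; `analyze_html` and `analyze_links` are hypothetical names, and `nlp()` needs NLTK's tokenizer data.

```python
from typing import Dict, List


def analyze_html(url: str, html: str) -> Dict[str, object]:
    """Run newspaper3k's parse() and nlp() on HTML you already have."""
    from newspaper import Article

    article = Article(url)             # the URL is still used, e.g. for relative links
    article.download(input_html=html)  # by default, no request: use the given HTML
    article.parse()
    article.nlp()                      # requires NLTK tokenizer data
    return {
        "url": url,
        "title": article.title,
        "keywords": article.keywords,
        "summary": article.summary,
    }


def analyze_links(pages: Dict[str, str]) -> List[Dict[str, object]]:
    """`pages` maps each collected link to the HTML saved for it."""
    return [analyze_html(url, html) for url, html in pages.items()]
```

If you only stored the links, the same loop works by dropping `input_html` and letting `download()` fetch each page.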
How can I crawl multiple URLs?
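To crawl rather than list URLs by hand, newspaper3k's `newspaper.build()` scans a news site and collects candidate article links; passing `memoize_articles=False` stops it from skipping articles it already saw on earlier runs. The sketch below keeps only links on the same site and caps how many are returned; `crawl_site`, `same_site`, and the default limit are assumptions for illustration.

```python
from typing import List
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def same_site(url: str, site_url: str) -> bool:
    """True if `url` is on `site_url`'s domain or one of its subdomains."""
    a, b = _host(url), _host(site_url)
    return a == b or a.endswith("." + b)


def crawl_site(site_url: str, limit: int = 20) -> List[str]:
    """Return up to `limit` article URLs discovered by newspaper.build()."""
    import newspaper

    source = newspaper.build(site_url, memoize_articles=False)
    urls = [a.url for a in source.articles if same_site(a.url, site_url)]
    return urls[:limit]


# Example (requires network access):
# for url in crawl_site("https://www.bbc.com", limit=10):
#     print(url)
```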
How can I filter out certain keywords so they are not included?
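The question can be read two ways, so here is a plain-Python sketch for each: dropping unwanted words from a keyword list (such as `article.keywords` after `nlp()`), and skipping whole articles whose title or text mentions a banned word. The function names and the record shape (dicts with `title`/`text` keys) are assumptions.

```python
import re
from typing import Dict, Iterable, List


def drop_keywords(keywords: Iterable[str], banned: Iterable[str]) -> List[str]:
    """Remove banned words (case-insensitive) from a keyword list."""
    banned_set = {w.lower() for w in banned}
    return [k for k in keywords if k.lower() not in banned_set]


def exclude_articles(records: Iterable[Dict[str, str]],
                     banned: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only records whose title and text contain none of the banned words."""
    # Whole-word, case-insensitive patterns, so "elect" does not match "Election".
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE) for w in banned]
    kept = []
    for rec in records:
        haystack = f"{rec.get('title', '')} {rec.get('text', '')}"
        if not any(p.search(haystack) for p in patterns):
            kept.append(rec)
    return kept
```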
When I tried "pip install newspaper" I got this: "ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output."
Татьяна Тренихина, please check your Python version using: python --version. If it's not Python 3, consider upgrading it. If it is Python 3+, try the following command in your terminal: pip install -U setuptools (an outdated setuptools is a common cause of this error), and let me know how it went.
@BekBrace Now it works! It was my mistake: I had first used "pip install newspaper". The program works correctly with "pip3 install newspaper3k". Thanks!
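Because `pip` and `pip3` can point at different interpreters, a quick check from inside Python shows which interpreter you are running, whether it can import `newspaper`, and a pip command bound to that same interpreter. This is an illustrative sketch with a hypothetical `environment_report` name. Note that on PyPI, `newspaper` is the older Python 2 package, while `newspaper3k` is the Python 3 one.

```python
import importlib.util
import sys


def environment_report() -> dict:
    """Describe the interpreter in use and whether `newspaper` imports in it."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "newspaper_importable": importlib.util.find_spec("newspaper") is not None,
        # Installing through the running interpreter avoids pip/pip3 mix-ups.
        "install_command": f'"{sys.executable}" -m pip install newspaper3k',
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```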
It doesn't work properly for Nepali news.
Do you get an error?
@BekBrace Yup.