Thanks for sharing this wonderful topic!
If you are facing the "RuntimeError: Event loop is closed" problem on Windows, just replace this line:
asyncio.run(main())
with:
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())
This applies especially to the async_scraper.py file. Thanks.
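A minimal sketch of the fix described above, with the policy switch guarded so the same script still runs on non-Windows machines (the trivial main() is just a placeholder for the scraper's real work):

```python
import asyncio
import sys

async def main():
    await asyncio.sleep(0)  # placeholder for the scraper's real work
    return "done"

# On Windows, the default Proactor event loop can raise
# "RuntimeError: Event loop is closed" at interpreter shutdown;
# switching to the selector-based policy avoids it.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

result = asyncio.run(main())
print(result)
```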
Requesting you to add more videos on async. We need more videos, please!
It's really a helpful video! Thanks.
Glad to hear that!
Love your content, learnt a lot from your channel. Thank you. Keep up the good work.
If possible, create some tutorials on running scrapers on AWS and scaling them too.
Great idea! Thank you.
Thank you, Upendra, for such quality content.
Thanks Pranav
Holy sh..., great video... I'm creating a bot for trading but I hadn't understood async until now. Thanks a lot, Upendra... :)
very impressive!!!
You mentioned we can't use BeautifulSoup with async, so what's the alternative?
Great question!
So, it is true that Bs4 is sync. Note that in this particular example, I wanted to make sure there was no sync library, so I stayed away from it.
Now, for practical web scraping, you have a few options. The first is to use bs4 along with aiohttp. The idea is that in web scraping, the larger problem is waiting for the network. Making that async will solve 95% of the slowness. You can still use bs4 for the parsing, which is the smaller part of the problem.
There are some async parsing libraries, but these are not popular and may not be stable.
The best approach, in my opinion, is using Scrapy. With it, you don't have to worry about the details, and it will give you the best results.
I even use Scrapy for calling APIs in non-web-scraping work. This is what I recommend.
Watch this video on the same subject - ua-cam.com/video/qQDB6SE0a9c/v-deo.html
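The bs4 + aiohttp combination from the reply above could be sketched roughly like this: the network waiting is async, the parsing stays sync. The Wikipedia URLs are just example targets, and this assumes aiohttp and beautifulsoup4 are installed:

```python
import asyncio

import aiohttp
from bs4 import BeautifulSoup

def parse_title(html: str) -> str:
    # Parsing is synchronous bs4 work; it is the cheap part of the job.
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True)

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # The expensive part, waiting on the network, is async.
    async with session.get(url) as resp:
        return await resp.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, u) for u in urls))
    return [parse_title(p) for p in pages]

if __name__ == "__main__":
    # Needs network access; example URLs only.
    urls = [
        "https://en.wikipedia.org/wiki/Web_scraping",
        "https://en.wikipedia.org/wiki/Python_(programming_language)",
    ]
    print(asyncio.run(main(urls)))
```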
@@codeRECODE I recently started learning web scraping, and your content is the best among all I have watched to date. Thank you for explaining the details and making things easy to understand.
I find it hard to write async code when developing a large application (tracking prices from all eCommerce websites). There are a lot of pitfalls when writing async code. I would love to see more content about this in the future, if possible.
@@codeRECODE amazing video 🤩
Please make a video on how we can convert our Scrapy project into an exe file. I tried searching on YouTube but failed to find even a single video. I also went through some answers on StackOverflow but am unable to understand what is happening...
This comes up a lot, and an exe is not the answer. Let me make a video soon.
@@codeRECODE waiting...
@@pythonically I will get to it when I can. Meanwhile, explore calling scripts from a batch file. Distribute the entire folder along with a batch file that runs your code. Your clients can simply double click the batch file to run the code.
Sir, I am facing a problem. Can you tell me how to download an image from a website when it is inside a canvas tag with no URL?
You can see the URL below.
The canvas tag is usually updated dynamically using JavaScript. Maybe there are some libraries that can help. As a last resort, you can use Selenium or Playwright and take screenshots.
@@codeRECODE Sir, how can we do this in Selenium?
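One rough sketch of the Selenium idea from the reply above: instead of a screenshot, you can ask the browser itself to serialize the canvas to a PNG data URL and decode it. This assumes Selenium with Chrome and a matching driver; the URL and the single-canvas page are hypothetical examples:

```python
import base64

def decode_canvas_data_url(data_url: str) -> bytes:
    """Strip the 'data:image/png;base64,' header and decode the PNG payload."""
    _header, _, payload = data_url.partition(",")
    return base64.b64decode(payload)

if __name__ == "__main__":
    # Requires selenium plus a browser driver; URL and locator are examples.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/page-with-canvas")
    canvas = driver.find_element(By.TAG_NAME, "canvas")
    # Run JS in the page to serialize the canvas to a base64 data URL.
    data_url = driver.execute_script(
        "return arguments[0].toDataURL('image/png');", canvas
    )
    with open("canvas.png", "wb") as f:
        f.write(decode_canvas_data_url(data_url))
    driver.quit()
```

If the canvas was drawn from a cross-origin image, toDataURL will raise a tainted-canvas error, and a plain element screenshot is the fallback.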
I'm having trouble with your code. When I run just one url, it is fast. Two or more urls take 20 seconds or more. Can you help?
import re
import aiohttp
import asyncio
import time

start_time = time.time()

links = [
    "https://en.wikipedia.org/wiki/Abkhazian_apsar",
    "https://en.wikipedia.org/wiki/Russian_ruble",
]
# Second assignment overrides the first, to test with just one URL:
links = [
    "https://en.wikipedia.org/wiki/Abkhazian_apsar",
]

async def get_response(session, url):
    async with session.get(url) as resp:
        text = await resp.text()
        exp = r"<title>.*</title>"
        return re.search(exp, text).group(0)

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in links:
            tasks.append(asyncio.create_task(get_response(session, url)))
        results = await asyncio.gather(*tasks)
        for result in results:
            print(result)

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))