How to Rotate Proxies with Python
- Published 15 Nov 2024
- Join the Discord to discuss all things Python and Web with our growing community! / discord
In this video I go through how to implement rotating proxies using requests with Python. We look at scraping some free proxies and writing a script to see if they work. Although free proxies aren't much use for actual web scraping projects, the principles are the same.
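The checking step described above can be sketched roughly like this. It is not the code from the video or the linked repo: `TEST_URL`, `as_proxies`, `check` and `working` are names made up for illustration, and httpbin.org/ip is assumed as the test endpoint because it echoes back the calling IP. It needs the third-party requests package installed.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed echo endpoint; it answers with {"origin": "<caller ip>"}.
TEST_URL = "https://httpbin.org/ip"


def as_proxies(address):
    """Turn a bare 'host:port' string into the mapping requests expects."""
    url = f"http://{address}"
    return {"http": url, "https": url}


def check(address, timeout=5):
    """Return the address if a request through the proxy succeeds, else None."""
    import requests  # third-party; imported here so as_proxies() works without it

    try:
        r = requests.get(TEST_URL, proxies=as_proxies(address), timeout=timeout)
        r.raise_for_status()
        r.json()  # a 200 with a non-JSON body (e.g. a block page) counts as a failure
    except (requests.RequestException, ValueError):
        return None
    return address


def working(addresses, checker=check, workers=20):
    """Run the checker over every address in a thread pool, keep the survivors."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [a for a in pool.map(checker, addresses) if a is not None]
```

Calling `working(["203.0.113.5:8080", "198.51.100.7:3128"])` (placeholder addresses) would return whichever of them answered in time. Threads suit this because nearly all of the time is spent waiting on slow or dead proxies.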
Code here: github.com/jhn...
Proxies: nodemaven.com/...
If you are new, welcome! I am John, a self-taught Python (and Go, kinda..) developer working in the web and data space. I specialize in data extraction and JSON web APIs, both server and client. If you like programming and web content as much as I do, you can subscribe for weekly content.
:: Links ::
My Patrons really keep the channel alive, and get extra content / johnwatsonrooney (NEW free tier)
I host almost all my stuff on Digital Ocean m.do.co/c/c7c9...
A rundown of the gear I use to create videos www.amazon.co....
Proxies I use nodemaven.com/...
Scraper API I use www.scrapingbe...
:: Disclaimer ::
Some/all of the links above are affiliate links. By clicking on these links I receive a small commission should you choose to purchase any services or items.
Great video.
I think 3 things are REALLY worth noting, because the cause of the failures isn't necessarily that the proxies were blocked:
- Sometimes the proxy couldn't be connected to in the first place
- Sometimes the server can't handle so many back-to-back requests
- Some proxies are set up by malicious actors to gain unauthorized access to connecting hosts.
Yes absolutely, great comments thank you for sharing
Also make sure to use elite proxies, as the others still expose your IP in the headers, so the host knows where the request originated
Try ipcola, residential IP proxy, sticky and rotation sessions,100% new, 99% pure
9:28 this part changed my programming outlook to a drastic extent.
The wonders you can pull off using the threaded approach are just sublime.
Thanks John once again.
It seems I'm running out of ways to give you due credit.
I've been binging your content because of an Amazon scraper I'm working on.
Can't help but giggle at the consistent struggle you have in typing "requests"
All in good fun! Keep up the great content!
Haha thanks. I’m better now, but typing and talking at the same time leads to more mistakes!!
@@JohnWatsonRooney the only multitasking I do is copy what you type while trying to understand the concept/s behind it, which you have done exceptionally well. Thanks for being a virtual tutor, John!
(Still having issues with bot detection with my bs4+requests, tried time.sleep and randomizing user agent with the fake user agent library, though)
Thanks to the man who has all the answers to my questions.
Man, you are a well of wisdom when it comes to scraping/Python, JSON and everything that matters in that field.
Although I have been a pro IT guy for many, many years (PM, Consultant, Architect and Advisor), this field of expertise is rather unexplored for me,
but following your videos made it crystal clear to me.
Thanks again for sharing
Thank you! Very kind
Every time I have a python scraping problem I search your name
Great channel mate 👍
Thank you very kind!
Nice to see a video where I understand everything first time! Thanks
Thank you!
Beautiful content as always! I tried scraping the site in this video using bs4 as it's the only framework I know as of now. I hope you make a video on scraping this site as you said in the video.
Checked more than 10 videos, all videos are awesome
Thanks Bro
Thanks! Do you know any services with rotating proxies that rotate every 60 seconds and where you can choose mixed geo or a specific country? I have mine from proxy-stоrе but this is my first service; I want cheap alternatives and to find out about other options
Thanks for this video and looking forward to your future video about working proxies which are useful for web scraping
When I choose a proxy and print the status code, it returns 200, but when I try to show the JSON, it raises an exception. Why?
Super Insightful, John!
Thank you for sharing this one 🙌
Nice video but how can you deal with auth proxies?
use selenium wire
# scheme added: newer requests versions reject proxy URLs that have none
proxies = {'https': 'http://user:password@proxyip:port'}
r = requests.get('url', proxies=proxies)
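For authenticated proxies, two details tend to trip people up: recent requests versions want a scheme on the proxy URL, and credentials containing characters such as @ or : need percent-encoding. A small sketch, where `auth_proxy` is a hypothetical helper and the host, port and credentials are placeholders:

```python
from urllib.parse import quote


def auth_proxy(user, password, host, port, scheme="http"):
    """Build a requests-style proxies mapping for a username/password proxy."""
    # Percent-encode the credentials so '@' or ':' inside them cannot be
    # mistaken for the separators of the URL itself.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"{scheme}://{credentials}@{host}:{port}"
    return {"http": url, "https": url}
```

Usage would look like `requests.get(url, proxies=auth_proxy("user", "p@ss:word", "203.0.113.10", 8080), timeout=10)`. For a SOCKS5 proxy the same shape should work with `scheme="socks5"`, which as far as I know needs the optional extra installed via `pip install requests[socks]`.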
Thank you John! It's getting more interesting every time you upload videos. By the way, can you start using Request+BeautifulSoup+Splash sometime soon, especially the setup? I have a couple of questions for your next Q&A series, I'm excited!
Thanks Mart!
@@JohnWatsonRooney Hello, How do I format the proxies in a list? Do I need to use a .csv format? Or can I just use a .txt file. Thanks!
Why is my {'origin': 'myip'} just showing my real IP?
That was exactly what I was looking for
Thanks chief, informative and easy to understand tutorial
you deserve Millions of subscribers❤❤
This trick saved my time. Thank you 🙌🏾
@Bryan Braydon to be honest I don't care.
ua-cam.com/video/AL9Hcq15R5s/v-deo.html
Hello, even when requests.get returns status 200 for a URL and it looks like the proxy is working,
when we load a website through it, it always shows can't access, load timeout, rendering timeout, etc. So is there any way to check whether those proxies work normally? Thank you so much
Unfortunately most free proxies are blocked from the main websites so that could be your issue. You can try to find some that do work but in my experience it can be tough
I'm quite familiar with using selenium.
But I gotta say, the way you explain requests is very didactic.
Thanks; very helpful and unique tutorial; more like this, please!👍
Hello, is there a way to make it so the proxies are constantly changing via an API? For example you have a 10k list of proxies from numerous sources, but the proxies get updated every 5 minutes
Not entirely sure what you mean, but if you can request a proxy list, store the proxies and use them for a few minutes, then request again and update, that would work. The easiest solution would be to download the proxy list every 5 mins, store it in a file, and use that file to import new proxies into your scraper
@@JohnWatsonRooney would that way work without stopping the instance??
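One way to picture the refresh idea without stopping the scraper: keep the proxies in a plain text file, one per line (no CSV needed), and have the scraper re-read it once its copy is older than five minutes. A rough sketch; `RefreshingPool` and `pick` are invented names, and something else (a cron job, a second script) is assumed to rewrite the file.

```python
import random
import time
from pathlib import Path


class RefreshingPool:
    """Proxy list read from a text file and re-read once it is max_age seconds old."""

    def __init__(self, path, max_age=300, clock=time.monotonic):
        self.path = Path(path)
        self.max_age = max_age
        self.clock = clock  # injectable so the age check can be tested
        self._loaded_at = None
        self._proxies = []

    def _reload(self):
        lines = self.path.read_text().splitlines()
        self._proxies = [line.strip() for line in lines if line.strip()]
        self._loaded_at = self.clock()

    def pick(self):
        """Return a random proxy, reloading the file first if the copy is stale."""
        stale = self._loaded_at is None or self.clock() - self._loaded_at >= self.max_age
        if stale:
            self._reload()
        return random.choice(self._proxies)  # raises IndexError if the file is empty
```

The scraper would create `pool = RefreshingPool("proxies.txt")` once and call `pool.pick()` before each request, so new proxies are picked up while the process keeps running.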
Loved the way you explain, this is the first time I've come across your content and I enjoyed learning every second.
Will this script be also applicable for socks5 proxy?
Hey John,
I am a Uni student studying Data Analytics. Currently doing a unit on "Data Acquisition" and your videos are far better walking through the complexities of web scraping than this current course!
I'm doing enough web scraping now where I think it is beneficial for me to start looking at paid for rotating residential proxies.
Do you have a service that you recommend? Even if you have affiliate links.
If you don't have any links, I think it would be beneficial to seek out such sponsorship possibilities soon.
Hey John! new subscriber here..! Im enjoying your channel very much, I have one suggestion though, in most of your videos you refer to previous ones and say that you're going to post the links somewhere but you don't. As a newcomer it is a bit difficult to find the video you're referring to since well your thumbnails and titles are in general, similar. Links will help new subscribers drive through your content smoothly. Cheers!
Hi - sure, no problem. I know I have a bad habit of not adding in links when I said I would!
Hey! thank you for such a detailed video. Is it possible for me to skip Captchas by rotating working proxies on a website? Or is there a more efficient method to do it?
I think so, yes. It's important to have working proxies but also to act like a real user as much as possible - use complete and real headers, don't send too many requests, and rotate through proxies randomly, not in a fixed order
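The advice above (full headers, random rotation) could be wired up something like this. Treat it as a sketch: `HEADERS` holds illustrative values rather than a recommended set, and `request_kwargs` is a made-up helper name.

```python
import random

# Illustrative values only; copying the headers your own browser sends is
# usually the more convincing option.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def request_kwargs(proxy_pool, rng=random):
    """Pick a proxy at random and return keyword arguments for requests.get."""
    proxy = rng.choice(proxy_pool)
    return {
        "proxies": {"http": proxy, "https": proxy},
        "headers": dict(HEADERS),  # copy, so callers can tweak it per request
        "timeout": 10,
    }
```

Each call would then be `requests.get(url, **request_kwargs(pool))`, ideally with something like `time.sleep(random.uniform(1, 3))` between calls so the request rate stays modest.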
Can i use this method with residential proxies or datacenter?
Yes absolutely
Great video! Any idea how to fix SSL: CERTIFICATE_VERIFY_FAILED?
Great Explanation. Thank you!
New skill learned. Thank you as always!
you said that the proxies will not work for Google but do you think they will work for other GCP products?
You can try but probably not - these free proxies are generally abused and are blacklisted almost everywhere
Hi John, do you have any recommendation of the best paid proxy provider?
Is it secure to use a free proxy page? In terms of cybersecurity I mean. Thank you
Don’t send any personal or sensitive information over it, but otherwise yes
Hi John, all your content is very helpful as always. Could you cover this, if it's possible: when you're scraping a site and after a few requests you get blocked or asked for a verification code, can you skip the current proxy and pick another one from the list of proxies? Thank you!
Hi Jonathon. Sure, that is very possible - instead of trying to handle the error of getting blocked I would just rotate to a different proxy for each new request. You can spread the load out that way
@@JohnWatsonRooney Thank you, I will try that. I often get blocked by a page even with long sleep times.
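Rotating on every request, and moving on to the next proxy when one fails or gets blocked, might look roughly like this. The `fetch` function is a hypothetical helper; it takes the `get` callable as an argument (in practice `requests.get`) so the rotation logic can be exercised without a network.

```python
from itertools import cycle


def fetch(url, rotation, get, attempts=3, timeout=10):
    """Send the request through successive proxies until one succeeds.

    rotation: an iterator of proxy URLs, e.g. itertools.cycle(proxy_list)
    get:      a callable with the signature of requests.get
    """
    last_error = None
    for _ in range(attempts):
        proxy = next(rotation)  # every attempt uses the next proxy in line
        try:
            response = get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
            response.raise_for_status()  # treat 403/429 etc. as a failed attempt
            return response
        except OSError as exc:  # requests.RequestException is a subclass of OSError
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

With `rotation = cycle(proxy_list)` created once and shared, `fetch(url, rotation, requests.get)` uses a fresh proxy for each request and skips past dead ones. Round-robin is used here for simplicity; swapping `next(rotation)` for a random pick is an equally reasonable choice.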
Hi John, great video and thank you for your time and effort for creating these videos for us. I was wondering if you added the updated version of this video as you mentioned because I could not find any other tutorial on proxy on your site.
I had the error below, and solved it by going into the documentation and using the example under proxies to set up the proxies. Maybe the requests library has changed a bit since.
"requests.exceptions.InvalidURL: Proxy URL had no scheme"
Hi, I'm having the same problem... Were you able to solve it?
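That error usually means the proxy was passed as a bare ip:port string; newer versions of requests and urllib3 expect a scheme such as http:// in front. A minimal sketch of normalising scraped addresses before use, with `with_scheme` and `to_proxies` as made-up helper names:

```python
def with_scheme(proxy, default="http"):
    """Prefix 'host:port' with a scheme unless the string already has one."""
    proxy = proxy.strip()
    return proxy if "://" in proxy else f"{default}://{proxy}"


def to_proxies(proxy):
    """Build the mapping passed as requests.get(..., proxies=...)."""
    url = with_scheme(proxy)
    return {"http": url, "https": url}
```

So a scraped entry like `"203.0.113.5:8080"` (a placeholder address) becomes `{"http": "http://203.0.113.5:8080", "https": "http://203.0.113.5:8080"}`.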
This video was really helpful. Thank you for your videos!
You’re welcome glad you enjoyed it!
I'm working on bot traffic, and during authentication (log-in) I get a 429 error. I tried adding some delays but nothing changed. Do you have any idea?
I tried your code. The problem is at 2:59: the response is 200, but when I use print(r.json) there is an error, so it goes to the except branch. Without the JSON call, proxylist shows the proxy as working. Please tell me why print(r) and print(r.json) give different results
Hi John, why does the result of crapeProxy have no port?
Brilliant job John... Do you guys manage to scrape more or less all sort of websites or there are some impossible ones?
Some are definitely harder than others but there is always a way!
@@JohnWatsonRooney I'll keep trying then! 😉
How can you handle retries with a request? If one request fails how to retry with a different proxy?
How do you approach creating a bulk Instagram account? No API is available to create an account only way is scripting in the browser.
I'm currently scraping Facebook with Selenium for my final project. (I can't use the API for many reasons and I can't change the source as my project depends solely on facebook: if you're going to say it's illegal)
I switch user-agents but should I use proxy too? I get blocked quite often and I'm fairly new to this.
It's really helping material. Thanks dear professor
Great video. Thanks. How do I add the YouTube video link in this code?
Thanks John for yet another useful video - I'm new to web scraping and have been blocked from a site I want to scrape. I was wondering whether there are packages out there to save the full content of a website locally so we can scrape it with no issues (I'm not talking about big sites such as Amazon). Do you think this is possible, and if so, why is no one else talking about it? How would you go about it, please?
Congrats John, it's an amazing video :)
Regards, Nelson
Thanks Nelson!
Great Video. I tried to use a proxy available online and got back a 200 status code. But if I then try to print the text (page.text), I get a NoneType object. Can you help me understand why this would be the case?
Sounds like the proxy works but is being detected by the website. Can you print anything, like a title or something?
Hello John, the video is outstanding as usual. My question is: can we use the same method with the requests_html library?
Yes you can!
Hey bro, how are u?
I'm looking to buy rotating proxy IPs, would you know where to get them?
I couldn't find any working proxies on that list. So, I created my own proxy pool, hahahah
That’s a good idea!
Proxy pool can you guide us
Can i ask something, with this technique we can still use Session from Requests to scrape faster?? or by using proxies we have to establish a new connection with the server from the start with every request?
Yeah that’s right, the proxy only changes your ip on each separate request - so if you are using a session it wouldn’t work, you have to create a new connection each time
@@JohnWatsonRooney Thanks a lot!! your videos are very nice and meaningful!
Hi John, please make a video on how to scrape aliexpress
Thanks for this video. I'm using requests_html for my scraper, do you know what the equivalent of print(r.json) is? I'd like to be sure that the scraper is using the right proxy. Thank you!
Will this trick help me change my IP address every so often to stop Google from blocking my requests?
thanks that was awesome thank you
do you suggest any method for searching around 20000 words in a day in google and get the results? without getting blocked?
Hello John, awesome video, does the same method work with SOCKS4/SOCKS5 proxies with pysocks??
What if i want to send a POST instead of get, how can i use proxies with post ?
Great video! How would I go about getting the equivalent of a r.json response (What IP used is what I want to know) when targeting a URL like Google for example, where the .json will not work?
I got this error "requests.exceptions.ProxyError: HTTPSConnectionPool Max retries exceeded with url: /ip (Caused by ProxyError('Cannot connect to proxy.', ConnectionResetError(54, 'Connection reset by peer')))" with every proxy I use. Please help me
Hello John, I am trying to implement this code with a list of proxies that already work in a csv file and the code runs without any errors but does not give me an outcome whatsoever. I believe my issue is originating from the extract function and I was hoping you could lend me a hand if that is possible. I am looking forward to the sequel of this video you said you would make so I can further understand. Thank you
What does your extract function look like?
I know this video is specifically about requests, but can this be done using normal Selenium? I know the HOST:PORT proxy configuration for Selenium works, but can Selenium proxies be configured using a proxy network configuration (USER:PASSWORD@PROXY:PORT)?
From my research, questions on the internet, and support tickets with Chrome Driver and Selenium, it sounds like this isn't possible:
Thanks so much for this video, I've had a lot of problems with rate limits. Thank you John Watson Rooney
No worries!
Great explanation. How can I use this code to rotate proxies in my existing file of scraped data?
This was very good, but I need something on how to do this with Selenium
It's possible; the downside is that, as far as I am aware, you need to close the browser and start a new one each time you rotate to the next proxy. That adds a lot of time
really informative video.... but is it possible to use proxy for python program or module?....
i mean, can i use proxy for smtplib python module etc?..... sir, if you have any solution or reference please tell me.....
Couldn't agree more!
thanks for your video. so is it possible to rotate headers?
Thanks for sharing. How can I get the output as proxy:port? Thank you again
Can we create a proxy script Purpose is to use them for scrapebox
you can see him peeking at his other screen
For some reason this script just returns my ip address not the proxy address?
Mine keeps failing... any ideas why? Good video!
It's really helpful John. I just wanna ask, is it possible if we use openVPN, thank you. I just wondering openVPN for requesting, I think it could be awesome, please.
did u have any luck?
how to use proxy and open 5 chrome browser at same time with different proxy and give individual task to them?
Can proxies be used in youtube autobot?
Yeah, anywhere you make an HTTP request you can use them
Can i work on US survey website by buying proxies and rotate them?
Yes.. probably. Technically it could work but I assume survey websites have some good blocker software to stop automated work
I watched this video fully twice, but honestly none of the coding in it is about 'how to rotate proxies'. Can you make a video that's truly about 'rotating'?
I didn’t make that part super clear, I am planning on revisiting proxy usage and will cover it better next time
@@JohnWatsonRooney thank you John, looking forward to it
Thank you John. It was really helpful!
Sir, please make a video on scraping Google search results
Great content hoss, new fan here for sure!
Where could i buy proxies?
there's a link in the description for the proxies I use
@@JohnWatsonRooney thx
Excellent!!!! Very nice video John!!!!
Thank you!
How do I do this with proxies that use a username and password
Nice ❤ Thank you 🙏
Thank you very much for this amazing tutorial. As for the code for scraping proxies, I tried exporting the proxylist to a CSV file and that works, but I noticed that the value 0 is on the first row (I recognize that this is the column label of the pandas DataFrame). I tried searching for how to get rid of this column label but had no luck. How can I get rid of it?
df = pd.DataFrame(proxylist)
df.to_csv('Table.csv', encoding='utf-8', index=False)
This removes the row index but not the column label.
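The 0 on the first row is the label pandas gives an unnamed column, and as far as I know `to_csv` also accepts `header=False` to leave it out. If the list only needs writing to disk, the DataFrame can be skipped altogether; a sketch with the standard csv module, where `save_proxies` is a made-up name and `proxylist` is assumed to be a plain list of "ip:port" strings:

```python
import csv


def save_proxies(proxylist, path):
    """Write one proxy per row, with no header row and no index column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for proxy in proxylist:
            writer.writerow([proxy])
```

Calling `save_proxies(proxylist, "Table.csv")` produces a file whose first line is already the first proxy.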
great tutorial. How do i make it work with auth proxies and get them to display their speed besides showing that the proxies work?
Can you do a video on this but with scrapy?
What is the process for socks5 proxy
I have tried so many proxies and I didn't find a working one. What's the best approach to get a working proxy? Another question: I tried my own IP address and port as a proxy, but it failed too!
The free ones never really seem to work! Unfortunately I believe you need to use a paid service, and it's something I want to check out in the future, but I haven't used one right now
@@JohnWatsonRooney I just need two proxies that are working to test the codes. Can you lend me two only :)?
@@JohnWatsonRooney try making a proxy script
Why would you say there is only one ip
plz make a video on proxies with selenium
Thanks you so much❤
Where can I find residential rotating proxies? Such as
IP: 51.91.197.158
Port: 2017
Type: Socks
Totally agree, bro!
There are two IP addresses: one identifies the device and the other is the ISP's IP
is it possible to do this with selenium?
It is, however I believe you have to close the browser and reopen it each time you want to go to the next proxy, which slows things down
@@JohnWatsonRooney can you explain how to change the IP every time it opens a tab?
Can anybody confirm this is still working? It's not working with the 'proxies' parameter, only without it.