Hey Alex, first I want to thank you for this amazing series and everything you do to help the community. Second, I am working on this project and it seems Amazon implemented a CAPTCHA to prevent scraping. Is there any way around this? Would love to know if this project is still applicable and doable even 2 years later. Cheers!
Yeah, same here. There are ways to bypass it, but it looks like it might be borderline unethical. Zenscrape, Apify, or ScraperAPI give you the ability to fetch the data directly from an API instead of the HTML page (Beautiful Soup).
if you are running into an issue with the header, try this: headers = {"User-Agent": ".......", "Accept-Encoding": ".....", "Accept": "......" } just put in whatever you get from the User-Agent link in the video description
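To make that tip concrete, here is a minimal sketch of passing such a headers dict to requests — note the string values below are hypothetical placeholders, not real ones; substitute whatever your own browser reports (e.g. from the user-agent link in the video description):

```python
import requests

# Placeholder values - replace each string with what YOUR browser reports
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}

def fetch(url):
    # Send the browser-like headers with every request
    return requests.get(url, headers=headers)
```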
Can you please make a video on how to present these projects? I've seen your video about the portfolio website, but I don't have an idea of how to actually present the GitHub. And thank you very much. Your channel has been very inspirational to me throughout my learning journey!
@@AlexTheAnalyst Hi Alex, to further add to my comment - I've taken a look at other "best example" portfolios online but comparing it to the Google data analytics portfolio guidelines, they are very different. Hence my conflict and lack of general understanding on how to present these projects in a website. Thank you.
thanks Alex! really a great video... requesting you to kindly do a similar one on capturing real-time stock prices with time series, and configure an email notification when the current price drops below, say, the 50-day moving average... I am looking forward to many videos like this... thank you!
Loved the video. But I really burst out laughing when you said: "I don't want my head to be here for the entire time. I'm gonna get rid of myself!" I thought to myself: Not a good head space to be in. 😅 You are naturally funny. Thanks for the knowledge and laughter.
It was a fun project. Please drop the other version (the complete version) of this project @Alex The Analyst
Help, I'm really having issues with "AttributeError: 'NoneType' object has no attribute 'get_text'". I have tried everything I could think of. How can I resolve this?
@@yashwanthgunturi8762 Also for 'title ', you can type the text below after soup's definition: title = soup.find(id='productTitle').get_text() print(title)
Hello Alex, thank you so much for all you do. I am using this video now, when Amazon doesn't include the word 'price' while inspecting. How do I go about that? I hope you reply because I am sure a lot of new learners are having this issue.
Amazon has basically caught onto this method of web scraping their site. A newer method involves rotating your user agents constantly - essentially to look like you're accessing Amazon from different devices. However, you also need to pair this with a proxy, otherwise Amazon would see you're trying to access Amazon from different devices, all from the same IP, hundreds of times a day. It's a lot more complicated now and the video is no longer working unfortunately.
I'm getting this error when trying to print the title using soup2. I tried to resolve it but couldn't. Let me know if anyone has a solution for this: AttributeError: 'NoneType' object has no attribute 'get_text'
Been having the same issue. For a second the code worked, and all of a sudden it stopped working. @alex could you please explain why this happens when we run the code and how to resolve it? (I think it's due to Amazon's security protocols detecting either a bot or a programming language that's trying to fetch the data.)
@@farazbhatti6120 Amazon has basically caught onto this method of web scraping their site. A newer method involves rotating your user agents constantly - essentially to look like you're accessing Amazon from different devices. However, you also need to pair this with a proxy, otherwise Amazon would see you're trying to access Amazon from different devices, all from the same IP, hundreds of times a day. It's a lot more complicated now and the video is no longer working unfortunately.
This seems like an amazing project. Sadly, something changed in Amazon's policy on scraping their data and I couldn't access it. If someone finds a way to make it work, I would love to hear it 😁 I'll keep going with the other projects!!
so whoever is not able to find the id for the price, and is getting the tag 'span' and a class when clicking on the price (shown on the product page) in inspect, can follow this code:
price = soup2.find("span", attrs={'class': 'a-price-whole'}).text.strip()
print(price)
replace 'a-price-whole' with whatever you are getting for the class
Were you able to access the site at all? I got a 503 error right from the beginning while trying to request the URL. However, I decided to use my Selenium web drivers and it worked for me... If you don't know how to use that, then I suggest you scrape another website. Amazon has gotten tighter.
Can someone direct me to the other project that you discussed in this video, where you build a crawler that goes through each page's content and gets their prices?
The code gives me an error with the price (the product title works though). I get an error "AttributeError: 'NoneType' object has no attribute 'get_text'"
Great video! I am stuck on the part where you print the price. I cannot find 'priceblock_ourprice' anywhere. It seems like they changed the way they display their price somehow.
if you have problems with the price not having an id, use its class instead. your code should look like this:
price = soup2.find('span', {'class': 'a-offscreen'}).get_text()
this should give you the price.
The real talk is nice. “It took ten hours over two weeks”. These are things people need to hear. Some people watch these videos on YT and think it is just that easy. This is why your channel is on my short list of channels I subscribed to. Thanks for all your time on these.
Hey MS Excel - sponsor this channel!
I try to make it as realistic as possible - I used to think people could do this all off the top of their heads and I would get discouraged. Glad to hear that! :D
@@AlexTheAnalyst For the same product, I couldn't find the id for price...it shows div class...what to do?
This should work if you tweak it well enough
@@pkabir4625 go a little bit up and you will find the id, but you have to use the strip function and [1:4], or insert the values as per your requirement, to get the exact values. this worked for me
it did not work for me, it's not showing a price id, it's in a span tag
Alex is so honest and down to earth, he doesn't have that usual YouTuber vibe that we are accustomed to. Man, we're so lucky to have found you as a mentor.
That means a lot! Thanks for watching! :D
@@AlexTheAnalyst hi. how do you find the code you showed to the right of the 't-shirt' web page? you selected the price... then the code for the price got selected. how do you do that?
@@pulakkabir2276 right-click and click on Inspect, or use Ctrl+Shift+I
it's been a year on this project and despite me searching and watching other channels, I always come back to your channel. you are simply the best person I have learned from. you are genuine and always able to get your point across. I hope you expand your "python for data analysis" series just like you did with SQL.
Thank you so so much .
The section where you speak about how you shouldn't know this by heart is so good. Honestly... I am learning SQL as per your recommendation, but in the back of my head I am scared as I think I should learn and memorize each single block of code... And this is awful... Thank you for being honest and clear on that!
How is it going?
Hey Alex, quick tip: when you were working on spaces, like at 34:21, if you select everything that you wanna move and press Tab, everything you selected goes 1 tab right. Little things like that improve your quality of life sometimes. Thanks for the tutorial :)
16:30 Solution to get the price:
price_symbol = soup2.find(class_='a-price-symbol').get_text(strip=True)
price_whole = soup2.find(class_='a-price-whole').get_text(strip=True)
price_fraction = soup2.find(class_='a-price-fraction').get_text(strip=True)
price = f'{price_symbol}{price_whole}{price_fraction}'
print(price)
Thanks bro!
finally! seems like they already changed the HTML. thanks bro
but the price on the website is $16.99, while after executing this code it is $12.97.
Thank you a lot!!!)
thanks man 😇
man i've been battling with the bot blocker from amazon and also some scraping issues with price because the website display was changed a while after this video was uploaded, but I've managed to pull it off so i hope this might help those recent viewers who might be as confused as me when I started writing this code on my own.
apparently you need to divide the second cell, so you need to run soup1 first before you run soup2. then for the price you need to pull three parts (span class=a-price-symbol, span class=a-price-whole, and span class=a-price-fraction) and combine them into 1 new variable (price). then you need to clean it using strip() and replace() to clean the whitespace and \n's.
hope this helps!
Brother please elaborate it, I am stuck
Hey bro, can you explain it or share your code? how did you pull the three parts together? I am stuck on this part
Could you please explain it? I am stuck on getting the Title itself
@@sdivi6881 Hey, i just solved it. can you tell me a little more about where you are getting stuck?
@@deeplakshmiyadav
price_symbol = soup2.find(class_='a-price-symbol').get_text(strip=True)
price_whole = soup2.find(class_='a-price-whole').get_text(strip=True)
price_fraction = soup2.find(class_='a-price-fraction').get_text(strip=True)
price = f'{price_symbol}{price_whole}{price_fraction}'
print(price)
Hey Alex! Thanks for this helpful video! The best part of this video is whenever you said 'I don't know what that is' (12:50), instead of some difficult theory. You don't know that, I don't know it either, so it makes me feel less pressure in learning Python...
You can't imagine how much this tutorial has helped me in my new position. Thank you so much!!
So glad it was helpful! :D
@@AlexTheAnalyst Did you ever make the second one? So many people want to see it! Please do send it out!
This project gave me a taste of how challenging web scraping is. Great video that makes things look easy and less intimidating.
Hi Alex, I really appreciate how you shared how long this project actually took you. It helps to know the difference between what we go through on your channel and the work/time it actually takes behind the scenes. AWESOME project! I learned tons and found all of it very useful/helpful. You are such an AMAZING teacher and resource! As always, THANK YOU!!
Bro, did you get the NoneType error, and how did you solve it?
I have this error bro, and don't know how to solve it
@@valadhruv6920 bro, did u find the solution for this? i can't figure it out
So great Alex! I followed along with this entire project and added it to my portfolio! I'll be sure to give you credit in my README file. :)
you're already doing a great job man. Thanks a ton, and hats off to you.
But,
We need that part 2. Please do it asap Alex.
As others described, if you get an error when running the second cell, it's probably due to a captcha issue where Amazon thinks you are a bot. You can force it by pressing Ctrl + Enter again and again until you get an output. I'm sure there is a better way to get around this, but that's the quickest semi-solution I found.
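That repeated re-running can be automated as a retry loop. A rough sketch, where the hypothetical fetch_html argument stands in for something like lambda: requests.get(url, headers=headers).text:

```python
import time
from bs4 import BeautifulSoup

def get_title(fetch_html, attempts=5, delay=3):
    """Retry until the real product page (not a captcha page) comes back."""
    for _ in range(attempts):
        soup = BeautifulSoup(fetch_html(), "html.parser")
        tag = soup.find(id="productTitle")
        if tag is not None:        # found the product title, so not a captcha
            return tag.get_text(strip=True)
        time.sleep(delay)          # back off before retrying
    return None                    # gave up: every attempt was blocked
```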
The while loop definitely doesn't work now that Amazon updated their website with some sort of blocker for bots. It might work a few times but eventually stops running in the background.
thank you so much
Sir, I am very near to getting my first job through your project
Thank you
And this is also my first project
Thank you for this, Alex. I felt so happy when I finally could scrape the website I had been trying to scrape (I applied your teaching to another website). Really appreciate your work.
29:30 - quick tip: select the file, hold shift and right-click to get “copy as path” in the context menu.
What attracted me in your video, haha, is that you have 3 kids. this is a great man
God bless your family
Hi Alex, it seems like this code is now not working. I would be grateful if you could do another web scraping project with EDA analysis.
Love how instructive your videos are.
As always, thank you for all your efforts and good work! I love watching your videos. Your positive attitude and way of expression make the lessons even more fun. I've seen a few people say the video is too long, but I think being able to walk through the lesson together is much better for learning than videos that just show written code. Thank you thank you thank you ☺
The long awaited one ❤️💯
Your tutorials are so good. And I follow you on LinkedIn; your content is awesome. I love how you explain things in a clear way. Keep up the great job!!
Super early, love your stuff as always Alex!
You are very early! Lol Thanks for watching 😁
Thank you for demonstrating! I never thought that a simple project like this could be used as a portfolio project. I just realized that I have what it takes to become a DA. Thank you for demonstrating projects!
Thrilled to successfully get to the end of this @Alex - appreciate these real-world worked examples.
Wow, this is EXACTLY what I have been looking for. Alex the GOAT in DA. :) You are 1000x Awesome!
This man is a godsend to ALL the broke data analyst students
No kidding
Hi Alex, I have learned a lot from the 65 videos of the Bootcamp. God bless you with everything. Thanks!!!
By any chance, was there a part 2 to this with the more advanced scraping? Would love to see that :)
Looking for the part 2 you mentioned in the vid!! Thanks
This is what I have been waiting for!
Thank you
Unfortunately it no longer works (due to an Amazon website update, I believe, as others have commented) :/ would love another scraping video so I can learn!! Love all the videos Alex and thanks so much!😊
@@nezzylearns happy to help
Were you able to bypass the Amazon scraping detection? I am also receiving the NoneType error.
@@VishalSharmaOfficialVS I unfortunately wasn't able to figure it out :/ This is one of the harder projects (to me) so I was going to circle back after going through the rest of Alex's projects. If you figure out how to bypass it plz comment here with an update!
@@krystlestevens2585 sure! I’m working on it. As soon as I have a concrete solution, I will post it here. Thanks for your reply.
@@VishalSharmaOfficialVS did you ever figure this out?
at 14:20.. title = soup2.find(id="productTitle").get_text() is giving me this error: AttributeError: 'NoneType' object has no attribute 'get_text'.. can you or anyone else give me an idea of why this is happening? Is it possible that Amazon no longer allows scraping?
Amazon has basically caught onto this method of web scraping their site. A newer method involves rotating your user agents constantly - essentially to look like you're accessing Amazon from different devices. However, you also need to pair this with a proxy, otherwise Amazon would see you're trying to access Amazon from different devices, all from the same IP, hundreds of times a day. It's a lot more complicated now and the video is no longer working unfortunately.
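A rough sketch of that rotation idea follows. Every value below is a made-up placeholder: real setups need current browser strings and working proxies, typically from a paid proxy pool or a service like ScraperAPI.

```python
import random
import requests

# Hypothetical examples only - substitute real user-agent strings and proxies
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0",
]
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

def rotated_get(url):
    """Send each request with a fresh user agent AND a fresh proxy."""
    proxy = random.choice(PROXIES)
    return requests.get(
        url,
        headers={"User-Agent": random.choice(USER_AGENTS)},
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
```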
Wow!!! This is awesome!!! You have such an easy way of teaching. I already have a base in Python, but I've never done this before, and you make it so smooth and easy to do!!!! Thank you thank you ❤
OMG! i finally found this video after years. i recently opened this project i worked on a year ago and it popped up with some errors, so now i can finally track back
One of the only channels with least haters ✨
I wish I had more so I could be cool
I'd like to thank you for sharing this wonderful video! Thanks to you, I've just managed to make my own web scraper that saves me so much time. Otherwise, my coworker and I would have to spend more than 6 hours per week😂
I really like these long videos where you explain things like this instead of short videos. thanks for uploading, Alex!
Glad to hear it! I try to change it up every so often :)
I am absolutely fascinated by your thorough explanation
Hey Alex! It was a super helpful video. Thank you so much for posting it. Have you uploaded the next part of this video? If yes, please share the link.
Though the project was quite tricky, I got through it.
Thank you so much Alex.
Thanks Alex I am working on my own web scraping project for checking placements of searches and this video definitely helped
Thanks for sharing! This is an awesome video. I'm not sure if you did this but I think it would be cool to learn how to scrape multiple pages then append the data in a def function.
Great video Alex... it was really helpful for a module in my course. Please, I have been looking for the intermediate video you spoke about.
Mannnn pleaseeee keep going we need your help you tuts are on a whole diff level I am able to learn and understand with ease tnx a lotttttt and once again keep going
When I try to print the title I'm getting an error message: "'NoneType' object has no attribute 'get_text'". What is the issue here?
Same
same
I followed the project till the 30:22 min mark, but I get as a result only the title and price, without the Date, even though I have followed all the steps just like Alex did and everything went well without any error.
@Alex the Analyst can u give me any suggestion, or is there somebody that might have gone to the end of the project?
To solve "AttributeError: 'NoneType' object has no attribute 'get_text'",
delete the headers
still showing the same error
It worked
@@Kshitij-Yadav can you elaborate it bro
@@mulikinatisiddarthasiddu8245
sure man
If u get the above error
Go to the place where u entered the URL
Then headers
delete the header
So it'll be
url
page = requests.get(url)
soup1
This should fix it
If u have any other issues lemme know I just finished this code so I went through everything I'll share it
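If I'm reading that reply right, the stripped-down flow it describes would look roughly like this (the URL is a hypothetical placeholder; the request is left commented out so nothing is fetched on import):

```python
import requests
from bs4 import BeautifulSoup

def fetch_soup(url):
    # Note: no headers argument at all, per the suggestion above
    page = requests.get(url)
    return BeautifulSoup(page.content, "html.parser")

# soup1 = fetch_soup("https://www.amazon.com/dp/EXAMPLE")  # hypothetical URL
```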
@@mulikinatisiddarthasiddu8245 price_symbol = soup2.find(class_='a-price-symbol').get_text(strip=True)
price_whole = soup2.find(class_='a-price-whole').get_text(strip=True)
price_fraction = soup2.find(class_='a-price-fraction').get_text(strip=True)
price = f'{price_symbol}{price_whole}{price_fraction}'
try this it will work if you are getting the error to get the price
I am so grateful for finding you. Almost feels like I know you personally. I'm still very new to this whole Data Analytics but I'm learning a lot.
A quick question: I'm on the Google analytics course by coursera and the language is R. Any ideas on where I can learn python- preferably in a structured way that is beginner friendly?
Again thank you for the work. Truly amazing.
I'm so glad to hear that! I honestly would do some YouTube first to just get the hang of it - then I would check out my Udemy Course recommendations in the description below - those are ones I've taken and loved. That would be my next step. Thanks for watching! 😁
Very nice 👍 that'd be good for checking the prices on Udemy courses. 😅
I am also taking the Google Analytics course. One question that I would like to ask is: how do you detect or prevent bias in the data being collected?
One thing I'd like to point out here is that you can easily switch from R to Python. There are plenty of courses out there, like Alex mentioned, but the key takeaway from the course (which I did finish, and landed a data analysis job) is: when following Alex, watch how he uses 'pandas' and other packages, which are essentially the same as the tidyverse in R. Look at the packages and how he writes the code. I think that will help you out the most on top of taking courses.
@@nickmoritz1515 Hey man! How is it going? Can you share some tips that helped you land a Data Analyst job? Maybe there is additional stuff. I've almost finished Alex's data analyst bootcamp and I'm an undergraduate bachelor student. I would be grateful if you could share, please!
Thanks man. You are helping a lot of people like me. Keep doing these portfolio videos!
Thanks a lot for enlightening us on Web Scraping. I came to know only after watching this video that such stuff can be done.
man it was super easy to understand, you nailed it
So glad to hear it!
Hello Alex, thanks for sharing. I have found an error in my code for this section:
title = soup2.find(id='productTitle').get_text()
print(title)
output:
'NoneType' object has no attribute 'get_text'
Please, I need your advice
same here
Hello!
You can do this:
page = requests.get(url,headers=headers)
soup= BeautifulSoup(page.content, "html.parser")
and then get your data:
title = soup.find(id='productTitle').get_text().strip()
price = soup.find('span',class_='a-offscreen').get_text().replace("$","")
You don't need prettify anymore, as your computer can easily read that
@@cocojamborambo5435 Weirdly enough, this works for one moment, and then it stops when I run it again.
Same.
Edit the headers and add a 'Referer' field.
If anyone else gets a captcha output when printing soup2 like I did: I solved it by putting soup2 and the print statement in a different cell. Run the first cell with soup1, then run the second cell with soup2 and the print statement separately.
Hello Alex!
One more step is done!!! It's so exciting. I got stuck at the stage where I had to get the price data - I missed that metric while scraping. Since you recorded this video, some parts of the HTML have been updated, so the price no longer exists as an "id="; it now lives as a "div class=". It is now challenging to figure out how to scrape the price :))) I will go deeper into the topic. Thanks so much for your time and for sharing your knowledge.
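To illustrate the class-based markup described above, here is a minimal sketch that combines the three price pieces other commenters mention (a-price-symbol, a-price-whole, a-price-fraction). The HTML string is a stand-in mimicking that structure - the live Amazon page may differ:

```python
from bs4 import BeautifulSoup

# Stand-in HTML mimicking Amazon's newer class-based price markup
# (class names taken from other comments here; the live page may differ)
html = """
<span class="a-price">
  <span class="a-price-symbol">$</span>
  <span class="a-price-whole">16.</span>
  <span class="a-price-fraction">99</span>
</span>
"""

soup2 = BeautifulSoup(html, "html.parser")

symbol = soup2.find("span", class_="a-price-symbol").get_text(strip=True)
whole = soup2.find("span", class_="a-price-whole").get_text(strip=True)
fraction = soup2.find("span", class_="a-price-fraction").get_text(strip=True)

# Combine the three pieces, then strip the currency symbol for a clean number
price = (symbol + whole + fraction).replace("$", "").strip()
print(price)  # 16.99
```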
Simply the best video, thanks Alex.
Thank you so much, Alex! Your teaching style has made learning incredibly enjoyable and accessible. I've learned a lot in just one month and completed my portfolio projects, even though I skipped Excel and Power BI for now. Your anecdotes about your dog, family, and personal experiences have added a fun touch to the learning process. Your impact on learners like me is undeniable, and I'm looking forward to purchasing a course from your website soon. Keep up the fantastic work! 🥂🥂
"Guys if you can't tell, I'm in need of some help here" 😂😂😂
The struggle is real
Really cool project with an email feature in the end! Thanks, Alex.
Dude! I'm an amazon seller and this kind of work would come in super handy. Thank you. Did you ever get around to making the next video where you pull data from all the search results page? I'd be really interested to see that one.
Very nice tutorial! Amazon seems to have changed the code for the id='priceblock_ourprice' part - could you update the code accordingly?
It was a lot of fuuuuuun! Thank you Alex. Your channel has become one of my favorites for Python and SQL 🤓
So glad to hear that! :D
Hey Alex, first I want to thank you for this amazing series and everything you do to help the community.
Second, I am working on this project and it seems Amazon implemented a CAPTCHA to prevent scraping. Is there any way around this? Would love to know if this project is still applicable and doable even 2 years later. Cheers!
Yeah. I have been trying for 2 hours to get into Amazon. I think it is a bit more difficult now. Were you able to find a way?
Having the same problem.
Yeah, same here. There are ways to bypass it, but it looks like it might be borderline unethical. Zenscrape, Apify, or ScraperAPI let you fetch the data directly from an API instead of parsing the HTML page with Beautiful Soup.
if you are running into an issue with the headers, try this:
headers = {"User-Agent": ".......", "Accept-Encoding": ".....", "Accept": "......" }
just put in whatever you get from the User-Agent link in the video description
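Building on the comment above, a rough sketch of what that headers dict might look like. The values here are illustrative placeholders only - use the exact strings you get for your own browser (e.g. from httpbin.org/get, the kind of link in the video description):

```python
# Illustrative placeholder values - replace each with the exact strings
# reported for your own browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
}

# Then pass it to requests, as in the video:
# page = requests.get(URL, headers=headers)
print(sorted(headers))
```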
Thanks Alex! this was really useful. I am waiting for the second part with the pagination 😅😅
After a looooong delay caused by many things, I can finally finish this portfolio.
Alex, please make a video on how to create your own dataset for data analyst jobs! Please!
Can you please make a video on how to present these projects? I've seen your video about the portfolio website, but I don't have an idea of how to actually present the GitHub..
And thank you very much. Your channel has been very inspirational to me throughout my learning journey!
Good idea!
@@AlexTheAnalyst Hi Alex, to further add to my comment - I've taken a look at other "best example" portfolios online, but compared to the Google Data Analytics portfolio guidelines they are very different. Hence my conflict and lack of general understanding of how to present these projects on a website.
Thank you.
Thank you! Amazing. Waiting for the next video 😉
was waiting for this😍😍😍
Thanks Alex. I’m a big fan.
Thanks so much Alex!! Just what I was looking for.
Thanks Alex! Really a great video... could you kindly do a similar one on capturing real-time stock prices as a time series, and configure an email notification for when the current price drops below, say, the 50-day moving average?
I am looking forward to many more videos like this... thank you!
Loved the video. But I really burst out laughing when you said: "I don't want my head to be here for the entire time. I'm gonna get rid of myself!" I thought to myself: not a good headspace to be in. 😅 You are naturally funny.
Thanks for the knowledge and laughter.
His simplicity and humor get me every time, and it helps with the flow of his lessons. So amazing.
It was a fun project. Please drop the other version (the complete version) of this project @Alex The Analyst
Help, I'm really having issues with "AttributeError: 'NoneType' object has no attribute 'get_text'". I have tried everything I could think of - how can I resolve this?
Experiencing the same. I used an if check to determine whether the title exists - apparently it doesn't.
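For anyone hitting this, a small sketch of that existence check. The HTML string below is a stand-in for what soup2 might contain when Amazon serves a CAPTCHA page instead of the product:

```python
from bs4 import BeautifulSoup

# Stand-in for the response you may get when Amazon blocks the request
html = "<html><body><p>Type the characters you see in this image</p></body></html>"
soup2 = BeautifulSoup(html, "html.parser")

title_tag = soup2.find(id="productTitle")
if title_tag is None:
    # .find() returned None, so calling .get_text() would raise the
    # AttributeError - usually a sign you got a CAPTCHA/robot page
    print("productTitle not found - print soup2 and check for a CAPTCHA page")
else:
    print(title_tag.get_text(strip=True))
```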
Same here !!!
I'm stuck at minute 16:35 - the code doesn't work :(
Could someone help us please?
Thank you!!
did you find a fix? I am having the same problem.
@@yashwanthgunturi8762
Also for 'title', you can use the code below after defining soup:
title = soup.find(id='productTitle').get_text()
print(title)
@@yashwanthgunturi8762 yeah, I found the fix - use this code:
title = soup2.find(class_='a-size-large product-title-word-break').get_text(strip=True)
Hello Alex,
thank you so much for all you do. I am following this video now that Amazon's HTML doesn't include an id containing the word 'price'. How do I go about finding the price while inspecting? I hope you reply, because I am sure a lot of new learners are having this issue.
I thought you were now only gonna make videos on management and stuff. Glad you are still making tutorials.
Nah, content really won't change much - I'll be doing Tableau tutorials very soon
14:47 - I get an error when trying to print the title, something about a NoneType?
Amazon has basically caught onto this method of web scraping their site. A newer method involves rotating your user agents constantly - essentially to look like you're accessing Amazon from different devices. However, you also need to pair this with a proxy, otherwise Amazon would see you're trying to access Amazon from different devices, all from the same IP, hundreds of times a day. It's a lot more complicated now and the video is no longer working unfortunately.
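A rough sketch of the rotation idea described above. The User-Agent strings are just illustrative examples, and as noted you'd still need a proxy pool on top of this:

```python
import random

# A small illustrative pool of User-Agent strings (a real rotation setup
# would use many more, kept up to date)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/124.0",
]

def rotating_headers():
    """Pick a different User-Agent for each request."""
    return {"User-Agent": random.choice(USER_AGENTS)}

# Each request would also be routed through a different proxy, e.g.:
# requests.get(url, headers=rotating_headers(),
#              proxies={"https": random.choice(PROXIES)})
print(rotating_headers()["User-Agent"] in USER_AGENTS)  # True
```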
@@mDevinm thanks for the explanation
You are great, this is exactly what I am looking for...
Oh man, I was thinking about a project related to Amazon data scraping, and here YouTube suggested it to me B-)
Hope it helps!
@@AlexTheAnalyst Yes it was, Thank You :-)
This script is only giving me a short HTML output, and I get a 'NoneType' error at the end.
Wonderful! I'll practice with this tonight!
Did you upload the second part?
I loved this one.
Please share the second one
Great video. Thanks Alex!
I'm getting this error when trying to print the title using soup2 - I tried to resolve it but couldn't. Let me know if anyone has the solution for this:
AttributeError: 'NoneType' object has no attribute 'get_text'
getting the same error
Have you been able to solve it?
Been having the same issue, for a second the code worked and all of a sudden it stopped working
@alex could you please explain why this happens when we run the code, and how to resolve it?
(I think it's due to Amazon's security protocols, which detect either a bot or a programming language that's trying to fetch the data.)
@@farazbhatti6120 Amazon has basically caught onto this method of web scraping their site - see my reply above: you now need to constantly rotate user agents and pair that with proxies, so unfortunately the video's approach no longer works as-is.
Same issue here for me! I want to continue with the project but I cannot due to this same get_text error message! @alex please help
similar error
Thanks for this awesome video, this will help me in the near future.
This seems like an amazing project. Sadly, something changed in Amazon's policy on scraping their data and I couldn't access it. If someone finds a way to make it work, I would love to hear it 😁 I'll keep going with the other projects!!
So whoever is not able to find an id for the price, and instead gets a 'span' tag and a class when clicking on the price in inspect, can use this code:
price = soup2.find('span', attrs={'class': 'a-price-whole'}).text.strip()
print(price)
Replace 'a-price-whole' with whatever you are getting for the class.
Were you able to access the site at all?
I got a 503 error right from the beginning while trying to request the URL. However, I decided to use my Selenium web drivers and it worked for me... If you don't know how to use those, then I suggest you scrape another website. Amazon has gotten tighter.
You can use Selenium to bypass all of their human checks, but it's a slightly more advanced topic.
Can someone direct me to the other project that you discussed in this video, where you build a crawler that goes through each page's content and gets the prices?
Thanks a lot Alex, I have learned a lot from your channel. Please keep on posting
Love this.. I'm curious about the headers part - I didn't know about that before.
Hey Alex, thanks for the walkthrough. When is the next web scraping project coming? I'm so hyped.
Looking forward to that too
'NoneType' object has no attribute 'get_text'. I am facing this error.
you saved me a lot of time. I really appreciate it.
BTW, Amazon has changed some of the website code. New people will need to adjust accordingly.
Hey Alex..... thank you for teaching us......
This is super interesting, thank you very much!
The code gives me an error with the price (the product title works though): "AttributeError: 'NoneType' object has no attribute 'get_text'"
Great video! I am stuck on the part where you print the price. I cannot find 'priceblock_ourprice' anywhere. It seems like they changed the way they display the price.
Same here
Looking at the HTML from another product, I found id="corePrice_feature_div" and id="corePrice_desktop". I tried the first one and it worked.
@@diogenes1683 thanks for the info, nearly broke my brain looking for a price that made sense.
but mine worked with id='apex_desktop'
If you have problems with the price not having an id, use its class instead. Your code should look like this:
price = soup2.find('span', {'class': 'a-offscreen'}).get_text()
This should give you the price.
You have 3 kids and you are 28? OMG, you are like my old wise sensei, Master Yoda.
Hello Sir, could you please give the link to the more difficult web scraping project you were talking about in your video?
If you can't pull in the data due to the captcha, don't pass the headers as the second argument.