wow this was great! I am completely new to this and still could follow perfectly fine and loved the explanations of everything. Would love to know how to run this script every day automatically and send results to phone or create alerts for changes and send those to a phone. Again, awesome job!
So you would need to host a server online to run it continuously; you could also link it to an app that checks for changes and alerts you. Just some ideas, you can search more on StackOverflow :)
Twilio is a good service for sending text messages via API. You could combine it with the scraping functionality and some comparison logic to text you the changes.
The only solution is to keep updating your code constantly. There's not really a good way around it, short of intelligently analyzing the picture, description, and brand.
I ran into a similar problem. You can use the "find()" method to find a specific tag, e.g. either of:
a) container.find("a", "item-brand")
b) container.find("div", "item-branding")
Once you are in a specific tag, you can just use dot notation to get to the next sub-tag. So, for example, I had container.find("div", "item-branding").a.img["title"]. You can also skip ahead by searching for the "a" tag directly instead of the "div" tag, or maybe even the "img" tag.
I was also stuck there, but I found the solution. Just find the div with class "item-branding" directly, and from there you can get the image, which will give you the title.
Number 1, I realise this was 5 months ago but still thought I'd make a suggestion. If you get good at data scraping you end up with enormous CSV files... how do you manipulate them? Like if I was looking for a certain price at a certain date in the past, putting all the data in a Python list and iterating through it usually crashes my computer...
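One way to handle that without putting everything in a list: stream the CSV row by row with the standard csv module and keep only the rows you care about. A minimal sketch; the file name and column names below are made up for illustration:

```python
import csv

def rows_matching(path, column, value):
    """Stream a big CSV one row at a time instead of loading it all into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[column] == value:
                yield row

# e.g. every row whose date column equals a given day:
# hits = list(rows_matching("prices.csv", "date", "2019-06-01"))
```

Because it's a generator, only one row is in memory at a time, so file size stops being the limiting factor.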
BY FAR the best tutorial I've watched for web scraping. But could someone help me out with scraping through multiple pages? I know he mentioned something about it at the very end, but still.
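For anyone else wondering about multiple pages: most result listings put the page number in the URL, so you can loop over page numbers and run the same parsing code from the video on each page. A sketch; the URL pattern below is hypothetical, so inspect the site's own "next page" link for the real format:

```python
from urllib.request import urlopen  # only needed for the commented-out fetch

# Hypothetical pagination pattern; check the real site's pagination links.
BASE = "https://www.newegg.com/p/pl?d=graphics+cards&page={}"

def page_urls(n_pages):
    """Build one URL per results page."""
    return [BASE.format(page) for page in range(1, n_pages + 1)]

# for url in page_urls(3):
#     page_html = urlopen(url).read()  # then parse each page exactly as in the video
```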
Thanks. I have a basic understanding of python and html and I found this tutorial very easy to follow. You do a great job of clearly explaining things in the code which is what I need at my current skill level. Much appreciated.
Maybe it's better to use find() instead of findAll() to get the product's name? The code will be less complex, like this: title = container.find("a", {"class": "item-title"}).text
shipping_container = container.findAll("li",{"class":"price-ship"})
GETTING THIS ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not callable
@@neilaybhalerao8373 shipping_container = container.findAll("li", {"class":"price-ship"} is what he typed originally. He forgot the closing ) on the function call. So he should've typed shipping_container = container.findAll("li", {"class":"price-ship"})
As of Nov 2020, I went through the whole thing without any issue! I used a different product name, but everything worked so well! Everything worked so perfectly! I learned so much from this video! this is awesome!!!! Thank you!!!!
brand = make_rating_sp[0].img["title"].title()
TypeError: 'NoneType' object is not subscriptable
[Finished in 3.074s]
Anyone know why this is happening, or how to fix it?
DUDE! High Quality Content!! You are very good at walking through the logical steps for breaking down a page! Other tutorials are great but are always geared toward the specific task at hand. With this it felt like I also learned how to tackle a page! This helped a bunch!
@@petersilie9504 Can you do this in Python 3? I don't think it's possible (apparently the multithreading module is not recommended). Sounds like a job for a compiled language.
When I try to follow along, it gives me the following error message:
brand = container.div.div.a.img["title"]
AttributeError: 'NoneType' object has no attribute 'a'
I got the same error, so I changed the code a bit, following the same method as finding the product name:
brand_container = container.findAll("a", {"class":"item-brand"})
brand_name = brand_container[0].img["title"]
product_container = container.findAll("a", {"class":"item-title"})
product_name = product_container[0].text
Can you tell me how to send requests to 3 different websites at the same time without getting an HTTP timeout error? I've tried different ways to get rid of this error but no success.
@@anwowie Can you tell me how to send requests to 3 different websites at the same time without getting an HTTP error? I'm working on a project and have tried many ways, but no success.
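One standard-library way to do this: run the requests in parallel threads with a per-request timeout, and catch errors per URL so one failing site doesn't kill the others. A sketch; the fetch function is injectable purely so the logic can be exercised without a network:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def default_fetch(url, timeout=10):
    # Per-request timeout so one slow site can't hang the whole run.
    return urlopen(url, timeout=timeout).read()

def fetch_all(urls, fetch=default_fetch):
    """Fetch several URLs in parallel; exceptions are returned, not raised."""
    def safe(url):
        try:
            return fetch(url)
        except Exception as exc:
            return exc
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(safe, urls))

# results = fetch_all(["https://site-a/", "https://site-b/", "https://site-c/"])
```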
Senior Data Scientist, Senior Database Engineer... I know a fellow gamer when I see one! Thx for the tutorial. All this time, all I ever wanted from most of the internet was the ability to "scrape" (new term for me) what I wanted so that I can do something with that data. I like to organize things and categorize them. I always thought RSS was okay... Twitter okay... Reddit okay... but I just want specific feeds from those sites, and this is exactly what I was looking for! Better than paying a monthly fee to somebody who won't even teach you how to do it. Maybe it's from collecting cards as a kid or playing video games that had really in-depth inventory systems (RPGs), but it is enjoyable when you can get the exact bit of information you want and then do something cool with it. This is helpful! Where were you when I needed to organize my bank in World of Warcraft!!!
Hi Dojo, really nice video. I have one doubt: recent eCommerce sites don't keep class names constant; they use alphanumeric values like class="_3Hjcsab". How do you scrape when the site keeps changing?
Then it gets harder! It's an adversarial problem. Development time greatly increases because you have to build functions that check whether the tag has all the features you are looking for before grabbing it. It's not as straightforward as grabbing by the div or id. In this case it might not be practical to scrape these sites because they clearly do not want to be scraped. Even if you scraped them successfully, they would become aware and change their code again accordingly.
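To make that concrete: when class names are auto-generated, you can anchor on features that don't churn, such as the tag structure or an href pattern. A toy sketch; the markup and the "/item/" pattern are invented for illustration:

```python
import re
from bs4 import BeautifulSoup

# Invented markup with an auto-generated class: the class changes, the structure doesn't.
html = '<div class="_3Hjcsab"><a href="/item/42">Widget</a><span>$9.99</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Match on the stable href pattern instead of the volatile class name.
link = soup.find("a", href=re.compile(r"^/item/"))
price = link.find_next_sibling("span").text
```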
Truly enjoyed your simple step by step explanation on why each command or function is needed, and what it does. Your Python knowledge and skills are evident, as you are able to provide immediate solutions to errors and or challenges to the problem you are attempting to solve. Followed along with the tools and enjoyed the session. Thank you.
def data_science_dojo():
    actions = ("like", "share", "sub")
    good_job = input("Thank you very much! ")
    if good_job in actions:
        print("love and respect from Kuwait")
    else:
        print("sorry maybe next time")

data_science_dojo()
-------
Output: peace out and happy basic coding :D
I really liked the tone, rhythm and clarity of this tutorial! I'm not a total beginner with Python anymore, and so was able to listen and (mostly?) understand while preparing lunch for my kids. (I'll rewatch to try and do it later.)
When I type uClient = uReq(my_url), it gives me a 403 Forbidden error and a bunch of timeouts. Does this mean that it worked but crashed, or that it will crash when it runs?
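A 403 means the request itself was refused, so nothing downstream will run. One common cause is the server rejecting urllib's default user agent; sending a browser-like User-Agent header often helps. A sketch, with example.com standing in for the real URL:

```python
from urllib.request import Request, urlopen  # urlopen used only in the commented line

url = "https://www.example.com/"  # stand-in for the page you're scraping
# Some servers return 403 Forbidden for urllib's default "Python-urllib" agent.
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
# page_html = urlopen(req).read()  # the actual network call
```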
Loved it. Though I'm a total beginner in data science with zero knowledge of it, I watched the entire video and tried to grab everything discussed here.
Hi, I'm getting stuck at 28:50 when running the script. How do I solve this problem?
$ python Dojo.py
Traceback (most recent call last):
  File "Dojo.py", line 18, in <module>
    brand = container.div.img["title"]
TypeError: 'NoneType' object is not subscriptable
Best Regards
Hey, I got it too. It seems to happen when they don't have the "3VGA" or whatever. I fixed it by taking the first word out of the output of title_container[0].text. So I tossed the original second part of "brand = xxx" and replaced it with brand = title_container[0].text.split(' ', 1)[0]. Hope it helps.
So thankful for this. I was able to run it and scrape similar information off of a coding website. I had some trouble installing BS4. Tip: I used pip3 to install BS4 to keep everything clean: sudo pip3 install bs4
Hey there! This guide really helped me create a tailored scraper for a pilot project. Even though I am at the very beginning stage of learning Python, I managed to create the entire script, and even learned during the process. Amazing, really appreciate this!
I was able to make a program for my client i never thought was possible. I got paid real money for this.
Blessings so much learned, this is like magic
Can you tell me how much time it took? And is it recommended for a uni student to make it a semester project?
@@GamingTechSnips Depends on your skill as a programmer
@@GamingTechSnips Less than a week, even when you have zero background knowledge
Inbox Me If You Can!!!
I need some tips from you.
Damn you're lucky, my client paid me fake money. smh
MINOR SUGGESTION
As of 10/03/2019, if you are following along with this tutorial, "container.div" won't give you the div with the "item-info" class. Instead it will give you the div with the "item-badges" class, because the latter now occurs first. When you access a tag with the dot (.) operator, it just returns the first instance of that tag. I had a problem following along until I figured this out. To solve it, use the "find()" method to locate exactly the div that contains the information you want, e.g. divWithInfo = containers[0].find("div", "item-info")
Thank you. Can't express how helpful this was and unlocked everything for me. Only part I wasn't understanding. Thank you
Thanks for the tip!
Okay, but how do we navigate further into more deeply embedded items? I'm trying to pull 'title' out of the a tag within item-branding, but it doesn't work.
D R, have you used find and/or the findAll method? Doing a couple of searches on Google and Stack Overflow helped me get further into the methods.
Also, do you know basic html?
@@vincentn2059 Yeah, I'm using the find method. I've looked in quite a few places but can't find the information I need.
I'm at the part in the video where he uses container.div.div.a.img. I've used containers[0].find('div', 'item-info') which works correctly but now I am stuck at the part where I have to navigate further to pull out the information I need.
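Putting the replies above together, here's a self-contained sketch of the find()-then-dot-notation navigation, run against a simplified stand-in for the item markup (not the live page; "BrandX" is invented):

```python
from bs4 import BeautifulSoup

html = """
<div class="item-container">
  <div class="item-badges">badge</div>
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand"><img title="BrandX"/></a>
    </div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", "item-container")

# .div grabs only the FIRST div (item-badges); find() targets the one you want,
# then dot notation walks down into it.
brand = container.find("div", "item-branding").a.img["title"]
```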
It's weird to think about it like that, but this video started my whole Python learning back in 2017 and I am SO SO SO thankful for it.
How good are you at python now? Just wondering how much progress one can make in 3 years
yes, please update us now!
@@DragonRazor9283 So the projects I have done by now are: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more speed-efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display the results as well. That is to say, I have learned all this in my free time after work.
@@chinzzz388 Sorry, I didn't see your comment somehow. So the projects I have done by now are: web scraping sec.gov XML files, converting them to Excel, and inserting them into a SQL database; I have built a dynamic website around this in Flask (a Python library). I have expanded my web scraping to sites that provide data in JSON, which usually contains more data than is available on the website directly, and this way it's more speed-efficient. I have moved all this to PythonAnywhere, where I have an FTP server as well and automated tasks which run every hour/day. My main field is still web scraping, but now I can run SQL queries with Python and display the results as well. That is to say, I have learned all this in my free time after work. This earned me a new position at my company which doubled my pay.
@@Tocy777isback0414 that is amazing my man!! Congrats and keep grinding :)
If you had some prior experience with web crawling, this video can take your crawling skills to a whole new level. It allows you to crawl a website containing complicated info about multiple items into a very organized dataset. The various tools introduced in the video are fantastically helpful as well. A BIG THANK YOU
Table of Contents:
0:00 - Introduction
1:28 - Setting up Anaconda
3:00 - Installing Beautiful Soup
3:43 - Setting up urllib
6:07 - Retrieving the Web Page
10:47 - Evaluating Web Page
11:27 - Converting Listings into Line Items
16:13 - Using jsbeautiful
16:31 - Reading Raw HTML for Items to Scrape
18:34 - Building the Scraper
22:11 - Using the "findAll" Function
27:26 - Testing the Scraper
29:07 - Creating the .csv File
32:18 - End Result
Hi, how can we scrape if the web page is a single-page app?
Thank you for the tutorial. However, I am not able to get the full list; it only prints one result, so it's not looping over all the containers. Can you please help me out?
containers = page_soup.findAll("div",{"class":"item-container"})
for container in containers:
    brand_description = container.a.img["title"]
    price_box = container.findAll("li",{"class":"price-current"})
    price = price_box[0].strong.text
    print("brand_description:" + brand_description)
    print("price:" + price)
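If anyone wants to reproduce that loop offline, here's a runnable version against a small inline HTML sample (the markup below is a simplified, invented stand-in for the real listing page); one thing to check is that every line of the loop body is indented under the for statement:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the real listing markup; brands/prices are invented.
html = """
<div class="item-container"><a><img title="EVGA GTX 1080"/></a>
  <li class="price-current"><strong>499</strong></li></div>
<div class="item-container"><a><img title="MSI GTX 1070"/></a>
  <li class="price-current"><strong>399</strong></li></div>
"""
page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.findAll("div", {"class": "item-container"})

results = []
for container in containers:  # body is indented, so it runs once per container
    brand_description = container.a.img["title"]
    price_box = container.findAll("li", {"class": "price-current"})
    results.append((brand_description, price_box[0].strong.text))
```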
@Data Science Dojo Please, is there a PDF document or a website?
I am trying to scrape the prices off of Newegg's website. The price is nested within
price = container.findAll('ul', {'class': 'price'}), and when I call:
price[0].li.span ---> I don't get an output. When I call:
price[0].li.span.text ---> I get an attribute-non-existent error.
How would I scrape the price in this Newegg example?
Also, the current price is wrapped within a 'strong' tag that is within a span class. How would we scrape this?
Thx
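A sketch of pulling the current price out of that kind of nesting, using invented markup shaped like the price list described above (not the site's real HTML):

```python
from bs4 import BeautifulSoup

# Invented markup mimicking the described structure: ul.price > li > span/strong/sup.
html = ('<ul class="price"><li class="price-current">'
        '<span>$</span><strong>299</strong><sup>.99</sup></li></ul>')
soup = BeautifulSoup(html, "html.parser")

li = soup.find("li", "price-current")
current_price = li.span.text + li.strong.text + li.sup.text  # "$" + "299" + ".99"
```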
This was by far the best introduction to web scraping I've found online. Clear, concise, and easy to digest. Thank YOU!
You look like a god when you're writing multiple lines at the same time.
Two years into a web program and a year working in the field and never bothered to learn how to do this. Great video, I followed along 5 years later in 2022 with Python 3.7.8 and it still works.
This was really good content, definitely the best intro to web scraping I've seen. You don't go through it as though you're reading from the documentation, there's more of a flow.
I saw this video and then successfully wrote the entire code without looking at the video. Not even once. This is because I understood every line of it. Thank you, man. Your explanation is very beginner friendly.
Yes. It helped me UNDERSTAND finally, I think because he taught it with respect for the viewer.
32:30, I started cheesing at how awesome the end result of this whole project was. Definitely inspiring - thank you for the excellent guide!
I am from a commerce background. I have zero knowledge of any programming language. I found your video and explanation so good that at least now I can start my journey into scraping and coding. I am so thankful at the moment. Love your channel. Thank you so much.
Hello Ella, glad to help you. Stay tuned with us for more tutorials!
@@Datasciencedojo Yes Chief👍 Have subscribed already. 🤗
The man, the myth, the legend.
You have no idea how much stress and lost time you have prevented. THANK YOU!
Absolute champion, quite possibly the best code tutorial I've ever watched. Oh the possibilities! Thank you :)
Wow, even almost 3 years later this video helped me so much; it helped me make a program that picks a random Steam game. This was so hard, but I figured it out. Big props to you and this video.
One of the best teachers I have come across on YouTube. Web scraping explained so well that even a layman can follow and understand the basic concepts. I wish in life I had a teacher/mentor/friend like the one teaching in this video.
A BIG BIG THANK YOU: the most understandable tutorial I've ever seen on how to scrape a web page (and I have viewed like 100 of them)
This is the best web scraping tutorial that I’ve found. I’ve been frustrated for hours trying to use other resources. Thank you for making this, your explanations are thorough and great!
Coming from an R user, this is a very well done introductory tutorial into web scraping in Python. I like the real world example with Newegg and troubleshooting along the way.
I can't believe I actually sat through 33 minutes learning web scraping, something completely new to me. I was looking for a shortcut, but your tutorial was just perfect! :D Thanks for this!
This is actually the coolest thing I've seen in my entire life. Wow. Thank you so much I love you man.
I am just starting web scraping and I can honestly say that this video clearly explained everything. I watched it at 1.5 speed and it made sense. I would love more videos like this. I loved how you made it generic so it can apply to more than one website!
THIS IS AMAZING!!! Everything was very well-explained and instructed, I managed to get my first webscrape off an E-commerce site! Thanks so much, you have a loyal subscriber in me!
Perhaps you could cover using time sleeps to avoid getting blacklisted by the websites we are scraping? And also how to scrape multiple pages in one go?
You are sooooo comfortable to listen to. Not because you have perfect pronunciation and a seamless script you are gliding through. You are just talking, but not constantly jumping back and forth. Accurate tempo and personality in your voice.
New subscription
This makes us feel really motivated, Law! Thanks a lot :)
UPDATE/SUGGESTION
The findAll function has been renamed to the find_all function in bs4 version 4.9.3
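For completeness, both spellings currently work: findAll is kept as a legacy alias of find_all, so the code from the video still runs:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<li class="price-ship">Free Shipping</li><li class="price-ship">$5.99</li>',
    "html.parser",
)
items = soup.find_all("li", {"class": "price-ship"})   # current, PEP 8 style name
legacy = soup.findAll("li", {"class": "price-ship"})   # older alias, still supported
```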
You are a blessing seriously! The first tutorial that actually made sense from start to finish. I was able to understand so much from this! Please Please Please Please upload more videos on Python Web Scraping with BeautifulSoup.
Thank you again for this blessing!
This was fast, precise and beautiful! By saying beautiful I didn't mean to state the obvious :) Thanks
Very high-quality tutorial.
Showing how to set up everything before running any code is very nice to include, and timestamping it so people who already know it can quickly skip is much appreciated.
Keeping the tutorial example script simple and diverse is very welcome.
Writing it from scratch just makes it sooo useful for remembering what was where.
I wish other people made tutorials like this... Timestamping is so useful when you just want to look up that one thing and don't really remember when it appeared.
Thank you very much for this video!
I hope you do a second one on this subject. I'd like to know how to scrape several pages, as you mentioned at the end of the video. This was just what I was hoping for. Thanks!
Hey, please help me: when I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?
I expected this video to take me 30 minutes to do, because it takes 30 minutes. 10 hours later, I HAVE MY FIRST WEBSCRAPER. THANK YOU VERY MUCH! I still did not manage to get it to be a csv, but I made a .txt and it is fine for now. Thank you so much again! The tutorial from dataquest.io came in very handy as well!
made another one today and it is working with csv
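For anyone stuck on the .txt-to-.csv step: the standard csv module handles quoting (e.g. commas inside product names) for you. A minimal sketch with made-up rows and a made-up output file name:

```python
import csv

# Made-up scraped rows: (brand, product_name, price).
rows = [
    ("EVGA", "GTX 1080 FTW, 8GB", "$499.99"),
    ("MSI", "GTX 1070 Gaming X", "$399.99"),
]
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)  # quotes the comma in "GTX 1080 FTW, 8GB" for us
    writer.writerow(["brand", "product_name", "price"])
    writer.writerows(rows)
```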
This material is just amazing. Thank you! Have you considered making an intro to Web Scraping using R?
I have watched all the web scraping videos on YouTube, but this one is the best; I learned a lot. Thank you.
enjoyed, data science ! Need more like this one
dude, you are literally saving lives with this type of videos... I can't wait to digest all this precious info..... You save people so much time with this!! you are magic!!
Thank you, John, for such kind words. Keep following us for more content!
Looks like they added another div at the very beginning of each item-container. The brand name can now be extracted with a little more effort:
brand_container = x.findAll("div",{"class":"item-info"})
print(brand_container[0].div.a.img["title"])
Try using a simpler one liner - print(container.a.img["title"].split(" ")[0])
Great, thank you. it worked!
brand_container = container.findAll("div", {"class": "item-info"})
brand = brand_container[0].div.a.img["title"]
Thank you! Took me forever to figure it out before I read this comment!
It must have been a magic day when I saw this for the first time 1.5 years ago!!! It's where it all started!!! Thanks! Best video & intro into web scraping for absolute beginners!! (Notable mention to Corey Schafer, who I was watching a few weeks earlier and who gave me a taste of it & how easy it could be to use/do.) Thank you, friends!! An amazing tool!!
When I watched this tutorial, it seemed easy to scrape, until I got stuck a thousand times while actually scraping a webpage. Happy coding, for dummies lollll
Everyone learns in a different way, and absorbs information through different methods! This informal, laid back 'talk & walkthrough' (almost like sitting together with a mate) fits my style sooo much!! for me probably the best python lesson ever!! Will be looking for many more - thanks :P legendary !!
Keep following for more content, Robert!
16:04 Command for it on windows is CTRL + SHIFT + P :)
Mate, this is just perfect! I learned so much by doing this with you. Now I'm ready to tackle other websites!!! You're a legend!
Stay tuned with us for more content!
Nice! I was wondering if you could do a page monitor where it tells you exactly where the website has changed?
yeah, that would be interesting. Basically you would save all the values, then periodically re-check the page and compare against the saved ones to see if there is a difference?
An easy way to do this is to download the HTML from the desired page and store its MD5. Fetch the same HTML periodically and compare the stored and current MD5s.
This is an easy and less CPU-consuming way to check whether the website has changed.
yeah, that seems like a better way to do it, but you would need to strip out the ad containers first, since they always change even when the page content does not.
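The MD5 idea discussed above can be sketched in a few lines of standard-library Python. Nothing here is specific to one site; the URL is whatever page you want to monitor:

```python
import hashlib
import urllib.request

def fingerprint(html_bytes):
    """Return the MD5 hex digest of a page's raw HTML."""
    return hashlib.md5(html_bytes).hexdigest()

def page_changed(url, stored_digest):
    """Fetch the page and compare its digest against the stored one."""
    html = urllib.request.urlopen(url).read()
    return fingerprint(html) != stored_digest
```

As the reply above notes, ad containers change on every load, so in practice you would strip them out of the HTML before hashing.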
You are the most concise teacher of python I have come across
Thanks
I will definitely give your other videos a view
This has been so useful. Thanks so much. What I need to know now, is how I can get the scraper to continue working when there's a 'Load More' button, which doesn't take you to another page. If anyone knows anything about this please let me know.
This is a really good question. Maybe click the load more button and then copy the URL? Or define how many results you want for that page, then copy the URL. I'm pretty sure when you hit load more it's actually altering the HTML path.
Maybe you can program in a click to load more function into your code.
Late answer, but the solution is coding a click on the load more button, similarly to how you can code a click on a next page button for your script to continue onwards.
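As the replies above hint, a "Load More" button usually fires a background (XHR) request with a page parameter, and you can often call that endpoint directly instead of simulating clicks. A minimal sketch; the endpoint and parameter names here are hypothetical, the real ones must be found in the browser's Network tab:

```python
from urllib.parse import urlencode

# Hypothetical XHR endpoint; the real URL and parameter names depend
# on what the site's "Load More" button actually requests.
BASE = "https://www.example.com/api/products"

def page_url(page, page_size=36):
    """Build the URL the 'Load More' button requests behind the scenes."""
    return BASE + "?" + urlencode({"page": page, "pageSize": page_size})
```

Each of these URLs can then be fetched and parsed like any other page (or as JSON, which many such endpoints return).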
Awesome, good, excellent, nice, best.
Hope YouTube's algorithm recommends this to every scraper enthusiast.
wow this was great! I am completely new to this and still could follow perfectly fine and loved the explanations of everything. Would love to know how to run this script every day automatically and send results to phone or create alerts for changes and send those to a phone. Again, awesome job!
This is what I'm looking for as well, but I'm not getting any further unfortunately
so you would need to host a server online to run it constantly; you could also link it to an app that checks for changes and alerts you. Just some ideas, you can search more on StackOverflow :)
twilio is a good service for sending text messages via API...you could combine it with the scraping functionality and some sort of compare logic to text you the changes...
The only solution is to be constantly updating your code. There's not really a good way outside of intelligently analyzing the picture, description, & brand.
This is one of the only clear, fun Python tutorials out there. Congrats!
Hello, at 20:14, the tag (in my case) jumps to a tag inside another tag. How do I choose which tag to grab if there is more than one tag with the same name?
I ran into a similar problem. You can use the "find()" method in python to find a specific tag.
you can either have it in the following:
a) container.find("a","item-brand")
b) container.find("div","item-branding")
once you are in a specific tag, you can just go with . notation to get to the next sub-tag.
so for example, I had container.find("div", "item-branding").a.img["title"]
You can just skip directly by searching for the "a" tag instead of the "div" tag or maybe even the "img" tag.
I was also stuck there, but I found the solution. Just find the div with class "item-branding" and from there you can get the img, which will give you the title.
@@hieudao428 Thankss!!
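The find() navigation described in the replies above can be condensed into a small runnable sketch. The HTML below is a trimmed-down stand-in for one item container; its structure is assumed from the thread, not copied from the real site:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one item container (structure assumed).
html = """
<div class="item-container">
  <div class="item-badges"><span>Badge</span></div>
  <div class="item-info">
    <div class="item-branding">
      <a href="#"><img title="MSI" alt="MSI logo"></a>
    </div>
    <a class="item-title" href="#">MSI GeForce GTX 1070 8GB</a>
  </div>
</div>
"""

container = BeautifulSoup(html, "html.parser")
# Dot notation returns only the FIRST matching tag (item-badges here),
# so use find() to target the exact div you want, then continue with dots.
brand = container.find("div", "item-branding").a.img["title"]
title = container.find("a", "item-title").text
print(brand, "|", title)  # → MSI | MSI GeForce GTX 1070 8GB
```

In BeautifulSoup, the second positional argument of find() is shorthand for matching the class attribute.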
I don’t usually comment on videos but this was phenomenal. Thank you.
Great video. ..very easy to follow. hope you do more of that kind. Thanks.
Glad you enjoyed it! Did you mean more videos about web scraping, programming, data science, or data acquisition?
yes , I need more videos on web scraping. Thank you :)
Number 1! I realise this was 5 months ago, but I still thought I'd make a suggestion.
If you get good at data scraping you end up with enormous CSV files... how do you manipulate them? Like if I was looking for a certain price at a certain date in the past, putting all the data in a Python list and iterating through it usually crashes my computer...
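One way to handle the giant-CSV problem above: instead of loading everything into a list, stream the file row by row with Python's csv module, so only one row is in memory at a time. The column names (date, price, name) are made up for illustration:

```python
import csv

def find_rows(path, date, max_price):
    """Stream a big CSV and yield rows matching a date and a price cap,
    without loading the whole file into memory."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["date"] == date and float(row["price"]) <= max_price:
                yield row
```

For heavier analysis, a library like pandas (or loading the CSV into a SQL database) is the usual next step.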
BY FAR the best tutorial I've watched for web scraping.
But could someone just help me out with scraping through multiple pages? I know the guy mentioned something about it in the very end but still
I'm getting this error when I try to run it:
File "", line 2, in
NameError: name 'page' is not defined
he set it as page_soup, not page
This is one of the most useful web scraping videos I have ever come across. I could learn it from scratch. Thanks.
This is very well explained and I enjoyed every second of it ! please do more ^^
I did a web scraper not so long ago with another set of tools. This video has motivated me to create one, too!
Wow, great video. Can you make a video on scraping data from multiple pages?
would that involve threading?
We want more data scraping videos! This was awesome!
this soup is very beautiful, goddamn
Thanks. I have a basic understanding of python and html and I found this tutorial very easy to follow. You do a great job of clearly explaining things in the code which is what I need at my current skill level. Much appreciated.
Maybe it's better to use find() instead of findAll() to get the product's name? The code will be less complex, like this:
title = container.find("a",{"class" : "item-title"}).text
How would you loop with this configuration?
thanks so much
Saw many videos on web scraping but yours was probably the best one.
shipping_container = container.findAll("li",{"class":"price-ship"})
GETTING THIS ERROR
Traceback (most recent call last):
File "", line 1, in
TypeError: 'tuple' object is not callable
try find instead of findAll
Same!!! I didn't understand when he said "oh I need to close this function".... Can anyone explain?
@@neilaybhalerao8373 shipping_container = container.findAll("li", {"class":"price-ship"} is what he typed originally. He forgot to add the ending ) to close the function. So he should've typed shipping_container = container.findAll("li", {"class":"price-ship"})
As of Nov 2020, I went through the whole thing without any issue! I used a different product name, but everything worked so perfectly! I learned so much from this video! This is awesome!!!! Thank you!!!!
brand = make_rating_sp[0].img["title"].title()
TypeError: 'NoneType' object is not subscriptable
[Finished in 3.074s]
anyone know why this is happening? or how to fix this?
Did you get an answer? I'm having this problem as well
DUDE! High Quality Content!! You are very good at walking through the logical steps for breaking down a page! Other tutorials are great but are always geared toward the specific task at hand. With this it felt like I also learned how to tackle a page!
This helped a bunch!
Awesome tutorial! Please add how to scrape multiple pages :)
Linux IT make a list and a for loop?
Use multithreading for this
@@petersilie9504 Can you do this in Python 3? I don't think it's possible (apparently the multithreading module is not recommended). Sounds like a job for a compiled language.
@@johannbauer2863 can you please explain? Thanks
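For the multi-page question that keeps coming up in this thread, the usual first step is simpler than threading: notice that the page number appears in the URL and loop over it. A sketch with a hypothetical URL pattern; check the real site's address bar when you click "next page" to find the actual format:

```python
# Hypothetical paginated URL pattern (not the real site's).
BASE = "https://www.example.com/search?query=graphics+cards&page={}"

def page_urls(n_pages):
    """Return the search URLs for pages 1..n_pages."""
    return [BASE.format(i) for i in range(1, n_pages + 1)]
```

Each URL can then be fetched and parsed exactly like the single page in the video, appending the results to one CSV. Threads can speed this up later, but a plain loop works fine to start.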
You're such a great teacher! Just because you can code doesn't mean you can teach. Awesome!
The first div that it showed was item-badges; how do I navigate to the other divs?
I am having the same problem??
Awestruck! It's amazingly simple to follow along! Thank you, sir, for adding to the community of self-learners!
when I try to follow, it gives me the following error message:
brand = container.div.div.a.img["title"]
AttributeError: 'NoneType' object has no attribute 'a'
Hey Helmut, I am also getting the same error. Have you fixed it?
I got the same error so I changed to code a bit which follows the same method as finding product name:
brand_container = container.findAll("a", {"class":"item-brand"})
brand_name = brand_container[0].img["title"]
product_container = container.findAll("a", {"class":"item-title"})
product_name = product_container[0].text
yeah it worked
Can you tell me how to send requests to 3 different websites at the same time without getting an HTTP timeout error? I tried different ways to get rid of this error but no success.
@@anwowie can you tell me how to send requests to 3 different websites at the same time without getting an HTTP error? I am working on a project; I tried many ways but no success.
Senior Data Scientist, Senior Database Engineer... I know a fellow gamer when I see one! Thx for the tutorial. All this time... all I ever wanted from most of the internet was the ability to "scrape" (new term for me) what I wanted so that I can do something with that data. I like to organize things and categorize them. I always thought RSS was okay... Twitter okay... Reddit okay... but I just want specific feeds from those sites, and this is exactly what I was looking for! Better than paying a monthly fee to somebody who won't even teach you how to do it. Maybe it's from collecting cards as a kid or playing video games that had really in-depth inventory systems (RPGs), but it is enjoyable when you can get the exact bit of information you want and then do something cool with it. This is helpful! Where were you when I needed to organize my bank in World of Warcraft!!!
brand = container.find("a", {"class":"item-brand"}).img.get('title')
you're welcome
'NoneType' object has no attribute 'img' :D could you please send me your code?
The best tutorial. Thank you. Much better than all the videos in Russian.
Hi Dojo, really nice video. I have one doubt: recent eCommerce sites don't keep class names constant, they have alphanumeric values like class="_3Hjcsab". How do you scrape when the site keeps changing?
Try the XPath way!! I don't think they will change all the attributes and the path of the element periodically.
Then it gets harder! It's an adversarial problem. The development time greatly increases because you have to build functions to check whether the tag has all the features you are looking for before grabbing it. It's not as straightforward as grabbing by the div or id. In this case it might not be practical to scrape these sites, because they clearly do not want to be scraped. Even if you scraped them successfully, they would become aware and change their code again accordingly.
Yeah, ran into the same problem, tried a lot to get around it but couldn't :/
Yes, scraping may be a limited toolset as websites use more sophisticated formats.
Thanks, great vid! Easy to follow for a rookie.
Thanks for the video. This was the best web scraping tutorial I have seen on YouTube.
Hi, thanks for the video! How do you get to the second div tag in "container"?
I have the same question here. I've tried different notations, i.e. div[2], div{2}, div(2) and others, but still don't get the second or third div.
Truly enjoyed your simple step by step explanation on why each command or function is needed, and what it does. Your Python knowledge and skills are evident, as you are able to provide immediate solutions to errors and or challenges to the problem you are attempting to solve. Followed along with the tools and enjoyed the session. Thank you.
def data_science_dojo():
    actions = ("like", "share", "sub")
    good_job = input("Thank you very much! ")
    if good_job in actions:
        print("love and respect from Kuwait")
    else:
        print("sorry maybe next time")
data_science_dojo()
-------
Output :-
peace out and happy basic coding :D
I really liked the tone, rhythm and clarity of this tutorial! I'm not a total beginner with Python anymore and so was able to listen and (mostly?) understand while preparing lunch for my kids. (I'll rewatch to try and do it later.)
When I type uClient = uReq(my_url), it gives me a 403 Forbidden error and a bunch of timeouts. Does this mean that it works but it crashed, or will crash if it runs?
4K Bahrami same here
DeeganCraft how does that work?
you guys are using pages instead of pages ... :)
this helped me:
stackoverflow.com/questions/41214965/python-3-5-urllib-request-403-forbidden-error
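The usual fix for the 403, as the Stack Overflow link above explains, is sending a browser-like User-Agent header, since many sites block urllib's default "Python-urllib" one. A minimal sketch (no request is actually sent here; the URL is a stand-in):

```python
from urllib.request import Request, urlopen

my_url = "https://www.example.com/"  # stand-in for the page you are scraping

# Many sites reject urllib's default User-Agent; identify as a
# regular browser instead.
req = Request(my_url, headers={"User-Agent": "Mozilla/5.0"})
# page_html = urlopen(req).read()  # then parse with BeautifulSoup as usual
```

If the site still returns 403 with a browser User-Agent, it may require cookies or block scraping outright.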
This is gold for someone learning python and seeing its application.
could you upload the script?
Loved it. Though I'm a complete beginner in data science with zero knowledge of it, I watched the entire video and tried to grab everything possible discussed here.
I keep getting 0 when I call len(containers)
Me too. Did you figure it out? I didn't..
@@saarakylmanen9345 did u figure this out yet
did u figure this out yet
I thought web scraping was hard until I found your video. Huge thanks man, you saved so much time for me!
Hi, I'm getting stuck at 28:50 when running the script. How do I solve this problem?
$ python Dojo.py
Traceback (most recent call last) :
File "Dojo.py", line 18, in
brand = container.div.img["title"]
TypeError: 'NoneType' object is not subscriptable
Best Regards
That is a corner case error...your best bet is to apply a try or if else statement.
Hey, I got it too. It seems to come when they don't have the "3VGA" or whatever.
I fixed it by taking the first word out from the output "title_container[0].text".
So I tossed the original second part of "brand = xxx" and replaced it with "brand = title_container[0].text.split(' ', 1)[0]".
Hope it helps.
Thanks, then I'm not going crazy; it's the website changing that causes these kinds of errors ☺
Looks like you need to add another "div" tag. --> brand = container.div.div.a.img["title"]
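Following the try/except suggestion earlier in this thread, a tiny helper can swallow these corner-case NoneType errors and return a placeholder instead of crashing. This is a generic sketch, not the video's code; the container lookup in the comment is only an example:

```python
def safe_get(extract, default="N/A"):
    """Run an extraction function and return a default instead of
    crashing when the tag chain hits a missing element."""
    try:
        return extract()
    except (AttributeError, TypeError):  # missing tag -> NoneType errors
        return default

# Usage inside the loop from the video (container being a bs4 tag):
# brand = safe_get(lambda: container.div.div.a.img["title"])
```

This keeps the scraping loop running even when some item containers lack the expected tags.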
Hey, please help me: when I tried scraping another site, I got a 403 Forbidden error. How do I fix that? Is it possible to scrape a secure site?
So thankful for this, I was able to run it and scrape similar information off of a coding website. I had some trouble with installing BS4. Tip, I used pip3 to install BS4 to keep everything clean.
sudo pip3 install bs4
Just use pycharm, man :-P
just use vim and then go native Linux; you're set, now you can throw the desktop away and get a tiling WM
Hey there! This guide really helped me create a tailored scraper for a pilot project. Even though I am at the very beginning stage of learning Python, I managed to create the entire script, and even learned along the way. Amazing, really appreciate this!
Hello everyone, find the updated version of this tutorial here: ua-cam.com/video/rlR0f4zZKvc/v-deo.html
This was very good. I'm a beginner to Python and this webscraping tutorial left me with very little questions.
6:28 - The good old times when a mid-to-upper graphics card (GeForce GTX 1070) could be bought for under $400 :')
Great video, thx!
I dont know much about coding but the way you explained this made perfect sense. I hope to learn a lot from your channel.
Fantastic video dude, much more helpful than others I've seen on YouTube.
I don't have any words to explain how much this video was helpful. Hope soon I will use this feature.
Your presentation and explanation are awesome! You have opened my eyes to the uses of Python and Beautiful Soup.
Fantastic tutorial! gave me 95% of what I needed for my first screen scraping project.
This is the first tutorial on this that actually makes sense. THANK YOU. You earned a subscriber.
As someone self learning Python (my first programming language) with a web scraping script in mind, this was great!
One of the best scraping tutorials good job
Loving your coding skills. Was just about giving up on Web Scraping. Then BOOM!!! I found this. :)