Web Scraping Databases with Mechanical Soup and SQLite

  • Published 10 Sep 2024

COMMENTS • 219

  • @inconnumj4692
    @inconnumj4692 2 years ago +16

    Can MechanicalSoup scrape dynamic content (JavaScript pages)?

    • @PythonSimplified
      @PythonSimplified 2 years ago +23

      Nope! It's meant for HTML/XML scraping only, so it's best suited to websites with very little user interaction! 😊
      If you're scraping a JavaScript-heavy website, definitely go for Selenium!
      I have a bunch of videos explaining how to use it:
      ⭐ Web Scraping LinkedIn:
      ua-cam.com/video/7aIb6iQZkDw/v-deo.html
      ⭐ Web Scraping Instagram:
      ua-cam.com/video/iJGvYBH9mcY/v-deo.html
      ⭐ Web Scraping Facebook:
      ua-cam.com/video/SsXcyoevkV0/v-deo.html
      Good luck and I hope it helps! 😀

    • @inconnumj4692
      @inconnumj4692 2 years ago +7

      @@PythonSimplified thank you for your reply and for the good job you are doing ! keep it up

    • @alb12345672
      @alb12345672 2 years ago +2

      Sometimes you can look at the network activity in the browser's developer tools and call the site's APIs directly. Every situation is different.
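A minimal sketch of that approach, assuming the page loads its data from a JSON endpoint that shows up in the browser's Network tab. The endpoint URL, the query parameters, and the `items` key are all hypothetical; a real site will use its own names. `fetch_items` is defined but not called here because it performs a live request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_api_url(base_url, **params):
    """Append query parameters to an endpoint spotted in the Network tab."""
    return f"{base_url}?{urlencode(params)}"


def parse_items(payload):
    """Pull the list of records out of a JSON response body."""
    return json.loads(payload).get("items", [])


def fetch_items(base_url, **params):
    """Download and decode one page of results (performs a real HTTP request)."""
    with urlopen(build_api_url(base_url, **params), timeout=10) as response:
        return parse_items(response.read().decode("utf-8"))


# Hypothetical endpoint and response shape, for illustration only.
url = build_api_url("https://example.com/api/products", page=1, per_page=50)
sample_payload = '{"items": [{"name": "Zorin OS"}, {"name": "Ubuntu"}]}'
names = [item["name"] for item in parse_items(sample_payload)]
```

When an endpoint like this exists, the JSON is usually easier to work with than the rendered HTML, but whether one is available depends entirely on the site.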

    • @KacperSieradziński
      @KacperSieradziński 2 years ago +1

      @@PythonSimplified My viewers are asking me if this is legal :D Do you have any answer on such questions? ;-)

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      @@KacperSieradziński I've actually filmed a short video on it a while back:
      ua-cam.com/video/f0B6RdVGcM8/v-deo.html
      It doesn't really count as "legal advice", but as long as you stay within the "fair use" copyright clause you should be fine 😉

  • @VisuallyExplained
    @VisuallyExplained 2 years ago +29

    This is probably one of the best and most thorough channels related to practical applications of Python. You have a very unique style, very well done!

    • @PythonSimplified
      @PythonSimplified 2 years ago +2

      Thank you so much for the incredible feedback!! I really enjoyed reading your comment! 😃😃😃

    • @codzlaw
      @codzlaw 2 years ago

      This is perfect for someone wanting to learn. You have what a lot of educational videos are missing: a simpler way of explaining. They teach as if you already know and get it. Educators often forget that just because they get it doesn't mean everyone does. There are different ways of teaching, and people absorb information differently. Thanks for taking the time to explain it fully, step by step.

  • @chaghlarblabla5157
    @chaghlarblabla5157 2 years ago +5

    As a blind programmer, I find it very useful when you tell us what you typed. I'm subscribing to your channel now. Keep it going.

    • @PythonSimplified
      @PythonSimplified 2 years ago +2

      Thank you so much for the lovely comment Chaghlar! 😃
      I'm always trying to make these videos as accessible as possible to everyone, and I'm so happy you found them helpful!!
      I really admire the fact that you're programming despite the challenge and I'm super excited to have you on board! 😁😁😁

    • @chaghlarblabla5157
      @chaghlarblabla5157 2 years ago +2

      Thank you for the kind words.

    • @chaghlarblabla5157
      @chaghlarblabla5157 2 years ago

      One thing I haven't understood in this code: there is value.text.replace inside the brackets. I understand its function and what it does, but I've got no clue where you defined the value variable in the first place. Is it a method of the mechanicalsoup library, or am I missing something because I'm almost asleep?

    • @chaghlarblabla5157
      @chaghlarblabla5157 2 years ago

      That's a list comprehension, OK.
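To make the list comprehension point concrete, here is a small sketch. The `cells` list is a stand-in for the elements a scraper returns (anything with a `.text` attribute), not the tutorial's actual data. The point is only that `value` is defined by the `for value in ...` part of the comprehension itself.

```python
from types import SimpleNamespace

# Stand-ins for scraped elements; each one exposes a .text attribute.
cells = [SimpleNamespace(text="Zorin OS\n"), SimpleNamespace(text="Ubuntu\n")]

# The comprehension introduces `value`: it is bound to each element of `cells` in turn.
names = [value.text.replace("\n", "") for value in cells]

# The same logic written as an explicit loop.
names_loop = []
for value in cells:
    names_loop.append(value.text.replace("\n", ""))
```

Both versions build the same list; the comprehension is just the compact form.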

  • @TheDroc1990
    @TheDroc1990 1 year ago +1

    New subscriber!! I'm a Data Engineer with 10 years of experience. Can't wait to watch all your videos! 👏

  • @CrypticPulsar
    @CrypticPulsar 2 years ago +1

    You are just too awesome! Simple, powerful, and fluent method to easily remember commands that you wouldn’t otherwise.. bravo, and thank you!

  • @giorgosiotis1557
    @giorgosiotis1557 2 years ago +1

    I wish all female minds worked a little like you. Congratulations from Greece

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      hahahahaha thank you so much Giorgos! 😁😁😁
      Cheers from Canada! 🍁

  • @jefferyandme3741
    @jefferyandme3741 2 years ago

    I'm hooked on this channel!! Easy to look at and her style resonates with me. She has a real way of helping me understand. Very refreshing!! Thank You!

  • @Pradeep-kv9hp
    @Pradeep-kv9hp 2 years ago

    People say Python is an easy language, but your tutorial videos make it even easier to understand. Thanks, keep it up, and please make more and more tutorial videos.

  • @stay_stoic_be_stoic
    @stay_stoic_be_stoic 2 years ago

    When you said simplified, you were on the money. I needed to get a clear view of OOP, and your video literally cleared all my doubts.

  • @martinmiguez6153
    @martinmiguez6153 2 years ago +4

    Ohhh magnifique!!! Love love love this tutorial!!
    Your work always improves. The way you explain and how you use the logic of programming is very clear. Thank you.

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      Thank you so much for the lovely comment Martin!! 😁😁😁 I'm super happy you like my explanations! 😊

  • @wslater56
    @wslater56 2 years ago

    wow - i am very impressed - so much easier than going to coding sites where the logic is harder to translate. Thanks for clearly explaining the logic within each line of code

  • @alisheik3076
    @alisheik3076 11 months ago

    Hi,
    You have a unique way of explaining the subject, and I am really impressed by it. I have seen hundreds of videos about web scraping, but none of them is as simplified as this. Thank you so much. I had heard about BeautifulSoup, Scrapy, Selenium, and Puppeteer, but this is the first time I've heard of MechanicalSoup. I am hoping for some more videos about web scraping with the updated versions. Your teachings are as sweet as your voice.
    Thanks

  • @shadyabdelhady-rm3sz
    @shadyabdelhady-rm3sz 1 year ago +1

    thank you so much, that's exactly what I'm looking for

  • @tonym5857
    @tonym5857 2 years ago

    You have a great talent for teaching us complex programs in an easy way. 👏👏👏🌻☘️🐱

  • @urbaneplanner
    @urbaneplanner 2 years ago +2

    Nice walk through - there are many ways to webscrape - I hadn’t come across mechanical soup before (I’ve used beautiful soup, selenium, and scrapy and I would approach this task a bit differently) and I really like this intuitive walk through as the thought process applies to the various programmes

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      Thank you so much! That's exactly what I was aiming for! 😊
      This video is meant to demonstrate a general approach to web scraping and using developer tools to select elements on the DOM!
      You can apply the same techniques with any other web scraping library, and I must admit that I'm much more of a Selenium kind of girl... but once in a while it's nice to diversify 😉

    • @urbaneplanner
      @urbaneplanner 2 years ago

      @@PythonSimplified I’m not a programmer, but I like to use a little programming to support my work - so this sort of approach to coding is actually better I think for a lot of people who aren’t necessarily doing programming at scale or worried about efficiency - learn the concepts, do something that works and if you want later on you can learn the more efficient ways of doing things. I find some of the more elegant examples are a bit too sophisticated for some people as they can be so efficient it’s hard for a non programmer to actually understand why they work

  • @printdaniel
    @printdaniel 1 year ago +1

    This is new for me, thanks.

  • @senseinfx9630
    @senseinfx9630 1 year ago +1

    I love you teacher👍😍😘.

  • @bc4198
    @bc4198 2 years ago

    This is exactly the topic I was looking for, but it was crazy hard to find, so thank you!

  • @freedtmg16
    @freedtmg16 2 years ago

    I love your tutorials! I am in a code camp for web development, but I'm finding that logic-driven scripting is my true love, and following along with your videos makes it really easy to learn, as you do an amazing job explaining what's going on! Thank you!

  • @Jason-si1yd
    @Jason-si1yd 2 years ago +1

    Great job on presenting this. It was kept basic and very informative.

  • @kamertonaudiophileplayer847
    @kamertonaudiophileplayer847 2 years ago

    Now I better understand how Google works. Indeed, it is as simple as a soup. Thank you.

  • @zparihar
    @zparihar 2 years ago +1

    Definitely the cutest Python Teacher in existence!
    I'm officially giving up Ruby!

    • @PythonSimplified
      @PythonSimplified 2 years ago

      hahaha thank you so much Zubin! 😀
      And yeeeeey! My evil plan to convert everyone to Python is finally working!! muahahahaha!!! 🤣🤣🤣

    • @zparihar
      @zparihar 2 years ago

      @@PythonSimplified Found out you're in Van! Go Van!

  • @dravidaravindkumar4207
    @dravidaravindkumar4207 2 years ago +2

    Ma'am, your teachings are awesome 👌👏. Could you please make a video on a stadium seat booking system with Python as the front end and SQL as the back end?

  • @a43em18
    @a43em18 2 years ago +2

    Amazingly clear explanation - thanks a lot for the example. 100-point :-D !!!

  • @semtex2987
    @semtex2987 2 years ago +2

    My first choice would be pandas read_html, but if you want to do this procedurally in a more manual way, using static indexes is the worst way to go.
    Just extract all the tables and iterate over the content of each ;) so nothing goes wrong if the content size changes.

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      If the content size changes it shouldn't really matter, as the index is not hard-coded... we search for the index where the value lives before we slice, so it should adjust accordingly (this only holds as long as new distributions aren't added after Zorin OS, though) 😊
      I definitely agree that read_html is a faster alternative! It's an incredible shortcut, while this video is more of a step-by-step demonstration of how to approach web scraping in general. I always like shortcuts, but in my opinion knowing the concept behind the shortcut is important as well 😉
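A rough sketch of the search-then-slice idea from this reply, using a made-up list of cell texts rather than the real page (the list contents and variable names are assumptions, not the tutorial's code). The end of the slice is found by looking up the last row label instead of typing in a number.

```python
# Hypothetical flattened cell texts, as a scraper might return them.
cells = ["AlmaLinux", "Arch Linux", "Debian", "Zorin OS", "Founder", "Maintainer"]

# Look up where the last distribution sits instead of hard-coding its position.
end = cells.index("Zorin OS") + 1
distributions = cells[:end]

# If a row is added before "Zorin OS", the same code still finds the right end.
cells_later = ["AlmaLinux", "Arch Linux", "Debian", "Fedora", "Zorin OS", "Founder"]
distributions_later = cells_later[: cells_later.index("Zorin OS") + 1]
```

As the reply notes, this only adapts to rows added before the marker; a row added after "Zorin OS" would be cut off.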

    • @finkyfreak8515
      @finkyfreak8515 1 year ago

      Thought the same thing. But for a fast and "dirty" way to scrape data it's fine.

  • @Tobs_
    @Tobs_ 2 years ago +2

    great video, thanks for sharing 👍 I always learn something new.

  • @harkiranonline
    @harkiranonline 2 years ago

    If beautiful girls like you kept teaching, IT is bound to be more popular

  • @trtlphnx
    @trtlphnx 2 years ago +2

    I Found This Highly Informative and Very Interesting: Thanks Sweetness, Love You And your Channel ~

    • @PythonSimplified
      @PythonSimplified 2 years ago

      Thank you so much! I'm glad you enjoyed this tutorial! 😀😀😀

  • @domingezu4687
    @domingezu4687 2 years ago +1

    BB - Beautiful and brilliant! :)

  • @return_1101
    @return_1101 2 years ago +2

    Love your video! The content is great.

  • @JorgeEscobarMX
    @JorgeEscobarMX 8 months ago

    I love how she says "Attributes"

  • @sibtainshah3376
    @sibtainshah3376 2 years ago

    It was really a helpful and inspirational experienced session for me to get an overview of different interesting fields in computing like this one .... 😊 Thanks a lot ... ☺

  • @IsaacNewton80735
    @IsaacNewton80735 2 years ago

    Very entertaining. I want to start a web scraping project with Python. It was what I expected in many ways.

  • @JaveGeddes
    @JaveGeddes 2 years ago +1

    Some pages have links on them that have to be joined to get the URL with the info I'm looking for. I want to open those and then scrape them. Can you explain how to do that?
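One way to approach the link-joining part of this question is the standard library's `urljoin`, which resolves a relative `href` against the page it was found on. This is only a sketch: the page URL and the `hrefs` list are hypothetical placeholders for whatever the scraper actually collects.

```python
from urllib.parse import urljoin

# Hypothetical page address and the href values found on it.
page_url = "https://example.com/distros/index.html"
hrefs = ["/wiki/Debian", "details/arch.html", "https://other.example/zorin"]

# Resolve every href against the page URL; absolute links pass through unchanged.
full_urls = [urljoin(page_url, href) for href in hrefs]
```

Each entry of `full_urls` can then be opened in turn (in MechanicalSoup, with the browser's `open` method) and scraped the same way as the first page.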

  • @nikluz3807
    @nikluz3807 2 years ago +1

    “And boom” haha I always say that when I’m explaining code. Nice

    • @PythonSimplified
      @PythonSimplified 2 years ago

      hahaha it's the best way to announce a successful return statement! 😉

  • @wragabrr
    @wragabrr 2 years ago +4

    Nice, thanks, but if we work with static numbers it breaks as soon as something is added to the table. Would there be a way to refer only to the specific table?

    • @PythonSimplified
      @PythonSimplified 2 years ago

      Absolutely! However, not with Mechanical Soup but with Pandas! 🐼
      Check out my notebook on Github (I'll film a tutorial about it shortly):
      github.com/MariyaSha/WebscrapingDatabases/blob/main/scraper_Pandas.ipynb
      The scraper object in the notebook code extracts all the tables from the page. Then you find the index of the table you're looking to scrape and you print it in the next cell.
      It's a shortcut that skips the entire scraping process and leaves it to Pandas, which is super handy! 😃
      I hope it helps 😊

  • @dipeshrathore8842
    @dipeshrathore8842 2 years ago +3

    I just want some tutorials solving real problem which I will have to solve at a python job!
    It will help me a lot😊😇😊

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      Hi Dipesh! 😀
      There are many jobs that involve Python! It's widely used in data science, financial analysis, cyber security, machine learning, etc.
      Almost all the videos on this channel will help you solve real problems which you may encounter on the job - it all depends on what industry you're aiming for 😉

    • @dipeshrathore8842
      @dipeshrathore8842 2 years ago +2

      @@PythonSimplified Yes, I am trying to get into data science and cyber security.
      Your videos help me a lot ❤

    • @PythonSimplified
      @PythonSimplified 2 years ago +2

      Awesome, I'm so happy to hear! 😊
      This tutorial is perfect for data science! many entry level jobs involve lots of web scraping (as you usually start from generating databases so that the senior data scientists can use them later)
      Good luck on your journey! 😁

    • @dipeshrathore8842
      @dipeshrathore8842 2 years ago +1

      @@PythonSimplified Thank you so much ❤

  • @pllemost8410
    @pllemost8410 2 years ago +1

    Adorable Mariya….!
    Beautiful soup, bs4.

  • @EmaMazzi76
    @EmaMazzi76 2 years ago

    Thank you! Your tutorials are amazing 🤩

  • @paulwratt
    @paulwratt 2 years ago

    If your code assigned those printed indexes to variables, you could still get all the value.text results you want even if the contents of the tables change. Also, if you are going to re-run this scraper code that writes to the DB, you need to "DROP" the table first (i.e. delete the table itself, not just its data).
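A small sketch of the re-run point, assuming the table is called `linux` as in other comments in this thread. The column list, the sample rows, and the `rebuild_table` helper are illustrative placeholders rather than the tutorial's code, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3


def rebuild_table(connection, rows):
    """Drop the table if it already exists, then recreate and refill it."""
    cursor = connection.cursor()
    cursor.execute("DROP TABLE IF EXISTS linux")
    cursor.execute("CREATE TABLE linux (Distribution, Founder)")
    cursor.executemany("INSERT INTO linux VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
rows = [("Debian", "Ian Murdock"), ("Slackware", "Patrick Volkerding")]  # sample rows

rebuild_table(connection, rows)
rebuild_table(connection, rows)  # a second run neither fails nor duplicates rows

count = connection.execute("SELECT COUNT(*) FROM linux").fetchone()[0]
```

Without the `DROP TABLE IF EXISTS` line, the second `CREATE TABLE` would fail because the table already exists.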

  • @autentikum
    @autentikum 2 years ago

    I get the error "All arrays must be of the same length" when calling pd.DataFrame(). Do you maybe know why?

  • @killiandw
    @killiandw 2 years ago

    man I love hearing her speak

  • @waylandchin
    @waylandchin 2 years ago +2

    So every time you ran the code to test out the script, it would reload the webpage into Mechanical Soup. If the website is huge, how do we save the page into a file so that we don't have to request it from the site again?

    • @PythonSimplified
      @PythonSimplified 2 years ago +5

      Hi Wayland, it's a great question! 😊
      One solution could be scraping the entire content of the page with browser.page, converting it into a string (maybe replacing the tags with "," or something similar) and storing it in a csv or text file.
      with open('my_text_file.txt', 'w') as my_file:
          my_file.write(my_string)
      with open('my_csv_file.csv', 'w') as my_file:
          my_file.write(my_string)
      If you're selecting specific elements rather than the entire page, try browser.page.find_all(element_name) and you can store them in the exact same manner 😀 This way you are interacting with a file on your computer rather than with some kind of a web server.
      Good luck scraping and I hope it helps! 😄
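Building on this reply, here is a minimal caching sketch: the page is fetched only when no saved copy exists. The `load_html` helper and the stand-in `fake_fetch` function are assumptions made for illustration; with MechanicalSoup, the fetch step could return `str(browser.page)` after opening the URL.

```python
import tempfile
from pathlib import Path


def load_html(cache_path, fetch):
    """Return the cached HTML if present; otherwise call fetch() once and save it."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    html = fetch()
    cache_file.write_text(html, encoding="utf-8")
    return html


# Demo with a stand-in fetch function, so no network request is made.
calls = []


def fake_fetch():
    calls.append("fetched")
    return "<table><tr><td>Debian</td></tr></table>"


with tempfile.TemporaryDirectory() as folder:
    cache = Path(folder) / "page.html"
    first = load_html(cache, fake_fetch)   # fetches and writes the file
    second = load_html(cache, fake_fetch)  # reads the file, no second fetch
```

The saved HTML can be parsed again later, so repeated test runs never touch the web server.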

  • @furkanozata6775
    @furkanozata6775 10 months ago

    Great video. Thanks alot.

  • @lopii777
    @lopii777 2 years ago +1

    It is magic to watch you creating code, very informative and cool.
    I just have a question as an Excel-user that would like to learn how to code.
    What is the major difference and benefit between the Python code and Importing from a web-page in Excel ?
    Thank you !

  • @aliahmad5834
    @aliahmad5834 2 years ago

    time to change my career path

  • @aamirahmed9744
    @aamirahmed9744 2 years ago +1

    Hey... Mariya... Love u vdios they are really easy to understand.
    I m from Burnaby too. Sent u a connection on LinkedIn. Plz accept. This is Aamir Ahmed.

  • @ayoubcharbaji884
    @ayoubcharbaji884 4 months ago

    And if I already have a table that I want to insert a Pandas DataFrame into, what should I do?

  • @mytoptechs
    @mytoptechs 2 years ago

    I keep getting an sqlite3.OperationalError with cursor.execute("create table linux (Distribution, " + ",".join(column_names)+ ")"). It says: near "(".
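One possible cause of an error like this (an assumption, since the scraped headers are not shown) is that some column names contain spaces, parentheses, or other punctuation, which SQLite then tries to read as SQL syntax. A hedged sketch of a fix is to wrap every name in double quotes before joining. The header strings and the `quote_identifier` helper below are hypothetical.

```python
import sqlite3


def quote_identifier(name):
    """Wrap a column name in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


# Hypothetical scraped headers containing spaces and parentheses.
column_names = ["Founder", "Initial Release Year", "Security Updates (years)"]

columns_sql = ", ".join(
    quote_identifier(name) for name in ["Distribution"] + column_names
)

connection = sqlite3.connect(":memory:")
connection.execute(f"CREATE TABLE linux ({columns_sql})")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
created = [row[1] for row in connection.execute("PRAGMA table_info(linux)")]
```

Printing the joined SQL string before executing it is also a quick way to see which header is breaking the statement.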

  • @gazul05
    @gazul05 2 years ago

    Awesome! Thanks a lot... greetings from Mexico.

  • @AndrewOBannon
    @AndrewOBannon 2 years ago +1

    Mariya, like and I have subscribed to your github.

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      Thank you so much Andrei!! 😃😃😃 It's a great way to find out the topics of the upcoming videos a few hours in advance 😉 (I usually upload the lesson code to Github the night before a premiere)

  • @lucadelpartita
    @lucadelpartita 1 year ago

    What do you think is the best solution for scraping a website with HTML/XML and reCAPTCHA, or a JavaScript website that gives a limited list of 10 results, where you must click on each one to pop up an HTML window and see all the details?
    Do you think it is possible to scrape data from a database when you must send at least some characters for some fields?
    And what about a website with a database where you must feed a field with the EXACT string to query? For example, if the database holds many records of names, it's not possible to enter just PAUL and get all the records for both PAUL and JEANPAUL.
    I am talking about four different websites, all four with a database.

  • @dipeshrathore8842
    @dipeshrathore8842 2 years ago +1

    You are great Maria😍😍🥰

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      And so are you Dipesh!! Thank you so much 😊😊😊

  • @user-jchjkitv77896
    @user-jchjkitv77896 2 years ago

    Very nice 👌

  • @mustlagzmustlagz333
    @mustlagzmustlagz333 1 year ago

    Wow, you are so competent ♥.
    Too bad there are no French subtitles.

  • @scpecialist
    @scpecialist 1 year ago

    Your videos are great and very informative.
    Are you planning to make a video about collecting data from forum sites? And I hope you make a video about how ChatGPT is made.

  • @abdullahtammour
    @abdullahtammour 2 years ago +1

    Very good video and I learned a lot from it. Love the way you explain everything.
    I have a question. What are the advantages of using Python to scrape the table from the web page instead of using Excel or Power Query?

  • @ReadAlongClassics
    @ReadAlongClassics 1 year ago

    Thanks!

  • @ilyessbenmessaoud9272
    @ilyessbenmessaoud9272 2 years ago

    How can we do it for a table whose rows are spread across multiple pages?

  • @goosechasing
    @goosechasing 2 years ago

    Great video as always! Thank you! I would love to see you make a tutorial on the Curses Module!

  • @oscaralejandropazbalderas4288
    @oscaralejandropazbalderas4288 2 years ago

    Excellent, I just subscribed to your channel... It's amazing!!!

  • @siamakvakili6349
    @siamakvakili6349 2 years ago

    Perfect...

  • @cubano100pct
    @cubano100pct 2 years ago +1

    What would be best for scraping websites that have a login, so I can download the details of my orders? There is one page that has order summaries, and for each order I will click through to the details to download. Which libraries would be best for this type of web scraping?

    • @PythonSimplified
      @PythonSimplified 2 years ago +2

      Hi Felix! 😃
      I personally go for Selenium whenever dealing with e-commerce sites. It allows you to scrape JavaScript (while Mechanical Soup is limited to HTML/XML) and it also gives you a certain degree of protection from security blockers. It opens a browser window from where all the scraping happens and because of this window - the server is convinced you are a legitimate user rather than a bot :)
      I have so many tutorials about Selenium (I believe you'll see a bunch of links in the pinned message on this vid) but here's one where I scrape Facebook:
      ua-cam.com/video/SsXcyoevkV0/v-deo.html
      I hope it helps and good luck! 😊

  • @goodluckoriuwa1669
    @goodluckoriuwa1669 1 year ago

    Can you do a tutorial on how I can connect to the website databases, read data directly from the tables and update table data from the website url and this mechanical soup?

  • @yahyeabdirashid9716
    @yahyeabdirashid9716 2 years ago

    To be honest i was staring at you whole time 😍

  • @fadyelias
    @fadyelias 2 years ago

    I'm new on your channel thank you for this good and helpful tutorial but we need Django advanced tutorial

  • @dustinokelley156
    @dustinokelley156 2 years ago

    I'm taking a Python course at school right now and am struggling mightily. I have had zero prior experience in programming. Do you have any suggestions for materials I could pick up to help myself?

  • @alfblack2
    @alfblack2 2 years ago

    Simply awesome. A topic I wanted to do/research. Presented excellently (best presentation I have seen for a noob like me). By a very pretty lady! But sadly the audio volume is not great. :(

  • @vigneshsuresh6003
    @vigneshsuresh6003 2 years ago +1

    Please make a video on scraping data from flight fare sites like Cleartrip and Expedia.

  • @granand
    @granand 2 years ago

    Thank you, and Merry Xmas. Could you please always give links to the environment, editor, etc. that you are using? Is that WayScript? In a month, I will be following every video to do stuff. Thanks a ton.

  • @neculaicristea8491
    @neculaicristea8491 2 years ago

    If a table is shown on multiple web pages (Next page ), could we still use this scraper? Thanks.

  • @GergelyCsermely
    @GergelyCsermely 2 years ago +1

    Thanks

    • @PythonSimplified
      @PythonSimplified 2 years ago

      You're absolutely welcome! have fun web scraping! 😁

  • @PERSISTENTxMF
    @PERSISTENTxMF 2 years ago +1

    😃

  • @unrealerich986
    @unrealerich986 2 years ago

    Best

  • @Pradeep-kv9hp
    @Pradeep-kv9hp 2 years ago

    Nice

  • @mibrahim4245
    @mibrahim4245 2 years ago +1

    Thanks beast

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      hahahaha thank you habibi! 😁😁😁

    • @mibrahim4245
      @mibrahim4245 2 years ago

      @@PythonSimplified 😍😍😍😍😍😍😍😍😍 you're welcome habibti .. ❤

  • @vishalvishwakarma7621
    @vishalvishwakarma7621 1 year ago

    Hey Mariya, would you tell me the name of your large curved monitor? It's totally insane, like you.

  • @zhang73
    @zhang73 2 years ago

    What monitor are you using? I need to get one.

  • @user-bv8tm9ss2u
    @user-bv8tm9ss2u 2 years ago

    How can I fix an "incorrect number of bindings supplied" error?
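This sqlite3 error appears when the number of `?` placeholders in a statement does not match the number of values passed with it. A sketch of one way to keep the two in step, using a made-up three-column table and sample row (both are assumptions for illustration):

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE linux (Distribution, Founder, Maintainer)")

row = ("Debian", "Ian Murdock", "Debian Project")  # sample row

# Build one "?" per value so the placeholder count always matches the row length.
placeholders = ", ".join("?" * len(row))
connection.execute(f"INSERT INTO linux VALUES ({placeholders})", row)

# Passing two values to three placeholders reproduces the error.
try:
    connection.execute("INSERT INTO linux VALUES (?, ?, ?)", ("Debian", "Ian Murdock"))
    mismatch_error = None
except sqlite3.ProgrammingError as error:
    mismatch_error = str(error)

count = connection.execute("SELECT COUNT(*) FROM linux").fetchone()[0]
```

In a scraper, the mismatch usually means one scraped row has more or fewer cells than the table has columns, so checking `len(row)` for each row before inserting helps locate the odd one.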

  • @rodrigolima9754
    @rodrigolima9754 2 years ago

    Beautiful... simply the best... kisses from Brazil

  • @hanjielo4277
    @hanjielo4277 2 years ago

    Thank you, nice tutorial! I have a question: I ran into a "no module" error for bs4 or mechanicalsoup on WayScript. Any advice?

  • @BobRoyAce77
    @BobRoyAce77 2 years ago

    Thanks for your tutorials...always informative and well-presented, and love your personality. By the way, what monitor is that that you are using?

  • @zakyvids6566
    @zakyvids6566 2 years ago +1

    Hi
    Is it possible for you to make a short Python crash course, maybe 30 minutes to 1 hour long?
    I had actually mentioned this in one of the previous videos too

    • @PythonSimplified
      @PythonSimplified 2 years ago +1

      Hi Zaky! 😀
      That's a very specific time frame you got! hahahaha I'll definitely keep posting videos on the Python for beginners series from time to time.
      I'm just trying to keep up with all the tutorial ideas in my head and it's not an easy task because I have too many of them!! 😅😅
      I've filmed a roadmap video not too long ago, covering some basic stuff for about 8 minutes:
      ua-cam.com/video/wFEC7VbWBZo/v-deo.html
      If you're looking for a good source of beginners tutorials other than my channel, checkout Rune's channel (and more specifically the 8 hours worth of Python lessons playlist):
      ua-cam.com/video/ybeeuGXdhrQ/v-deo.html
      Good luck and I hope it helps! 😃

  • @lonixlon
    @lonixlon 2 years ago

    What's the difference between Mechanical Soup and Beautiful Soup?

  • @Ksys
    @Ksys 2 years ago

    What ultrawide monitor are you using?

  • @Golledaman
    @Golledaman 2 years ago

    Great video, thanks for sharing!
    A bit offtopic, but what make and model is your monitor? I have been looking around for a 49" ultrawide for office use but it's difficult finding one that ticks all the boxes.

  • @maksympl8474
    @maksympl8474 2 years ago +1

    Come on, can beautiful girls really be this smart? :)

    • @PythonSimplified
      @PythonSimplified 2 years ago

      Only if they're Russian-speaking!! 🤪
      Hahahaha, thank you Max! 😃

  • @laondaradio2022
    @laondaradio2022 2 years ago

    Nice channel. Could you upload some content about React Native, please?

  • @carlosarrieta1145
    @carlosarrieta1145 2 years ago

    Which monitor are you using?

  • @GexPlayerMD
    @GexPlayerMD 2 years ago +1

    Ututo! 🙂

  • @dfcf7555
    @dfcf7555 2 years ago

    It's a pleasure to listen to you. Greetings from Russia.

  • @r35p3ct00
    @r35p3ct00 2 years ago

    Well done :)

  • @DS-nr9zc
    @DS-nr9zc 2 years ago

    Your channel is so helpful! Can you do a tutorial on git?

  • @andrewthought3350
    @andrewthought3350 2 years ago

    This is what Linus Torvalds's daughters may be like.

  • @elvinrk
    @elvinrk 2 years ago

    Awesome video!

  • @danield.7359
    @danield.7359 2 years ago

    Subscribed 😊. Nice job. However, if rows and/or columns are added to or removed from the tables - which is quite probable in an actively maintained table of Linux distros - the scraping algorithm won't work properly anymore. So instead of using hardwired indexes I'd add some more code to analyze the table(s) and compute the indexes dynamically. Another potential issue for certain sites is the EU cookie consent popup that needs to be clicked away before getting to the content. So you'd need to remote control a headless browser using JS. I don't know if Mechanical Soup can handle this. If yes, I'd be interested to learn how.

  • @diwakar_tsn
    @diwakar_tsn 2 years ago +1

    Wow❤️🥰❤️🇳🇵

  • @pawelwalenda
    @pawelwalenda 2 years ago

    Great video as usual. But why do you use "pip" instead of "pip3". Is pip still supported?

  • @CurrentElectrical
    @CurrentElectrical 2 years ago +1

    How do you determine what type of website you are scraping from? I.E. Javascript, HTML, XML etc? Awesome tutorial! Hello from Ontario. :D

    • @PythonSimplified
      @PythonSimplified 2 years ago +4

      I don't know if there are strict rules to it, but usually when I need to scrape something, the first thing I consider is how interactive the site is.
      If there are many user interactions such as likes, shopping cart or highly scrollable content - I always go for Selenium which is incredibly powerful!
      Another consideration is - how likely is it that the website will block my bot? When dealing with e-commerce websites you will encounter lots of blockers to prevent scavengers, therefore Selenium will be the best option again as it provides you with a GUI browser (which is considered a legitimate user/client from the website's perspective; it just doesn't know how to differentiate between a human and an automated process when this browser window exists) and it bypasses these blockers so your IP is not recognized as a bot.
      So Selenium is definitely my favorite, but sometimes it's an overkill! 😀
      If you're scraping sites with very simple user interaction, for example: Google, Wikipedia, or even the weather forecasts on Environment Canada site (🍁🍁🍁). You'll notice that the majority of user interaction consists of links/anchor elements, which can easily be implemented with a markup language like HTML rather than a scripting one like JS.
      Also, these sites do not load more and more content when you scroll down - it has a pre-defined end of the page section and you can't scroll beyond that (unlike in Instagram and Facebook feeds for example).
      In these cases the best solution is probably Beautiful/Mechanical Soup or other XML/HTML scrapers with a headless browser (such as the stateful one from this tutorial) 😊
      It's so easy to work with and no need to download a special web driver to get it going, just pip install MechanicalSoup and you can start working! 😁
      Good luck with scraping, and I hope it helps!
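One rough way to apply the static-versus-dynamic distinction from this reply is to check whether text you can see in the browser is already present in the raw HTML the server returns. This is only a heuristic sketch and not something from the tutorial; the helper name and both sample pages are made up for illustration.

```python
def appears_in_raw_html(raw_html, visible_text):
    """True if text seen in the browser is already present in the server's HTML."""
    return visible_text.lower() in raw_html.lower()


# A static page ships its data in the markup itself.
static_page = "<table><tr><td>Zorin OS</td></tr></table>"

# A script-driven page ships an empty shell and fills it in later with JavaScript.
dynamic_page = '<div id="root"></div><script src="app.js"></script>'

static_ok = appears_in_raw_html(static_page, "Zorin OS")
dynamic_ok = appears_in_raw_html(dynamic_page, "Zorin OS")
```

If the check succeeds on the downloaded source, an HTML scraper such as Mechanical Soup is likely enough; if it fails, the content is probably rendered by JavaScript and a browser-driving tool such as Selenium is the safer choice.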

  • @catafest
    @catafest 2 years ago

    I need to scrape Instagram. Do you have a good tutorial about this? Thank you for sharing.