Python Web Scraping Tutorial: scraping dynamic JavaScript/AJAX websites with BeautifulSoup

  • Published 14 Dec 2024

COMMENTS • 86

  • @JoJoSoGood
    @JoJoSoGood 3 years ago

    Best video ever... I will follow your channel from now on.

  • @EnglishRain
    @EnglishRain 4 years ago +6

    Another FANTASTIC topic, amazing! I absolutely love the niche topics you select, thank you so much for sharing your good knowledge my friend.

    • @RedEyedCoderClub
      @RedEyedCoderClub  4 years ago +1

      Thank you very much!

    • @georgekingsley3972
      @georgekingsley3972 3 years ago

      Sorry to be so off topic, but do any of you know a trick to get back into an Instagram account?
      I stupidly forgot my password. I would love any assistance you can give me.

    • @robertoclay5729
      @robertoclay5729 3 years ago

      @George Kingsley instablaster =)

    • @georgekingsley3972
      @georgekingsley3972 3 years ago

      @Roberto Clay Thanks so much for your reply. I got to the site through Google and I'm waiting for the hacking stuff atm.
      It takes quite some time, so I will reply here later when my account password hopefully is recovered.

    • @georgekingsley3972
      @georgekingsley3972 3 years ago

      @Roberto Clay It worked and I actually got access to my account again. I'm so happy :D
      Thank you so much, you saved my account!

  • @ДанилРезниченко-г2й

    Finally, I have found you!
    Thanks for the videos.

  • @ticTHEhero
    @ticTHEhero 4 years ago +4

    That was exactly what I was looking for, thanks man.

  • @varsim_a
    @varsim_a 2 years ago

    Awesome video

  • @abrammarba
    @abrammarba 10 months ago

    This is great! Thank you! 😃

  • @bingchenliu1854
    @bingchenliu1854 3 years ago

    That is exactly what I'm searching for! Thank you, man!

  • @igorbetkier856
    @igorbetkier856 2 years ago

    Such a great tutorial! Thank you for that!

  • @Shajirr_
    @Shajirr_ 1 year ago +1

    Tried to use this method with Reddit comment search and it doesn't work - the requests it sends are POST requests, so there is no conveniently available URL you can reuse.
    The request payloads themselves are JSON objects.
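
A minimal sketch of how a POST-based XHR like the one described above can be replayed with the requests library. The endpoint URL and the payload fields below are placeholders, not Reddit's actual API; the real values have to be copied from the request shown in the browser DevTools Network tab.

    import requests

    # Hypothetical XHR endpoint and JSON body -- copy the real "Request URL" and
    # "Request Payload" from the POST request visible in DevTools (Network, XHR).
    url = 'https://example.com/api/search'
    payload = {'query': 'python', 'page': 1}

    headers = {
        'User-Agent': 'Mozilla/5.0',
        'X-Requested-With': 'XMLHttpRequest',
    }

    # json= serializes the dict to a JSON body and sets the Content-Type header
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    print(response.json())    # such endpoints usually answer with JSON, not HTML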

  • @rustamakhmullaev5697
    @rustamakhmullaev5697 4 years ago +1

    Very useful lesson, thanks for your job!

  • @youngjordan5619
    @youngjordan5619 3 years ago

    Awesome. I always had problems with infinite scroll and used Selenium. Now I know how to do it with bs4 thanks to you, cheers :)
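
A rough sketch of the approach the comment above refers to: instead of driving a browser, call the XHR endpoint that the page itself requests when you scroll. The endpoint URL, the page parameter and the CSS selector are assumptions; the real ones come from the DevTools Network tab (filtered by XHR).

    import requests
    from bs4 import BeautifulSoup

    collected = []
    page = 1

    while True:
        # hypothetical endpoint that returns one "scroll portion" per page number
        url = f'https://example.com/ajax/feed?page={page}'
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
        if response.status_code != 200:
            break

        soup = BeautifulSoup(response.text, 'html.parser')
        links = [a.get('href') for a in soup.select('a.item-link')]  # hypothetical selector
        if not links:        # an empty portion means there is nothing left to load
            break

        collected.extend(links)
        page += 1

    print(len(collected), 'links collected')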

  • @tazimrahbar7882
    @tazimrahbar7882 4 years ago

    Great explanation sir

  • @joeking9859
    @joeking9859 2 years ago

    Excellent - the best video on XHR (GETs) that I have seen.. great work.
    Could you do a video on XHR (POSTs) please?

    • @RedEyedCoderClub
      @RedEyedCoderClub  2 years ago

      Ok, thanks for your suggestion.
      POST requests usually require CSRF tokens, and it can be quite tricky or even nearly impossible to bypass this protection (see the sketch after this thread).

    • @joeking9859
      @joeking9859 2 years ago

      @RedEyedCoderClub Thank you for your response. OK, I will not try to go down that rabbit hole.

    • @joeking9859
      @joeking9859 2 years ago

      Do you see most sites moving to this method to protect their sites from being scraped?

    • @RedEyedCoderClub
      @RedEyedCoderClub  2 years ago

      Most sites? Not sure. We can always use Selenium or Pyppeteer, for example.

    • @joeking9859
      @joeking9859 2 years ago

      @RedEyedCoderClub Why would Selenium or Pyppeteer be better?
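
A sketch of the CSRF situation mentioned in this thread, assuming the common pattern where the token is rendered into a hidden input on the form page. The URLs are placeholders, and the field name csrfmiddlewaretoken is only Django's default; other sites use different names or expect the token in a header instead.

    import requests
    from bs4 import BeautifulSoup

    session = requests.Session()            # keeps cookies between the two requests

    # 1. load the page that contains the form and read the hidden CSRF token
    form_page = session.get('https://example.com/search/')
    soup = BeautifulSoup(form_page.text, 'html.parser')
    token_input = soup.find('input', {'name': 'csrfmiddlewaretoken'})
    token = token_input['value'] if token_input else ''

    # 2. send the POST with the token and the form fields
    response = session.post(
        'https://example.com/search/',
        data={'csrfmiddlewaretoken': token, 'q': 'python'},
        headers={'Referer': 'https://example.com/search/'},  # some sites also check this
    )
    print(response.status_code)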

  • @JackWQ
    @JackWQ 4 years ago +1

    Hi, thanks for this, but I am encountering a website using the "POST" method instead of "GET" as the request method, so I'm not able to replicate what you are doing by scraping the IDs first and copying them into URLs. The page just loads constantly and then eventually says page not found. Is there a way to bypass this?

  • @KekikAkademi
    @KekikAkademi 4 years ago +2

    This trick is awesome!

    • @KekikAkademi
      @KekikAkademi 4 years ago

      Please, more crawling and scraping tricks, without Scrapy, Selenium, etc.,
      for PyQt5 GUI projects and Telegram bot projects :)

  • @RedEyedCoderClub
    @RedEyedCoderClub  3 years ago

    What video should I make next? Any suggestions? *Write me in comments!*
    Follow me @:
    Telegram: t.me/red_eyed_coder_club
    Twitter: twitter.com/CoderEyed
    Facebook: fb.me/redeyedcoderclub
    Help the channel grow! Please Like the video, Comment, SHARE & Subscribe!

  • @noelcovarrubias7490
    @noelcovarrubias7490 4 years ago

    I need to scrape data from Walmart, which is all in JavaScript. I'm going to watch and try this tomorrow; hopefully it works!

  • @Ноунейм-п5я3и
    @Ноунейм-п5я3и 4 years ago

    Good job.
    Thanks for the video.
    I clicked like.

  • @shortcuts9005
    @shortcuts9005 2 years ago

    brilliance

  • @amrhamza9831
    @amrhamza9831 3 years ago

    Thank you a lot, this was really helpful to me. Thanks again!

  • @ThEwAvEsHaPa
    @ThEwAvEsHaPa 3 years ago

    Great video, really well explained. Please can you make a video showing login/sign-in to a website with requests sessions and OAuth?

    • @RedEyedCoderClub
      @RedEyedCoderClub  3 years ago +1

      Thank you. I'll think about your suggestion. Do you have any site as an example?

    • @ThEwAvEsHaPa
      @ThEwAvEsHaPa 3 years ago

      @RedEyedCoderClub Thanks. I don't really have a specific site in mind; I have just noticed that a few sites I tried to scrape are using OAuth, and I'm not sure how to get around it with just requests (see the sketch after this thread).

    • @RedEyedCoderClub
      @RedEyedCoderClub  3 years ago

      Ok, I'll think about it

    • @ThEwAvEsHaPa
      @ThEwAvEsHaPa 3 years ago

      @RedEyedCoderClub Thanks bro, keep up the great work.
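
A generic sketch for the OAuth case raised in this thread: many OAuth-protected endpoints simply expect an Authorization: Bearer token header, and the page itself usually fetches that token from a token endpoint visible in DevTools. Both URLs and the form fields below are placeholders, and how the token is issued differs from site to site.

    import requests

    session = requests.Session()

    # hypothetical token endpoint observed in the Network tab
    token_response = session.post(
        'https://example.com/oauth/token',
        data={
            'grant_type': 'client_credentials',
            'client_id': 'ID_FOUND_IN_THE_PAGE_SOURCE',         # placeholder
            'client_secret': 'SECRET_IF_THE_PAGE_EXPOSES_ONE',  # placeholder
        },
    )
    access_token = token_response.json().get('access_token')

    # reuse the token on the actual API calls
    api_response = session.get(
        'https://example.com/api/items',
        headers={'Authorization': f'Bearer {access_token}'},
    )
    print(api_response.status_code)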

  • @akram42
    @akram42 4 years ago

    awesome

  • @duckthishandle
    @duckthishandle 4 years ago

    Very, very good video on this topic. The way you explain things helps in understanding the whole process behind getting the data! I am trying to access the data on various sites, but sometimes I get an error message that I "do not have the auth token" or "access denied!".. How can I bypass those?

    • @RedEyedCoderClub
      @RedEyedCoderClub  4 years ago +1

      Thank you. Access can be denied for many reasons, and it's hard to say anything definite blindly.
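
One frequent (but by no means the only) reason for such "access denied" answers is that the XHR endpoint checks for browser-like headers or for cookies that the normal page sets first. A sketch that sends both; the header values and URLs are placeholders.

    import requests

    session = requests.Session()
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept': 'application/json, text/plain, */*',
        'Referer': 'https://example.com/',
        'X-Requested-With': 'XMLHttpRequest',
    }

    # visiting the regular page first lets the server set its cookies on the session
    session.get('https://example.com/', headers=headers)

    # the same session (now carrying those cookies) calls the XHR endpoint
    response = session.get('https://example.com/api/data', headers=headers)
    print(response.status_code, response.headers.get('Content-Type'))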

  • @MrYoklmn
    @MrYoklmn 4 years ago

    Thank you very much!) Are you planning a series of lessons on Scrapy? And a second question: could you make a lesson on building a self-populating aggregator (news/products, etc.) in Django, so that the site scrapes and fills itself? I'm trying to implement this with Django and Scrapy, but the problem is launching the scraper from Django so that the process doesn't block. In the end I bolted on Celery, but there are difficulties with it too (it throws a reactor error). Or should I not write in Russian on this channel?

  • @silvermir84
    @silvermir84 4 years ago

    The while loop doesn't stop at 800... What did I do wrong? The else: break doesn't work at 15:47.
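
Without the code from the video it is hard to say why that particular loop never breaks, but one common cause is that the endpoint keeps answering 200 with a non-empty (repeating) page, so an "if page: ... else: break" never reaches the break. A sketch with an explicit stop condition and a safety cap; the endpoint and the 'items' key are assumptions.

    import requests

    results = []
    page_number = 1
    MAX_PAGES = 200        # safety cap so the loop cannot run forever

    while page_number <= MAX_PAGES:
        # hypothetical JSON endpoint -- replace with the XHR URL from DevTools
        r = requests.get('https://example.com/api/list', params={'page': page_number})
        items = r.json().get('items', []) if r.ok else []
        if not items:      # empty page or an error response -> stop
            break
        results.extend(items)
        page_number += 1

    print(len(results))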

  • @sassydesi7913
    @sassydesi7913 3 years ago

    This is great!
    How would you scrape something like teamblind.com? Looks like they have infinite scroll & their payload is encrypted for every call. How would I go about getting historical posts data from this website?

  • @EnglishRain
    @EnglishRain 4 years ago

    I have a challenge for you: 😜 Can you log in to WhatsApp Web using the requests library, without manually scanning the QR code and without using Selenium? I achieved it using a saved profile in Selenium, but I'm just curious whether you can do it with requests alone. Thanks!

    • @RedEyedCoderClub
      @RedEyedCoderClub  4 years ago +3

      Interesting idea. But I'm afraid WhatsApp can ban my phone number. They really don't like our "style". I'll think about your suggestion, it's interesting.

    • @EnglishRain
      @EnglishRain 4 years ago +1

      @RedEyedCoderClub Haha yes, I understand. No worries, let it be, I was just thinking aloud. :)

  • @Shajirr_
    @Shajirr_ 1 year ago

    This search returned 779 results when the video was released. Now, it returns 4927 results.
    Just to put into perspective how much garbage is being shovelled onto the platform.

  • @АртёмФадеев-я6у
    @АртёмФадеев-я6у 3 years ago +1

    Hi, is this Oleg Molchanov?

  • @akram42
    @akram42 4 years ago

    Can you host this script online, make it run 24/7 and send the data to a MySQL database? That would be amazing.

  • @adrianka9405
    @adrianka9405 4 years ago

    def main():
        all_pages = []
        start = 1
        url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'

        while True:
            page = get_index_data(get_page(url))

            if page:
                all_pages.extend(page)
                start += 1
                url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
            else:
                break

        for url in page:
            data_set = get_detail_data(get_page(url))

        print(all_pages)

    This is part of my code where I tried to get detailed info from many pages on the website, but it doesn't work. Do you have any idea why?
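
A sketch of the most likely problems in the snippet above, reusing the get_page, get_index_data and get_detail_data helpers exactly as they appear there: the detail loop iterates over page, which is empty (or None) once the while loop has broken, instead of over all_pages, and requests needs full URLs including the https:// scheme.

    def main():
        all_pages = []
        start = 1

        while True:
            url = f'https://www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
            index_page = get_index_data(get_page(url))   # helpers from the snippet above
            if not index_page:
                break
            all_pages.extend(index_page)
            start += 1

        # iterate over everything that was collected, not over the last (empty) batch
        for detail_url in all_pages:
            data_set = get_detail_data(get_page(detail_url))
            print(data_set)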

  • @egormakhlaev4866
    @egormakhlaev4866 4 years ago +6

    Molchanov, is that you?

  • @mrpontmercy8906
    @mrpontmercy8906 4 years ago

    Hmm. At the very first step it finds only 28 links, and then returns an empty list.

  • @sriramkasu5286
    @sriramkasu5286 4 years ago

    Sir, I need help.

    • @sriramkasu5286
      @sriramkasu5286 4 years ago

      This video is good, but what if I want to scrape data from a website after logging in and get the details present in that logged-in account? The plain HTML won't work, because a logged-in page cannot simply be requested (see the sketch after this thread).

    • @RedEyedCoderClub
      @RedEyedCoderClub  4 years ago

      ua-cam.com/video/wMf7LJn0k4U/v-deo.html

    • @sriramkasu5286
      @sriramkasu5286 4 years ago

      @RedEyedCoderClub Thanks.
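
A generic sketch of the logged-in case asked about in this thread: post the login form once with a requests.Session, then reuse the same session (which now carries the authentication cookies) for pages that are only visible after logging in. The URLs and the username/password field names are placeholders; the real ones come from the login request shown in DevTools.

    import requests

    session = requests.Session()

    # hypothetical login form -- field names must match what the site actually posts
    session.post(
        'https://example.com/login',
        data={'username': 'my_user', 'password': 'my_password'},
    )

    # the cookies set by the login response authenticate this request
    profile = session.get('https://example.com/account/details')
    print(profile.status_code)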

  • @postyvlogs
    @postyvlogs 3 years ago

    Please provide source code without Patreon

    • @RedEyedCoderClub
      @RedEyedCoderClub  3 years ago

      Thanks for the comment.
      The project is very simple; there is no need for source code at all.

  • @anikahmed7456
    @anikahmed7456 4 years ago

    Please make a video on this website: abc.austintexas.gov/web/permit/public-search-other?reset=true
    Search by Property
    Select - Sub Type: any
    Date: any
    Submit
    On this website the URL doesn't change; I have tried so many times but couldn't succeed. It also has a JavaScript pagination link, javascript:reloadperm[pagination number], which changes randomly.
    Please make a video 🙏🙏🙏
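
For forms like the one described above, where the URL never changes, the search is usually submitted as a POST behind the scenes, and the javascript: pagination links usually re-issue the same POST with a changed page field. A sketch of that idea; every endpoint path and field name below is a placeholder, to be replaced with what DevTools shows after pressing Submit on the real site.

    import requests
    from bs4 import BeautifulSoup

    session = requests.Session()
    search_url = 'https://example.com/web/permit/public-search-other'   # placeholder

    for page_number in range(1, 6):
        form_data = {
            'searchType': 'property',   # hypothetical field names
            'subType': 'any',
            'date': 'any',
            'page': page_number,
        }
        response = session.post(search_url, data=form_data)
        soup = BeautifulSoup(response.text, 'html.parser')
        rows = soup.select('table tr')   # hypothetical selector for the result rows
        print(f'page {page_number}: {len(rows)} rows')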