The HTML Element I check FIRST when Web Scraping

  • Published 17 Feb 2024
  • Join the Discord to discuss all things Python and Web with our growing community! / discord
    Doing some string parsing to grab the structured data from a script tag.
    If you are new, welcome! I am John, a self-taught Python developer working in the web and data space. I specialize in data extraction and JSON web APIs, both server and client. If you like programming and web content as much as I do, you can subscribe for weekly content.
    :: Links ::
    My patrons really keep the channel alive and get extra content / johnwatsonrooney (NEW free tier)
    Recommended scraper API www.scrapingbee.com?fpr=jhnwr
    I host almost all my stuff on DigitalOcean m.do.co/c/c7c90f161ff6
    A rundown of the gear I use to create videos www.amazon.co.uk/shop/johnwat...
    Proxies I recommend iproyal.com/?r=jhnwr
    :: Disclaimer ::
    Some/all of the links above are affiliate links. By clicking on these links, I receive a small commission should you choose to purchase any services or items.
  • Science & Technology
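
The string parsing mentioned in the description can be sketched roughly like this. This is only an illustration, not the code from the video: the page source, the `dataObject` variable name, and the `extractObjectLiteral` helper are all made up, and it assumes the embedded literal happens to be strict JSON.

```javascript
// Hypothetical page source; `dataObject` and its fields are invented for illustration.
const html = `
<html><body>
<script>var other = 1;</script>
<script>
  var dataObject = {"name": "Widget", "price": 9.99, "tags": ["a", "b{"]};
  doSomething(dataObject);
</script>
</body></html>`;

// Return the balanced {...} text that follows `marker`, or null if not found.
function extractObjectLiteral(source, marker) {
  const markerAt = source.indexOf(marker);
  if (markerAt === -1) return null;
  const start = source.indexOf("{", markerAt);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let quote = "";
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++;                    // skip the escaped character
      else if (ch === quote) inString = false; // closing quote
    } else if (ch === '"' || ch === "'") {
      inString = true;
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

const literal = extractObjectLiteral(html, "dataObject =");
const data = JSON.parse(literal); // only works if the literal is strict JSON
console.log(data.name, data.price);
```

Counting braces while skipping string contents avoids cutting the literal short when a value contains `{` or `}`, which a simple split on `};` might get wrong.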

COMMENTS • 9

  • @user-kt2be4wo4i
    @user-kt2be4wo4i 4 months ago

    Hello John! Regarding this particular case from the video, I think it is worth noting that if you use a JS environment like Puppeteer for scraping, you can skip all these transformations by using the eval function to get a valid JS object with all the required data. Of course, such a method is risky from a security standpoint, but I think that when scraping store data that is an edge case.
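
A rough Node.js sketch of that idea, for cases where the script contents are a JavaScript object literal rather than strict JSON. The script text and the `dataObject` name are made up for illustration. It uses the built-in `vm` module instead of a bare `eval` so the page's code does not run in the scraper's own global scope. Note that Node's documentation says `vm` is not a security mechanism, so the risk mentioned above still applies to untrusted pages.

```javascript
const vm = require("vm");

// Hypothetical script-tag contents: a JS object literal, not strict JSON.
const scriptText = `
  // product data
  var dataObject = {
    name: 'Widget',      // unquoted key, single quotes
    price: 9.99,
    sizes: ['S', 'M',],  // trailing comma
  };
`;

// JSON.parse rejects this kind of literal.
let jsonFailed = false;
try {
  JSON.parse(scriptText.slice(scriptText.indexOf("{"), scriptText.lastIndexOf("}") + 1));
} catch (err) {
  jsonFailed = true;
}

// Evaluate the script in a fresh context. Top-level `var` declarations
// become properties of the sandbox object (`let`/`const` would not).
const sandbox = {};
vm.runInNewContext(scriptText, sandbox, { timeout: 1000 });

// Round-trip through JSON to get a plain object owned by this context.
const data = JSON.parse(JSON.stringify(sandbox.dataObject));
console.log(jsonFailed, data);
```

In Puppeteer itself the page has already run its own scripts, so reading the variable from inside the page is usually simpler than evaluating the text a second time.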

  • @xe2594
    @xe2594 4 months ago +1

    Hey John, recently subscribed. Wanted to ask if you have sites you recommend for learning a range of coding skills, e.g. Mimo?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 months ago

      Hey - thanks and welcome! I don't have any good suggestions for a platform - I learned via YouTube and a couple of Python books (notably Python Crash Course, by Eric Matthes), but I have heard good things about boot dot dev.

  • @dhillaz
    @dhillaz 4 months ago +1

    Thanks John. I just noticed you switched to Neovim. What did you find were the best learning resources and tricks to get started?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 months ago +1

      Use kickstart.nvim by teej_dv and go through the vim tutor! That and just practise the motions and moving around

    • @dhillaz
      @dhillaz 4 months ago

      @JohnWatsonRooney Thanks! I will give it a try...

  • @bathuudamdin
    @bathuudamdin 4 months ago

    Hi John, I am a regular viewer of your channel and appreciate what you do for others. I am having trouble scraping a PHP / Magento 2 based web page for product price, name, etc. I am using requests_html to scrape dynamically loaded content, but the item comes back as None. There is no JSON I can see in XHR/Network, but there is a JSON-like (document) in the accessibility tab of the inspect tools. It looks like the data is Sec-fetched into this (document), and JavaScript in the main HTML runs a jQuery script to get the data from it. Any idea how to get this document data and successfully scrape this website? Thanks in advance.

  • @blenderpanzi
    @blenderpanzi 4 months ago

    If you can strip the comments, the remainder seems to be valid YAML.

  • @alexanderscott2456
    @alexanderscott2456 4 months ago

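    // Browser console: find the script that defines dataObject, run it, then serialize the result.
    var d = [...document.querySelectorAll('script')].find(e => e.textContent.includes('dataObject')).textContent;
    eval(d); // assumes the script declares dataObject with var/global scope, and that you trust the page
    JSON.stringify(dataObject);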
    =D