This is how I scrape 99% websites via LLM

Поділитися
Вставка
  • Опубліковано 30 жов 2024

КОМЕНТАРІ • 64

  • @kryptobash9728
    @kryptobash9728 День тому +13

    wow agentQL is nuts!!

  • @JJ-tr8cu
    @JJ-tr8cu День тому +1

    Thanks for doing the dirty work and doing a comprehensive comparison!

  • @BaldyMacbeard
    @BaldyMacbeard День тому +21

    That's sounds like the worst business case ever. Either incredibly slow or expensive.

    • @acters124
      @acters124 День тому +2

      you would be surprised how often businesses forget about these two statistics when it comes to seeing buzzwords like "AI"

    • @TheGuillotineKing
      @TheGuillotineKing День тому

      In some cases you don't care because it runs 24:7 and it's cheaper than a human

  • @passportmarc
    @passportmarc 12 годин тому

    Amazing stuff man ! learned a ton !

  • @davidwylie8491
    @davidwylie8491 19 годин тому +2

    Amazing. Thanks for sharing

  • @NLPprompter
    @NLPprompter День тому +5

    Alan Turing is Smiling in heaven

    • @ScottzPlaylists
      @ScottzPlaylists 15 годин тому +5

      Thats a nice comment, but it's based on a false premise that most falsely believe because of ignorance.
      The 'dead know nothing' so they 'sleep in the grave' until the 2nd coming . (time passes instantly when you sleep)
      then all the saved will rise into the air to meet Jesus at the same time (the 1st resurection)
      -- almost ('the dead in christ will rise first, then the living' )
      Then after the saved have been in heaven for 1000 years, the 2nd resurection happens -- all the lost.
      they are judged and thrown into the lake of fire. The word is very clear on all this if you study.
      There's more at the 1000 year mark, and 10,000 year mark, but don't want to preach here.

    • @Python_Scott
      @Python_Scott 15 годин тому +4

      @@ScottzPlaylists You know your Bible!! Thanks.

    • @AGIBreakout
      @AGIBreakout 15 годин тому +3

      @@ScottzPlaylists Good to know... Thanks.

    • @NWONewsGod
      @NWONewsGod 14 годин тому +2

      @@ScottzPlaylists Straight Truth -- I like it.

    • @NWONewsGod
      @NWONewsGod 13 годин тому +1

      @@ScottzPlaylists It's seems nicer to know that you go there together, and right now , they sleep.
      They don't have to watch the horrors if this earth.
      The truth is better than the lie. So spirits are de-mo-ns trying to deceive us. They can appear and speak, act, look, exactly like the dead. After all, thy were present their whole life, trying to temp, and deceive.
      The D's know us than any human, plus the've had thousands of years of practice and observation.
      Everyone has an Angel and a D assigned.

  • @therammync
    @therammync День тому +1

    Good info! Thanks! Really appreciate if you slow down little bit

    • @thetroytroycan
      @thetroytroycan 15 годин тому +2

      Slow the speed of the video down.

  • @Mike-ts3kg
    @Mike-ts3kg День тому +3

    What's the legalities with scraping? Are we able to provide a service that is taking data from another company like this or do they just not care?

    • @ExTorvo
      @ExTorvo День тому

      historically linkedin has some famous cases but thats the only case i am aware. Of course, now that we know for sure most AI models are based from scraping we have other cases from that...

    • @xlretard
      @xlretard День тому

      I'm pretty sure new agent systems could be considered malware, if not user directed 🤔

    • @Python_Scott
      @Python_Scott 15 годин тому +4

      I think if a human and Read it and take Notes for free,
      SO should an AI on behalf of humans. ----- they just remember better if trained on it.

  • @torreydev
    @torreydev День тому +9

    Using an LLM for this means that you are paying each time you scrape the data. Writing a script might have a larger upfront cost but should be cheaper long term. Sure you might say that when the website is changed you will have to refactor your scrapper, but I'd guess that you would have to do the same for your LLM based scrapper.

    • @AlexanderShelestov
      @AlexanderShelestov День тому

      Imagine you need to scrap thousand of real estate typical websites everyday.

    • @ashleigh3021
      @ashleigh3021 День тому +1

      LLM cost will be lower long term, unless you require absolutely huge scale

    • @sentry404.
      @sentry404. День тому +3

      I've solved this with a self-maintaining crawler. It's been a bitch to do but I run it once a day on a small number of urls (scraping about 500k urls rn, 20 llm calls per maintenance) and it'll evaluate, update query selectors and even build new scripts.

    • @daylight8296
      @daylight8296 День тому

      you do not have to refactor your LLM scraper that much, it handles dynamic content very well and understands json super easily

    • @dylliedutch
      @dylliedutch День тому

      @@sentry404.this on GitHub?

  • @Y.AndreaRusso
    @Y.AndreaRusso 18 годин тому

    so at the end of the day all of these require python / some technical ability?

  • @TheGreyMotion
    @TheGreyMotion День тому +2

    an entrly level "expert" for 5-10 bucks an hour and the firs model shown was 4o. sorry thought it funny

  • @j2csharp
    @j2csharp День тому +1

    How do you guys feel about using Anthropic's Computer Use product to do web scraping?

    • @Nadia-AIInsiders
      @Nadia-AIInsiders 17 годин тому +2

      It's currently very expensive and not reliable. One major issue with these visually-driven models is their vulnerability to prompt injection. As a website owner, you could add something like 'forget all previous instructions' to prevent scraping and maybe even have a little fun with it :)

  • @ex3aliber
    @ex3aliber День тому

    Insane🎉🎉🎉🎉 love it

  • @HaiLeQuang
    @HaiLeQuang День тому

    Does the cost justify? AgentQL allows 15k API call for $99 per month. That's not much

  • @NLPprompter
    @NLPprompter День тому

    @AIJasonZ Jason do you know Microsoft omniparser model? what do you think building scraping agent on top if it?

  • @raymondaxyz
    @raymondaxyz День тому

    Amazing 🤩

  • @jobautomation
    @jobautomation День тому +1

    Have you seen one of your videos at 2x? 🐈

  • @JohnMcclaned
    @JohnMcclaned День тому +1

    Using llm's to scrape ui is horrifically inefficient lmao.

  • @hfislwpa
    @hfislwpa День тому +10

    Bro just discovered robotic process automation 😅

    • @khitabjaisinghani340
      @khitabjaisinghani340 21 годину тому +2

      He's a step ahead, he's trying to replace RPA

    • @hfislwpa
      @hfislwpa 21 годину тому +3

      @ if you couldn't tell he is coding a bot... That is RPA

    • @AI.24.7
      @AI.24.7 15 годин тому +3

      👍 RPA is when there is little to no AI involved... 👍
      I like the new Terms 'GUI Agent' best, then 'Computer using AI' then 'UI Agent' then ''Open Code Interpreter' then 'computer-use'
      I guess the industry hasn't standardized on terms yet.
      If it can be done without AI in the loop, it's much faster and cheaper.
      RPA encompasses a lot more than Web Scraping, like web testing, etc.

    • @SailGoldExplore
      @SailGoldExplore 14 годин тому +2

      Well, it's similar...

  • @tiagoafonso2971
    @tiagoafonso2971 День тому

    Would love to know how you would leverage the power of AI scraping in website that use older tech like php or asp

    • @KJM3SMG
      @KJM3SMG День тому +1

      huh? that is on server end. scraping is on the front end.

  • @gangs0846
    @gangs0846 День тому +1

    This works on dynamic JavaScript websites?

  • @SaadKhanAhmed
    @SaadKhanAhmed День тому +1

    Awesome stuff!

  • @ordinarygg
    @ordinarygg День тому +5

    So you are saying you are smarter than most companies using 50% eng resources to scrap correct data? I think you are dreaming) if you want to make sure you scrape 100% data your approach is the worst.
    99% cases guys just build a custom scrape script, this AI html to text solutions are not reliable if you need actual data

    • @leonsvideos
      @leonsvideos 18 годин тому

      Yeah, if he can automate the writing of such a script that automatically compares against sample data and guarantees correct fetching of correct key value pairs, that would be interesting

  • @codelucky
    @codelucky 15 годин тому

    Can you create a video to do it using the LLM API or have a repo on it?

  • @cssa2893
    @cssa2893 15 годин тому

    boosting AI, what if there are encryption

  • @VaibhavShewale
    @VaibhavShewale 14 годин тому +1

    users of jina after this video 💹💹