Thats a nice comment, but it's based on a false premise that most falsely believe because of ignorance. The 'dead know nothing' so they 'sleep in the grave' until the 2nd coming . (time passes instantly when you sleep) then all the saved will rise into the air to meet Jesus at the same time (the 1st resurection) -- almost ('the dead in christ will rise first, then the living' ) Then after the saved have been in heaven for 1000 years, the 2nd resurection happens -- all the lost. they are judged and thrown into the lake of fire. The word is very clear on all this if you study. There's more at the 1000 year mark, and 10,000 year mark, but don't want to preach here.
@@ScottzPlaylists It's seems nicer to know that you go there together, and right now , they sleep. They don't have to watch the horrors if this earth. The truth is better than the lie. So spirits are de-mo-ns trying to deceive us. They can appear and speak, act, look, exactly like the dead. After all, thy were present their whole life, trying to temp, and deceive. The D's know us than any human, plus the've had thousands of years of practice and observation. Everyone has an Angel and a D assigned.
historically linkedin has some famous cases but thats the only case i am aware. Of course, now that we know for sure most AI models are based from scraping we have other cases from that...
Using an LLM for this means that you are paying each time you scrape the data. Writing a script might have a larger upfront cost but should be cheaper long term. Sure you might say that when the website is changed you will have to refactor your scrapper, but I'd guess that you would have to do the same for your LLM based scrapper.
I've solved this with a self-maintaining crawler. It's been a bitch to do but I run it once a day on a small number of urls (scraping about 500k urls rn, 20 llm calls per maintenance) and it'll evaluate, update query selectors and even build new scripts.
It's currently very expensive and not reliable. One major issue with these visually-driven models is their vulnerability to prompt injection. As a website owner, you could add something like 'forget all previous instructions' to prevent scraping and maybe even have a little fun with it :)
👍 RPA is when there is little to no AI involved... 👍 I like the new Terms 'GUI Agent' best, then 'Computer using AI' then 'UI Agent' then ''Open Code Interpreter' then 'computer-use' I guess the industry hasn't standardized on terms yet. If it can be done without AI in the loop, it's much faster and cheaper. RPA encompasses a lot more than Web Scraping, like web testing, etc.
So you are saying you are smarter than most companies using 50% eng resources to scrap correct data? I think you are dreaming) if you want to make sure you scrape 100% data your approach is the worst. 99% cases guys just build a custom scrape script, this AI html to text solutions are not reliable if you need actual data
Yeah, if he can automate the writing of such a script that automatically compares against sample data and guarantees correct fetching of correct key value pairs, that would be interesting
wow agentQL is nuts!!
Thanks for doing the dirty work and doing a comprehensive comparison!
That's sounds like the worst business case ever. Either incredibly slow or expensive.
you would be surprised how often businesses forget about these two statistics when it comes to seeing buzzwords like "AI"
In some cases you don't care because it runs 24:7 and it's cheaper than a human
Amazing stuff man ! learned a ton !
Amazing. Thanks for sharing
Alan Turing is Smiling in heaven
Thats a nice comment, but it's based on a false premise that most falsely believe because of ignorance.
The 'dead know nothing' so they 'sleep in the grave' until the 2nd coming . (time passes instantly when you sleep)
then all the saved will rise into the air to meet Jesus at the same time (the 1st resurection)
-- almost ('the dead in christ will rise first, then the living' )
Then after the saved have been in heaven for 1000 years, the 2nd resurection happens -- all the lost.
they are judged and thrown into the lake of fire. The word is very clear on all this if you study.
There's more at the 1000 year mark, and 10,000 year mark, but don't want to preach here.
@@ScottzPlaylists You know your Bible!! Thanks.
@@ScottzPlaylists Good to know... Thanks.
@@ScottzPlaylists Straight Truth -- I like it.
@@ScottzPlaylists It's seems nicer to know that you go there together, and right now , they sleep.
They don't have to watch the horrors if this earth.
The truth is better than the lie. So spirits are de-mo-ns trying to deceive us. They can appear and speak, act, look, exactly like the dead. After all, thy were present their whole life, trying to temp, and deceive.
The D's know us than any human, plus the've had thousands of years of practice and observation.
Everyone has an Angel and a D assigned.
Good info! Thanks! Really appreciate if you slow down little bit
Slow the speed of the video down.
What's the legalities with scraping? Are we able to provide a service that is taking data from another company like this or do they just not care?
historically linkedin has some famous cases but thats the only case i am aware. Of course, now that we know for sure most AI models are based from scraping we have other cases from that...
I'm pretty sure new agent systems could be considered malware, if not user directed 🤔
I think if a human and Read it and take Notes for free,
SO should an AI on behalf of humans. ----- they just remember better if trained on it.
Using an LLM for this means that you are paying each time you scrape the data. Writing a script might have a larger upfront cost but should be cheaper long term. Sure you might say that when the website is changed you will have to refactor your scrapper, but I'd guess that you would have to do the same for your LLM based scrapper.
Imagine you need to scrap thousand of real estate typical websites everyday.
LLM cost will be lower long term, unless you require absolutely huge scale
I've solved this with a self-maintaining crawler. It's been a bitch to do but I run it once a day on a small number of urls (scraping about 500k urls rn, 20 llm calls per maintenance) and it'll evaluate, update query selectors and even build new scripts.
you do not have to refactor your LLM scraper that much, it handles dynamic content very well and understands json super easily
@@sentry404.this on GitHub?
so at the end of the day all of these require python / some technical ability?
an entrly level "expert" for 5-10 bucks an hour and the firs model shown was 4o. sorry thought it funny
How do you guys feel about using Anthropic's Computer Use product to do web scraping?
It's currently very expensive and not reliable. One major issue with these visually-driven models is their vulnerability to prompt injection. As a website owner, you could add something like 'forget all previous instructions' to prevent scraping and maybe even have a little fun with it :)
Insane🎉🎉🎉🎉 love it
Does the cost justify? AgentQL allows 15k API call for $99 per month. That's not much
@AIJasonZ Jason do you know Microsoft omniparser model? what do you think building scraping agent on top if it?
Amazing 🤩
Have you seen one of your videos at 2x? 🐈
Using llm's to scrape ui is horrifically inefficient lmao.
Bro just discovered robotic process automation 😅
He's a step ahead, he's trying to replace RPA
@ if you couldn't tell he is coding a bot... That is RPA
👍 RPA is when there is little to no AI involved... 👍
I like the new Terms 'GUI Agent' best, then 'Computer using AI' then 'UI Agent' then ''Open Code Interpreter' then 'computer-use'
I guess the industry hasn't standardized on terms yet.
If it can be done without AI in the loop, it's much faster and cheaper.
RPA encompasses a lot more than Web Scraping, like web testing, etc.
Well, it's similar...
Would love to know how you would leverage the power of AI scraping in website that use older tech like php or asp
huh? that is on server end. scraping is on the front end.
This works on dynamic JavaScript websites?
Sure does
Awesome stuff!
So you are saying you are smarter than most companies using 50% eng resources to scrap correct data? I think you are dreaming) if you want to make sure you scrape 100% data your approach is the worst.
99% cases guys just build a custom scrape script, this AI html to text solutions are not reliable if you need actual data
Yeah, if he can automate the writing of such a script that automatically compares against sample data and guarantees correct fetching of correct key value pairs, that would be interesting
Can you create a video to do it using the LLM API or have a repo on it?
boosting AI, what if there are encryption
users of jina after this video 💹💹