OpenAI's Agent 2.0: Excited or Scared?
- Published 9 Jun 2024
- I want to give you a full rundown of browser/mobile/desktop AI agents
Get free HubSpot E-book: Using Generative AI to scale your content operation: clickhubspot.com/2ld
🔗 Links
- Follow me on twitter: / jasonzhou1993
- Join my AI email list: crafters.ai/
- My discord: / discord
- GitHub repo: github.com/JayZeeDesign/unive... (you need a WebQL API key first)
- WebQL: docs.webql.tinyfish.io/
- Self-operation-computer: github.com/OthersideAI/self-o...
- Hyperwrite: hyperwriteai.com/?via=jason-zhou
- MultiOn: multion.notion.site/Download-...
⏱️ Timestamps
0:00 Intro
2:07 Digital Agents
4:13 Using Generative AI to scale content operation
5:22 Core Components of Digital Agents
7:34 Three main ways to build digital agents
7:58 Method 1: HTML/XML based
11:00 Method 2: Vision based
16:38 Tutorial: build your own universal AI agent to scrape anything via WebQL
22:56 Demo of universal web scraper
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#gpt5 #mixtral #gpt4turbo #gpt4 #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #agentgpt #agent #autogen #autogpt - Science & Technology
00:05 OpenAI is developing a new type of agent to automate tasks.
02:02 Simulating real human interaction on computer devices unlocks everyday personal assistant use cases.
05:59 Training Agent 2.0 requires more steps and reasoning ability, but the potential market opportunities are exciting.
07:59 Using an HTML- or XML-based approach to provide context for web agents.
12:03 Improving the accuracy of self-operating computer vision tasks.
14:03 Using multiple models together to interact with GUI screenshots.
17:47 WebQL basics allow easy web automation.
19:27 Setting up WebQL and Playwright for web browser interaction.
22:40 Developed a script for universal e-commerce product information scraping.
24:32 WebQL allows you to build powerful web agents for various tasks.
Thank god, I thought you were gone. Been waiting for another video. You are going places Jason, incredibly talented at teaching complex subjects in easy to understand digestible videos.
Thank you!
this is such a sweet comment :-)
He doesn't upload often anyway, so he's never truly gone. The quality is among the best, though.
Thanks for the kind words! Glad you're enjoying the content!
Hi Jason, I subscribed a few weeks ago and have been really appreciating your videos! As others have said, your teaching style is very thorough and contains enough depth without being overwhelming. Thanks for keeping us all up to date on the latest AI tech and for taking the time to break it down for easy understanding. Wondering if you have a community?
DAYUMMM man, you are always dropping fire content. I'm always curious about what you can bring to the table with each upload, and the quality always amazes me. Thanks a lot, much appreciated, and keep it up Jason! Big fan right here :)
This is my favourite AI channel, a perfect mix of theory and practical application. Would love to see an in-depth video on this. Please keep it up, these videos help a lot!
Thanks Jason. I’ve only just found your channel but I’m glad I did. This is great content!
Great video, well researched. This is what I expect from a quality channel. You got yourself a subscriber!
Looking forward to the in depth video. 👍🏻
I’m only a minute into this video, but I just want to say- what amazing visuals (like the diagrams)!
Also, this is such a great premise for a video, and you are great at explaining complex things in a way such that people of varying levels of technical knowledge can understand them.
Thanks for the kind words, I will keep it up!
When I watch your videos, I feel I'm watching the future!
Thanks Jason for your content 👏🏼
Great video Jason, really informative, thank you
Thanks a lot for this video!
I am also actively exploring these tools. So far I have used the Self-Operating Computer approach with modifications: instead of using Selenium or Playwright and struggling with web element locators, I just define the coordinates of those elements and interact with them. WebQL looks really interesting and I will definitely try it. I think its real potential lies in a multi-agent team, like AutoGen or CrewAI. Thanks again!
Wow this information is so well done. Thank you
The breakdown of the different methods for getting a vision model to identify UI elements to interact with is very useful.
I'm just starting to imagine: if we have super powerful web agents, spin up 1000+ virtual machines and let them complete web tasks simultaneously, it's gonna be so powerful.
True. The limit will be this: a lotta people use Windows, but Windows is extremely resource-heavy, even for VMs. *I guess people could try to use Linux and "Wine", but I'm not sure how good that will be, and it won't be pure AI. You can spin up, say, 100 Linux VMs, and because Windows is so much heavier and requires licenses and stuff, that's probably only about 10 Windows machines (just an off-the-cuff guess, but it is probably something like that).
Anyway, TL;DR: hopefully Windows can deal with its limitations so we can all make this effective at scale.
Oh, you said WEB tasks. Yeah pure web is different, since you could use any OS for that probably and use a "headless browser". Great point.
What kinds of tasks do you envisage?
Very interested in the "in-depth" video you mentioned at the end, looking forward to seeing you add GPT Vision with WebQL
High quality video from Jason as always ❤
Yes sir, I would love a GitHub link to check out the code of your scraper, please.
Same here
Jason, you are a hero! this is great. Please a video for an agent who can browse the web discover websites that can be useful and organize the URLs in a spreadsheet. Thank you!!!
Love your videos Jason. You are one of the few guys who make good content for someone who is not entirely new in the space. Greatly appreciated! You are a GOAT in my opinion 🙌🐐🤩
Great video thanks Jason
I'm convinced OpenAI has an alien locked deep underground spilling all this black magic technology.
Demonolatry practitioners and others are known to channel demons (defined in many more ways than most realize) to write books through them and develop technological/scientific advancements.
Demonolatry practitioners (and others) are known to channel demons to write books through them and accelerate the advancement of science and technology.
Great analysis, thanks for sharing
babe wake tf up ai jason posted
This is incredible, this will definitely change the way people interact with apps and the internet
Dude its amazing love it. Keep it up
incredible video jason. When do you think they'll release it approximately? Does this kill all agent startups?
This is gold. Thanks!
Super interesting topic. I'd really like to see GPT-4V working with WebQL/AgentQL. 🎉
Brilliant video - thanks for this. It also makes me concerned for the future of web security - we're almost at the point where an AI tool can be built that will call your mobile, speak to you in a real voice and extract enough information from you to enable it to then browse the web, log into your bank account and ... well, you get the rest.
Great breakdown! 👍
Great video!! Thx for your regular insights, the perfect balance between new tech and tutorials. In your video you said that WebQL is open source. Where do I find the source code?
these are the best AI agent tutorials
3 minutes in, I already knew how this will help my actual work
Thank you!
I also worked on a similar project right after they announced the Rabbit R1; it used OCR + YOLO + an LLM to control the computer.
I was able to get it to click wherever I wanted, but failed to build the backend for the llm to orchestrate the high level tasks.
It was simply too much work for me. 😅😅
🏆 Well done! Happy to see your subscriber count is growing as your videos are quite valuable!
I look forward to an AI agent where a topic/task can be given with little to no guidance. It uses a swarm to find relevant websites, grabs relevant info and creates a spreadsheet to compare the websites found. Use cases: the best price for an item you want to purchase; the best features on offer from an online tool/service to meet one's needs; competitor analysis as part of a business idea validation process. Many others I'm sure you can think of, too.
Fantastic introduction to WebQL and what it can do. Well done!
I'm curious about the name change to "AgentQL" and what it portends. Lots of agent frameworks use image snapshots, possibly with overlaid annotations, then drive the desktop or browser with basic actions (click, text entry, etc). Maybe generating WebQL is a better approach to driving the browser?
Hey Jason, this was amazing. Can you please prepare a hands-on tutorial on CogAgent?
This is awesome. A few months ago web agents still felt unusable; I didn't know they had come so far.
The demo at the end is super practical. A universal web scraper by itself will unlock lots of use cases. Trying it out tonight!!
Amazing video! Quick question: what program do you use to record these presentations? Loom?
Descript! Very good
amazing content!
It was a great explanation...
I have been using the LangChain shell tool to perform various actions on my desktop that can be done via cmd. I believe a mix of PowerShell, HTML parsing and OCR could make a good model for production.
So, based on the given prompt, an LLM master agent can decide which path to use. If it fails after 3-4 tries, it can come back and be redirected to another path to fulfil the task.
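A minimal Python sketch of the try-then-redirect idea from the comment above, assuming each "path" (shell, HTML parsing, OCR) can be wrapped as a function that either returns a result or raises on failure. `run_with_fallback`, `shell_path` and `html_path` are hypothetical names. The fixed order of `paths` stands in for the LLM master agent's choice, and no real LangChain, PowerShell or OCR calls are made.

```python
from typing import Callable, Optional, Sequence, Tuple

# A "path" is a (name, attempt) pair. attempt(task) returns a result string
# or raises an exception on failure. Names and stubs here are hypothetical.
Path = Tuple[str, Callable[[str], str]]


def run_with_fallback(
    task: str, paths: Sequence[Path], max_tries: int = 3
) -> Tuple[Optional[str], Optional[str]]:
    """Try each path up to max_tries times, then redirect to the next one.

    The order of `paths` stands in for the master agent's choice; a real
    system would have an LLM rank the paths from the prompt.
    Returns (path_name, result), or (None, None) if every path fails.
    """
    for name, attempt in paths:
        for _ in range(max_tries):
            try:
                return name, attempt(task)
            except Exception:
                continue  # this try failed; retry the same path
        # all tries on this path failed; fall through to the next path
    return None, None


# Hypothetical stubs standing in for the shell and HTML-parsing strategies.
def shell_path(task: str) -> str:
    raise RuntimeError("no command-line equivalent")  # always fails


_html_calls = 0


def html_path(task: str) -> str:
    global _html_calls
    _html_calls += 1
    if _html_calls < 2:  # first attempt fails, second succeeds
        raise RuntimeError("element not found")
    return f"done via HTML parsing: {task}"


print(run_with_fallback("open settings", [("shell", shell_path), ("html", html_path)]))
# prints ('html', 'done via HTML parsing: open settings')
```

Here the shell path is retried three times and then abandoned, and the HTML path succeeds on its second try. Swapping the fixed list for an LLM-produced ranking would be the natural next step, but that part is left out of this sketch.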
thank you
great content sir
You are the only real pioneer of AI education that is readily available to people who are not directly involved in this new, developing area of CS.
All others are just baiting for views and scamming saying "make xyz $$$ with my shitty code that I just copied from somewhere else" lol
This is good Stuff, thanks.
WebQL seems pretty nice, will give it a shot. Is there a JavaScript version of it?
Not yet, but it's on the roadmap.
Thank you very much!
Thank you 🙏
Thanks
Thank you 🙏
I wanna know how ya brainstorm the thumbnail ideas?
I believe that a viable solution would be to create a new city designed specifically for autonomous cars to flow smoothly, rather than trying to adapt autonomous cars to cities designed for human drivers. Similarly, I propose creating a dedicated part of the internet for interactions between artificial intelligences, where everything could be automated or accessed by voice commands. This would even include developing an operating system that accepts these types of interactions. For example, apps like Uber could be entirely accessible to artificial intelligences, allowing users to request services through voice commands. This could simplify tasks such as requesting a ride to downtown New York, where an artificial intelligence could perform various actions within the app to meet the user's needs. Web pages designed that way would be the solution for agents trying to do many tasks on a website.
If we start building whole cities to accommodate AI, then we are working for AI; AI is not working for us.
Mmmmm… I like the approach… like a hybrid environment. Not necessarily force AI into ours or us into AI… but two distinct environments designed for human in one environment and AI in another to optimize performance in either environment with freedom to access the other environment knowing well that performance will be limited when not in the “native environment”.
How does it deal with hint modals that pop up, like at 23:56? If it's a screenshot, they'll obscure things.
Thanks Jason!!
What is the cost of AgentQL?
They are beta testing now, so I don't think they've finalised pricing yet!
Simple solution going forwards would be to add comments in all UI elements when designing a website. Describe exactly what they do: ie. //This is the element to submit the login form. Etc. Would take a while to catch on in the web dev community. I for one will be doing this on any of my sites going forward. Accessibility for AI is a genuine concern at this point. 🎉
Oh that’s a great point! Yea some portal around it will be great
I tried tools like MultiOn but no no-code tool seems to work well yet. Open to suggestions.
The beginning of the end.
I think at this point, for practically speaking all of us, _how_ it works no longer matters. It’s now much more pressing to know what to do with it, as well as with ourselves.
Before agents were a thing, I was using another GPT to determine which tools were needed, and it would then format the request for me.
That WebQL sounds crazy. I searched for it but couldn’t find their site, just some placeholder. You know what happened?
I’ve added link in description!
We want another autonomous agent part 4 with LangGraph.
If Sora has a universal model of real-world physics, it sure could have a model of the universe of web browser interfaces. That would make all these hack workarounds redundant. OpenAI could have been using GPT-4 to build this training data for ages and be streaks ahead. If agents can learn to play video games, Sora has a multi-world model of physics, and GPT-4 can reason better than most humans…. Wow
Could you make a QA bot that is given a scenario with steps to test happy flows and maybe also negative flows?
Totally can, it is actually a perfect use case
What happens when a site changes its format?
Can you test Gemini 1.5 with RPA?
Thoughts on $OLAS? It's a framework to develop AI agents.
Can you use an LLM to talk to the editor instead of coding?
when it needs to complete a captcha to log in for you 0_0
AI is already able to, at 97%.
can anyone recommend similar channels around the web?
I can work on that and fix the problems; I need a research center to work in.
Isn't it just a fancy Selenium Webdriver?
I was using chatGPT to build an agent exactly like this. Does OpenAI have access to the contents of chatGPT chats? This is odd timing for this to all of a sudden be announced💀 I’d be more inclined to call it a coincidence if the company that’s building this wasn’t the same company I was using to develop this. Not making any claims but I’m curious now. Does OpenAI have access to user chat data?
That’s just parallel thinking Amy Schumer
Do we need a WebQL API key to try this?
Yes, I believe you do, but they are planning to open source it too.
🔥🔥🔥🔥🔥/🔥🔥🔥🔥🔥
Chrome extension: you need to request permission to download the extension. Not sure how long it will take to get access.
Why am I watching this at 2am. I don't even know how to code 😭
It exists now: MultiOn, UiPath.
I would put it to play poker 😂
Can it solve any capcha?
Yes, I believe so for simple ones
Release the scraper code ❤
Added GitHub link in description! But you need api key first
omg, this means that designers have to design for another viewport/user agent... AI
what the hell is webQ??? where is the AI scraper????
good point. We renamed it to AgentQL :)
Yeah, Agents are great but what happens when someone prompts for their agent to find their banking password on the computer, log into the bank and transfer all funds to xxx - on someone else's computer - on an entire botnet of computers (millions)
It’s only a matter of time. Soon AI will be walking on CPUs.
How does OpenAI keep pushing shitc?
ChatGPT + Selenium = scary
This is just going to end up with a bunch of ai talking to each other lol
Why should I be excited or scared? I've had this thing for over a year now - built not long after GPT-3.5 was released. It has full control over my linux machine and works fairly well.
Did it also write this comment because that would explain a lot
@JamesHoffmannLover Why so cheeky? Sure, they're doing a bit more than simply allowing an LLM to control your OS through the CLI, but do you really think the leap is that big? Like... I'm genuinely curious about your opinion on this. Which unique feature that we didn't already have in similar open-source projects is so exciting or fear-inducing?
@anatolydyatlov963 Just keep playing with your Linux toy while the rest of us keep an open mind on new technology advancements 👍. But please try to show some respect for people like AI Jason who cover these topics for us.
@JamesHoffmannLover Why do you refuse to acknowledge the hard work of numerous software developers from all over the world who have created similar projects, FAR exceeding what I'm describing here? Have you even heard of the Self-Operating Computer Framework by OthersideAI? You're treating them like ghosts who don't even exist, and when a big corporation creates something similar, you're cheering as if they made a groundbreaking discovery. Own up to it.
@anatolydyatlov963 lol since when am I doing any of that? Maybe re-read the comments and think about it for a while.
lol at the soy face thumbnails
Self operating computer doesn't actually work...
To be honest selenium is still easier.
AutoHotkey has been doing this for years. It requires basic programming skills.
This is insane!
Did you try CogAgent?