OpenAI's Agent 2.0: Excited or Scared?

  • Published Jun 9, 2024
  • I want to give you a full rundown of browser/mobile/desktop AI agents
    Get free HubSpot E-book: Using Generative AI to scale your content operation: clickhubspot.com/2ld
    🔗 Links
    - Follow me on twitter: / jasonzhou1993
    - Join my AI email list: crafters.ai/
    - My discord: / discord
    - GitHub repo: github.com/JayZeeDesign/unive... (you need a WebQL API key first)
    - WebQL: docs.webql.tinyfish.io/
    - Self-operation-computer: github.com/OthersideAI/self-o...
    - Hyperwrite: hyperwriteai.com/?via=jason-zhou
    - MultiOn: multion.notion.site/Download-...
    ⏱️ Timestamps
    0:00 Intro
    2:07 Digital Agents
    4:13 Using Generative AI to scale content operation
    5:22 Core Components of Digital Agents
    7:34 Three main ways to build digital agents
    7:58 Method 1: HTML/XML based
    11:00 Method 2: Vision based
    16:38 Tutorial: build your own universal AI agent to scrape anything via WebQL
    22:56 Demo of universal web scraper
    👋🏻 About Me
    My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
    #gpt5 #mixtral #gpt4turbo #gpt4 #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #autogen #autogpt #openai
  • Science & Technology

COMMENTS • 144

  • @jaanireel
    @jaanireel 3 months ago +9

    00:05 OpenAI is developing a new type of agent to automate tasks.
    02:02 Simulating real human interaction on computer devices unlocks everyday personal assistant use cases.
    05:59 Training Agent 2.0 requires more steps and reasoning ability, but the potential market opportunities are exciting.
    07:59 Using HTML or XML based approach to provide context for web agents.
    12:03 Improving accuracy of self-operating computer vision tasks
    14:03 Using multiple models together to interact with GUI screenshots
    17:47 WebQL basics allow easy web automation
    19:27 Setting up WebQL and Playwright for web browser interaction
    22:40 Developed a script for universal e-commerce product information scraping.
    24:32 WebQL allows you to build powerful web agents for various tasks.
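
The HTML/XML-based method summarized above gives the agent a condensed view of the page instead of raw markup. Below is a minimal sketch of that idea using only Python's standard-library `html.parser`. It assumes the agent only needs interactive elements plus a few identifying attributes. The tag list, attribute list, output format and the names `InteractiveElementExtractor` and `summarize_for_llm` are illustrative choices, not how WebQL/AgentQL actually works.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Which tags count as "interactive" and which attributes to keep are
# assumptions for this sketch, not taken from WebQL/AgentQL.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = ("id", "name", "type", "aria-label", "placeholder", "href")


class InteractiveElementExtractor(HTMLParser):
    """Collects interactive elements and their visible text from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[dict] = []
        self._open: Optional[dict] = None  # element whose text is being captured

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        kept = {name: value for name, value in attrs if name in KEPT_ATTRS and value}
        element = {"tag": tag, "attrs": kept, "text": ""}
        self.elements.append(element)
        if tag != "input":  # <input> is a void element and never contains text
            self._open = element

    def handle_data(self, data):
        text = data.strip()
        if text and self._open is not None:
            self._open["text"] = f"{self._open['text']} {text}".strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def summarize_for_llm(html: str) -> str:
    """Render one numbered line per interactive element for use in a prompt."""
    parser = InteractiveElementExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for index, element in enumerate(parser.elements):
        attrs = "".join(f' {name}="{value}"' for name, value in element["attrs"].items())
        line = f"[{index}] <{element['tag']}{attrs}>"
        if element["text"]:
            line += f" {element['text']}"
        lines.append(line)
    return "\n".join(lines)


page = (
    '<div><h1>Shop</h1><input type="text" name="q" placeholder="Search">'
    '<button id="buy">Add to <b>cart</b></button><a href="/help">Help</a></div>'
)
print(summarize_for_llm(page))
```

Each numbered line gives the LLM a short handle it can refer to (for example "click [1]"), which the driving code would then map back to the real element. This sketch ignores iframes, shadow DOM and elements hidden by CSS, which a real agent would have to handle.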

  • @thesvenni
    @thesvenni 3 months ago +31

    Thank god, I thought you were gone. I've been waiting for another video. You are going places, Jason; you're incredibly talented at teaching complex subjects in easy-to-understand, digestible videos.
    Thank you!

    • @canadianblackops2412
      @canadianblackops2412 3 months ago +1

      this is such a sweet comment :-)

    • @ryzikx
      @ryzikx 3 months ago +1

      He doesn't upload often anyway, so he's never truly gone. The quality is among the best, though.

    • @gokulakrishnanr8414
      @gokulakrishnanr8414 3 months ago

      Thanks for the kind words! Glad you're enjoying the content!

  • @ScottBrooks415
    @ScottBrooks415 3 months ago +6

    Hi Jason, I subscribed a few weeks ago and have been really appreciating your videos! As others have said, your teaching style is very thorough and contains enough depth without being overwhelming. Thanks for keeping us all up to date on the latest AI tech and for taking the time to break it down for easy understanding. Wondering if you have a community?

  • @gabrieleguo
    @gabrieleguo 3 months ago

    DAYUMMM man, you are always dropping fire content. I'm always curious about what you'll bring to the table with each upload, and the quality always amazes me. Thanks a lot, much appreciated, and keep it up Jason! Big fan right here :)

  • @ayushmansingh1470
    @ayushmansingh1470 3 months ago

    This is my favourite AI channel, a perfect mix of theory and practical application. Would love to see an in-depth video on this. Please keep it up, these videos help a lot!

  • @theBLAMfam
    @theBLAMfam 3 months ago

    Thanks Jason. I’ve only just found your channel but I’m glad I did. This is great content!

  • @starmap
    @starmap 3 months ago +1

    Great video, well researched. This is what I expect from a quality channel. You got yourself a subscriber!

  • @thesilentcitadel
    @thesilentcitadel 3 months ago +2

    Looking forward to the in-depth video. 👍🏻

  • @candlespotlight
    @candlespotlight 3 months ago +2

    I’m only a minute into this video, but I just want to say: what amazing visuals (like the diagrams)!
    Also, this is such a great premise for a video, and you are great at explaining complex things in a way such that people of varying levels of technical knowledge can understand them.

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Thanks for the kind words, I will keep it up!

  • @JoaquinTorroba
    @JoaquinTorroba 3 months ago

    When I watch your videos, I feel I'm watching the future!
    Thanks Jason for your content 👏🏼

  • @gofastandfar
    @gofastandfar 3 months ago

    Great video Jason, really informative, thank you

  • @iakov_volf
    @iakov_volf 3 months ago +1

    Thanks a lot for this video!
    I am also actively exploring these tools. So far I've used the Self-Operating Computer approach with modifications: instead of using Selenium or Playwright and struggling with web element locators, I just define the coordinates of those elements and interact with them. WebQL looks really interesting and I will definitely try it. I think its real potential lies in multi-agent teams, like AutoGen or CrewAI. Thanks again!
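
The coordinate-based approach described in this comment has one detail that is easy to miss: vision models often see a resized screenshot, so a box they return has to be scaled back to real screen pixels before clicking. A minimal sketch, assuming the model returns a pixel bounding box for the resized image; the `Box` type and `click_point` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    """Bounding box in the pixel space of the (possibly resized) screenshot."""
    left: float
    top: float
    right: float
    bottom: float


def click_point(box: Box, shot_size: Tuple[int, int], screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Return the centre of `box`, rescaled from screenshot pixels to screen pixels."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    centre_x = (box.left + box.right) / 2
    centre_y = (box.top + box.bottom) / 2
    x = round(centre_x * screen_w / shot_w)
    y = round(centre_y * screen_h / shot_h)
    # Clamp so rounding can never yield a point just off the screen edge.
    return min(max(x, 0), screen_w - 1), min(max(y, 0), screen_h - 1)


# A 1280x720 screenshot of a 2560x1440 display: every coordinate doubles.
print(click_point(Box(100, 50, 300, 150), (1280, 720), (2560, 1440)))  # (400, 200)
```

The resulting point could then be passed to whatever performs the click, for example Playwright's `page.mouse.click(x, y)` when the "screen" is a browser viewport measured in CSS pixels.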

  • @twokayoh9347
    @twokayoh9347 3 months ago

    Wow, this information is so well done. Thank you

  • @Jim-ey3ry
    @Jim-ey3ry 3 months ago +10

    The breakdown of different methods for getting a vision model to identify UI elements to interact with is very useful.
    I just started imagining: if we had super powerful web agents and spun up 1000+ virtual machines to complete web tasks simultaneously, it would be so powerful.

    • @mikesopko7374
      @mikesopko7374 3 months ago

      True. The limit will be this: a lot of people use Windows, but Windows is extremely resource-heavy, even in VMs. I guess people could try Linux with Wine, but I'm not sure how well that would work, and it wouldn't be pure AI. You could spin up, say, 100 Linux VMs, but because Windows is so much heavier and requires licenses and so on, that's probably only about 10 Windows machines (just an off-the-cuff estimate, but probably something like that).
      Anyway, TL;DR: hopefully Windows can deal with these limitations so this can be effective at scale for all of us.

    • @mikesopko7374
      @mikesopko7374 3 months ago

      Oh, you said WEB tasks. Yeah pure web is different, since you could use any OS for that probably and use a "headless browser". Great point.

    • @silent.-killer
      @silent.-killer 3 months ago

      What kinds of tasks do you envisage?

  • @thesvenni
    @thesvenni 3 months ago

    Very interested in the "in-depth" video you mentioned at the end, looking forward to seeing you add GPT Vision with WebQL

  • @free_thinker4958
    @free_thinker4958 3 months ago +6

    High quality video from Jason as always ❤

  • @TheSacredGrove
    @TheSacredGrove 3 months ago +9

    Yes sir, I would love a GitHub link to check out the code of your scraper, please.

  • @AIGooroo
    @AIGooroo 3 months ago +1

    Jason, you are a hero! This is great. Please make a video on an agent that can browse the web, discover useful websites, and organize the URLs in a spreadsheet. Thank you!!!

  • @user-ug3pf3uw6x
    @user-ug3pf3uw6x 3 months ago

    Love your videos Jason. You are one of the few guys who make good content for someone who is not entirely new in the space. Greatly appreciated! You are a GOAT in my opinion 🙌🐐🤩

  • @oooooooo347
    @oooooooo347 3 months ago +1

    Great video, thanks Jason

  • @kayodeejisun2211
    @kayodeejisun2211 3 months ago +43

    I'm convinced OpenAI has an alien locked deep underground spilling out all this black-magic technology.

    • @sedat4151
      @sedat4151 3 months ago

      Demonolatry practitioners and others are known to channel demons (defined in many more ways than most realize) to write books through them and develop technological/scientific advancements.

    • @sedat4151
      @sedat4151 3 months ago

      Demonolatry practitioners (and others) are known to channel demons to write books through them and accelerate the advancement of science and technology.

  • @JoshuaGottlieb-oz4er
    @JoshuaGottlieb-oz4er 3 months ago

    Great analysis, thanks for sharing

  • @MrAndyisReal
    @MrAndyisReal 3 months ago +13

    babe wake tf up ai jason posted

  • @MartinPleasant-ty1rw
    @MartinPleasant-ty1rw 3 months ago

    This is incredible, this will definitely change the way people interact with apps and the internet

  • @aiAlchemyy
    @aiAlchemyy 3 months ago

    Dude, it's amazing, love it. Keep it up

  • @righttiming
    @righttiming 3 months ago

    Incredible video, Jason. Roughly when do you think they'll release it? Does this kill all agent startups?

  • @j_s_h9
    @j_s_h9 3 months ago

    This is gold. Thanks!

  • @carstenli
    @carstenli 3 months ago

    Super interesting topic. I'd really like to see GPT-4V working with WebQL/AgentQL. 🎉

  • @sitedev
    @sitedev 3 months ago

    Brilliant video - thanks for this. It also makes me concerned for the future of web security - we're almost at the point where an AI tool can be built that will call your mobile, speak to you in a real voice and extract enough information from you to enable it to then browse the web, log into your bank account and ... well, you get the rest.

  • @mikew2883
    @mikew2883 3 months ago

    Great breakdown! 👍

  • @DannyGerst
    @DannyGerst 3 months ago

    Great video!! Thanks for your regular insights, the perfect balance between new tech and tutorial. In your video you said that WebQL is open source. Where do I find the source code?

  • @xonack
    @xonack 3 months ago

    These are the best AI agent tutorials

  • @lawalexlaw
    @lawalexlaw 3 months ago

    3 minutes in, I already knew how this would help my actual work

  • @realCleanK
    @realCleanK 3 months ago

    Thank you!

  • @PseudoProphet
    @PseudoProphet 28 days ago +1

    I also worked on a similar project right after the Rabbit R1 was announced; it used OCR + YOLO + an LLM to control the computer.
    I was able to get it to click wherever I wanted, but I failed to build the backend for the LLM to orchestrate the high-level tasks.
    It was simply too much work for me. 😅😅
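
An OCR + LLM pipeline like the one described here usually needs a lookup step: the LLM names the element it wants ("Add to cart") and the code finds that text on screen. A minimal sketch of just that step using the standard-library `difflib`, assuming the OCR engine returns (text, box) pairs. The YOLO detection and the LLM call are left out, and `locate_label` is a hypothetical helper.

```python
import difflib
from typing import List, Optional, Tuple

# Assumed OCR output format: (recognized text, (left, top, right, bottom)).
OcrHit = Tuple[str, Tuple[int, int, int, int]]


def locate_label(target: str, hits: List[OcrHit], cutoff: float = 0.6) -> Optional[Tuple[int, int]]:
    """Return the centre of the OCR box whose text best matches `target`, or None."""
    texts = [text.lower() for text, _ in hits]
    # Fuzzy matching tolerates small OCR errors such as "Add to Carl".
    best = difflib.get_close_matches(target.lower(), texts, n=1, cutoff=cutoff)
    if not best:
        return None
    left, top, right, bottom = hits[texts.index(best[0])][1]
    return (left + right) // 2, (top + bottom) // 2


hits = [
    ("Sign in", (10, 10, 90, 40)),
    ("Search", (100, 10, 300, 40)),
    ("Add to Carl", (400, 500, 560, 540)),  # OCR misread of "Add to Cart"
]
print(locate_label("add to cart", hits))  # (480, 520)
print(locate_label("checkout", hits))  # None
```

Returning `None` instead of the closest weak match lets the caller ask the LLM to rephrase or scroll, rather than clicking something unrelated.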

  • @RoadTo19
    @RoadTo19 3 months ago

    🏆 Well done! Happy to see your subscriber count growing, as your videos are quite valuable!
    I look forward to an AI agent that can be given a topic/task with little to no guidance. It would use a swarm to find relevant websites, grab the relevant info and create a spreadsheet comparing the websites found. Use cases: the best price for an item you want to purchase; the best features on offer from an online tool/service to meet one's needs; competitor analysis as part of a business idea validation process. I'm sure you can think of many others, too.

  • @joeternasky
    @joeternasky 3 months ago

    Fantastic introduction to WebQL and what it can do. Well done!
    I'm curious about the name change to "AgentQL" and what it portends. Lots of agent frameworks use image snapshots, possibly with overlaid annotations, then drive the desktop or browser with basic actions (click, text entry, etc). Maybe generating WebQL is a better approach to driving the browser?

  • @venugopalt6861
    @venugopalt6861 3 months ago

    Hey Jason, this was amazing. Can you please prepare a hands-on tutorial on CogAgent?

  • @jasonfinance
    @jasonfinance 3 months ago +2

    This is awesome. A few months ago web agents still felt unusable; I didn't know they had come so far.
    The demo at the end is super practical. A universal web scraper by itself will unlock lots of use cases. Trying it out tonight!!

  • @RinaldiSebastian
    @RinaldiSebastian 3 months ago

    Amazing video! Quick question: what program do you use to record these presentations? Loom?

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Descript! Very good

  • @georgestander2682
    @georgestander2682 3 months ago

    amazing content!

  • @saivamsi441
    @saivamsi441 3 months ago

    It was a great explanation...
    I have been using the LangChain shell tool to perform various actions on my desktop that can be done via cmd. I believe a mix of PowerShell, HTML parsing and OCR could make a good model for production.
    So, based on the given prompt, a master LLM agent can decide which path to use. If it fails after 3-4 tries, it can come back and be redirected to another path to fulfil the task.
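
The "try one path, switch after 3-4 failures" idea in this comment can be sketched as a small retry-then-fallback loop. In this sketch a fixed strategy order stands in for the master LLM's decision, and `via_shell`/`via_html` are hypothetical stubs rather than real LangChain tools.

```python
from typing import Callable, List, Sequence, Tuple

# (name, function) pairs; each function attempts the task and raises on failure.
Strategy = Tuple[str, Callable[[str], str]]


def run_with_fallback(task: str, strategies: Sequence[Strategy], max_tries: int = 3) -> Tuple[str, str]:
    """Try each strategy up to `max_tries` times before moving on to the next one."""
    errors: List[str] = []
    for name, attempt_task in strategies:
        for attempt in range(1, max_tries + 1):
            try:
                return name, attempt_task(task)
            except Exception as exc:  # any failure counts as a failed try
                errors.append(f"{name} try {attempt}: {exc}")
    raise RuntimeError("all strategies failed:\n" + "\n".join(errors))


# Hypothetical stubs: the shell path always fails, the HTML path succeeds on its second try.
html_calls = 0


def via_shell(task: str) -> str:
    raise RuntimeError("no matching command")


def via_html(task: str) -> str:
    global html_calls
    html_calls += 1
    if html_calls < 2:
        raise ValueError("element not found")
    return f"done via HTML: {task}"


print(run_with_fallback("download the invoice", [("shell", via_shell), ("html", via_html)]))
# ('html', 'done via HTML: download the invoice')
```

A real agent would probably also feed the collected error messages back to the LLM so it can choose the next path more intelligently. This sketch only records them in the final exception.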

  • @far.k.3112
    @far.k.3112 3 months ago

    thank you

  • @fulowa
    @fulowa 3 months ago

    great content sir

  • @siper1686
    @siper1686 3 months ago

    You are the only real pioneer of AI education who is readily available to people not directly involved in this new, developing area of CS.
    All the others are just baiting for views and scamming, saying "make xyz $$$ with my shitty code that I just copied from somewhere else" lol

  • @webdancer
    @webdancer 3 months ago

    This is good stuff, thanks.

  • @hackerhaze
    @hackerhaze 3 months ago +2

    WebQL seems pretty nice, I'll give it a shot. Is there a JavaScript version of it?

    • @PavelDudka
      @PavelDudka 3 months ago

      Not yet, but it's on the roadmap

  • @tsenri2743
    @tsenri2743 3 months ago +1

    Thank you very much!

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Thank you 🙏

  • @thesilentcitadel
    @thesilentcitadel 3 months ago +2

    Thanks

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Thank you 🙏

  • @VaibhavShewale
    @VaibhavShewale 3 months ago +2

    I wanna know how you brainstorm the thumbnail ideas?

  • @devfromthefuture506
    @devfromthefuture506 3 months ago +2

    I believe a viable solution would be to create a new city designed specifically for autonomous cars to flow smoothly, rather than trying to adapt autonomous cars to cities designed for human drivers. Similarly, I propose creating a dedicated part of the internet for interactions between artificial intelligences, where everything could be automated or accessed by voice commands. This would even include developing an operating system that accepts these types of interactions. For example, apps like Uber could be entirely accessible to artificial intelligences, allowing users to request services through voice commands. This could simplify tasks such as requesting a ride to downtown New York, where the artificial intelligence could perform various actions within the app to meet the user's needs. Web pages designed that way would be the solution to agents trying to do many tasks on a website.

    • @jbo8540
      @jbo8540 3 months ago +1

      If we start building whole cities to accommodate AI, then we are working for AI; AI is not working for us

    • @eightysevenmoore
      @eightysevenmoore 3 months ago

      Mmmmm… I like the approach… like a hybrid environment. Not necessarily force AI into ours or us into AI… but two distinct environments designed for human in one environment and AI in another to optimize performance in either environment with freedom to access the other environment knowing well that performance will be limited when not in the “native environment”.

  • @DeepfriedBaby
    @DeepfriedBaby 3 months ago

    How does it deal with hint modals that pop up, like at 23:56? If it's a screenshot, they'll obscure things.

  • @RichardGetzPhotography
    @RichardGetzPhotography 3 months ago +1

    Thanks Jason!!
    What does AgentQL cost?

    • @AIJasonZ
      @AIJasonZ 3 months ago

      They are beta testing now, so I don’t think they've finalised pricing yet!

  • @anubisai
    @anubisai 3 months ago

    A simple solution going forward would be to add comments to all UI elements when designing a website, describing exactly what they do, i.e. //This is the element to submit the login form. Etc. It would take a while to catch on in the web dev community. I for one will be doing this on any of my sites going forward. Accessibility for AI is a genuine concern at this point. 🎉

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Oh, that’s a great point! Yeah, some portal around it would be great

  • @bertstevens245
    @bertstevens245 3 months ago

    I tried tools like MultiOn but no no-code tool seems to work well yet. Open to suggestions.

  • @harimgarcialamont9140
    @harimgarcialamont9140 3 months ago

    The beginning of the end.

  • @chrisoffersen
    @chrisoffersen 3 months ago

    I think at this point, for practically speaking all of us, _how_ it works no longer matters. It’s now much more pressing to know what to do with it, as well as with ourselves.

  • @skateking8
    @skateking8 3 months ago

    Before agents were a thing, I was using another GPT to determine which tools were needed, and it would then format the request for me.

  • @dawid_dahl
    @dawid_dahl 3 months ago

    That WebQL sounds crazy. I searched for it but couldn’t find their site, just some placeholder. You know what happened?

    • @AIJasonZ
      @AIJasonZ 3 months ago

      I’ve added the link in the description!

  • @whatyoumissed9994
    @whatyoumissed9994 3 months ago

    We want another autonomous agent video, part 4, with LangGraph

  • @nathank5140
    @nathank5140 3 months ago

    If Sora has a universal model of real-world physics, it could surely have a model of the universe of web browser interfaces. That would make all these hacky workarounds redundant. OpenAI could have been using GPT-4 to build this training data for ages and be streaks ahead. If agents can learn to play video games, Sora has a multi-world model of physics, and GPT-4 can reason better than most humans… Wow

  • @darkbelg
    @darkbelg 3 months ago

    Could you make a QA bot that is given a scenario with steps to test happy flows and maybe also negative flows?

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Totally can, it is actually a perfect use case

  • @chockablock34839
    @chockablock34839 3 months ago

    What happens when a site changes its format?

  • @Yakibackk
    @Yakibackk 3 months ago

    Can you test Gemini 1.5 with RPA?

  • @drustan6890
    @drustan6890 3 months ago

    Thoughts on $OLAS? It's a framework for developing AI agents.

  • @stanleylu3625
    @stanleylu3625 2 months ago

    Can you use an LLM to talk to the editor instead of coding?

  • @alden6321
    @alden6321 3 months ago +1

    when it needs to complete a captcha to log in for you 0_0

    • @brianmi40
      @brianmi40 3 months ago

      AI can already do that at 97%.

  • @neponel
    @neponel 3 months ago

    Can anyone recommend similar channels around the web?

  • @semosemo3827
    @semosemo3827 3 months ago

    I can work on that and fix the problems; I need a research center to work in.

  • @oleksandrsova4803
    @oleksandrsova4803 3 months ago

    Isn't it just a fancy Selenium WebDriver?

  • @Bt_allen22
    @Bt_allen22 3 months ago

    I was using chatGPT to build an agent exactly like this. Does OpenAI have access to the contents of chatGPT chats? This is odd timing for this to all of a sudden be announced💀 I’d be more inclined to call it a coincidence if the company that’s building this wasn’t the same company I was using to develop this. Not making any claims but I’m curious now. Does OpenAI have access to user chat data?

  • @jirivchi
    @jirivchi 3 months ago

    Do we need a WebQL API key to try this?

    • @AIJasonZ
      @AIJasonZ 3 months ago +1

      Yes, I believe you do, but they are planning to open-source it too

  • @brando2818
    @brando2818 3 months ago

    🔥🔥🔥🔥🔥/🔥🔥🔥🔥🔥

  • @mazkaibil9108
    @mazkaibil9108 3 months ago

    Chrome extension: you need to request permission to download the extension. Not sure how long it will take to get access.

  • @errmmm
    @errmmm 3 months ago +1

    Why am I watching this at 2am. I don't even know how to code 😭

  • @rahuldinesh2840
    @rahuldinesh2840 3 months ago

    It already exists: MultiOn, UiPath.

  • @alfonsopayra
    @alfonsopayra 3 months ago

    I would put it to play poker 😂

  • @Dron008
    @Dron008 3 months ago

    Can it solve any captcha?

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Yes, I believe so, for simple ones

  • @hbruceweaver
    @hbruceweaver 3 months ago +1

    Release the scraper code ❤

    • @AIJasonZ
      @AIJasonZ 3 months ago

      Added the GitHub link in the description! But you need an API key first

  • @DeepfriedBaby
    @DeepfriedBaby 3 months ago

    omg, this means that designers have to design for another viewport/user agent... AI

  • @hqcart1
    @hqcart1 3 months ago

    what the hell is webQ??? where is the AI scraper????

    • @PavelDudka
      @PavelDudka 3 months ago +1

      good point. We renamed it to AgentQL :)

  • @74Gee
    @74Gee 3 months ago

    Yeah, agents are great, but what happens when someone prompts their agent to find the banking password on a computer, log into the bank and transfer all the funds to xxx, on someone else's computer, or on an entire botnet of computers (millions)?

  • @GvRy8_5x46o7yXgSGaaJ.
    @GvRy8_5x46o7yXgSGaaJ. 3 months ago

    It’s only a matter of time. Soon AI will be walking on CPUs.

  • @kilih.4525
    @kilih.4525 3 months ago

    How does OpenAI keep pushing shitc?

  • @skateking8
    @skateking8 3 months ago

    ChatGPT + Selenium = scary

  • @whosaidthat2201
    @whosaidthat2201 3 months ago

    This is just going to end up with a bunch of ai talking to each other lol

  • @anatolydyatlov963
    @anatolydyatlov963 3 months ago

    Why should I be excited or scared? I've had this thing for over a year now - built not long after GPT-3.5 was released. It has full control over my linux machine and works fairly well.

    • @JamesHoffmannLover
      @JamesHoffmannLover 2 months ago

      Did it also write this comment because that would explain a lot

    • @anatolydyatlov963
      @anatolydyatlov963 2 months ago

      @@JamesHoffmannLover Why so cheeky? Sure, they're doing a bit more than simply letting an LLM control your OS through the CLI, but do you really think the leap is that big? Like... I'm genuinely curious about your opinion on this. Which unique feature that we didn't already have in similar open-source projects is so exciting or fear-inducing?

    • @JamesHoffmannLover
      @JamesHoffmannLover 2 months ago

      @@anatolydyatlov963 Just keep playing with your Linux toy while the rest of us keep an open mind about new technological advancements 👍. But please try to show some respect for people like AI Jason who cover these topics for us

    • @anatolydyatlov963
      @anatolydyatlov963 2 months ago

      @@JamesHoffmannLover Why do you refuse to acknowledge the hard work of numerous software developers from all over the world who have created similar projects, FAR exceeding what I'm describing here? Have you even heard of the Self-Operating Computer Framework by OthersideAI? You're treating them like ghosts who don't even exist, and when a big corporation creates something similar, you're cheering as if they made a groundbreaking discovery. Own up to it.

    • @JamesHoffmannLover
      @JamesHoffmannLover 2 months ago

      @@anatolydyatlov963 lol since when am I doing any of that? Maybe re-read the comments and think about it for a while.

  • @MeatCatCheesyBlaster
    @MeatCatCheesyBlaster 3 months ago

    lol at the soy face thumbnails

  • @ahsin.shabbir
    @ahsin.shabbir 3 months ago

    Self operating computer doesn't actually work...

  • @MrMehrd
    @MrMehrd 3 months ago

    To be honest selenium is still easier.

  • @mixmax6027
    @mixmax6027 3 months ago

    AutoHotkey has been doing this for years. It requires basic programming skills.

  • @scaledeals-io
    @scaledeals-io 3 months ago

    This is insane!

  • @lexchirita
    @lexchirita 3 months ago

    Did you try CogAgent?