MASSIVE Step Allowing AI Agents To Control Computers (MacOS, Windows, Linux)

  • Published Aug 27, 2024

COMMENTS • 282

  • @KCM25NJL
    @KCM25NJL 4 months ago +55

    It's great and all, but I kinda think one of two things will end up happening:
    1. An AI layer will become a standard for interoperability as part of the OSI and app dev stacks.
    2. A whole new OS will be developed that serves this very purpose.
    I suspect we may start with 1 and end up with 2 in the longer term.

    • @theterminaldave
      @theterminaldave 4 months ago +5

      When I was helping write test steps for an automated software-testing app, I basically had to open the developer tools and get the name of the object that needed to be interacted with: the HTML code/name for a particular button, or a certain drop-down text box.
      I don't understand the whole "lay a grid over the screen and guess the coordinates" approach. That's just the user interface; the computer uses all the code in the background. Why isn't the AI navigating by looking at the underlying code for the page instead of the graphical output of the page?
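The DOM-based targeting described above can be sketched in a few lines. This is an illustration, not OSWorld's code: the markup and element ids are invented, and a real agent would query a live accessibility tree or browser DOM rather than a string. Using only Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

# Toy page standing in for whatever app the agent is driving.
PAGE = """
<html><body>
  <button id="save-btn" class="primary">Save</button>
  <select id="font-size"><option>12</option><option>14</option></select>
</body></html>
"""

class ElementFinder(HTMLParser):
    """Index every element that has an id, so an agent can target
    UI elements by name instead of guessing screen coordinates."""
    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            self.elements[attrs["id"]] = {"tag": tag, **attrs}

finder = ElementFinder()
finder.feed(PAGE)

# The agent asks for "save-btn" directly: no grid overlay, no pixel guessing.
print(finder.elements["save-btn"])
```

The same lookup-by-identifier idea is what Selenium-style locators and RPA tools use, and it survives visual redesigns that would break a coordinate-based script.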

    • @DaveEtchells
      @DaveEtchells 4 months ago

      @@theterminaldave Interesting point. I'd say, though, that the point is to have the AI interact with the UI based on what a human would see. On a related note, there have been tools for software regression testing dating back many years that let you interact with UI elements, but writing the scripts for them was a PITA, and they were so fragile that tiny changes could send them off the rails.

    • @Daniel-jm8we
      @Daniel-jm8we 4 months ago +1

      @@theterminaldave Would the AI always have access to the code?

    • @ich3601
      @ich3601 4 months ago

      @@Daniel-jm8we Almost. When using RPA tools you're scanning the HTML, the OS events, or the application events. It would be great if an AI would eat this stuff, because nowadays RPA tools are very sensitive to changes.

    • @theterminaldave
      @theterminaldave 4 months ago

      @@Daniel-jm8we Open any webpage, press F12, and click on the Inspector tab; that's the code I'm referring to.
      It's basically the code for the graphical interface, so yes, the AI would always have "access": if you don't have access, it's because it's not appearing on the page.
      After you open the inspector, click on any line and hit delete; it will disappear from the page. If you hit refresh, it will come back.

  • @donrosenthal5864
    @donrosenthal5864 4 months ago +56

    OSWorld project video? Yes, please!!!

    • @reidelliot1972
      @reidelliot1972 4 months ago +4

      Yes, tutorial please! Please elaborate more on the relationship to CrewAI-like frameworks and potential implications for the rumored YAML endpoints!

    • @user-wz3qe3vw6h
      @user-wz3qe3vw6h 3 months ago +3

      @@reidelliot1972 Yes Matthew, pls!

  • @haroldpierre1726
    @haroldpierre1726 4 months ago +29

    It would be helpful to have a catalog of pre-built open source AI agents that can be easily downloaded and used for specific tasks. My brain shuts off trying to follow video tutorials on programming my own AI agent from scratch.

  • @Carnivore69
    @Carnivore69 4 months ago +66

    User: What happens between the steps in these Ikea instructions?
    Agent: A fuckton of swearing!
    User: Test passed.

  • @BlankBrain
    @BlankBrain 4 months ago +5

    The most difficult part of making something like OSWorld is security. When you open your OS to computer manipulation, it's a lot easier for computers to manipulate it.

  • @alanhoeffler9629
    @alanhoeffler9629 4 months ago +2

    This was a good video showing what has to be done to make LLMs agentic using computer OSes. It showed me two things. The first was why autonomous cars are so hard to build: the system has to know not only what the "rules of the road" are, what the automobile's driving characteristics are, and how to make the car do what it needs to do, but it also has to correctly parse, at high speed, a situation it has never encountered before, decide what the correct action is, and pull off executing it in real time. The second is that a system that can do that well is way closer to AGI than any LLM.

  • @ScottzPlaylists
    @ScottzPlaylists 4 months ago +15

    Yes please 👍 Need lots of OSWorld videos ❗❗❗
    We need a tutorial-watching AI that creates an OSWorld training-set item on how to do X by watching a video on how to do X (and fills in missing details not shown). 🤯🤯🤯🤯❗❗❗❗

    • @AGIBreakout
      @AGIBreakout 4 months ago +8

      Great Idea!!!!

    • @CryptoMetalMoney
      @CryptoMetalMoney 4 months ago +7

      YT tutorial videos would be a huge ready-to-go dataset... Great idea.

    • @CryptoMetalMoney
      @CryptoMetalMoney 4 months ago +5

      Continuous learning will be huge in the future, and using computers will be a big part of that.

    • @NWONewsGod
      @NWONewsGod 4 months ago +5

      YT is a treasure trove for more advanced forms of AI training, and even for training now.

  • @pvanukoff
    @pvanukoff 4 months ago +53

    Not long before we have Star Trek-style computers, where we just say "computer ... do x, y and z for me".

    • @theterminaldave
      @theterminaldave 4 months ago +2

      That's the goal. Agentic AI.

    • @ericspecullaas2841
      @ericspecullaas2841 4 months ago +2

      You can do that now. Although food replicators and holodecks are still far off.

    • @shooteru
      @shooteru 4 months ago +6

      Working on it, many of us

    • @JBulsa
      @JBulsa 4 months ago

      2 - 9 years

    • @tomaszzielinski4521
      @tomaszzielinski4521 4 months ago

      Who do you mean by "we"?

  • @jimbo2112
    @jimbo2112 4 months ago +2

    Yes please! Tutorial on this would be great. I see agents as being a driving force behind vast amounts of commercial AI adoption. Companies want greater efficiency and agents are the tools to bring this.

  • @threepe0
    @threepe0 4 months ago +4

    Really look forward to your videos. You’ve helped me get the gist of developments as they come out and determine which technologies are useful and worth spending my time on, and which ones I am equipped to handle, for my personal use-cases.
    I have and will continue to recommend your channel to friends and co-workers.
    Seriously man when I see your name, I click. Thank you for continuing to do what you do.

  • @AhmedMagdy-ly3ng
    @AhmedMagdy-ly3ng 3 months ago +1

    I would be more than happy to see you testing it on real-world examples: not complex tasks, just everyday ones, like summarizing a bunch of PDFs or doing some research, things like that.
    And I also need to say that I really appreciate your work ❤

  • @justjosh1400
    @justjosh1400 4 months ago +1

    Can't wait for the tutorial. Wanted to say thanks for the videos Matthew.

  • @marshallodom1388
    @marshallodom1388 4 months ago +7

    Computer! Computer?
    [Handed a mouse, he speaks into it]
    Hello, computer.
    The Dr. says just use the keyboard.
    Keyboard. How quaint.

  • @reidelliot1972
    @reidelliot1972 4 months ago +1

    Yes, tutorial please! Please elaborate more on the relationship to CrewAI-like frameworks and potential implications for the rumored YAML endpoints!

  • @iwatchyoutube9610
    @iwatchyoutube9610 4 months ago +5

    I was waiting for your own test the whole video. Git'r done son!

  • @BThunder30
    @BThunder30 4 months ago +2

    This is amazing. I think you need a team to help you set it up fast. We want to see a demo!

  • @darwinboor1300
    @darwinboor1300 4 months ago

    Thanks Matt.
    The change-the-background task is like an Optimus real-world task. Using the mouse requires a collection of basic motion skills (e.g. move in XY, click right/left, scroll up/down, etc.). Moving and activating the mouse on a screen are simple subtasks necessary to build actual real-world tasks (on the PC these basic skills, subtasks, and more can be accomplished using AutoHotkey). The reactive sequence of mouse subtasks (including motions) is the equivalent of FSD navigating from location A to B in the real world, or Optimus stepping through a set of real-world subtasks to complete a real-world task.
    The advantage for a change-the-background AI is the paucity of edge cases that make real-world tasks so difficult for Optimus and for FSD. All three AI systems need to evaluate the real-world changes they evoke before executing the next subtask. Optimus and FSD repeatedly face infinite real-world variations between subtasks, introduced by independent external agents (cars, animals, fallen trees, etc.). The change-the-background AI will mostly face changes due to software upgrades and different starting states. Most computer issues can be resolved by deeper searches on the web. AutoHotkey can programmatically solve simple issues (hiding open windows). Having an AI to navigate the process would fundamentally change the ability to execute complex computer tasks from simple sequences of verbal commands.
    Here is an example: convert the most recent Matt Berman YouTube video to mp4, then extract unique screenshots to a PowerPoint file and the YouTube transcript, without timestamps, to a text file. The filename for each file is MB1.
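The evaluate-then-act subtask loop described above can be sketched as data plus a dispatcher. Everything here is invented for illustration (the step names, the coordinates, the `dry_run` executor); a real agent would route each primitive to something like pyautogui's `moveTo`/`click`/`typewrite` or an AutoHotkey script instead of just validating it:

```python
# A task is a reactive sequence of primitive mouse/keyboard subtasks.
# Each primitive is (action, args); the executor checks the result
# between steps instead of blindly continuing, mirroring the
# evaluate-before-next-subtask loop described above.

def execute(plan, dispatch):
    """Run each subtask through `dispatch`; stop at the first failure."""
    log = []
    for action, args in plan:
        ok = dispatch(action, args)
        log.append((action, ok))
        if not ok:
            break  # hand control back to the planner to re-plan
    return log

def dry_run(action, args):
    """Stand-in executor: accepts only known primitives.
    A real version would perform the action via pyautogui (assumption)."""
    return action in {"move", "click", "scroll", "type"}

# Hypothetical change-the-background plan.
change_background = [
    ("move", (640, 700)),      # hover over the desktop
    ("click", ("right",)),     # open the context menu
    ("move", (700, 820)),
    ("click", ("left",)),      # choose "Change background..."
    ("type", ("mountains",)),  # search for a wallpaper
]

print(execute(change_background, dry_run))
```

The point of the shape is that the plan is plain data: the LLM only has to emit the next `(action, args)` pair, and the same executor handles verification for every task.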

  • @ACTION-PLAY-SAFARI
    @ACTION-PLAY-SAFARI 4 months ago +2

    Always awesome and informative videos, Matt, love it brother. I feel that much smarter after watching them. Keep up the awesome work!

  • @DefaultFlame
    @DefaultFlame 3 months ago +1

    Nice! I'd love to see you test it out.

  • @joe_limon
    @joe_limon 4 months ago +11

    How long until I have a locally run agentic system that can install all future improved agentic systems and/or GitHub projects autonomously?

    • @fullcrum2089
      @fullcrum2089 4 months ago +2

      With this, a person's ideas, dreams and personalities can become immortal.

    • @nickdisney3D
      @nickdisney3D 4 months ago

      I'd share my repo but I think YouTube deletes it from comments automatically.

    • @electiangelus
      @electiangelus 4 months ago

      Already there. I'm actually past this.

    • @fullcrum2089
      @fullcrum2089 4 months ago

      @@nickdisney3D Yes, I can't see it; just share the repo path/name.

    • @electiangelus
      @electiangelus 4 months ago

      @@fullcrum2089 Apotheosis was thinking that 6 months ago.

  • @AGI-Bingo
    @AGI-Bingo 4 months ago +1

    A new golden age of open source is upon us ❤

  • @nangld
    @nangld 4 months ago +8

    A 20% success rate is a super impressive start. As soon as they iterate on that and train a proper model, it will reach 99%, leading to all office workers getting fired.

    • @andrada25m46
      @andrada25m46 4 months ago

      Yeah, prolly not.
      I use AI at work; I'm one of the few who do. A lot of data is confidential and extra security measures are needed. Something like this breaches contractual agreements, since the AI provider would have access to the data.
      Not to mention proprietary apps running in containers, which the AI wouldn't be able to navigate.

    • @marcussturup1314
      @marcussturup1314 4 months ago +5

      @@andrada25m46 Local LLMs could fix the data access issue.

    • @WolfeByteLabs
      @WolfeByteLabs 4 months ago +1

      This.

    • @stefano94103
      @stefano94103 4 months ago

      @@andrada25m46 All of the big players (Microsoft, IBM, Google) have enterprise software that is data-privacy compliant. The price varies with the solution. The only problem with the enterprise LLMs is that they do not move at the speed of other models, for obvious reasons. But open source or enterprise is the way to go if your company has compliance requirements.

    • @greenleaf44
      @greenleaf44 4 months ago +1

      ​@@marcussturup1314 I feel like people underestimate how possible it is for large businesses to run their own inference

  • @PhoebusG
    @PhoebusG 4 months ago +1

    Yes, def set it up; that would be a good video. Keep up the cool videos :)

  • @gatesv1326
    @gatesv1326 4 months ago

    Very similar to RPA (Robotic Process Automation), which I've been developing for 10 years now. Nothing new, but being able to do this with a typed or vocal prompt is what will be interesting once it gets as good as a human (which is what RPA has been successful at for a long time), especially given that RPA licences are expensive.

  • @OSWALD569
    @OSWALD569 4 months ago

    For performing actions on desktops, a macro recorder is available and suitable.

  • @ayreonate
    @ayreonate 4 months ago

    I think they set the temp @ 1.0 to test how hard it will hallucinate if given more creative freedom, then added it to the presentation just to show off

  • @galaxymariosuper
    @galaxymariosuper 4 months ago

    16:40 Think of temperature as maneuverability: the higher it is, the more flexible the system, which is basically a closed-loop control system at this point.

  • @buggi666
    @buggi666 3 months ago

    Soooo we basically arrived at reinforcement learning using LLMs? That sounds so awesome!

  • @DailyTuna
    @DailyTuna 3 months ago

    I think as this evolves it's time for somebody to create a Linux system that would work directly with this. You need an operating system catering directly to the agents.

  • @DonDeCaire
    @DonDeCaire 4 months ago

    This is why simulated data is so important: if you can replicate real-world environments, you can test an infinite number of environmental conditions an infinite number of times.

  • @mshonle
    @mshonle 4 months ago

    16:38 It depends on the specific formula used for the temperature setting, so a 1 here is by no means the maximum. The use of top-p implies nucleus sampling is being used, which prevents the most improbable completions from even being considered. They are looking for a wider sampling to establish a baseline; setting the temperature too low would create more repetitive results (repeats across different runs, and also repeating the same phrase in a single run until the context is full) and thus would be too easy to dismiss as a strawman.
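The interaction between temperature and top-p described above can be made concrete. This is a generic sketch of the standard softmax-temperature and nucleus-sampling formulas, not OSWorld's code; the toy logits are invented:

```python
import math

def softmax_with_temperature(logits, temperature):
    """p_i ∝ exp(logit_i / T): T < 1 sharpens the distribution,
    T > 1 flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. This is the nucleus-sampling cut that
    removes the long improbable tail."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.2, -1.0]  # toy scores for 4 candidate tokens
for t in (0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(x, 3) for x in probs], sorted(top_p_filter(probs)))
```

With these logits, T = 0.5 concentrates mass on the top token and the 0.9 nucleus shrinks to two tokens, while at T = 1.0 a third token survives the cut, which matches the "wider sampling with the tail clipped" reading above.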

  • @systemlord001
    @systemlord001 4 months ago

    I think temp is set to 1 because if it fails and makes another attempt, it will try different approaches. When temp is set to lower values it might not get to a working solution, because the attempted methods are not divergent enough to contain a valid one.
    But I think having an LLM fine-tuned on datasets generated by humans in the format of OSWorld (the tree, screenshots, etc.) could improve the success rate.
    If I am not mistaken, this is what Rabbit R1 was doing. It's basically teach mode, but with more examples than just the one you give it.

  • @ThomasEWalker
    @ThomasEWalker 4 months ago

    Cool - This is moving SO fast! I think we will get AIs with the ability to recognize what is on the screen more directly, much like a self-driving car sees the world. This would become 'go click the button that does X', without screenshots. I bet that happens this year. Real world agents with AGI for a Christmas present!

  • @davidhoracek6758
    @davidhoracek6758 4 months ago

    This only needs to work once and you basically have a universal installer. Soon you'll just tell a computer "make the latest Stable Diffusion (or whatever) work on my computer, including all the hardware-specific optimizations that apply to my specific system." Then it just needs to bootstrap in the newest interaction AI for my OS, have a little conversation with the system, try promising settings, come up with others if those fail, and (importantly) update the weights of the remote installer system based on the successes and errors of this particular interaction.

  • @arinco3817
    @arinco3817 4 months ago

    This is really interesting. I've been thinking for ages about how to go from vllm to action. It's a bit like someone sitting in front of your computer while you describe what you want to happen.

  • @MeinDeutschkurs
    @MeinDeutschkurs 4 months ago +1

    Temperature of 0.1 could lead to “I cannot click, I’m just an LLM.”

  • @ThomasTomiczek
    @ThomasTomiczek 3 months ago

    I think a lot of the current problems are training: if GPT-5 is trained on videos from YouTube, and that includes a lot of videos of people USING THE COMPUTER, the AI may be more prepared for this.

  • @rupertllavore1731
    @rupertllavore1731 4 months ago

    Nice to see you getting brand deals! May your channel keep getting more of them!

  • @DamielBE
    @DamielBE 3 months ago

    hopefully one day we'll get agents like the Muses in Eclipse Phase or the Alt-Me in Peter F Hamilton's Salvation trilogy

  • @youjirogaming1m4daysago
    @youjirogaming1m4daysago 4 months ago

    Taking a screenshot and guessing is an impractical implementation. For desktop agents to truly work we would have to create new APIs that directly alter the desktop state, and the best operating system for this right now is Linux. But if macOS and Windows also provide them, I think agents could make a significant impact.

  • @2106chrissas
    @2106chrissas 3 months ago

    Great project!
    It would be interesting to have a video on RAG and the programs available for it (e.g. h2oGPT).

  • @dafunkyzee
    @dafunkyzee 3 months ago +1

    I strongly feel this is completely the wrong way to go about using agents. I respect that the project is basically "use what we've got": we have Windows, we have macOS... so now we want an agent to figure out how to use these interfaces to get things done. But that idea is wrong, because the OS is designed as a human interface to the machine. I'm working on an AI-based OS where the agents work directly with the kernel to get shnizz done. Still, hats off to the team for trying this round of experimentation to see the limits and capabilities of agents in their current form.

  • @mikezooper
    @mikezooper 4 months ago

    Copilot on Windows already allows control of the OS. For example, you can ask it to switch to night mode and it will.

    • @slomnim
      @slomnim 4 months ago

      That's pretty simple compared to where this project is going. Maybe soon yeah Microsoft will have copilot do some of this stuff but so far this seems like the first real attempt

  • @ScottSummerill
    @ScottSummerill 4 months ago

    Actually your video, specifically the table, convinced me that agents, at least in this iteration, are not all that spectacular. They will likely get there, but right now it's a lot of hype.

  • @monnef
    @monnef 4 months ago

    Very nice project. I would find it interesting to see success rates on different OSes (or, in the case of Linux, even DE/WM). Also GUI vs CLI: I can imagine that on some tasks the CLI would be king, while on others it could fail miserably. Still, it could be useful to see which use cases suit different OSes or GUI/CLI better, and which might be worth trying to apply an AI to.

  • @moses5407
    @moses5407 4 months ago

    Great presentation! Too bad the accuracy levels are currently so low but this seems to be a framework that can self-grade and, hopefully, self-adjust for improvement.

  • @RomeoTheOptimist
    @RomeoTheOptimist 4 months ago

    Restricting a model to output code only often reduces accuracy, especially on complex tasks. It's worth allowing it to print a chain of thought (even better if there is a self-critical inner-dialogue loop) and then output the final code piece.

  • @nqnam12345
    @nqnam12345 4 months ago +1

    Great! Please, more on this topic!

  • @beckettrj
    @beckettrj 4 months ago

    OSWorld project videos please! This could be a series of videos.
    I could see this helping me do my job five times faster! A helpdesk support tool: check and update the XYZ application user account, then email the user letting them know we have updated their account and that they should be able to log in. There are complicated processes, such as opening a VPN connection, checking Active Directory account settings, and then logging into administrative programs to search for and open a user's account to check their settings. The user account settings in Active Directory must match the user login settings in the application(s). Email the findings and let them know what was altered or changed, etc.

  • @Maisonier
    @Maisonier 4 months ago

    This is great! I'm going to wait for a Linux distro that has these agents built-in to automatically configure Wi-Fi, printers, drivers, or even VMs with Windows (for specific programs that don't work in Wine).

  • @yenielmercado5468
    @yenielmercado5468 4 months ago

    Excited for the coming Humane AI Pin agents feature.

  • @francoislanctot2423
    @francoislanctot2423 4 months ago

    Thanks! Yes, please install it and show us the procedure. I think it is going to be useful for a lot of people.

  • @DaveEtchells
    @DaveEtchells 4 months ago

    I guess this is interesting, but I don’t understand why I should be so excited about it over Open Interpreter.
    The need to have predefined accessibility for the apps seems very limiting and a purely transitional step.
    In the relatively near term, AI agents will just interact directly with UI elements, figuring out what they need to do based on what they see on the screen. In the case of mainstream apps, they’ll know the general operation from their training, so will have little to deduce in specific instances, just as you can ask ChatGPT how to do things in Excel, etc.
    Longer term there may be direct hooks for AIs built in, but I don’t know to what extent that’ll make sense, as inference costs plummet.

  • @ayreonate
    @ayreonate 4 months ago

    Maybe the LLMs are vastly better at the daily and professional tasks because that's what's widely available online, aka their training data, while workflow-based tasks don't have as many resources. Case in point, the example they used (viewing photos of receipts and logging them in a spreadsheet) won't have the same amount of online resources as daily or professional tasks.

  • @clapclapapp
    @clapclapapp 4 months ago

    When you have agents you must make sure the temperature is very low, because you don't want them doing crazy things...

  • @alpineparrot1057
    @alpineparrot1057 4 months ago

    I enjoy your content, Matt. You put me on to LM Studio, then Ollama, then CrewAI. CrewAI has excellent use cases, so thank you so much. Could you please do some more stuff with CrewAI? I have mine set up in the one-file approach, but I'm not too sure how to set it up with multiple files and calls to and from them (I'm not too familiar with Python; ChatGPT is excellent help, but it still only goes so far).

  • @xxxxxx89xxxx30
    @xxxxxx89xxxx30 4 months ago

    Interesting take, but again, trying to be too general. I am curious whether there is a team working on a real "AI OS": not using screenshots and these half-solutions, but actually having predefined built-in functions that control the device through code and track progress the same way to do the "grounding" step.

  • @tigs9573
    @tigs9573 4 months ago

    Yes, I would like to learn more about OSWorld. Keep up the great content!

  • @LauraMedinaGuzman
    @LauraMedinaGuzman 4 months ago

    Amazing! I want to try it for Revit, a software package for architecture. Actually I did try something that worked! However I truly need more knowledge, so your help is very, very appreciated. Thanks!

  • @jeffsteyn7174
    @jeffsteyn7174 4 months ago

    The Windows OS is not a closed system. It has an API for everything an agent needs for grounding, including which apps are open and which are active.

    • @ZuckFukerberg
      @ZuckFukerberg 4 months ago

      Please expand on your answer, as I have tried to automate a lot of stuff through PowerShell and it simply lacks many options and commands for actions often performed by standard users.

  • @johnkintree763
    @johnkintree763 4 months ago

    I want the digital agent in my phone to download my monthly invoice from the electric utility, merge that and other data I want recorded publicly into a decentralized graph representation that is maintained in collaboration with digital agents running in other personal devices to create a shared world model for planning collective action.

  • @waqaskhan-uw3pf
    @waqaskhan-uw3pf 4 months ago +1

    Please make a video about Romo AI (super AI tools in one place) and Learnex AI (the world's first fully AI-powered education platform). My favorite AI tools.

  • @CharlesFinneyAdventure
    @CharlesFinneyAdventure 4 months ago

    I would love to watch you set up OSWorld on your own machine, test it out, and create a tutorial from it.

  • @scottwatschke4192
    @scottwatschke4192 4 months ago

    Very interesting. I would love a testing video.

  • @dilfill
    @dilfill 4 months ago

    Would love to see you test this out doing a few different tasks! Also curious if this could run someone's social media etc.

  • @Daniel-jm8we
    @Daniel-jm8we 4 months ago

    It's more advanced than the starship Enterprise. They have to use people to push buttons.

  • @ExodeApplicationsInc
    @ExodeApplicationsInc 4 months ago

    Basically a screen scraper (RPA: Robotic Process Automation).

  • @settlece
    @settlece 4 months ago

    I would definitely like to see more OSWorld.
    Thanks for bringing this exciting news to us!

  • @sergedeh
    @sergedeh 4 months ago

    The next level is an AI as the gateway to the whole OS.
    I am working on it with AndyAi.
    Using the mouse to get the AI to control the system is really the hardest way to do it...

  • @BelaKomoroczy
    @BelaKomoroczy 4 months ago

    Yes, test it out, go deeper, it is a very interesting project!

  • @jamalnuh8565
    @jamalnuh8565 3 months ago

    Always update us like this, especially on the new research papers.

  • @ThinkAI1st
    @ThinkAI1st 4 months ago

    You are a very good teacher…so keep teaching.

  • @ma77yg
    @ma77yg 4 months ago

    It would be interesting to have a tutorial on this setup.

  • @christopheboucher127
    @christopheboucher127 4 months ago

    Of course we want to see more about that ;) thx 4 all

  • @yugowatari2935
    @yugowatari2935 4 months ago

    Yes.. please do a tutorial on OSWorld. Have been waiting for this for some time.

  • @infj5196
    @infj5196 2 months ago

    Much of this AI research was done by Chinese people.
    They are an amazing group of people. I appreciate those intelligent minds.

  • @canadiannomad2330
    @canadiannomad2330 4 months ago

    On Linux there is the X server. I've been thinking it would be neat to plug a system into the X server backend and have an LLM communicate with that directly... It bypasses most visual interpretation, except what is actually rendered as graphics.

  • @paketisa4330
    @paketisa4330 4 months ago

    Considering a project where a person documents daily experiences, thoughts, feelings and personal history in a diary specifically for a future AGI’s learning. Do you think such a personalised dataset could enhance an AGI’s ability to understand and interact with individuals on a deeper level? And lastly, is it feasible to expect an AGI to become a close, personal companion based on this method, or would it somehow be redundant useless data? Thank you for the answer.

  • @wardehaj
    @wardehaj 4 months ago

    Great explanation video. Thanks a lot!

  • @Treewun2
    @Treewun2 4 months ago

    Please do a series on Fine Tuning open source models!

  • @albertkim1809
    @albertkim1809 4 months ago

    I think low temperature means most predictable. So the creators setting a low temperature makes sense.

  • @apester2
    @apester2 4 months ago

    NVIDIA is gonna start running this inside Isaac Sim. And then agents can improve agents. 😱

  • @tanuj.mp4
    @tanuj.mp4 4 months ago +1

    Please create an OSWorld Tutorial

  • @oratilemoagi9764
    @oratilemoagi9764 4 months ago +1

    Did you see Apple's new open-source LLM?

  • @japneetsingh5015
    @japneetsingh5015 4 months ago

    I am already waiting for a Linux where I could enter commands in natural language, the LLM generates a set of possible commands, and I just have to choose one or make a minor change.

  • @lighteningrod36
    @lighteningrod36 3 months ago

    Or maybe conversational AI with a bunch of connectors and RPA?

  • @ktolis
    @ktolis 4 months ago

    It will be interesting to see ReALM getting benchmarked.

  • @scotter
    @scotter 4 months ago

    With regard to the difficulty of an AI accessing the desktop, is there an exception if we're talking about just manipulating a browser window through Selenium?

    • @byrnemeister2008
      @byrnemeister2008 4 months ago +1

      You can build tools for an agent using Selenium as a browser automator. There are also RPA apps like Power Automate.

  • @tonysolar284
    @tonysolar284 4 months ago

    I already have this. My AI controls my home with my special logic prompt.

  • @AbstruseJoker
    @AbstruseJoker 4 months ago

    I fully believe that using the keyboard and mouse is the wrong approach for agents. They should primarily control the OS via code.

  • @adtiamzon3663
    @adtiamzon3663 3 months ago

    Good start. Excellent. 🤫 🌞👏👏

  • @THOR_THE_GOD
    @THOR_THE_GOD 4 months ago

    Could an uncensored AI like Dolphin hack/code an AI like this into an upgraded uncensored model trained for automated tasks over the Tor network? Could such an AI theoretically be loaded onto Tails, now or in the near future?
    I'm fascinated by how AI agents, especially AGI, would interact with the dark web in private. As we all know, most AI in its current form is heavily censored and subject to surveillance.

  • @marcfruchtman9473
    @marcfruchtman9473 4 months ago

    Thanks for the video! Yes, this seems like it will be very useful.

  • @DailyTuna
    @DailyTuna 3 months ago

    He set up a Windows machine with a partition for stuff like this. I wouldn't put it on my regular system.

  • @Copa20777
    @Copa20777 4 months ago +1

    Thank you for your journalism, Matthew.. we ❤ you bro, from Africa. Blessed Sunday everyone.. sipping my coffee on this one.

  • @luxaeterna00
    @luxaeterna00 3 months ago

    Any link to the presentation? Thanks!

  • @RupertBruce
    @RupertBruce 4 months ago

    Sounds like pyautogui is doing the heavy lifting. AutoIt might have more scripts available to train on...

  • @Aiworld2025
    @Aiworld2025 4 months ago

    Can you please remove the circle in future videos, in the text, so I can follow along reading and paying attention?

  • @echonomix_
    @echonomix_ 3 months ago

    Speedrunning into the apocalypse no-more-brakes% WR

  • @amigaworkbench720
    @amigaworkbench720 4 months ago

    It shouldn't be hard to connect an AI to the Linux terminal and logs: if anything happens, the AI reads the log and does a terminal check, or creates scripts/commands to run in the terminal...