Building an AI assistant that listens and sees the world (Step by step tutorial)

  • Published 5 Sep 2024
  • I built an AI assistant that listens to my commands and uses my webcam to understand the world around me. I used Python to build it, and this video will give you a step-by-step overview of how it works.
    Here is the link to the source code: github.com/svp...
    I teach a live, interactive program that'll help you build production-ready Machine Learning systems from the ground up. Check it out here:
    www.ml.school
    To keep up with my content:
    • Twitter/X: / svpino
    • LinkedIn: / svpino
    🔔 Subscribe for more stories: / @underfitted

COMMENTS • 81

  • @michaelduffy5309
    @michaelduffy5309 3 months ago +10

    Magic. Another demonstration of the fact that computer science is just piling layers of abstraction on top of each other. Astounding. Well done.

  • @HowardKeziah
    @HowardKeziah 1 month ago +2

    I love how you explain your code!

  • @AdnanAli
    @AdnanAli 3 months ago +4

    Thank you so much for your hard work and passionate explanation. More than the code walkthrough, what I learn most from you is how to break down a use case into the parts of the workflow.

  • @caseyhoward8261
    @caseyhoward8261 2 months ago +2

    You're literally building my project faster than I can! 😂❤️
    You mentioned tipping $5... lol. 100% you'll hear back from me with way more gratitude than just $5. 😉 You're giving me hope!

  • @generichuman_
    @generichuman_ 2 months ago +1

    Also, if anyone wants longer responses from the LLM but doesn't want to sacrifice speed, you can stream the tokens, and have a function running on a separate thread that chunks these into sentences and adds them to a queue, then you can have another function on a separate thread checking this queue and processing them into speech. This way, the bot can start speaking while the streaming is still going on.
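
A minimal sketch of the approach described in the comment above, using only the Python standard library. It is an illustration under assumptions, not code from the video: `fake_token_stream` is a hypothetical stand-in for a streaming LLM response, and the `speak` callback is a hypothetical stand-in for whatever text-to-speech call you use.

```python
import queue
import re
import threading

# A sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_DONE = object()  # sentinel that marks the end of the stream


def chunk_sentences(tokens, sentences):
    """Read streamed tokens and push each complete sentence onto the queue."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                sentences.put(sentence.strip())
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        sentences.put(buffer.strip())
    sentences.put(_DONE)


def speak_sentences(sentences, speak):
    """Pull sentences off the queue and hand them to the speech callback."""
    while True:
        sentence = sentences.get()
        if sentence is _DONE:
            break
        speak(sentence)


def run_pipeline(tokens, speak):
    """Run the chunker and the speaker on separate threads."""
    sentences = queue.Queue()
    chunker = threading.Thread(target=chunk_sentences, args=(tokens, sentences))
    speaker = threading.Thread(target=speak_sentences, args=(sentences, speak))
    chunker.start()
    speaker.start()
    chunker.join()
    speaker.join()


def fake_token_stream():
    """Hypothetical stand-in for a streaming LLM response."""
    yield from ["Hello", " there", ".", " I", " can", " see", " a", " cup",
                "!", " Anything", " else", "?"]


if __name__ == "__main__":
    # print stands in for a real text-to-speech call.
    run_pipeline(fake_token_stream(), print)
```

Because the speaker thread starts consuming as soon as the first sentence is queued, speech can begin while later tokens are still arriving. The regex split is deliberately naive and will break on abbreviations such as "e.g."; a real assistant would likely want a more careful sentence splitter.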

  • @HyperUpscale
    @HyperUpscale 3 months ago +3

    Sweet! Amazing Video!
    Can you create a second video like this one, but this time only with local models running on ollama, faster whisper, etc.?

  • @youmnaification
    @youmnaification 2 months ago +1

    I really like how you explain what the code is doing. Looking forward to watching other tutorials.

  • @sergeziehi4816
    @sergeziehi4816 3 months ago +1

    The way you explain that code!!!! Goose Bumps

  • @Van-Helssen
    @Van-Helssen 1 month ago +1

    Maestro! Very well explained Santiago, mis 10+ 😊

  • @thebluefortproject
    @thebluefortproject 3 months ago +1

    This has some awesome, awesome, awesome applications.
    Thanks for sharing this knowledge, man

  • @sirishkumar-m5z
    @sirishkumar-m5z 15 days ago

    It's a fascinating quest to build an AI assistant that can see and hear the world! There are more tools available if you want to go further; they could provide new features and advantages to improve your AI projects. #TechTutorial #Innovation #AIAssistant

  • @davieslacker
    @davieslacker 3 months ago

    Love your tutorials. You keep it so straightforward and simple while still explaining what each part does.

  • @wagnerpazsc
    @wagnerpazsc 3 months ago

    This is amazing! I was able to get it running on Windows easily. Thanks very much. I'll post my demo on YouTube and tag you! One issue I found is that it sometimes catches background sounds and transcribes them into weird sentences.

  • @souravbarua3991
    @souravbarua3991 3 months ago +1

    I am also working to build this kind of project. In my case it's a chat process. This video will be helpful. I am trying to use a multimodal model that supports video, which will make the process easier. I am gonna do it.

  • @cherdak_turista
    @cherdak_turista 3 months ago

    Good afternoon! This was a very nice educational video. Not too short, not too long an example, just enough to understand the idea. I'd also like to mention your passion and interest in the work you are doing! Just thank you, and please do more)

  • @JustWonderingAloud
    @JustWonderingAloud 2 months ago

    Like? Whoa!! Bro? 😳😍Thanks for taking the time to share. Much appreciation. Subscribed!!!

  • @DarkXappHiRe
    @DarkXappHiRe 1 month ago

    As soon as I saw this video, I knew I had to subscribe ASAP. I tried it and it's up and running, though I had some errors because I am not a Python programmer. But I want to use it to build an AI that can be installed on a vessel (ship). Thanks for sharing ❤

  • @toddroloff93
    @toddroloff93 3 months ago +1

    This is AWESOME!!! "goose bumps"👍

  • @AdnanAli
    @AdnanAli 3 months ago +2

    Trying it out with local Llava model which is a multimodal one available with Ollama. Getting a ValueError saying image_url can only be string.

  • @AdnanAli
    @AdnanAli 3 months ago +1

    In the source code, if you want to use an OpenAI model, you'll need to import ChatOpenAI.

  • @sarahroark3356
    @sarahroark3356 3 months ago

    Wow, I just saw Wes Roth's demo of it. Incredible.

    • @mickelodiansurname9578
      @mickelodiansurname9578 3 months ago +2

      I'm in the middle of integrating this into CrewAI and AutoGroq.... with the Autogroq model acting as the main agent assistant that is able to use crewai as its agency over a given task... it seems simple enough (famous last words) ... two classes in this script... although the alloy voice stinks... in fact all the openAI voices stink and Elevenlabs are still my go to.

    • @sarahroark3356
      @sarahroark3356 2 months ago

      @@mickelodiansurname9578 Oh my, you'll have to let me know how well it plays with CrewAI and Autogroq! If you can pull that off, should be awesome.

    • @christianweyer74
      @christianweyer74 8 days ago

      @@mickelodiansurname9578 did you succeed with your demo/project?

  • @sgatea74
    @sgatea74 3 months ago

    Excellent! Fantastic explanation of the code! Thank you Santiago!

  • @TomHermans
    @TomHermans 2 months ago

    very cool idea. well executed and great explanation. subscribed

  • @orenozeri
    @orenozeri 2 months ago

    Kudos for your clear explanation, inspiring 🤘

  • @raulmarusca
    @raulmarusca 2 months ago +1

    Thank you!

  • @techmumus6780
    @techmumus6780 2 months ago

    Great video Santiago!! Congratulations!

  • @jmmbuthia
    @jmmbuthia 3 months ago

    Very simple but impressive

  • @kellymweu2781
    @kellymweu2781 1 month ago

    Thank you for this. Learnt a lot from you

  • @sada-bokentertainment5790
    @sada-bokentertainment5790 2 months ago

    Your explanation is amazing... Can I use an 11labs voice here?

  • @siddharth5339
    @siddharth5339 2 months ago

    Hi, I really liked your video. I took inspiration from it and made the same thing without the camera integration, using the local LLM llama3 to run it. I could not pay for an OpenAI API key, so I had to make use of that. If there is any new model that can do the webcam integration locally, please recreate this video using it.

  • @user-hr8iz9lb3g
    @user-hr8iz9lb3g 3 months ago

    Keep making more.

  • @lokeshsharma4177
    @lokeshsharma4177 3 months ago

    LOVED IT. God Bless You

  • @germainrodrigue367
    @germainrodrigue367 3 months ago +1

    Amazing 🎉

  • @igmeMarcial
    @igmeMarcial 29 days ago

    wow This is amazing!!!!

  • @SirJohn2024
    @SirJohn2024 3 months ago

    Great demo...😎

  • @Outcast100
    @Outcast100 2 months ago

    I'm not very good at coding, but can I use the LM Studio local server with a vision model instead of OpenAI or Gemini? Also different local models for text-to-audio and audio-to-text. It would be great to have it all local like the Rabbit device... but free. It would be great for people who can't see.

  • @cgtinc4868
    @cgtinc4868 3 months ago

    I am a fan immediately

  • @nachoeigu
    @nachoeigu 3 months ago

    Excellent content!! Amazing! One question: Why didn't you use Langchain in the TTS phase? Does Langchain only support Text-to-Text?

    • @underfitted
      @underfitted 3 months ago +1

      I could have, but it was just one line of code.

    • @nachoeigu
      @nachoeigu 3 months ago

      @@underfitted thank you for clarifying. Keep doing content like this 🙌🏼

  • @iBlackWolfZz
    @iBlackWolfZz 3 months ago

    Incredible

  • @sumitdevraye9725
    @sumitdevraye9725 3 months ago

    This is very amazing

  • @fercjpn
    @fercjpn 3 months ago

    Good stuff!

  • @maloukemallouke9735
    @maloukemallouke9735 3 months ago

    Great job! Thanks a million for sharing.

  • @christopherc168
    @christopherc168 3 months ago +2

    I can't hear you

    • @underfitted
      @underfitted 3 months ago

      What's happening? Is the video too quiet?

    • @christopherc168
      @christopherc168 3 months ago +1

      @@underfitted Yes, I can hear sounds, but not what the sounds are saying.

  • @wadejohnson4542
    @wadejohnson4542 3 months ago

    Nice!

  • @jmmbuthia
    @jmmbuthia 3 months ago

    I swapped out gemini for llama3 and got an error, 'ValueError: Only string image_url content parts are supported.' Seems like swapping out models won't be as straightforward yet despite the goosebumps! Any idea why this happened?

  • @realshyfox5374
    @realshyfox5374 3 months ago

    Could it be used with a local Llava model through webui, which now has an API key?

  • @Sam-oi3hw
    @Sam-oi3hw 3 months ago

    excellent

  • @riemannderakhshan1037
    @riemannderakhshan1037 2 months ago

    Hi Santiago, would it be possible for you to create a project that combines ML with AI methods (preferably with open source tools) to solve a real-world problem? If other channel members are interested in my proposal, they may jump in and provide their comments and suggestions. Thank you in advance.

  • @Dannydrinkbottom
    @Dannydrinkbottom 2 months ago

    You're the Goat 🐐

  • @TooyAshy-100
    @TooyAshy-100 3 months ago

    Thank you,,,

  • @F336
    @F336 2 months ago

    WHY?: RuntimeError: Unrecognized CachingAllocator option: 0
    I hate these issues...😬

  • @mickelodiansurname9578
    @mickelodiansurname9578 2 months ago

    So look if I wanted the AI to JOIN a zoom call or a facetime session or something.... well okay how do we do that?

  • @samcavalera9489
    @samcavalera9489 3 months ago

    Magic 🧝‍♂️ 😅

  • @Toasty-_-12
    @Toasty-_-12 2 months ago

    CC it is then

  • @ANBUGUY
    @ANBUGUY 3 months ago

    u are so cool

  • @Toasty-_-12
    @Toasty-_-12 2 months ago

    I can't hear video

  • @StynerDevHub
    @StynerDevHub 3 months ago

    ❤❤🥳🥳🥳🥳🥳

  • @avataros111
    @avataros111 6 days ago

    I don't see you and I hope I never will. The "Don't recommend channel" option is not working!

  • @JonathanLoscalzo
    @JonathanLoscalzo 3 months ago

    Great PoC | tutorial!
    is langchain >> llamaindex ??? or they are different?

    • @underfitted
      @underfitted 3 months ago +1

      Similar. Pick the one you like the most.

  • @raulmarusca
    @raulmarusca 2 months ago

    Thank you!