How to build a real-time AI assistant (with voice and vision)

  • Published Jul 23, 2024
  • This is a new version of my AI assistant, this time using LiveKit (livekit.io). This is the same platform OpenAI used to build their ChatGPT assistant.
    The source code of my example is here: github.com/svpino/livekit-ass....
    I teach a live, interactive program that'll help you build production-ready Machine Learning systems from the ground up. Check it out here:
    www.ml.school
    To keep up with my content:
    • Twitter/X: / svpino
    • LinkedIn: / svpino
    🔔 Subscribe for more stories: / @underfitted
  • Science & Technology

COMMENTS • 61

  • @toddroloff93
    @toddroloff93 1 month ago +8

    Incredible video. You're taking your content to the next level. Keep up the good work, and thank you for all you do.

  • @sumitdevraye9725
    @sumitdevraye9725 1 month ago +1

    Great video. Keep these coming.

  • @davieslacker
    @davieslacker 29 days ago +1

    Really cool stuff... I def plan to recreate some of these things along with you when I have a bit more time at my computer. Just a thought, adding screen capture in with this would be pretty cool too to get help with whatever applications you're in... I would imagine you could include both camera and screenshot images in the same context and it should be able to distinguish which you're asking about.. or build a different tool that it can function call for that. Can't wait until we get some slightly more expressive voices as an option like OpenAI teased us with.

  • @riemannderakhshan1037
    @riemannderakhshan1037 1 month ago

    You've taken your videos to the next level, which is pretty amazing. I'd like to ask, if possible, that you show us how to use open-source models in these apps. Thank you in advance.

  • @moacirosa
    @moacirosa 26 days ago

    Amazing content with solid explanation. Thanks very much 👏

  • @ronsinolast
    @ronsinolast 15 days ago +2

    Hi, this is great. I tried it and it's working. I introduced myself and then asked, "Do you know my name?" The response was "I'm not able to remember past conversations." So, can we make it remember conversations, and also "remember" my face?

  • @jimmywang6177
    @jimmywang6177 1 month ago

    very interesting! thank you!

  • @huangphoenix
    @huangphoenix 10 days ago

    Great video, keep going. Just wondering if you can add a barge-in function?

  • @edgarl.mardal8256
    @edgarl.mardal8256 1 month ago

    Hi, I'm working on a closed LAN using peer-to-peer, and I plan to add a live AI agent, stored locally, that gets its knowledge from an LLM. Is it possible to run this kind of system without using the internet?

  • @jameszhang2832
    @jameszhang2832 21 days ago

    Fantastic, thank you very much. How would you adapt your code if you have multiple participants?

  • @user-yh2uz6fd7l
    @user-yh2uz6fd7l 22 days ago

    Great info, thx! Is there a way to use a local LLM (like Ollama, local AI, etc.) on this platform instead of OpenAI?

  • @insitegd7483
    @insitegd7483 28 days ago

    Thank you, it is very interesting.

  • @7BlackJack8
    @7BlackJack8 1 month ago +2

    Can this be used with Google Flash? Thanks for the super content! ❤

  • @rithikkumar7683
    @rithikkumar7683 1 month ago

    I hope we can use Gemini 1.5 Pro. I will try to make these changes in the old code.

  • @dheerajmadaan866
    @dheerajmadaan866 4 days ago

    This was really cool stuff. Thanks for sharing such quality stuff. I ran it in VS Code and it worked. The main problem is the latency. It took about 10s for the conversation. Not sure if it is because of the free account or an issue with their WebSocket API.

  • @jock21341
    @jock21341 29 days ago

    Sir, can you help me figure out why my assistant isn't talking back? Nothing happens, but it's recognizing what I'm saying in the chat.

  • @andriusem
    @andriusem 27 days ago

    Hi, great video! How can I change the source code so that it captures my screen (desktop)? Thanks.

  • @ridhwanbakare3406
    @ridhwanbakare3406 17 days ago

    This is really cool. As someone with Python knowledge, how would you suggest I get started?
    Any roadmaps or videos you've published?

  • @sr.modanez
    @sr.modanez 29 days ago

    Thank you, fantastic video 👏👏👏👏👏👏👏👏👏

  • @AmitMarx-ei8tt
    @AmitMarx-ei8tt 3 days ago

    Got stuck with the API keys; I'm not sure how to set them.

  • @rxWar
    @rxWar 1 month ago

    Nice, man, thanks

  • @dmitrypehovski
    @dmitrypehovski 1 month ago

    Hi, I started testing by following all your steps and got stuck: the text and audio from the OpenAI API are not transferred to LiveKit. All requests go through in the terminal. I tried many solutions... nothing works.

    • @densonsmith2
      @densonsmith2 1 month ago

      I think I may have a similar issue. On Windows, there is some problem with the ffmpeg library.

  • @danieladama8105
    @danieladama8105 1 month ago

    This is great!

  • @lets-makeiteasy
    @lets-makeiteasy 1 month ago

    I cannot code, so can you make a tutorial using ph3, which is free and has vision, and also use visper ai to convert text to speech, plus other free tools, to bring the cost down to zero? I am a student trying out this stuff, and I don't want to pay (or don't have the money to pay) for the API or other things. So please make a tutorial using only free and open-source tools.

  • @densonsmith2
    @densonsmith2 1 month ago

    Has anyone gotten this to work on Windows?

  • @Brou15O
    @Brou15O 16 days ago

    Could I get this on my smartphone?

    • @underfitted
      @underfitted 15 days ago

      As is, no. You’ll need to rewrite it in a phone-friendly language.

  • @abdiasj3692
    @abdiasj3692 13 days ago

    Would love to see how to implement Deepgram TTS instead of OpenAI!

    • @underfitted
      @underfitted 13 days ago +1

      It’s actually very simple: simpler than what I had to do to get OpenAI working

    • @abdiasj3692
      @abdiasj3692 13 days ago

      @underfitted Hey, thanks for replying! This would be awesome! Maybe also using OpenRouter as well! Wild ideas come to mind!

  • @juanmanuelzwiener4447
    @juanmanuelzwiener4447 12 days ago

    Santiago, are the assistant's voices only in English, or also in Spanish? Hugs, champ!

  • @boooosh2007
    @boooosh2007 1 month ago

    Is this functionally any different than your previous video?

    • @underfitted
      @underfitted 1 month ago

      While they work the same for the demo, my previous code is very brittle. This one is much better because I’m using an entire existing infrastructure to support it.

  • @jeff_holmes
    @jeff_holmes 28 days ago

    Curious about the latency. I noticed that you cut the video after each question (after 19:55), so I am assuming it was a few seconds?

    • @underfitted
      @underfitted 27 days ago

      It wasn’t bad, but GPT-4o is not as fast as it could be, so you definitely have to wait a second or so for an answer.

    • @vesalaasanen2158
      @vesalaasanen2158 21 days ago

      @underfitted, it would be nice to include at least one answer in real time so we'd get a more realistic picture of it.

  • @sharplcdtv198
    @sharplcdtv198 21 days ago +1

    Your code generally doesn't run in VS Code on Windows... Some things seem platform-dependent, unfortunately.

    • @underfitted
      @underfitted 21 days ago

      I don’t think it’s a problem with my code… it’s a problem with Windows. Try WSL.

  • @aaronwenniger7966
    @aaronwenniger7966 26 days ago +2

    Now I keep running into trouble when using this code.
    I would love to be able to discuss this so I can get it fixed. I want to implement some features to see if it can work for something else too.

    • @AI_by_AI_007
      @AI_by_AI_007 26 days ago +1

      Yes, the API keys do not pass. What are you experiencing?

    • @aaronwenniger7966
      @aaronwenniger7966 26 days ago

      @AI_by_AI_007 Hi, yes.
      I had to rework the code a little bit to get everything working again.
      And now it's working great, except that the AI's voice is not working and I cannot give voice commands anymore.

    • @jarvisperaudon
      @jarvisperaudon 26 days ago

      @aaronwenniger7966 What did you do for the LiveKit API key?

    • @jarvisperaudon
      @jarvisperaudon 26 days ago

      What about the LiveKit API key?

    • @aaronwenniger7966
      @aaronwenniger7966 26 days ago +1

      @jarvisperaudon ?

  • @reynoldoramas3138
    @reynoldoramas3138 28 days ago

    Hi Santiago, greetings from Cuba. I just saw on your GitHub profile that you're a fellow countryman. Your content is very valuable; I'm an AI engineer trying to get ahead in this world. I would love to get in touch with you and help you with a project.

  • @aidanthompson5053
    @aidanthompson5053 1 month ago

    2:38

  • @jarvisperaudon
    @jarvisperaudon 27 days ago

    Hey, I have an issue with the LiveKit API key: it gives me an error saying it's invalid.

    • @AI_by_AI_007
      @AI_by_AI_007 26 days ago +1

      Me as well. Are you on Windows or Mac when you try this?

    • @jarvisperaudon
      @jarvisperaudon 26 days ago

      @AI_by_AI_007 Windows

    • @jarvisperaudon
      @jarvisperaudon 26 days ago

      @AI_by_AI_007 Windows

    • @rahahoseini1523
      @rahahoseini1523 8 days ago

      @AI_by_AI_007 How can I access the API keys? Could you please tell me step by step?