Creating JARVIS - Python Voice Virtual Assistant (ChatGPT, ElevenLabs, Deepgram, Taipy)

  • Published Jul 9, 2024
  • Check out the GitHub repository here:
    github.com/AlexandreSajus/JARVIS
    0:00 Talking to JARVIS
    0:58 Intro
    1:52 How JARVIS works
    3:12 How to setup JARVIS
    4:05 Getting API keys
    5:05 Installing JARVIS
    6:49 Running JARVIS
    7:44 Talking to JARVIS
    9:18 How to mod JARVIS for your use case
    10:45 Recording audio using Pyaudio
    12:25 Transcribing to text using Deepgram
    12:45 Sending prompts to OpenAI GPT
    13:14 Changing JARVIS' personality (context)
    14:10 Generating voice using ElevenLabs
    14:50 Playing audio using Pygame
    15:15 Displaying the convo in a webpage with Taipy
    16:40 Use cases and limitations
  • Science & Technology

COMMENTS • 174

  • @joeternasky
    @joeternasky 6 months ago +6

    Fantastic project. Love how you connected these services and packages together. Thanks for going over the project, posting this video, etc. I learned quite a bit.

  • @dwilson7230
    @dwilson7230 5 months ago +2

    Bro this is sick as hell! Thanks for posting a video about it.

  • @isagiyoichi5207
    @isagiyoichi5207 5 months ago +2

    this is actually really incredible thanks for the video

  • @chrsl3
    @chrsl3 6 months ago +1

    Fantastic work and video, thank you!!

  • @iandanforth
    @iandanforth 6 months ago +2

    Impressive! One key bit of the UX of ChatGPT mobile is the "clicks" that indicate when the model has 1. stopped listening and 2. stopped talking. A very small touch that makes a world of difference.

    • @alexandresajus
      @alexandresajus  6 months ago

      Yes I should definitely find better ways to convey to the user when he is being listened to

  • @xgodwhitex
    @xgodwhitex 6 months ago +1

    Amazing job!

  • @gr8tbigtreehugger
    @gr8tbigtreehugger 3 months ago +2

    Many thanks for this super helpful tutorial! My next step is voice ID, so the AI knows it's me!

  • @mikew2883
    @mikew2883 6 months ago +1

    Good stuff! 👍

  • @painperdu6740
    @painperdu6740 6 months ago +1

    LETS GOOO NEW ALEXANDRE SAJUS VIDEO I CLICK LIKE I SUBSCRIBEEE

  • @Threecommaaclub
    @Threecommaaclub 5 months ago +1

    Hey Alex, I'm using a Linux device running Python 3.11 in a venv. When I try to run main.py I get the error "No module named pyaudio". I tried the simple command pip install pyaudio, but that command fails with "Could not build wheels for pyaudio, which is required to install pyproject.toml-based projects". I was hoping you might share some insight into why this is happening. Great video btw, I await your speedy response :)

    • @alexandresajus
      @alexandresajus  5 months ago +1

      Were you able to solve this by creating a new virtual environment? Otherwise, I have no idea how to fix this; let me know if you find a solution

    • @Threecommaaclub
      @Threecommaaclub 5 months ago +1

      @@alexandresajus yeah man, we were able to make it happen once we used the virtual env. Thanks again

    • @alexandresajus
      @alexandresajus  5 months ago

      @@Threecommaaclub Perfect!

  • @muhammadilyasrasyid5817
    @muhammadilyasrasyid5817 6 months ago +1

    thank you very much sir

  • @marouane9682
    @marouane9682 6 months ago +1

    i love it maaaaaaaan thank u for sharing .. pls keep sharing with us ur magic

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Thank you!

    • @marouane9682
      @marouane9682 6 months ago +1

      @@alexandresajus brother, help me please with my question: how can I make JARVIS transcribe and talk in French instead of English?

    • @alexandresajus
      @alexandresajus  6 months ago +1

      @@marouane9682 This should not be too hard; you just need to add a few parameters for Deepgram and ElevenLabs. For ElevenLabs, just change the voice parameter to "Pierre" or another French voice at line 116 of main.py. For Deepgram it is a bit more complicated: you will have to add a PrerecordedOptions parameter at line 72 of main.py which contains a language="fr" parameter. It's a bit too much to write in a comment, so I invite you to take a look at the Deepgram docs (github.com/deepgram/deepgram-python-sdk/blob/main/README.md). Let me know if you need more help
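
The reply above can be sketched concretely. This is a minimal, hypothetical sketch: the SDK calls are shown as comments so it runs without deepgram-sdk or elevenlabs installed, and the line numbers and the "Pierre" voice name come from the reply, not from checking the repo.

```python
def french_settings():
    """Return the option values JARVIS would need for French, as plain dicts."""
    # Deepgram (around line 72 of main.py), roughly:
    #   options = PrerecordedOptions(model="nova-2", language="fr")
    deepgram_options = {"model": "nova-2", "language": "fr"}

    # ElevenLabs (around line 116 of main.py): swap the voice parameter
    # to a French voice, e.g. voice="Pierre".
    elevenlabs_voice = "Pierre"

    return deepgram_options, elevenlabs_voice
```

The same pattern applies to any other language Deepgram supports (e.g. language="es" for Spanish, as discussed further down the thread).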

    • @marouane9682
      @marouane9682 6 months ago +1

      @@alexandresajus thank you so much chief

  • @shawnmuok542
    @shawnmuok542 1 month ago

    hello, i have a problem: when i try to run main.py it shows me "no module named deepgram"

  • @JanikJanesch
    @JanikJanesch 2 days ago

    Do you know why there is an error saying I only have 12 characters left but my request needs 42 characters? Even though I have a $20 account balance on ChatGPT.

  • @sebaperalta2001
    @sebaperalta2001 6 months ago +1

    Nice work! Is it possible to have it answering only on activation word? Like if you don't say Jarvis, then it would not answer. So the program is always listening, but activates on context.

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Thanks! Yes this should be easy to do, just add a condition: if the activation word is not in the transcript, continue (restart the loop without answering)
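The wake-word condition suggested above can be sketched as a small helper (names are illustrative, not from the repo):

```python
def should_answer(transcript: str, wake_word: str = "jarvis") -> bool:
    """Answer only when the wake word appears in the transcript."""
    return wake_word.lower() in transcript.lower()

# Inside the main loop (sketch):
#   transcript = speech_to_text()
#   if not should_answer(transcript):
#       continue  # restart the loop without answering
```

This keeps the program always listening but only responding when addressed by name.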

  • @Firebabys89
    @Firebabys89 2 months ago +1

    u are amazing dude

  • @taylorsmith1720
    @taylorsmith1720 3 months ago +2

    🎯 Key Takeaways for quick navigation:
    01:02 *🚀 Overview of Voice Virtual Assistant Development*
    - Explanation of building a voice virtual assistant similar to Jarvis from Iron Man.
    - Overview of the backend workflow involving voice input, transcription, response generation, and audio output.
    - Introduction to third-party services like Deepgram, OpenAI, 11 Labs, and Taipy used in the development process.
    03:21 *🔧 Installation Instructions for the Voice Virtual Assistant*
    - Cloning the GitHub repository and installing necessary requirements.
    - Setting up API keys for Deepgram, OpenAI, and 11 Labs.
    - Creating an environment file to store API keys securely.
    - Executing installation commands and waiting for requirements to install.
    08:33 *🛠️ Running the Voice Virtual Assistant*
    - Instructions for running the display interface (`display.py`) and the main script (`main.py`).
    - Description of how the assistant listens, transcribes, generates responses, and displays conversations.
    - Example interaction demonstrating the assistant's response to user input.
    09:28 *💡 Customization and Modification of the Voice Virtual Assistant*
    - Guidance on modifying the assistant for specific use cases.
    - Suggestions for changing context, models, and voices for customization.
    - Discussion of potential improvements, such as integrating news, adding memory, and overcoming latency limitations.
    Made with HARPA AI

    • @alexandresajus
      @alexandresajus  3 months ago +1

      Now THAT is how you should advertise a product. Great summary!

  • @adben001
    @adben001 27 days ago

    Will that generate costs through the API or is it free?

  • @handlepersonthing
    @handlepersonthing 6 months ago +1

    Awesome work! I wonder if using the GPT-4 model would speed things up a bit?

    • @alexandresajus
      @alexandresajus  6 months ago +2

      Thank you very much! Unfortunately, I don’t think switching the model would do a lot. Profiling here is 1s for transcribing, 1s for gpt and 2s for generating audio. The best way to reduce latency would be using smaller/quantized models or streaming data instead of doing each task sequentially

    • @serenditymuse
      @serenditymuse 6 months ago +2

      @@alexandresajus larger models often take longer to think.

  • @oldspammer
    @oldspammer 3 months ago +1

    Some operating system APIs for text-to-speech are free and can act instantly, without sending information over the internet to some central system that might get bogged down with excess usage. I have noticed that if one becomes dependent upon something or someone, a monopoly situation may well result, and you end up potentially having to pay, pay, pay for things that your local PC could have done for free on its own without any network data interactions. Often the distant server has a better-sounding voice and mispronounces fewer words, but soon you will be outsourcing too many things to outside entities and become too dependent on them.
    If a set of 10 or so words is known to be mispronounced by the local speech API on your PC, is there a way to have your PC handle those exception words with specialized processing, where a syllable at a time is custom-handled for each of the 10 exception words, to save you from having to use an API key that can be withdrawn from handy use at the flick of a switch by the third-party provider?

  • @crprp4769
    @crprp4769 5 months ago +1

    Awesome video! Thanks for sharing, but I've got a question. How can I implement a pre-trained OpenAI assistant into Taipy?

    • @alexandresajus
      @alexandresajus  5 months ago +1

      Thanks! It should be quite simple. Just replace the model variable line 53 at 12:52 with your own model ("ft:gpt-3.5-turbo:my-org:custom_suffix:id") and it should work. Let me know if you need more help.

  • @tismine
    @tismine 2 months ago +1

    Hey Alex! Thanks a lot for the video, can you please explain a good way to create a neat requirements.txt file after I'm done with a project?

    • @alexandresajus
      @alexandresajus  2 months ago

      Sure! Use "pip list" in the terminal to check which package versions you are using. Then create a requirements.txt at the root of your project, with "package_name==version" on each line, for only the packages you import within the code (not their dependencies)
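
That manual process can also be sketched in Python with the standard library; the package names in the example call are illustrative, not a real dependency list:

```python
from importlib.metadata import PackageNotFoundError, version

def pin_requirements(direct_deps):
    """Return 'package==version' lines for packages your code imports directly."""
    lines = []
    for name in direct_deps:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Flag packages that are not installed in this environment
            lines.append(f"# {name} is not installed in this environment")
    return lines

# Example usage (package names are illustrative):
print("\n".join(pin_requirements(["pygame", "python-dotenv"])))
```

Writing the returned lines to requirements.txt gives the pinned file described above.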

  • @PandaLorian14
    @PandaLorian14 1 month ago

    does no one get the same code on deepgram? me and you don't get the same code

  • @pntra1220
    @pntra1220 6 months ago +1

    Nice project bro! Do you know how I can use Deepgram to transcribe Spanish voice? I already figured it out for ElevenLabs but not for Deepgram. Thank you for taking the time to read this, and continue making these videos!

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Thanks! I have not tried but there does seem to be the option to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

  • @rodrigodifederico
    @rodrigodifederico 5 months ago +1

    I did the same a few months ago but i made it all through a real phone number so you can actually call a number and an assistant will pick the call and talk to you about the shop services or clinic procedures, etc. Pretty nice lab.

    • @alexandresajus
      @alexandresajus  5 months ago +1

      That is a great use case. Were there any issues surrounding the latency? Were there any customer complaints from people who found the delay in answering too long or did not want to talk to an AI?

    • @rodrigodifederico
      @rodrigodifederico 5 months ago +2

      @@alexandresajus I reduced the delay by 90% running all the systems locally: the speech-to-audio generator, audio transcription, the language model, etc. The only remote API that I used was for the phone number (Twilio). If you run everything through remote APIs, the delay will be a real problem; it won't work as an assistant over the phone because it may take up to 10 seconds for an answer. But running everything locally it's almost instant. For the voice part, both to text and back, I don't generate an audio file, I stream it, so there is no delay. With a few tricks, you can make it almost real time 🙂

    • @alexandresajus
      @alexandresajus  5 months ago

      @@rodrigodifederico Great! Is there anywhere I could take a look at that project? Which text-to-speech model are you using?

    • @rodrigodifederico
      @rodrigodifederico 5 months ago +1

      @@alexandresajus I am planning to transform it into a product so for now i won't share the code but i'll record a live interaction video and upload it to youtube soon, ill drop the link here if you are interested. About the text to speech, i created my own model.. pretty similar to elevenlabs. But i have to say that if you use elevenlabs streaming, this part of the process will have a similar delay, so i might switch to elevenlabs stream in the future, unless i want to keep it 100% free of costs, then i would keep my model.

    • @alexandresajus
      @alexandresajus  5 months ago

      @@rodrigodifederico Sure I'd love to see a demo

  • @aashishkumarlohra277
    @aashishkumarlohra277 1 month ago

    when i run python main.py i get this error:
    Traceback (most recent call last):
    File "E:\JARVIS_TEST\JARVIS\main.py", line 15, in <module>
    from record import speech_to_text
    File "E:\JARVIS_TEST\JARVIS\record.py", line 8, in <module>
    from rhasspysilence import WebRtcVadRecorder, VoiceCommand, VoiceCommandResult
    ModuleNotFoundError: No module named 'rhasspysilence'

    • @alexandresajus
      @alexandresajus  1 month ago

      Check this issue:
      github.com/AlexandreSajus/JARVIS/issues/4
      Also try creating a new clean virtual env before installing requirements. Check if there are no errors during installation. Check that you are running main.py from that env. Check that rhasspysilence is installed with pip list

  • @nightmare6159
    @nightmare6159 2 months ago +1

    I need help, When I do pip install -requirements.txt it says there is no such directory even tho I see the file

    • @alexandresajus
      @alexandresajus  2 months ago

      Make sure that you are in the right directory in your terminal. You can use ls in the terminal to check the contents of the directory you are in. You can switch directory using cd in the terminal or using "Open Folder..." in VSCode.
      In general, the syntax should be "pip install -r [PATH-TO-TXT]"

  • @charliepersonalaccount5276
    @charliepersonalaccount5276 2 months ago +1

    Great stuff man! What's the best way to chat with you? I have an mvp i want to run by you and maybe have you help me build it out

    • @alexandresajus
      @alexandresajus  2 months ago

      Thanks. Feel free to reach out on Linkedin:
      www.linkedin.com/in/alexandre-sajus/
      I don't have much time because of work, but I can take a look.

  • @PenguinjitsuX
    @PenguinjitsuX 5 months ago +2

    This is awesome! I am wondering, though, how much is this project costing you in API calls (if you were to use this daily and pretty often)? I'm planning to build a home assistant that can control all of my home gadgets and perform actions on my computer, but I'm trying to decide whether I should use all local models (Whisper, Coqui, and Mistral) instead of the paid online services. The quality and speed are a bit lower locally, but it's free, so I'm thinking about the tradeoff. Please let me know what you think, thanks!

    • @alexandresajus
      @alexandresajus  5 months ago +1

      Hey! Thanks, glad you liked it! I recommend going the paid online route. ElevenLabs is a paid subscription at 5$/month for 30,000 characters. OpenAI and Deepgram are pay-per-request but are dirt cheap: for this whole project, I probably talked for an entire hour with JARVIS, and it cost me 12 cents on OpenAI and 40 cents on Deepgram. If you want to lower cost, find an ElevenLabs equivalent that is pay-per-request, and you'll be good.
      Going local will drastically reduce performance and speed unless you have proper hardware, i.e., a dedicated GPU cluster at home. You'll have to use open-source, quantized to 8Gb models. If you have adequate hardware though, going local might be a good idea since you'll keep performance, and you can reduce latency by half by hosting locally, doing code shenanigans to parallelize each task instead of running them sequentially, and generally optimizing the pipeline.
      Latency is the biggest drawback; JARVIS is at 4 seconds of latency. Even if it was 2 seconds, it is still too awkward for a conversation.

    • @PenguinjitsuX
      @PenguinjitsuX 5 months ago +1

      @@alexandresajus Thanks for the in-depth reply! That's awesome to see that it's so cheap. I was actually really lucky and got a 4090 last week. I've been running tests; on Whisper and LLM inference, I got performance at almost real-time.

    • @alexandresajus
      @alexandresajus  5 months ago

      @@PenguinjitsuX Wow, you already made a lot of progress! Yeah, unfortunately I think we are just a few years away from solving that performance-latency tradeoff for TTS; then we'll be able to have a proper conversational Jarvis. Is your project open-source? I would love to take a look if you'd let me. I don't have a Discord server but I'd love to keep in touch on Discord. Here's my username: alex_1337

  • @edosetiawan9589
    @edosetiawan9589 3 months ago +1

    Awesome!! How do I make this project access custom data?

    • @alexandresajus
      @alexandresajus  3 months ago

      A quick way to do this would simply be adding the data as a string in the context. This has its limitations (the context has a max length). If you want a chatbot that knows information from documents, I suggest you look into RAG models
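
A minimal sketch of the string-in-context approach; the variable names and the example facts are illustrative, not from the repo:

```python
# Custom data to fold into the system context (example facts, not real ones)
CUSTOM_DATA = (
    "Opening hours: Mon-Fri 9:00-17:00. "
    "Support contact: support@example.com."
)

# Prepend the facts to the assistant's system context string
context = (
    "You are JARVIS, a helpful voice assistant. "
    "Use the following facts when they are relevant:\n" + CUSTOM_DATA
)

# This string would then be sent as the system message of each chat request.
```

As the reply notes, this only scales up to the model's context length; beyond that, retrieval (RAG) is the usual approach.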

  • @FantasyDark-ub3xh
    @FantasyDark-ub3xh 3 months ago +1

    Sir, I want to do something like this. Is there any free API available? If not in OpenAI, please tell me some other AI APIs to do AI tasks, sir!

    • @alexandresajus
      @alexandresajus  3 months ago

      Sir! If you search for them online, there should be free alternatives for the models I used in the video! I recommend looking at HuggingFace for an OpenAI alternative, sir! For example, the Mistral model has a free inference API that is only rate-limited, sir!

  • @omjondhalefyco-9953
    @omjondhalefyco-9953 3 months ago +1

    What alternative can be used for ElevenLabs?

    • @alexandresajus
      @alexandresajus  3 months ago

      I have not tried anything apart from Elevenlabs and google_tts. I was not impressed with the quality of google_tts, but it was way faster. I'm sure you'll find better answers online

  • @PilotsPitstop
    @PilotsPitstop 1 month ago +1

    what exactly did u purchase on the OpenAI API thing for it not to return "exceeded current quota"? i paid for the ChatGPT "hobbyist" plan and thought that would help, but nah, i wasted $20. and u should def start a discord, good stuff

    • @alexandresajus
      @alexandresajus  1 month ago +1

      Ah I see, you’re not supposed to pay a chatgpt subscription. OpenAI have a website for their API where you just have to enter billing details and maybe add a dollar of credit to use. They charge per request and not on a subscription basis. It should be on the same site where you got your API key

    • @PilotsPitstop
      @PilotsPitstop 1 month ago

      @@alexandresajus AH MY HERO SO FAST, so i just add some money to my account and boom it works?

  • @edbayliss1862
    @edbayliss1862 5 months ago +1

    This really interested me. I modified it a bit to add a listen button to the UI so it only listens when you select listen; this is easier than a “wake word”.
    Then I thought: integration. I use macOS.
    I built a folder called modules, added a second step that parses the text through GPT again to match a dictionary, and then GPT decides which function in the dictionary matched and runs it.
    It worked great for checking calendar events etc., and if no matches were found it defaulted to a GPT chat response, but the extra layer added more latency and just isn’t scalable

    • @alexandresajus
      @alexandresajus  5 months ago

      Incredible! Good work! Is there anywhere where we could check out your project?

    • @edbayliss1862
      @edbayliss1862 5 months ago +1

      @@alexandresajus sure, is your GitHub open to branches? I can just push it as a branch for you check out on Monday

    • @alexandresajus
      @alexandresajus  5 months ago

      @@edbayliss1862 I'm not sure, I think it is open to fork then pull request. I think I need to manually add you as a collaborator if you want to directly push to a branch. Your call. Or you could just share the link of your repo if it is public.

  • @GameXnationOfficial
    @GameXnationOfficial 2 months ago +1

    "You exceeded your current quota, please check your plan and billing details" its showing something like this and jarvis is not replying after an error

    • @alexandresajus
      @alexandresajus  2 months ago

      You've exceeded your free quota on one of the APIs, check on which function call this error gets triggered to see which API needs billing

  • @s.gveeronstart4794
    @s.gveeronstart4794 5 months ago +1

    sir, can u teach how to make it?
    i mean to say: could u make a playlist on this topic?

    • @alexandresajus
      @alexandresajus  5 months ago

      Unfortunately, I won't be making an extended tutorial on this in the near future. But I'm sure there are many tutorials on the tools I used on YouTube. You can just look up "ElevenLabs tutorial" or "OpenAI API tutorial".

  • @DalazG
    @DalazG 2 months ago +1

    Incredible material! Thanks bro, your tutorials are super helpful for those learning to code. I'm trying to follow along.
    Not sure if you've taken any subscriber requests. I've really wanted to find a tutorial on creating a machine learning model in Python that can figure out its own strategy for successfully trading forex and integrating it with MQL4 or 5.
    Definitely possible, but there's next to no tutorials on this anywhere, i noticed

    • @alexandresajus
      @alexandresajus  2 months ago

      Thanks! Glad to know the video is helpful. This indeed seems to be a niche topic. I don’t think I could help you with this unfortunately since I don’t know anything about forex or mql.

    • @DalazG
      @DalazG 2 months ago +1

      @alexandresajus no worries, this tutorial was super useful anyway! Subscribed.
      Curious, would these APIs you used for this Jarvis application cost a lot of money though? I know the ChatGPT API isn't free (just the free credits)

    • @alexandresajus
      @alexandresajus  2 months ago +1

      @@DalazG The APIs did not cost that much: for the whole project I talked for about 2 hours to JARVIS. It cost less than a dollar for both Deepgram and OpenAI. ElevenLabs cost me 5$ only because they have a subscription based fee.

    • @DalazG
      @DalazG 2 months ago

      @@alexandresajus gotcha, elevenlabs has a brilliant voice api. But just because it adds up, i would probably prefer to use a cheaper worse one 😅 .

  • @undeadgaming2102
    @undeadgaming2102 3 months ago +1

    i want to ask: can you make a video on how we can make it do different tasks?

    • @alexandresajus
      @alexandresajus  3 months ago

      What task are you thinking about? If it's just asking about the weather, you can add the current weather to the context so Jarvis knows about the current weather

    • @undeadgaming2102
      @undeadgaming2102 3 months ago

      @@alexandresajus i was thinking like a google assistant

  • @EnnoAI431
    @EnnoAI431 5 months ago +1

    Great project!!
    Would it also run on a Raspberry Pi?
    Recently I ran a project also called Jarvis on a Pi. You don't need the APIs from Deepgram & ElevenLabs, and latency is pretty good, although the voice was horrible... unless you like robots :-).

    • @alexandresajus
      @alexandresajus  5 months ago

      Thanks! Sure, this should be able to run on a Raspberry Pi, since all of the heavy stuff is third-party hosted services, so barely anything runs locally. Cool! Where can I take a look at your project?

  • @AndroidePulpico
    @AndroidePulpico 3 months ago +1

    The latency is pretty bad; have you tried Whisper JAX or Faster Whisper?

    • @alexandresajus
      @alexandresajus  3 months ago

      Yeah, the latency issue is currently the worst one. I have not tried these services. Let me know if it speeds up things. Currently, the consensus for reducing latency seems to be streaming data, running the tasks in parallel instead of sequentially, and hosting local and smaller models.

  • @user-qw6zz7pr2x
    @user-qw6zz7pr2x 4 months ago

    When I run display.py to start the web interface, it shows "ModuleNotFoundError: No module named 'taipy'". But then after I install taipy (version 3.0.0), it still gives me the same error message. I have tried to uninstall and install taipy but same error message...

    • @alexandresajus
      @alexandresajus  4 months ago +1

      Are you sure you are running display.py from the Python environment where taipy is installed? Use `pip list` to check that taipy is installed and then `python display.py` to run the file. If this does not work, I suggest creating a new virtual environment and re-installing the requirements. Bear in mind that taipy only works with Python 3.8 to 3.11

    • @user-qw6zz7pr2x
      @user-qw6zz7pr2x 4 months ago

      Thanks! Instead of clicking to run display.py, I typed in "python display.py" and it opened the website! @@alexandresajus
      One more question: when I ran "python main.py", I got the error message "TypeError: 'ABCMeta' object is not subscriptable". I am using Python 3.8.10 in Visual Studio.

  • @anirvindhch1209
    @anirvindhch1209 3 months ago +1

    What are you using to code this Alexandre??

    • @alexandresajus
      @alexandresajus  3 months ago +1

      What do you mean? I'm coding in Python using VSCode, I used external APIs like ElevenLabs, OpenAI, Deepgram. Libraries like Taipy for the interface. I use GitHub Copilot to help me code faster as well.

  • @niyatibalsara9409
    @niyatibalsara9409 4 months ago

    im encountering a webrtcvad installation error.. please let me know what to do.. its urgent, i need it for my project

    • @niyatibalsara9409
      @niyatibalsara9409 4 months ago

      @alexandresajus

    • @alexandresajus
      @alexandresajus  4 months ago +1

      Please refer to this fix, let me know if it works:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @niyatibalsara9409
      @niyatibalsara9409 4 months ago

      PS C:\Users\HP\Desktop\JARVIS2> & c:/Users/HP/Desktop/JARVIS2/myvenv/Scripts/python.exe c:/Users/HP/Desktop/JARVIS2/JARVIS/main.py
      Traceback (most recent call last):
      File "c:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 8, in <module>
      from dotenv import load_dotenv
      ModuleNotFoundError: No module named 'dotenv'
      PS C:\Users\HP\Desktop\JARVIS2> pip install python-dotenv
      Collecting python-dotenv
      Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
      Installing collected packages: python-dotenv
      Successfully installed python-dotenv-1.0.1
      [notice] A new release of pip available: 22.3.1 -> 24.0
      [notice] To update, run: python.exe -m pip install --upgrade pip
      PS C:\Users\HP\Desktop\JARVIS2> .\venv\Scripts\Activate
      (venv) PS C:\Users\HP\Desktop\JARVIS2> python JARVIS\main.py
      pygame 2.5.2 (SDL 2.28.3, Python 3.11.2)
      Hello from the pygame community. www.pygame.org/contribute.html
      Traceback (most recent call last):
      File "C:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 13, in <module>
      import elevenlabs
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\__init__.py", line 2, in <module>
      from .simple import * # noqa F403
      ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\simple.py", line 113, in <module>
      elevenlabs.set_api_key(os.getenv("ELEVENLABS_API_KEY"))
      ^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: partially initialized module 'elevenlabs' has no attribute 'set_api_key' (most likely due to a circular import)
      Please solve this error.. its urgent not working.. please help

  • @tomasrochaakemi
    @tomasrochaakemi 6 months ago +2

    hey alex! can you help me with this error? "ERROR: Failed building wheel for webrtcvad
    Failed to build webrtcvad
    ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects"

    • @alexandresajus
      @alexandresajus  6 months ago

      Sure! This is because you don't have Microsoft Visual C++ installed properly. I have written a guide on how to fix this here:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @tomasrochaakemi
      @tomasrochaakemi 6 months ago +1

      @@alexandresajus hey man. it worked but now i got another error. while running python main.py this error appears: line 17, in set_api_key
      os.environ["ELEVEN_API_KEY"] = api_key
      ~~~~~~~~~~^^^^^^^^^^^^^^^^^^
      File "", line 684, in __setitem__
      File "", line 744, in check_str
      TypeError: str expected, not NoneType

    • @alexandresajus
      @alexandresajus  6 months ago

      @@tomasrochaakemi This means that Python has tried to find a .env file with ELEVEN_API_KEY but has not found either the file or the key in the file. You'll need to create a .env file at the same level as main.py containing ELEVENLABS_API_KEY=[your-API-key]
      Please follow the Requirements and the How to Install Step 3 of my repository ( github.com/AlexandreSajus/JARVIS ). I mention these steps at 4:06 and 6:06 of the video.
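
For illustration, here is a stdlib-only sketch of what loading such a .env file amounts to. The project itself uses python-dotenv's load_dotenv; this hypothetical helper just makes the mechanism visible:

```python
import os

def load_env_file(path: str) -> None:
    """Read KEY=value lines from a .env-style file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments; require a KEY=value shape
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # setdefault: do not clobber variables already set in the shell
                os.environ.setdefault(key.strip(), value.strip())
```

After loading, os.getenv("ELEVENLABS_API_KEY") returns the string from the file instead of None, which is exactly the failure the error above describes.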

    • @tomasrochaakemi
      @tomasrochaakemi 6 months ago +1

      @@alexandresajus I did it, it still shows this

    • @alexandresajus
      @alexandresajus  6 months ago

      ​@@tomasrochaakemi Hmmm weird issue. As a workaround, just replace the 3 lines of os.getenv("...") by simply the API key as a string. For example:
      OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") -> OPENAI_API_KEY = "YOUR-API-KEY"

  • @felipemartinez1924
    @felipemartinez1924 5 months ago +1

    How do I change the speech recognition to spanish? Btw amazing work!

    • @alexandresajus
      @alexandresajus  5 months ago +1

      Thanks! I have not tried another language but there does seem to be the option in Deepgram's API to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

    • @felipemartinez1924
      @felipemartinez1924 5 months ago +1

      @@alexandresajus Thanks, you're amazing! You should do a series of this kind of videos, maybe a Jarvis like this one but that is able to take action like opening a program, or saving reminders, stuff like that. Thank you very much and looking forward to more videos. :)

    • @jan-peterbornsen8506
      @jan-peterbornsen8506 4 months ago

      @@felipemartinez1924 Hey, were you able to change the language of Deepgram's API? I want to change it to German, but all my attempts have failed so far... I tried just adding language=de but it's not helping in any way...

  • @Jordan-tr3fn
    @Jordan-tr3fn 6 months ago +1

    hey, cool vids! why not use OpenAI for transcription instead of Deepgram? you could stream the audio and not have audio files

    • @alexandresajus
      @alexandresajus  6 months ago +1

      This is indeed probably a better approach. I was not aware of it at the time

    • @tismine
      @tismine 2 months ago

      Are you sure OpenAI supports streamed audio input? I looked around all the places no one was able to do that...

    • @Jordan-tr3fn
      @Jordan-tr3fn 2 months ago

      @@tismine « openai stream audio » on Google …

  • @ezzeldinhany7301
    @ezzeldinhany7301 4 months ago

    hi alex, it says no module named 'deepgram' after running python main.py in the terminal. what should i do?

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      i also tried pip install deepgram and it did not work

    • @alexandresajus
      @alexandresajus  4 months ago

      @@ezzeldinhany7301 Using the same terminal where you ran "python main.py", run "pip list" and check if deepgram if properly installed. I suggest you reinstall requirements into a clean environment for this. Let me know if this works.

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      @@alexandresajus i did reinstall requirements during the process of trying to solve this problem

    • @alexandresajus
      @alexandresajus  4 months ago

      @@ezzeldinhany7301 Did the terminal say that deepgram was successfully installed? Can you check with "pip list" if deepgram is installed? Can you check if you are running main.py from the environment where you installed deepgram? Once again, I strongly recommend creating a fresh Python environment using venv and installing the requirements there and checking everything above
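
As a quick way to run the checks suggested above, a small sketch that asks the interpreter itself whether the packages are importable; run it with the same python you use for main.py so you are testing the right environment:

```python
import importlib.util
import sys

def is_installed(package_name):
    """True if the package can be imported from this interpreter."""
    return importlib.util.find_spec(package_name) is not None

# Report which interpreter is in use and whether the key packages resolve.
for pkg in ("deepgram", "taipy"):
    status = "OK" if is_installed(pkg) else "MISSING"
    print(f"{sys.executable}: {pkg} -> {status}")
```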

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      @@alexandresajus I have now fixed the deepgram issue but it says it cannot download rhasspysilence; I tried with pip as well

  • @olakunleogunseye9657
    @olakunleogunseye9657 6 months ago +1

    Aye, this is so cool, but there is no wake-up key or end key. Still, this is the greatest, and I know you know

  • @NotZymsYT
    @NotZymsYT 2 months ago

    can anyone help be i keep getting "ERROR: Failed building wheel for pyarrow" ?

    • @alexandresajus
      @alexandresajus  2 months ago +1

      Switch to Python 3.8 to 3.11. The Taipy version I am using is old and does not support Python 3.12. You can also try changing to taipy==3.1.0 in requirements.txt
      github.com/AlexandreSajus/JARVIS/issues/7
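
A small sketch of the version check implied by that answer; the 3.8-3.11 range comes from the Taipy constraint mentioned above, and the guard just prints a hint instead of letting pip fail later with a cryptic pyarrow wheel error:

```python
import sys

def taipy_supported(version=sys.version_info):
    """True if this Python version is in the 3.8-3.11 range the pinned Taipy supports."""
    major, minor = version[0], version[1]
    return (3, 8) <= (major, minor) <= (3, 11)

# Run this before "pip install -r requirements.txt" to catch 3.12 early.
if not taipy_supported():
    print("Use Python 3.8-3.11 for this project (the pinned Taipy/pyarrow wheels).")
```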

    • @NotZymsYT
      @NotZymsYT 2 months ago +1

      @alexandresajus you are awesome thank you so much !!!!

    • @NotZymsYT
      @NotZymsYT 2 months ago

      @@alexandresajus hey sorry to be a pest the original issue is fixed but now It seems like the api_key variable obtained from os.getenv("ELEVENLABS_API_KEY") is None, and the set_api_key function from the elevenlabs module is trying to set this None value as the value of the ELEVEN_API_KEY environment variable. However, environment variables must be strings, so attempting to assign None as the value raises a TypeError. im really new to all this and any help is super appreciated

    • @alexandresajus
      @alexandresajus  2 months ago

      @@NotZymsYT os.getenv("ELEVENLABS_API_KEY") should not get None. Please make sure you properly do step 3 of the installation as described at 6:04: make sure you have a .env file at the same level as main.py and make sure it is filled with the API keys using the syntax described in the README
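
For reference, a minimal stand-in for what python-dotenv's load_dotenv does with that .env file (real code should use python-dotenv itself); the key names in the comment match the README:

```python
import os

def load_env_file(path=".env"):
    """Parse KEY=VALUE lines and put them in os.environ so os.getenv() finds them."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Expected .env contents (same directory as main.py):
# ELEVENLABS_API_KEY=...
# OPENAI_API_KEY=...
# DEEPGRAM_API_KEY=...
```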

    • @NotZymsYT
      @NotZymsYT 2 months ago

      @@alexandresajus i ran through the whole video on extra slow and now its giving me Traceback (most recent call last):
      File "main.py", line 59, in
      file_name: Union[Union[str, bytes, PathLike[str], PathLike[bytes]], int]
      TypeError: 'ABCMeta' object is not subscriptable

  • @ibrahimqadirmustafa
    @ibrahimqadirmustafa 6 months ago +1

    Amazing bro, I want to create something like this but in Kurdish. Do you know how I can use it speaking Kurdish?

    • @alexandresajus
      @alexandresajus  6 months ago

      Thanks! Unfortunately this might be harder to do in Kurdish. You need to find services that support the Kurdish language which are quite rare: both Deepgram and Elevenlabs do not support Kurdish currently. I'd guess that OpenAI does support Kurdish but I am not sure, even if it does not you can use a service to do the English-Kurdish translation in the middle of the pipeline.

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 6 months ago +1

      @@alexandresajus
      Can I use the Google Translate package in Python to translate the response content from the AI?

    • @alexandresajus
      @alexandresajus  6 months ago

      @@ibrahimqadirmustafa Yes this would solve part of the problem
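
A sketch of where that translation step would slot into the pipeline for languages the TTS/STT services don't support. `ask_gpt` and `translate` are hypothetical stand-ins for the real OpenAI and translation-service calls, shown here as stubs so only the wiring is illustrated:

```python
def pipeline(user_text, ask_gpt, translate, target_lang="ku"):
    """Run the text half of the pipeline, translating the model's
    English answer into target_lang before it goes to text-to-speech."""
    answer_en = ask_gpt(user_text)
    return translate(answer_en, target_lang)

# Stub usage; real code would call OpenAI and e.g. a Google Translate wrapper:
reply = pipeline("hello",
                 ask_gpt=lambda t: f"echo: {t}",
                 translate=lambda t, lang: f"[{lang}] {t}")
print(reply)  # [ku] echo: hello
```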

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 6 months ago

      @@alexandresajus ok thanks for you if i need help i can contact u 😁

  • @AdeniranFrancis
    @AdeniranFrancis 14 days ago

    whenever i see videos like these, i clone the repos and i am never, ever able to successfully install all the dependencies or requirements.txt. makes me want to give up writing code altogether.

  • @_GIGABYTES
    @_GIGABYTES 6 months ago

    Traceback (most recent call last):
    File "F:\va\New folder (3)\JARVIS\display.py", line 5, in
    from taipy.gui import Gui, State, invoke_callback, get_state_id
    ModuleNotFoundError: No module named 'taipy'

    • @alexandresajus
      @alexandresajus  6 months ago

      Are you sure you installed the requirements of the project (5:33)?

    • @Threecommaaclub
      @Threecommaaclub 5 months ago +1

      Hey, I'm not sure if you're still running into this issue; however, I was able to solve it by creating a virtual environment as stated in the video. Try creating a virtual environment, and if you need help, there is another video on YouTube that should solve that issue.

  • @ashrafulislamemon8782
    @ashrafulislamemon8782 21 days ago

    I am stuck at git clone

  • @blazzycrafter
    @blazzycrafter 5 months ago +2

    YOU STOLE MY WORK?........
    ......
    ......
    .....
    .....
    ......
    HOW THE HEK DID IT WORK?
    XD

  • @tchen8124
    @tchen8124 6 months ago

    What’s the point of using elevenlabs? Without carefully finetuning, the voice sounds robotic anyway. Kinda a waste of money

    • @alexandresajus
      @alexandresajus  6 months ago

      What do you suggest I use? I looked for fast TTS AI services and stumbled upon Elevenlabs and did not ask too many questions. The whole point was trying to recreate Jarvis from Iron Man which has a robotic voice. It cost me a dollar for 30,000 characters

    • @kyouko5363
      @kyouko5363 6 months ago

      ​@@alexandresajus I'm tempted to make a suggestion here but.. if it gets too popular I might not be able to use it anymore. I can't afford API keys, and rely on it every day to ingest documentation and large pieces of text without interrupting my programming. Even made a private Neovim plugin for it.. as for LLMs.. I am *this* close to saying to hell with it and writing a daemon or local webserver or something that'll instruct Selenium to forward queries and responses on a headless Chromium instance. I'm tired of there being no free API keys for LLMs, not even rate limited ones, when the browser experience is free to begin with, but the moment I want to see the text in my terminal and respond in my terminal, it suddenly costs money, despite me technically having reduced their server load by skipping all the unnecessary CSS, HTML and JS every time I want to just send and receive a goddamned string? I *thought* ChatGPT had a free rate limited API key, and conveniently around the time it became part of my workflow, the API credits equivalent of a free trial runs out, almost as if to give you a cake and then take it right back after the first bite. I'm rambling. But hey, at least I've got good TTS for free.

  • @GreggHoush
    @GreggHoush 6 months ago +7

    You should disable those API keys and blur API keys in videos like these. Everybody wants free API keys.

    • @alexandresajus
      @alexandresajus  6 months ago +2

      Good advice. I disabled these keys right after recording and they all have a hard rate limit

  • @PHG_Team
    @PHG_Team 6 months ago

    bruh
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for pyarrow
    Failed to build pyarrow
    ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects

    • @alexandresajus
      @alexandresajus  6 months ago

      This is probably due to a Python version issue: you are probably using Python 3.12 and this project uses Taipy which only supports Python 3.8 to 3.11. Please try using another Python version. If this does not help, do not hesitate to give more details on the issue here: github.com/AlexandreSajus/JARVIS/issues

    • @PHG_Team
      @PHG_Team 6 months ago +1

      @@alexandresajus thx bro. If I delete display.py, will the assistant still work? I want to create my own GUI

    • @alexandresajus
      @alexandresajus  6 months ago

      @@PHG_Team Yes, you can delete display.py; both programs are independent.

    • @PHG_Team
      @PHG_Team 6 months ago

      @@alexandresajus I'm Italian and I want to change the speaking language. How can I do that?

  • @Mirkolinori
    @Mirkolinori 24 days ago

    Good idea, but ElevenLabs is too expensive; the pricing is horrible for live TTS... better to use the built-in OpenAI TTS. You can also use the OpenAI API for Whisper, assistant GPT and TTS, all in one place. Quick, cheap and easy

  • @n00ter99
    @n00ter99 6 months ago +1

    That latency is painful

    • @alexandresajus
      @alexandresajus  6 months ago +2

      Agreed, unfortunately that latency is very hard to shave off. We could probably reduce it a bit by hosting locally, using quantized/smaller models and streaming the data instead of doing each task sequentially

    • @chrsl3
      @chrsl3 6 months ago +1

      it works so wonderfully, i wouldn't be bothered at all by the small latency.

    • @n00ter99
      @n00ter99 6 months ago +1

      @@alexandresajus Measure the latencies of the things you mentioned - you'll find that implementing streaming all the way across the stack will solve most of it. I have spent the last year building low latency streaming models in order to get sub 100-millisecond latencies for various audio/speech startups, it's the only way to get speeds and responsiveness that feels natural

    • @alexandresajus
      @alexandresajus  6 months ago +1

      ​@@n00ter99 I did profiling on each task and we are at about 1s for transcribing, 1s for gpt and 2s for generating audio. Really? Where can I find how to do this? What models/services were you using?
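
For readers who want to reproduce the per-stage numbers mentioned above (roughly 1 s transcription, 1 s GPT, 2 s voice generation), a minimal timing helper; the stage names in the comments are hypothetical stand-ins for the real Deepgram, OpenAI and ElevenLabs calls:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run one pipeline stage, print its wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Hypothetical usage around each sequential stage:
# text, t1 = timed("transcribe", transcribe, audio)
# reply, t2 = timed("gpt", ask_gpt, text)
# voice, t3 = timed("tts", generate_voice, reply)
```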