Linguflex
Joined Jun 24, 2023
A humble attempt to bring sci-fi dreams of capable Jarvis-style AI companions closer to reality:
This channel is about my journey to write an open-source, ultra-responsive, capable personal assistant with a high quality voice and tons of features.
Linguflex with custom wakewords
Custom wakewords (like "samantha" and "linguflex") now possible in Linguflex.
github.com/KoljaB/Linguflex/
84 views
Videos
Ollama & Home Assistant: Ultimate Privacy with Linguflex for Smart Homes
519 views · 1 month ago
Say hello to total privacy with #Ollama support for #HomeAssistant in Linguflex. Smart home automation running fully locally. #OpenSource #SmartHome github.com/KoljaB/Linguflex github.com/KoljaB/Linguflex/blob/main/docs/home.md github.com/KoljaB/Linguflex/tree/main/lingu/modules/home
Linguflex remote controlled with phone browser
307 views · 1 month ago
Milkshake Scene with realtime speaker diarization algorithm
313 views · 3 months ago
3 Speakers, short sentences and music at the end. Code available here: github.com/KoljaB/WhoSpeaks File used: realtime_diarize.py
Realtime speaker diarization algorithm
1.7K views · 3 months ago
Realtime diarization of the coin toss scene from "No Country for Old Men", which is a challenge for every speaker diarization engine. Both speakers have similar voice characteristics, they speak quietly and are hard to understand. This tests a new voice-characteristics grouping algorithm with automatic speaker-count detection that learns and improves over time. Code is online here: github...
Current progress: developing a realtime speaker diarization algorithm
134 views · 3 months ago
Completely unpolished tests. The basic idea seems to work, but it is still at a very early stage and needs more testing for verification and lots of refinement.
First realtime speaker diarization algorithm test
138 views · 3 months ago
Completely unpolished first test. The basic idea seems to work, but it is still at a very early stage and needs more testing for verification and lots of refinement.
Linguflex 2.0 with Samuel L. Jackson voice
129 views · 3 months ago
Not ElevenLabs, just XTTS and RVC post-processing with fine-tuned models. All generated locally in real time.
Linguflex 2.0 with Snoop Dogg voice
120 views · 3 months ago
Not ElevenLabs, just XTTS and RVC post-processing with fine-tuned models. All generated locally in real time.
Linguflex 2.0 with David Attenborough voice
883 views · 3 months ago
Not ElevenLabs, just XTTS and RVC post-processing with fine-tuned models. All generated locally in real time.
MoneyPrinterTurbo AI English Installation Manual
472 views · 3 months ago
Original video: ua-cam.com/video/vWBf5p fr4/v-deo.html Automatically translated with: github.com/KoljaB/TurnVoice CLI command: turnvoice vWBf5p fr4 -l en -v female
Setup Guide: Linguflex 2.0 on Windows
629 views · 4 months ago
Step-by-step video for installing Linguflex 2.0 AI assistant on Windows, featuring local operation and ultra-low latency. For detailed documentation, visit the GitHub repository. github.com/KoljaB/Linguflex
Speech interruption in Linguflex 2.0, a free open-source personal AI assistant
292 views · 4 months ago
Source code: github.com/KoljaB/Linguflex/tree/lingu-2.0-preview Features:
- lightning-fast assistant
- custom personalities
- allows usage of local LLMs
- high-quality local realtime TTS
- easily extendable with your own functions
- allows a huge number of functions with keyword filtering
Replacing six speakers at once with Azure voices using a single CLI command
150 views · 7 months ago
Opensource: github.com/KoljaB/TurnVoice
#coqui ➕ https://github.com/KoljaB/TurnVoice = 🔥
202 views · 7 months ago
RealtimeTTS v0.3.3 now supports OpenAI TTS
408 views · 7 months ago
RealtimeTTS v0.3.0 with simplified chinese support
125 views · 7 months ago
Fast local AI talk with a custom voice based on Zephyr model, RealtimeSTT and RealtimeTTS libraries.
1.5K views · 8 months ago
Realtime translation in 6 languages with RealtimeSTT library in under 80 lines of code.
1.8K views · 9 months ago
RealtimeSTT: A low-latency speech-to-text library with advanced voice activity detection
1.8K views · 10 months ago
RealtimeSTT: A low-latency speech-to-text library with advanced voice activity detection
265 views · 10 months ago
Voice-based interface to a language model, built on two new Python libraries I developed
208 views · 10 months ago
Low latency AI voice talk in 60 lines of code using faster_whisper and elevenlabs input streaming.
6K views · 11 months ago
Wake word activation, smart home control, music playout and the sophia girlfriend personality
387 views · 1 year ago
Can you help me? :(
TomlDecodeError: Reserved escape sequence used (line 100 column 1 char 3696)
Traceback:
File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\webui\Main.py", line 34, in <module>
    from app.services import task as tm, llm, voice
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\services\task.py", line 8, in <module>
    from app.config import config
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\__init__.py", line 6, in <module>
    from app.config import config
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 42, in <module>
    _cfg = load_config()
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 30, in load_config
    _config_ = toml.loads(_cfg_content)
File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\toml\decoder.py", line 514, in loads
    raise TomlDecodeError(str(err), original, pos)
Sorry, can't help. I have nothing to do with MoneyPrinterTurbo. This is a translation example serving as a showcase for my TurnVoice GitHub project.
You gotta put the Pexels API keys in quotes, like this: pexels_api_keys = [ "nE113EOVlRVbpWvRE0yZFuy6KmM9WAqvelyadayadayada", ]
Action latency is yet to be improved, amazing project ❤
Great work. Do you have any ideas to reduce latency in text-to-speech? I'm working on it.
Looking promising!
Does Linguflex allow you to interrupt the model?
Yes, it does. Voicebased or via escape key. See: ua-cam.com/video/uQ8jJtalc9M/v-deo.html
Hey mate, love your work. Unfortunately I can't get this software to run. I finally got your chat package working though (forget the name) and adapted it for Ollama (using LangChain), as I just can't get the CUDA version of llama.cpp working (so it takes about 20 secs for a response). LangChain just seems a little more straightforward, and Ollama is blazing fast. I was also experiencing a lot of those pipe issues; changing the Python version fixed it for me. It seems a difficult bit of software to maintain (like all AI packages) given the number of dependencies, which are very sensitive to exact versions of other packages. But keep it up, amazing work.
Codebase for this: github.com/KoljaB/EmoTTS It uses OpenAI. Ollama is the best choice, I think, for local models. You're right that maintaining AI projects ends up in dependency nightmares. Of course you can pin fixed versions, but if not all dependent libs do the same, the environment will break sooner or later anyway.
@@Linguflex Will be amazing when local LLMs are capable of this. I've never bothered with OpenAI for these things; though I am a paid subscriber, aren't the API costs in addition to that? Since I only have 12 GB of VRAM on my laptop, I'm stuck with the comparatively dumb LLMs. Though even some of the Coqui voice models at times express emotion; the quality of the text they are reading obviously plays a huge role. The thing I hate about all AI libraries being stuck to specific Python versions and package versions is that I must have installed a venv for just about every bloody Python version and every bloody PyTorch version. My love/hate relationship with Python.
@@bigdaddy5303 Very true words. It was an insane hassle to combine all the libs Linguflex needs into one single environment. We already have Ollama support for Linguflex in some implementations on our Discord (another dev and I have it, but we didn't release it yet). Should be coming soon...
Would this project be implementable in a business?
Depends (TTS license etc.), but sure.
I'm attempting to run a few example clients provided in the repo. Maybe there's something obvious I'm missing, but an error I commonly run into is: ModuleNotFoundError: No module named 'RealtimeSTT'
If you performed "pip install realtimestt" before, then your pip/Python env is probably corrupted somehow. You could try it in a virtual env.
@@Linguflex I knew it was something obvious 🤦 Don't know how I missed that step lol Thank you for your help
I'm gonna try to make a VRChat STT app that puts the words above my head using their OSC system :D
Awesome. How are interruptions implemented here? Thanks
The interruptions are triggered by the volume level detected on the microphone. It's a straightforward implementation that unfortunately also gets interrupted if the text-to-speech (TTS) output is too loud. To eliminate this issue, implementing echo cancellation is necessary, but that process is quite complex and not trivial.
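The volume-trigger idea described above can be sketched roughly like this. To be clear, this is not the actual Linguflex implementation; the RMS threshold and frame values are made up for illustration.

```python
# Hedged sketch of volume-based interruption detection.
# Threshold and frame sizes are illustrative, not Linguflex's real values.
import math

INTERRUPT_THRESHOLD = 0.1  # hypothetical RMS level that counts as speech

def rms(frame):
    """Root-mean-square volume of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def should_interrupt(frame, threshold=INTERRUPT_THRESHOLD):
    """True if mic volume exceeds the threshold while TTS is playing.
    As the reply above notes, loud TTS output leaking into the mic also
    trips this check unless echo cancellation is added."""
    return rms(frame) > threshold

quiet = [0.01] * 160      # near-silence: should not interrupt
loud = [0.5, -0.5] * 80   # clearly audible speech: should interrupt
print(should_interrupt(quiet), should_interrupt(loud))  # False True
```

A real implementation would feed successive microphone frames into should_interrupt and stop TTS playback on the first True, which is exactly why loud speaker output near the mic causes false triggers.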
Very nice. Great job ❤ Out of curiosity, how would you handle back-to-back conversation with interruption handling without using the space bar?
Thank you. We talked about how to do solid interruption in my discord channel recently: discord.gg/f556hqRjpv Highly encourage you to join, it's a great place to ask questions, share progress and get support from tech enthusiasts. Would love to see you there!
I really love your content and aspire to be like you. Although I am new to programming, I am learning Python and JavaScript simultaneously, which has been quite challenging. I live in the Kakuma refugee camp, where few people are interested in contributing to technology, focusing more on their own lives. Could you please update your full roadmap so that I can follow it? Additionally, it would be great if you could create a Discord community for discussions and support.
Thank you so much for your kind words! ❤ Great to hear that you're learning both Python and JavaScript at the same time in Kenya, I think your determination will definitely pay off. I don't have a specific roadmap since I usually just develop projects based on inspiration and current interests. I highly encourage you to join our Discord community: discord.gg/f556hqRjpv It's a great place to ask questions, share progress and get support from tech enthusiasts. Would love to see you there!
🤑🥰🥰🥰🥰🥰
I was looking for a way to add auto-generated subtitles to offline videos. Can it be used for this purpose? Doesn't it need a speech recognition model?
I wouldn't use my realtime libraries for this: they are optimized for speed, not quality. I'd generate a srt file transcript with word-level timestamps with a good STT library like stable_whisper, then add the subtitles to the video using that srt file (ffmpeg can do this for example).
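The SRT workflow suggested above revolves around a simple plain-text subtitle format. Here is a rough sketch of what generating such a file looks like; the timestamps and text are made up, and in practice the transcript would come from an STT library like the stable_whisper mentioned in the reply.

```python
# Sketch of the SRT subtitle format (made-up timestamps and text).
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(entries):
    """Build an SRT document from (start_sec, end_sec, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(entries, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 4.0, "General Kenobi.")]))

# ffmpeg can then burn the resulting file into the video, e.g.:
#   ffmpeg -i video.mp4 -vf subtitles=subs.srt out.mp4
```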
That's incredibly accurate. Nice work! Can you active-transcribe AND wake-word for commands? It'd be great if you could have it always listening and then do something on wake word.
No, currently not. The idea is good, I can see some use-cases for this. I'll think about that.
I love your work
Can I have it?
Sure, here's the repo: github.com/KoljaB/WhoSpeaks
Can this be used hands free or does the mic need toggle per reply?
Can be used hands free
This amazing! Is there anyway I could run this on my computer then talk to it using a headset connected to my phone so it’s mobile?
Hello man, I have difficulties running your library, can you help me?
Sure. Could you write your problems as an issue, so others can see how it's solved afterwards? Depending on if it's input or output: github.com/KoljaB/RealtimeSTT/issues github.com/KoljaB/RealtimeTTS/issues
I don't understand how to use it...
What do you want to do? The "tests" folder contains some examples of how you can use it: github.com/KoljaB/RealtimeSTT/tree/master/tests Maybe the "tests" of RealtimeTTS can also help; they use RealtimeSTT a lot: github.com/KoljaB/RealtimeTTS/tree/master/tests
Impressive work, thanks
At the prerequisites (0:39) I downloaded cuDNN as a zip file, but I have no idea where to place it. Is there an executable inside, or do I extract them somewhere specific? (Edit, for people wondering the same: ua-cam.com/video/OEFKlRSd8Ic/v-deo.html) Same for ffmpeg. As a non-developer, it's not pretty straightforward :( (Edit: tutorial for this one: ua-cam.com/video/jZLqNocSQDM/v-deo.html)
Thanks a lot for this information; as a dev I'm often just not aware of the problems users have. I'll rework the Prerequisites section, thx again.
And this is the worst it will be. Can’t wait for next update.
Diarrheazation?
This is epic, thank you for sharing!
Insanely good distinction after just a moment. Meanwhile I had to close my eyes and try to guess myself. Might as well have been a monologue.
Interesting but clearly struggling a bit lol
What language did you use?
Python
I know basic Python, where do I go next?
@@niscchay he literally posted the github page lol
Wheat
Hello, thank you for this great project. I know it's not the point, but would it be possible to add the option to use text as input for the user?
Yes absolutely, we need that. Any ideas where to place it in the UI?
@@Linguflex I know there are quite a few icons already, but the simplest would maybe be to add another icon that opens a chat window?
Super cool! is this in a public repo?
Not yet but soon
Nice one! I look forward to trying this out
very erotic voice 🙂
Thx 😀 Took me forever to get that soft whisper into the voice. It's included in Linguflex btw.
That is awesome 👏 I just tested it. Great work! Is it possible to change some settings in the interruption? And if so, which file should I do it in?
Thanks a lot ❤ You can click on the 👂 button, then a window opens. At the bottom of that window you can customize the trigger volume for speech interruption. You can also disable speech interruption completely by adding allow_speech_interruption: false to the listen section of the settings.yaml file (in the Linguflex/lingu folder).
@@Linguflex Thank you ☺️ But I was thinking about trying to build on top of it. Because you might want to interrupt only on certain things, so it doesn't get interrupted by phrases like "Ahh, I see" or short confirming answers.
Right, that's actually a great idea!
@@Linguflex - Just thought you would be able to have a more realistic kind of conversation if it could detect things like that while it's speaking, so it doesn't interrupt for everything said.
@@Linguflex - Another thing: is there a way I can reach out to you privately? I have a business idea that I have been working on for a while, and I have been following you for quite some time now, and would be interested in pitching an idea for a collaboration, if you are interested?
Wow, that's fast!
Good thing I speak German, English, and Spanish, otherwise this demo wouldn't make sense.
Thank you for showing us your library in action as well as letting us know how we can support it!
Is it possible to make the voice dictation instantaneous at the cost of accuracy? I want to try controlling the servos on an animatronic mouth with voice dictation. It doesn't have to be accurate, it just needs to be accurate enough to be convincing and as fast as possible
You probably want to use whisper.cpp with a quantized tiny model and grammar sampling, look up Georgi Gerganov's chess example.
You could also train a wake word model to do this. They are crazy fast and reliable but specialized on few keywords. Check Openwakeword or PvPorcupine.
It's impressive! Which GPU are you using?
Thank you. I have an RTX 2080 Super.
@@Linguflex Thanks for your answer! I have some questions. I've seen your email in the comments, can I email you?
Actually, you want at least about 100 ms of delay. We're human and take time to process information; it would just seem unnatural to have a conversation where you felt like someone was finishing your sentences for you all the time.
I have followed a lot of your projects on GitHub! And I am amazed by your work. What platform would be able to generate the fastest and most realistic sounding voices from an input?
I'm on Windows 10, and when I try to install Linguflex, it says "Python not found". Do you know how to fix this?
You need to install Python 3.9.9 first from here: www.python.org/downloads/release/python-399/ Another Linguflex release is coming soon too (hopefully releasing in ~2 weeks).
@@Linguflex Already installed that python version, but I'm happy to hear another release is coming! Good luck!
@@MrStellateWaffle Then maybe Python was not added to the system's PATH environment variable.
@@Linguflex That's good news, I really like the work you're doing with your various libs.
I can't seem to install Linguflex. Step 7: Launch Linguflex. To do this, run the following batch file in the Linguflex installation folder: "start_linguflex". Alternatively: python linguflex.py. When I do step 7, it says "Python not found". Do you know how I can fix this?
Hi buddy!! I'm trying this approach but getting an error. I built a voice assistant using LangChain and GPT-3.5 Turbo with the ElevenLabs API and OpenAI API, but the latency is not reducing.
Wow, awesome!
Hey brother! When I run your program it shows a rate limit error. BTW, I am using the free tier of OpenAI.
The ElevenLabs or OpenAI API ran into a rate limit. Check the characters used in ElevenLabs and the limit settings in your OpenAI account.
@@Linguflex It says the OpenAI limit was crossed. I am using the free tier of OpenAI. Is the free tier enough for this program to run, or must I upgrade to a paid tier?
Paid account; it needs an OpenAI API key.
Great
Incredible work! Found your projects today and I cannot describe in words how impressive this all is. +1!
Nice!
Incredible! I was working on the same project and had the issue of TTS latency: any cloud TTS service has latency that is too high for real-time purposes. Definitely going to implement your approach. Thanks!
May I also point you to this one which can greatly help with TTS and latency: github.com/KoljaB/RealtimeTTS
You're funny. Really cool work!