Stop storing your secrets and API keys in your code!! Try Keeper, a password manager you can use in the terminal (built for devs/admins): www.keeper.io/networkchuck
I did it…..after days of frustration, blood, sweat and coffee..I finally figured out a way to clone a voice to use with my fully local, AI voice assistant!!!! This isn’t using cloud-based products like ElevenLabs…no…we are using a fully-local, open-source project called Piper TTS. This works wonderfully with the Assist voice pipeline in Home Assistant.
📝GUIDE and WALKTHROUGH: blog.networkchuck.com/posts/how-to-clone-a-voice/
🔥🔥Join the NetworkChuck Academy!: ntck.co/NCAcademy
**Sponsored by Keeper
I like your content
Why not just slow down your videos, have your AI hear them, then slowly train the AI to speed it up?
That way it can hear you enunciate.
I often have to slow down your videos to see and take notes on what you're doing, so why not have it do the same? (Not even fully done with the video yet; very happy with this. My dad has wanted a Morgan Freeman AI assistant.)
That laptop 3080 is more like a 3070 or a 3070 Ti at best... but still better than my 3050 6GB running my Ollama lol.
@Yuriel1981 You're being very generous
With all the dependency issues and fiddling around, someone should totally make this toolkit into a docker image!
The problem is you can't access the GPU from Docker… well, you can, but you'll end up doing all the same fiddling with the extra headache of the Docker layer.
It has probably been years since I watched a 37-minute video without skipping once, let alone a tech video. I feel like my attention span has been permanently increased.
Thanks for bringing that to my attention, I hadn't realized it was that long. Crazy.
Be careful with showing yt-dlp...
Linus had a strike for similar reasons; I think this video might receive the same "attention" from YouTube, unfortunately.
Quick note -- instead of removing silence, you would have been better served splitting at silence. The output would have been more intelligible for transcription, would not have required as many mid-word cuts which cause issues, etc etc
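If anyone wants to try that approach, here's a minimal sketch using pydub's split_on_silence. The input filename, silence thresholds, and output folder are placeholder assumptions; adjust them for your own recordings.

```python
# Minimal sketch: split a long recording at silence instead of stripping silence out.
# Assumes pydub is installed (pip install pydub) and ffmpeg is on the PATH.
# "training_audio.wav", the thresholds, and the "clips" folder are placeholders.
import os

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("training_audio.wav")

chunks = split_on_silence(
    audio,
    min_silence_len=500,             # a pause of at least 500 ms counts as silence
    silence_thresh=audio.dBFS - 16,  # anything 16 dB below the average level is "silent"
    keep_silence=200,                # keep 200 ms of padding so words aren't clipped
)

os.makedirs("clips", exist_ok=True)
for i, chunk in enumerate(chunks):
    chunk.export(f"clips/clip_{i:04d}.wav", format="wav")
```

Splitting this way keeps every cut inside a natural pause, so the resulting clips stay intelligible for transcription instead of starting or ending mid-word.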
Bravo, this is peak educational YouTube content. Learning with a twisted bit of fun.
The end results of all the methods were so cool. Worth watching the entire video.
Hey thanks for all your videos on home automation. I've started my own home automation journey watching your channel and learning what's possible. Now looking forward to commanding my home like the USS Enterprise.... "Computer; make coffee" :D
Just bought a new house and am currently working on setting up automation and localizing everything offline. Challenge I'm hitting right now is getting mics in every space that go back to the assistant instead of having pi's everywhere. Also trying to limit the response to the room from which the request came from.
Thanks for all the content! You have definitely made the process way more understandable and fun.
Him: a CPU will work
Me: looking at my HP 540 g3
🥲
Yeah, naw dawg......I feel for you.
It will work - sooner or later
@@WWSchoof later, much much later
Hey Chuck, awesome video! I’m working on image detection, and it gave me an idea for your next project. How about a video on training custom image detection models? Like recognizing specific objects (e.g., PET bottles, toys) to expand what a home assistant can do. It could add some cool features to your Raspberry Pi assistant. Would love to see your take on it!
I had, like, flashbacks for the first 10 seconds, from being a kid yelling at those recorded talk-back hamster toys with that same audio playing back XP
I'm so excited to try this out; with each video I've tried to keep up and implement Home Assistant and local AI. The voice is a wild addition.
That was intense. I can't imagine how much time, work, coffee and nerves you put into this project, but it really was worth it. Terry sounds great! I hope the next project is less nerve-wracking. xD
The topic is so crazy and fascinating, I think I'll do a home project like this. The only thing that bothers me is that I don't want to run my desktop PC 24/7.
Thanks Chuck, that vid had me wanting more, what a project! I hope some other shenanigans come about from this 😊
OMG the Terry voice is AMAZING!!!!!
Nice, I never recognized Mike as the voice of Mandark on Dexter's Laboratory before now. That's awesome.
Looking forward to building a local digital assistant with Multiple Personality Disorder, where Dr. Jekyll sounds like Morgan Freeman and Mr. Hyde sounds like Samuel L. Jackson...
Nice to see you're still uploading; I used to watch you after school every day ages ago through my window (we were neighbors)
WHAT
Please, can you talk about cash for servers: how it is done, what background I should learn this technique from, and do you have courses about it?
This is awesome. Thank you for all your work. And special Thanks for sparing us the crying. :-)
The instructions on your blog are incomplete... stuff is missing and lots of libraries fail with torch and such. Can you please try a fresh Ubuntu WSL install, follow your own guide, and correct the errors that come up?
When Chuck asked, "Don't you want this in your home?" I was like, F YEAH I DO!
Awesome! Now I can put your voice to the life-size doll I have of you....
Re: training issues: garbage in, garbage out. AI transcriptions are not suitable for AI training.
Lmfao you're 1000% becoming my voice assistant when I have the time!
Thanks, Chuck! I was really looking forward to this video. I absolutely love your content!
I ended up trying a bunch of API LLMs, and OpenAI's conversation agent and TTS are awesome and fast if you don't want to use your own hardware.
I nearly wet myself when you played your voice after the training! Technology: can't live without it 😂
This is amazing; great that you figured everything out. And of course I want this in my Home Assistant 😮
Had a lot of issues getting it running on macOS, but I was able to successfully get it up and running on my Ubuntu machine with Python 3.10.12. After a few minutes of training, I tested it out and was surprised with the results. Pretty cool! If I have hours of quality recordings, how much would it take to get a quality voice? Did you ever figure out why yours was a bit quirky?
I also wanted to train my local AI voice assistant with my voice and started using the Piper studio in German.
It wanted me to say a lot of sentences that sound like they're from a software call center and could be used for software scam calls, e.g. "The activation key you've entered is invalid", and in combination with other sentences like "Then call the police and see how far you get there" it sounds pretty strange to me.
Then I saw a disclaimer on the page that says "By clicking Submit, you agree to dedicate your recorded audio to the public domain (CC0)".
Is it known whether the voice recorded by the software gets distributed on the web and used for malicious phone calls?
I will try it; hope it works for me. It's the project I have been waiting for! Thanks for sharing.
When this guy opens up his camera gear, bugs and errors completely stop existing... I wish reality was like that.
Hey there Chuck, great video. One more request or suggestion, whatever seems right: make it talk with emotion. Right now the LLM gives the responses and it just reads them as they are. Maybe it should emphasize certain words, add some filler words, and actually talk like a human. For example, *talks intensely* shouldn't be read aloud, but adapted as emotion.
Thank you, this is a gem of a channel I have found which actually teaches cool stuff.
I now know what I'm doing when I get home, Thanks Chuck!
Lmao, the Mike monologues were the best thing I've ever heard. I really need to buy a new Pi so I can set up Home Assistant... I have an old RPi 2, but it doesn't have the specs needed to run Home Assistant :(
Honestly the Chuck voice had me laughing so hard after 30 min of development 😂
I copied your last video and was like damn, I wish I could make my own; literally you a bit later, thank you :)
The AI thing is running in a virtual machine in Proxmox with a GTX 970, so it's a little shit, but it works XD
I made a Pi LED agent a couple of days ago. I can turn an LED ON and OFF using Whisper (small model) to transcribe my voice for llama3.2:3b; llama then generates a response that executes a condition based on the string it provides and toggles the LED. The model can also respond using a Piper voice (small model) with another prompt that llama handles, besides the one that controls the LED. I use pre-prompts to guide it: I explain to the LLM what it is and the commands it should generate, and give it a few examples of how it's done, as this can improve its responses.
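Roughly, a loop like that could look like the sketch below. This is a hedged illustration, not the commenter's actual script: the Ollama /api/generate call and gpiozero usage are real APIs, but the GPIO pin, the system prompt, and the LED_ON/LED_OFF keyword convention are assumptions.

```python
# Hedged sketch of an LED agent: transcribed text goes to llama3.2:3b via
# Ollama's local HTTP API, and the reply is checked for a command keyword.
# The system prompt, pin number, and keyword convention are placeholder assumptions.
import requests
from gpiozero import LED

led = LED(17)  # assumed GPIO pin

SYSTEM_PROMPT = (
    "You control an LED. Reply with exactly LED_ON or LED_OFF, "
    "followed by a short spoken confirmation."
)

def ask_llm(transcribed_text: str) -> str:
    # Ollama's generate endpoint returns a JSON object with a "response" field.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",
            "prompt": f"{SYSTEM_PROMPT}\nUser: {transcribed_text}",
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

reply = ask_llm("turn the light on please")  # text from Whisper would go here
if "LED_ON" in reply:
    led.on()
elif "LED_OFF" in reply:
    led.off()
print(reply)  # this text could then be handed to Piper for speech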
Was it _really_ free, though? haha. Awesome job! Thanks for this!
Wow, the Terry Crews voice was amazing! The proper voice for your beefy Terry AI server! Congratulations.
You were my hero just with the other video, and now at just 1:23 you are more hero than hero... LOL
Thanks so much for these favorite online classes. You are the best teacher. Please teach me how to make a Raspberry Pi that runs a local ChatGPT for generating text for 3D gaming characters ❓
Fun fact: Demirkapı means iron door in Turkish (bill probably are)
Bro has Brad Boimler vibes! I'm here for it.
He just needs the Boimler scream!
@@ethanberg1 That could be the beard that Boimler has been growing all season.
Thanks Chuck, I think I'm too dumb to do that, but it looks so cool and out of the cloud 👍
Chuck says "So many little things to remember"; all I hear is "take notes and write a script, as you will never remember them all."
Next video must be putting your voice in a Chuck the assassin doll with creepy phrases pleaaasse 😂
Hey Chuck, fantastic and clear video, thank you! However, you're sending mixed messages: when you mention Keeper you say it's good that it's "cloud based", but in your video it seems like you prefer local installations (1:11 mark).
What is good for an individual (local hardware) may not be good for a company. As an individual, I’m willing to accept the cost and pain of maintaining a local infrastructure because it’s fun. For a business, the highest value becomes reliability.
@NetworkChuck Hello my name is suck 😁🤣
Absolute Beast Mode..!!! You Rock ...!! Cheers :)
I actually was so inspired by your last video that I developed a Python script that runs natively on Windows and does everything the satellite does... well, more. STT and TTS run on Windows (no Docker necessary). The transcript is sent to the conversation API through a websocket, and the response is turned into speech in a tenth of a second on your master-race PC (no hate, console gamer here). So you do not need an extra satellite (about $70 in extra hardware) when you have a faster machine right in front of you. It was hard for two reasons. 1) Home Assistant's docs and API are crap: the API refuses to work if you pass the wrong params, and the docs don't tell you that you actually CAN talk to Ollama on Home Assistant... unless you spend three days in trial and error. (I am not a Python pro; I had not touched Python before, I'm a PHP enthusiast at best. That I was able to do this makes me proud.) But for the last day I've been trying to replicate Donald's voice. Got the ONNX file, but without the tflite file. The demo works, but not in HA. Your vid comes just in time!
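For anyone attempting something similar, here's a hedged sketch of the simplest path. It uses Home Assistant's REST endpoint /api/conversation/process rather than the websocket route the commenter took; the host, the token, and the exact response structure shown are assumptions you'd verify against your own instance.

```python
# Hedged sketch: send transcribed text to Home Assistant's conversation API
# over REST and print the reply. The HA URL and long-lived access token are
# placeholders you would replace with your own values.
import requests

HA_URL = "http://homeassistant.local:8123"   # assumed host
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # created under your HA user profile

def process(text: str) -> dict:
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"text": text, "language": "en"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

result = process("turn off the kitchen lights")
# The spoken reply typically sits under response -> speech -> plain -> speech,
# ready to hand off to a local TTS engine such as Piper.
print(result["response"]["speech"]["plain"]["speech"])
```

The websocket path the commenter describes can be faster for streaming pipelines, but the REST call above is usually the quickest way to confirm the conversation agent (Ollama or otherwise) is wired up before adding STT and TTS around it.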
NetworkChuck's voice with a Chinese accent sounds so funny 😆😆
@NetworkChuck what kind of keyboard do you use? I'm dying to know because it just sounds SO good
Oh man, love it! Freakin' cool! Totally worth the effort! 😁
Now we're talking. Been waiting for this one!
That was amazing..I am currently building mine❤❤
I can recommend using stable whisper instead of whisper to get better timestamps.
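For anyone who wants to try that suggestion, here's a minimal sketch with the stable-ts package (pip install stable-ts), which wraps Whisper with more stable word-level timestamps; the model size and file names below are placeholders.

```python
# Minimal sketch: transcribe with stable-ts ("stable whisper") to get
# word-level timestamps that tend to line up better than vanilla Whisper's.
# Model size and file names are placeholder assumptions.
import stable_whisper

model = stable_whisper.load_model("base")
result = model.transcribe("training_audio.wav")

# Write the result with timestamps for inspection or later clip splitting.
result.to_srt_vtt("training_audio.srt")
```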
I've literally been following this series that you have been updating, from the start to now. I have Ollama with AlwaysReddy set up on my Ubuntu 24.04 OS and am running this. I will be trying to implement it on a new Raspberry Pi 5 (quick question: will it be beneficial to add the AI HAT you can get for the Pi?). Really interested in this project, and thank you so much for the inspiration to follow along on the journey.
Much respect,
Great Channel.
Thanks! This is going to make my life so much easier. Going to use a pi zero 2 W and a keyestudio 2 mic hat.
Bro, there are much easier ways to clone a voice locally. But still fun to watch this video 👌👌
I remember thinking ".wav" files were huge!
I can't sleep without watching your videos 🎉🎉
ONNX is not a universal format for TTS. There are more .pth files for TTS readily available. Also, ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models... any models, be it Stable Diffusion, GPTs... I'm no guru, I learned it all today.
Hi Chuck! I want to integrate this into all the bedrooms in my soon-to-be home. I already plan to build Sonos speakers into the ceiling (Sonos in-ceiling speakers). Is it possible to use these speakers instead of the small speaker that you are currently using? Thanks mate! Really enjoying your content! 🙌 (About to build my dream home and want to make it smart/AI.)
We all know that Morgan Freeman is what Chuck is going to change it to after the video ends.
This is overkill. All you need is AllTalk TTS and a 10-ish second sample of your voice.
The accuracy won't be the same though
29:20 oh no not the csauce
I love you, Chuck. You're the best.
Good job chuck
YOU LET OUT MAGIC SMOKE!
Hi Ya & best wishes. Thanks for work. Be Happy. Sevastopol/Crimea.)
Not sure if anyone has thought of this... But I just downloaded an Audible with a celebrity reading and now have 3 hours of perfect training material 😂
This was so cool!
You should do some Unraid videos; you can do all this in Unraid with Docker apps so much more easily.
@NetworkChuck, the Terry Crews voice clone does sound great, but I feel like you must have left something out. You attempted to use an automated process to generate an onnx file from your recorded voice, but the results were poor. You went back to Piper Recording Studio to get a decent voice clone. You said Mike spent some quality time with Piper Recording Studio for good results. I don't imagine Terry used Piper Recording Studio. So what did you do differently to achieve such a good result from prerecorded audio?
I just wanna say.. I am fully onboard with making my own AI assistant based on your video guides. However, the only thing holding me back is that I get Amazon Music via my Alexa. Would it be possible to include this service with this setup?
Nevermind. I just found an article on how to do it.
@derivitiv Can you post a link?
A very good YouTuber for voice-based AI is Jarod's Journey. He's fantastic, and I learned everything I need to know about voice AI from him.
Holy clone!
I may end up spending a month clipping "The A-Team" to get mine to talk like Mr. T, fool!
great video
Mike's voice sounds like Mandark from Dexter's Lab lmao 🤣🤣🤣
Sir big fan ❤
Hey Chuck, you still didn't fix that longer-conversation issue... any way to fix it? Or summarize the context?
"Hi, my name is Chuck. My voice is my password. Verify me."
It doesn't work for the 40xx series of GPUs, so everyone on a 40xx should apply the fix from the GitHub issue.
Hi Chuck, I really enjoyed your tutorial. Sorry if I am doing something wrong, but I have tried several times to add a lengthy comment which keeps disappearing. Do you know why this might be? Ernie
"Hey Chuck! Show me how to groom my beard like yours!" "Sure! All ya gotta do is drink a LOT of my coffee!" ;)
Hey, love your vids, but can you also include AMD GPUs in your tutorials? I know it's only a small percentage on AMD, but it would still be nice.
(Or link another video as a tutorial, just something so I don't feel left alone lmao)
"I can't handle batch size" might not be a GPU RAM issue, but a wrong software configuration. My 4070 with 12 GB couldn't handle a batch size of 2, but on CPU a batch size of 16 was not a problem. I didn't try 32.
Time to find me some Majel Roddenberry clips I guess (for respectful personal, non-commercial, non-distribution use of course)
hell yeah...he made it work....
Awesome 🥰🥰🥰🥰
I wanted to use this so much, but couldn't 'cos of the disclaimer at 06:10 that dedicates all audio clips to the public domain.
?
To quote the maintainer of Piper Recording Studio: "This is only there because I'm hosting Piper Recording Studio where it is the case that submitted/uploaded audio is donated to the public domain.
But when you run locally, it's up to you what you do with the data. I should add a flag to only show that text for my website"
It might be helpful if you separate the voice audio from the video into its own file, then clean it up, remaster the cleanup, and listen for errors. Then break down the words, run them through a phonics filter, create an alphanumeric dictionary in a lossless audio file, trim the top and bottom frequencies, and note the speed ratio and frame rate. Then train on the audio dictionary you created (make sure you speak the alphabet and numerical values clearly), use the training data to compare and compete against one another, then do a live training where you compete against the audio in a playback recording session. It should be at least 1,000 words and 1,000 numbers lol 😂
Why do you use Windows more and more instead of a native Linux installation? I mean, Linux is a bit more secure (depending on the OS you're using).
Terry should use a Borg voice
You did it!!!
Super cool video. Tried the steps and I get an error trying to install numpy 1.24.4:
"module 'pkgutil' has no attribute 'ImpImporter'."
Did you run into this as well? Can't find a solution just yet.
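Not a guaranteed fix, but that error usually points to the Python version: pkgutil.ImpImporter was removed in Python 3.12, and numpy 1.24.4's build tooling still expects it, so building inside a 3.10 or 3.11 virtual environment typically avoids it. A tiny check you could run first, just as a sketch:

```python
# Quick check: numpy 1.24.4 generally won't build on Python 3.12+,
# where pkgutil.ImpImporter no longer exists.
import sys

if sys.version_info >= (3, 12):
    print("Python 3.12+ detected; try creating the venv with Python 3.10 or 3.11.")
else:
    print("Python version looks OK; the error likely comes from something else.")
```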