Thank you dude. You just made my day. I tried to use this through Pinokio, but my PC sucks and it took 25 minutes to generate a 4 word sentence. NOW, I can run it online and it took just seconds. I have a 4 second audio clip of my departed father and NOW I can finally make all the AI pictures I made of him, talk. I just made my first one and it is scary good.
Sorry Pinokio didn't work out for you as a solution. Yeah, the GPU definitely makes a difference.
@@BobDoyleMedia Pinokio does work for me using FaceFusion 3.0 tho. Still a little slow but it's tolerable
How does it work on mac?
I have an AMD Radeon instead of an NVIDIA graphics card. Will that work?
How did you run it online?
You seem like a down to earth guy. Thanks for this video and for explaining everything step by step :)
When these things start becoming full packages and not just tech demos or developer APIs, so much is going to change. With packages that have plugins, slicing tools, synth modulators, speed curves, and things that let you link images & video to export or create decision trees, the number of writers who are going to publish their own media productions is going to be huge.
Big studios think we have run out of stories and keep regurgitating the same stuff with different skins, but there are so many creative people out there with stories trapped in their heads. They just need the right tools to tell them the way they want, in their own style and language, and have them translate across cultures at scale. That doesn't even cover education, where teachers are going to write up re-enactments of historic or scientific events or mathematical scenarios, so that students can watch videos as homework and understand the why before teachers show them the how, getting the most value out of short classes. They'll be able to do it in their own language and translate to students' native languages, making this even better for poor developing nations to grow their education systems quickly.
Really cool, thanks for the tip on Pinokio, very smooth installation. I'll be playing with a bunch of other toys through Pinokio now!
The big question is if the community can improve it so it can reach its full potential. BTW I think the creators of the model said the cost wasn't astronomical.
If you're taking a survey on GPUs, both models worked fine on my Gigabyte laptop with an RTX 3070 GPU.
While the video itself was amazing, the last 10 seconds took me away! 😅 You're amazing, well done.
I might try this one! ;) Please keep entertaining and informing us with this kind of video content.
I watched again, and both times I felt your pain about the loading, but made me laugh every time
I loved the ending example, and seeing you smile and enjoy it! This is why I watch your stuff Bob! Please keep entertaining and informing us. In return, I always watch the full two ads without skipping.
Thanks so much!
Thanks for making it simple. I also tried to use this through Pinokio, but my PC sucks and it took hours to generate a sentence.
I'm excited about this info. Thanks for always sharing
My pleasure!
I was listening to you speak, thinking you are the Matthew McConaughey of AI vids, and at that precise moment, your sample audio mentioned his name.
@@jamesvictor2182 Coooooooooool 😎
In my opinion, the E2 model does the voice cloning more accurately.
The F5 model sometimes gives results that don't really sound like the voice.
Me: "I wonder if Bob will cover this." Yes. Yes he will. Thanks! I’ve been using ttsopenai for two podcasts I started. I needed something that can clone my voice really well. Thanks a bunch as always.
I might try this one... like that it can talk in Mandarin, could be great for learning another language.
6:09 You can see the Whisper transcription if you click the "Terminal" tab in the Pinokio sidebar. From there you can copy it and paste it into the Reference Text field in the UI if you want to use the same text multiple times and want to skip the transcription step (faster).
@@LiFancier thank you! Great tip.
@@BobDoyleMedia I've experimented, and it even works with effects.
This is so fascinating. I will have to test it on my Mac. I’m not sure how accessible it is with the voiceover screen reader, but I will give it a whirl. I’ve been using 11 labs for a while, so this would be a really nice tool if this works properly. Do you know if you can use this on iPhone and android as well? It would be pretty cool if you could.
Wow, this is absolutely mind-blowing! The accuracy and emotion captured in just a 10-second sample are incredible. It’s amazing to see how far voice cloning technology has come. F5-TTS really nailed it! Can't wait to see what more is possible with this level of precision and emotion. Great job!
I think if you use EMaster with the outputs it will sound even better
Bob, your voice and personality can carry the content easily without the (over-gained) music used in the intro. Great content🌟👏👍
LoL the Windows update progress bar, u made me spit my coffee 😂
@@p_p I always appreciate when someone lets me know they caught some little thing like that. Sorry for the mess. 🤪
10 seconds in, "it's not perfect"
title: "perfect"
...
Okay.
Bit confused with the ending?
So if I have a 15 second clip of a narrator with just a basic voice can I add happiness & sadness to that???
That would be great if u could do that from a generic clip.
It also works with a 6gb card. I have an rtx3060 6gb in my laptop and it still works great and takes about 45 seconds usually for a generation. not the best timing, but totally worth the wait
How many words did you generate to get this estimate? I'm just curious.
Does it only support English language?
Thanks for the video and for taking the time to produce it.
@@SAMEGAMAN right now it’s English and Chinese.
Bob, would this be good for cloning a friend’s singing voice. My musical partner passed away a year ago from lung cancer, but I, and his widow, would still like to hear him “live” again through my productions. I have used software to animate him singing, but need tools for the voice. Any suggestions?
I think there would be much better options for that, since singing is a lot more complex. If you have recordings of your friend where he sings, an easy way would be to look into RVC ( Retrieval-based Voice Conversion). If you don't know, there was a huge hype around that last year since people started making songs with the voices of big stars. There's a lot of platforms that use this where you can train your own model and can basically turn every singing recording into a recording with the voice of that model. You can also do this for free on your own machine, but I guess it's a bit harder to understand for a beginner and as always you got to have a capable computer. You can find "RVC" in Pinokio, the program Bob used in this video.
But if you ask me, the absolute best option would be to use ACE Studio. It's basically a composing suite for AI voices and they added the option to train your own model for free lately. So you could basically use your friend's voice as an instrument that sings lyrics and it is highly flexible. Unfortunately, the software itself is not free, there's a monthly fee. Bob made a video about it lately, but he used his speaking voice as samples for the model which isn't optimal. I got much better (I would even say shockingly realistic) results with samples of me singing.
But keep in mind that all of those solutions still aren't perfect and could easily lead to frustration in a case like yours. I wish you the best of luck and hope you find the right solution for you!
@@missoats8731 Thanks for the very useful advice. I am well versed in audio recording and have lots of multitrack recordings of his voice that I guess could be used to train the models. What I don’t have is a lot of experience in the AI scene, but have been watching the emerging apps through YouTube and the web. I will take your advice to heart and explore RVC and Ace Studio. Thanks again for the helpful guidance!
Bob. Is it better than Fake You? I've been using Fake You, and I love it. Oh wow Bob. It turns out that F5-TTS is built in to fake u, and it's pretty cool. When did they add that?
Wonder if this could be used to bring back the voice of a loved one who has passed away, if you took an audio clip from an old recording. Might be a bit uncanny valley, though.
This is great .
Did they take out the "podcast" section?? It has been gone from Pinokio's repo for the last few days!!
Here we go again with another of your treasures. U DA MAN
@@joseparedes380 thanks! Should be a fun one!
FYI: The Pinokio full install can take a long time if your connection isn't fast, so, don't do it on a deadline and have something else to do to keep yourself busy.
So, this is what Suno and Udio are using? Or something similar. That's why they can reproduce the singer's voice with only a small sample.
Yes, exactly. It seems like Suno and Udio are using a similar approach.
I've been waiting for a free tool to transcribe ebooks into audiobooks for me. This might have met the threshold I've been looking for! I have a 12GB 4070 Super, 64GB RAM, and a 20-core i7-14700. I'm extremely curious to see if this would take a month or a day. I really have no idea! Based on your experience, what would your total guess at extrapolating be for a 10-hour audiobook?
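A rough way to frame the "month or a day" question is a back-of-the-envelope estimate. The sketch below is only arithmetic: it multiplies the length of the finished audio by how many seconds of processing each second of audio needs. The function name is made up for illustration. The factor of 60 comes from another commenter in this thread who reports about one minute of processing per second of audio on CPU only; the factor of 1.0 is purely hypothetical (real-time generation) and is not a measurement of this commenter's 4070 Super.

```python
def estimated_processing_hours(audio_hours, processing_seconds_per_audio_second):
    """Hours of compute needed to generate `audio_hours` of speech.

    Audio hours and processing hours scale by the same factor,
    so the seconds-per-second ratio can be applied directly.
    """
    return audio_hours * processing_seconds_per_audio_second


audiobook_hours = 10

# Reported elsewhere in this thread for a CPU-only run: ~60 s per second of audio.
cpu_hours = estimated_processing_hours(audiobook_hours, 60)
print(f"CPU-only: {cpu_hours} hours, about {cpu_hours / 24} days")

# Hypothetical real-time generation, for comparison only (not a measured figure).
realtime_hours = estimated_processing_hours(audiobook_hours, 1.0)
print(f"Real-time: {realtime_hours} hours")
```

Timing one short passage on your own machine and plugging in that ratio would give a far better guess than either figure here.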
Please use DeepL for this text translation, not google translate. XD
Found it today and haven't had the time yet to try it myself, but it is indeed really good at voice cloning itself, and the multiple tone feature is nice too. I wonder if you can mix tones inside sentences.
It would be really cool if the software had a feature to change already recorded audio to sound like the sample audio :) Do you know any open-source AI software that could do that? :)
I know there is RVC, but you have to have a trained model first, and that model requires ~15 min of audio.
clonemyvoice AI fixes this. Perfect voice clone with emotion.
which one? i cant seem to find it
It's a comparison page of 'Top AI Voice Cloning Software in 2024' to pay for.
Thanks for the video, much easier to install via Pinokio, great tool.
My F5 is using the CPU instead of the GPU and is taking too long to process the audio file.
I have an RTX 2060 but not a good processor, so each second of TTS is taking 1 minute to process.
So a 10-second TTS takes 10 minutes to produce. :/
How do I configure the GPU to process the TTS instead? Or is that not possible?
thanks!
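One thing worth checking in a case like this is whether PyTorch can see the GPU at all. The snippet below is a minimal diagnostic sketch, not a Pinokio setting and not part of F5-TTS. It assumes the app runs on PyTorch and that you run the script in the same Python environment the app uses. The helper names `pick_device` and `detect_device` are made up for illustration; `torch.cuda.is_available()` is the standard PyTorch call.

```python
def pick_device(cuda_available, mps_available=False):
    """Return the device name a PyTorch app would typically prefer."""
    if cuda_available:
        return "cuda"  # NVIDIA GPU visible to PyTorch
    if mps_available:
        return "mps"   # Apple silicon GPU backend
    return "cpu"       # fallback: expect slow generation


def detect_device():
    """Ask PyTorch what it can see; report 'cpu' if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


print("PyTorch would run on:", detect_device())
```

If this prints `cpu` on a machine with an NVIDIA card, a CPU-only PyTorch build or an outdated driver is a likely suspect, though that is a guess rather than a confirmed diagnosis for this setup.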
ooh cool, only few seconds input to train
Yes, I find it very impressive.
So, how do you fix errors? Like, the AI voice chose to emphasize “sick” in the sentence that was intended to emphasize the word “out” (13:13).
Works on my windows laptop: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz?
My problem with it is that even if I put in high-quality recordings (dataset), the results (soundwise) don't sound good. I mean the clone of the character is good, but the sound quality is bad, like phone call quality or something like that.
The longish pause between sentence fragments is weird for me. I don't notice the problem in your examples, or in examples I see in other videos about F5-TTS. It almost sounds [pause] like this. [pause] And there's little I can do, [pause] to adjust it.
Thank you for your comment. I know what you mean. 😂 Been running into that with other TTS AI. Might still give this one a go though.
Thanks, this is the best voice clone by far. I just use the Hugging Face version, which is all I really need. F5-TTS is really good, but I have to say the E2-TTS version is much better in emotion. I did try the add-emotion feature in F5-TTS as done in your video, but it doesn't seem to work: it shows (No Audio being produced) even though I uploaded an audio, and it just produces the same audio. What am I doing wrong?
What if I don't have room on my C drive? Pinokio installs it there, and I cant change home directory because it will give me an error.
Any idea on how well it runs on Apple silicon macs. was planning to get an m2 pro mac
The ‘Corner Over There’ is a half a mile down the road at the corner of the farm…..
I have an Irish accent and I can never get TTS websites to sound right, I either sound American or English! 😁
Yea. I tried a couple of uk soccer pundits. The scottish and irish come out real bad. Shame about that.
Love it
I did try that TTS but for some reason the copy paste function is not working, you need to type manually 😢
Apparently it doesn't work on Mac OS 13.5, I have an error message?
Cool !😊
amazing...many many thanks....
Can you tell me the best AI for lip syncing non human characters? Thanks
I've been using it for days and it's better than most, but still not as good as PlayHT or whatever it is, but... for a local tool it's cool. I've been playing with it locally.
I'm amazed 11 labs still hasn't got emotions? They've been out for over a year on PlayHT & Revoicer???
Hey Bob
Can I load Pinokio on Mimic-PC? How would I do that?
Would this work for vocals like replay?
The final example, while very impressive, falls short of what I would consider “usable”, at least for my needs.
I suffer from extreme camera/microphone anxiety. I believe I’m a decent writer and I’m not shy about most of the things I might do in front of a camera(playing guitar, woodworking…). But trying to speak and explain what I’m doing feels impossible. So the idea of being able to type things up and have a voice speak it for me sounds like a brilliant solution! But in many cases, it would need to sound like me and I don’t want friends clicking on my videos and saying “what the hell is this?” because it’s obviously not me or just doesn’t sound right.
It struggles with British accents, particularly northern ones like mine. When I try to synthesize my own voice it seems to make me sound like I speak with RP. I've even tried accentuating my accent in the sample and it doesn't make any difference.
Could you advise how to make Tamil text-to-speech with my own voice?
The E2 sounds like Office Space.
Can this be used on mimic computer?
Please use the dark theme.
I like the light theme.
No
Is it available on any online services like MimicPC ?
@@epicchannel4724 i’m hoping it comes to mimicPC. I’ll ask them about it.
15:20 The emotion needs to be in curly brackets, actually. (Parentheses will not work.)
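To illustrate why the bracket type matters, here is a small sketch of how a script with `{Emotion}` markers could be split into segments. This is not F5-TTS's own parser; the function name, the emotion labels, and the "Regular" default are assumptions made for the example. It only shows that a pattern looking for curly braces leaves parentheses untouched as ordinary text.

```python
import re


def split_by_emotion(text, default="Regular"):
    """Split text into (emotion, passage) pairs using {Emotion} markers."""
    segments = []
    current = default
    # With one capture group, re.split alternates: text, tag, text, tag, ...
    parts = re.split(r"\{(.*?)\}", text)
    for index, part in enumerate(parts):
        if index % 2 == 1:
            current = part.strip()  # odd positions are the captured tags
        elif part.strip():
            segments.append((current, part.strip()))
    return segments


script = "{Happy} It finally worked! {Sad} But it took all night. (Angry) This stays plain text."
for emotion, passage in split_by_emotion(script):
    print(f"{emotion}: {passage}")
```

In this sketch the `(Angry)` marker never switches the emotion; it is simply read as part of the "Sad" passage, which matches the behavior described in the comment above.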
Does this work with vocals or only spoken language?
only spoken language
You can very likely train a fine-tuning specifically for vocals and use that on top.
We want a tutorial about the last 5 seconds of the video😅
Definitely has come a long way but the longer it plays, the less human it sounds.
The Windoze progress bar is 100% not 100% when it sits at 100% for 100% of 3 plus minutes before getting to 100% finished.
Is this software safe? I don't even expect it to be free, just, is it safe?
@@czesnikadam6355 it is both free and safe. At least I haven’t had any problems with it.
Loads of artifacts
I tried it, but the quality is so-so....
Oh my god, how are we going to tell real people from AI in the future? 😂
I've been getting a lot of scam calls lately that don't seem to say anything. I think they're trying to get me to say 10 seconds worth of stuff. This stuff's amazing but it's very effective for scamming more than anything else
I guess anything is possible. Of course the sample of you would sound like you're on the phone...but I guess that could certainly be misused. It's dicey technology for sure...but it's everywhere, and I don't see it going away.
Why do people feel the need to edit almost every sentence in a video? I’m running out of breath listening. Leave some time to breathe.
I would subscribe several times if I could, but you can find me and have cold beers together 🍻
@@fulldivemedia 😄
First 10 seconds "It's not perfect", also Title "Perfect voice clone"
C'mon, you are smiling when the guy is talking in Chinese. PS: "I don't understand anything" LOL
Why don't u ever reply back to me? You too good for us?
@@eccentricballad9039 I do the best I can to keep up. I never intentionally ignore anyone.
I spent a lot of time watching this video, but the explanation is really very difficult to understand
I doubt you have a 3090. They cost several 1000s of dollars.
I know a millionaire that lives in a simple rental house. Never judge someone by their looks.
Dude, I can't get through your videos because your screen is blinding me. You're in a dark room and you have it on light mode. @15:57 you say you want people to enjoy these videos without some annoying sound; what is really annoying is that I have to find sunglasses to watch your videos, because for some insane reason you like to sit in the dark and blind yourself with a bright-ass white screen. It's no wonder you wear glasses and still squint at the screen, you're going blind... Turn the damn screen on dark mode, and welcome to the 2020s. Either adopt dark mode or make these videos much shorter, like 3-4 mins instead of 20. Think about this: nobody who uses light mode is going to turn away because you're using dark mode, but I can guarantee dark mode users will. I can prove it: as soon as I hit send I'm clicking next video. (Jesus Christ, I looked up just to see what the next video was and I couldn't see my goddamn keyboard after.) I'm just gonna have ChatGPT summarize any of your videos if there's one I'm interested in...
Too hard to make it work. I'm only interested in ready-made websites where I only have to push a couple of buttons.
lol thats why ppl charge folks like you $$. I take it this is open source/free?
REALLY: THE TIME YOU PUT IN IS NOT FREE. $$ IS NOTHING COMPARED TO THE WORK OTHER PEOPLE AND YOURSELF PUT IN: WHICH IS > THAN lIFE
Dreadful
It SUCKS!!! Not even close to being good. There are way better choices out there. tried it , hate it!
@@ChrisDallasDualped Well, I don’t agree that it sucks, because my tests worked great for such a short sample. Sorry you didn’t like your results, but others here have expressed that they like it. So results may vary.
I definitely stand by my opinion that it’s worth trying.
I'm tired of seeing hypes of elevenlabs rivals and then the result sounding nowhere close. The voice sounds so synthetic.
It's open source. It's pretty impressive for 10 seconds. It will most certainly get better. My opinion is that it is actually closer than many competitors, and it adds control that 11 labs does not. Absolutely worth giving my attention in my opinion.