F5-TTS! They DID IT! Perfect voice clone with Emotion with a 10-second sample!

Bob Doyle Media

Published Feb 3, 2025

COMMENTS • 208

@BobDoyleMedia 1 month ago ⁺³
🌀✨Hey! Quick favor... ✨🌀
If you found value in this video, a simple LIKE 👍 and hitting SUBSCRIBE 🛎 helps more than you know. It lets UA-cam know we’re onto something good here and helps others discover it too. ⚛
I’d love to hear what you think! 💬 Drop your thoughts, insights, or questions in the comments below. 👇
Your support means everything. Let’s keep exploring the creative uses of AI together!

#BobDoyleMedia #LikeAndSubscribe #YourSupportMatters #ThankYou
@ahom_ahom_ahom 1 month ago
Hey Bob. Great Channel. Thanks.
@dalecorne-new-mtv 3 months ago ⁺³⁶
Thank you dude. You just made my day. I tried to use this through Pinokio, but my PC sucks and it took 25 minutes to generate a 4 word sentence. NOW, I can run it online and it took just seconds. I have a 4 second audio clip of my departed father and NOW I can finally make all the AI pictures I made of him, talk. I just made my first one and it is scary good.
@BobDoyleMedia 3 months ago ⁺²
Sorry Pinokio didn't work out for you as a solution. Yeah, the GPU definitely makes a difference.
@dalecorne-new-mtv 3 months ago
@@BobDoyleMedia Pinokio does work for me using FaceFusion 3.0 tho. Still a little slow but it's tolerable
@bobbyboyderecords 3 months ago ⁺¹
How does it work on Mac?
@anshumankar3320 3 months ago
I don't have an NVIDIA graphics card; I have an AMD Radeon instead. Will that work?
@Munim_Studio 3 months ago ⁺¹
How did you run it online?
@richermorin 2 months ago ⁺²
It's cool to see how satisfied you look when you're listening to the results :)
@BobDoyleMedia 1 month ago
I truly do get excited about how cool all this stuff is, even when it's not perfect yet.
@LostBoysGaming 12 days ago
Had no idea about Pinokio, life saver, thank you!
@WashweshnyW 3 months ago ⁺²
Wow, this is absolutely mind-blowing! The accuracy and emotion captured in just a 10-second sample are incredible. It’s amazing to see how far voice cloning technology has come. F5-TTS really nailed it! Can't wait to see what more is possible with this level of precision and emotion. Great job!
@Rich2150x 2 months ago
Great demo! Me too, I love this stuff! You did a great job explaining AND showing us how it works, and how to get started. Thanks Bob!
@seansanderson 1 month ago ⁺²
I have a tape of my dad from when he used to do taped “letters” to his brother from abroad. I am totally going to resurrect my dad's voice.
@markgoodman2801 3 months ago ⁺²
I loved the ending example, and seeing you smile and enjoy it! This is why I watch your stuff Bob! Please keep entertaining and informing us. In return, I always watch the full two ads without skipping.
@BobDoyleMedia 3 months ago
Thanks so much!
@AITLDR 3 months ago
I might try this one! ;) Please keep entertaining and informing us with this kind of video content.
@CaptainSnackbar 15 days ago
That's really impressive. I'd love to try it out in other languages; not all are supported, but hopefully they will make that happen.
@robertdouble559 3 months ago
Really cool, thanks for the tip on Pinokio, very smooth installation. I'll be playing with a bunch of other toys through Pinokio now!
@mitchrussianimmersionchann63 2 months ago
You seem like a down to earth guy. Thanks for this video and for explaining everything step by step :)
@Osurankaya-AI 3 months ago ⁺¹
While the video itself was amazing, the last 10 seconds took me away! 😅 You're amazing, well done.
@richermorin 2 months ago
It's so cool to see how you make boring stuff fun to learn. Thanks for that!
@LiFancier 3 months ago ⁺³
6:09 You can see the Whisper transcription if you click the "Terminal" tab in the Pinokio sidebar. From there you can copy it and paste it into the Reference Text field in the UI if you want to use the same text multiple times and want to skip the transcription step (faster).
@BobDoyleMedia 3 months ago
@@LiFancier thank you! Great tip.
@coolman-ms8 3 months ago
@@BobDoyleMedia I've experimented; it even works with effects.
@zemoxian 2 months ago
Something about that last segment reminded me about the Good Place Janet emotionally pleading with you not to kill her then speaking normally reminding you that it’s just her self defense mechanism. It’s hilarious how it switches modes instantly instead of building up to the next emotional state.
It’s interesting watching the technology evolve in real time. Each incremental step towards more believable speech. It is remarkable how quickly it’s changing.
@fulldivemedia 3 months ago
I watched again, and both times I felt your pain about the loading, but it made me laugh every time
@johnzach2057 3 months ago ⁺¹⁰
The big question is if the community can improve it so it can reach its full potential. BTW I think the creators of the model said the cost wasn't astronomical.
@christian_duru 3 months ago
I'm excited about this info. Thanks for always sharing
@BobDoyleMedia 3 months ago
My pleasure!
@tyler361t2 1 month ago
@@BobDoyleMedia what's best: RVC and SO-VITS-SVC, or F5-TTS models?
@yoranvandenbrink4790 3 months ago ⁺⁵
In my opinion, the E2 model does the voice cloning more accurately.
The F5 model sometimes gives results that don't really sound like the voice.
@robertdouble559 3 months ago ⁺²
If you're taking a survey on GPUs, both models worked fine on my Gigabyte laptop with an RTX 3070 GPU.
@havocthehobbit 3 months ago ⁺⁴
When these things start becoming full packages and not just tech demos or developer APIs, then so much is going to change. Give us packages with plugins, slicing tools, synth modulators, speed curves, and things that let you link images and video to export or create decision trees, and the number of writers who are going to publish their own media productions is going to be huge.
Big studios think we have run out of stories and keep regurgitating the same stuff with different skins, but there are so many creative people out there with stories trapped in their heads. They just need the right tools to be able to tell them the way they want, in their own style and language, and have them translate across cultures at scale. That doesn't even cover education, where teachers are going to write up re-enactments of historic and scientific events or mathematical scenarios, so that students can watch videos as homework and understand why before teachers show them how, getting the most value out of short classes. They'll be able to do it in their own language and translate to students' native languages, making this even better for poor developing nations that want to grow their education systems quickly.
@p_p 3 months ago ⁺²
LoL the Windows update progress bar, u made me spit my coffee 😂
@BobDoyleMedia 3 months ago ⁺¹
@@p_p I always appreciate when someone lets me know they caught some little thing like that. Sorry for the mess. 🤪
@joseparedes380 3 months ago
Here we go again with another of your treasures. U DA MAN
@BobDoyleMedia 3 months ago
@@joseparedes380 thanks! Should be a fun one!
@jamesvictor2182 3 months ago ⁺¹
I was listening to you speak thinking you are the Matthew McConaughey of AI vids, and at that precise moment, your sample audio mentioned his name
@BobDoyleMedia 3 months ago
@@jamesvictor2182 Coooooooooool 😎
@srikantdhondi 1 month ago ⁺¹
I got caught up on the image animations with lip sync. How did you make them? Please share a tutorial about it. Lovely video, thoroughly entertained 😂
@sigmondroland 1 month ago
Amazing. I wish you had used better quality reference audio, so we could hear its best quality.
@halcyon__r3289 26 days ago
The last speech cloning program that I tried took me hours to install, didn't work well (still, shoutout to the devs) and took minutes to render.
@donaldramotsebe7051 3 months ago
Me: "I wonder if Bob will cover this." Yes. Yes he will. Thanks! I’ve been using ttsopenai for two podcasts I started. I needed something that can clone my voice really well. Thanks a bunch as always
@EMUromania 1 month ago
You're the man 👍
@holdthetruthhostage 3 months ago
I think if you use EMaster with the outputs it will sound even better
@DukeOfGumby 1 month ago
F5-TTS was the AI used for Tank Rogan’s voice
@Jwoodill2112 3 months ago
It also works with a 6GB card. I have an RTX 3060 6GB in my laptop and it still works great and usually takes about 45 seconds for a generation. Not the best timing, but totally worth the wait.
@PeterNwawuba 3 months ago
How many words did you generate to get this estimate? I'm just curious.
@timandmonica 3 months ago ⁺²
I've been waiting for a free tool to transcribe ebooks into audiobooks for me. This might have met the threshold I've been looking for! I have a 12GB 4070 Super, 64GB RAM, and a 20-core i7-14700. I'm extremely curious to see if this would take a month or a day. I really have no idea! Based on your experience, what would your total guess at extrapolating be for a 10-hour audiobook?
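Editor's note: one rough way to approach the extrapolation asked about above is to measure how many seconds of compute your own machine needs per second of generated audio (a "real-time factor") and multiply. The sketch below is only arithmetic under that assumption; the function name and both sample factors are hypothetical, not benchmarks. The factor of 60 echoes one commenter's report elsewhere in this thread of about one minute of processing per second of speech when the tool fell back to the CPU.

```python
def estimate_generation_hours(audio_hours: float, real_time_factor: float) -> float:
    """Hours of compute needed to synthesize `audio_hours` of speech.

    real_time_factor = seconds of compute per second of generated audio.
    It is hardware dependent, so measure it on your own machine first.
    """
    if audio_hours < 0 or real_time_factor < 0:
        raise ValueError("inputs must be non-negative")
    return audio_hours * real_time_factor


# Hypothetical factors for illustration only; they are not measurements.
for label, rtf in [("CPU fallback (assumed 60x)", 60.0), ("GPU (assumed 1x)", 1.0)]:
    hours = estimate_generation_hours(10, rtf)
    print(f"{label}: {hours:.0f} h of compute, about {hours / 24:.1f} days")
```

Under these assumed factors, a 10-hour audiobook lands anywhere between roughly half a day and several weeks, which is why timing a short passage on your own hardware first is worthwhile.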
@HMaxTube11 3 months ago
Bob, your voice and personality can carry the content easily without the (over-gained) music used in the intro. Great content🌟👏👍
@bertsdad 3 months ago
FYI: The Pinokio full install can take a long time if your connection isn't fast, so don't do it on a deadline, and have something else to do to keep yourself busy.
@ahom_ahom_ahom 1 month ago ⁺¹
IPinokio!
@dr.glurak2462 2 days ago
This is crazy. Eleven Labs only allowed you a few character impression slots, even if you paid. And here you can make as many as you want for free?!
@frame_play 3 months ago
Thanks for making it simple. I also tried to use this through Pinokio, but my PC sucks and it took hours to generate a sentence.
@danave9396 1 month ago
Do you know of any alternatives for German speakers?
I'm so glad I've found your channel ❤ Thank you for your many great videos
@blsemetan7232 3 months ago
I might try this one... like that it can talk in Mandarin, could be great for learning another language.
@scottdunlap4109 1 month ago
Great video
@RDEnMinutosoficial 2 months ago
Pure gold
@benfrombc 2 months ago
This is great.
@Kloud11Studios 3 months ago
This is so fascinating. I will have to test it on my Mac. I’m not sure how accessible it is with the VoiceOver screen reader, but I will give it a whirl. I’ve been using 11 labs for a while, so this would be a really nice tool if this works properly. Do you know if you can use this on iPhone and Android as well? It would be pretty cool if you could.
@LKSuperHitz 3 months ago
amazing...many many thanks....
@yoramoment 3 months ago ⁺¹
Did they take out the "podcast" section? It has been gone from Pinokio's repo for the last few days!
@TheBlackOperations 1 day ago
I've been trying to find something like Eleven Labs where you can record your own voice speaking but apply it to a different voice. This is close, but I still can't find anything with that feature. I have some ideas for workarounds, but having to type the text is limiting vs. being able to record my voice in all of its inflections and then apply that to a completely different voice.
@Kkidzz 3 months ago
The ‘Corner Over There’ is a half a mile down the road at the corner of the farm…..
@SiliconSouthShow 3 months ago ⁺¹
I've been using it for days and it's better than most, but still not as good as playhd or hi, whatever it is. But for a local tool it's cool; I've been playing with it locally.
@armondtanz 3 months ago
I'm amazed 11 labs still hasn't got emotions. They've been out for over a year on playht & revoicer.
@DrCognitive 3 days ago
It's pretty cool, but the problem is that it seems to need to create from scratch every time. So you can't clone your voice and then use it to read an ebook or something. That's what I'm interested in doing. Maybe not with my voice, but a natural language voice that I can convert ebooks to audio books with (for myself, not commercial use or anything).
@videoeditoranimation1714 3 months ago ⁺¹
Bob, is it better than Fake You? I've been using Fake You, and I love it. Oh wow, Bob. It turns out that F5-TTS is built into Fake You, and it's pretty cool. When did they add that?
@RahhmiPoofs 3 months ago ⁺⁶
10 seconds in, "it's not perfect"
title: "perfect"
...
@jr5296 3 months ago
Okay.
@blizado3675 3 months ago
Please use DeepL for this text translation, not Google Translate. XD
Found it today and haven't had the time yet to try it myself, but it is indeed really good at voice cloning itself, and the multiple tone feature is nice. I wonder if you can mix tones inside sentences.
@daveowenmusic1749 3 months ago ⁺²
Bob, would this be good for cloning a friend’s singing voice? My musical partner passed away a year ago from lung cancer, but I, and his widow, would still like to hear him “live” again through my productions. I have used software to animate him singing, but need tools for the voice. Any suggestions?
@missoats8731 3 months ago ⁺³
I think there would be much better options for that, since singing is a lot more complex. If you have recordings of your friend where he sings, an easy way would be to look into RVC ( Retrieval-based Voice Conversion). If you don't know, there was a huge hype around that last year since people started making songs with the voices of big stars. There's a lot of platforms that use this where you can train your own model and can basically turn every singing recording into a recording with the voice of that model. You can also do this for free on your own machine, but I guess it's a bit harder to understand for a beginner and as always you got to have a capable computer. You can find "RVC" in Pinokio, the program Bob used in this video.
But if you ask me, the absolute best option would be to use ACE Studio. It's basically a composing suite for AI voices and they added the option to train your own model for free lately. So you could basically use your friend's voice as an instrument that sings lyrics and it is highly flexible. Unfortunately, the software itself is not free, there's a monthly fee. Bob made a video about it lately, but he used his speaking voice as samples for the model which isn't optimal. I got much better (I would even say shockingly realistic) results with samples of me singing.
But keep in mind that all of those solutions still aren't perfect and could easily lead to frustration in a case like yours. I wish you the best of luck and hope you find the right solution for you!
@daveowenmusic1749 3 months ago
@@missoats8731 Thanks for the very useful advice. I am well versed in audio recording and have lots of multitrack recordings of his voice that I guess could be used to train the models. What I don’t have is a lot of experience in the AI scene, but have been watching the emerging apps through UA-cam and the web. I will take your advice to heart and explore RVC and Ace Studio. Thanks again for the helpful guidance!
@SAMEGAMAN 3 months ago
Does it only support the English language?
Thanks for the video and for taking the time to produce it
@BobDoyleMedia 3 months ago
@@SAMEGAMAN right now it’s English and Chinese.
@anewman1976 3 months ago ⁺¹
I have an Irish accent and I can never get TTS websites right; I either sound American or English! 😁
@armondtanz 3 months ago
Yeah. I tried a couple of UK soccer pundits. The Scottish and Irish come out real bad. Shame about that.
@backpackerwebs 25 days ago
I tried to install it on a Surface Pro 7; it froze during the installation. Tried the online version; it was very slow.
@RaveMasterr 3 months ago ⁺¹
So, this is what Suno and Udio are using? Or something similar. That's why they can reproduce the singer's voice with only a small sample.
@py_man 3 months ago
Yes, exactly. It seems like Suno and Udio are using a similar approach.
@VaibhavShewale 3 months ago ⁺¹
ooh cool, only a few seconds of input to train
@BobDoyleMedia 3 months ago
Yes, I find it very impressive.
@tomasbusse2410 3 months ago
Love it
@RetifsAiStories 2 months ago
Cool !😊
@taltevet820 3 months ago ⁺²
My problem with it is that even if I put in high quality recordings (dataset), the results don't sound good, sound-wise. I mean, the clone of the character is good, but the sound quality is bad, like phone call quality or something like that.
@TomTheEnglishPicker 3 months ago
Wonder if this could be used to bring back to life the voice of a loved one who has passed away, if you took an audio clip from an old recording. Might be a bit uncanny valley though.
@JieTie 3 months ago
It would be really cool if the software had a feature to change already recorded audio to sound like sample audio :) Do you know any open source AI software that could do that? :)
I know there is RVC, but you have to have a trained model first, and that model requires ~15 min of audio.
@armondtanz 3 months ago ⁺¹
Bit confused with the ending?
So if I have a 15 second clip of a narrator with just a basic voice can I add happiness & sadness to that???
That would be great if u could do that from a generic clip.
@SkyMaster-w4n 3 months ago ⁺¹
Thanks for the video, much easier to install via Pinokio, great tool.
My F5 is using the CPU instead of the GPU, and it is taking too long to process the audio file.
I have an RTX 2060 but not a good processor, so each second of TTS is taking 1 minute to process.
So a 10 second TTS takes 10 minutes to produce. :/
How do I configure the GPU to process the TTS instead? Or is that not possible?
Thanks!
@Nerdolord 1 month ago ⁺¹
I'm having the same problem. I have an RTX 3060, but instead of using the GPU, the program keeps running on the CPU. I still don’t know how to fix it.
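Editor's note: a possible first diagnostic for the CPU-instead-of-GPU reports above, assuming the install runs on PyTorch (the thread itself does not say so) and that the snippet is run inside the same Python environment the app uses. The helper name `detect_torch_device` is hypothetical; `torch.cuda.is_available()` is PyTorch's own check. This only reports what PyTorch can see; it does not change any setting in the app.

```python
def detect_torch_device() -> str:
    """Return 'cuda', 'cpu', or 'torch-missing' for the current environment."""
    try:
        import torch  # PyTorch; may not be installed in this environment
    except ImportError:
        return "torch-missing"
    # True only if this PyTorch build has CUDA support and a usable GPU is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(detect_torch_device())
```

If this prints `cpu` on a machine with an NVIDIA card, one plausible cause is a CPU-only PyTorch build or a driver problem in that environment, though other causes are possible.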
@CapaUno1322 2 months ago
Morning buddy! Can I install this on Linux? It does say but if you're familiar with these things, I have Stable Diffusion installed, so Python, PyTorch, and Git are all installed already... I'd rather have all my AI stuff installed on one OS, as it's generally a mare on Windows since I have an AMD card. Thanks buddy, this is all very cool! Happy Friday! ;D
@micah_noel 3 months ago ⁺⁴
The final example, while very impressive, falls short of what I would consider “usable”, at least for my needs.
I suffer from extreme camera/microphone anxiety. I believe I’m a decent writer and I’m not shy about most of the things I might do in front of a camera(playing guitar, woodworking…). But trying to speak and explain what I’m doing feels impossible. So the idea of being able to type things up and have a voice speak it for me sounds like a brilliant solution! But in many cases, it would need to sound like me and I don’t want friends clicking on my videos and saying “what the hell is this?” because it’s obviously not me or just doesn’t sound right.
@V.Z.69 1 month ago
Is there a GPU setting somewhere? It takes a freaking long time for a simple Multi-* small paragraph of 4 lines of emotion.
@heard3879 3 months ago
So, how do you fix errors? Like, the AI voice chose to emphasize “sick” in the sentence that was intended to emphasize the word “out” (13:13).
@mwetzel0 3 months ago ⁺¹
The longish pause between sentence fragments is weird for me. I don't notice the problem in your examples, or in examples I see in other videos about F5-TTS. It almost sounds [pause] like this. [pause] And there's little I can do, [pause] to adjust it.
@orbetymo 3 months ago
Thank you for your comment. I know what you mean. 😂 Been running into that with other TTS AI. Might still give this one a go though.
@MREDZ 2 months ago
Hi, not sure why my install doesn't feature a 'podcast' tab. Could you perhaps shed some light as to why this is?
@danielbanic3738 3 months ago
Thanks, this is the best voice clone by far. I just use Hugging Face, which is all I really need. F5-TTS is really good, but I have to say the E2-TTS version is much better at emotion. I did try adding the emotion feature to F5-TTS as done in your video, but it doesn't seem to work; it shows (No Audio being produced) even though I uploaded an audio file, and it just produces the same audio. What am I doing wrong?
@as-ng5ln 2 months ago
Do you know why they removed that podcast feature?
@manumartinezkcxu 3 months ago ⁺¹
Works on my windows laptop: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz?
@jonasjonasson5709 8 days ago
Pinokio is very cool; without Pinokio we would have to install all the GitHub requirements manually.
@BobDoyleMedia 7 days ago
Yeah, it's not without its problems, but when you're just getting started, it can be really nice - unless it isn't. :D
@qingLiu-w2k 1 month ago
Why does a white screen appear after I click Discover in Pinokio?
@bronxboys101 3 months ago
I did try that TTS, but for some reason the copy-paste function is not working; you need to type manually 😢
@purplearmy. 14 days ago
I get some robotic sounding audio? RTX 4080
@purplearmy. 14 days ago
Jittery
@prabhat7728 13 days ago
Will a 3060 12GB give good results, or is it gonna choke like a 2070 Super? Pls reply
@arhamahmedkhan7060 2 months ago
Is there something which can do voice to voice from just a 20 sec audio sample??
@RUDataDriven 1 month ago
Hi, is there a way to make an API out of this? And what is the character count on inputs?
@PeterNwawuba 3 months ago
Any idea how well it runs on Apple silicon Macs? I was planning to get an M2 Pro Mac.
@kiranbastwad8178 1 month ago
When I tried it, it used the CPU instead of the GPU. Any idea how to change this?
@AshaHalake 22 days ago
Can I use it on Windows 10?
@Yanduo888 1 month ago ⁺¹
Why is there no sound in my audio output?
@BobDoyleMedia 1 month ago ⁺¹
@@Yanduo888 is there any visible indication that a waveform was generated?
@Yanduo888 1 month ago
Can an NVIDIA 1660 Super with 6GB VRAM run it?
@Yanduo888 1 month ago
@@BobDoyleMedia waveform not generated
@CerebricTech 29 days ago
So this won't run with 4GB VRAM at all... not even slowly???
@mwetzel0 3 months ago
15:20 The emotion needs to be in curly brackets, actually. (Parentheses will not work.)
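Editor's note: to make the curly-bracket point concrete, here is a minimal sketch of how a script tagged with `{Style}` markers could be split into per-style segments. It mirrors only what the comment describes; it is not the tool's actual parser, and the function name, the regex, and the `"Regular"` default label are all assumptions made for illustration.

```python
import re

# Assumed format: "{Style} text {OtherStyle} more text".
# Hypothetical illustration; the real tool's parsing rules may differ.
_STYLE_TAG = re.compile(r"\{([^{}]+)\}")


def split_styled_text(script: str, default_style: str = "Regular"):
    """Split a script into (style, text) pairs at each {style} marker."""
    segments = []
    style = default_style
    pos = 0
    for match in _STYLE_TAG.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append((style, text))
        style = match.group(1).strip()  # the marker applies to what follows
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append((style, tail))
    return segments


print(split_styled_text("{Happy} Great news! {Sad} It rained all day."))
print(split_styled_text("(Happy) Parentheses are not treated as markers."))
```

In this sketch, text wrapped in parentheses is never recognized as a style marker and simply stays part of the spoken text under the default style, which matches the behavior the commenter describes.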
@rosariomittiga1926 2 days ago
Please help: I do exactly as you do with a 10-second mp3 but I always get Error
@BobDoyleMedia 2 days ago
@@rosariomittiga1926 are you running it on your local machine and what is the error? I don’t know that I’ll be able to help you but perhaps someone else can if they know more information.
@rosariomittiga1926 2 days ago
@@BobDoyleMedia thank you for the reply, all good now. One more question: can you choose a different language? I'm trying to use Italian.
@KOR_REAL1 3 months ago
What if I don't have room on my C drive? Pinokio installs it there, and I can't change the home directory because it will give me an error.
@KierstenCrystalLillian 3 months ago ⁺²⁷
clonemyvoice AI fixes this. Perfect voice clone with emotion.
@DanSpartan177 3 months ago ⁺¹
Which one? I can't seem to find it
@bwheldale 3 months ago
It's a comparison page of 'Top AI Voice Cloning Software in 2024' to pay for.
@zeloguy 3 months ago
The E2 sounds like Office Space.
@rightside8937 3 months ago
Apparently it doesn't work on macOS 13.5; I get an error message.
@antonmarks2810 3 months ago
Hey Bob
Can I load Pinokio on Mimic-PC? How would I do that?
@HyperQuills03 2 months ago
Will it work on an RTX 3050 Laptop with 4GB VRAM?
@tufferstv 3 months ago
It struggles with British accents, particularly northern ones like mine. When I try to synthesize my own voice it seems to make me sound like I speak with RP. I've even tried accentuating my accent in the sample and it doesn't make any difference.
@The_Muzix 3 months ago
Would this work for vocals like replay?
@damnned 3 months ago
We want a tutorial about the last 5 seconds of the video😅
@SaliuSodiq-rs4zh 1 month ago
Try MaskGCT or MaskGCT tts
@BobDoyleMedia 1 month ago
@@SaliuSodiq-rs4zh I’ll look into it! Haven’t heard of that.
