F5-TTS! They DID IT! Perfect voice clone with Emotion with a 10-second sample!

  • Published 22 Nov 2024

COMMENTS • 139

  • @dalecorne-new-mtv
    @dalecorne-new-mtv 1 month ago +19

    Thank you dude. You just made my day. I tried to use this through Pinokio, but my PC sucks and it took 25 minutes to generate a 4 word sentence. NOW, I can run it online and it took just seconds. I have a 4 second audio clip of my departed father and NOW I can finally make all the AI pictures I made of him, talk. I just made my first one and it is scary good.

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago +2

      Sorry Pinokio didn't work out for you as a solution. Yeah, the GPU definitely makes a difference.

    • @dalecorne-new-mtv
      @dalecorne-new-mtv 1 month ago

      @BobDoyleMedia Pinokio does work for me using FaceFusion 3.0, though. Still a little slow, but it's tolerable.

    • @bobbyboyderecords
      @bobbyboyderecords 1 month ago +1

      How does it work on mac?

    • @anshumankar3320
      @anshumankar3320 1 month ago

      I don't have an NVIDIA graphics card; I have an AMD Radeon instead. Will that work?

    • @Munim_Studio
      @Munim_Studio 28 days ago

      How did you run it online?

  • @mitchrussianimmersionchann63
    @mitchrussianimmersionchann63 4 days ago

    You seem like a down to earth guy. Thanks for this video and for explaining everything step by step :)

  • @havocthehobbit
    @havocthehobbit 1 month ago +3

    When these things start becoming full packages and not just tech demos or developer APIs, so much is going to change. Once there are packages with plugins, slicing tools, synth modulators, speed curves, and things that let you link images and video to export or create decision trees, the number of writers who are going to publish their own media productions is going to be huge.
    Big studios think we have run out of stories and keep regurgitating the same stuff with different skins, but there are so many creative people out there with stories trapped in their heads. They just need the right tools to tell them the way they want, in their own style and language, and have them translated across cultures at scale. That doesn't even cover education, where teachers are going to write up re-enactments of historic and scientific events or mathematical scenarios, so that students can watch videos as homework and understand the why before teachers show them the how, getting the most value out of short classes. They'll be able to do it in their own language and translate to students' native languages, making this even better for poor developing nations that want to grow their education systems quickly.

  • @robertdouble559
    @robertdouble559 27 days ago

    Really cool, thanks for the tip on Pinokio, very smooth installation. I'll be playing with a bunch of other toys through Pinokio now!

  • @johnzach2057
    @johnzach2057 1 month ago +9

    The big question is if the community can improve it so it can reach its full potential. BTW I think the creators of the model said the cost wasn't astronomical.

  • @robertdouble559
    @robertdouble559 27 days ago +1

    If you're taking a survey on GPUs, both models worked fine on my Gigabyte laptop with an RTX 3070 GPU.

  • @Osurankaya-AI
    @Osurankaya-AI 28 days ago +1

    While the video itself was amazing, the last 10 seconds took me away! 😅 You're amazing, well done.

  • @AITLDR
    @AITLDR 1 month ago

    I might try this one! ;) Please keep entertaining and informing us with this kind of video content.

  • @fulldivemedia
    @fulldivemedia 1 month ago

    I watched it again, and both times I felt your pain about the loading, but it made me laugh every time.

  • @markgoodman2801
    @markgoodman2801 1 month ago +2

    I loved the ending example, and seeing you smile and enjoy it! This is why I watch your stuff Bob! Please keep entertaining and informing us. In return, I always watch the full two ads without skipping.

  • @frame_play
    @frame_play 23 days ago

    Thanks for making it simple. I also tried to use this through Pinokio, but my PC sucks and it took hours to generate a sentence.

  • @christian_duru
    @christian_duru 1 month ago

    I'm excited about this info. Thanks for always sharing

  • @jamesvictor2182
    @jamesvictor2182 1 month ago +1

    I was listening to you speak, thinking you are the Matthew McConaughey of AI vids, and at that precise moment, your sample audio mentioned his name.

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago

      @jamesvictor2182 Coooooooooool 😎

  • @yoranvandenbrink4790
    @yoranvandenbrink4790 1 month ago +1

    In my opinion, the E2 model does the voice cloning more accurately.
    The F5 model sometimes gives results that don't really sound like the voice.

  • @donaldramotsebe7051
    @donaldramotsebe7051 1 month ago

    Me: "I wonder if Bob will cover this." Yes. Yes, he will. Thanks! I've been using ttsopenai for two podcasts I started. I needed something that can clone my voice really well. Thanks a bunch, as always.

  • @blsemetan7232
    @blsemetan7232 1 month ago

    I might try this one... like that it can talk in Mandarin, could be great for learning another language.

  • @LiFancier
    @LiFancier 1 month ago +1

    6:09 You can see the Whisper transcription if you click the "Terminal" tab in the Pinokio sidebar. From there you can copy it and paste it into the Reference Text field in the UI if you want to use the same text multiple times and want to skip the transcription step (faster).

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago

      @LiFancier Thank you! Great tip.

    • @coolman-ms8
      @coolman-ms8 1 month ago

      @BobDoyleMedia I've experimented; it even works with effects.

  • @Kloud11Studios
    @Kloud11Studios 22 days ago

    This is so fascinating. I will have to test it on my Mac. I'm not sure how accessible it is with the VoiceOver screen reader, but I will give it a whirl. I've been using 11 Labs for a while, so this would be a really nice tool if it works properly. Do you know if you can use this on iPhone and Android as well? It would be pretty cool if you could.

  • @WashweshnyW
    @WashweshnyW 1 month ago +1

    Wow, this is absolutely mind-blowing! The accuracy and emotion captured in just a 10-second sample are incredible. It's amazing to see how far voice cloning technology has come; F5-TTS really nailed it! Can't wait to see what more is possible with this level of precision and emotion. Great job!

  • @holdthetruthhostage
    @holdthetruthhostage 1 month ago

    I think if you use EMaster with the outputs it will sound even better

  • @HMaxTube11
    @HMaxTube11 1 month ago

    Bob, your voice and personality can carry the content easily without the (over-gained) music used in the intro. Great content🌟👏👍

  • @p_p
    @p_p 1 month ago +2

    LOL, the Windows update progress bar, you made me spit out my coffee 😂

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago +1

      @p_p I always appreciate when someone lets me know they caught some little thing like that. Sorry for the mess. 🤪

  • @RahhmiPoofs
    @RahhmiPoofs 1 month ago +5

    10 seconds in, "it's not perfect"
    title: "perfect"
    ...

    • @jr5296
      @jr5296 21 days ago

      Okay.

  • @armondtanz
    @armondtanz 1 month ago +1

    Bit confused with the ending?
    So if I have a 15 second clip of a narrator with just a basic voice can I add happiness & sadness to that???
    That would be great if u could do that from a generic clip.

  • @Jwoodill2112
    @Jwoodill2112 1 month ago

    It also works with a 6 GB card. I have an RTX 3060 6 GB in my laptop and it still works great; a generation usually takes about 45 seconds. Not the best timing, but totally worth the wait.

    • @PeterNwawuba
      @PeterNwawuba 27 days ago

      How many words did you generate to get this estimate? I'm just curious.

  • @SAMEGAMAN
    @SAMEGAMAN 24 days ago

    Does it only support English?
    Thanks for the video and for taking the time to produce it.

    • @BobDoyleMedia
      @BobDoyleMedia 24 days ago

      @SAMEGAMAN Right now it's English and Chinese.

  • @daveowenmusic1749
    @daveowenmusic1749 1 month ago +2

    Bob, would this be good for cloning a friend's singing voice? My musical partner passed away a year ago from lung cancer, but his widow and I would still like to hear him "live" again through my productions. I have used software to animate him singing, but need tools for the voice. Any suggestions?

    • @missoats8731
      @missoats8731 1 month ago +3

      I think there would be much better options for that, since singing is a lot more complex. If you have recordings of your friend where he sings, an easy way would be to look into RVC ( Retrieval-based Voice Conversion). If you don't know, there was a huge hype around that last year since people started making songs with the voices of big stars. There's a lot of platforms that use this where you can train your own model and can basically turn every singing recording into a recording with the voice of that model. You can also do this for free on your own machine, but I guess it's a bit harder to understand for a beginner and as always you got to have a capable computer. You can find "RVC" in Pinokio, the program Bob used in this video.
      But if you ask me, the absolute best option would be to use ACE Studio. It's basically a composing suite for AI voices and they added the option to train your own model for free lately. So you could basically use your friend's voice as an instrument that sings lyrics and it is highly flexible. Unfortunately, the software itself is not free, there's a monthly fee. Bob made a video about it lately, but he used his speaking voice as samples for the model which isn't optimal. I got much better (I would even say shockingly realistic) results with samples of me singing.
      But keep in mind that all of those solutions still aren't perfect and could easily lead to frustration in a case like yours. I wish you the best of luck and hope you find the right solution for you!

    • @daveowenmusic1749
      @daveowenmusic1749 1 month ago

      @missoats8731 Thanks for the very useful advice. I am well versed in audio recording and have lots of multitrack recordings of his voice that I guess could be used to train the models. What I don't have is a lot of experience in the AI scene, but I have been watching the emerging apps through YouTube and the web. I will take your advice to heart and explore RVC and ACE Studio. Thanks again for the helpful guidance!

  • @videoeditoranimation1714
    @videoeditoranimation1714 1 month ago +1

    Bob, is it better than FakeYou? I've been using FakeYou, and I love it. Oh wow, Bob. It turns out that F5-TTS is built into FakeYou, and it's pretty cool. When did they add that?

  • @TomTheEnglishPicker
    @TomTheEnglishPicker 1 month ago

    I wonder if this could be used to bring back the voice of a loved one who has passed away, if you took an audio clip from an old recording. Might be a bit uncanny valley, though.

  • @benfrombc
    @benfrombc 1 day ago

    This is great.

  • @yoramoment
    @yoramoment 22 days ago

    Did they take out the "podcast" section? It has been gone from Pinokio's repo for the last few days!

  • @joseparedes380
    @joseparedes380 1 month ago

    Here we go again with another of your treasures. U DA MAN

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago

      @joseparedes380 Thanks! Should be a fun one!

  • @bertsdad
    @bertsdad 22 days ago

    FYI: The Pinokio full install can take a long time if your connection isn't fast, so, don't do it on a deadline and have something else to do to keep yourself busy.

  • @RaveMasterr
    @RaveMasterr 1 month ago +1

    So, this is what Suno and Udio are using? Or something similar. That's why they can reproduce the singer's voice with only a small sample.

    • @py_man
      @py_man 1 month ago

      Yes, exactly. It seems like Suno and Udio are using a similar approach.

  • @timandmonica
    @timandmonica 1 month ago

    I've been waiting for a free tool to transcribe ebooks into audiobooks for me. This might have met the threshold I've been looking for! I have a 12GB 4070 Super, 64GB RAM, and a 20-core i7-14700. I'm extremely curious to see if this would take a month or a day. I really have no idea! Based on your experience, what would your total guess at extrapolating be for a 10-hour audiobook?

  • @blizado3675
    @blizado3675 26 days ago

    Please use DeepL for this text translation, not Google Translate. XD
    I found it today and haven't had the time to try it myself yet, but it is indeed really good at voice cloning itself, and the multiple-tone feature is nice too. I wonder if you can mix tones inside sentences.

  • @JieTie
    @JieTie 24 days ago

    It would be really cool if the software had a feature to change already recorded audio to sound like the sample audio :) Do you know any open-source AI software that could do that? :)
    I know there is RVC, but you have to have a trained model first, and that model requires ~15 min of audio.

  • @KierstenCrystalLillian
    @KierstenCrystalLillian 1 month ago +27

    clonemyvoice AI fixes this. Perfect voice clone with emotion.

    • @DanSpartan177
      @DanSpartan177 1 month ago +1

      Which one? I can't seem to find it.

    • @bwheldale
      @bwheldale 1 month ago

      It's a comparison page of 'Top AI Voice Cloning Software in 2024' to pay for.

  • @SkyMaster-w4n
    @SkyMaster-w4n 20 days ago

    Thanks for the video. It's much easier to install via Pinokio; great tool.
    My F5 is using the CPU instead of the GPU, and it is taking too long to process the audio file.
    I have an RTX 2060 but not a good processor, so each second of TTS takes 1 minute to process.
    So a 10-second TTS takes 10 minutes to produce. :/
    How do I configure the GPU to process the TTS instead? Or is that not possible?
    Thanks!
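
The figures in the comment above (about one minute of processing per second of generated speech on CPU) can be turned into a rough time estimate. This is only a minimal sketch: the function name `estimate_generation_seconds` and the constant `CPU_RATE` are made up for illustration and are not part of F5-TTS or Pinokio, and the sketch assumes generation time grows linearly with audio length.

```python
def estimate_generation_seconds(audio_seconds: float, compute_seconds_per_audio_second: float) -> float:
    """Estimate wall-clock generation time, assuming it scales linearly with clip length."""
    if audio_seconds < 0 or compute_seconds_per_audio_second < 0:
        raise ValueError("durations must be non-negative")
    return audio_seconds * compute_seconds_per_audio_second


# Rate reported in the comment: roughly 60 s of compute per 1 s of speech on CPU.
# This is one commenter's machine, not a benchmark.
CPU_RATE = 60.0

ten_second_clip = estimate_generation_seconds(10, CPU_RATE)
print(f"{ten_second_clip / 60:.0f} minutes")  # 10 s * 60 = 600 s, i.e. 10 minutes
```

Plugging in a rate measured on your own hardware gives a quick sanity check before starting a long job.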

  • @VaibhavShewale
    @VaibhavShewale 21 days ago +1

    Ooh, cool, only a few seconds of input to train.

  • @heard3879
    @heard3879 1 month ago

    So, how do you fix errors? Like, the AI voice chose to emphasize “sick” in the sentence that was intended to emphasize the word “out” (13:13).

  • @manumartinezkcxu
    @manumartinezkcxu 1 month ago +1

    Works on my windows laptop: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz?

  • @taltevet820
    @taltevet820 1 month ago +1

    My problem with it is that even if I put in high-quality recordings (dataset), the results don't sound good. I mean, the clone of the character is good, but the sound quality is bad, like phone-call quality or something like that.

  • @mwetzel0
    @mwetzel0 28 days ago +1

    The longish pause between sentence fragments is weird for me. I don't notice the problem in your examples, or in examples I see in other videos about F5-TTS. It almost sounds [pause] like this. [pause] And there's little I can do, [pause] to adjust it.

    • @RAWAIRT
      @RAWAIRT 22 days ago

      Thank you for your comment. I know what. You mean. 😂 Been running into that with other TTS AI. Might still give this one a go though.

  • @danielbanic3738
    @danielbanic3738 1 month ago

    Thanks, this is the best voice clone by far. I just use it on Hugging Face, which is all I really need. F5-TTS is really good, but I have to say the E2-TTS version is much better at emotion. I did try adding emotion with F5-TTS as done in your video, but it doesn't seem to work: it shows (No Audio being produced) even though I uploaded an audio file, and it just produces the same audio. What am I doing wrong?

  • @KorTheRealWhiteHorse
    @KorTheRealWhiteHorse 29 days ago

    What if I don't have room on my C drive? Pinokio installs it there, and I can't change the home directory because it gives me an error.

  • @PeterNwawuba
    @PeterNwawuba 27 days ago

    Any idea how well it runs on Apple silicon Macs? I was planning to get an M2 Pro Mac.

  • @Kkidzz
    @Kkidzz 1 month ago

    The ‘Corner Over There’ is a half a mile down the road at the corner of the farm…..

  • @anewman1976
    @anewman1976 1 month ago +1

    I have an Irish accent and I can never get TTS websites right; I either sound American or English! 😁

    • @armondtanz
      @armondtanz 1 month ago

      Yeah. I tried a couple of UK soccer pundits. The Scottish and Irish come out real bad. Shame about that.

  • @tomasbusse2410
    @tomasbusse2410 27 days ago

    Love it

  • @bronxboys101
    @bronxboys101 1 month ago

    I did try that TTS, but for some reason the copy-paste function is not working; you need to type manually 😢

  • @rightside8937
    @rightside8937 28 days ago

    Apparently it doesn't work on macOS 13.5? I get an error message.

  • @RetifsAiStories
    @RetifsAiStories 13 days ago

    Cool! 😊

  • @LKSuperHitz
    @LKSuperHitz 1 month ago

    amazing...many many thanks....

  • @GFRxR
    @GFRxR 27 days ago

    Can you tell me the best AI for lip syncing non-human characters? Thanks

  • @SiliconSouthShow
    @SiliconSouthShow 1 month ago +1

    I've been using it for days and it's better than most, but still not as good as PlayHT (or whatever it is). But for a local tool it's cool; I've been playing with it locally.

    • @armondtanz
      @armondtanz 1 month ago

      I'm amazed 11 Labs still hasn't got emotions. They've been out for over a year on PlayHT and Revoicer.

  • @antonmarks2810
    @antonmarks2810 1 month ago

    Hey Bob
    Can I load Pinokio on Mimic-PC? How would I do that?

  • @The_Muzix
    @The_Muzix 1 month ago

    Would this work for vocals like replay?

  • @micah_noel
    @micah_noel 1 month ago +4

    The final example, while very impressive, falls short of what I would consider “usable”, at least for my needs.
    I suffer from extreme camera/microphone anxiety. I believe I'm a decent writer and I'm not shy about most of the things I might do in front of a camera (playing guitar, woodworking…). But trying to speak and explain what I'm doing feels impossible. So the idea of being able to type things up and have a voice speak them for me sounds like a brilliant solution! But in many cases, it would need to sound like me, and I don't want friends clicking on my videos and saying “what the hell is this?” because it's obviously not me or just doesn't sound right.

  • @tufferstv
    @tufferstv 1 month ago

    It struggles with British accents, particularly northern ones like mine. When I try to synthesize my own voice it seems to make me sound like I speak with RP. I've even tried accentuating my accent in the sample and it doesn't make any difference.

  • @cready2117
    @cready2117 26 days ago

    Could you advise how to make Tamil text-to-speech with my own voice?

  • @zeloguy
    @zeloguy 1 month ago

    The E2 sounds like Office Space.

  • @thefreesoulchannel
    @thefreesoulchannel 1 month ago

    Can this be used on mimic computer?

  • @EpicStoryHaven-s9tsss
    @EpicStoryHaven-s9tsss 1 month ago +6

    Please use the dark theme.

  • @epicchannel4724
    @epicchannel4724 1 month ago

    Is it available on any online services like MimicPC?

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago

      @epicchannel4724 I'm hoping it comes to MimicPC. I'll ask them about it.

  • @mwetzel0
    @mwetzel0 28 days ago

    15:20 The emotion needs to be in curly brackets, actually. (Parentheses will not work.)
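
To illustrate the curly-bracket convention mentioned above, here is a minimal sketch of how text tagged like `{Happy} ...` could be split into per-emotion segments. The function `split_by_emotion`, the default label `"Regular"`, and the regular expression are assumptions made for illustration; this is not F5-TTS's own parser.

```python
import re

# Matches a tag such as {Happy}; parentheses like (Happy) are deliberately not matched.
TAG_PATTERN = re.compile(r"\{([^{}]+)\}")


def split_by_emotion(script: str, default: str = "Regular") -> list[tuple[str, str]]:
    """Split a script into (emotion, text) pairs using {Emotion} tags."""
    segments = []
    current = default  # label applied to text that appears before any tag (assumed default)
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append((current, text))
        current = match.group(1).strip()
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append((current, tail))
    return segments


print(split_by_emotion("{Happy} Great news! {Sad} It rained all day."))
print(split_by_emotion("(Happy) This tag is ignored."))
```

In this sketch, text written with parentheses stays in a single segment under the default label, which mirrors the commenter's point that parentheses are not recognized.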

  • @thefreesoulchannel
    @thefreesoulchannel 1 month ago

    Does this work with vocals or only spoken language?

    • @FunnyProducer
      @FunnyProducer 1 month ago +1

      Only spoken language.

    • @epistemex10
      @epistemex10 1 month ago

      You can very likely train a fine-tuning specifically for vocals and use that on top.

  • @damnned
    @damnned 1 month ago

    We want a tutorial about the last 5 seconds of the video😅

  • @davidinark
    @davidinark 1 month ago

    Definitely has come a long way but the longer it plays, the less human it sounds.

  • @ZedMagnet
    @ZedMagnet 1 month ago

    The Windoze progress bar is 100% not 100% when it sits at 100% for 100% of 3 plus minutes before getting to 100% finished.

  • @czesnikadam6355
    @czesnikadam6355 3 days ago

    Is this software safe? I don't even expect it to be free, just, is it safe?

    • @BobDoyleMedia
      @BobDoyleMedia 3 days ago

      @czesnikadam6355 It is both free and safe. At least I haven't had any problems with it.

  • @TR-ei1cy
    @TR-ei1cy 1 month ago +1

    Loads of artifacts

  • @g.s.3389
    @g.s.3389 1 month ago

    I tried it, but the quality is so-so...

  • @ssink1cn7
    @ssink1cn7 24 days ago

    Oh my god, how will we tell real people from AI in the future? 😂

  • @arodg
    @arodg 16 days ago

    I've been getting a lot of scam calls lately that don't seem to say anything. I think they're trying to get me to say 10 seconds worth of stuff. This stuff's amazing but it's very effective for scamming more than anything else

    • @BobDoyleMedia
      @BobDoyleMedia 16 days ago

      I guess anything is possible. Of course the sample of you would sound like you're on the phone...but I guess that could certainly be misused. It's dicey technology for sure...but it's everywhere, and I don't see it going away.

  • @alphabeets
    @alphabeets 6 days ago +2

    Why do people feel the need to edit almost every sentence in a video? I'm running out of breath listening. Leave some time to breathe.

  • @fulldivemedia
    @fulldivemedia 1 month ago

    I would subscribe several times if I could, but you can find me and have cold beers together 🍻

  • @AndrewQPower
    @AndrewQPower 22 days ago

    First 10 seconds "It's not perfect", also Title "Perfect voice clone"

  • @joseparedes380
    @joseparedes380 1 month ago

    C'mon, you are smiling when the guy is talking in Chinese. PS: "I don't understand anything" LOL

  • @eccentricballad9039
    @eccentricballad9039 1 month ago

    Why don't you ever reply back to me? You too good for us?

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago

      @eccentricballad9039 I do the best I can to keep up. I never intentionally ignore anyone.

  • @judinalfatah
    @judinalfatah 1 month ago

    I spent a lot of time watching this video, but the explanation is really very difficult to understand

  • @Michael_Lak
    @Michael_Lak 15 days ago

    I doubt you have a 3090. They cost several thousand dollars.

    • @ohsocialmio3884
      @ohsocialmio3884 12 days ago

      I know a millionaire that lives in a simple rental house. Never judge someone by their looks.

  • @beatsbywoods8388
    @beatsbywoods8388 1 month ago +4

    Dude, I can't get through your videos because your screen is blinding me. You're in a dark room and you have it on light mode. @15:57 you say you want people to enjoy these videos without some annoying sound; what is really annoying is that I have to find sunglasses to watch your videos, because for some insane reason you like to sit in the dark and blind yourself with a bright-ass white screen. It's no wonder you wear glasses and still squint at the screen; you're going blind... Turn the damn screen to dark mode, and welcome to the 2020s. Either adopt dark mode or make these videos much shorter, like 3-4 minutes instead of 20. Think about this: nobody who uses light mode is going to turn away because you're using dark mode, but I can guarantee dark mode users will. I can prove it: as soon as I hit send, I'm clicking the next video. (Jesus Christ, I looked up just to see what the next video was and I couldn't see my goddamn keyboard after.) I'm just gonna have ChatGPT summarize any of your videos if there's one I'm interested in...

  • @thebrinksf69
    @thebrinksf69 1 month ago +2

    Too hard to make it work. I'm only interested in put-together websites where I only have to push a couple of buttons.

    • @luismiguel69able
      @luismiguel69able 1 month ago +1

      LOL, that's why people charge folks like you $$. I take it this is open source/free?

    • @manumartinezkcxu
      @manumartinezkcxu 1 month ago

      REALLY: THE TIME YOU PUT IN IS NOT FREE. $$ IS NOTHING COMPARED TO THE WORK OTHER PEOPLE AND YOURSELF PUT IN, WHICH IS > THAN LIFE

  • @TR-ei1cy
    @TR-ei1cy 1 month ago +1

    Dreadful

  • @ChrisDallasDualped
    @ChrisDallasDualped 20 days ago

    It SUCKS!!! Not even close to being good. There are way better choices out there. Tried it, hate it!

    • @BobDoyleMedia
      @BobDoyleMedia 20 days ago +1

      @ChrisDallasDualped Well, I don't agree that it sucks, because my tests worked great for such a short sample. Sorry you didn't like your results, but others here have expressed that they like it. So results may vary.
      I definitely stand by my opinion that it’s worth trying.

  • @jeffbruno847
    @jeffbruno847 1 month ago

    I'm tired of seeing hypes of elevenlabs rivals and then the result sounding nowhere close. The voice sounds so synthetic.

    • @BobDoyleMedia
      @BobDoyleMedia 1 month ago +1

      It's open source. It's pretty impressive for 10 seconds. It will most certainly get better. My opinion is that it is actually closer than many competitors, and it adds control that 11 labs does not. Absolutely worth giving my attention in my opinion.