Real-Time Voice Cloning Toolbox

  • Published 22 Oct 2024

COMMENTS • 754

  • @gravitacion7412
    @gravitacion7412 5 years ago +856

    What a time to be alive !

    • @PokettoMusic
      @PokettoMusic 5 years ago +55

      I see what you did there

    • @subzakk4226
      @subzakk4226 5 years ago +2

      @PokettoMusic I don't understand haha

    • @pglove
      @pglove 4 years ago +28

      SubZakk, check out the channel “Two Minute Papers”; the creator’s catchphrase is “What a time to be alive!”. Probably my favourite channel on YouTube. Great content in short presentations.

    • @subzakk4226
      @subzakk4226 4 years ago +3

      @pglove Ahh yeah, I watch him and I just now realized he says that. Thank you

    • @AMSASH
      @AMSASH 4 years ago +34

      Hold on to your papers

  • @HRHKingJamesIXofScotland
    @HRHKingJamesIXofScotland 3 years ago +257

    I think this is a unique piece of software. This software can potentially give someone who has lost their physical voice the powerful ability to speak fluently while assisted. You have done a remarkable thing. Well done.

    • @uCanCallMeBob
      @uCanCallMeBob 1 year ago +2

      pretty sure Apple is doing this for iOS 17

  • @CorentinJemine
    @CorentinJemine 4 years ago +132

    A while back I changed the description of this video to invite whoever is interested in cloning a voice to not use my repo and to head over to resemble.ai instead. That came across as selling out, and that wasn't my intention. I've changed the description back.
    My initial intention with that message was to keep new people from spending hours trying to set up the project, only to give up or to obtain subpar results.
    While Resemble does offer a free plan that will let you clone your voice with more naturalness than this project will, purely for legal reasons we cannot allow you to clone the voice of someone else without a bit of legal work.
    This repo lets you do anything you want in that regard, so now I get why people want to use it. I've posted an update here: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/364. I've also included a link in the readme to an installation guide that seems to work for most people.

    • @newmonengineering
      @newmonengineering 4 years ago +3

      I would love to figure out how to take what you have done and turn it into a parrot for my robot, i.e. have the robot ask a few questions, record the answers, then use those answers to start talking back to the user in their OWN voice. (These would be scripted questions and answers.) But I think that would be an awesome project. It's almost like Terminator, the robot becomes the person!!

    • @shane7965
      @shane7965 3 years ago

      @Robert Sapolsky It's a python project, so you will need to run it with a python interpreter.

    • @JohnSmith-fl1gs
      @JohnSmith-fl1gs 3 years ago +3

      Does it work with non-English languages?

    • @krash3842
      @krash3842 3 years ago +1

      Does it work with different languages?

    • @ResembleAI
      @ResembleAI 3 years ago +2

      Resemble AI seems like a cool company ;)

  • @PunxTV123
    @PunxTV123 3 years ago +55

    any tutorial on how to do this? from the start?

  • @sonnymastrangioli
    @sonnymastrangioli 5 years ago +375

    This would help greatly for something like making new voice lines for mods using the vanilla voice actors in Skyrim/Fallout.
    Bit of a legal and intellectual property grey area though when you use established actors' voices through this as opposed to the generic voices (i.e. making new lines using the voices of Max Von Sydow, Joan Allen, Vladimir Kulich, Michael Hogan, etc. for their respective characters when they never really recorded them; a machine did it via programming and learning).
    Scary when you look at it, as it could theoretically bring dead actors' voices back to life with the right samples too. (Like doing another Godfather game without Marlon Brando voicing Vito Corleone.)

    • @mikerhinos
      @mikerhinos 5 years ago +19

      Well, if you're doing that in the EU it's illegal, because voice is personal data and thus protected by the GDPR.
      (Yes, I wanted to use celebrity audio samples too, so I did a bit of research on the legal aspects lol).

    • @qeter129
      @qeter129 5 years ago +10

      A workaround would be to replace all the original voices with completely synthetic ones that could be freely used by the community.
      And what if a company only ever uses synthetic voice models in the first place? Would they have exclusive rights to the model? What about a model built using embeddings from the company's model?

    • @HaloDude557
      @HaloDude557 5 years ago +3

      You would need a different network for each tone though (narrative, angry, interrogative, etc.)

    • @satibel
      @satibel 5 years ago +1

      @mikerhinos but if you publish it (by playing in a movie) GDPR might not protect it.

    • @TehFlush
      @TehFlush 4 years ago +1

      Never thought of that! Nice!

  • @nexus95
    @nexus95 5 years ago +123

    This is awesome and horrifying at the same time.

  • @EvaWebb
    @EvaWebb 3 years ago +61

    Question: Can you use something like this to take a voice input, rather than a piece of text?
    That way, you can do things like preserving tone and inflection.
    It's probably a couple of additional steps, but I would love to have something like that.

    • @J4cobSkiJumping
      @J4cobSkiJumping 2 years ago +3

      You mean by speaking to the mic? That would be scary tho, but also great (scary - for people to actually impersonate higher-up celebrities, etc pretending to be them live - would be more catfishes tho)

    • @swapnilmasurekar5431
      @swapnilmasurekar5431 2 years ago +10

      Is any such toolkit available?

    • @lunnprod
      @lunnprod 2 years ago +7

      would love this for my dnd characters

    • @mathaius69
      @mathaius69 2 years ago +7

      did you ever find an answer?

    • @peppino3609
      @peppino3609 1 year ago

      @J4cobSkiJumping Imagine like a horror game like that

  • @PeterCooperUK
    @PeterCooperUK 5 years ago +19

    The first two voices are British English in the sample but gain a US accent in the synthesis (though they do admittedly sound like those same people, just putting on an American accent). I'm guessing that's due to the voices of the audiobooks used in some of the training? Very clever though, nice work!

  • @fzigunov
    @fzigunov 4 years ago +120

    Plot twist: This video was narrated by the AI

  • @60FpsGoodness
    @60FpsGoodness 4 years ago

    Can this be used to create a TTS voice for the PC already? Or do we need something else?

  • @JDRos
    @JDRos 4 years ago +1

    Corentin Jemine, can I use a single audio file and just straight up load it into the toolbox? How should I do it?

  • @MartinPiron
    @MartinPiron 5 years ago +55

    Wow ! Nice work, nice presentation, congrats !

  • @SingleServingMimic
    @SingleServingMimic 5 years ago +17

    Finally I can now complete my Conan the detective cosplay

  • @selinakyle7624
    @selinakyle7624 5 years ago +11

    Great work!!! I’m super excited to use this. I’ve been trying for the last couple of days... I don’t have a computer with an Nvidia GPU... Do you think something like the Jetson Nano would be able to run this?

  • @EpochIsEpic
    @EpochIsEpic 5 years ago +35

    This is really freaking cool! My friends are going to get freaked out on discord lol

  • @brytonmassie
    @brytonmassie 3 years ago +13

    I think the best way to get the cloner to produce the tone or structure you want would be to have it use your own voice's inflections and changes in tone, and use that as the guide for how the cloned voice should speak and sound.

  • @Bloom_HD
    @Bloom_HD 4 years ago +8

    Very interesting! Do you think it's possible to perform speech-to-speech synthesis? Keeping the tone and pitch of the input voice intact for the output? I want to be Solid Snake :O

  • @JediWebSurf
    @JediWebSurf 3 years ago +1

    I can confirm this actually does work on Windows 10. I followed the main branch of the GitHub repo.

  • @immineal
    @immineal 1 year ago +3

    Is there an open-source solution that doesn't focus on speed, but rather on quality, and that you can train on a large dataset?
    This is the only open-source package for voice cloning I can find...

  • @SanctusCrypta
    @SanctusCrypta 5 years ago +13

    This is amazing and frightening at the same time. Fantastic job!

  • @WigWoo1
    @WigWoo1 4 years ago +3

    But can you have it work in realtime with a microphone instead of typing what I want to say? It's not realtime if I can't talk with my microphone and have it come out with that voice

    • @wushu7294
      @wushu7294 4 years ago +1

      Realtime in this context is the ability to type and let the algorithm generate speech. This is not an app for your phone, this is just an example of the algorithm, which can be downloaded from GitHub. This is meant for machine learning developers and students who use Python.

    • @unknownuwu3890
      @unknownuwu3890 4 years ago

      it can be if you use an Nvidia Tesla card, but you still have to use a soundboard

    • @spider279
      @spider279 1 year ago

      already exists, but it runs with a somewhat long delay

  • @virginboi4654
    @virginboi4654 5 years ago +10

    So basically you have already trained the models in English language and now using knowledge transfer for any voice type who is speaking in English to get the generated voices ?

  • @gueschmo
    @gueschmo 3 years ago +8

    Hi, this is awesome. Congratulations. Do you think different languages can be trained by changing the language of the training set? I would like to give it a try on Spanish

  • @captainsaturnus7127
    @captainsaturnus7127 4 years ago +6

    Hello!
    This is an awesome tool!
    How can I adapt my own datasets for training? The program allows you to use only those listed in the training section.

  • @Grim2
    @Grim2 4 years ago +81

    "1. You will get much better results with many more features through"
    No, we wouldn't. Because you want us to pay you 500$ per month to be able to clone voices other than our own. Why not simply limit the number of clips we can generate per day? The current model is just bloody nuts when you stated "It's a proof of concept. Many features are lacking and the quality is often poor".

    • @its.arjun.s
      @its.arjun.s 4 years ago +8

      what are you talking about?

  • @Overthere_World
    @Overthere_World 2 years ago +7

    Another question: Could you use something like this to get speech input in one language, and then use a piece of text in another language to get speech output of that language? That function will be more interesting.

  • @Jufiuno
    @Jufiuno 5 years ago +1

    How did you import the dataset into the toolbox? The browse button only allows searching for music.

  • @MrRolnicek
    @MrRolnicek 4 years ago +10

    This seems pretty good.
    Although I don't much care about the voice synthesis. I want to try to use this as a realtime voicechanger. That's going to take a bit of work.

    • @HonorMacDonald
      @HonorMacDonald 1 year ago +1

      I'm very interested in this functionality, to voice different characters in animations. Did you ever find a way to do it?

    • @MrRolnicek
      @MrRolnicek 1 year ago +1

      @HonorMacDonald Well, this already IS a way to do it. The same approach can for sure be used, but I didn't get around to making it. I'm not a programmer, so it would be a lot of work for me to make it.

    • @OwenPrescott
      @OwenPrescott 1 year ago

      synthesis is the important part, voice changer can be added later

    • @MrRolnicek
      @MrRolnicek 1 year ago

      @OwenPrescott No, synthesis is already a thing, I don't want to go that route. I wanted to do a realtime voicechanger that goes phoneme by phoneme.
      If you're doing it with synthesis and you're speaking a longer word, your synthesis based voicechanger has to wait for the whole word to be heard before it synthesizes it because otherwise it wouldn't know how to pronounce it.

  • @blokyk
    @blokyk 5 years ago +7

    This... This is amazing ! Thank you for creating that tool and for making it so accessible !

  • @pavan.reacts
    @pavan.reacts 1 year ago +1

    This is great software, and thanks for keeping it free. I was searching for voice cloning software to clone my granny's voice (she is no longer alive) so I can feel her close to me. Thanks again for such a wonderful tool. I will try it.

  • @p_p
    @p_p 3 years ago

    hello! I'm going crazy trying to understand if it is possible to use Italian

  • @LaurentSparksMusic
    @LaurentSparksMusic 4 years ago +1

    It's sad that you are saying this isn't worth my or your time, simply because you want to collect some dividend from Resemble? I don't get it. Something like Resemble requires a specific set of words to learn from; that's ridiculous if you are trying to teach an AI anything but your own voice.... This is still a fantastic resource if you have the patience to set it up. With a little modification it can learn its own voice and search the internet for, say... idk, maybe a podcast host that consistently has good quality audio, and over the years get better and better at its own speech, building on its possible minute changes in intonation, pitch and speed. Idc what your description says, this is bloody brilliant, man

  • @1hitkill973
    @1hitkill973 2 years ago

    HOLYYYYY. I don't even know what to say. The possibilities are endless for this tech.

    • @edgartalamantes1584
      @edgartalamantes1584 2 years ago

      Like what?

    • @1hitkill973
      @1hitkill973 2 years ago

      You're one of the good guys, aren't you?

    • @edgartalamantes1584
      @edgartalamantes1584 2 years ago

      @1hitkill973 I'm honestly curious about this. I'd love to hear about practical uses of this technology.

  • @SneakiestChameleon
    @SneakiestChameleon 4 years ago

    I think I'm close to opening the toolbox, but I'm getting syntax errors when trying to run the module for requirements.txt in Python 3.7.0 Shell.

    • @Hsaelt
      @Hsaelt 4 years ago

      Then try installing each segment of requirements manually via py -m pip install (module name) without brackets
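
      A minimal Python sketch of the one-at-a-time install suggested above. It assumes the script is started from a system terminal (cmd or PowerShell) in the folder that contains requirements.txt; pip commands typed directly into the Python shell raise a SyntaxError, which may be what happened here. The helper names read_requirements, pip_command and install_one_by_one are illustrative and not part of the repo.

```python
import subprocess
import sys
from pathlib import Path


def read_requirements(text):
    """Return the package lines of a requirements file.

    Blank lines and lines starting with '#' are skipped; everything
    else is passed to pip unchanged.
    """
    packages = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            packages.append(line)
    return packages


def pip_command(package):
    """Build the pip call for one package, using the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install_one_by_one(path="requirements.txt"):
    """Install each requirement separately and collect the ones that fail."""
    failed = []
    for package in read_requirements(Path(path).read_text()):
        result = subprocess.run(pip_command(package))
        if result.returncode != 0:
            failed.append(package)
    return failed
```

      Calling install_one_by_one() returns the list of packages that could not be installed, which narrows down which dependency is causing the trouble.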

    • @SneakiestChameleon
      @SneakiestChameleon 4 years ago

      @Hsaelt Thanks for the advice, my good man! After countless trials and errors I got it open.

    • @Hsaelt
      @Hsaelt 4 years ago

      @SneakiestChameleon sigh, glad it worked for you at least. I am still trying to make it work.

  • @dayday8421
    @dayday8421 4 years ago +1

    Wow... just wow. I am in awe of your work!

  • @bigglyguy8429
    @bigglyguy8429 2 years ago +2

    Is there a simple Windows installer I can run, cos all I see at GH is a lot of individual file thingies? Thanks

  • @kyleglowacki
    @kyleglowacki 5 months ago

    Super cool. If you use longer input samples do you get a cleaner output? Some of the output sounds like it's breaking up or glitchy... maybe extra reverb in the sound. I'm not sure how to describe it.

  • @andres.aiassa
    @andres.aiassa 11 months ago

    Hi, thank you for sharing this. Just a question: I have some issues when trying to execute the program (demo_cli.py):
    demo_cli.py", line 80, in
    encoder.embed_utterance(np.zeros(encoder.sampling_rate))
    inference.py", line 144, in embed_utterance
    frames = audio.wav_to_mel_spectrogram(wav)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    \encoder\audio.py", line 58, in wav_to_mel_spectrogram
    Are there any updates for the program?

  • @dissonanceparadiddle
    @dissonanceparadiddle 4 years ago +2

    This is really clever. It would be neat if you could have a speaker talk into it instead of text to speech that way you can capture emotion and inflection too but I get it that's hard to do

  • @daydemon6729
    @daydemon6729 4 years ago +4

    I find this really awesome and I would like to try it out myself, but I'm new to Python and all this stuff. I know I'm probably just really dumb and missing something that I shouldn't, but is there any tutorial/guide that would be able to teach someone who has never used any program like this before? Just want to make sure that I'm not missing something that's so dumb it hasn't been described. I mean from opening the program to generating the TTS

  • @bigboimarkus9644
    @bigboimarkus9644 5 years ago +3

    This will change my life

  • @martin518441
    @martin518441 2 years ago +1

    I am using a microphone to record. I always get stuck at "add 1 more points" and it doesn't make a projection. What am I doing wrong?

  • @mattheww1072
    @mattheww1072 4 years ago +1

    Hey there, Just a few quick questions. In order to get the AI better at speaking, would it be better to load more files into the "Use embedding from" section or should I synthesize over and over. I have a bunch of X's inside the toolbox output box but I'm not really sure what it means.
    I find that even with a short sentence the AI slurs and stops mid sentence.

    • @unixtreme
      @unixtreme 2 years ago

      Same here, I noticed longer samples of the source voice were working better but still tons of slurring and crap.

  • @Гнег-у9ъ
    @Гнег-у9ъ 5 years ago +164

    I need to test this on GLaDOS

    • @saucepewpewpew1147
      @saucepewpewpew1147 5 years ago +7

      How did it go ?

    • @PokettoMusic
      @PokettoMusic 5 years ago +4

      yes pls tell how it went

    • @anterprites
      @anterprites 5 years ago +2

      exactly the reason I am looking for this

    •  5 years ago +1

      did you get it to work?? I can't get it to work :(

    • @BahkaSheep
      @BahkaSheep 5 years ago +14

      @ tried to train it on glados but there is no way you'll get the robotic voice. it just sounds like a really bad human voice :^(

  • @CanutoIX
    @CanutoIX 5 years ago +1

    Omg! This is amazing! Very nice work, man!

  • @TheLobsterCode
    @TheLobsterCode 1 year ago

    This Guy doesn't get enough credit. Every implementation of this technology should be paying you 10% at least.

  • @siksdenine2669
    @siksdenine2669 3 years ago +2

    Perfect for indie game devs who want some voice acting. Not perfect but good enough

  • @12copablo
    @12copablo 4 years ago +4

    Wow, this is amazing work!
    I have a few questions: Does this system work in other languages? If it doesn't, how many hours of annotated speech do I need to make it work?
    Also, is there a way to improve the quality of the audio output (decrease the noise between phonemes)?
    Thanx a lot!

  • @nova9493
    @nova9493 5 years ago +7

    the fact that this piece of technology is hidden behind a wall of technobabble is the only thing barring me from making cicero skyrim recite insane clown posse lyrics. unfortunately for me and almost every other person on planet earth this is completely unusable. what is code. i don't know. please help

    • @Hsaelt
      @Hsaelt 4 years ago +2

      lmao to get something you need to sacrifice something no one's gonna give you everything ready to use

    • @nova9493
      @nova9493 4 years ago

      @Hsaelt im aware of that of course. but until someone starts selling a user friendly version of this i am going to have to sit here and stew in the fact that my own ignorance in this field is the ONE thing stopping me from making Commander Data say 'fuck'. sitting and shaking my fist at the sky. one day

  • @seedac7907
    @seedac7907 5 years ago +5

    Mission Impossible come true

  • @ClownXmachina
    @ClownXmachina 4 years ago +1

    How do I get different vocoders in the list ? Will selecting a different one have better intonation ?

  • @WalterSautter
    @WalterSautter 4 years ago +5

    Go here and follow the instructions EXACTLY to successfully install the program! poorlydocumented.com/2019/11/installing-corentinjs-real-time-voice-cloning-project-on-windows-10-from-scratch/

  • @Enigmo1
    @Enigmo1 4 years ago +1

    does using more input for training result in higher quality output?

  • @Theanine3D
    @Theanine3D 1 year ago +1

    Would it be possible to use this for other languages besides English?

  • @4thDeadlySin
    @4thDeadlySin 4 years ago +2

    I run a lot of DnD and other tabletop games. I have voice augmenting software with notes like "Mountain Troll", "Squirrel Folk", etc., so I can change my voice in real time while I speak for NPCs that interact with characters. With all these embeddings, spectrograms, etc., would I be able to use X amount of audio of Morgan Freeman (Audio A), then record my own voice saying the exact same thing at the same pace as the Morgan Freeman voice file (Audio B), and have it log all the differences between them so that when I talk in real time, it applies all the changes and my voice (Audio B) comes out sounding like Audio A?

    • @BlackStarEOP
      @BlackStarEOP 2 years ago

      I'm not sure if that's possible right now. But it will. Crazy things will happen in the next two decades.

    • @4thDeadlySin
      @4thDeadlySin 2 years ago +1

      @BlackStarEOP 2 years and no practical progress, feels bad XP I'm sure a lot of progress has actually been made, it will just take A LOT more :P

  • @masinicuandrei5985
    @masinicuandrei5985 2 years ago +2

    Hello,
    If you are not a coder and have no clue how to work with PyTorch, how can you use these tools? I'm a video editor, and I want to rebuild audio for dropped connections on streams.

    • @masinicuandrei5985
      @masinicuandrei5985 2 years ago +1

      Is there a step by step tutorial on how to get into the interface?

  • @tristanzh3213
    @tristanzh3213 2 years ago

    wow, it could be used for real-time translation in the speaker's own voice. that's cool

  • @camelCased
    @camelCased 4 years ago +1

    Wondering, what would be the best way to clone voice timbre without TTS? I've tried Real-Time-Voice-Cloning but it seems to generate only TTS text to the target voice and has no emotions whatsoever. I would like to record my own phrase with my voice, and then encode it as if spoken by the target person's voice and keep my original emotions and inflections.

    • @camelCased
      @camelCased 2 years ago +1

      @Vegan Pete But will it be able to apply those emotions at the exact places where I need them, and not by some "typical behavior"?
      An example. I want to have a sentence that sounds authoritative and patronizing and puts accents on specific words. Let's say, I have a large voice library of someone who has been reading different styles - patronizing, normal, depressed etc.
      I doubt Descript will somehow automagically know which voice style to pick for which sentence, and which words to accentuate.
      If there was a system that could pick up the emotions and accents for specific text I'm recording, and then apply them directly onto a TTS engine voice, this would make indie game development so much easier - you wouldn't need voice actors to record your phrases, you could record them yourself and then run through the hypothetical "voice changing engine".

  • @enesitsme
    @enesitsme 4 years ago +3

    That's incredible. What about different language accents?

  • @greenway3d394
    @greenway3d394 2 years ago +1

    How do you import data sets like you have in the drop-down menu? Mine says "Random".

  • @maddenfootballtalk6544
    @maddenfootballtalk6544 3 years ago

    It still sounds a bit robotic. You definitely can hear a small difference but whoa I love it

  • @DrWho2008t101
    @DrWho2008t101 4 years ago

    Librosa will be unable to open mp3 files if additional software is not installed.
    Please install ffmpeg or add the '--no_mp3_support' option to proceed without support for mp3 files.
    I got the above message when running this: python demo_toolbox.py
    I had already installed ffmpeg.
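
    One thing worth checking when this message appears even though ffmpeg is installed is whether the ffmpeg executable is on the PATH that Python sees. This is only one possible cause. A small sketch using the standard library; find_ffmpeg and describe are illustrative names, not part of the toolbox.

```python
import shutil


def find_ffmpeg(which=shutil.which):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return which("ffmpeg")


def describe(path):
    """Turn the lookup result into a readable message."""
    if path is None:
        return "ffmpeg not found on PATH"
    return f"ffmpeg found at {path}"


print(describe(find_ffmpeg()))
```

    If this prints "ffmpeg not found on PATH", adding the folder that contains the ffmpeg executable to PATH and opening a new terminal is a reasonable next step; otherwise the '--no_mp3_support' option named in the message skips mp3 support entirely.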

  • @alexandernevsky5055
    @alexandernevsky5055 5 years ago +35

    Why not start with a demonstration of how to install and run?

    • @travismailsa1
      @travismailsa1 5 years ago

      I know, I'm trying to work that out now

    • @WomboBraker
      @WomboBraker 5 years ago

      :D

    • @WomboBraker
      @WomboBraker 5 years ago +1

      use google man

    • @WomboBraker
      @WomboBraker 5 years ago +2

      or you could just follow the guide on the GitHub page. just don't be lazy

    • @Extil2
      @Extil2 4 years ago +4

      @WomboBraker followed the instructions on GitHub, still having problems running this thing

  • @vtcabbit
    @vtcabbit 4 years ago +2

    isn't there a way to do stuff like this without having to "schedule a demo" and all of this extra, unneeded bullshit? literally almost all natural voice generation/speech websites have the same infuriating methods when it comes to just trying it out... i just wanna generate voices for non-profit, fun use. is that too much to ask for?

  • @Sponta_
    @Sponta_ 10 months ago

    I can't execute pip install -r requirements.txt
    Where am I supposed to find that txt file?

  • @CES-x5t
    @CES-x5t 10 months ago

    I wonder what GPU you used? It's running pretty fast, or did you speed up the video?

  • @ClownXmachina
    @ClownXmachina 4 years ago +3

    DID IT !!!!!! Had errors, so hours of troubleshooting. Use Anaconda with a virtual environment; there's a command for getting CUDA and the others all working from the virtual env!

    • @evolvingevrday
      @evolvingevrday 1 year ago

      Where did you get the software? I'm still having trouble finding it. Thank you

  • @dhanielbawias9933
    @dhanielbawias9933 1 year ago

    nice bro thank you. btw i have questions : what method or algorithm do you use bro????

  • @LBLC49
    @LBLC49 5 years ago +4

    That's amazing.... GENIUS!!!

  • @F.GRIGORIE
    @F.GRIGORIE 4 years ago +5

    How do i open the toolbox? i DONT get it... the video starts already with everything open... no tutorial... nothing to help you know what to do

  • @csomi35
    @csomi35 2 years ago

    Which languages are supported? I didn't find any info about it. I would like to play with Hungarian language and text.

  • @CydoniaPhysGeekGirl
    @CydoniaPhysGeekGirl 3 years ago +2

    Amazing. This could be really useful for people with Lou Gehrig's disease etc if the model could be trained or 'banked' before serious symptoms appear.
    I wonder if anyone has already done any work in that area/that regard.

    • @luisleal4169
      @luisleal4169 2 years ago +2

      I arrived here exactly with that in mind, probably will be running some experiments soon

  • @jimbarrofficial
    @jimbarrofficial 4 years ago +1

    Running this tool is cool, but haphazard.. I get a lot of long pauses, "breathing", lost audio, garbled audio. I have good samples. Do the samples HAVE to conform to 5 seconds, or can you use longer ones? Again, this works, but I want to make the clones sound more natural. If anyone has some tips on improving playback, please post them here? Looking for things to do during this unending layover at home due to C-19.

    • @sunny-makes-stuff
      @sunny-makes-stuff 4 years ago +1

      I'm having the same issue as you -- long pauses, missing words, garbled audio, and that weird "blowing on the microphone" noise. Haven't been able to find a fix yet.

  • @Poxy2
    @Poxy2 1 year ago

    the fact that this has already existed 4 years ago

  • @omairahmad4155
    @omairahmad4155 1 year ago +1

    Hello! Whenever I try to import an audio file (Urdu language) in sv2tts I get this error (can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool).
    I already converted my audio file's datatype to float, int and complex, but it's still not working. I am very disappointed. Please help me as soon as possible. @Corentin Jemine

  • @tlatosmd
    @tlatosmd 5 years ago +5

    This is so awesome that you're offering this as open source! There'd be only two requests I'd have:
    a.) What about adding support for other languages? And I don't mean the buttons or menus, specifically I mean sampling German voices and generating audio from German text.
    b.) I know this will be even harder to do, but what about going beyond a simple, robotic TTS approach to a design of *REPLACING* a voice in an existing recording with another, while it's still saying the same words, in the same mood, timing, acting manners, etc. That would make for much more natural sounding results. I'd think of the sampling still working much as it does now, then giving the program an audio file where it's supposed to replace the original voice with the voice you've just cloned. You may even keep the typable text feature in order to help the program understand the voice to be cloned and its words better, as well as the words in the recording that it will then have to change to the sampled voice.

    • @tlatosmd
      @tlatosmd 5 years ago +1

      Here's a few guys with open-source papers working on making the leap from TTS to voice replacement, in case that helps you guys: en.wikipedia.org/wiki/WaveNet (see footnotes for the papers).

    • @tlatosmd
      @tlatosmd 5 years ago +2

      They're also working on highly-efficient automatic speech-to-text recognition: proceedings.mlr.press/v32/graves14.pdf This can come in handy when you require your speech recordings also as written text, as your Github is saying that you do for training the program.

    • @peppino3609
      @peppino3609 1 year ago

      @tlatosmd so basically you talk and you have someone else's voice

  • @ashupednekar6989
    @ashupednekar6989 5 years ago +4

    how long does it take to synthesize?

  • @rockethero1177
    @rockethero1177 4 years ago +2

    How do I even get started? github download has no app to launch or anything. How do I get to that program?

  • @seres-de-luz
    @seres-de-luz 3 years ago

    one video, 1.62K subs, you're a legend

  • @bluecorp8557
    @bluecorp8557 2 years ago

    How do you actually set it up tho? I downloaded the "Real-time voice cloning master" and it just gave me a bunch of files but no application.

  • @DEEPANJANBISWAS
    @DEEPANJANBISWAS 5 years ago +4

    Hello Sir. Tried to install your tool from your github link. But facing some issue with pytorch and tensorflow-gpu. I even tried to modify the requirements.txt but to no avail. Can you please post a video for the installation of this great tool. Thanks in advance!

    • @DEEPANJANBISWAS
      @DEEPANJANBISWAS 5 years ago

      @Chris Connelly not yet

    • @hariprasath9222
      @hariprasath9222 5 years ago

      hey !!!!
      did you install it in terminal / cmd?
      see, this is a project which is done in PyCharm, and that's the project file

    • @DEEPANJANBISWAS
      @DEEPANJANBISWAS 5 years ago

      @hariprasath9222 thanks Hari for the pointer.

    • @jimbarrofficial
      @jimbarrofficial 4 years ago +2

      Issues with TensorFlow usually relate to the CUDA driver. You MUST have the CUDA toolkit installed with a driver at 9.0 or higher. I ran into this while trying to use an old, supposedly CUDA-enabled video card, but this failed. I then switched to a Dell laptop running native Ubuntu (no VirtualBox VMs, must be native) and got it to work pretty easily once I figured out all the tools that needed to be downloaded. Also, don't edit the requirements.txt file; there is no need. In addition, you don't need the audio sample library, as this is 5.6 GB of wasted HD space. Just make sure your WAV files are of good quality with little to no noise in the background. I used TV news people and it worked pretty well.
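
      A quick way to see whether PyTorch can reach the GPU before launching the toolbox. This sketch relies only on the standard torch.cuda.is_available() and torch.cuda.get_device_name() calls; cuda_status is an illustrative name, and the check says nothing about TensorFlow's own CUDA setup.

```python
import importlib


def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # Also covers the "DLL load failed" ImportError seen on some Windows installs.
        return "PyTorch could not be imported"
    if torch.cuda.is_available():
        return "CUDA device: " + torch.cuda.get_device_name(0)
    return "PyTorch is installed but sees no CUDA device"


print(cuda_status())
```

      If the last message appears on a machine with an Nvidia card, the driver or CUDA toolkit version is the first thing to look at, as described above.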

  • @MidnightPodcastGaming
    @MidnightPodcastGaming День тому +1

    I would love to get this on my PC. Sadly, we all get lost after reading Number 1.

  • @luisleal4169
    @luisleal4169 2 роки тому

    Can the model be used standalone and/or integrated into a custom Python program instead of your GUI? (I'm an ML engineer, so I'm comfortable with retraining if needed.)

  • @therealquade
    @therealquade 2 роки тому +2

    yeah uh, is there a way to get a pre-compiled executable version of this that doesn't require me to manually install and run python and know how to use python?

    • @markgiroux3442
      @markgiroux3442 8 місяців тому

      Do you want easy, or do you want free? There's plenty of paid services out there if you don't want to put in the minimal work to get this set up

    • @therealquade
      @therealquade 8 місяців тому

      @@markgiroux3442 all the paid services are web-based AFAIK. I want a desktop executable, which is, I assume, how Python *would* be working once set up. It's just the setup part that I can't seem to do correctly.

  • @grahamsutherland1106
    @grahamsutherland1106 5 років тому +9

    0:22 Bill Bailey!

  • @Enoch_The_Gent
    @Enoch_The_Gent 4 роки тому +7

    Can you do a tutorial on integrating this technology into a voice chatbot?

  • @Anime4nik
    @Anime4nik 3 роки тому

    Where can I find the vocals that you use as examples?

  • @Zekno
    @Zekno 4 роки тому +1

    File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\__init__.py", line 81, in <module>
        from torch._C import *
    ImportError: DLL load failed: The operating system cannot run %1.
    This error appears when running python demo_toolbox.py. What can I do to fix this?

    • @orenong
      @orenong 4 роки тому

      I see so many people getting this error, I got another error, did you try asking on their GitHub?

  • @ralvarezb78
    @ralvarezb78 4 роки тому

    And that's how Terminator discovered that John Connor's adoptive parents were killed by T-1000

  • @pouringpoipoi5777
    @pouringpoipoi5777 4 роки тому +1

    Do you have a standalone, ready-to-run .exe? I don't really know how to use Python :(

  • @KatouMegumiosu
    @KatouMegumiosu 5 років тому +36

    Can voices in other languages be used too?

    • @NoName-br8pb
      @NoName-br8pb 5 років тому +2

      I think you are waiting for Japanese?)

    • @KatouMegumiosu
      @KatouMegumiosu 5 років тому +13

      @@NoName-br8pb maybe? ;) But Russian Shrek dub was amazing.
      WHAT ARE YOU DOING IN MY SWAMP!?

    • @NoName-br8pb
      @NoName-br8pb 5 років тому

      @@KatouMegumiosu lol

    • @rob99201
      @rob99201 5 років тому +1

      It doesn't look like it can be done easily, but it can be done. See this thread: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/30#issuecomment-507864097

    • @kochankapusti824
      @kochankapusti824 5 років тому

      @@KatouMegumiosu hi, could you please tell me how to run the program?) I'm being really slow here. I downloaded the zip from GitHub, but I can't figure out where the exe is)).

  • @sunnowo
    @sunnowo 4 роки тому

    I'mma feed this thing so many death grips vocals

  • @eclipse3828
    @eclipse3828 Місяць тому +1

    The fact is that the AI tools that exist today existed before

  • @clover_network
    @clover_network 4 роки тому

    How can I add that librispeech repo to the program???

  • @NithinJune
    @NithinJune 4 роки тому +1

    3:50 Is it just me or does that sound like Salman Khan from Khan Academy????
    our hero

  • @Moneyaddthenmultiply
    @Moneyaddthenmultiply 3 роки тому

    this is giving me ‘Miyazaki watching the zombie animation’ vibes

  • @quicktips3858
    @quicktips3858 4 роки тому

    I will use this to clone my grandpa's voice, because I miss him so much.

    • @apol8245
      @apol8245 4 роки тому +2

      I lost mine too. You're not alone, dude.

  • @RuinDweller
    @RuinDweller 2 роки тому +3

    Thank you! This finally works, but "Dataset", "Speaker", and "Utterance" are all greyed out. Also, I have no option for "pretrained" under any sub-heading. I have "Encoder" under encoder, "Synthesizer" under synthesizer, and "Vocoder" and "Griffin-Lim" under vocoder. Can you help me sort that out please? I have downloaded LibriSpeech and unpacked it to a folder within the root directory, but I have no option to select it anywhere that I can see.
    UPDATE: 2 things, I needed to pass this argument: [python demo_toolbox.py -d ], and then I had to remember that the GZ file had to be unzipped, and THEN the resulting TAR file had to be unzipped. 😕...But now my noob ass has actually unlocked this thing, and it's working - even the dataset voices!
    I STILL have no option for "Pretrained" anywhere, so I don't know why others have that, but that's pretty much the last thing I've seen that I'm lacking.

    • @Cookiekeks
      @Cookiekeks 2 роки тому +1

      Thanks, I've been looking for this solution

    • @michaelsmith4904
      @michaelsmith4904 2 роки тому

      Did you ever figure out the pretrained or other vocoder options?

    • @RuinDweller
      @RuinDweller 2 роки тому

      @@michaelsmith4904 Sadly, no. :(

    • @fischy0339
      @fischy0339 Рік тому

      how exactly did you get them working now?

    • @RuinDweller
      @RuinDweller Рік тому

      @@fischy0339 The bit of code within the "[ ]" in my comment is what you need to copy/paste into a command prompt, then hit "enter". That passes an argument that allows the program to access those functions. You also need to make sure that the GZ is unzipped to a TAR file, and then that has to be unzipped too. It has been a while since I did this myself, so hopefully it will help you!
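
The two-step unpacking described in this thread (GZ to TAR, then TAR to a folder) can also be done in one pass with Python's standard `tarfile` module. This is a minimal sketch, assuming the download is a regular `.tar.gz` archive. `extract_dataset` and the paths in the usage note are hypothetical names for illustration, not part of the repo:

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Mode "r:gz" removes the gzip layer and reads the tar layer in one step,
    # so no intermediate .tar file is written to disk.
    # Only extract archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)

    return sorted(p.name for p in dest.iterdir())
```

For example, `extract_dataset("downloads/dataset.tar.gz", "datasets")` would leave the unpacked folder under `datasets`. That folder is then the one to pass with the `-d` argument mentioned in the comment above.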

  • @albatroshd7945
    @albatroshd7945 8 місяців тому

    I'll read your thesis in my free time!