I run a lot of DnD and other tabletop games, I have voice augmenting software with notes like "Mountain Troll, "Squirrel Folk", etc, so I can change my voice in real-time while I speak for NPCs that interact with characters. With all these embeddings, spectrograms, etc. Would I be able to use X amount of audio of Morgan Freeman (Audio A), then record my own voice saying the exact same thing at the same pace as the Morgan Freeman voice file (Audio B) and it would be able to log all the differences between them so that when I talk in real-time, it applies all the changes so my voice of Audio A comes out sounding like Audio B?
Hello, if you're not a coder and have no clue how to work with PyTorch, how can you use these tools? I'm a video editor; I want to rebuild audio for dropped connections on streams.
Wondering, what would be the best way to clone voice timbre without TTS? I've tried Real-Time-Voice-Cloning but it seems to generate only TTS text to the target voice and has no emotions whatsoever. I would like to record my own phrase with my voice, and then encode it as if spoken by the target person's voice and keep my original emotions and inflections.
@Vegan Pete But will it be able to apply those emotions at the exact places where I need them, and not by some "typical behavior"? An example. I want to have a sentence that sounds authoritative and patronizing and puts accents on specific words. Let's say, I have a large voice library of someone who has been reading different styles - patronizing, normal, depressed etc. I doubt Descript will somehow automagically know which voice style to pick for which sentence, and which words to accentuate. If there was a system that could pick up the emotions and accents for specific text I'm recording, and then apply them directly onto a TTS engine voice, this would make indie game development so much easier - you wouldn't need voice actors to record your phrases, you could record them yourself and then run through the hypothetical "voice changing engine".
I got this message when running python demo_toolbox.py: "Librosa will be unable to open mp3 files if additional software is not installed. Please install ffmpeg or add the '--no_mp3_support' option to proceed without support for mp3 files." I had already installed ffmpeg.
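If anyone else hits this: installing ffmpeg isn't enough, the binary has to be visible on PATH from the same shell you launch the toolbox in. A quick stdlib check (just a sketch, not part of the repo) confirms whether Python can actually find it:

```python
import shutil

# Librosa decodes mp3 through an external ffmpeg binary, so ffmpeg must be
# discoverable on PATH -- an installer that didn't update PATH won't help.
def ffmpeg_on_path():
    return shutil.which("ffmpeg") is not None

print(ffmpeg_on_path())
```

If this prints False, add ffmpeg's bin directory to your PATH and restart the shell before running demo_toolbox.py again.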
isn't there a way to do stuff like this without having to "schedule a demo" and all of this extra, unneeded bullshit? literally almost all natural voice generation/speech websites have the same infuriating methods when it comes to just trying it out... i just wanna generate voices for non-profit, fun use. is that too much to ask for?
DID IT!!! I had errors, so hours of troubleshooting. Use Anaconda with a virtual environment; there's a command for getting CUDA, among others. All working from the virtual env!
Amazing. This could be really useful for people with Lou Gehrig's disease, etc., if the model could be trained or 'banked' before serious symptoms appear. I wonder if anyone has already done work in that area.
Running this tool is cool, but results are haphazard. I get a lot of long pauses, "breathing", lost audio, garbled audio. I have good samples. Do the samples HAVE to be 5 seconds, or can you use longer ones? Again, this works, but I want to make the clones sound more natural. If anyone has tips on improving playback, please post them here. Looking for things to do during this unending layover at home due to COVID-19.
I'm having the same issue as you -- long pauses, missing words, garbled audio, and that weird "blowing on the microphone" noise. Haven't been able to find a fix yet.
Hello! Whenever I try to import an audio file (Urdu language) into SV2TTS, I get this error: (can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool). I have already tried converting my audio file's datatype to float, int, and complex, but it still doesn't work. I am very disappointed. Please help me as soon as possible. @Corentin Jemine
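For what it's worth, that torch error usually means the waveform array ended up with dtype=object rather than a numeric dtype. This is only a guess at the cause, but casting the array itself (not re-recording in a different format) often fixes it before it reaches the encoder:

```python
import numpy as np

# Simulate a wav that a loader returned with dtype=object
wav = np.array([0.1, -0.2, 0.3], dtype=object)

# Cast the samples to float32, the dtype torch expects for audio tensors
wav = np.asarray(wav, dtype=np.float32)
print(wav.dtype)  # float32
```

If the cast itself fails, the array is probably ragged (a list of clips with different lengths stored in one object array), and each clip needs to be converted individually.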
This is so awesome that you're offering this as open source! There'd be only two requests I'd have: a.) What about adding support for other languages? And I don't mean the buttons or menus; specifically I mean sampling German voices and generating audio from German text. b.) I know this will be even harder to do, but what about going beyond a simple, robotic TTS approach to a design that *REPLACES* a voice in an existing recording with another, while it still says the same words, in the same mood, timing, and acting manner? That would make for much more natural-sounding results. I'd imagine the sampling still working much like it does now, then giving the program an audio file where it's supposed to replace the original voice with the voice you've just cloned. You could even keep the typed-text feature to help the program better understand the voice to be cloned and its words, as well as the words in the recording that it will then have to change to the sampled voice.
Here's a few guys with open-source papers working on making the leap from TTS to voice replacement, in case that helps you guys: en.wikipedia.org/wiki/WaveNet (see footnotes for the papers).
They're also working on highly efficient automatic speech-to-text recognition: proceedings.mlr.press/v32/graves14.pdf This can come in handy when you need your speech recordings as written text too, which your GitHub says you do for training the program.
Hello Sir. I tried to install your tool from your GitHub link, but I'm facing some issues with PyTorch and tensorflow-gpu. I even tried to modify requirements.txt, but to no avail. Can you please post a video on installing this great tool? Thanks in advance!
Issues with TensorFlow usually relate to the CUDA driver. You MUST have the CUDA toolkit installed with a driver at 9.0 or higher. I ran into this while trying to use an old, supposedly CUDA-enabled video card, and it failed. I then switched to a Dell laptop running native Ubuntu (no VirtualBox VMs; it must be native) and got it to work pretty easily once I figured out all the tools that needed to be downloaded. Also, don't edit the requirements.txt file; there is no need. In addition, you don't need the audio sample library, as that's 5.6 GB of wasted HD space. Just make sure your WAV files are good quality with little to no background noise. I used TV news anchors and it worked pretty well.
Can the model be used standalone and/or integrated into a custom Python program instead of your GUI? (I'm an ML engineer, so I'm comfortable with retraining if needed.)
yeah uh, is there a way to get a pre-compiled executable version of this that doesn't require me to manually install and run python and know how to use python?
@@markgiroux3442 all the paid services are web based AFAIK, i want a desktop executable, which is, i assume, how python *would* be working once set up, its just the set up part that i cant seem to do correctly
File "C:\Users ame\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\__init__.py", line 81, in
    from torch._C import *
ImportError: DLL load failed: The operating system cannot run %1.
This error comes up when running python demo_toolbox.py. What can I do to fix it?
Doesn't look like could be done easily - but can be done. See this thread: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/30#issuecomment-507864097
Thank you! This finally works, but "Dataset", "Speaker", and "Utterance" are all greyed out. also, I have no option for "pretrained" under any sub-heading. I have "Encoder" under encoder, "Synthesizer" under synthesizer, and "Vocoder" and "Griffin-Lim" under vocoder. Can you help me sort that out please? I have downloaded LibriSpeech and unpacked it to a folder within the root directory, but I have no option to select it anywhere that I can see. UPDATE: 2 things, I needed to pass this argument: [python demo_toolbox.py -d ], and then I had to remember that the GZ file had to be unzipped, and THEN the resulting TAR file had to be unzipped. 😕...But now my noob ass has actually unlocked this thing, and it's working - even the dataset voices! I STILL have no option for "Pretrained" anywhere, so I don't know why others have that, but that's pretty much the last thing I've seen that I'm lacking.
@@fischy0339 That bit of coding that's within the "[ ]" in my response is the bit that you need to copy/paste into a command prompt, and hit "enter". That passes an argument that allows the program to access those functions. You also need to make sure that the GZ is unzipped to a TAR file, and then that has to be unzipped also. It has now been a while since I did this myself, so hopefully it will help you!
What a time to be alive !
I see what you did there
@@PokettoMusic I don't understand haha
SubZakk check out the channel “two minute papers”, the creator’s catchphrase is “What a time to be alive!”. Probably my favourite channel on UA-cam. Great content in short presentations.
@@pglove Ahh yeah, I watch him, and I just now realized he says that. Thank you!
Hold on to your papers
I think this is a unique piece of software. It can potentially give someone who has lost their physical voice the powerful ability to speak fluently, with assistance. You have done a remarkable thing. Well done.
Pretty sure Apple is doing this for iOS 17.
A while back I changed the description of this video to invite whoever is interested in cloning a voice not to use my repo and instead head over to resemble.ai. That came across as a sellout move, and that wasn't my intention. I've changed the description back.
My initial intention with that message was to avoid new people spending hours trying to set up the project only to give up or obtain subpar results.
While Resemble does offer a free plan that will let you clone your voice with more naturalness than this project will, purely for legal reasons we cannot allow you to clone someone else's voice without a bit of legal work.
This repo lets you do anything you want in that regard, so now I get why people want to use it. I've posted an update here: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/364. I've also included a link in the readme to the installation guide that seems to work for most people.
I would love to figure out how to take what you have done and turn it into a parrot for my robot, i.e. have the robot ask a few questions, record the answers, then use those answers to start talking back to the user in their OWN voice. (These would be scripted questions and answers.) I think that would be an awesome project. It's almost like Terminator: the robot becomes the person!!
@Robert Sapolsky It's a python project, so you will need to run it with a python interpreter.
Does it work with non-English languages?
Does it work with different languages?
Resemble AI seems like a cool company ;)
Any tutorial on how to do this, from the start?
This would help greatly for something like making new voice lines for mods using the vanilla voice actors in Skyrim/Fallout.
Bit of a legal and intellectual property grey area, though, when you use established actors' voices through this as opposed to the generic voices (i.e. making new lines using the voices of Max von Sydow, Joan Allen, Vladimir Kulich, Michael Hogan, etc. for their respective characters, when they never really recorded them; a machine did it via programming and learning).
Scary when you look at it as it could theoretically bring dead actor's voices back to life with the right samples too. (Like doing another Godfather game without Marlon Brando voicing Vito Corleone)
Well, if you're doing that in the EU it's illegal, because a voice is personal data and thus protected by the GDPR (RGPD in French).
(Yes, I wanted to use celebrity audio samples too, so I did a bit of research on the legal aspects lol.)
A work around would be to replace all the original voices with completely synthetic ones that could be freely used by the community.
And what if a company only ever uses synthetic voice models in the first place? Would they have exclusive rights to the model? What about a model built using embeddings from the company's model?
You would need a different network for each tone though (narrative, angry, interrogative, etc.)
@@mikerhinos But if you publish it (by playing in a movie), the GDPR might not protect it.
Never thought of that! Nice!
This is awesome and horrifying at the same time.
Question: Can you use something like this to take a voice input, rather than a piece of text?
That way, you can do things like preserving tone and inflection.
It's probably a couple of additional steps, but I would love to have something like that.
You mean by speaking into the mic? That would be scary, though also great (scary because people could impersonate celebrities and other higher-ups live, pretending to be them; it would mean more catfishing, though).
Is any such toolkit available?
would love this for my dnd characters
did you ever find an answer?
@@J4cobSkiJumping Imagine a horror game like that.
The first two voices are British English in the sample but gain a US accent in the synthesis (though they do admittedly sound like those same people, just putting on an American accent). I'm guessing that's due to the voices of the audiobooks used in some of the training? Very clever though, nice work!
Plot twist: This video was narrated by the AI
of course it was!
Always has been
plot twist: this comment was copied from others
@@warker6186 how do u know
this comment is like a year old
Can this be used to create a TTS voice for the PC already? Or do we need something else?
Corentin Jemine , Can I use a single audio file and just straight up load it into the toolbox? How should I do it?
Wow ! Nice work, nice presentation, congrats !
Finally I can complete my Detective Conan cosplay.
Great work!!! I'm super excited to use this. I've been trying for the last couple of days... I don't have a computer with an Nvidia GPU. Do you think something like the Jetson Nano would be able to run this?
This is really freaking cool! My friends are going to get freaked out on discord lol
ahhahaahah
I think the best way to get the cloner to produce the tone or structure you want would be to have it use your own voice's inflections and changes in tone as the guide for how the cloned voice should sound while speaking.
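That kind of prosody transfer isn't something this repo does, but the first ingredient would be extracting a pitch contour from your own recording. A minimal, numpy-only sketch of per-frame pitch estimation via autocorrelation (the 80–400 Hz search range is an assumption, not a value from the project):

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency of one audio frame by finding
    the autocorrelation peak within the allowed lag range."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag bounds from f0 bounds
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

sr = 16000
tone = np.sin(2 * np.pi * 220.0 * np.arange(2048) / sr)  # synthetic 220 Hz "voice"
print(estimate_pitch(tone, sr))  # close to 220 Hz
```

Running this frame by frame over a recording gives a contour that a hypothetical prosody-conditioning step could consume; the cloner here doesn't accept one.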
Very interesting! Do you think it's possible to perform speech-to-speech synthesis, keeping the tone and pitch of the input voice intact in the output? I want to be Solid Snake :O
I can confirm this actually does work on windows 10. I followed the main branch of the github repo.
Is there an open-source solution that doesn't focus on speed, but rather on quality, and that you can train on a large dataset?
This is the only open-source package I can find that does voice cloning...
This is amazing and frightening at the same time. Fantastic job!
But can you have it work in real time with a microphone instead of typing what I want to say? It's not real time if I can't talk into my microphone and have it come out in that voice.
Realtime in this context means the ability to type and have the algorithm generate speech immediately. This is not an app for your phone; it's an example implementation of the algorithm, which can be downloaded from GitHub. It's meant for machine learning developers and students who use Python.
It can be if you use an Nvidia Tesla card, but you still have to use a soundboard.
That already exists, but it performs with a fairly long delay.
So basically you have already trained the models on English, and now you're using transfer learning on any voice speaking English to generate the cloned voices?
Yeah
Hi, this is awesome. Congratulations. Do you think different languages can be trained by changing the language of the training set? I would like to give it a try on Spanish
Did it work?
Hello!
This is an awesome tool!
How can I adapt my own datasets for training? The program only allows using those listed in the training section.
"1. You will get much better results with many more features through"
No, we wouldn't, because you want us to pay $500 per month to be able to clone voices other than our own. Why not simply limit the number of clips we can generate per day? The current model is just bloody nuts when you yourself stated, "It's a proof of concept. Many features are lacking and the quality is often poor."
what are you talking about?
Another question: could you use something like this to take speech input in one language, plus a piece of text in another language, and get speech output in that second language? That would be even more interesting.
How did you import the dataset into the toolbox? The browse button only allows searching for music.
This seems pretty good.
Although I don't much care about the voice synthesis. I want to try to use this as a realtime voicechanger. That's going to take a bit of work.
I'm very interested in this functionality, to voice different characters in animations. Did you ever find a way to do it?
@@HonorMacDonald Well, this already IS a way to do it. The same approach used here could surely be adapted, but I didn't get around to making it. I'm not a programmer, so it would be a lot of work for me.
synthesis is the important part, voice changer can be added later
@@OwenPrescott No, synthesis is already a thing, I don't want to go that route. I wanted to do a realtime voicechanger that goes phoneme by phoneme.
If you're doing it with synthesis and you're speaking a longer word, your synthesis based voicechanger has to wait for the whole word to be heard before it synthesizes it because otherwise it wouldn't know how to pronounce it.
This... This is amazing ! Thank you for creating that tool and for making it so accessible !
This is great software, and thanks for keeping it free. I was searching for voice cloning software to clone my granny's voice (she is no longer alive) so I can feel her close to me. Thanks again for such a wonderful tool. I will try it.
Hello! I'm going crazy trying to figure out whether it's possible to use Italian.
It's sad that you're saying this isn't worth my or your time simply because you want to collect some dividend from Resemble. I don't get it: something like Resemble requires a specific set of words to learn from, which is ridiculous if you're trying to teach an AI anything but your own voice. This is still a fantastic resource if you have the patience to set it up. With a little modification it could learn from found audio, say a podcast host with consistently good audio quality, and get better and better over the years, building on minute changes in intonation, pitch, and speed. I don't care what your description says, this is bloody brilliant, man.
HOLYYYYY. I don't even know what to say. The possibilities are endless for this tech.
Like what?
You're one of the good guys, aren't you?
@@1hitkill973 I'm honestly curious about this. I'll love to hear about practical uses of this technology.
I think I'm close to opening the toolbox, but I'm getting syntax errors when trying to run the module for requirements.txt in Python 3.7.0 Shell.
Then try installing each requirement manually via py -m pip install (module name), without the brackets.
@@Hsaelt Thanks for the advice my good man! After countless trial and errors I got it open.
@@SneakiestChameleon Sigh, glad it worked for you at least. I am still trying to make it work.
Wow... just wow. I am in awe of your work!
Is there a simple Windows installer I can run, cos all I see at GH is a lot of individual file thingies? Thanks
Super cool. If you use longer input samples, do you get a cleaner output? Some of the output sounds like it's breaking up or glitchy... maybe extra reverb in the sound. I'm not sure how to describe it.
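Longer references can help, mainly because the speaker embedding can be averaged over several short windows of the clip rather than taken from one. A sketch of that windowing (the 5-second length and 50% overlap here are assumptions, not the repo's exact values):

```python
import numpy as np

def split_into_partials(wav, sr=16000, partial_s=5.0, overlap=0.5):
    """Cut a long reference clip into overlapping ~5 s partial
    utterances whose embeddings can then be averaged."""
    n = int(partial_s * sr)            # samples per partial
    hop = int(n * (1 - overlap))       # step between window starts
    return [wav[i:i + n] for i in range(0, len(wav) - n + 1, hop)]

wav = np.zeros(16000 * 12)             # a 12-second (silent) clip
print(len(split_into_partials(wav)))   # 3 partials
```

Each partial would be embedded separately and the embeddings combined, which tends to smooth over glitchy stretches in any single window.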
Hi, thank you for sharing this. Just a question: I get this error when trying to execute demo_cli.py:

  demo_cli.py, line 80: encoder.embed_utterance(np.zeros(encoder.sampling_rate))
  inference.py, line 144, in embed_utterance: frames = audio.wav_to_mel_spectrogram(wav)
  encoder\audio.py, line 58, in wav_to_mel_spectrogram

Are there any updates for the program?
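The call that fails there is computing a mel spectrogram from the waveform. For anyone debugging that path, here is a numpy-only sketch of the STFT-magnitude step it builds on (the mel filterbank is omitted, and the frame sizes here are assumptions rather than the repo's settings):

```python
import numpy as np

def stft_magnitude(wav, n_fft=512, hop=128):
    """Windowed short-time Fourier transform magnitudes,
    returned with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    frames = [wav[i:i + n_fft] * window
              for i in range(0, len(wav) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

spec = stft_magnitude(np.random.randn(4096))
print(spec.shape)  # (257, 29)
```

A mel spectrogram is just this magnitude matrix multiplied by a mel filterbank; if this step works on your wav array but the repo's call doesn't, the problem is likely in the audio loading, not the math.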
This is really clever. It would be neat if you could have a speaker talk into it instead of using text-to-speech; that way you could capture emotion and inflection too. But I get it, that's hard to do.
I find this really awesome and I'd like to try it out myself, but I'm new to Python and all this stuff. I know I'm probably just missing something obvious, but is there any tutorial/guide that could teach someone who has never used a program like this before? I just want to make sure I'm not missing something so simple it hasn't been described. I mean everything from opening the program to generating the TTS.
This will change my life
I am using a microphone to record, but I always get stuck at "add 1 more points" and it doesn't make a projection. What am I doing wrong?
Hey there, just a few quick questions. To get the AI better at speaking, would it be better to load more files into the "Use embedding from" section, or should I synthesize over and over? I have a bunch of X's inside the toolbox output box, but I'm not really sure what they mean.
I find that even with a short sentence the AI slurs and stops mid sentence.
Same here. I noticed longer samples of the source voice worked better, but there was still tons of slurring and crap.
I need to test this on GLaDOS
How did it go ?
yes, please tell us how it went
exactly the reason I am looking for this
did you get it to work?? i cant get it to work :(
@ tried to train it on GLaDOS, but there's no way you'll get the robotic voice. It just sounds like a really bad human voice :^(
Omg! This is amazing! Very nice work, man!
This Guy doesn't get enough credit. Every implementation of this technology should be paying you 10% at least.
Perfect for indie game devs who want some voice acting. Not perfect but good enough
Wow this is amazing work!
I have a few questions: does this system work in other languages? If not, how many hours of annotated speech do I need to make it work?
Also, is there a way to improve the quality of the audio output? (decrease the noise between phonemes?)
Thanx a lot!
1) 5000 hours would be just enough.
2) No
the fact that this piece of technology is hidden behind a wall of technobabble is the only thing barring me from making cicero skyrim recite insane clown posse lyrics. unfortunately for me and almost every other person on planet earth this is completely unusable. what is code. i don't know. please help
lmao to get something you need to sacrifice something no one's gonna give you everything ready to use
@@Hsaelt im aware of that of course. but until someone starts selling a user friendly version of this i am going to have to sit here and stew in the fact that my own ignorance in this field is the ONE thing stopping me from making Commander Data say 'fuck'. sitting and shaking my fist at the sky. one day
Mission Impossible come true
How do I get different vocoders in the list? Will selecting a different one give better intonation?
Go here and follow the instructions EXACTLY to successfully install the program! poorlydocumented.com/2019/11/installing-corentinjs-real-time-voice-cloning-project-on-windows-10-from-scratch/
does using more input for training result in higher quality output?
Would it be possible to use this for other languages besides English?
I run a lot of DnD and other tabletop games, and I have voice-augmenting software with presets like "Mountain Troll", "Squirrel Folk", etc., so I can change my voice in real time while speaking for NPCs that interact with characters. With all these embeddings, spectrograms, etc., would I be able to take X amount of audio of Morgan Freeman (Audio A), then record my own voice saying the exact same thing at the same pace (Audio B), and have it log all the differences between them, so that when I talk in real time it applies those changes and my voice comes out sounding like Audio A?
I'm not sure if that's possible right now. But it will be. Crazy things will happen in the next two decades.
@@BlackStarEOP 2 years and no practical progress, feels bad XP I'm sure a lot of progress has actually been made, just it will take A LOT more even :P
Hello,
If you are not a coder and have no clue how to work with PyTorch, how can you use these tools? I'm a video editor; I want to rebuild audio for dropped connections on streams.
Is there a step by step tutorial on how to get into the interface?
wow, it could be used for real-time translation in the speaker's own voice. that's cool
Wondering, what would be the best way to clone voice timbre without TTS? I've tried Real-Time-Voice-Cloning, but it seems to only generate TTS text in the target voice, with no emotion whatsoever. I would like to record a phrase in my own voice and then re-encode it as if spoken by the target person's voice, keeping my original emotions and inflections.
@Vegan Pete But will it be able to apply those emotions at the exact places where I need them, and not by some "typical behavior"?
An example. I want to have a sentence that sounds authoritative and patronizing and puts accents on specific words. Let's say, I have a large voice library of someone who has been reading different styles - patronizing, normal, depressed etc.
I doubt Descript will somehow automagically know which voice style to pick for which sentence, and which words to accentuate.
If there was a system that could pick up the emotions and accents for specific text I'm recording, and then apply them directly onto a TTS engine voice, this would make indie game development so much easier - you wouldn't need voice actors to record your phrases, you could record them yourself and then run through the hypothetical "voice changing engine".
That's incredible. What about different language accents?
How do you import data sets like you have in the drop-down menu? Mine says "Random".
It still sounds a bit robotic. You definitely can hear a small difference but whoa I love it
Librosa will be unable to open mp3 files if additional software is not installed.
Please install ffmpeg or add the '--no_mp3_support' option to proceed without support for mp3 files.
I got the above message when running python demo_toolbox.py, even though I had already installed ffmpeg.
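For anyone hitting this: librosa hands mp3 decoding off to ffmpeg (via audioread), so what matters is whether ffmpeg is visible on the PATH that the Python process sees, not just whether it is installed somewhere. A minimal stdlib check, assuming a PATH issue is the cause (a common one on Windows):

```python
import shutil

# shutil.which() searches the same PATH the toolbox process sees.
# If this prints None, ffmpeg is installed but not on Python's PATH,
# so librosa/audioread still can't open mp3 files.
print(shutil.which("ffmpeg"))
```

If it prints None, add ffmpeg's bin directory to your PATH and restart the terminal before rerunning demo_toolbox.py.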
Why not start with a demonstration of how to install and run?
I know, I'm trying to work that out now
:D
use google man
or you could just follow the guide on the github page. just dont be lazy
@@WomboBraker I followed the instructions on GitHub, still having problems running this thing
isn't there a way to do stuff like this without having to "schedule a demo" and all of this extra, unneeded bullshit? literally almost all natural voice generation/speech websites have the same infuriating methods when it comes to just trying it out... i just wanna generate voices for non-profit, fun use. is that too much to ask for?
I can't execute pip install -r requirements.txt.
Where am I supposed to find that txt file?
I wonder what GPU you used? It's running pretty fast, or did you speed up the video?
DID IT !!!!!! Had errors, so hours of troubleshooting. Use Anaconda with a virtual environment; there's a command for getting CUDA, and the others, all working from the virtual env!
Where did you get the software? I'm still having trouble finding it. Thank you
nice bro, thank you. btw I have a question: what method or algorithm do you use?
That's amazing.... GENIUS!!!
How do I open the toolbox? I DON'T get it... the video starts with everything already open. No tutorial, nothing to help you know what to do.
Which languages are supported? I didn't find any info about it. I would like to play with Hungarian text and voices.
Amazing. This could be really useful for people with Lou Gehrig's disease etc if the model could be trained or 'banked' before serious symptoms appear.
I wonder if anyone has already done any work in that area/that regard.
I arrived here exactly with that in mind, probably will be running some experiments soon
Running this tool is cool, but haphazard: I get a lot of long pauses, "breathing", lost audio, and garbled audio, even with good samples. Do the samples HAVE to be 5 seconds, or can you use longer ones? Again, this works, but I want to make the clones sound more natural. If anyone has tips on improving playback, please post them here. Looking for things to do during this unending layover at home due to C-19.
I'm having the same issue as you -- long pauses, missing words, garbled audio, and that weird "blowing on the microphone" noise. Haven't been able to find a fix yet.
the fact that this has already existed 4 years ago
Hello! Whenever I try to import an audio file (Urdu language) into SV2TTS, I get this error: (can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool).
I already converted my audio file's datatype to float, int, and complex, but it still doesn't work. I am very disappointed. Please help me as soon as possible. @Corentin Jemine
This is so awesome that you're offering this as open source! There'd be only two requests I'd have:
a.) What about adding support for other languages? And I don't mean the buttons or menus, specifically I mean sampling German voices and generating audio from German text.
b.) I know this will be even harder to do, but what about going beyond a simple, robotic TTS approach to a design that *REPLACES* a voice in an existing recording with another, while it still says the same words, in the same mood, timing, acting manner, etc.? That would make for much more natural-sounding results. I'd imagine the sampling still working much the same, then giving the program an audio file in which it's supposed to replace the original voice with the voice you've just cloned. You might even keep the typable text feature to help the program better understand the voice to be cloned and its words, as well as the words in the recording that it will then have to change to the sampled voice.
Here's a few guys with open-source papers working on making the leap from TTS to voice replacement, in case that helps you guys: en.wikipedia.org/wiki/WaveNet (see footnotes for the papers).
They're also working on highly efficient automatic speech-to-text recognition: proceedings.mlr.press/v32/graves14.pdf. This can come in handy when you need your speech recordings as written text too, as your GitHub says you do for training the program.
@@tlatosmds so basically you talk and you get someone else's voice
how long does it take to synthesize?
How do I even get started? github download has no app to launch or anything. How do I get to that program?
one video, 1.62K subs, you're a legend
How do you actually set it up tho? I downloaded the "Real-time voice cloning master" and it just gave me a bunch of files but no application.
Hello Sir. I tried to install your tool from your GitHub link, but I'm facing some issues with PyTorch and tensorflow-gpu. I even tried to modify the requirements.txt, but to no avail. Can you please post a video on installing this great tool? Thanks in advance!
@Chris Connelly not yet
hey !!!!
did you install it in the terminal / cmd?
See, this is a project done in PyCharm, and that's the project file
@@hariprasath9222 thanks Hari for the pointer.
Issues with TensorFlow usually relate to the CUDA driver. You MUST have the CUDA toolkit installed with a driver at 9.0 or higher. I ran into this while trying to use an old, supposedly CUDA-enabled video card, but that failed. I then switched to a Dell laptop running native Ubuntu (no VirtualBox VMs; it must be native) and got it to work pretty easily once I figured out all the tools that needed to be downloaded. Also, don't edit the requirements.txt file; there is no need. In addition, you don't need the audio sample library, as that's 5.6 GB of wasted HD space. Just make sure your WAV files are of good quality with little to no background noise. I used TV news anchors and it worked pretty well.
I would love to get this on my PC. Sadly, we all get lost after reading Number 1.
Can the model be used as standalone and/or integrated to a custom python program instead of your GUI? (I'm an ML engineer so I'm confortable with retraining if needed)
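For anyone with the same question: the toolbox is just a front end. Going by the repo's own demo_cli.py, the three models can be driven directly from Python. The sketch below assumes the repository is on your PYTHONPATH and the pretrained models have been downloaded; the module names follow the repo's layout, but the model file paths are placeholders, not a guaranteed API:

```python
from pathlib import Path

def clone_voice(text, reference_wav_path,
                enc_path=Path("encoder/saved_models/pretrained.pt"),    # placeholder path
                syn_path=Path("synthesizer/saved_models/pretrained"),   # placeholder path
                voc_path=Path("vocoder/saved_models/pretrained.pt")):   # placeholder path
    """Embed a reference speaker, synthesize a mel spectrogram for `text`,
    and vocode it to a waveform; mirrors the flow of the repo's demo_cli.py."""
    # Imports are deferred so this file parses without the repo installed.
    from encoder import inference as encoder
    from synthesizer.inference import Synthesizer
    from vocoder import inference as vocoder

    encoder.load_model(enc_path)
    synthesizer = Synthesizer(syn_path)
    vocoder.load_model(voc_path)

    # 1) Speaker encoder: reference audio -> fixed-size speaker embedding
    wav = encoder.preprocess_wav(reference_wav_path)
    embed = encoder.embed_utterance(wav)

    # 2) Synthesizer: (text, embedding) -> mel spectrogram
    specs = synthesizer.synthesize_spectrograms([text], [embed])

    # 3) Vocoder: mel spectrogram -> waveform samples
    return vocoder.infer_waveform(specs[0])
```

Since you're comfortable retraining, the same modules expose the training entry points the GUI never touches; the toolbox only wraps this exact three-stage pipeline.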
yeah uh, is there a way to get a pre-compiled executable version of this that doesn't require me to manually install and run python and know how to use python?
Do you want easy, or do you want free? There's plenty of paid services out there if you don't want to put in the minimal work to get this set up
@@markgiroux3442 all the paid services are web based AFAIK, i want a desktop executable, which is, i assume, how python *would* be working once set up, its just the set up part that i cant seem to do correctly
0:22 Bill Bailey!
can you do a tutorial for infusing this technology into a voice Chatbot?
Where do I find the vocal samples that you use in the example?
I get this error when running python demo_toolbox.py; what can I do to fix it?
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\__init__.py", line 81, in <module>
    from torch._C import *
ImportError: DLL load failed: The operating system cannot run %1.
I see so many people getting this error, I got another error, did you try asking on their GitHub?
And that's how Terminator discovered that John Connor's adoptive parents were killed by T-1000
do you have a standalone, ready-to-run .exe? I don't really know how to use Python :(
Can voices in other languages be used too?
I think you are waiting for Japanese?)
@@NoName-br8pb maybe? ;) But Russian Shrek dub was amazing.
WHAT ARE YOU DOING IN MY SWAMP!?
@@KatouMegumiosu lol
Doesn't look like it could be done easily, but it can be done. See this thread: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/30#issuecomment-507864097
@@KatouMegumiosu hi, could you please tell me how to launch the program? I'm completely stuck. I downloaded the zip from GitHub, but I can't figure out where the exe is.
I'mma feed this thing so many death grips vocals
The fact is that the AI tools that exist today existed before
How can I add that LibriSpeech repo to the program?
3:50 Is it just me or does that sound like Salman Khan from Khan Academy?
our hero
this is giving me ‘Miyazaki watching the zombie animation’ vibes
I will use this to clone my grandpa's voice, because I miss him so much.
I lost mine too. You're not alone, dude.
Thank you! This finally works, but "Dataset", "Speaker", and "Utterance" are all greyed out. also, I have no option for "pretrained" under any sub-heading. I have "Encoder" under encoder, "Synthesizer" under synthesizer, and "Vocoder" and "Griffin-Lim" under vocoder. Can you help me sort that out please? I have downloaded LibriSpeech and unpacked it to a folder within the root directory, but I have no option to select it anywhere that I can see.
UPDATE: 2 things, I needed to pass this argument: [python demo_toolbox.py -d ], and then I had to remember that the GZ file had to be unzipped, and THEN the resulting TAR file had to be unzipped. 😕...But now my noob ass has actually unlocked this thing, and it's working - even the dataset voices!
I STILL have no option for "Pretrained" anywhere, so I don't know why others have that, but that's pretty much the last thing I've seen that I'm lacking.
Thanks, I've been looking for this solution
Did you ever figure out the pretrained or other vocoder options?
@@michaelsmith4904 Sadly, no. :(
how exactly did you get them working now?
@@fischy0339 That bit of coding that's within the "[ ]" in my response is the bit that you need to copy/paste into a command prompt, and hit "enter". That passes an argument that allows the program to access those functions. You also need to make sure that the GZ is unzipped to a TAR file, and then that has to be unzipped also. It has now been a while since I did this myself, so hopefully it will help you!
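The two manual unzip steps described above can be collapsed into one, since Python's tarfile module gunzips and untars a .tar.gz archive in a single call. A small sketch; the archive name and folder layout are assumptions about where you saved the download:

```python
import tarfile
from pathlib import Path

def extract_dataset(archive_path, datasets_root):
    """Unpack a LibriSpeech .tar.gz into the folder you will pass to
    `python demo_toolbox.py -d <datasets_root>`. Opening with mode "r:gz"
    decompresses and untars in one step, so no separate unzip is needed."""
    datasets_root = Path(datasets_root)
    datasets_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(datasets_root)

# Usage (assumed filenames):
#   extract_dataset("train-clean-100.tar.gz", "datasets_root")
#   then run: python demo_toolbox.py -d datasets_root
```

After extraction, the folder passed to -d should contain the LibriSpeech directory itself, which is what makes the Dataset/Speaker/Utterance dropdowns selectable.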
I'll read your thesis in my free time!