Something I discovered with the original model - you can do voice prompting! Per author:
For example, you can evoke emotion by including things like "I am really sad," before your text. I've built an automated redaction system that you can use to take advantage of this. It works by attempting to redact any text in the prompt surrounded by brackets. For example, the prompt "[I am really sad,] Please feed me." will only speak the words "Please feed me" (with a sad tonality).
@@Nathansthing the 'for example...' paragraph I quote is direct from the repo.
@@Nathansthing the original tortoise repo.
or you can simply use {sad} before the text
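A minimal sketch of that bracket redaction in plain Python (a regex over the prompt; `redact` is a hypothetical helper name, not from the Tortoise codebase):

```python
import re

def redact(prompt: str) -> str:
    """Strip [bracketed] spans from a prompt.

    The bracketed text steers the delivery (emotion, tone) but is
    not meant to be spoken aloud, so it is removed before synthesis.
    """
    spoken = re.sub(r"\[[^\]]*\]", "", prompt)
    # collapse the whitespace left behind by removed spans
    return re.sub(r"\s+", " ", spoken).strip()

print(redact("[I am really sad,] Please feed me."))  # Please feed me.
```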
Thanks, I got it running on Windows. It's still slow compared to Coqui TTS though. It uses 8.5 GB of VRAM.
It took 20 seconds to generate the test phrase with the original Tortoise.
35 seconds on average to generate per 4.27 s clip (broken on small files).
32 seconds on average to generate per voice file (broken on small files).
This is on an RTX 3060.
Ah gee whiz, would 8 GB not be enough to run it then? Would I have to use a remote GPU?
Hi! How did you get it running on Windows? Can you make a guide about it? Please!
For anyone wondering what to put in the GPT checkpoint and diffusion dropdowns: they are in this folder: "C:\Users\[yourname]\.cache\tortoise\models" (which is the default download location for those). The GPT checkpoint is autoregressive.pth, and the decoder is diffusion_decoder.pth (found out by a few hours of hands-on debugging and with ChatGPT).
Can you please explain in more detail how to solve this?
Great work on the Web UI! Makes creating new voices a breeze.
How much quicker? Cos I'm currently waiting ages.
Great video, love me some Attenborough.
The voice in the end reminds me of the BBC narrator who narrates the Planet Earth series! :D
Very awesome tutorial -- thank you! :)
Would appreciate a video on bark (JonathanFly/bark) -- it's not for voice cloning, but it's a promising TTS model.
Sir David Attenborough :)
Thank you for sharing your knowledge, this is a great tutorial! For the Google Colab notebook, do you know what endpoint IP to use to access the Streamlit app?
did you get that?
Thank you very much both for the video and for the links!
The original Tortoise results sound much better though. Maybe I gotta look into the settings.
I am 2 minutes into this video and am already your fan
Love to hear that! Hopefully you liked the rest of it as well :-)
Another great video. It's interesting to see how fast it can be. Unfortunately the clones aren't completely like their counterparts, but that's expected with the small amount of compute being used. Even getting a slight glimpse of the character is impressive. Have you messed around with other voice cloners such as Coqui?
Actually I only used Coqui for speech synthesis without voice cloning. But I'm looking into bark (just released a few days ago) now.
So it's faster but lower quality? That doesn't sound like much of an upgrade. The web interface sounds great though.
I tried Coqui and had a problem. Three months and I'm still waiting for the devs to answer a question. I would not give them any money.
Excellent presentation!
Been using Tortoise-TTS for a while now and curious if you see a difference in audio quality between the original TTS and TTS-Fast.
Your Colab link says "To access the website, please confirm the tunnel creator's public IP below." What to do?
How are people able to make these kinds of things but not able to make a user-friendly version that doesn't require all kinds of steps where things can go wrong? I've tried multiple tutorials and get different errors in all of them.
I think it's simply because a user interface is not challenging but is time-consuming to make.
The types of people who make these models are more than happy with command lines. In fact, the clutter of a UI gets in the way; you need to script these things, leave them running overnight, etc.
It's just two different types of people, imo.
However, you could make a UI! Get ChatGPT Plus and spend some time figuring out the best framework. It's an amazing teacher. Just ask it to build things step by step.
@@lovol2 the UI is great, but how can you get the block where you can upload your own files?
It's a research project. The devs work hard on the stuff itself. If we are able to download it, it's because it's open source. But "open source" doesn't mean it's meant for consumers. UI is fluff. It matters for end users, not for research.
At 18:30 it said "David Attenbobough". Good tutorial, thank you!
I just realized--if I understand correctly--that it's not an analogy. You're saying the image being constructed is the mel spectrogram, so it can simply be pushed back into the time domain. That's the weirdest thing I've heard in a while.
Yes, it's exactly that way. The diffusion model generates a mel spectrogram, which is then transformed into the time domain using a vocoder such as BigVGAN (github.com/NVIDIA/BigVGAN).
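As a toy illustration of the spectrogram-to-waveform idea (NumPy only; this round-trips a plain complex spectrogram, whereas a mel spectrogram additionally discards phase, which is why a learned vocoder like BigVGAN, or Griffin-Lim, is needed in practice):

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    # windowed frames -> complex spectrogram (one row per frame)
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n_fft]) for s in starts])

def istft(spec, n_fft=256, hop=64):
    # weighted overlap-add: invert each frame, then normalize by the
    # summed squared window so overlapping frames average out exactly
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        s = i * hop
        out[s:s + n_fft] += win * np.fft.irfft(frame, n=n_fft)
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

x = np.sin(2 * np.pi * 440 * np.arange(4096) / 22050)
y = istft(stft(x))
# interior samples round-trip almost exactly; only the edges degrade
```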
Can you please make a video on how to set up an audio generation model called AudioLDM? It's available on GitHub, and it allows you to generate text-to-audio and audio-to-audio, and to make style transfers from audio files. For example, with text-to-audio you input a sentence and it will generate a matching sound; check the README file for how to do it, because I am very confused. So can you please make a video for a beginner on how to set it up in Anaconda? And can you explain in detail what you're doing, since my screen is lagging? It would be really nice of you if you could make such a video.
The model is available on GitHub, but could you please make a video showing how to set the model up? Can you please do it tomorrow, or in a couple of days, or in a week?
Can you do a Tortoise-TTS-Fast for the local computer, both in the form of a website that doesn't use the UI itself, and in the form of a program that can be installed on your computer, Martin?
Spectacular !!! Good Work!!! 👍👍
Thank you so much! :-)
You talked about foreign languages, and it would be cool to get your feedback on the German language, for instance. How do you tune it to make it work with German? Is it a trial-and-error loop process?
You can use German-language subtitles to help yourself.
I want a Tortoise-TTS-Fast with voice cloning that can do both singing and speaking, OK, Martin?
Great video and tutorial.
Great to see these improvements!
Hello, thanks for this! It's great. Have enjoyed other videos.
A question: how do you connect this to a Cloud GPU?
Insightful video. Really appreciate it. I have a question. How can we preserve the accents in our cloned voice? I can't find any way to do that
Unless I'm missing something, this doesn't work on Windows - the entire guide doesn't seem to mention it, but it only works on Linux... which is frustrating.
The title sounds very inviting
I had tortoise-tts working with a conda install on Windows 11. I need a variation on how to get the gcc version working correctly on that platform; my Windows box happens to have the right GPU. A procedure for macOS would also be helpful because I do my actual editing on that platform. Great, most awesome enhancement.
Thanks for the video! Can you please make a similar guide for Windows?
Cool video on this project. Can you make one showing how to set up a GPU in the cloud?
Has anyone figured out how to stop the audio from cutting off early?
Amazing work Martin, thank you! Little question: when I run it for the first time, it asks to select a GPT checkpoint and a diffusion checkpoint. What should I input there?
Hello, in case you still have trouble with this: they are in this folder: "C:\Users\[yourname]\.cache\tortoise\models" (which is the default download location for those). The GPT checkpoint is autoregressive.pth, and the decoder is diffusion_decoder.pth.
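A tiny sanity-check sketch for those paths (hypothetical helper, plain Python; adjust `Path.home()` if your cache lives elsewhere):

```python
from pathlib import Path

# Default download location used by Tortoise (~/.cache/tortoise/models;
# on Windows that resolves to C:\Users\<you>\.cache\tortoise\models)
MODELS_DIR = Path.home() / ".cache" / "tortoise" / "models"

CHECKPOINTS = {
    "gpt": MODELS_DIR / "autoregressive.pth",           # GPT checkpoint
    "diffusion": MODELS_DIR / "diffusion_decoder.pth",  # diffusion decoder
}

for name, path in CHECKPOINTS.items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path} ({status})")
```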
Really great tutorial. Sadly, I have a 3090 with a 10900K and 64 GB of RAM, and with the settings you have it still takes 42 seconds to produce 7 words of text, and that's with low RAM unchecked. Got it working by reducing the number of voice samples. I thought the more samples the better, but it slows it down badly; it seems good with 20 ten-second clips.
Holy shit. So this isn't working on potato pc. :(
Martin, please make the Tortoise-TTS-Fast but in the Thai language, okay?
Hey, the Colab version of this stopped working and is throwing a runtime PyTorch error. Please can you fix it?
Thanks a lot for another great video! :D
Could you maybe make another one showing us how to install it locally on Windows? I've tried several times and with several repositories, but with no luck until now ^_^U
I will try to also add windows commands in future videos/articles. The problem is I can't verify if they work on my computer.
@@martin-thissen Thanks! ^^ I'll try them, I hope I'm more lucky then, Windows is such a hassle sometimes XD
Brilliant idea. I would be greatly indebted for a branch of the install instructions that omits these three steps. Also, as my Windows box has the right GPU, there may be other issues. Alternatively, I have a Red Hat KVM virtual server with plenty of disk and RAM but not the GPU. If you reveal what version of Ubuntu you are using, I can more precisely follow your install method.
Now you need to try the Bark fork that allows voice cloning lol
Is the bark clone better, or is tortoise better?
@@Raiyan-27 bark has some features that tortoise doesn't, but it has a non-commercial licence.
Would be nice for non-English languages.
I think it sounds like the Smoking Man from X-Files, before his smoking days. 🤭
Almost done on Windows, but failing at the last step in Streamlit: RuntimeError: Error(s) in loading state_dict for UnifiedVoice: Unexpected key(s) in state_dict:
Damn, they should have named it Hare-TTS.
Hello Martin. 🚨🚨🚨🚨🚨🚨🚨
When I try to run the Google Colab Notebook, and after visiting the web interface, it says
"To access the website, please confirm the tunnel creator's public IP below"
What should I do next??
And also thanks for the great tutorial...
SUBBED..... 👍👍👍
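For what it's worth, the "tunnel creator's public IP" localtunnel asks for is the public IP of the machine running the tunnel, i.e. the Colab VM itself. A sketch that prints it from a Colab cell (assumes outbound network access; icanhazip.com is just one of several IP-echo services):

```python
from urllib.request import urlopen

# Ask an IP-echo service which public IPv4 the Colab VM appears as;
# paste this value into localtunnel's confirmation page.
ip = urlopen("https://ipv4.icanhazip.com", timeout=10).read().decode().strip()
print(ip)
```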
Very interesting stuff! I just installed the repo from mrq; I wonder if I should instead install this version, or is it already implemented, seeing that the date of the newest changes is the same on both repos. I'm not really smart, so I've got noob questions like that... I just try to clone a voice and it is too slow; I've already had to disable bitsandbytes because of errors... :)
Is there a way to make the voices match better? And also, is there a way to train voices for better outcomes?
Amazing work! I am using Tortoise TTS but having issues with intonation, mispronunciation, and screeches; can you suggest some way to improve that?
Does this work with foreign languages or only with english?
@Martin Thissen I installed this locally on a Windows 11 PC, using the Firefox and Opera browsers. But the WebUI does not have the Create New Voice section. I have Select GPT Checkpoint and Select Diffusion Checkpoint above the Text section. So I'm not sure where I went wrong with the install.
Super video Martin! Could you maybe explain what VQVAE is and how to use it for training German voices, or generally make a video about finetuning and such? 🤔 I'd be very interested 😁 Maybe there'll be another beer from me 🍻 Cheers
Noted it, thank you! :-)
I love your videos! Did you consider using runpod for cloud GPUs? You can keep your volume when shutting down the instance. If you create a venv on your volume, you only have to install the requirements once. No association with runpod, it's just my favorite platform.
That's a great idea! Haven't heard about runpod yet, but was looking for such an option, thanks!
Why does it need to be done in a virtual environment?
How's it going Martin! I just completed your tutorial, and I wanted to ask: is there a way to get the voices sounding less robotic?
How do I get to the web version? It says we need an "Endpoint IP" from the person who shared the link.
Thanks for the video, brother. I have a question regarding the library download. I have a Mac; do I really need to download the same files from the terminal every time I generate a voice? And will the previously downloaded files be removed after the next download?
I just spent so long cleaning up the code and requirements.txt stuff that both the OG Tortoise and Fast are about the same time-wise for me. The original Tortoise works, so I'll just go slow and steady. If anybody has a cleaner repository, let me know. Thanks for the tut though! Using Conda on Win10 ended up being a bit of a nightmare (even with ChatGPT). Lots of torch/numpy/voicefixer dependency loops... just a heads-up to the non-Linux turtleheads out there.
How can I save the voice model? Is there any way? So that I can use that voice for singing.
Can you mix a few different voices and achieve a unique voice, still sounding good?
Would Tortoise-TTS-Fast be able to work with the "Brazilian Portuguese" language?
Did you manage it?
Hello, thanks a lot for the wonderful content. I used Colab, but at the end when I run the last command and click on the link provided, it asks for an endpoint IP, and when I provide it, it returns a 504 gateway timeout. Please, how do I fix this?
same error!
Your videos are great. I hope you are able to get an RTX 4090 to run this stuff locally some day. (me too, some day lol) I am also very interested in Stable Diffusion extensions and locally run chatgpt style LLMs. Do you share those interests as well?
Absolutely, definitively planning to do more videos about Stable Diffusion and LLMs in the future as well.
Is there a video on how I set up a cloud GPU in order to run tortoise TTS? (regardless of whether using the GUI or the standard python code from tortoise-tts rep)
Hi Martin! Great Video!!
The localtunnel is asking for a creator's public IP and not allowing me to interact with the UI.
Can you share that Endpoint IP?
same
The Medium article has different commands than the video.
The localtunnel for Google Colab doesn't work; what alternatives are there?
Is there any way we can finetune this model? It would be really helpful if we could get a tutorial on how to finetune it. Please look into this.
Is the Colab running for anyone else?
It is asking for an IP address. What do I put?
Why do you make your Medium articles "members only"?
Hey, so I've been trying a lot to get this right. I get to the end, but it ends up saying something about not being able to find the "UnifiedVoice" model or something like that. Can you help me?
Did you manage to find a fix for this?
When I try to run the web UI, it says ModuleNotFoundError: No module named 'tortoise.inference' for Tortoise-tts-Fast. Please help
what is tunnel ip dude
At what point do they call it the Hare model?
Fair question haha
I am trying to run the script, but it doesn't work; I am using Windows 11.
How easy/hard would this be to wrap into an iOS app ipa? ;)😅 Say, a narration app with a sleep timer that can read docs or pasted text with few limits. Maybe output to mp3.
It won't be easy; you'd have to run this model on some server and add additional code (some Python library) to read the docs, then integrate that into an iOS app. For that I'd suggest doing it with Swift rather than Flutter.
The outro also sounds a bit like Liam Neeson
Attenborough doesn't know who you are.
Attenborough doesn't know what you want.
If you are looking for ransom I can tell you Attenborough doesn't have money,
but what Attenborough does have are a very particular set of skills.
Skills Attenborough has acquired over a very long career.
Skills that make him a nightmare for people like you.
If you let his daughter go now that'll be the end of it.
Attenborough will not look for you, Attenborough will not pursue you,
but if you don't, Attenborough will look for you, Attenborough will find you and Attenborough will narrate a nature film for you.
@@polyhistorphilomath haha. Brilliant
Can you change the tone of the voice reading text {e.g. excited, sad, etc}?
bruh linux tutorial there's no windows tutorial?
What to put in the endpoint IP in localtunnel? I have a low-spec PC, so there is no way but to use Google Colab. But I'm stuck on this step.
Oh I see, Ubuntu 22.04. I will start with a GUI workstation and see how far I get.
It's a cool pet project, but it takes forever to load, crashes frequently, and doesn't sound any better than standard text-to-speech.
It would be great if we were allowed to download a SAPI file to be used on a PC with other software's voices, and not just record and paste.
is there a way to get a copy of the modified notebook you used that has the drag and drop functions?
Yes, you can find it here: github.com/thisserand/tortoise-tts-fast/blob/main/scripts/app.py
@@martin-thissen what do you do with this file ?
The generated link does not include 'Create New Voice'... it comes up with 'Select GPT Checkpoint' and 'Select Diffusion Checkpoint' instead. What am I doing wrong? 🙏
Can you try it again? I linked the wrong repository in the Colab notebook, sorry.
What happens if you don't have the "Create New Voice" box? I just have "Select GPT Checkpoint"
Can you try it again? I linked the wrong repository in the Colab notebook, sorry.
The biggest issue for me so far is gcc and Streamlit. I see from my RHEL virt host that gcc 8.x is pretty close to current. Streamlit caused major dependency issues when I tried it, as the last pip could not handle these. I will start over installing Streamlit first and see if that helps or hurts.
This works on Windows? A 1-click installer would be awesome.
grow up you cry baby
Hi there. I followed the instructions on the site (except for the sudo and ngrok parts), but I can't seem to open the WebUI. It keeps telling me this:
AttributeError: module 'torch.nn.utils.parametrizations' has no attribute 'weight_norm'
Is this normal, or have I gone wrong somewhere? I also tried a run where I installed ngrok, but the same thing happened. If not this, then something about missing an "open" attribute or lacking the "tortoise.api" module.
What can I do to remedy this?
I think this is a compatibility issue.
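That error pattern usually means a torch version mismatch: `torch.nn.utils.parametrizations.weight_norm` only exists in newer PyTorch releases. A hedged compatibility sketch (falls back to the older `torch.nn.utils.weight_norm` location; upgrading torch to a version the repo expects is the cleaner fix):

```python
# Try the newer parametrization-based API first, then the legacy one.
# If torch itself is missing, leave a sentinel so callers can detect it.
try:
    from torch.nn.utils.parametrizations import weight_norm
except (ImportError, AttributeError):
    try:
        from torch.nn.utils import weight_norm
    except ImportError:
        weight_norm = None  # torch not installed

print("weight_norm available:", weight_norm is not None)
```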
How do I train a new model? Any idea?
kek you know i gave up i think this shit only runs on linux...imma just try the original one.
If I am generating long text with this, is there a way to keep it from timing out after a couple paragraphs?
I installed it, but I don't see the WebUI having this option to upload the files for training a new voice.
OK, I realized I had checked out the wrong git repo, but now my problem is that it expects a config value to be set for the extra voices dir, which no longer exists as a configuration item on the page, and I'm getting errors when I attempt to merge in the portion of the code that seems to allow setting it.
I'm not that familiar with Streamlit, and have used Python only lightly, so I'm not sure how to resolve it.
This is what I copied in before I commented it out, but even after commenting it out the page is still broken...
#extra_voices_dir = st.text_input(
# "Extra Voices Directory",
# help="Where to find extra voices for zero-shot VC",
# value=conf.EXTRA_VOICES_DIR,
#)
Original error with a clean repo on first load of page, when I attempt to save the config.
Traceback (most recent call last):
  File "C:\Users\xanas\Documents\Apps\miniconda3\envs\tts-fast\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\xanas\Documents\Apps\tortoise-tts-fast\scripts\app.py", line 306, in <module>
    main()
  File "C:\Users\xanas\Documents\Apps\tortoise-tts-fast\scripts\app.py", line 200, in main
    EXTRA_VOICES_DIR=extra_voices_dir,
NameError: name 'extra_voices_dir' is not defined
How do you get the localtunnel version for Windows users when using this setup for Tortoise?
Hi, excellent tutorial. Does anyone know of a repository of pretrained voices?
David Attenborough
Does it work for any language?
Does it support multiple languages?
Does it support the Hindi language also? If not, do make it support speaking in the Hindi language too..😇😇🥰🥰
can you make a tutorial about soft vc vits?
Can it work for Spanish and other languages?
Foreign, non-English languages?
x2
Don't think this is really possible with this. You would have to clone a base model in the target language. According to the author, for the English model:
These models were trained on my "homelab" server with 8 RTX 3090s over the course of several months. They were trained on a dataset consisting of ~50k hours of speech data, most of which was transcribed by ocotillo. I currently do not have plans to release the training configurations or methodology.
Could you please provide the password for the Colab?