Something I discovered with the original model - you can do voice prompting! Per author:
For example, you can evoke emotion by including things like "I am really sad," before your text. I've built an automated redaction system that you can use to take advantage of this. It works by attempting to redact any text in the prompt surrounded by brackets. For example, the prompt "[I am really sad,] Please feed me." will only speak the words "Please feed me" (with a sad tonality).
@@Nathansthing the 'for example...' paragraph I quote is direct from the repo.
@@Nathansthing the original tortoise repo.
or you can simply use {sad} before the text
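A minimal sketch of that bracket redaction in plain Python (a regex over the prompt; `redact` is a hypothetical helper name, not from the Tortoise codebase):

```python
import re

def redact(prompt: str) -> str:
    """Strip [bracketed] spans from a prompt.

    The bracketed text steers the delivery (emotion, tone) but is
    not meant to be spoken aloud, so it is removed before synthesis.
    """
    spoken = re.sub(r"\[[^\]]*\]", "", prompt)
    # collapse the whitespace left behind by removed spans
    return re.sub(r"\s+", " ", spoken).strip()

print(redact("[I am really sad,] Please feed me."))  # Please feed me.
```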
Thanks, I got it running on Windows. It's still slow compared to Coqui TTS though. It uses 8.5 GB of VRAM.
It took 20 seconds to generate the test phrase with the original Tortoise.
35 seconds on average to generate per 4.27 s clip (broken on small files).
32 seconds on average to generate per voice file (broken on small files).
This is on an RTX 3060.
Ah gee whiz, would 8 GB not be enough to run it then? Would I have to use a remote GPU?
Hi! How did you get it running on Windows? Can you make a guide about it? Please!
For anyone wondering what to put in the GPT checkpoint and diffusion dropdowns: they are in this folder: "C:\Users\[yourname]\.cache\tortoise\models" (which is the default download location for those). The GPT checkpoint is autoregressive.pth, and the decoder is diffusion_decoder.pth (found out by a few hours of hands-on debugging and with ChatGPT).
Can you please explain in more detail how to solve this?
Great work on the Web UI! Makes creating new voices a breeze.
How much quicker? Cos I'm currently waiting ages.
Great video, love me some Attenborough.
The voice in the end reminds me of the BBC narrator who narrates the Planet Earth series! :D
Very awesome tutorial -- thank you! :)
Would appreciate a video on bark (JonathanFly/bark) -- it's not for voice cloning, but it's a promising TTS model.
Sir David Attenborough :)
Thank you for sharing your knowledge, this is a great tutorial! For the Google Colab notebook, do you know what endpoint IP to use to access the Streamlit app?
did you get that?
Thank you very much both for the video and for the links!
The original Tortoise results sound much better though. Maybe I gotta look into the settings.
I am 2 minutes into this video and am already your fan
Love to hear that! Hopefully you liked the rest of it as well :-)
Another great video. It's interesting to see how fast it can be. Unfortunately the clones aren't completely like their counterparts, but that's expected with the small amount of compute being used. Even getting a slight glimpse of the character is impressive. Have you messed around with other voice cloners such as Coqui?
Actually I only used Coqui for speech synthesis without voice cloning. But I'm looking into bark (just released a few days ago) now.
So it's faster but lower quality? That doesn't sound like much of an upgrade. The web interface sounds great though.
I tried Coqui and had a problem. Three months and I'm still waiting for the devs to answer a question. I would not give them any money.
Excellent presentation!
Been using Tortoise-TTS for a while now and curious if you see a difference in audio quality between the original TTS and TTS-Fast.
Your Colab link says "To access the website, please confirm the tunnel creator's public IP below." What to do?
How are people able to make these kinds of things but not able to make a user-friendly version that doesn't require all kinds of steps where things can go wrong? I've tried multiple tutorials and get different errors in all of them.
I think it's simply because a user interface is not challenging but is time-consuming to make.
The types of people who make these models are more than happy with command lines. In fact, the clutter of a UI gets in the way; you need to script these things, leave them running overnight, etc.
It's just two different types of people, imo.
However, you could make a UI! Get ChatGPT Plus and spend some time figuring out the best framework. It's an amazing teacher. Just ask it to build things step by step.
@@lovol2 the UI is great, but how can you get the block where you can upload your own files?
It's a research project. The devs work hard on the stuff itself. If we are able to download it, it's because it's open source. But "open source" doesn't mean it's meant for consumers. UI is fluff. It matters for end users, not for research.
At 18:30 it said "David Attenbobough". Good tutorial, thank you!
I just realized--if I understand correctly--that it's not an analogy. You're saying the image being constructed is the mel spectrogram, so it can simply be pushed back into the time domain. That's the weirdest thing I've heard in a while.
Yes, it's exactly that way. The diffusion model generates a mel spectrogram, which is then transformed into the time domain using a vocoder such as BigVGAN (github.com/NVIDIA/BigVGAN).
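As a toy illustration of the spectrogram-to-waveform idea (NumPy only; this round-trips a plain complex spectrogram, whereas a mel spectrogram additionally discards phase, which is why a learned vocoder like BigVGAN, or Griffin-Lim, is needed in practice):

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    # windowed frames -> complex spectrogram (one row per frame)
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n_fft]) for s in starts])

def istft(spec, n_fft=256, hop=64):
    # weighted overlap-add: invert each frame, then normalize by the
    # summed squared window so overlapping frames average out exactly
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        s = i * hop
        out[s:s + n_fft] += win * np.fft.irfft(frame, n=n_fft)
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

x = np.sin(2 * np.pi * 440 * np.arange(4096) / 22050)
y = istft(stft(x))
# interior samples round-trip almost exactly; only the edges degrade
```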
Can you please make a video on how to set up an audio generation model called AudioLDM? It's available on GitHub, and it allows you to generate text-to-audio and audio-to-audio, and to make style transfers from audio files. For example, with text-to-audio you input a sentence and it will generate a matching sound; check the README file for how to do it, because I am very confused. So can you please make a video for a beginner on how to set it up in Anaconda? And can you explain in detail what you're doing, since my screen is lagging? It would be really nice of you if you could make such a video.
The model is available on GitHub, but could you please make a video showing how to set the model up? Can you please do it tomorrow, or in a couple of days, or in a week?
Can you do a Tortoise-TTS-Fast for the local computer, both in the form of a website that doesn't use the UI itself, and in the form of a program that can be installed on your computer, Martin?
Spectacular !!! Good Work!!! 👍👍
Thank you so much! :-)
You talked about foreign languages, and it would be cool to get your feedback on the German language, for instance. How do you tune it to make it work with German? Is it a trial-and-error loop process?
You can use German-language subtitles to help yourself.
I want a Tortoise-TTS-Fast with voice cloning that can do both singing and speaking, OK, Martin?
Great video and tutorial.
Great to see these improvements!
Hello, thanks for this! It's great. Have enjoyed other videos.
A question: how do you connect this to a Cloud GPU?
Insightful video. Really appreciate it. I have a question. How can we preserve the accents in our cloned voice? I can't find any way to do that
Unless I'm missing something, this doesn't work on Windows - the entire guide doesn't seem to mention it, but it only works on Linux... which is frustrating.
The title sounds very inviting
I had tortoise-tts working with a conda install on Windows 11. I need a variation on how to get the gcc version working correctly on that platform; my Windows box happens to have the right GPU. A procedure for macOS would also be helpful because I do my actual editing on that platform. Great, most awesome enhancement.
Thanks for the video! Can you please make a similar guide for Windows?
Cool video on this project. Can you make one showing how to set up a GPU in the cloud?
Has anyone figured out how to stop the audio from cutting off early?
Amazing work Martin, thank you! Little question: when I run it for the first time, it asks to select a GPT checkpoint and a diffusion checkpoint. What should I input there?
Hello, in case you still have trouble with this: they are in this folder: "C:\Users\[yourname]\.cache\tortoise\models" (which is the default download location for those). The GPT checkpoint is autoregressive.pth, and the decoder is diffusion_decoder.pth.
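A tiny sanity-check sketch for those paths (hypothetical helper, plain Python; adjust `Path.home()` if your cache lives elsewhere):

```python
from pathlib import Path

# Default download location used by Tortoise (~/.cache/tortoise/models;
# on Windows that resolves to C:\Users\<you>\.cache\tortoise\models)
MODELS_DIR = Path.home() / ".cache" / "tortoise" / "models"

CHECKPOINTS = {
    "gpt": MODELS_DIR / "autoregressive.pth",           # GPT checkpoint
    "diffusion": MODELS_DIR / "diffusion_decoder.pth",  # diffusion decoder
}

for name, path in CHECKPOINTS.items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path} ({status})")
```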
Really great tutorial. Sadly, I have a 3090 with a 10900K and 64 GB of RAM, and with the settings you have it still takes 42 seconds to produce 7 words of text, and that's with low RAM unchecked. Got it working by reducing the number of voice samples. I thought the more samples the better, but it slows it down badly; it seems good with 20 ten-second clips.
Holy shit. So this isn't working on potato pc. :(
Martin, please make the Tortoise-TTS-Fast but in the Thai language, okay?
Hey, the Colab version of this stopped working and is throwing a runtime PyTorch error. Please can you fix it?
Thanks a lot for another great video! :D
Could you maybe make another one showing us how to install it locally on Windows? I've tried several times and with several repositories, but with no luck until now ^_^U
I will try to also add windows commands in future videos/articles. The problem is I can't verify if they work on my computer.
@@martin-thissen Thanks! ^^ I'll try them, I hope I'm more lucky then, Windows is such a hassle sometimes XD
Brilliant idea. I would be greatly indebted for a branch of the install instructions that omits these three steps. Also, as my Windows box has the right GPU, there may be other issues. Alternatively, I have a Red Hat KVM virtual server with plenty of disk and RAM but not the GPU. If you reveal what version of Ubuntu you are using, I can more precisely follow your install method.
Now you need to try the Bark fork that allows voice cloning lol
Is the bark clone better, or is tortoise better?
@@Raiyan-27 bark has some features that tortoise doesn't, but it has a non-commercial licence.
Would be nice for non-English languages.
I think it sounds like the Smoking Man from X-Files, before his smoking days. 🤭
Almost done on Windows, but failing at the last step in Streamlit: RuntimeError: Error(s) in loading state_dict for UnifiedVoice: Unexpected key(s) in state_dict:
Damn, they should have named it Hare-TTS.
Hello Martin. 🚨🚨🚨🚨🚨🚨🚨
When I try to run the Google Colab Notebook, and after visiting the web interface, it says
"To access the website, please confirm the tunnel creator's public IP below"
What should I do next??
And also thanks for the great tutorial...
SUBBED..... 👍👍👍
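For what it's worth, the "tunnel creator's public IP" localtunnel asks for is the public IP of the machine running the tunnel, i.e. the Colab VM itself. A sketch that prints it from a Colab cell (assumes outbound network access; icanhazip.com is just one of several IP-echo services):

```python
from urllib.request import urlopen

# Ask an IP-echo service which public IPv4 the Colab VM appears as;
# paste this value into localtunnel's confirmation page.
ip = urlopen("https://ipv4.icanhazip.com", timeout=10).read().decode().strip()
print(ip)
```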
Very interesting stuff! I just installed the repo from mrq; I wonder if I should instead install this version, or is it already implemented, seeing that the date of the newest changes is the same on both repos. I'm not really smart, so I've got noob questions like that... I just try to clone a voice and it is too slow; I've already had to disable bitsandbytes because of errors... :)
Is there a way to make the voices match better? And also, is there a way to train voices for better outcomes?
Amazing work! I am using Tortoise TTS but having issues with intonation, mispronunciation, and screeches; can you suggest some way to improve that?
Does this work with foreign languages or only with english?
@Martin Thissen I installed this locally on a Windows 11 PC, using the Firefox and Opera browsers. But the WebUI does not have the Create New Voice section. I have Select GPT Checkpoint and Select Diffusion Checkpoint above the Text section. So I'm not sure where I went wrong with the install.
Super video Martin! Could you maybe explain what VQVAE is and how to use it for training German voices, or generally make a video about finetuning and such? 🤔 I'd be very interested 😁 Maybe there'll be another beer from me 🍻 Cheers
Noted it, thank you! :-)
I love your videos! Did you consider using runpod for cloud GPUs? You can keep your volume when shutting down the instance. If you create a venv on your volume, you only have to install the requirements once. No association with runpod, it's just my favorite platform.
That's a great idea! Haven't heard about runpod yet, but was looking for such an option, thanks!
Why does it need to be done in a virtual environment?
How's it going Martin! I just completed your tutorial, and I wanted to ask: is there a way to get the voices sounding less robotic?
How do I get to the web version? It says we need an "Endpoint IP" from the person who shared the link.
Thanks for the video, brother. I have a question regarding the library download. I have a Mac; do I really need to download the same files from the terminal every time I generate a voice? And will the previously downloaded files be removed after the next download?
I just spent so long cleaning up the code and requirements.txt stuff that both the OG Tortoise and Fast are about the same time-wise for me. The original Tortoise works, so I'll just go slow and steady. If anybody has a cleaner repository, let me know. Thanks for the tut though! Using Conda on Win10 ended up being a bit of a nightmare (even with ChatGPT). Lots of torch/numpy/voicefixer dependency loops... just a heads-up to the non-Linux turtleheads out there.
How can I save the voice model? Is there any way? So that I can use that voice for singing.
Can you mix a few different voices and achieve a unique voice, still sounding good?
Would Tortoise-TTS-Fast be able to work with the "Brazilian Portuguese" language?
Did you manage it?
Hello, thanks a lot for the wonderful content. I used Colab, but at the end when I run the last command and click on the link provided, it asks for an endpoint IP, and when I provide it, it returns a 504 gateway timeout. Please, how do I fix this?
same error!
Your videos are great. I hope you are able to get an RTX 4090 to run this stuff locally some day. (me too, some day lol) I am also very interested in Stable Diffusion extensions and locally run chatgpt style LLMs. Do you share those interests as well?
Absolutely, definitively planning to do more videos about Stable Diffusion and LLMs in the future as well.
Is there a video on how I set up a cloud GPU in order to run tortoise TTS? (regardless of whether using the GUI or the standard python code from tortoise-tts rep)
Hi Martin! Great Video!!
The localtunnel is asking for a creator's public IP and not allowing me to interact with the UI.
Can you share that Endpoint IP?
same
The Medium article has different commands than the video.
The localtunnel for Google Colab doesn't work; what alternatives are there?
Is there any way we can finetune this model? It would be really helpful if we could get a tutorial on how to finetune it. Please look into this.
Is the Colab running for anyone else?
It is asking for an IP address. What do I put?
Why do you make your Medium articles "members only"?
Hey, so I've been trying a lot to get this right. I get to the end, but it ends up saying something about not being able to find the "UnifiedVoice" model or something like that. Can you help me?
Did you manage to find a fix for this?
When I try to run the web UI, it says ModuleNotFoundError: No module named 'tortoise.inference' for Tortoise-tts-Fast. Please help
what is tunnel ip dude
At what point do they call it the Hare model?
Fair question haha
I am trying to run the script, but it doesn't work; I am using Windows 11.
How easy/hard would this be to wrap into an iOS app ipa? ;)😅 Say, a narration app with a sleep timer that can read docs or pasted text with few limits. Maybe output to mp3.
It won't be easy; you'd have to run this model on some server and add additional code (some Python library) to read the docs, then integrate that into an iOS app. For that I'd suggest doing it with Swift rather than Flutter.
The outro also sounds a bit like Liam Neeson
Attenborough doesn't know who you are.
Attenborough doesn't know what you want.
If you are looking for ransom I can tell you Attenborough doesn't have money,
but what Attenborough does have are a very particular set of skills.
Skills Attenborough has acquired over a very long career.
Skills that make him a nightmare for people like you.
If you let his daughter go now that'll be the end of it.
Attenborough will not look for you, Attenborough will not pursue you,
but if you don't, Attenborough will look for you, Attenborough will find you and Attenborough will narrate a nature film for you.
@@polyhistorphilomath haha. Brilliant
Can you change the tone of the voice reading text {e.g. excited, sad, etc}?
bruh linux tutorial there's no windows tutorial?
What to put in the endpoint IP in localtunnel? I have a low-spec PC, so there is no way but to use Google Colab. But I'm stuck on this step.
Oh I see, Ubuntu 22.04. I will start with a GUI workstation and see how far I get.
It's a cool pet project, but it takes forever to load, crashes frequently, and doesn't sound any better than standard text-to-speech.
It would be great if we were allowed to download a SAPI file to be used on a PC with other software's voices, and not just record and paste.
is there a way to get a copy of the modified notebook you used that has the drag and drop functions?
Yes, you can find it here: github.com/thisserand/tortoise-tts-fast/blob/main/scripts/app.py
@@martin-thissen what do you do with this file ?
The generated link does not include 'Create New Voice'... it comes up with 'Select GPT Checkpoint' and 'Select Diffusion Checkpoint' instead. What am I doing wrong? 🙏
Can you try it again? I linked the wrong repository in the Colab notebook, sorry.
What happens if you don't have the "Create New Voice" box? I just have "Select GPT Checkpoint"
Can you try it again? I linked the wrong repository in the Colab notebook, sorry.
The biggest issue for me so far is gcc and Streamlit. I see from my RHEL virt host that gcc 8.x is pretty close to current. Streamlit caused major dependency issues when I tried it, as the last pip could not handle these. I will start over installing Streamlit first and see if that helps or hurts.
This works on Windows? A 1-click installer would be awesome.
grow up you cry baby
Hi there. I followed the instructions on the site (except for the sudo and ngrok parts), but I can't seem to open the WebUI. It keeps telling me this:
AttributeError: module 'torch.nn.utils.parametrizations' has no attribute 'weight_norm'
Is this normal, or have I gone wrong somewhere? I also tried a run where I installed ngrok, but the same thing happened. If not this, then something about missing an "open" attribute or lacking the "tortoise.api" module.
What can I do to remedy this?
I think this is a compatibility issue.
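That error pattern usually means a torch version mismatch: `torch.nn.utils.parametrizations.weight_norm` only exists in newer PyTorch releases. A hedged compatibility sketch (falls back to the older `torch.nn.utils.weight_norm` location; upgrading torch to a version the repo expects is the cleaner fix):

```python
# Try the newer parametrization-based API first, then the legacy one.
# If torch itself is missing, leave a sentinel so callers can detect it.
try:
    from torch.nn.utils.parametrizations import weight_norm
except (ImportError, AttributeError):
    try:
        from torch.nn.utils import weight_norm
    except ImportError:
        weight_norm = None  # torch not installed

print("weight_norm available:", weight_norm is not None)
```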
How do I train a new model? Any idea?
kek you know i gave up i think this shit only runs on linux...imma just try the original one.
If I am generating long text with this, is there a way to keep it from timing out after a couple paragraphs?
I installed it, but I don't see the WebUI having this option to upload the files for training a new voice.
OK, I realized I had checked out the wrong git repo, but now my problem is that it expects a config value to be set for the extra voices dir, which no longer exists as a configuration item on the page, and I'm getting errors when I attempt to merge in the portion of the code that seems to allow setting it.
I'm not that familiar with Streamlit, and have used Python only lightly, so I'm not sure how to resolve it.
This is what I copied in before I commented it out, but even after commenting it out the page is still broken...
#extra_voices_dir = st.text_input(
# "Extra Voices Directory",
# help="Where to find extra voices for zero-shot VC",
# value=conf.EXTRA_VOICES_DIR,
#)
Original error with a clean repo on first load of page, when I attempt to save the config.
Traceback (most recent call last):
  File "C:\Users\xanas\Documents\Apps\miniconda3\envs\tts-fast\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\xanas\Documents\Apps\tortoise-tts-fast\scripts\app.py", line 306, in <module>
    main()
  File "C:\Users\xanas\Documents\Apps\tortoise-tts-fast\scripts\app.py", line 200, in main
    EXTRA_VOICES_DIR=extra_voices_dir,
NameError: name 'extra_voices_dir' is not defined
How do you get the localtunnel version for Windows users when using this setup for Tortoise?
Hi, excellent tutorial. Does anyone know of a repository of pretrained voices?
David Attenborough
Does it work for any language?
Does it support multiple languages?
Does it support the Hindi language also? If not, do make it support speaking in the Hindi language too..😇😇🥰🥰
can you make a tutorial about soft vc vits?
Can it work for Spanish and other languages?
Foreign, non-English languages?
x2
Don't think this is really possible with this. You would have to clone a base model in the target language. According to the author, for the English model:
These models were trained on my "homelab" server with 8 RTX 3090s over the course of several months. They were trained on a dataset consisting of ~50k hours of speech data, most of which was transcribed by ocotillo. I currently do not have plans to release the training configurations or methodology.
Could you please provide the password for the Colab?