Setting up Openvoice version 2 and MeloTTS for AI voice cloning
- Published 27 Apr 2024
- NOTE: This video is part of the Text-to-speech Comparison Series
I'll be setting up OpenVoice version 2 and MeloTTS by MyShell AI. We'll follow their documentation closely and set up MeloTTS to work both as a standalone text-to-speech engine and as the base speaker TTS for the OpenVoice voice-cloning engine, so you can easily integrate it into your AI application.
🔗 LINKS
Code Repo: github.com/brainiakk/youtube/...
MyShell.ai HF page: huggingface.co/myshell-ai
Openvoice V2 Huggingface download page: huggingface.co/myshell-ai/Ope...
MeloTTS English V2 Checkpoints download page: huggingface.co/myshell-ai/Mel...
🔗 MY LINKS
Twitter: x.com/alhajibrain
Instagram: / _alhajibrain
Github: github.com/brainiakk
#ai #aivoice #aivoices #texttospeech #tts #openvoice #melotts
Great. Thank you.
You are welcome
great bro!
Great job. Can you show how a PDF or TXT file can be uploaded and used instead of cutting and pasting or typing text? Most videos show short phrases, but if you want a paper or a document converted to speech, how would you go about doing this? Thanks again.
Interesting 🤔 I might do a video on it, but it's as simple as adding a function that parses the text from the PDF or TXT file and passes it to the text-to-speech function. The text can be passed in chunks, maybe paragraph by paragraph. You could also accept the document's file path as input when you run the Python script, or make it more lively by having the TTS function say "please provide the file you want me to read" and then opening a file dialog in your file explorer. It all depends on what you want, but it's possible.
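The approach in that reply (parse the file, then feed the text-to-speech function one paragraph at a time) could look like the minimal sketch below. `split_into_chunks`, `read_aloud` and the `speak` callback are hypothetical names, not part of MeloTTS or OpenVoice; `speak` stands in for whatever TTS call your script already makes. PDF support assumes the third-party `pypdf` package is installed.

```python
import re
from pathlib import Path
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Split text into paragraph-sized chunks, breaking long paragraphs at sentence ends."""
    chunks: List[str] = []
    # Paragraphs are separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", text.strip()):
        para = " ".join(para.split())  # collapse hard line wraps and repeated spaces
        if not para:
            continue
        if len(para) <= max_chars:
            chunks.append(para)
            continue
        # Pack whole sentences into chunks of at most max_chars
        # (a single sentence longer than max_chars is kept whole).
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}" if current else sentence
        if current:
            chunks.append(current)
    return chunks


def read_txt(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def read_pdf(path: str) -> str:
    # Assumption: the third-party `pypdf` package is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Each page break is treated as a paragraph break; scanned pages may yield no text.
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)


def read_aloud(path: str, speak: Callable[[str], None], max_chars: int = 500) -> int:
    """Send a .txt or .pdf document to `speak` one chunk at a time; return the chunk count."""
    text = read_pdf(path) if path.lower().endswith(".pdf") else read_txt(path)
    chunks = split_into_chunks(text, max_chars)
    for chunk in chunks:
        speak(chunk)
    return len(chunks)
```

For a dry run, `read_aloud("paper.pdf", speak=print)` prints the chunks before you wire in a real TTS engine. For the file-dialog idea, the standard library's `tkinter.filedialog.askopenfilename()` returns the chosen path (or an empty string if the dialog is cancelled).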
Does it work well with audio in other languages? I have tried Bark and Tacotron 2 but did not get good results for Hindi. Thanks for the video, keep the good content coming 😊
I think the supported languages are mainly English, Japanese, Chinese, French, Spanish and Korean, but it also has an Indian accent.
A bit confusing. What's the relation between MeloTTS and OpenVoice V2?
MeloTTS can act as a standalone text-to-speech engine or as the base speaker for OpenVoice V2. OpenVoice is both a TTS and a voice-cloning engine. OpenVoice V1 can work without MeloTTS as the base speaker.
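To make that relationship concrete, here is a sketch of both modes, modeled on the usage shown in the MeloTTS README and the OpenVoice V2 demo notebook as I understand them; treat the exact call signatures as assumptions to verify against your installed versions. `speak_with_melo`, `clone_voice` and `se_file_stem` are hypothetical helper names. `ckpt_root` is the `checkpoints_v2` folder from the OpenVoice V2 Hugging Face page (in this video it lives under `modules/openvoice/`). MeloTTS produces the base speech; OpenVoice's tone color converter then swaps the base speaker's voice for the reference speaker's.

```python
from pathlib import Path


def se_file_stem(speaker_key: str) -> str:
    """Map a MeloTTS speaker key (e.g. 'EN-US', 'EN_INDIA') to its base-speaker embedding file name."""
    return speaker_key.lower().replace("_", "-")


def speak_with_melo(text: str, output_path: str, language: str = "EN",
                    speaker_key: str = "EN-US", speed: float = 1.0,
                    device: str = "auto") -> str:
    """Mode 1: MeloTTS on its own, text in and WAV file out."""
    from melo.api import TTS  # assumes MeloTTS is installed

    model = TTS(language=language, device=device)
    speaker_id = model.hps.data.spk2id[speaker_key]
    model.tts_to_file(text, speaker_id, output_path, speed=speed)
    return output_path


def clone_voice(text: str, reference_wav: str, output_path: str,
                ckpt_root: str = "checkpoints_v2", language: str = "EN",
                speaker_key: str = "EN-US", device: str = "cpu") -> str:
    """Mode 2: MeloTTS as the base speaker, then OpenVoice V2 tone color conversion."""
    # Assumes torch and OpenVoice are installed; paths follow the checkpoints_v2 layout.
    import torch
    from openvoice import se_extractor
    from openvoice.api import ToneColorConverter

    converter_dir = Path(ckpt_root) / "converter"
    converter = ToneColorConverter(str(converter_dir / "config.json"), device=device)
    converter.load_ckpt(str(converter_dir / "checkpoint.pth"))

    # Tone color embedding of the voice you want to clone.
    target_se, _ = se_extractor.get_se(reference_wav, converter)

    # Step 1: base speech from MeloTTS, written next to the final output.
    out = Path(output_path)
    base_wav = str(out.with_name(out.stem + "_base.wav"))
    speak_with_melo(text, base_wav, language=language,
                    speaker_key=speaker_key, device=device)

    # Step 2: replace the base speaker's tone color with the target's.
    se_path = Path(ckpt_root) / "base_speakers" / "ses" / f"{se_file_stem(speaker_key)}.pth"
    source_se = torch.load(se_path, map_location=device)
    converter.convert(audio_src_path=base_wav, src_se=source_se,
                      tgt_se=target_se, output_path=output_path)
    return output_path
```

With `clone_voice`, MeloTTS never needs to know about the reference voice; only the converter step does, which is why MeloTTS can also be used on its own.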
@techgiantt Thanks for your reply. English speech plays without any issue, but when I try Chinese, I get the following error message: RuntimeError: Placeholder storage has not been allocated on MPS device! Any suggestions? Thanks.
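The thread leaves this question open. One workaround worth trying (an assumption, not something confirmed in the video) is to keep the model off Apple's MPS backend and run it on CPU. MeloTTS's `TTS` class takes a `device` argument, and its `"auto"` setting may pick MPS on a Mac, depending on the installed version. `pick_device` below is a hypothetical helper: it prefers CUDA, then MPS, then CPU, and lets you skip MPS explicitly.

```python
def pick_device(avoid_mps: bool = False) -> str:
    """Return a torch device string: CUDA first, then Apple MPS (unless avoided), else CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed: nothing but CPU makes sense
    if torch.cuda.is_available():
        return "cuda:0"
    # torch.backends.mps only exists on newer torch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if not avoid_mps and mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

For example, `TTS(language="ZH", device=pick_device(avoid_mps=True))` would run Chinese synthesis on CPU on a Mac while still using CUDA on a Windows or Linux machine with an NVIDIA GPU.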
Hello, first of all, thank you for the tutorial. There aren't many out there yet 🙂
But when I run it, I get this error:
Loaded checkpoint 'modules/openvoice/checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
Any idea what might be wrong? Thank you!
Did you set up the directories exactly as I did? Also, make sure you copied the downloaded checkpoints_v2 folder into the openvoice directory properly. If you did all that, you could go to their Hugging Face page and re-download the OpenVoice V2 converter/checkpoint.pth to replace the old one. I'll add their Hugging Face link to the description.
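Because the fix in this thread came down to folder layout and paths, a quick check before loading anything can save time. This is a hypothetical helper: the default root matches the path in the log line above (`modules/openvoice/checkpoints_v2`), while expecting `converter/config.json` and `base_speakers/ses` inside it is an assumption based on the OpenVoice V2 checkpoint download.

```python
from pathlib import Path
from typing import List

# Assumed checkpoints_v2 layout; adjust if your download differs.
REQUIRED = [
    "converter/checkpoint.pth",
    "converter/config.json",
    "base_speakers/ses",
]


def missing_checkpoint_files(root: str = "modules/openvoice/checkpoints_v2") -> List[str]:
    """Return the required entries that do not exist under `root` (empty list means OK)."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).exists()]
```

Calling `print(missing_checkpoint_files())` from the project root before running the script shows which files or folders still need to be copied or re-downloaded.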
Hold on, do you mean [ ]? Because what I'm seeing in your comment is this: [] []
@techgiantt Yes, exactly! Maybe a copy-and-paste issue.
@eucharisticadoration Then that's not an issue. I don't know why they didn't hide that output in this version. Just let it run: if there were a missing key, it would be listed inside the square brackets, but since they're empty, everything is okay.
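The `missing/unexpected keys: [] []` line most likely comes from OpenVoice loading the converter weights non-strictly: PyTorch's `Module.load_state_dict(state_dict, strict=False)` returns `missing_keys` and `unexpected_keys` instead of raising an error. The stdlib-only sketch below (hypothetical `compare_keys` and `report`, a simplified stand-in rather than PyTorch's implementation) shows why two empty lists mean every parameter name matched.

```python
from typing import Iterable, List, Tuple


def compare_keys(model_keys: Iterable[str],
                 checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names, as a non-strict load would report them."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)      # the model expects these, the checkpoint lacks them
    unexpected = sorted(ckpt_set - model_set)   # the checkpoint has these, the model does not use them
    return missing, unexpected


def report(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> str:
    missing, unexpected = compare_keys(model_keys, checkpoint_keys)
    return f"missing/unexpected keys: {missing} {unexpected}"
```

A checkpoint that matches the model exactly produces `missing/unexpected keys: [] []`, the same line seen in the log; a corrupted or mismatched checkpoint would list names inside the brackets.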
@techgiantt OK, thank you very much! I finally realized that I had to change a few more things (paths) in voice.py, and now it is running 🙂
What operating system are you running? Will this work for Windows?
I'm using OS X (Apple MacBook). I think it should work fine on Windows.
@techgiantt Appreciate the response! I'm going to give it a shot. Subbed!
@justindaniels5923 Thanks for the sub!
Is there a way to make these TTS engines more expressive?
Yes, but you need a beefy GPU to use it with an AI model, since you won't want extra latency. I'll create a video on that.
@techgiantt I think it would be amazing if they could act, expressing emotions such as anger, sadness, sorrow, compassion, confidence, hesitation, shyness, embarrassment, bravado, whispering, fear, shouting, laughing, etc., so that moods and personality come through in the voice.