I tried that on Arabic dataset, didn't work. Tried to increase the steps to 5000. Still didn't work, any advice?
Hello Abdelrahman. Can you share the code and the dataset with me? I guess the alphabet must be the problem here. We need to define a function to convert it into English alphabet
We solved the problem with Abdelrahman. Indeed, if you're working with a language that uses a different alphabet than English, you should convert it to English.
Example:
convert
السلام عليكم
to
alsalam alekum.
Yes, the problem is that the model's tokenizer can't understand anything other than English letters. Thus, the training data should be "audio + text transliterated into English letters", not translated text, though. It should work fine right after that.
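A minimal sketch of that transliteration step, assuming a simple one-to-one character map (the table below is a tiny illustrative subset I made up for the example, not a complete romanization scheme; a real pipeline would use a full scheme such as Buckwalter or a transliteration library):

```python
# Hypothetical helper: convert Arabic script to Latin letters so an
# English-only tokenizer can process the text. The map covers only the
# letters needed for the example; unknown characters are dropped.
AR_TO_LATIN = {
    "ا": "a", "ب": "b", "ت": "t", "س": "s", "ل": "l", "م": "m",
    "ع": "a", "ي": "i", "ك": "k", "و": "u", "ه": "h", " ": " ",
}

def transliterate(text: str) -> str:
    return "".join(AR_TO_LATIN.get(ch, "") for ch in text)

print(transliterate("السلام عليكم"))  # → alslam alikm
```

The training pairs then become (audio, transliterated text) instead of (audio, Arabic text).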
@@abdelrahmanmohsen6393 did you find a solution bro ?
I am working on the same problem. I already did the conversion from the Arabic alphabet to English, but the output is not clear at all. Are there any suggestions to solve this issue?
Congratulations, your work will be a light for those who aim to make progress in this direction. I wish you continued success. May your path be clear, Emirhan.
Thank you so much!
Thanks to YouTube I've seen this video, hope Mr. Bilgiç will bless us with new videos.
Thank you Hüseyin 😄
This video is helpful for people who want to understand text-to-speech (TTS) and how to make it better. Emirhan, who made the video, knows a lot about it, and the part where he shows how to write computer code is useful, even though I don't know much about coding.
Thank you for the support Carlos 🙂
Great sir, really loved it. Hoping for some tutorials in the future too.
thank you so much!
Congratulations Emirhan, I wish you continued success 😊
Solid explanations, learned a lot! Thanks!
Glad you liked it:) 🥰
Looking solid! Congrats Emirhan.
Thank you so much Zaur!
Congratulations my friend, it's a very clean and explanatory video 💯
Thanks for the support :) I could make something more detailed if there's interest.
Congratulations son, it turned out great ❤
Thank you, dad ☺❤
Congrats brother 👏
Thank you so much 😊
The great Turkish robot from Mardin teaches us how to fine-tune itself. AI is really something else.
trrrrum,
trrrrum,
trrrrum!
trak tiki tak!
Great work, it turned out super. I wish you success 🤝
Thanks a lot :)
Thank you for the great explanation!❤️💯
Thank you Yunus :)
Congratulations, I wish you continued success
Great, make more videos on TTS, voice cloning, multilingual TTS
Thank you! will try :)
I wish you continued success, endless success
Thanks a lot :)
best indian youtuber so far ✋🏻 no cap 🧢
Thank you but I am not Indian 😄
This isn't my field, I just saw it on Twitter and took a look, this comment cracked me up 😂 @@emirhanbilgicai
Best wishes. Congratulations
Thanks :)
For contact and everything: emirhanbilgic.github.io
Very successful. I subscribed to your channel. You deserve more followers, but for that, I think you need to produce a bit more content. 💪 Congratulations…
Thank you so much :)
Hey, I have downloaded the "microsoft/speecht5_tts" model and now I want to fine-tune it. Is this process still applicable?
yes :)
Hiii Emirhan, I am one of your new viewers. I am recently learning machine learning and now i have to fine tune a tts model for interviews based technical words like OAuth, API etc. Can you help me with it or can we connect personally because that project is really important for me
Hey! I can give you some tips if you share the details
The information you shared is very nice and very valuable, but I have a question for you: when I trained the model myself, I got a robotic voice. Since I had the same problem with other models, I wanted to try this one too, and I ran into the same issue. Do you have any advice for this problem? I really need it, thank you.
Hello, thank you very much. Since the dataset we use sounds robotic, we are unfortunately somewhat dependent on the dataset. There are actually two approaches, and combining them is even better:
1- build and use a natural-sounding dataset
2- try a newer model instead of SpeechT5, StyleTTS for example
@@emirhanbilgicai I have the same problem even when I build my own dataset. I thought it might be related to the audio parameters, because my dataset consists of high-quality recordings. I'd like to show you the new model I tried and get your opinion. Is there any way I can reach you? I've been struggling with this question for months.
@@mertavci3093 Is there only one type of voice in your dataset? Having a single speaker is much better. Also, is the dataset big enough? If you share your dataset, I'll take a look.
@@emirhanbilgicai Yes, I built it from a single voice; my dataset is 10 hours long. If you can give me an email address, I can share my dataset with you.
Hi, is it possible to train the model in English with only certain words that it's currently pronouncing incorrectly?
Hello, if you mean the abbreviations, or something else, you can define a custom function to handle that case like this:
import re
from string import punctuation

def preprocess(text):
    # number_normalizer is defined elsewhere in the linked Space's code
    text = number_normalizer(text).strip()
    text = text.replace("-", " ")
    if text[-1] not in punctuation:
        text = f"{text}."
    abbreviations_pattern = r'\b[A-Z][A-Z\.]+\b'
    def separate_abb(chunk):
        chunk = chunk.replace(".", "")
        return " ".join(chunk)
    abbreviations = re.findall(abbreviations_pattern, text)
    for abv in abbreviations:
        if abv in text:
            text = text.replace(abv, separate_abb(abv))
    return text
I took it from: huggingface.co/spaces/parler-tts/parler_tts/blob/main/app.py
Even if you don't do it with an additional function, you can do it by providing enough samples (more than a thousand) to the model.
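To see what the abbreviation handling actually does, here is a self-contained version of just that step (the function name `space_out_acronyms` is mine; the regex is the one from the snippet above):

```python
import re

# Split all-caps acronyms into single letters so a TTS model spells them
# out ("API" -> "A P I") instead of trying to pronounce them as a word.
def space_out_acronyms(text: str) -> str:
    pattern = r'\b[A-Z][A-Z\.]+\b'
    for abv in re.findall(pattern, text):
        text = text.replace(abv, " ".join(abv.replace(".", "")))
    return text

print(space_out_acronyms("Fine-tune the TTS model on API docs."))
# → Fine-tune the T T S model on A P I docs.
```

Note that mixed-case terms like "OAuth" don't match this all-caps pattern, so they would need their own lookup entry or enough training samples.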
Hello, my model is generating speech, but it's only producing about two words and cutting off after approximately 0.1 seconds. Do you have any advice or help? Is there a Discord where I can reach you?
Hello, this could be due to three reasons:
1- Your individual data samples are short, such as having only two words per sample, making it difficult for the model to learn longer sequences.
2- Your dataset is small, for example, only containing 300 sentences. I recommend increasing the size of your dataset.
3- The model hasn't been trained enough, or you may need to experiment with different hyperparameters.
@@emirhanbilgicai my audio samples are like 2-10s long
@@emirhanbilgicai that's true. If I fine-tune with 20-minute audio samples, will it produce 10-20 minute long audio?
@@og_23yg54 yes, but it would take ages to train a model on 20-minute-long samples (with a sufficient number of them)
🧑💻💯
Thank you!
At least add Turkish subtitles haha
That takes a really long time :((
That AI version of Harry Potter is pretty convincing.
Thank you 😅
🤣