I tried that on Arabic dataset, didn't work. Tried to increase the steps to 5000. Still didn't work, any advice?
Hello Abdelrahman. Can you share the code and the dataset with me? I guess the alphabet must be the problem here. We need to define a function to convert it into English alphabet
We solved the problem with Abdelrahman. Indeed, if you're working with a language that uses a different alphabet than English, you should convert it to English.
Example:
convert
السلام عليكم
to
alsalam alekum.
Yes, the problem is that the model's tokenizer can't understand anything other than English letters. Thus, the training data should be "audio + text transliterated into English letters", not translated text, though. It should work fine right after that.
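A minimal sketch of that transliteration step, assuming a simple one-to-one character map (the table below is a tiny illustrative subset I made up for the example, not a complete romanization scheme; a real pipeline would use a full scheme such as Buckwalter or a transliteration library):

```python
# Hypothetical helper: convert Arabic script to Latin letters so an
# English-only tokenizer can process the text. The map covers only the
# letters needed for the example; unknown characters are dropped.
AR_TO_LATIN = {
    "ا": "a", "ب": "b", "ت": "t", "س": "s", "ل": "l", "م": "m",
    "ع": "a", "ي": "i", "ك": "k", "و": "u", "ه": "h", " ": " ",
}

def transliterate(text: str) -> str:
    return "".join(AR_TO_LATIN.get(ch, "") for ch in text)

print(transliterate("السلام عليكم"))  # → alslam alikm
```

The training pairs then become (audio, transliterated text) instead of (audio, Arabic text).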
@@abdelrahmanmohsen6393 did you find a solution bro ?
I am working on the same problem. I already did the conversion from the Arabic alphabet to English, but the output is not clear at all. Are there any suggestions to solve this issue?
Congratulations, your work will be a light for those who aim to make progress in this direction. I wish you continued success. May your path be clear, Emirhan.
Thank you so much!
Thanks to YouTube I've seen this video, hope Mr. Bilgiç will bless us with new videos.
Thank you Hüseyin 😄
This video is helpful for people who want to understand text-to-speech (TTS) and how to make it better. Emirhan, who made the video, knows a lot about it, and the part where he shows how to write computer code is useful, even though I don't know much about coding.
Thank you for the support Carlos 🙂
Great sir, really loved it. Hoping for some tutorials in the future too.
thank you so much!
Congratulations Emirhan, I wish you continued success 😊
Solid explanations, learned a lot! Thanks!
Glad you liked it:) 🥰
Looking solid! Congrats Emirhan.
Thank you so much Zaur!
Congratulations my friend, it's a very clean and explanatory video 💯
Thanks for the support :) I could make something more detailed if there's interest.
Congratulations son, it turned out great ❤
Thank you, dad ☺❤
Congrats brother 👏
Thank you so much 😊
The great Turkish robot from Mardin teaches us how to fine-tune itself. AI is really something else.
trrrrum,
trrrrum,
trrrrum!
trak tiki tak!
Great work, it turned out super. I wish you success 🤝
Thanks a lot :)
Thank you for the great explanation!❤️💯
Thank you Yunus :)
Congratulations, I wish you continued success
Great, make more videos on TTS, voice cloning, multilingual TTS
Thank you! will try :)
I wish you continued success, endless success
Thanks a lot :)
best indian youtuber so far ✋🏻 no cap 🧢
Thank you but I am not Indian 😄
This isn't my field, I just saw it on Twitter and took a look, this comment cracked me up 😂 @@emirhanbilgicai
Best wishes. Congratulations
Thanks :)
For contact and everything: emirhanbilgic.github.io
Very successful. I subscribed to your channel. You deserve more followers, but for that, I think you need to produce a bit more content. 💪 Congratulations…
Thank you so much :)
Hey, I have downloaded the "microsoft/speecht5_tts" model and now I want to fine-tune it. Is this process still applicable?
yes :)
Hiii Emirhan, I am one of your new viewers. I am recently learning machine learning and now i have to fine tune a tts model for interviews based technical words like OAuth, API etc. Can you help me with it or can we connect personally because that project is really important for me
Hey! I can give you some tips if you share the details
The information you shared is very nice and very valuable, but I have a question for you: when I trained the model myself, I got a robotic voice. Since I had the same problem with other models, I wanted to try this one too, and I ran into the same issue. Do you have any advice for this problem? I really need it, thank you.
Hello, thank you very much. Since the dataset we use sounds robotic, we are unfortunately somewhat dependent on the dataset. There are actually two approaches, and combining them is even better:
1- build and use a natural-sounding dataset
2- try a newer model instead of SpeechT5, StyleTTS for example
@@emirhanbilgicai I have the same problem even when I build my own dataset. I thought it might be related to the audio parameters, because my dataset consists of high-quality recordings. I'd like to show you the new model I tried and get your opinion. Is there any way I can reach you? I've been struggling with this question for months.
@@mertavci3093 Is there only one type of voice in your dataset? Having a single speaker is much better. Also, is the dataset big enough? If you share your dataset, I'll take a look.
@@emirhanbilgicai Yes, I built it from a single voice; my dataset is 10 hours long. If you can give me an email address, I can share my dataset with you.
Hi, is it possible to train the model in English with only certain words that it's currently pronouncing incorrectly?
Hello, if you mean the abbreviations, or something else, you can define a custom function to handle that case like this:
import re
from string import punctuation

def preprocess(text):
    # number_normalizer is defined elsewhere in the linked Space's code
    text = number_normalizer(text).strip()
    text = text.replace("-", " ")
    if text[-1] not in punctuation:
        text = f"{text}."
    abbreviations_pattern = r'\b[A-Z][A-Z\.]+\b'
    def separate_abb(chunk):
        chunk = chunk.replace(".", "")
        return " ".join(chunk)
    abbreviations = re.findall(abbreviations_pattern, text)
    for abv in abbreviations:
        if abv in text:
            text = text.replace(abv, separate_abb(abv))
    return text
I took it from: huggingface.co/spaces/parler-tts/parler_tts/blob/main/app.py
Even if you don't do it with an additional function, you can do it by providing enough samples (more than a thousand) to the model.
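To see what the abbreviation handling actually does, here is a self-contained version of just that step (the function name `space_out_acronyms` is mine; the regex is the one from the snippet above):

```python
import re

# Split all-caps acronyms into single letters so a TTS model spells them
# out ("API" -> "A P I") instead of trying to pronounce them as a word.
def space_out_acronyms(text: str) -> str:
    pattern = r'\b[A-Z][A-Z\.]+\b'
    for abv in re.findall(pattern, text):
        text = text.replace(abv, " ".join(abv.replace(".", "")))
    return text

print(space_out_acronyms("Fine-tune the TTS model on API docs."))
# → Fine-tune the T T S model on A P I docs.
```

Note that mixed-case terms like "OAuth" don't match this all-caps pattern, so they would need their own lookup entry or enough training samples.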
Hello, my model is generating speech, but it's only producing about two words and cutting off after approximately 0.1 seconds. Do you have any advice or help? Is there a Discord where I can reach you?
Hello, this could be due to three reasons:
1- Your individual data samples are short, such as having only two words per sample, making it difficult for the model to learn longer sequences.
2- Your dataset is small, for example, only containing 300 sentences. I recommend increasing the size of your dataset.
3- The model hasn't been trained enough, or you may need to experiment with different hyperparameters.
@@emirhanbilgicai my audio samples are like 2-10s long
@@emirhanbilgicai that's true. If I fine-tune with 20-minute audio samples, will it produce 10-20 minute long audio?
@@og_23yg54 yes, but it would take ages to train a model on 20-minute-long samples (with a sufficient number of them)
🧑💻💯
Thank you!
At least add Turkish subtitles haha
That takes a really long time :((
That AI version of Harry Potter is pretty convincing.
Thank you 😅
🤣