Yes, please make a local voice-cloning XTTS tutorial!
It's already on my TODO list 😊.
Thanks, great interview, and very good information, on point as always.
Thank you very much, @MighyReiti, for your nice feedback 😊
Great interview. Good to hear Josh's voice.
Thanks for your nice feedback 😊. Would you like to see interviews like this on my channel?
Yes. Interviews like these are very helpful, and we'd sure like to see more.
Oh! Cloning my voice using several clips, each with different emotions, seems like a really interesting thing! I hope you make a video about that! :D (preferably on Windows)
This is already work in progress 😉.
@@ThorstenMueller That's great to hear! :D
I think you could (and probably even should) split the content into two videos: one for the local installation (XTTS working offline), and another for the experiment with different voices (same voice but with different emotions). Short and concise videos can be easier to follow, and having more videos will help bring more people to this great channel :)
Hi Thorsten! So if you want to make a video on your (monetized, i.e., commercial) channel involving XTTS (e.g. a tutorial or comparison), you can't legally play anything generated from it. Right?
So the question is whether it makes a difference if a video with XTTS audio is monetized or not. That's a good point 👍. As Josh mentioned, I'd recommend contacting Coqui AI with that question.
@@ThorstenMueller Exactly what I thought too. The licence might have been a better fit if "small commercial use", like videos here on YT, had also been included in the free usage.
This AI is awesome. However, in cloning various voices I have noticed that for male baritone voices I cannot recreate the same vocal impression in the low frequencies (I use the Italian language prompt). I have tried several samples, and they seem to confirm that the software generally works better for female voices than for male voices. I wonder whether some further development, or a different system, might improve this. Is it a frequency bandwidth issue? Thanks anyway. I am creating my own private audiobooks with this system and have found a way to read while listening. (I am an audiobook subscriber anyway), but I find that creating something from scratch with a favorite book is a must, obviously not for commercial purposes. In any case, I will try different 6-second samples of the same voice, as suggested in the interview, to see whether this makes the voice more realistic, especially in the lower frequencies for male voices.
Normally I'd recommend asking this specific question (on frequency stuff) in the Coqui TTS community. But as the Coqui AI company has sadly shut down, I'm not sure how fast you will get a response. As I have no experience with this type of adjustment, I guess I can't really be helpful on that issue.
Hello Thorsten, great content, exactly the questions I would have asked him too. Brilliant, thank you very much!
I have a beginner's question:
I tried to install Coqui locally using Pinokio. It also seems to be running, or so Pinokio says, but there is no localhost link I could click on, so somehow I can't get to the program. It's probably an embarrassing beginner's question. I'm looking for a way to generate my voice from text (for explainer videos); at least that was the plan. Best regards, Bobby
Thank you very much for your very kind comment 😊. I haven't worked with "Pinokio" yet. Just so I understand your question correctly: do you want to clone your own voice with XTTS to use it for explainer videos, or do you want to use an existing (someone else's) Coqui TTS voice for that?
@@ThorstenMueller Hello, thanks for your reply. Primarily, I want to be able to use my own voice for text-to-speech recordings of the highest possible quality. I have developed a system for myself in which I design visual and spoken content more or less simultaneously, in 8-second blocks in which the speech then has to match the picture. The audio recordings are quite laborious because they have to fit exactly. It would be great if that worked in acceptable quality via text. But I shy away from tackling this with ElevenLabs because I am wary of subscription dependencies.
"I am a student (thus relatively poor) and have ADHD; it is extremely difficult for me to read longer texts. Therefore, I have been using text-to-speech (TTS) for a long time to have texts read aloud. However, there is a huge gap in quality between free TTS and cloud-based, natural-sounding TTS, where one often pays per punctuation mark. XTTS sounds so incredibly promising. I would even consider getting a new graphics card for it. I would be very interested in an installation guide."
Thanks for your feedback.
My step-by-step tutorial on XTTS is online now 😊:
ua-cam.com/video/HJB17HW4M9o/v-deo.html
Do you know Piper TTS? It could be an interesting alternative for you too:
ua-cam.com/video/GGvdq3giiTQ/v-deo.html
Hi Thorsten! I want to know whether it is possible to train an XTTS model for a language that is not supported by the Coqui dev team. How much data do I need to train a new language with XTTS?
If I understood Josh Meyer (from Coqui) correctly, XTTS should be able to clone voices in languages it has not been trained on. Have you watched my interview with Josh on XTTS?
Argh sorry, obviously you've seen the interview video 🤦.
Thanks Thorsten, good interview.
PS: There are way too many ads, pretty much one every few minutes. I would appreciate slightly fewer ads.
Thanks for your nice and helpful feedback 😊. I have limited configuration options for how many ads are shown inside the video, but I'll keep an eye on that.
Please make a tutorial on XTTS on Windows or an Ubuntu server.
This is already work in progress 😉.
Thank you very much. I use your voice to learn German.
Thank you very much for the comment, that makes me really happy 😊. I wish you lots of success with your learning.
Second attempt at commenting.
I like the video very much. What I don't like is the fact that my first comment disappeared.
Before typing a lot of text for nothing, I'll try with this short one.
Hi, thanks for your comment. I see a second, longer XTTS-related comment by you on this video: ua-cam.com/video/9e1Wt82GKmU/v-deo.html&lc=UgzvCm2Ch50-OPeExI94AaABAg
Maybe you posted the comment on the wrong video 😉?
The idea of getting to know the people behind these awesome technologies is exciting, so this format is definitely a keeper, even if only done on an irregular basis.
It would also be interesting to hear more about the history and development of text-to-speech technology. Maybe you could try to get hold of someone who was involved in the development of the SAM voice on the Commodore 64.
Thanks for your nice feedback 😊. I already have a special guest in mind for next time.
Not sure if I can get in contact with these pioneers of voice tech (like SAM on the C64).
❤
❤️