Local voice cloning with 6 seconds audio | Coqui XTTS on Windows

  • Published 29 Sep 2024

COMMENTS • 201

  • @toykotokyoto
    @toykotokyoto 9 months ago +13

    another great video, Thorsten 👏 We have a happy update... you can now use unlimited audio for the 0-shot clone :D no longer are you limited to just 6 seconds. The HuggingFace space is still hard coded to max out at 30 seconds though... so we don't overload their servers 😆

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +5

      You're very welcome and thanks for the update 😊.

    • @juanjesusligero391
      @juanjesusligero391 9 months ago

      This is great news! :D You probably should make another video comparing the quality differences between the 6 seconds and 30 seconds input audio! (or maybe more, if you can change that max value in the local installation) ^^ @@ThorstenMueller

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +4

      @@juanjesusligero391 A video comparing audio samples for different input lengths is already in the making 😉.

    • @tsunderes_were_a_mistake
      @tsunderes_were_a_mistake 8 months ago

      Does the output sound better with longer audio? I tried the Japanese version on Hugging Face and the output sounded robotic.

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      @@tsunderes_were_a_mistake With my German model I didn't notice a change depending on the text length, but I did not check this specific aspect closely. If you think it would be helpful, I can give it a more specific try (with a German model). I can't say anything about the Japanese model, though.

  • @davidtindell950
    @davidtindell950 7 days ago +1

    Thank You Yet Again! P.S. In addition to "Scheiß Encoding" ... I am a fan of: "CAUTION I TEST IN PRODUCTION".

  • @MYODM.
    @MYODM. 1 month ago

    Can I hire you for a few hours? I need help with a project that’s deeply personal and I would like to go the local hosting route.

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Feel free to contact me here (with some additional info). www.thorsten-voice.de/en/contact/

  • @stefanporath8392
    @stefanporath8392 7 months ago

    Hello Thorsten,
    great video tutorials, but XTTS is not for me. No support for Windows and never will be. No chance on older Macs with NVIDIA cards because of missing drivers. No support on Linux without CUDA. I was really looking forward to this, but I simply don't have the time to fiddle around for days or weeks. Thank you.

  • @Zimba-box
    @Zimba-box 7 months ago

    I got this error when I ran the install step with "wheel -U": ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects. How can I fix that?

    • @ThorstenMueller
      @ThorstenMueller  7 months ago

      Did you update pip to the latest version first - "pip install pip setuptools wheel -U"?
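
The upgrade suggested above can also be started from Python, which ties it to the interpreter (and venv) that is actually in use. This is only a minimal sketch: the helper names `pip_upgrade_command` and `upgrade_packaging_tools` are made up for illustration and are not part of Coqui TTS.

```python
import subprocess
import sys


def pip_upgrade_command(python=sys.executable):
    """Build the command that upgrades pip, setuptools and wheel.

    Calling pip as "<python> -m pip" ties the upgrade to one specific
    interpreter, for example the one inside the active venv.
    """
    return [python, "-m", "pip", "install", "-U", "pip", "setuptools", "wheel"]


def upgrade_packaging_tools():
    """Run the upgrade for the current interpreter (hypothetical helper)."""
    subprocess.run(pip_upgrade_command(), check=True)


# Show the command without executing it.
print(" ".join(pip_upgrade_command()))
```

Calling `upgrade_packaging_tools()` inside the activated venv performs the actual upgrade; printing the command first makes it easy to see which interpreter would be affected.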

  • @alexlavertyau
    @alexlavertyau 8 months ago

    I have tried some voice cloning tools and provided my voice as reference audio, but none of the results sound anything like me... : ( I have an Australian accent but the generated voices come out with American accents, not sure what I'm doing wrong.

    • @ThorstenMueller
      @ThorstenMueller  8 months ago +1

      I guess you're doing nothing wrong. Maybe the English model has been trained on a voice dataset with hours of recordings from native English speakers, and one phrase does not have enough "power" to change the accent. Normally I'd recommend asking in the Coqui TTS community, but as Coqui is shutting down, it might take some time to get an answer, maybe because of other priorities.

  • @juanjesusligero391
    @juanjesusligero391 9 months ago +6

    I was exactly like you, I also had too high expectations for Coqui XTTS, haha ^_^
    While the outcome wasn't quite what I was expecting, the results are still quite impressive, especially considering they are based on just a 6-second sample. I was also really happy to read in the comments that the devs are working on improvements, like allowing for voice samples longer than 6 seconds.
    I loved the video! Thanks a lot for your work, Thorsten! ^^

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +1

      Thanks a lot for your nice feedback 🥰.

  • @davidtindell950
    @davidtindell950 7 days ago +1

    Using my local PC GPU: Cloned Voice WORKED WELL ... and ... sounded 'somewhat ' like me BUT actually BETTER than me ( bolder and stronger ) !!!!

    • @elplayeravefenix2280
      @elplayeravefenix2280 19 hours ago +1

      Did this actually work for you??

    • @davidtindell950
      @davidtindell950 16 hours ago +1

      @@elplayeravefenix2280 Yes. Not very well, but it 'worked'. On other projects I have found that more voice samples work better but take time. Ok.

  • @callmefred
    @callmefred 4 months ago +1

    It's sad that they've discontinued the project.

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Yes, and they did not just discontinue the project: Coqui AI, the company behind XTTS, shut down.

  • @terryjones2213
    @terryjones2213 6 months ago +1

    What is your Python version?

  • @amp3253
    @amp3253 9 months ago +1

    Could you help, please?
    tts : The term 'tts' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
    At line:1 char:1
    + tts --list_models
    + ~~~
    + CategoryInfo : ObjectNotFound: (tts:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +1

      Did you use a Python venv? Is it activated when you try to run the "tts" command? Does "pip list" show an installed TTS package?
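
The three questions above can be roughly approximated from within Python itself. This is a sketch, not an official diagnostic: `in_virtualenv` and `environment_report` are illustrative names, and "importable" is only a stand-in for what "pip list" would show.

```python
import importlib.util
import shutil
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def environment_report(package="TTS", command="tts"):
    """Collect the three facts asked about above for the current interpreter."""
    return {
        "venv_active": in_virtualenv(),
        "package_importable": importlib.util.find_spec(package) is not None,
        # Full path of the console script, or None if it is not on PATH.
        "command_on_path": shutil.which(command),
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `venv_active` is False or `command_on_path` is None, the shell is most likely not using the environment in which the package was installed.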

  • @Reincarnated_Recap
    @Reincarnated_Recap 5 months ago +1

    omg, the quality is so good compared to all the other voice-cloning TTS

  • @CatonSilver
    @CatonSilver 8 months ago +1

    amazing video! I am wondering if it's possible to train a given voice and then just use that voice for future use. In the "clone your voice locally" section, the code requires the reference audio as an input. I'm thinking in terms of efficiency and that if you plan to use the same voice over and over, you shouldn't need to train the model each time.

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      Good question. I didn't think about that - up to now.

  • @GESTOR-SITES
    @GESTOR-SITES 4 months ago

    How do I fix
    "ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects"?
    ChatGPT cannot help me.
    Is it necessary to downgrade Python?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Did you update the Python packaging tools in your environment, that is, did you run "pip install setuptools wheel pip -U"?

  • @insanitytoons
    @insanitytoons 5 months ago

    For me, an AI that clones a voice from a sample of just 6 seconds, even if the result is not 100% identical, is one that really deserves to be improved. AIs that need dozens of hours to clone a voice never interested me much. I ran several tests using samples longer than 30, 60 and 80 seconds in various languages, and some were perfect. I also copied dozens of voices available on websites, and the results were also very good. I suggest saving each generated audio in a different file, because each generated audio will never be the same as the previous one.

    • @ThorstenMueller
      @ThorstenMueller  5 months ago +1

      Josh Meyer (co-founder of Coqui AI) mentioned in my XTTS interview that 6 seconds audio input duration should be perfect for XTTS model. ua-cam.com/video/XsOM1WZ0k84/v-deo.html

  • @saadjutt1660
    @saadjutt1660 1 month ago

    Is there any way we can push this trained model to Hugging Face? That is, we give the audio sample once, and after it has been pushed to the Hugging Face Hub we only need to pass the text to generate audio in that voice?

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Do you mean the actual model or a space to use the model out of the box?

  • @alexeyshmelev9115
    @alexeyshmelev9115 2 months ago

    "all you need is 6 second audio" is just nonsense. It is not enough and the result is miles away from anything close to the original.

    • @ThorstenMueller
      @ThorstenMueller  2 months ago

      I agree, at least based on my personal tests with my foreign (German) pronunciation. The result has been far from a high-class voice clone. Have you seen my interview with Josh (Coqui AI co-founder)? ua-cam.com/video/XsOM1WZ0k84/v-deo.html

  • @saadjutt1660
    @saadjutt1660 1 month ago

    Can I still use this tutorial, since Coqui has shut down? Also, can I use it for cloning a voice in Urdu?

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Honestly, I'm not sure about the future of XTTS (model, code and Hugging Face space) because of their shutdown. But right now the code and the space are still available, so it should still work as described. Please let me know if you run into bigger problems.

  • @jab4li
    @jab4li 27 days ago

    If I install XTTS on my computer, can I use unlimited characters? The demo version on Hugging Face has a 200-character limit.
    Thanks.

    • @ThorstenMueller
      @ThorstenMueller  26 days ago +1

      This should be the case. The limitation is part of their Huggingface space and should not apply locally.
      huggingface.co/spaces/coqui/xtts/blob/d3b67acd01a3f63524371ad7d35a044ac0e75f60/app.py#L200

    • @jab4li
      @jab4li 25 days ago

      @@ThorstenMueller Nice, i'm gonna try it. Thanks!

  • @Name-is2bp
    @Name-is2bp 4 months ago +1

    Did you make a tutorial on how to install and use CUDA?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      No, not yet. But interesting idea. I've added it on my TODO list 😊.

  • @gonzaloorellanatech
    @gonzaloorellanatech 20 days ago

    How can we get a faster response? Better hardware? RAM? Processing power? Thanks for the video!

    • @ThorstenMueller
      @ThorstenMueller  15 days ago +1

      First, you're welcome :). Do you use CPU or GPU? GPU (CUDA) provides a faster response.

    • @gonzaloorellanatech
      @gonzaloorellanatech 14 days ago

      @@ThorstenMueller Thanks for your response. Yes, GPU, but my notebook is only for development... I need better processing for the audio files from voice-cloning TTS.

  • @tobiasd2755
    @tobiasd2755 1 month ago

    Very well explained.
    However, I had hoped the video would show not just how to create a single piece of speech, but how to save my own model so that it shows up, for example, under tts --list_models, or so that I can at least specify it with --model_name.
    Is that possible too?

    • @ThorstenMueller
      @ThorstenMueller  25 days ago

      Thank you very much 😊. The "--list_models" option displays information from the .models.json file in the repo. You could try adding your model to that file locally. So you have already trained your own model?

  • @RossDCurrie
    @RossDCurrie 4 months ago

    "ERROR: Failed building wheel for tts" - What version of Python are you running?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      This error often occurs when you use an older version of pip. Did you run "pip install pip setuptools wheel -U" before installing Coqui?

    • @RossDCurrie
      @RossDCurrie 3 months ago

      @@ThorstenMueller This may have been the issue. Played around with it a bit and got it working again, but can't recall exactly which thing I did differently. Thanks for the reply though!
      If you're looking for content ideas, one thing I am struggling with is how this all fits together now, in June 2024. Specifically - when I start the server and hit the local webserver, I get a very different UI than what I see in other videos on XTTS. And I know there are all different UIs for XTTS - there's a fine tuning one, a web UI, RVC, etc. and some of them have bits that don't work, and it sounds like Coqui has abandoned the project now and... it's hard to catch up on it all when coming into it for the first time, and it changes so rapidly.
      So I guess what I'm trying to figure out is - if I want to build an AI voice clone of me, today, what's the strategy/stack you recommend?

  • @developerzava
    @developerzava 4 months ago

    Is TTS available on Python 3.12?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      According to their README, Python 3.11 is the highest supported version. As Coqui AI has shut down, I'm not sure if or when this will be extended to newer Python versions.
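
A quick way to compare the running interpreter against that limit is sketched below. The upper bound of 3.11 is taken from the reply above and may be outdated; `MAX_SUPPORTED` and `is_supported` are illustrative names, and no lower bound is checked here.

```python
import sys

# Assumption taken from the reply above: 3.11 is the newest supported version.
MAX_SUPPORTED = (3, 11)


def is_supported(version=None, newest=MAX_SUPPORTED):
    """Return True if major.minor is not newer than the newest supported one."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= newest


current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {current} within the supported range: {is_supported()}")
```

If this prints False, creating the venv with an older interpreter is probably easier than trying to build the package on a newer one.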

  • @characters1210
    @characters1210 5 months ago

    Can I write code that clones an Arabic voice and reads Arabic text?

    • @ThorstenMueller
      @ThorstenMueller  5 months ago

      I've no experience using Arabic with XTTS. Did you already try it using their Huggingface space?

  • @congtaihu1287
    @congtaihu1287 6 months ago

    Thank you for this video! I am running into problems. When I execute the script, it shows "AssertionError: CUDA is not availabe on this machine.". But I have CUDA 12.3 and a compatible torch, and my other AI software runs well. I have no idea what is happening. Please help!

    • @ThorstenMueller
      @ThorstenMueller  5 months ago

      Does it work if you use it with "use_cuda false" in general?

  • @EfficioIgnisVitae
    @EfficioIgnisVitae 8 months ago

    I'm getting this issue where when I try to check for models this happens:
    LLVM ERROR: Symbol not found: __svml_cosf8_ha
    Anyone know what's going on here?

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      That's strange. Maybe recreate your python venv and reinstall. Maybe there's an error in your installation.

  • @michaelroberts1120
    @michaelroberts1120 7 months ago

    This is only interesting to developers and programmers. Regular hobbyists will find this video useless, because Coqui has no GUI or server.

    • @ThorstenMueller
      @ThorstenMueller  6 months ago

      Coqui TTS has a simple web UI if you run it locally where you can synthesize audio.

  • @PlayGameToday
    @PlayGameToday 4 months ago

    Hello, sir Thorsten! The title of the video doesn't really capture its content. Unfortunately, I didn't find in your video how to start the GUI for Coqui TTS. The title says XTTS, and I was hoping I could run the Gradio GUI shown at the beginning of your video. Too bad you don't have a video tutorial on how to deploy, on a local machine, the handy voice-generation GUI that was in the demo.

  • @ignacioalonsol
    @ignacioalonsol 4 months ago

    Has anyone made a comparison between XTTS and Piper training? I'm curious which gives better quality, @thorsten?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Personally I prefer Piper. But I trained my Piper models with way more input data than the 6 seconds of input given to XTTS.

  • @marcinziajkowski3870
    @marcinziajkowski3870 4 months ago

    Can we create a ready-to-use object instead of passing the "speaker_wav" list every time we generate "output.wav", to speed up the process?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      As I'm not sure, I'd recommend asking the Coqui community on GitHub. But as Coqui AI (the company) has shut down, I'm not sure how fast you will get a reaction.

  • @IvarDaigon
    @IvarDaigon 6 months ago

    I've been using Coqui for months and it's amazing that it simulates breathing at all, but breathing is typically the most distorted part of the generated audio, which can make it sound unnatural. I'm wondering whether removing the breathing from the source audio will improve the quality of the cloned voice, or whether the distorted breathing is just a symptom of the underlying model.

    • @ThorstenMueller
      @ThorstenMueller  6 months ago

      I've no idea how this could work. Maybe it helps if you use audio tools to cut your breathing out of the recording you provide to XTTS. Or maybe tools like sox or ffmpeg offer audio filters that can remove breathing sounds from the generated audio.

  • @Schawum
    @Schawum 1 month ago

    --- Hello, please do the tutorial again in German, because it would really interest me very much. But I don't understand a word of English.

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Hello, would the automatically translated German subtitles perhaps help you for a start?

    • @Schawum
      @Schawum 1 month ago

      @@ThorstenMueller They are always off for me, because I can't follow the video while reading. So that doesn't really help me.

  • @MuhammadChanif-cp2ut
    @MuhammadChanif-cp2ut 5 months ago

    Anjai

  • @64jcl
    @64jcl 9 months ago

    Btw, how do I get the gpu parameter to work? I have a 3000 series GPU, but even if I select gpu=True it says CUDA is not available. I have also noticed that the voice cloned from my own speech sometimes shifts to a British accent and sometimes to an American one (likely because my accent is neither). That also means it is impossible to get consistent results with this. Is there some way to save a snapshot of whatever it decided "the voice" was and reuse that as input on subsequent generations? If not, it is quite useless and really just a fun demo.

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +1

      Did you install CUDA and is it working? There are Python code snippets available to check whether CUDA is working.
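
One such check could look like the sketch below. It relies only on PyTorch's `torch.cuda.is_available()` and `torch.version.cuda`; the function name `cuda_status` is made up for illustration. One common reason for a "CUDA is not available" message on a machine with an NVIDIA GPU is a CPU-only PyTorch build, which shows up here as `built_for_cuda` being None.

```python
def cuda_status():
    """Describe whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "built_for_cuda": None}
    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        # CUDA version PyTorch was built against; None means a CPU-only build.
        "built_for_cuda": torch.version.cuda,
    }


status = cuda_status()
print(status)
```

If `torch_installed` is True but `cuda_available` is False, reinstalling PyTorch with a build that matches the installed CUDA driver is a reasonable next thing to try.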

  • @ricardorey259
    @ricardorey259 8 months ago

    Hello, good video, do you know how to remove the character limit restriction when writing?
    Warning: The text length exceeds the character limit of 239 for language 'es', this might cause truncated audio.

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      Thanks for your nice feedback 😊. Hmm, not really. Earlier we sometimes ran into a "max_decoder_steps" limit which caused truncated audio, but I'm not sure if this applies here too.

  • @spiritual_audiobooks
    @spiritual_audiobooks 5 months ago

    What do you think of Applio TTS? Maybe the best open-source TTS?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      I hadn't heard of Applio TTS. Would you say it's worth giving it a try?

  • @rogerperez9856
    @rogerperez9856 8 months ago

    Hello, do you know why when converting a text of about 500 words it takes about 25 minutes?

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      I didn't try it with such long texts. Is it faster when you split it into smaller pieces and put the chunks together in post generation?
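
The splitting idea from the reply above can be sketched in plain Python. This is an illustration only: `split_into_chunks` is a made-up helper, the sentence detection is a simple regular expression, and the default limit of 200 characters is an assumption, not a documented XTTS value.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("First sentence. Second one! A third? Done.", max_chars=30)
print(chunks)  # ['First sentence. Second one!', 'A third? Done.']
```

Each chunk can then be synthesized separately and the resulting audio files joined afterwards; whether that is actually faster than one long run would need to be measured.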

  • @Cmapukan
    @Cmapukan 6 months ago

    Thanks for the good explanation and clear example. I wish you prosperity and new opportunities. I apologize for my broken English.

    • @ThorstenMueller
      @ThorstenMueller  6 months ago +1

      Thank you for your nice comment. I wish you all the best, too 😊.

  • @tsunderes_were_a_mistake
    @tsunderes_were_a_mistake 8 months ago

    I tried it on Hugging Face with Japanese but it sounded robotic. Can you make a tutorial on how to fine-tune XTTS locally?

    • @ThorstenMueller
      @ThorstenMueller  8 months ago

      Thanks for your topic suggestion. I've added it on my TODO list but it might take some time.

  • @LeSchurke
    @LeSchurke 4 months ago

    Nice video ;)
    And hi, how's it going?
    Is it better when the reference voice is longer than 6 seconds?
    Or does it not matter, or is it even worse? 00:43

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Doing great, glad you like the video :)
      According to my talk with Josh Meyer, co-founder of Coqui AI, the model is optimized for a 6-second audio input. Before trying longer audio input, try using other 6-second clips.

  • @gorizon9802
    @gorizon9802 7 months ago

    Is it possible to use AI even with texts in another language? I would really like to know because I want to dub a game with this tool.

    • @ThorstenMueller
      @ThorstenMueller  7 months ago

      I'm not sure about that. I'd recommend asking the Coqui community, but as Coqui AI (the company) has shut down, I'm not sure how fast you will get an answer.

  • @PlayGameToday
    @PlayGameToday 4 months ago

    Which parameters do I need to set to get higher-quality audio output? It looks like the bitrate is only 96 kbps.

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Normally the generated output has the same sample rate as the voice dataset the model was trained on. Maybe you can use tools like ffmpeg to adjust the sample rate afterwards, but I doubt this will increase the quality.

    • @PlayGameToday
      @PlayGameToday 4 months ago

      @@ThorstenMueller So I need to train my own model at 48 kHz for the output to be higher quality.
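
To see which sample rate a generated file actually has, the WAV header can be read with Python's standard `wave` module. The sketch below assumes a PCM WAV file; `wav_info` is an illustrative helper, and the demo file is synthetic silence rather than real TTS output.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return sample rate (Hz), channel count and duration (s) of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / rate,
        }


# Demo: write one second of 16-bit mono silence at 22050 Hz, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(22050)
    wav.writeframes(b"\x00\x00" * 22050)

info = wav_info(demo_path)
print(info)  # {'sample_rate': 22050, 'channels': 1, 'seconds': 1.0}
```

Pointing `wav_info` at a generated file such as "output.wav" shows its real sample rate. Resampling that file afterwards changes the number in the header but cannot add detail the model never produced.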

  • @asdasdaa7063
    @asdasdaa7063 8 months ago

    I love your videos bro but you gotta speak a bit faster XD I have to play the video at 1.5x speed haha still love the videos!

    • @ThorstenMueller
      @ThorstenMueller  8 months ago +1

      Hehe, thanks for your suggestion. I'll keep it in mind for the next videos. As a non-native English speaker I have to think a little while for the right words 😆.

  • @64jcl
    @64jcl 9 months ago

    Quite amazing that they can do this with such a short clip. I had the same results as you with English: it doesn't really sound like me even though I tried to speak my best English. :) - How would you compare it with Piper with regard to TTS performance? Of course Piper is quite difficult to train for new voices, but it's free to use, even commercially. I wish there was some simpler way to clone voices with it; that would be golden. I have looked at your video on this, but preparing the training set seems like a chore.

    • @ThorstenMueller
      @ThorstenMueller  9 months ago

      Thanks for your comment 😊. I didn't compare the performance of XTTS and Piper TTS. If you want the best free voice clone, I'd go with Piper TTS right now, but the effort is higher, as you said.

  • @МихаилЮрков-т1э
    @МихаилЮрков-т1э 6 months ago

    Thanks for the informative video and interesting presentation.
    Please make a guide on how to train a model on a custom dataset.

    • @ThorstenMueller
      @ThorstenMueller  6 months ago

      Thanks for your nice feedback 😊. This topic is already on my (growing) TODO list.

  • @asanostudio
    @asanostudio 7 months ago

    Have you made a video tutorial on creating a voice model for Indonesian, or on how to add a voice model? I want to make an Indonesian voice model.

    • @ThorstenMueller
      @ThorstenMueller  7 months ago

      No. And as Coqui (the company) shut down, I'm not sure about further development of their code. Maybe it's worth taking a look at Piper TTS for training an Indonesian TTS model. ua-cam.com/video/b_we_jma220/v-deo.html

  • @tonysolar284
    @tonysolar284 8 months ago

    coqui is now dead

    • @ThorstenMueller
      @ThorstenMueller  8 months ago +1

      Sadly yes, at least the company, let's see what's happening with the code and community.

  • @animations.ki.anokhi.duniya
    @animations.ki.anokhi.duniya 5 months ago

    Is Coqui TTS shutting down?

    • @ThorstenMueller
      @ThorstenMueller  5 months ago

      Sadly, yes. I've made a short about it. ua-cam.com/users/shortsQMruRTlQu7I?si=JyDY8ziFJC8omAPY

  • @orcunaicovers17
    @orcunaicovers17 4 months ago

    It says CUDA is not available on this machine

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      I'm working on a video about CUDA. If you want, I can post an update here when it's online 😊.

    • @orcunaicovers17
      @orcunaicovers17 4 months ago

      @@ThorstenMueller I've solved the problem. Torch and CUDA version should be compatible with each other

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      @@orcunaicovers17 Happy you could solve it 😊.

  • @ari4340
    @ari4340 8 months ago

    Hello! I've been using this on hugging face for a few months, but today when I went to the page this error appears: Runtime error
    Scheduling failure: not enough hardware capacity
    Container logs:
    Fetching error logs...
    Any idea of what's happening? Thank you!

    • @ThorstenMueller
      @ThorstenMueller  8 months ago +1

      According to the error message the XTTS container does not have enough compute power on Huggingface platform. This might be a temporary problem or might relate to the shutdown of Coqui AI as a company.

    • @ari4340
      @ari4340 8 months ago

      @@ThorstenMueller Thanks for your reply! I hope it's not the latter. It's the only free online option that I knew of 😓

  • @Aiolia_Games
    @Aiolia_Games 5 months ago

    Can I use this voice to narrate a video on YouTube?

  • @TomiTom1234
    @TomiTom1234 9 months ago

    Can you please tell me which program you used to run the code at 15:28?

    • @ThorstenMueller
      @ThorstenMueller  9 months ago +1

      Sure, it's a code editor from Microsoft, called "Visual Studio Code".

  • @AmrAli-ig2mk
    @AmrAli-ig2mk 5 months ago

    Thanks a lot for your efforts. you are doing great work, keep it up.

    • @ThorstenMueller
      @ThorstenMueller  5 months ago +1

      Thank you a lot for your kind feedback - this keeps me motivated 😊

  • @Chriscs7
    @Chriscs7 7 months ago

    Which is better, this or Tortoise TTS (Ecker Voice Clone)?

    • @ThorstenMueller
      @ThorstenMueller  6 months ago

      Hard to say, as I haven't given Tortoise TTS a closer look yet, but it's still on my TODO list.

  • @Gute_Nacht_Kurzgeschichten
    @Gute_Nacht_Kurzgeschichten 6 months ago

    Great explanation 👍 How can I clone my voice so that it reads whole texts to me, e.g. a PDF file or a Word document, or is it limited to just 6 seconds?

    • @ThorstenMueller
      @ThorstenMueller  5 months ago

      Thank you very much for the praise, I really appreciate it 😊. I don't think there is a ready-made solution for text/Word/PDF input, but in general you can generate longer output. You may have to split the input text, but it can certainly handle much more than 6 seconds.

  • @starbuck1002
    @starbuck1002 9 months ago

    I also experimented a bit with Coqui XTTS and came to the conclusion that it isn't worth it.
    1. Coqui XTTS can't come close to the leading competitors in terms of clone quality.
    2. In my opinion, Coqui XTTS isn't worthwhile for this quality at this price, again considering the quality and pricing of the competitors!
    Still, thanks again for your video, Thorsten!

    • @ratside9485
      @ratside9485 9 months ago

      What price? $1 a day for companies; otherwise it's free.

  • @MrScesher
    @MrScesher 9 months ago

    Hi Thorsten,
    I can't get it to run. I always receive "No module named 'TTS.api'; 'TTS' is not a package" Even though the tts package is installed. Pip lists it in the installed packages.
    The few threads I found are no help. Maybe you have an idea?

    • @ThorstenMueller
      @ThorstenMueller  9 months ago

      This is strange. If "pip list" shows the tts package, then everything seems to be installed correctly. Are you really running your Python script in the right Python venv? Can you run "tts --help" on the command line successfully?

    • @MrScesher
      @MrScesher 9 months ago

      @@ThorstenMueller The tts command in the console works. tts --list_models too.
      And yes, I am running it in the venv I created.

    • @MrScesher
      @MrScesher 9 months ago

      @@ThorstenMueller I managed to get it running briefly when I used the setup from the git repo. But it only works in that terminal, and after closing it everything is gone. That's not a solution, because the setup takes too long.
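
One possible cause of a "'TTS' is not a package" message, not confirmed anywhere in this thread, is that Python resolves the name `TTS` to a plain module (for example a local file called TTS.py next to the script) instead of the installed package. The sketch below shows where an import name resolves; `describe_import` is a made-up helper.

```python
import importlib.util


def describe_import(name):
    """Report where a top-level import name resolves and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Packages have submodule search locations; plain .py modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


print(describe_import("TTS"))
```

If `origin` points to a file in the project folder rather than to site-packages, renaming that file would be the first thing to try.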

  • @chrispeters8295
    @chrispeters8295 7 months ago

    Thank you for the super informative video! You're awesome!

    • @ThorstenMueller
      @ThorstenMueller  6 months ago

      Wow, thanks a lot for your nice feedback 😊.

  • @DrFukuro
    @DrFukuro 9 months ago

    I like your videos a lot, even though many are unfortunately only in English. Could you imagine making a more general overview video on speech synthesis? Even after days of research, a layperson only partly understands it all. It would be great if a pro like you explained the following topics in a bit more depth for those interested:
    What exactly are Coqui, XTTS, Tortoise and eSpeak / espeak-ng, what do they do, and how do they differ from Mbrola and its voices? (Can I use tts instead of Mbrola in scripts? Yes/no, and how/why?)
    Example questions about XTTS:
    What is a multilingual voice as opposed to the Thorsten voice?
    What exactly is voice cloning as opposed to voice transfer?
    What are Coqui speakers and what do they do?
    What is the difference between fine-tuning the XTTS model and simply specifying a speaker_wav reference?

    • @ThorstenMueller
      @ThorstenMueller  9 months ago

      Thank you very much for your great feedback and the suggestion 😊. I really like the topic. When you deal with a subject so long and so intensively, these "basics" somehow become so normal that you stop thinking about them. I've put the topic on my TODO list. Many thanks for that 😊.

  • @Silberschweifer
    @Silberschweifer 21 days ago

    Oh no, the video is out of sync

    • @Silberschweifer
      @Silberschweifer 21 days ago

      Do you clap your hands when recording?

    • @ThorstenMueller
      @ThorstenMueller  15 days ago

      No, but thanks for the idea of improving video/audio sync by clapping 👍.

  • @nerdynav
    @nerdynav 9 months ago

    Hi Thorsten, I am a computer engineer and AI YouTuber myself (who isn't nowadays? haha :P). Just wanted to say that you make great tutorials on AI voice. I stumbled on this tutorial while exploring Coqui and it is the best tutorial I found. Thanks for taking the time to do these.
    Also, a subscriber asked me on Reddit for a resource on Coqui TTS tutorials, and I shared your channel! Keep up the great work.

    • @ThorstenMueller
      @ThorstenMueller  9 months ago

      Hi 👋. Thanks for your kind feedback on my content 😊. You're right, we are not alone on AI content 😆.

    • @ThatGuyNamedBender
      @ThatGuyNamedBender 7 months ago

      Pretty much 95% of youtube and the working class are against AI lmfao but keep daydreaming

  • @secondaccount5512
    @secondaccount5512 9 months ago

    Great video, expectations after listening to the interview with Josh were high, but XTTS is still kinda new, so I am excited for the future improvements.

  • @john_blues
    @john_blues 9 months ago

    Is this able to pull text from a text file? I have a Tortoise version that can do it, and it is helpful for long form text.

    • @ThorstenMueller
      @ThorstenMueller  9 months ago

      IMHO this isn't supported as of now. But finding a suitable solution for that is on my TODO list.

    • @john_blues
      @john_blues 9 months ago

      @@ThorstenMueller For some reason my reply keeps getting deleted. Anyhow, I run a local TTS that can pull from a text file. Maybe it will help you. It is by neonbjb on Github.

    • @ThatPain1
      @ThatPain1 7 місяців тому

      @john_blues You can totally read in one or multiple files via Python, transform the text as you like, and use XTTS to generate a synthetic speech audio file from it.
      I'm currently using it to create a sort of audiobook from a fanfiction.
      Removing the periods at the end of sentences improved the result quite a lot.
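
A minimal sketch of the approach described in the comment above: read text files, split them into sentences, drop the trailing period, and pass each sentence to XTTS. The helper names (`load_texts`, `split_sentences`, `strip_trailing_period`, `synthesize`) and the output file naming are made up for illustration. The sentence splitter is deliberately naive, and the `synthesize` step assumes the Coqui `TTS` Python package and the `xtts_v2` model are installed locally.

```python
import re
from pathlib import Path


def load_texts(paths):
    """Read one or more UTF-8 text files and join them into one string."""
    return "\n".join(Path(p).read_text(encoding="utf-8") for p in paths)


def split_sentences(text):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def strip_trailing_period(sentence):
    """Drop a single trailing period; leave '...', '!' and '?' untouched."""
    if sentence.endswith(".") and not sentence.endswith(".."):
        return sentence[:-1]
    return sentence


def synthesize(sentences, speaker_wav, out_dir, language="en"):
    """Write one wav file per sentence (hypothetical naming: part_0000.wav, ...)."""
    # Imported lazily because the Coqui TTS package is a heavy, optional dependency.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, sentence in enumerate(sentences):
        tts.tts_to_file(
            text=sentence,
            speaker_wav=speaker_wav,
            language=language,
            file_path=str(out_dir / f"part_{i:04d}.wav"),
        )
```

Usage could look like `synthesize([strip_trailing_period(s) for s in split_sentences(load_texts(["chapter1.txt"]))], "my_voice.wav", "out")`, where the file names are placeholders. Whether removing periods helps will likely depend on the voice and language.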

  • @nuborn.studio
    @nuborn.studio 7 місяців тому

    Nice tool, and great respect to the developer! I think the idea is great, although personally I couldn't do anything with the quality. But hey, for 6 seconds of input that's an awesome result, I think!

  • @timo1949
    @timo1949 7 місяців тому

    Very, very good channel! 👍 I was wondering: what is the reason for the rather low sampling rate of 22,050 Hz in the ThorstenVoice dataset? Simply faster processing of the data?

    • @ThorstenMueller
      @ThorstenMueller  7 місяців тому

      Thank you very much for your great feedback 😃. In the tests there was hardly any audible difference in the audio output, but the computational cost at, for example, 44 kHz was noticeably higher.

    • @timo1949
      @timo1949 7 місяців тому

      @@ThorstenMueller Thanks for the info. ElevenLabs also only asks for 128 kbps MP3 for Professional Voice Cloning and says that no disadvantage is noticeable. Very interesting how the AI processes that.
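
To put the sampling-rate discussion above into numbers, here is a small sketch that compares the raw sample count and the highest representable frequency (Nyquist limit) for the two rates mentioned. It only illustrates that doubling the rate doubles the data per second of audio; how much more compute a given TTS model actually needs is not something this arithmetic can tell. The function names are made up for illustration.

```python
def samples_for(duration_s, sample_rate_hz):
    """Number of audio samples in a mono clip of the given length."""
    return int(duration_s * sample_rate_hz)


def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2


for rate in (22050, 44100):
    # Six seconds matches the reference clip length discussed in the video.
    print(rate, samples_for(6, rate), nyquist_hz(rate))
```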

  • @nomadhgnis9425
    @nomadhgnis9425 8 місяців тому

    I have a question for you. If I wanted to pause for a number of seconds between sentences, how can I do that? Piper is really cool. Thanks.

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому

      Normally this is an aspect of SSML (Speech Synthesis Markup Language), which is not yet supported by Coqui or Piper. Maybe you can try a workaround and add multiple dots (....) to create a pause. But I didn't try it out myself.

    • @nomadhgnis9425
      @nomadhgnis9425 8 місяців тому

      @@ThorstenMueller thanks. will try that.

    • @nomadhgnis9425
      @nomadhgnis9425 8 місяців тому

      @@ThorstenMueller just tried it. I put dots where I wanted to pause but it does not work. It only responds to one dot.

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому

      ​@@nomadhgnis9425 Okay, then maybe it's a workaround to create multiple tts wave files and merge them together including pauses. That's not an optimal way but it could do the job.

    • @nomadhgnis9425
      @nomadhgnis9425 8 місяців тому

      @@ThorstenMueller I found a way. I am using Debian. I had to create a 3-second silent wav file, split the paragraphs into different wav files, and then merge them together with the silent wav where I need it. I did this with a bash script. So problem solved. Do you know where I can get more voice files other than the ones listed?
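
The workaround above (separate wav files joined with silence in between) was done with a bash script that is not shown here. A comparable sketch in Python, using only the standard-library `wave` module, could look like this. The function name `concat_with_pauses` is made up for illustration, and the sketch assumes all parts are uncompressed 16-bit PCM wav files with identical channel count and sample rate, so that zero bytes represent silence.

```python
import wave


def concat_with_pauses(parts, out_path, pause_s=3.0):
    """Join wav files, inserting pause_s seconds of silence between them."""
    # Take the audio format from the first file; all others must match it.
    with wave.open(str(parts[0]), "rb") as first:
        channels = first.getnchannels()
        width = first.getsampwidth()
        rate = first.getframerate()

    # Assumption: 16-bit signed PCM, where zero bytes are silence.
    silence = b"\x00" * (int(pause_s * rate) * channels * width)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        for i, part in enumerate(parts):
            with wave.open(str(part), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt != (channels, width, rate):
                    raise ValueError(f"{part} has a different audio format")
                if i > 0:
                    out.writeframes(silence)  # pause before every part but the first
                out.writeframes(src.readframes(src.getnframes()))
```

For example, `concat_with_pauses(["p1.wav", "p2.wav"], "merged.wav", pause_s=3.0)` would produce one file with a three-second gap between the two paragraphs; the file names are placeholders.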

  • @chrsl3
    @chrsl3 9 місяців тому

    Amazing result.

  • @NoxmilesDe
    @NoxmilesDe 9 місяців тому

    Is there a TTS for Android?

    • @ThorstenMueller
      @ThorstenMueller  9 місяців тому

      As far as I know, there's currently no support for Coqui or Piper TTS on Android. But this would be really cool 😎. Did you already ask in their communities?

  • @bobbyboe
    @bobbyboe 9 місяців тому

    Hi Thorsten, it looks so easy when you do it. I installed and started Coqui via Pinokio, expecting to somehow get to this GUI locally. Pinokio then says "running", but I can't find anything under the usual localhost addresses in the browser. There is also a "server" button, which I pressed, and I got the response: .........Connected! Everything gives the impression that it is running as it should... but for me the experience ends there, because I don't know where Coqui might show itself to me... a pity, really. Pinokio is normally a good entry point for non-coders.

    • @ThorstenMueller
      @ThorstenMueller  9 місяців тому

      Do you mean the Hugging Face GUI?

    • @bobbyboe
      @bobbyboe 9 місяців тому

      @@ThorstenMueller yes, I meant any GUI in general

  • @JamBassMusic
    @JamBassMusic 8 місяців тому

    Thank you!!

  • @Jed-i6j
    @Jed-i6j 9 місяців тому

    Not for commercial use. We need a truly open solution.

    • @juanjesusligero391
      @juanjesusligero391 9 місяців тому

      Yeah, it's a shame it's not 100% open. Fortunately, we'll always have Tortoise TTS :)

    • @chryseus1331
      @chryseus1331 9 місяців тому

      Who cares? It's not like they're going to sue you if you do.

    • @juanjesusligero391
      @juanjesusligero391 9 місяців тому +1

      @@chryseus1331 They could, though. If you have a company and want to use software commercially, I wouldn't recommend ignoring its license.

  • @schakuun1995
    @schakuun1995 9 місяців тому

    Great video! I'm really getting into TTS and it's so exciting to see what's possible now. It's incredible how something that needed hours of data a year ago can now be done in just 6 seconds. It's fascinating to watch this tech evolve.

    • @ThorstenMueller
      @ThorstenMueller  9 місяців тому

      Thank you for your nice feedback 😊. I'm really curious to see where quality is going in the near future.

  • @MarcoManzo
    @MarcoManzo 9 місяців тому

    Great! I was looking forward to this, only got it running on Linux. Thank you for the tech support ;-)

    • @MarcoManzo
      @MarcoManzo 9 місяців тому

      😂 maybe cuda is exactly my problem on windows🤷‍♂

    • @ThorstenMueller
      @ThorstenMueller  9 місяців тому

      Thanks and you're welcome 😊. I'm happy if people find my videos helpful.

  • @IngridUterus
    @IngridUterus 8 місяців тому

    Hey, I installed this via Pinokio because I couldn't get it running any other way. However, I don't know how to switch coqui-tts to GPU. Which file do I have to open? I'd also like to prevent the ghost voices. Do you know where I have to change a setting for that? I know it's possible, because I use a Telegram bot that works with Coqui and runs without errors, although with a strict character limit. Oh yes, the character limit :D where can I change that as well? Thanks in advance

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому +1

      The Coqui TTS models have a command-line parameter "--use_cuda". With it, the GPU should be used. Regarding length, you can try opening the model's configuration file and increasing the value of "max_decoder_steps" (but I haven't tried this with XTTS myself yet). Good luck 😊.

    • @IngridUterus
      @IngridUterus 8 місяців тому

      @@ThorstenMueller thanks. I'll try that tonight. Where exactly do I find the configuration file? Is it the configs.py in the TTS folder? Is there also a way to avoid the errors at the end of sentences and in the places between sentences? A kind of ghost voice often appears there too, which sounds really weird xD

    • @ThorstenMueller
      @ThorstenMueller  7 місяців тому

      @@IngridUterus Did you find the config file?

    • @IngridUterus
      @IngridUterus 7 місяців тому

      @@ThorstenMueller Yes, I found a better variant for coqui-tts that is much easier for beginners. I can only recommend it to you: Alltalk_tts

  • @ratside9485
    @ratside9485 9 місяців тому

    Can you also show how to fine-tune it? But locally? Thanks

    • @ThorstenMueller
      @ThorstenMueller  9 місяців тому +1

      Thanks for your topic suggestion 😊. I've put it on my TODO list.

    • @ratside9485
      @ratside9485 9 місяців тому

      @@ThorstenMueller there is now also a WebUI for fine-tuning on GitHub 🙌 and it works quite well. The only thing that is still a problem is that the settings change (temperature and so on). I tried things out for hours, and sentences always get skipped.

  • @__________________________6910
    @__________________________6910 9 місяців тому

    Sir, your explanation is very easy to understand.

  • @anarmustafayev9145
    @anarmustafayev9145 9 місяців тому

    Exactly what we were looking for. Thank you very much 👍

  • @TNMPlayer
    @TNMPlayer 9 місяців тому

    For some reason my terminal doesn't run in the venv.

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому

      Were you able to create the venv but can't activate it, or can't you create it at all?

    • @TNMPlayer
      @TNMPlayer 8 місяців тому

      @@ThorstenMueller the venv created just fine but I couldn’t open a terminal within it

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому

      @@TNMPlayer That's strange. Do you use the .bat or the PowerShell (.ps1) file to activate the venv?

    • @TNMPlayer
      @TNMPlayer 8 місяців тому

      @@ThorstenMueller I used the .ps1

    • @ThorstenMueller
      @ThorstenMueller  8 місяців тому

      @@TNMPlayer Maybe try out the .bat version, this could have an effect.

  • @FrankGlencairn
    @FrankGlencairn 9 місяців тому

    Unfortunately, without a UI this is a damn nightmare for anyone who isn't a programmer.

    • @starbuck1002
      @starbuck1002 9 місяців тому

      Well, then just use the UI! xD

    • @ratside9485
      @ratside9485 9 місяців тому

      You can use Pinokio; it installs automatically and has the web UI from Hugging Face

    • @FrankGlencairn
      @FrankGlencairn 9 місяців тому

      @@ratside9485 unfortunately I always get an error message there during installation,

  • @רחלישדה-ה4מ
    @רחלישדה-ה4מ 6 місяців тому

    Is a GPU required?

    • @ThorstenMueller
      @ThorstenMueller  6 місяців тому

      Generally (I'm not sure about XTTS specifically), CPU might work, but much slower than a CUDA-enabled GPU.

    • @רחלישדה-ה4מ
      @רחלישדה-ה4מ 6 місяців тому

      If I want to clone my own voice, do I need to train this? How? @@ThorstenMueller

    • @ThorstenMueller
      @ThorstenMueller  6 місяців тому

      @@רחלישדה-ה4מ I'd recommend taking a look at Piper TTS for that. ua-cam.com/video/b_we_jma220/v-deo.html

    • @רחלישדה-ה4מ
      @רחלישדה-ה4מ 6 місяців тому

      thanks! @@ThorstenMueller

  • @downloadpcgamesdirectlinkb7590
    @downloadpcgamesdirectlinkb7590 9 місяців тому +10

    I reviewed its documentation: you can't use this commercially, so why waste time on this? Haha.

    • @Abwaham
      @Abwaham 4 місяці тому +2

      To learn? Or like, for fun?