Local AI Voice Cloning with Tortoise TTS - 2024 Installation (Check LATEST update in description)

  • Published 19 Oct 2024
  • Links referenced in the video:
    LATEST Update - • Updated AI Voice Cloni...
    Github Repo - github.com/Jar...
    Curate Dataset - • How to Make the PERFEC...
    Training Better Models - • A Tip on Training Bett...
    Timestamps:
    Demo - 0:07
    Installation - 0:40
    Starting and Using - 2:27
    Add Voices/Zero Shot Voice Cloning - 6:05
    Training a Voice Model - 9:04
    Generate Config - 13:33
    Run training - 15:32
    Using Trained Model - 17:12
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Motherboard - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest recommended PC:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/Jar...
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoff...

COMMENTS • 522

  • @Mowgi
    @Mowgi 10 months ago +108

    We're all very lucky to have someone dedicated to not only teaching us how to use these awesome technologies, but making it as simple and up to date as possible. Keep up the great work, we don't deserve you 🙌

    • @Jarods_Journey
      @Jarods_Journey 10 months ago +8

      Thank you thank you 🙏🙏! Really much appreciate it and you're too kind 🥹

    • @PlaystationEu
      @PlaystationEu 10 months ago

      @@Jarods_Journey thanks a lot for your work, it's really awesome 😊

    • @pc_boy5371
      @pc_boy5371 10 months ago

      I agree with you 100%. Love the channel.

    • @brianlink391
      @brianlink391 8 months ago +1

      Speak for yourself - I totally deserve him! 😉

    • @SirRubyRed
      @SirRubyRed 7 months ago

      Is it not possible to download pretrained voices?

  • @shawn4990
    @shawn4990 10 months ago +32

    After getting into AI and programs like Stable Diffusion over the last year, I had to learn some code with all that's required to get them to run properly. However, since I'm not a programmer, what ended up happening is I created more issues for myself, which took way too much time to google and fix my mistakes. Yes, I've learned a ton, but I've pulled nearly all of my hair out in the process. So, thank you for making this a code-free install. Saves me time and more hair-pulling. Again, thank you Jarod... your efforts are appreciated.

    • @Jarods_Journey
      @Jarods_Journey 10 months ago +16

      Appreciate it! I know there are a lot of folks that are interested in AI, but all of the code revolving around it and the dependency management... is a hellscape. So, glad that my code-free install can help others out there, and it also makes sure the tutorial stays the same over time :)!

    • @2mShortFormCC
      @2mShortFormCC 8 months ago +1

      GPT can code if you know what to ask for

    • @33rdframe
      @33rdframe 5 months ago +2

      i am the 24th person to REALLY feel this message, lol. i never wanted to learn python 😂

  • @haydar_kir
    @haydar_kir 9 months ago +17

    The way AI TTS companies charge people is ridiculous. I am glad there are people like you. Thank you.

    • @compositeur8455
      @compositeur8455 9 months ago +1

      You need an Nvidia GPU to run this crap, so it's not much better

    • @1ajayc
      @1ajayc 3 months ago

      @@compositeur8455 most people have this already - it's the most popular GPU

    • @herculeholmes504
      @herculeholmes504 13 days ago

      I'd be quite happy to pay for an offline TTS with good quality voices, but the commercial software creators only offer online options that come in two price tiers: Ultra-expensive for commercial use, or free for private use. Which sounds nice, but being old-fashioned I just can't and won't trust anything that is "free" and online; it's my data on someone else's computer.

  • @tyc00n
    @tyc00n 10 months ago +7

    super awesome, I tried doing that recently and gave up. Really good idea including all the dependencies so the process becomes 1. Download 2. Extract 3. Run like everything else people download 😊

    • @Jarods_Journey
      @Jarods_Journey 10 months ago +2

      Thanks! The key is using the python embeddable packages, though there are a lot of steps to getting a package up and running correctly😅

    • @black_dragon274
      @black_dragon274 9 months ago +1

      @@Jarods_Journey Why isn't there a GUI interface for this? Does it have to be through a terminal or browser? It's so primitive!

  • @nodewizard
    @nodewizard 10 months ago +9

    We have quantized LLMs and Turbo SDXL and LCM models. I think it's time for a turbo/quantized TTS in 2024. Thank you as always for your tutorials and updates.

  • @BlueprintBro
    @BlueprintBro 10 months ago +7

    Thank you so much for always making up to date and accessible guides for everyone!

  • @IOSALive
    @IOSALive 6 months ago +1

    This made me so happy! I liked and subscribed!

  • @ShannonWare
    @ShannonWare 6 months ago +1

    This is an amazing video. Not only has it gotten me started with voice cloning, it is an excellent summary of quick and dirty model training.

  • @supaplay3947
    @supaplay3947 9 months ago +3

    I'm so thankful to you for making this video and to the community who makes these tools. I really want to change my videos from the silent type to a more entertaining type, but my main problem is my voice. I was born with a bad voice, so I really need something like this for the voice in my videos.

  • @zanshibumi
    @zanshibumi 6 days ago

    It works so perfectly well, and you made it so simple!
    This is amazing, thank you so much. Going to the support page right now.

  • @TweetykachuDenzelAbaya
    @TweetykachuDenzelAbaya 22 days ago +1

    I am sincerely thankful to you for making this video and to the community that makes these tools. I really want to change my videos from the silent type to a more entertaining type, but my main problem is my voice. I was born with a bad voice, so I need something like this for the voice in my videos.

  • @MR.RECAPER
    @MR.RECAPER 10 months ago +1

    👌👌 Thanks! I have been trying to install Tortoise TTS since your first video about it, but I always got errors when installing packages. This time it was so easy and it actually worked. 😊😊😊😊😊😊

  • @legend_of_ray
    @legend_of_ray 10 months ago +1

    I managed to find the original repo a little while back. Glad you're keeping it alive... thanks for this!

  • @PNN_ParodyNewsNetwork
    @PNN_ParodyNewsNetwork 2 months ago +2

    Thanks bro! thumbs up for this video

  • @Nathanizer
    @Nathanizer 9 months ago

    Thanks a lot ! I was trying stuff with Conda but all didn't work out as I expected. So followed your video, and with the own custom voices. It all works perfectly. Thanks :)

  • @rettbull9100
    @rettbull9100 8 months ago +1

    My cloned voice came out sounding horrible. I used the same audio clips that I've used with RVC, which sounds really good. I used all the same settings and did as you said, though for some reason my long clip was broken up into 0 to 4 second clips. I made sure all my settings matched what you used.
    The original audio clip was 54 minutes long. It took over a day to train.
    Edit: the loss-mel line on the graph (the green one) was almost at zero at the end of training. I trained it for 500 epochs.

  • @Dalin_B
    @Dalin_B 1 month ago

    Working with this now as I speak. Great job man. Really appreciate it

  • @MatthewJettHall
    @MatthewJettHall 4 months ago

    OMG you rock!!! Thank you so much for putting this package together for us. It works amazing!!!! Thank you again!

  • @HotDrawingWithSugawara
    @HotDrawingWithSugawara 2 months ago

    Thank you for making a real video with real data in it. The FOUR videos I tried before this one contained nothing of value.

  • @Samuel-wl4fw
    @Samuel-wl4fw 10 months ago

    Thanks a lot, have been struggling with dependencies, and have been following a few of your videos :)

  • @audio.video.disco.
    @audio.video.disco. 4 months ago +3

    Please, do a series only on how to install and use each of these TTS models
    i'm not a programmer and im having a really hard time, i think you would get a lot of views from these video tutorials.

  • @huyked
    @huyked 9 months ago +1

    I wish all the github stuff (I'm a newbie/non-programmer) was this simple. Lol. Thank you!

    • @Jarods_Journey
      @Jarods_Journey 9 months ago +1

      And that's why I wanna try and make it as hands-off as possible :)! The learning curve sucks in the beginning, but GitHub does get easier the more you learn it!

  • @UmakantMishra
    @UmakantMishra 8 months ago

    Great package. I will install and explore it. Thank you for sharing your valuable knowledge and experience. Big Like.

  • @KurtStaInes
    @KurtStaInes 10 months ago

    LMAO this program now became the Stable Diffusion of voice generation, I admit that it won't take that long for this to improve . Thanks for the fork looking forward for the documentation.

  • @puntogcb
    @puntogcb 8 months ago

    Hey Jarod! Just wanted to drop a quick note of appreciation for your content on AI. Your journey into the world of artificial intelligence is both fascinating and informative. Thanks for making complex topics so engaging and easy to understand. Keep rocking those AI insights! 🚀
    By the way, any chance of training Spanish LATAM voices in the future? That would be fantastic! How would it work? Muchas muchas gracias! Abrazo de Argentina!

  • @Jimbo116
    @Jimbo116 1 month ago

    This is so cool, and that it is free is a big bonus. Thanks for the teaching..really good 🙂

  • @bwowzah
    @bwowzah 10 months ago +1

    Fantastic video! I greatly appreciate the hard work and dedication you put into what you do on this channel. You've helped me out immensely.

  • @jonnysmith9328
    @jonnysmith9328 7 months ago

    You're awesome! I love your videos. They make sense and are easy to follow.

  • @lightning_dynamics
    @lightning_dynamics 8 months ago

    thank you so much for putting this all together, I'm making an audiobook and this helps a lot !!!

  • @Vlkn7
    @Vlkn7 10 months ago +1

    Thank you for making videos on rvc and tortoise tts , i hope that one click pipeline comes soon

  • @spiffylich3349
    @spiffylich3349 9 months ago +1

    Awesome video! I'm a bit stuck, though. I have about a 45 minute clip of a character talking, and I've gone and processed it with UVR-5 and the audio-splitter project you linked, so I have a ton of smaller voice-line wav files. But when I try and train the model on them for ~200 epochs, the results I get from using the model are awful!
    It's like around 50% of the words spoken in the generated audio are just noise, or the AI struggling very hard to speak a word.
    Any tips for getting clearer audio? Like, should I put my 45 minute video into the voice folder instead of the multiple clips?

  • @TweetykachuDenzelAbaya
    @TweetykachuDenzelAbaya 22 days ago +1

    Super awesome, I tried doing that recently and gave up. Really good idea to include all the dependencies so the process becomes 1. Download 2. Extract 3. Run, like everything else people download ☢☢☢☢

    • @TweetykachuDenzelAbaya
      @TweetykachuDenzelAbaya 22 days ago +1

      Thanks! The key is using the python embeddable packages, although there are a lot of steps to getting a package prepared and running correctly ☢☢☢☢

    • @TweetykachuDenzelAbaya
      @TweetykachuDenzelAbaya 22 days ago +1

      @Jarods_Journey Why isn't there a GUI interface for this? Does it have to be through a terminal or browser? It's so primitive!

  • @cuccurese
    @cuccurese 8 months ago

    I did everything you said in the video, but in the end my generated speech has an American accent even though my audio is in Italian. :D I spent so much time on training.

    • @prizegotti
      @prizegotti 8 months ago +1

      It's not trained for Italian. Just American English and Japanese.

    • @cuccurese
      @cuccurese 8 months ago

      @@prizegotti Thanks!!!!

  • @RobertSmith-kj6eb
    @RobertSmith-kj6eb 10 months ago +2

    Bro, I got this working real quick. It is amazing. I copied and pasted voices from a different tortoise-tts and it sounds great! Thanks for sharing!

    • @Samuel-wl4fw
      @Samuel-wl4fw 10 months ago +3

      Where do you find some available voices? I tried to look but couldn't find any

    • @leighenhenkelman8648
      @leighenhenkelman8648 10 months ago

      @@Samuel-wl4fw I'm looking for voices too!

  • @Random_person_07
    @Random_person_07 10 months ago

    Thanks so much for making this! it's awesome keep it up!

  • @DM-dy6vn
    @DM-dy6vn 6 months ago +2

    5:12 As far as "Samples" are concerned, I noticed that "sample_batch_size" is implicitly set to 16 in the code. You can see it in the console when generating. Having "Samples" set to 16 means that there is one batch to process. If you set Samples=100, then 6 full batches will be processed, plus 4 samples in a 7th batch. The time needed is nearly proportional to the number of batches. Having said that, it is not "exponential". The iterations behave close to a square root: quadrupling "iterations" would approximately double the processing time.
    A batch of samples is placed in VRAM and, depending on the length of a text chunk, it could push your GPU to its VRAM limit. Setting "Samples" to something lower than 16 will free VRAM but potentially lower the quality, since fewer samples will be used. Do not feed it overly long sentences. Use "Line delimiter" to separate your sentences during processing. You should also avoid letting the GPU use "Shared GPU memory" (my RTX 3090 can do this), because falling back to PC RAM makes processing even slower (slow data swapping).
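
The batch arithmetic in this comment can be checked with a short sketch. The helper names below are hypothetical and are not taken from the Tortoise TTS code; the batch size of 16 and the square-root behaviour of iterations are the commenter's observations, not documented guarantees.

```python
import math


def batch_plan(samples: int, batch_size: int = 16) -> list[int]:
    """Return the size of each batch needed to generate `samples` candidates.

    `batch_size=16` mirrors the value the commenter saw in the console;
    treat it as an assumption about your own install.
    """
    if samples <= 0 or batch_size <= 0:
        raise ValueError("samples and batch_size must be positive")
    full_batches, remainder = divmod(samples, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])


def relative_time_for_iterations(factor: float) -> float:
    """Rough time multiplier when iterations are scaled by `factor`,
    using the commenter's square-root rule of thumb."""
    return math.sqrt(factor)


print(batch_plan(16))    # [16]
print(batch_plan(100))   # [16, 16, 16, 16, 16, 16, 4]
print(relative_time_for_iterations(4))  # 2.0
```

If the time per batch is roughly constant, the length of the returned list is a first estimate of how generation time scales with the "Samples" setting.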

  • @joshuadelacruz3907
    @joshuadelacruz3907 9 months ago

    Thanks, mate! This is such an awesome job!

  • @schakuun1995
    @schakuun1995 8 months ago +1

    Genius! Great tutorial, thanks :)

  • @HaloRian
    @HaloRian 10 months ago

    Thanks for the work and the tutorial! I have subscribed to your channel. Hope you are well and have a good start to the new year!

  • @LucidFirAI
    @LucidFirAI 7 months ago

    I am in love with this install method! Your tutorials a year ago were usable but kinda hard to follow, this method however is f'ing perfect :)
    Is there a way to control tortoise through command line so I can run it with a batch file? What is the best way to run it for stable outputs at the expense of perfection?

  • @syrcon
    @syrcon 9 months ago

    Your videos are Awesome Jarod! You do such a good job explaining how to install and setup these repositories (even going the extra mile to fork them yourself to make them easier to work with)!
    Is it possible to fuse two voices together, or is it viable to train a model by combining two datasets from two different speakers?

    • @Jarods_Journey
      @Jarods_Journey 9 months ago +1

      Appreciate it! For tortoise, I believe if you train on two voices, you get a mix or average between the two as this does occur when you use two different files as reference audio files. I actually haven't yet tried this for training so this may be a useful experiment to try.

    • @syrcon
      @syrcon 9 months ago

      @@Jarods_Journey I'll have to try it out as well. I assumed that it would have negatively impacted the training of the model, but if it instead blends the two, then that would be really interesting.

  • @WackFPV
    @WackFPV 7 months ago +1

    Man those 4090s! I'm still sticking to my 3080ti, feels like i'm shelling out twice as much money for a very very very small upgrade...

  • @hamsteralliance
    @hamsteralliance 10 months ago

    I haven't been able to find an answer to this, so I'm hoping you can help. What's going on when RVC training spits out a "nan"? More specifically, will it cause problems?
    My training output will look like: loss_disc=4.060, loss_gen=2.968
    Then 15 epochs later I'll get a: loss_disc=nan, loss_gen=nan
    If I stop and restart training, it'll resume from the last checkpoint and start displaying normal numbers again. Anything you know about this would be appreciated, thanks! :D

    • @Jarods_Journey
      @Jarods_Journey 10 months ago +1

      Mmph, nan is some undefined number. I'm not sure what causes it, but I've seen people report this occurring in logs. If you can still train successfully without problems, then you should be fine.
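
"nan" stands for "not a number", the floating-point value produced by undefined operations such as 0/0 or inf - inf. A minimal sketch for spotting it in training output is below; it assumes log lines shaped like the ones quoted in the question (`loss_disc=4.060, loss_gen=2.968`), and the function names are made up for illustration.

```python
import math
import re

# Matches pairs such as "loss_disc=4.060" or "loss_gen=nan".
LOSS_PATTERN = re.compile(
    r"(loss_\w+)=([-+]?(?:nan|inf|\d+(?:\.\d+)?))", re.IGNORECASE
)


def parse_losses(line: str) -> dict[str, float]:
    """Extract every loss_name=value pair from one log line."""
    return {name: float(value) for name, value in LOSS_PATTERN.findall(line)}


def has_nan(losses: dict[str, float]) -> bool:
    """True if any parsed loss is NaN (NaN never compares equal to itself,
    so math.isnan is the reliable check)."""
    return any(math.isnan(value) for value in losses.values())


for line in ("loss_disc=4.060, loss_gen=2.968", "loss_disc=nan, loss_gen=nan"):
    losses = parse_losses(line)
    print(losses, "-> NaN detected" if has_nan(losses) else "-> ok")
```

A check like this could be used to stop a run early and resume from the last good checkpoint, which is what the original poster was doing by hand.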

    • @weightlossmotivation4070
      @weightlossmotivation4070 10 months ago

      If you are trying to finetune the model and using the weights from the previous training instead of the base D and G pth, sometimes the generators die. So maybe stick with the base weights if you have changed them. Also you might have not trained them on enough steps (talking about the finetuned weights).

  • @bobbyboe
    @bobbyboe 10 months ago

    Thank you thank you man... finally I have this thing running! Question: does DeepSpeed make a difference in quality as well?

    • @Jarods_Journey
      @Jarods_Journey 10 months ago +2

      DeepSpeed does not, as far as I've observed, since it's just parallelizing the process of the autoregressive model to make it faster. At least that's my understanding of it :)!

  • @Chriscs7
    @Chriscs7 7 months ago +1

    11:56 - What model is better in the generate tab ? base, whisperX or something else?
    You need to explain what gives the most accurate cloning not only what is faster to train

  • @sin_z1
    @sin_z1 10 months ago

    Massive respect to you, my dude. Really needed this.

  • @shiviarora4173
    @shiviarora4173 1 month ago

    this video is so helpful damn, thanks bro

  • @csiguszfoxoup
    @csiguszfoxoup 10 months ago

    Thank you! Amazingly explained!

  • @soundgif
    @soundgif 7 months ago

    Hey, thanks for this awesome video.
    Question - how is the autoregressive model tuned without the VQ-VAE? Since CLVP and CVVP operate on the VQ codes produced by the autoregressive output, wouldn't this harm selection of the samples generated by the autoregressor?
    I understand that the downstream diffusion model (and presumably the hifigan) operate on the final latents produced by the autoregressive model (and not the codes), so in theory this could be used to tune the autoregressive model weights, but wouldn't it result in poor sample selection performance -- since the autoregressive mel code head can't be trained without the VQ-VAE?
    Also, just curious - why choose to train the autoregressive model without training the diffusion model (possibly in tandem)? Has any experimenting been done in this area?

    • @Jarods_Journey
      @Jarods_Journey 7 months ago +1

      We do have the VQVAE, it's the dvae.pth model inside of the models folder. I'll give you the 2 blog posts about this: 152334h.github.io/blog/tortoise-fine-tuned/ and 152334h.github.io/blog/tortoise-fine-tuning/ which are better explanations than I can give at the moment.
      As for training the diffusion model, I don't have a strong enough understanding yet on what finetuning would do for it, but as far as my understanding is with the AR model, we are training in new representations for the tokens in its vocabulary so that it can output appropriate mel tokens for whatever dataset you use.

    • @michaelmezher9635
      @michaelmezher9635 7 months ago

      Wow! Wish I knew the VQVAE was available before!
      I'd think tuning the diffusion model may be useful for voices that are dramatically different from what's found in LibriTTS, since theoretically the space of what can be represented in the diffused mels is limited to these voice characteristics.
      This is especially true because the diffusion model is trained (fine tuned after autoregressive model convergence) on the autoregressive latents, not the Mel codes.

  • @jurandfantom
    @jurandfantom 10 months ago

    Just noticed that you synch your voice with video

  • @datorresramos
    @datorresramos 9 months ago +1

    Nice video, super easy to understand how to install this Tortoise TTS, i have a question how can i access the webgui from another computer on the same network ?

  • @simonhadid5894
    @simonhadid5894 1 month ago +2

    Thanks bro, you are a very nice, helpful guy. I am amazed by your explanation. Please, I need to download the file but it is too big. Do you have a torrent link? Or could you maybe divide it into 14 zip files or fewer? Thanks and a big big thumbs up! :)

  • @Cadaveri
    @Cadaveri 9 months ago

    Thank you so much for this release. Finally something that anyone can install and understand without problems!
    Btw are there any sort of pre-trained datasets or sound file databases available anywhere on the internet that you know of? (popular video game characters etc)?

    • @Jarods_Journey
      @Jarods_Journey 9 months ago +1

      Np! As for dataset, I'm not sure, but am pretty sure the audio exists somewhere out there on the web!

  • @Starpluck
    @Starpluck 10 months ago

    Thank you for this tutorial. I will ensure you will be greatly rewarded for it. --Tutankhamun

  • @SirChogyal
    @SirChogyal 8 months ago

    I love this. But unlike other applications, why is this AI voice cloning messed up with large files?

  • @jeffisgett
    @jeffisgett 7 months ago

    Jarod, quick (and hopefully not too stupid) question: I am using a somewhat dated graphics card (GTX 1070), does this prevent me from doing local voice cloning? Sorry if this is already answered elsewhere, but I'm a little overwhelmed by all that's out there, and I'm hoping to be able to use voice cloning and voice changeover to do some very small, short, independent movies. Helping with minor audio edits without needing individual actors to return just to change a phrase or an inflection, etc.

    • @Jarods_Journey
      @Jarods_Journey 7 months ago

      I think you should be fine... 4gb of vram I think on that card? But it'll be very slow. If you try and run into out of memory errors, it might be too small

  • @dezenzplay
    @dezenzplay 9 months ago

    Thank you for all your work to keep the project alive! :)
    I've already created some fun gifts for a few friends with Tortoise TTS in the last year and without your help and videos, I would never have thought of it!
    I can't wait to see how fast generating with DeepSpeed works now with the updated version!
    I do have one question though, unfortunately I couldn't find it in the wiki of the original repo or anywhere else on the net. Is there a possibility or a command to save several sentences in the input prompt as separate audio files instead of a combined .wav-file which includes all the sentences? I'm planning to create a kind of podcast with two speakers, for which I'll copy the entire dialogue of a single speaker into Tortoise and then repeat the whole thing for the second speaker. Then I'll put the individual snippets together in Audacity. It would therefore be easier for the project if the WebUI produced individual sound snippets directly instead of cutting the coherent .wav-file by hand. :D
    EDIT: Okay, that's cleared up. I finally figured out how to load the model at the beginning using a JSON-command so that the appropriate autoregressive model can be loaded for each speaker. For people who also want to try this, the command is e.g:
    {"voice": "Peter", "autoregressive_model": "./training/Peter/finetune/models/5000_gpt.pth"} Text for the prompt
    But it seems like, if you're doing this method with changing models and DeepSpeed, there will occur a CUDA error and it asks to recompute voice latents. But doing it with the involved models doesn't seem to work.
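
To illustrate the per-line JSON prefix described above, here is a minimal sketch of how such a line can be split into its settings and its prompt text with the standard library. This is not the web UI's own parser, and `split_prompt_line` is a hypothetical name; only the line format comes from the comment.

```python
import json


def split_prompt_line(line: str) -> tuple[dict, str]:
    """Split '{...json...} prompt text' into (settings, text).

    Lines without a leading JSON object are returned unchanged with
    empty settings.
    """
    stripped = line.lstrip()
    if not stripped.startswith("{"):
        return {}, stripped
    try:
        # raw_decode parses one JSON value and reports where it ended,
        # so the remaining characters can be treated as the prompt.
        settings, end = json.JSONDecoder().raw_decode(stripped)
    except json.JSONDecodeError:
        return {}, stripped
    return settings, stripped[end:].strip()


example = (
    '{"voice": "Peter", "autoregressive_model": '
    '"./training/Peter/finetune/models/5000_gpt.pth"} Text for the prompt'
)
settings, text = split_prompt_line(example)
print(settings["voice"], "|", text)
```

With two speakers, each line of a dialogue script could carry its own prefix, so the lines can be grouped by `settings["voice"]` before generation.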

    • @Jarods_Journey
      @Jarods_Journey 9 months ago +1

      There should be an option to save uncombined sections individually, should be in the settings somewhere I think.
      As for the error, deepspeed for some reason doesn't like being unloaded and reloaded for new models which is why I show that you have to restart TTS each time you change models. Idk why this is the case and I spent quite a bit of time trying to find out why 😅

    • @dezenzplay
      @dezenzplay 9 months ago

      @@Jarods_Journey Thank you very much for your feedback! :) I'll have a look, but luckily I've now found an even better way with the method mentioned above.
      Ah okay, so that's the reason behind the reload of TTS. But well, you can make sacrifices at the cost of speed. :D Thanks for all the videos on the subject so far, the new implementation with RVC on your part could also be very useful for the future!

  • @Elrevisor2k
    @Elrevisor2k 8 months ago +1

    How do you create a voice model? For other languages? Great video

  • @thebigbigdaddy
    @thebigbigdaddy 9 months ago

    Great video - did you ever consider integrating this with Twilio for creating phone GPT agents?

  • @pogiman
    @pogiman 7 months ago

    it worked!! thanks man!!

  • @MrTompkins
    @MrTompkins 9 months ago +1

    I get a file not found error when running the start.bat file, but the file does seem to be there! - FileNotFoundError: Could not find module 'D:\Games\vc\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

    • @pb2806
      @pb2806 9 months ago

      Install CUDA with all its features. That's what I had to do to fix this

  • @CptTurk81
    @CptTurk81 8 months ago

    This is amazing. I can see there's an api option, do you have any guides on how to use it programmatically? Say for automation?

  • @gu9838
    @gu9838 10 months ago

    Will try it out. I had issues with the cloning part a while back, so we will see. Thanks!

  • @Parsitube_yt
    @Parsitube_yt 5 months ago

    I wish there was such a TTS for the Persian language as well

  • @neros1277
    @neros1277 7 months ago

    Spent a few hours learning from your videos on Tortoise TTS, then tried to make my own model. I decided to go with Cloaker from Payday 3; the result was a stupidly high-pitched voice that sounded like shit. I trained it on 10 clips of 5 seconds each and set epochs to 200. Would you say I should use more samples and more epochs in training? Also, should samples of the character yelling and speaking softly be trained together, or should I make separate models?

  • @yaracorreia8209
    @yaracorreia8209 10 months ago +2

    Thank You so much for all your content! Really Awesome

    • @al3x__0
      @al3x__0 10 months ago

      reinstall thats what I did and it worked

    • @yaracorreia8209
      @yaracorreia8209 10 months ago

      @@al3x__0 were you able to add new voices using the .PTH voice model? for tts?

    • @Jarods_Journey
      @Jarods_Journey 10 months ago

      You have to use tortoise models, and they would need to be placed in training.
      It would look something like this:
      training/name of folder/finetune/models/put the tortoise tts models here.
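
The folder layout described in this reply can be sketched with `pathlib`. The helper names and the `root` argument are hypothetical; only the `training/<name>/finetune/models/` pattern comes from the reply, and the root is assumed to be relative to the install folder.

```python
import shutil
from pathlib import Path


def models_dir(voice_name: str, root: str = "training") -> Path:
    """Build training/<voice_name>/finetune/models as described above."""
    return Path(root) / voice_name / "finetune" / "models"


def install_model(model_file: str, voice_name: str, root: str = "training") -> Path:
    """Copy an existing Tortoise model file into the expected folder,
    creating the folder if it does not exist yet."""
    target = models_dir(voice_name, root)
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(model_file, target))


print(models_dir("Peter").as_posix())  # training/Peter/finetune/models
```

Calling `install_model("5000_gpt.pth", "Peter")` from the install folder would place the file where the reply says the UI looks for fine-tuned models.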

  • @negociodenerd
    @negociodenerd 9 months ago

    Congratulations on the work, I've been following you for a few months now. I would like to know how I can create a model in other languages and make voice cloning at least acceptable.

  • @DM-dy6vn
    @DM-dy6vn 6 months ago

    5:12 For the sake of speed (without decrease in quality), you should definitely use "Half precision" (see Experimental settings).

  • @vrtech473
    @vrtech473 10 months ago +1

    nice one ❤ Thanks!

  • @midnitejesus
    @midnitejesus 5 months ago +1

    My model came out sounding nothing like it was trained on. I had 2300 super clean chopped samples for a character and realized my 3080 would take forever. I trained on 250 samples over 3 hours. The output was 7 models, from 60_gpt to 402_gpt. I tried them all and the voice is simply pitched too high and sounded nothing like the source files. I followed your instructions to the T. Any suggestions?

  • @NoahMine1
    @NoahMine1 4 months ago

    Wait, so do you need the training part? The voices don't sound bad without training; it's not much of a difference.

  • @kaziahmed
    @kaziahmed 6 months ago

    Followed the steps, got this error: Something went wrong
    'tuple' object has no attribute 'squeeze'

  • @reesaoldyear7763
    @reesaoldyear7763 10 months ago

    Thanks for your video. What languages does it support?

    • @Jarods_Journey
      @Jarods_Journey 10 months ago

      English only, but others could be supported if you were able to add a tokenizer

  • @paul.j478
    @paul.j478 10 months ago

    that's freaking awesome!!

  • @madokahomura929
    @madokahomura929 7 months ago

    Thanks all worked great.
    UPD: I fixed it. If anyone encounters same thing just increase your paging size or set it to automatic.
    But suddenly I started running into a problem. With Tortoise TTS, it simply doesn't load Whisper large (or a higher model), even though everything worked perfectly before. It just freezes with a connection error. In configuration I get:
    "Batch size exceeds validation dataset size, clamping validation batch size to 0"
    and when I finally press train, I get
    "UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`."
    and it freezes. Sometimes this error appears
    "dll load failed while importing _iterative: the paging file is too small for this operation to complete."
    or
    "CUDA out of memory. Tried to allocate 12.00 MiB. GPU 0 has a total capacty of 15.99 GiB of which 13.53 GiB is free. Of the allocated memory 1.18 GiB is allocated by PyTorch, and 3.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
    With RVC the same thing basically. It complains about not having enough memory even though everything worked fine two weeks ago forcing me to reduce data set or batch size.
    Would really appreciate your help. Thanks.
    UPD: It was all because paging file size was too small (512mb). I set it automatically managed size and it fixed it.
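
The out-of-memory message quoted above points at `PYTORCH_CUDA_ALLOC_CONF`. For readers who do hit real fragmentation (the poster's actual fix was the Windows paging file), a minimal sketch of setting that variable from Python follows. The function name is hypothetical and 128 is an arbitrary example value, not a recommendation from the video.

```python
import os


def configure_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF unless the user already defined it.

    Call this before PyTorch initialises CUDA (safest: before
    `import torch`), because the allocator reads the variable at start-up.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", value)
    return os.environ["PYTORCH_CUDA_ALLOC_CONF"]


print(configure_allocator())
```

The same effect can be had by setting the variable in the shell or batch file that starts the program.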

  • @HyperUpscale
    @HyperUpscale 9 months ago +1

    I have a silly question - why does the voice training need to be done this way, and why is it so complicated?
    Jarod, could you please do me a favor and check which year it is now? (Hint: it is not 2020...)

  • @SAnsAN091190
    @SAnsAN091190 10 months ago +1

    Hi! Thank you very much for your videos! I would like to know if you have tried to train models in languages other than English? What are the successes?

    • @Jarods_Journey
      @Jarods_Journey 10 months ago

      I still haven't trained other languages unfortunately 😅

    • @ForTheEraOfLove
      @ForTheEraOfLove 10 months ago +1

      @@Jarods_Journey The docs are so convoluted when dealing with the language switching and training. I look forward to more tutorials from you brotha

    • @SAnsAN091190
      @SAnsAN091190 9 months ago

      @Jarods_Journey It's a pity. We look forward to more content like this from you in the future! 🤗

  • @bridicot
    @bridicot 10 months ago

    Great tutorial. I wanted to buy a desktop to do this. Will an RTX 4070 work? I am thinking of having 32GB DDR5, a 2TB SSD, and either an i7 or i9 CPU.

  • @Soljarag5
    @Soljarag5 9 months ago

    Thanks so much for your tutorials! What does the temperature setting do?

    • @Jarods_Journey
      @Jarods_Journey  9 months ago +1

      Temperature is kinda like randomness. Higher means possibly more random and unstable, lower is more deterministic and stable.

    • @Soljarag5
      @Soljarag5 9 months ago

      @Jarods_Journey thanks man
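
The description of temperature in this thread (higher is more random, lower is more deterministic) can be illustrated with a generic softmax-with-temperature sketch. This is the general sampling idea only, not Tortoise TTS's actual code; the function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)

# Low temperature concentrates probability on the top candidate;
# high temperature spreads it out, so sampling becomes more varied.
print("low temperature :", [round(p, 3) for p in cold])
print("high temperature:", [round(p, 3) for p in hot])
```

With a low temperature almost all probability lands on the highest-scoring candidate, which matches the "more deterministic and stable" behavior described above.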

  • @ash3844
    @ash3844 8 months ago

    Hi, thanks for the content. Does it work on Ubuntu, or only on Windows? I'm facing a few issues while running it on Ubuntu 22.

  • @kiranaric
    @kiranaric 7 months ago

    Your channel is excellent; I'm able to get some AI voices up and running without having to learn much coding. I am encountering an error in this particular case though, when training a new voice. When I click 'Validate Training Configuration', every option on the Generate Configuration page turns into a red Error message and I get the notification 'Empty Dataset'. How do I solve this? I did follow the Prepare Dataset step before I went to this page. EDIT: I was able to fix the error. It looks like something went wrong in the Prepare Dataset phase the first time, because the train.txt file was empty. I did a TTS reload from Settings, deleted the voice training data entirely, and created the dataset afresh, and it worked this time. I am currently on the Run Training page. Feeling positive! Thanks for your awesome work!
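
Since the 'Empty Dataset' error here came down to an empty train.txt, a quick check before validating the configuration can be sketched as follows. The path used in the example is hypothetical (the real location depends on your install and voice name), and `count_dataset_entries` is an illustrative helper, not part of the tool.

```python
from pathlib import Path


def count_dataset_entries(train_file):
    """Return the number of non-blank lines in a dataset file (0 if it is missing)."""
    path = Path(train_file)
    if not path.is_file():
        return 0
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


# Hypothetical path: adjust to wherever Prepare Dataset wrote train.txt for your voice.
train_file = Path("training") / "my_voice" / "train.txt"
entries = count_dataset_entries(train_file)
if entries == 0:
    print(f"{train_file} is missing or empty; re-run Prepare Dataset.")
else:
    print(f"{train_file} has {entries} entries.")
```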

  • @HistoryIsAbsurd
    @HistoryIsAbsurd 9 months ago

    Worth the sub, thanks a lot

  • @kapteinkonyn3450
    @kapteinkonyn3450 7 months ago +1

    When clicking on generate, I get: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

  • @parmesanzero7678
    @parmesanzero7678 9 months ago

    Is there an ideal script for voice training? That is, is there an ideal series of things to have the speaker saying to get the best results for new speech from the voice model?

  • @Skalekul
    @Skalekul 4 months ago +1

    Do you have any idea why custom trained models don't work using hifigan? It produces the error: 'tuple' object has no attribute 'device'

  • @thisisashan
    @thisisashan 9 months ago +3

    So I followed step by step, and for some reason Tortoise TTS just pauses when I try to train...
    Seems to be related to the /usr/cuda files missing.
    Already a bug posted in the Git repo. Don't want to spam you but currently this tutorial doesn't work. missing /cuda/lib64 or whatnot error.

    • @samodzielny4504
      @samodzielny4504 7 months ago

      Same problem

    • @corbinangelo3359
      @corbinangelo3359 5 months ago

      I had the same error. How much VRAM does your card have? Mine only has 8GB. Lower all the settings, e.g. 100 epochs, batch 4, gradient 2, etc. It worked for me.

    • @thisisashan
      @thisisashan 5 months ago +1

      @corbinangelo3359 my issue was fixed on his git repo
      The current releases no longer have this issue, is my understanding.
      Also, if you don't want to lose quality, there are low memory flags you can use to do higher res pictures

  • @DYLOGaming
    @DYLOGaming 6 months ago

    Yo! Any reason why my vocals end up sounding super robotic? I'm using custom vocals, but idk why they sound filtered and very bad. Any assistance would be greatly appreciated!

  • @trailboss-
    @trailboss- 2 months ago +1

    What is the best free option that's not NVIDIA? I have an AMD video card. Thanks for all your videos.

  • @shovonjamali7854
    @shovonjamali7854 7 months ago

    Another great one! But can you show us how to run this in Google Colab, as I don't have sufficient hardware to run it locally?

  • @ErnestoPossiSpanishVoiceOver
    @ErnestoPossiSpanishVoiceOver 6 months ago

    Do you know how to get the same software working on a Mac? Great video, Jarod!

    • @akshatgoel5708
      @akshatgoel5708 2 months ago

      Have you found a way to make it work?

  • @MrUsamamubeen1
    @MrUsamamubeen1 6 months ago

    Once the model is trained, can we delete those audio files instead of just putting them in the "backup" folder? Or will they be needed for something else? Just asking because I like my folders clean, without any unnecessary files. Thank you.

    • @Jarods_Journey
      @Jarods_Journey  6 months ago +1

      Yep, you can delete them. If you don't need them for additional training, it just takes up space!

    • @MrUsamamubeen1
      @MrUsamamubeen1 6 months ago

      @Jarods_Journey Thank you

  • @poszukujacprawdy
    @poszukujacprawdy 8 months ago

    Hi Jarod, can I train my voice in Polish as well, or is it only for English?

  • @mohamedemam-5807
    @mohamedemam-5807 10 months ago

    Thank you for your useful content :) I have a question: is it possible to use models trained in RVC in Tortoise TTS?

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      Unfortunately not, they're different architectures so it won't work

  • @Razor7557
    @Razor7557 3 months ago

    Any suggestions on how to make it clone a voice that has certain effects applied to it? Specifically, I mean Mr. House from Fallout New Vegas. I have the voice files from the game, but they have a slight "speaking through a speaker" effect applied to them (which is kind of important to keep too...), and the results are pretty bad, sounding nothing like they should and/or turning into a completely different voice from one sentence to the next.
    Should I try training an entire model with them instead? If so, what settings would you recommend?

  • @Dr.Lagosta
    @Dr.Lagosta 2 months ago

    About the data set to train the model... what should I do to train my own voice? Record random text? How many clips, and what about the length?

    • @Jarods_Journey
      @Jarods_Journey  2 months ago

      I'd recommend recording your own voice, reading out a book or something similar. How you want the model to sound is how you should record your voice.

  • @KeremYurtsevenOfficial
    @KeremYurtsevenOfficial 6 months ago +2

    I already trained a voice model. So I only have a pth and an index file. How can I use those on TTS?

    • @corbinangelo3359
      @corbinangelo3359 5 months ago

      I'm very curious about that too. If you figured out a way, please let me know.

  • @LongevityLotusInn_
    @LongevityLotusInn_ 10 months ago

    Thank you so much Jarods. I'm wondering is it possible to set "speaking speed" when using tortoise?

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +1

      Unfortunately not, that is a randomized feature

    • @LongevityLotusInn_
      @LongevityLotusInn_ 10 months ago

      @Jarods_Journey Thank you for your reply! ❤ I also find that "CUDA out of memory" pops up easily on my computer, so is there any chance to run it online?

  • @RobertJene
    @RobertJene 9 months ago

    gettin ur 2024 video in early I see

    • @Jarods_Journey
      @Jarods_Journey  9 months ago +1

      If I put 2023 on it, it'd be outdated a month later 😂

  • @ScottBrown-s6q
    @ScottBrown-s6q 3 months ago

    Hey, so sorry for commenting again. I followed the tutorial and cloned my voice and it's great, but whenever I click generate it goes through the "generating autoregressive samples" process again, even though I have it set on the settings page.
    Anything spring to mind as to what I might've missed?

  • @23Puck666
    @23Puck666 24 days ago

    NICE RE:ZERO REFERENCE!