My Top 5 Open Source Text to Speech Softwares Starting off in 2024

  • Published 29 Sep 2024

COMMENTS • 189

  • @RobertJene
    @RobertJene 8 months ago +25

    ⌚ Timestamps
    0:00 - introduction
    0:14 - Suno Bark
    1:22 - Valle-X
    3:00 - StyleTTS2
    4:07 - CoquiTTS - XTTS
    5:40 - Tortoise TTS

    • @ahmetab06
      @ahmetab06 8 months ago +1

      Which is the best TTS?

    • @kanavwastaken
      @kanavwastaken 8 months ago

      ​@@ahmetab06Tortoise

    • @RobertJene
      @RobertJene 8 months ago

      @@ahmetab06 watch the video

  • @Seeker_Now
    @Seeker_Now 3 months ago

    Hey man, thanks for your amazing work. I can't find any info on this-Is there a way I can integrate these trained AI voices with Balabolka? Hope you can answer this.

  • @grahamcookfera934
    @grahamcookfera934 11 days ago

    Clark Amy Jones Maria Wilson Brian

  • @dontrez8412
    @dontrez8412 5 months ago +8

    Pretty good results. I didn't think that first Bark voice was bad, though. Thanks for the comparisons.

  • @jonathandawson3091
    @jonathandawson3091 4 months ago +5

    Hi can you please make a tutorial of the Audiobook Maker, or how to create such pipelines? In particular you mentioned something along the lines of "RVCS" which seemed to make a dramatic difference in the last voice that you demonstrated! How is it done?

  • @LucidFirAI
    @LucidFirAI 8 months ago +1

    I would love for you to update ua-cam.com/video/zRjLFFU3INg/v-deo.html text to sing...

  • @vikramr60
    @vikramr60 8 months ago +33

    Coqui TTS is not open source, meaning it can't be used for commercial purposes, only for research and education.

    • @motionmix2523
      @motionmix2523 6 months ago +2

      It says commercial use now.

    • @lukasnesvarbu1485
      @lukasnesvarbu1485 6 months ago +1

      @@motionmix2523 I think it's because it died

    • @maikelm20
      @maikelm20 6 months ago +1

      Coqui XTTS is not for commercial use.
      It does have other models which are open source.

    • @willmedrano98
      @willmedrano98 3 months ago +3

      Not sure if CoquiTTS is open source or not, but open source does not mean that you can use for commercial purposes.

    • @DihelsonMendonca
      @DihelsonMendonca 2 months ago

      There are two different categories, tts-1 which is free and hd voices, which are not for commercial use.

  • @Superchunk-k2h
    @Superchunk-k2h 5 months ago +2

    Is there any tool that doesn't require downloading a 70 GB model?

  • @chaks2432
    @chaks2432 8 months ago +6

    I built a GUI for XTTS using flask and svelte and finally got rvc running yesterday. Got inspired by your audiobok_maker, but it was missing some features I figured could be pretty useful (Like allowing users to edit text inside the GUI, add/delete/reorder lines), I'm pretty happy with the result, even if the UI looks like crap and it's still a little buggy. I also got everything to run together, so I don't need the ai-voice-cloning webUI running for it to work

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      Awesome! Usually it's best to get things working first, then you can make it look pretty. Glad to hear!

    • @Hellosirrrr
      @Hellosirrrr 8 months ago

      You’re my new idol! What’s your GitHub?

    • @TheMegadeth350
      @TheMegadeth350 6 months ago

      Hey. I am working on a similar project myself and I have a couple of questions. Could you please give me a way to contact you?

  • @rohanrjoshiimakemyvid7285
    @rohanrjoshiimakemyvid7285 4 days ago

    Can you suggest some totally free APIs that I can integrate on my website. I am looking for "videos" related API. It will be great help

  • @DihelsonMendonca
    @DihelsonMendonca 2 months ago

    💥 You chose some really weird voices, bro. Looks like a horror movie. These Japanese voices suck. Some of these. I use Coqui and it has fantastic voices. Also, you didn't even mention the best one: ELEVEN LABS. Unmatched! 🙏👍💥

  • @spiritual_audiobooks
    @spiritual_audiobooks 7 months ago +1

    An open-source, local, fast neural text-to-speech system that sounds great is Piper TTS.

  • @keithmorse9716
    @keithmorse9716 6 months ago +1

    It seems like you have this targeted at a broader audience than the one I belong to. I'm dyslexic, so I'm trying to find something to help me stay interested in material while having difficulty reading consistently.

  • @williamwallace9826
    @williamwallace9826 4 months ago

    Why did you bother making this video? It is NOT helpful. Text-to-speech is about reading typed text, it is NOT about recording and synthesizing your voice.

  • @gabrielv.4358
    @gabrielv.4358 4 months ago

    Very good recommendations, but they need to be installed, which I don't like... And you probably need a $1000 GPU also.

  • @HugginsAmy-d3h
    @HugginsAmy-d3h 8 days ago

    Walker Mark Robinson Kevin Rodriguez Sarah

  • @nielsieboy19
    @nielsieboy19 8 months ago +2

    From what I've seen StyleTTS does a much better job of cloning a voice, it's also an order of magnitude faster than Tortoise. Only thing holding it back are the absolutely mental VRAM requirements for training and multilingual models (which are being worked on by the community).

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      The samples I've heard have been pretty awesome and I agree on the speed as well

  • @THEKL7773
    @THEKL7773 2 months ago

    They all feel like lazy code people threw out there; none of them even has a proper UI or can even be considered a proper program. You literally have to do all the work rather than getting something you can just throw text at and have it work. Imagine actually working on a full book with this; it'd be a nightmare. Why are you all just cool with this level of shit?

  • @Lenox-bp3lu
    @Lenox-bp3lu 1 month ago +1

    Which one of these did you use for the accent conversion at the start of this video? Please and thank you

  • @SyamsQbattar
    @SyamsQbattar 2 months ago

    Do those local AI voices support the Indonesian language?

  • @NathanieiLaura-l9j
    @NathanieiLaura-l9j 18 days ago

    Lee Brenda Thomas Shirley Gonzalez Richard

  • @mohsenghafari7652
    @mohsenghafari7652 3 months ago

    Hi,
    Does the Coqui AI library support the Persian language?
    Thanks

  • @BuyTrustpilotReviews-vn8bu
    @BuyTrustpilotReviews-vn8bu 26 days ago

    Williams Elizabeth Thomas Lisa Anderson Laura

  • @davidtindell950
    @davidtindell950 7 days ago

    NEW SUBSCRIBER! Arigato 😮

  • @OleneHadden-v2p
    @OleneHadden-v2p 19 days ago

    Williams Timothy Thomas Anna Anderson David

  • @SophiaBlanche-y4v
    @SophiaBlanche-y4v 14 days ago

    Miller William Martin Angela Rodriguez Angela

  • @ferysery
    @ferysery 6 months ago +1

    Hi. Where can I get my hands on your AUDIOBOOK MAKER Windows desktop app?

  • @MildredRubio-s6g
    @MildredRubio-s6g 27 days ago

    Lopez Laura Taylor Jessica Anderson Melissa

  • @BuyTrustpilotReviews-vn8bu
    @BuyTrustpilotReviews-vn8bu 1 month ago

    Hernandez Scott Harris George Rodriguez Scott

  • @JohnHernandez-e6v
    @JohnHernandez-e6v 21 days ago

    Perez Angela Brown Charles White Donna

  • @everybodyguitar5271
    @everybodyguitar5271 5 months ago

    Bark is really slow when training on a Mac.

  • @RoyLisi-u4q
    @RoyLisi-u4q 29 days ago

    Hernandez Linda Walker Patricia Taylor Karen

  • @HOWDO7
    @HOWDO7 4 months ago

    Is there any software or TTS tool that has Caribbean accents?

  • @Edward_ZS
    @Edward_ZS 6 months ago +1

    Which option runs the fastest?
    And do any of these work without a GPU?

    • @Jarods_Journey
      @Jarods_Journey 6 months ago

      In this example, StyleTTS is the fastest. They do work on CPU; it's just much too slow to be usable at the moment.

  • @0chiel
    @0chiel 8 months ago +1

    Dumb questions (novice):
    - Are these free to use commercially?
    - Can a standard M1/M2-level Mac run them locally?
    Thank you

    • @psalmy26
      @psalmy26 8 months ago +1

      Find their repos and look at their licenses. Anything with an MIT license is; other things get a bit more nuanced.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      I'm not a lawyer, but the ones with MIT licensing are Tortoise TTS, Bark, and VALL-E X. XTTS has a non-commercial license for their free stuff, and StyleTTS has a unique one where I think you have to include a disclaimer about it... That is, unless you train up your own base models.

  • @me-cm8or
    @me-cm8or 8 months ago +1

    Do all of these, or any one of these, have a local API that lets you link them up with other local apps through API calls?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      With any Gradio interface you should be able to... But the only one I know for sure is Tortoise TTS with the AI voice cloning repo.

  • @DavidSeguraIA
    @DavidSeguraIA 8 months ago +1

    Thanks. So which is the best open-source option for Spanish TTS or voice cloning?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      XTTS is your best bet. It just has some licensing things you'd need to look at.

  • @svenbjorn9700
    @svenbjorn9700 6 months ago +1

    Where can we get the Audiobook Maker app? It’s not linked in the description.

    • @ferysery
      @ferysery 6 months ago

      huggingface

  • @idkman8520
    @idkman8520 7 months ago +1

    Hi! I love your videos!!
    Quick question! I made a chatbot... how do i use these voices??

    • @sigma_z
      @sigma_z 6 months ago

      You integrate it in.

  • @mikeg9b
    @mikeg9b 4 months ago

    It's 2024 and I still use espeak.

  • @christophermoreira6198
    @christophermoreira6198 3 months ago

    What about PiperTTS?

  • @beetlejuss
    @beetlejuss 1 month ago

    I am just starting to learn about TTS, but it seems you can run an app with different models, for example coquiTTS with bark model or tortoise model etc. But it seems from your video you are comparing models not software. Or maybe I am confused. Also can you please make a video about the best options for CPU, I don't have a GPU.

  • @FatheredPuma81
    @FatheredPuma81 1 month ago

    If Tortoise is the number 1 then I think AI TTS isn't worth using right now. I just don't have the time to manually transcribe dozens of lines for training and manually regenerate each line a dozen times until it gets it right.

  • @stefanomonziocompagnoni8302
    @stefanomonziocompagnoni8302 8 months ago +1

    hi Jarods!
    Nice video!
    I'm looking for software (better if open source) that changes a recorded audio voice.
    I mean, If I record my voice, I would like to use a different voice, keeping my prosody, tone, speed, etc....just changing the timbre.
    Any advice?

  • @wagnerfreitas3261
    @wagnerfreitas3261 4 days ago

    thanks bro

  • @AstridKey
    @AstridKey 3 days ago

    more on this

  • @marcusunivers
    @marcusunivers 8 months ago

    Is there also an open-source text-to-singing vocal generator? 🤔
    Something like Vocaloid, UTAU, SynthV or ACE Studio, where you can also add MIDI information to pitch your vocal? ☺

  • @tylerchambliss8379
    @tylerchambliss8379 8 months ago +1

    Hey Jarrod. It's me Tyler again, and I'm still having issues training models on my machine with Tortoise. I've set the batch size and gradient accumulation as low as I can and it's still not training. It just gets to the loading auto regressive model and doesn't go any further. Might some of these other TTS models be easier for me to use instead? I'm just about ready to give up on Tortoise because it's been almost 2 months and I still can't figure it out.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      I see your post on GitHub, I'll have to get back to you on this tomorrow!

  • @trilogen
    @trilogen 2 months ago

    I don't like how the word Open Source is used loosely. Open Source means it can be used for commercial purposes and none of these give you that type of license.

  • @braivco
    @braivco 3 months ago

    Hey Jarod, would you consider building an XTTS > RVC pipeline app similar to what you've built with Tortoise?

  • @danzai
    @danzai 4 months ago

    Nice, but i'm not interested in cloning.. just a realistic voice. Would be nice if you made a tutorial on how people can just set these up

  • @expandablevictor7858
    @expandablevictor7858 5 months ago

    Suno Bark is a scam, try it out, it is miles away from the advertisement. It doesn't laugh like that by the way.

  • @moqingyu730
    @moqingyu730 1 month ago

    Hi, excellent work, have you found out which one is the fastest one?

  • @robertbutcher222
    @robertbutcher222 4 months ago

    Sorry if this is a bad question for this sort of video, but is there a way to use one of these in Linux Mint or Ubuntu to read selected text? I like to have selectable text read to me when I highlight text with the mouse cursor. There is a script I could make, if I find the instructions again, but the voice is very robotic. So, I was wondering if one of these could somehow be used, preferably offline.

  • @Ulibert
    @Ulibert 8 months ago

    hey can you tell us where to get SAPI5 file of text to speech tagalog accent?

  • @noonesbiznass5389
    @noonesbiznass5389 8 months ago

    It's too bad Bark doesn't seem to have been maintained or improved in the last year or so. It's the only one that has any form of truly convincing inflections, granted at very poor quality and with lots of hallucinating.

  • @blackswan6386
    @blackswan6386 4 months ago

    Bro, why skip the installation part? How can I get this to run? It says I need Python? Would be cool if you could give some help.

  • @orpit48
    @orpit48 4 months ago

    I'm searching for a software to train my own voice models and use it tts, is there an option you could recommend?

  • @Mowgi
    @Mowgi 8 months ago +3

    Tortoise wins out, but I'm very interested in seeing more from Eleven's. From these examples, Coqui definitely seems to get the closest to your voice out the box, but the actual quality of the audio sounds very low. Is there a way to set the bitrate?

    • @PROJECTSSourceEngineLessons
      @PROJECTSSourceEngineLessons 8 months ago

      If we're talking about XTTS, the native output is about 22 kHz, but there is a resample function to 44 kHz; of course, artifacts may appear.

    • @blender_wiki
      @blender_wiki 8 months ago +1

      It's not about bit rate; you get the file as WAV. The issue is that the XTTS model works at a temporal resolution (sample rate) of 22.05 kHz, so you must have a very good recording and EQ your voice sample with this in mind, for example if certain harmonics are "flanging".
      The voice at the beginning of the video is generated with XTTS: ua-cam.com/users/shorts8YyHxD42k-A
      Still not perfect, but better than the example shown here, just because the sample provided to the model is recorded and prepared in a better way.
      We are trying to train a fresh model with a 44.1 kHz dataset.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      I think the commenters have got you answered here, but not to my knowledge. That's why running through RVC is kind of an "audio upscaler", as I am using the 48k models in RVC.
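
On the 22.05 kHz point in the thread above: doubling the sample rate only adds in-between samples, it does not bring back any content above the original Nyquist limit of about 11 kHz, which is likely why the resampled output does not sound sharper. The sketch below is purely illustrative and uses naive linear interpolation in plain Python; the function name is hypothetical, and this is not the resampler that XTTS or any of the tools mentioned here uses (real tools generally use band-limited resampling).

```python
def upsample_linear(samples, src_rate=22050, dst_rate=44100):
    """Naively resample a list of floats by linear interpolation.

    Illustrative only: the new samples are blends of existing neighbours,
    so no new high-frequency detail is created.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # position in the source signal
        lo = int(pos)                        # sample at or before pos
        hi = min(lo + 1, len(samples) - 1)   # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Three source samples at 22.05 kHz become six samples at 44.1 kHz.
doubled = upsample_linear([0.0, 1.0, 0.0])
print(len(doubled), doubled)
```

This is consistent with the replies above: a better-prepared reference recording, or a pass through a model trained at a higher rate, changes the content, while plain resampling only changes the container.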

  • @Lenox-bp3lu
    @Lenox-bp3lu 1 month ago

    Nevermind my last comment, you answered it at the end of the video

  • @spacemule1
    @spacemule1 5 months ago

    suno straight gross

  • @ElmorenohWTF
    @ElmorenohWTF 8 months ago

    Please make a tutorial on how to use Google Colab to train the AI that you think gives the best multilingual results.

  • @PLAGUEDOCTOR2006
    @PLAGUEDOCTOR2006 5 months ago

    how do i have 2 views and 129 comments i think you guys are bots

  • @hassanawan3622
    @hassanawan3622 6 months ago

    How can I deploy a trained Tortoise TTS model in a Flask web app?

  • @ea02ca6f
    @ea02ca6f 7 months ago

    Why not order the links in the description in the same order they are mentioned in the video with missing links added?

  • @ruudygh
    @ruudygh 6 months ago

    What is that Audiobook? How does it magically turn bad audio into good audio?

  • @Morpheus.999
    @Morpheus.999 3 months ago

    I can't find the audio sample at 3:21. How can I download it?

  • @davidhazout6009
    @davidhazout6009 1 month ago

    Very smart !! do you have a French voice for me ?

  • @king-zu3ih
    @king-zu3ih 6 months ago

    Can you suggest any project that can generate audio of people singing or rapping a song? Thank you.

  • @RobertJene
    @RobertJene 8 months ago

    intro be like "why are you bri-ish"

  • @khajask8113
    @khajask8113 4 months ago

    Which one is best for cloning my own voice?

  • @qodeninja
    @qodeninja 7 months ago

    Do you have videos on setting this up? oh yes look at that you do

  • @Nightcortex
    @Nightcortex 3 months ago

    How can I get pre trained models?

  • @KJ7JHN
    @KJ7JHN 2 months ago

    many of these voices are fantastic!

  • @sujetodelta1019
    @sujetodelta1019 6 months ago

    I have a question: if I plan to use a TTS for voice calls through a virtual audio cable, can any of these do that instead of making me download the audio files?

  • @trush1090
    @trush1090 8 months ago

    Hi Jarod can you make a video on how to resume training? Say I finished training at 50 epochs. How would I add 50 more without resetting.
    Also how to eliminate static sounds from generated sounds. I trained 2 hours on 60 epochs just for it to have a static sound.

  • @opaleyeakintunde4827
    @opaleyeakintunde4827 8 months ago

    How can I fine-tune or configure my Tortoise TTS to clone as well as yours does? Thank you

  • @breakmillions2347
    @breakmillions2347 5 months ago

    not gonna show how to run it locally?

  • @ALAN-lv1zj
    @ALAN-lv1zj 6 months ago

    Are all these free for commercial use?

  • @Because_Reasons
    @Because_Reasons 7 months ago

    What does running RVC afterwards do? Do you have a tutorial?

  • @lismoiunehistoire
    @lismoiunehistoire 6 months ago

    are these free for commercial use?

  • @johnlenoob6951
    @johnlenoob6951 8 months ago

    Hi Sir, thanks for all of your original and well-done content!!! Could you give some tips and tricks for becoming an organized guru dev like you? I'm lame at managing my Python envs and all my AI projects ;) Tried env, conda/miniconda ...

  • @blender_wiki
    @blender_wiki 8 months ago

    A short comment just to wish you a happy new year and improve your YT engagement score.
    I think you should check your recording workflow and final sample quality, because all your voices sound too robotic compared to what I am used to with these tools. Maybe you record with your mouth too close to the microphone and the low frequencies of your voice are too present; these models don't like too much low-frequency content. Maybe it's enough to do an EQ with -3 dB under 80 Hz.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      Happy New Year to you too! It's funny you mention that, because the dataset of my voice is a pretty crappy one. I just grabbed it from a YouTube video, and the EQ I do on my YouTube videos is generally bass-enhancing 😅. I'd say models trained on my voice are not as good as others that I've done with other voices.
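
For readers unsure what the "-3 dB under 80 Hz" suggestion above means in numbers: decibels map to a linear gain, and a 3 dB cut leaves roughly 71% of the amplitude (about half the power) in the affected band. The short sketch below only does that conversion; the function names are hypothetical, and it does not implement the EQ filter itself.

```python
def db_to_amplitude_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level change in decibels to a linear power ratio."""
    return 10 ** (db / 10)


cut_db = -3.0
print(f"{cut_db} dB scales amplitude by {db_to_amplitude_gain(cut_db):.3f}")
print(f"{cut_db} dB scales power by {db_to_power_ratio(cut_db):.3f}")
```

In other words, the suggested cut is gentle: it tames the bass below 80 Hz without removing it.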

  • @RobertJene
    @RobertJene 8 months ago

    you forgot to put the link for Tortoise TTS in the descriptables

  • @enriquemontero74
    @enriquemontero74 8 months ago

    I also need to add open voice here

  • @RobertJene
    @RobertJene 8 months ago

    6:15 - what do you mean, pipeline from Tortoise TTS to RVC? Like you train a model in Tortoise and then use it in RVC or something?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      This video I think: ua-cam.com/video/MckT7z7W_qM/v-deo.html
      But yeah, run tortoise audio into RVC to make it sound better

  • @poco7193
    @poco7193 8 months ago

    With Tortoise TTS I have been having issues training it. I upload my audio for training and go through the first two steps smoothly, but when I actually try to run the training it freezes with some text and never unfreezes, no matter how long I wait. Also, I wanted to know what the second piece of software was that you used in this video to make the Tortoise TTS output sound smoother. I am trying to make a podcast for a school project and desperately need a smooth TTS for some of my characters.

  • @Powerlevelover9000
    @Powerlevelover9000 8 months ago

    Does anyone here know a good database for TTS voice models? RVC has a lot of voice models available, but I can't seem to find a good database, be it a website, Discord, etc., for Tortoise TTS.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      Tortoise TTS, not too widely adopted unfortunately. Haven't seen anything pop up

  • @Bigjuergo
    @Bigjuergo 8 months ago

    Can you explain No. 5 in more detail, please?
    How does the audiobook maker work?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      Probably wanna check out this video! ua-cam.com/video/xbheTi1YjnM/v-deo.html

  • @billyindrajaya
    @billyindrajaya 8 months ago

    Hi Jarod... why didn't you give your subscribers links to Google Colabs?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      Any of the repos that have Colabs will have them on their GitHubs.

  • @luigivitofrancesco6221
    @luigivitofrancesco6221 8 months ago

    Which is the fastest to install? like wokada, so just one click to install everything

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      Probably Tortoise TTS, as I have an installable package for that one on YouTube.

  • @k9clubme
    @k9clubme 8 months ago

    Thank you very much for sharing your knowledge with us. Is there a way that we can modify a lyric of a song and then make it sound as if the artist is singing the revised lyric?

  • @dohyunio
    @dohyunio 7 months ago

    Could you include your mic in your hardware list?
    Great vid!

  • @CooloSolo
    @CooloSolo 8 months ago

    No 1 suno is awesome

  • @AntiAnti
    @AntiAnti 8 months ago

    Is any of them ready to use via a local HTTP server? I mean, I want to send JSON HTTP requests from another app and receive audio data.
    I know it should be very easy to do, but Python isn't my thing.

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      You can use the Gradio interfaces for this. Tortoise launches on localhost:7860, so you can interact with it using the Gradio API, which you'll find linked at the bottom of the Gradio page.

    • @AntiAnti
      @AntiAnti 8 months ago

      @@Jarods_Journey Found it. Thanks.
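
As a rough illustration of the Gradio route described above: the sketch below assumes the separately installed `gradio_client` Python package and a Gradio app already running on the port mentioned in the reply. The helper names are hypothetical, and the actual endpoint names and parameters depend on the app; they are not given in this thread, so the sketch only asks the app to describe its own API.

```python
def local_gradio_url(host: str = "localhost", port: int = 7860) -> str:
    """Build the base URL of a locally running Gradio app."""
    return f"http://{host}:{port}/"


def list_endpoints(url: str) -> dict:
    """Ask a running Gradio app which API endpoints it exposes."""
    # Imported lazily so the helper above works without the package installed.
    from gradio_client import Client  # assumption: `pip install gradio_client`

    client = Client(url)
    # The returned description lists endpoint names and their parameters;
    # those names are app-specific and must be read from this output.
    return client.view_api(return_format="dict")


# Usage, with the web UI already running locally:
#   info = list_endpoints(local_gradio_url())
#   print(info)
```

Once the endpoint name and its inputs are known from that listing, a call would go through `client.predict(...)` with the matching `api_name`; for non-Python apps, the same page at the bottom of the Gradio UI documents the equivalent HTTP requests.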

  • @TerrennonPriv
    @TerrennonPriv 8 months ago

    By the way, thank you Jarod. Update on my side: for my lore/language project, XTTS was the way to go and I'm happy with the results.

  • @timeship
    @timeship 6 months ago

    Thanks for everything. BTW, what was that A.I. Audiobook Maker you showed in the video? I can't seem to find it anywhere ;-) THX

    • @Jarods_Journey
      @Jarods_Journey 6 months ago +1

      Search for "AI audiobook maker" on YouTube; it should be in the search results :)!

    • @timeship
      @timeship 5 months ago

      @@Jarods_Journey, I tried, but it lists millions of A.I. voice makers, and not the software on your screen. Which company made it? Give me some clue ;-) THX

  • @ArlGeales
    @ArlGeales 3 months ago

    Thank you!

  • @EmpowerMuse
    @EmpowerMuse 8 months ago

    Have you tried GPT SoVITS tts?

    • @Jarods_Journey
      @Jarods_Journey 8 months ago

      Still trying it out, but the audio isn't too bad. It's not completely finished and the process is difficult to follow as there are areas I'm running into difficulties with, so I'm still waiting a bit on it.

  • @dthSinthoras
    @dthSinthoras 8 months ago

    Which ones can handle German well? Tortoise was failing very hard when leaving English...

    • @Jarods_Journey
      @Jarods_Journey 8 months ago +1

      I think XTTS is the only multilingual one. Bark is as well, but its quality is not there.

    • @dthSinthoras
      @dthSinthoras 8 months ago

      Thank you!