hmm Tiny TTs is definitely an interesting name
Took a bit… 🧿🧿
Would love to see a video on conversations with local agents.
Sky is back! Wooohooo!!! ❤❤❤❤
Thanks for making this video.
Were there any instructions on how to train voicepacks?
No, I don’t think they have released any.
Very helpful, thanks!
Any chance you could take a look at RealtimeSTT? And maybe put that and Kokoro into a single local conversational AI agent?
What I‘d use it for? Voice Chat, based on aya-expanse.
Sam, I can't access the shortened URL links. I can't name this URL shortener in my comment, but you know which one you are using. It either times out or is unreachable. Is anyone else bothered by this issue?
Is it possible to train your own model from scratch for a language other than US English?
Yes, or you could fine-tune this one to another language, but you would need some training code as well, which currently isn’t in the repo.
Is it possible to fade from one voice to another voice? Could help to find great voices. (With values in terminal)
Good question. Unfortunately it’s not really possible to fade between them, because you need to pass in the full embedding at generation time and you can only pass in one.
@samwitteveenai OK, so I should step both of the values from 0.0 to 1.0, word by word. 😆 Why not? Or at least generate the same sentence multiple times to compare.
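A minimal sketch of the sweep idea above, assuming a voice can be treated as a plain vector of floats and that a weighted average of two voices is still a usable voice (neither is confirmed in this thread). Since only one embedding goes in per generation, each mix weight produces one blended embedding, which would then be used to generate the same sentence once. `blend_voices`, `sweep_weights`, and the toy vectors are hypothetical names and data, not part of the model's code.

```python
from typing import List, Sequence


def blend_voices(voice_a: Sequence[float], voice_b: Sequence[float],
                 weight_b: float) -> List[float]:
    """Linearly interpolate two voice vectors (hypothetical representation)."""
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must be between 0.0 and 1.0")
    # weight_b = 0.0 gives voice_a unchanged, weight_b = 1.0 gives voice_b.
    return [(1.0 - weight_b) * a + weight_b * b
            for a, b in zip(voice_a, voice_b)]


def sweep_weights(steps: int) -> List[float]:
    """Evenly spaced mix weights from 0.0 to 1.0 inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [i / (steps - 1) for i in range(steps)]


if __name__ == "__main__":
    # Toy stand-ins for two voice embeddings; real ones would be loaded
    # from the voicepack files.
    voice_a = [0.0, 1.0, 2.0]
    voice_b = [1.0, 1.0, 0.0]
    for w in sweep_weights(5):
        mixed = blend_voices(voice_a, voice_b, w)
        # One generation per blended embedding: pass `mixed` to the TTS
        # call together with the same test sentence each time.
        print(f"weight_b={w:.2f} -> {mixed}")
```

Listening to the same sentence at each weight is a cheap way to hunt for a pleasant in-between voice, even though a fade inside a single utterance is not possible.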
This would be good for people who want to run something like Alexa locally at home. I know some people have been putting together systems for Home Assistant. While the OpenAI integration might sound slightly better, I'd consider this more than good enough to replace it without having to send your data to OpenAI.
Yeah, that is how I feel too. It’s not the best, but it is damn good.
Very interesting. Can we use it as a PDF reader that reads aloud in real time rather than after processing the whole text?
You would probably process a sentence or a line at a time (maybe even a paragraph, to help it with prosody), but it should be possible.
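A rough sketch of the sentence-at-a-time approach, assuming the text has already been extracted from the PDF. The `synthesize` function is a hypothetical placeholder for whatever TTS call you use, and the regex splitter is deliberately naive (it will break on abbreviations like "e.g.").

```python
import re
from typing import Iterator


def iter_sentences(text: str) -> Iterator[str]:
    """Yield sentences by splitting after '.', '!' or '?' followed by whitespace."""
    for part in re.split(r"(?<=[.!?])\s+", text.strip()):
        if part:
            yield part


def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a real TTS call; returns fake audio bytes."""
    return sentence.encode("utf-8")


def read_aloud(text: str) -> Iterator[bytes]:
    """Produce audio one sentence at a time so playback can begin early."""
    for sentence in iter_sentences(text):
        yield synthesize(sentence)


if __name__ == "__main__":
    page = "Hello there. How are you? Fine!"
    for chunk in read_aloud(page):
        # In a real reader, hand each chunk to the audio player here.
        print(len(chunk), "bytes ready")
```

Because `read_aloud` is a generator, the first sentence can start playing while later ones are still being generated. Switching to paragraph-sized chunks trades some latency for better prosody.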
Transformers js version coming soon from Xenova 👀
Is there a defined context length it can parse and process at a time? I want to test it out for large text sources.
Idk, but I just generated a 25-minute audio file and it took 5-10 minutes to generate.
Interesting -- definitely is fast for the quality
Do you know how to add a new language, like Indonesian?
To get a good result you would probably need to mix some real Bahasa audio into the training mix, or fine-tune it later. You might be able to do something with a phoneme dictionary, but you really need some example audio.
@samwitteveenai Is there a step-by-step tutorial on this?
Yes, adding a new language is something I would also be interested in...
Please enlighten us if you have any clue. 😊
I would appreciate a fine-tuning tutorial for a custom voice in any language.
Is it better than piper-tts? Piper is sooooo fast and decent.