Doesn’t seem so… natural. What exactly would be the use case of something like this?
Seems way better than 80% of the human narrators or dubs I've encountered. I can still sometimes hear the AI trembling noise in the background, but imo it is already useful for, idk, audiobooks.
The Hugging Face configuration he used is low quality.
If you install it locally, download the high-quality model that is available, and configure it to generate a high-quality version, it produces very smooth, human-like voices without robotic trembles.
P.S. Removing silence gives worse results than not removing it, so I recommend keeping it un-toggled.
What would be the use cases? We have been used to pre-built AI voices for so long; if it mimics the user's voice, wouldn't it be weird for the user to hear their own voice? The psychological trade-off has to be considered as well. Take the traditional approach between the user and an AI voice agent: after the voice agent responds and the query requirements are met, the user tends to feel satisfied. With this approach, even if the agent gives the correct response, wouldn't it be weird to hear it and accept it? Users rely on an AI agent because of the "trustability" feeling that the agent knows better. When the "mimicking" model mimics the user's voice, will the user get the same feeling they get from speaking with traditional voice agents? "This is rather a discussion than a directed question!" :)
Can I train it in my native language? Do you have any idea?
Yes, you can. A guy on YouTube did that for Japanese, but you will need some training data.
@ostelaymetaule How much computational power do I need for that? And did he give any information about how the dataset should be prepared?
Can you provide the video link here?
Hi, thanks, I would like to see a local install.
We'll consider adding a local install tutorial to our future videos!
you sound like a tts too dude
He's probably a robot. I mean... what makes you so sure you're not a robot? You think you're human? That's exactly how a robot/cyborg thinks!!
That’s not Spanish. I’m a Spanish guy.
It sounds like shit.