I got Coqui TTS running under conda on Windows the other day. Not sure if it would be appropriate here, but you can pipe text to it and get a WAV back. Maybe the web interface could send the text to the backend, get the WAV, and play the file in the browser. Coqui has some sort of web interface of its own, but I don't know if it has an API to plug into. Some of the more elaborate voice models, like VITS, can have multiple speakers.
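Rough sketch of the "text in, WAV out" part, in case it helps. It assumes the `tts` command that Coqui installs is on the PATH inside the conda env. The helper names (`build_tts_command`, `synthesize`) are made up for illustration, and the model and speaker in the usage note are just examples of a multi-speaker VITS model, not a recommendation. The backend would call something like this and hand the bytes back to the browser.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def build_tts_command(text: str, out_path: Path,
                      model_name: Optional[str] = None,
                      speaker_idx: Optional[str] = None) -> List[str]:
    """Build the argument list for Coqui's `tts` command-line tool."""
    cmd = ["tts", "--text", text, "--out_path", str(out_path)]
    if model_name:
        # Omitting this lets the CLI fall back to its default model.
        cmd += ["--model_name", model_name]
    if speaker_idx:
        # Only meaningful for multi-speaker models.
        cmd += ["--speaker_idx", speaker_idx]
    return cmd


def synthesize(text: str, out_path: Path,
               model_name: Optional[str] = None,
               speaker_idx: Optional[str] = None,
               runner: Callable = subprocess.run) -> bytes:
    """Run the CLI and return the WAV bytes it wrote to out_path.

    `runner` is injectable (hypothetical design choice) so the function
    can be exercised without Coqui installed.
    """
    cmd = build_tts_command(text, out_path, model_name, speaker_idx)
    runner(cmd, check=True)  # raises CalledProcessError if tts fails
    return Path(out_path).read_bytes()
```

Usage would look something like `synthesize("hello there", Path("out.wav"), model_name="tts_models/en/vctk/vits", speaker_idx="p225")`, and the web side just needs to serve the returned bytes as `audio/wav`. Spawning a process per request reloads the model every time, so this is only a starting point.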