Ultravox has been doing this for a few months now and is open weights (Llama 3.1). You can simply train the projector to leverage a larger Llama model and do function calling as well.
Thank you for pointing out github.com/fixie-ai/ultravox. Really cool project! However, I'd like to seriously compare it against OpenAI's realtime + function calling capabilities. In the past, there have been many open-source voice transcription models claiming to be better than Whisper (which also has an open-source version). However, in all my testing, nothing was ever as good as Whisper accuracy-wise. Maybe it's the accent + medical domain? Would really love to see benchmarks on Ultravox and how it compares!
Oh how exciting, another idea for ambient AI scribe... At least you realize, it seems, that RealtimeAPI renders 90% of the currently available ambient AI scribes obsolete.
Haha, yes, there are so many out there. I agree. However, these are probably going to be "fundamental pieces" that people need to start integrating into their existing products rather than be released as a separate product. AI in my opinion is a "feature" and not a "product". More on this by MKBHD - ua-cam.com/video/sDIi95CqTiM/v-deo.html
Do you want to become a FHIR 🔥 expert? Watch my exclusive webinar recording here: link.medblocks.com/fhir-184245
Expecting more videos like this, Dr Sidharth! Insightful.
Felt good making this video!
This is amazing... there's lots of potential for this. Great demo.
Thank you! It's just the start ;)
Really interesting. Great content, just subscribed and hope to keep learning more stuff, ty
Welcome aboard!