You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT
- Published 13 Apr 2024
- Run live speech transcription on Raspberry Pi 5 with faster-whisper and WhisperLive, see the transcription results as they are processed, and send the final output to an LLM or TTS. Less finicky than SDL2, WhisperLive uses PyAudio for audio capture instead. Tested with two microphones: the ReSpeaker 2-Mics Pi HAT and the ReSpeaker USB Mic Array.
How to make whisper.cpp transcribe faster? (audio_ctx explanation)
/ how-to-make-cpp-102337630
Microphones:
www.seeedstudio.com/ReSpeaker...
www.seeedstudio.com/ReSpeaker...
Barebone single-threaded implementation of FasterWhisper transcription from the microphone
gist.github.com/AIWintermuteA...
My fork of whisper.cpp Python bindings
github.com/AIWintermuteAI/whi...
My fork of Whisper live - git clone this to follow along the video
github.com/AIWintermuteAI/Whi...
faster-whisper repository
github.com/SYSTRAN/faster-whi...
Piper TTS
github.com/rhasspy/piper
- Science & Technology
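For reference, the barebone single-threaded approach from the gist above boils down to a capture-and-transcribe loop like the following. This is my own minimal sketch, not the gist's exact code: the `tiny.en` model name, chunk size, and 3-second window are assumptions.

```python
# Minimal single-threaded mic -> faster-whisper loop (a sketch; chunk size,
# window length, and model name are illustrative, not the video's exact values).
import numpy as np

RATE = 16000          # Whisper models expect 16 kHz mono audio
CHUNK = 1024          # PyAudio frames per read
WINDOW_SECONDS = 3    # transcribe roughly every 3 s of captured audio

def int16_bytes_to_float32(raw: bytes) -> np.ndarray:
    """Convert signed 16-bit little-endian PCM bytes to float32 in [-1, 1]."""
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

def main():
    import pyaudio
    from faster_whisper import WhisperModel

    model = WhisperModel("tiny.en", device="cpu", compute_type="int8")
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    frames = []
    try:
        while True:
            frames.append(stream.read(CHUNK, exception_on_overflow=False))
            if len(frames) * CHUNK >= RATE * WINDOW_SECONDS:
                audio = int16_bytes_to_float32(b"".join(frames))
                segments, _ = model.transcribe(audio, language="en")
                for seg in segments:
                    print(seg.text)
                frames.clear()
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

# main()  # uncomment to run on the Pi with a microphone attached
```

WhisperLive adds buffering, VAD, and a client/server split on top of the same basic flow.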
I have been following you for a long time and you have done such a fantastic job of crafting your style and keeping your content relevant. Great job Dmitry!
Thank you for leaving this comment!
To tell the truth, I'm still refining my style. One thing I've been successful at recently (I think) is keeping my videos more to the point, with a good flow of information. Looking back, I was blabbering way too much in some of my older videos. Now I cut a lot of material in post-processing if I feel the video is overloaded.
I plan to make some more storytelling-oriented robotics content over the next half-year; stay tuned and see how it goes.
Thank you for making this follow up!
Appreciate your support
Excellent work, keep it up!!! Shared on Twitter too.
Thanks for sharing!!
I'm trying to make an AI voice assistant and would be completely lost without your videos. Thanks so much!
Glad I could help!
Thank you so much! You’ve really helped me speed up my project. I normally don’t like and subscribe but I made an exception 🙃. Keep it up!!
Thank you for your support!
Hi there, could you please advise the best and easiest way to transcribe mp3 speech recordings to text with no coding experience at all? Thank you.
Thank you for sharing your knowledge. I'm trying to do "float16" STT transcription with diarization using WhisperX on an 8GB Pi5, but "the ctranslate2 package does not compile with CUDA support." Per the whisperx readme, I tried to install pytorch v11.8 from the PyTorch pip command, and then I tried the current version, before trying to install whisperx with no joy. Apologies if this is a silly question, but is there a CUDA version that works on a Pi5 GPU (Broadcom VideoCore VII), or must I only use CPU CUDA? What do you recommend? Thanks!
This is too good... I think this could fit directly into one of my projects. Do you have any recommendations for real-time TTS?
Hopefully!
I used espeak before for other projects... it is pretty horrible by modern standards, but does its job.
For this example I used Piper TTS - much better quality, but not as fast as espeak.
Thx
My pleasure! Thanks for commenting!
Hi,
I am getting the following error while running the fork... any ideas?
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2'.
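This is NumPy's standard 2.0 ABI-break warning: the fork's compiled modules were built against NumPy 1.x. The usual workaround is to pin NumPy below 2.0 in the same environment (`pip install "numpy<2"`). A small version check like the one below (a generic sketch, not part of the fork) can fail fast with a helpful message instead of crashing later:

```python
def numpy_is_compatible(version: str) -> bool:
    """Return True if the given NumPy version predates the 2.0 ABI break."""
    return int(version.split(".")[0]) < 2

# Example startup guard (hypothetical usage):
# import numpy as np
# if not numpy_is_compatible(np.__version__):
#     raise SystemExit('Modules built for NumPy 1.x; run: pip install "numpy<2"')
```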
Hey, just wanted to say I really appreciated your last two videos. Will you please be my sensei? Thank you!!
I appreciate your appreciation! xD
I'd say that I'm already a sensei of sorts... You always can support me on Patreon for some extras, but otherwise simply stay tuned for more videos!
Like!!! Dima, awesome content! What do you think about the VOSK API, and how does it compare to Whisper? Great example of Piper TTS. Thank you!
Thanks, appreciate it!
I'll try it out and compare them - I don't think I'll make a video about it, but maybe a blog article :)
I've been trying to find a way to make end of speech flag to be more intelligent than just detecting a pause. I find it common that I may have a mental blank, or misspeak, and the delay in my speech incorrectly flags end of speech. It would be interesting if STT systems can continue listening after a pause if it detects an incomplete sentence. Any ideas?
That's a hard one. I don't think this is solved even in commercial STT engines, e.g. Google Assistant or Siri.
It would require understanding of sentence context. We might be getting somewhere with multi-modal models such as GPT-4o, but I don't think there is anything available that can run on a Raspberry Pi form-factor computer.
Also, as a shortcut, perhaps it would be possible to either run a classifier or modify the Whisper model to output the probability of a sentence being finished... It's just an idea though; finding out how well it would work is another thing entirely.
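As a rough illustration of that classifier shortcut, here is a purely heuristic "sentence finished" check. Everything in it - the regex, the filler-word list, the function name - is my assumption for illustration, not anything from the video; a real solution would use a trained model. The idea is to extend the VAD silence timeout whenever the partial transcript looks incomplete:

```python
import re

# Sentence-final punctuation that Whisper typically emits
SENTENCE_END = re.compile(r"[.!?…]\s*$")
# Words that suggest the speaker paused mid-thought (illustrative list)
FILLERS = ("um", "uh", "er", "and", "but", "so", "because")

def looks_finished(transcript: str) -> bool:
    """Heuristic stand-in for a 'sentence finished' classifier: treat the
    utterance as incomplete if it lacks sentence-final punctuation or
    trails off on a conjunction/filler word."""
    text = transcript.strip()
    if not text:
        return False
    if not SENTENCE_END.search(text):
        return False
    last_word = re.sub(r"[^\w']", "", text.split()[-1]).lower()
    return last_word not in FILLERS
```

In a pipeline, `looks_finished(partial_transcript)` returning False after a pause could trigger a longer grace period before the end-of-speech flag is raised.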
Please advise on a hardware setup for offline RAG, TTS and STT.
Hard to estimate without knowing the details.
The real implementation uses a websocket. The idea is: the app transmits raw 16 kHz PCM audio, the WS server captures those audio packets, sends them to Whisper for transcription, and returns the result to the app as JSON.
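That flow can be sketched as follows. This is a hedged sketch, not the commenter's actual server: the `websockets`-style async handler, 3-second buffering threshold, and JSON field names (`segments`, `start`, `end`, `text`) are all my assumptions.

```python
import json
import numpy as np

def pcm16_to_float32(chunk: bytes) -> np.ndarray:
    # Raw 16 kHz signed 16-bit little-endian PCM -> float32 in [-1, 1],
    # the format faster-whisper accepts directly.
    return np.frombuffer(chunk, dtype=np.int16).astype(np.float32) / 32768.0

def to_json(segments) -> str:
    # Package (start, end, text) tuples as the JSON reply sent back to the app.
    return json.dumps({"segments": [
        {"start": s, "end": e, "text": t} for s, e, t in segments]})

async def handler(ws, model):
    # Collect PCM packets from the app; once ~3 s of 16-bit mono audio has
    # accumulated, transcribe it and reply with JSON.
    buffer = b""
    async for packet in ws:
        buffer += packet
        if len(buffer) >= 16000 * 2 * 3:
            segments, _ = model.transcribe(pcm16_to_float32(buffer))
            await ws.send(to_json((s.start, s.end, s.text) for s in segments))
            buffer = b""
```

One could wire this up with e.g. `websockets.serve(functools.partial(handler, model=model), ...)`; the exact message framing between app and server is up to the implementation.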
do you use the 8gb raspberry pi?
Yes, a Raspberry Pi 5 with 8 GB - but RAM is hardly relevant here for the tiny.en model.
@Hardwareai Thinking of getting the Pi 4 with 1 GB RAM. Shouldn't be an issue to replicate, hopefully.
Can a Raspberry Pi 5 run Whisper using Python?
Yes, absolutely!
Thx
Appreciate it!