You won't believe how fast it is | Raspberry Pi Speech-to-Text
- Published Dec 15, 2023
- Faster than real-time offline speech transcription on Raspberry Pi - or any other computing system, including Orange Pi, Jetson Nano and many other Linux SBCs. A quick hands-on guide from installing necessary packages to running Whisper model with whisper.cpp or faster-whisper.
Whisper.cpp Python bindings repository:
github.com/AIWintermuteAI/whi...
faster-whisper:
github.com/SYSTRAN/faster-whi...
Benchmark gist:
gist.github.com/AIWintermuteA... - Science & Technology
The follow-up video is also live on YouTube - find it on my channel.
Support my work on making tutorials and guides on Patreon!
www.patreon.com/hardware_ai
I need this because I'm building a translator for my sister. There's a new person in her class who can only speak Spanish, so I'm making this.
Good usage!
Love this video! I rarely find myself pausing and rewinding, but here the details were coming fast enough that I became the weak link. Love this.
Glad to hear I found the right pace. Thank you for the feedback!
Hey this is incredible. really appreciate your work
Thank you so much 😀
Very cool guide. Thank you.
Glad you enjoyed it!
Hello! Great work, will try to test it. Your projects have been interesting to me since the Kendryte K210.
Thanks! I see you have been following my channel for a while :)
I'm using your repository. Thank you.
Thanks for the feedback!
Thanks for the work done fixing the whisper.cpp python bindings! I'll check them out.
😊
Yes, let me know if you run into any issues.
I'm going to try to install it.
There is a known issue at the moment: github.com/AIWintermuteAI/whispercpp/issues/88#issuecomment-2171120795
I'll be fixing it once I get back from traveling, beginning of July.
@@Hardwareai Certainly let me know. I'm excited to try it. Happy traveling!! Be safe!
Well done! Was whisper.cpp compiled with BLAS optimizations?
No, it wasn't. It is a possible way to slightly improve the results, but at least on Raspberry Pi it will not change the outcome much; faster-whisper will still be faster. The Jetson series, on the other hand, might take advantage of cuBLAS, so it is more interesting there.
Hi! This is amazing! Thank you very much! Just a quick question, should it work on Windows? Because I get an error when I run "python -m build -w":
* Building wheel...
running bdist_wheel
Building pybind11 extension...
error: [WinError 193] %1 not a valid Win32 app
ERROR Backend subprocess exited when trying to invoke build_wheel
Thank you for the feedback!
While theoretically it SHOULD run on Windows as well, I only tested it on Raspberry Pi (so Debian Linux) and macOS...
Would you consider showing how to implement live real time streaming with faster-whisper? Seems like that would be a huge way forward
Yes, this is much requested. So stay tuned.
How hard would it be to add a continuous background search process taking keywords from the conversation? I wanna have a screen in my office that's supporting the dialogue with more right brain material. Of course, they need to interrupt and follow the sauce for resource would be important.
follow the sauce for resource? Very interesting.
Anyways, this is already shown in the example here:
github.com/AIWintermuteAI/whispercpp/blob/e46fd2da91bab8cfd98a0af886230cc773afd982/examples/stream/stream.py#L18
Can the program be modified so that all recognized texts are consolidated into a single paragraph upon exiting the program?
Append strings to the list and then concatenate and print them at the end?
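A minimal sketch of that suggestion, assuming the streaming loop hands each recognized chunk to a callback as a plain string. The class and the `on_text` name are hypothetical and are not taken from stream.py:

```python
class TranscriptCollector:
    """Collects recognized text chunks and joins them into one paragraph."""

    def __init__(self):
        self.chunks = []

    def on_text(self, text):
        # Hypothetical callback: called once per recognized chunk.
        cleaned = text.strip()
        if cleaned:
            self.chunks.append(cleaned)

    def paragraph(self):
        # Join all chunks with single spaces into one paragraph.
        return " ".join(self.chunks)


if __name__ == "__main__":
    collector = TranscriptCollector()
    try:
        # Stand-in for the streaming loop; real code would pass
        # collector.on_text as the transcription callback.
        for piece in ["And so my fellow Americans,", " ask not ", ""]:
            collector.on_text(piece)
    except KeyboardInterrupt:
        pass
    finally:
        # Runs on normal exit and on Ctrl+C alike.
        print(collector.paragraph())
```

Printing in the `finally` branch means the consolidated paragraph appears whether the program ends normally or is interrupted with Ctrl+C.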
Hey, thanks for the video. I'm encountering an error though around 2:33, if you've any suggestions please let me know.
stream.py: error: argument --model_name: expected one argument
Changing the command to : python stream.py --model tiny
But seeing this error now:
ERROR: Failed to initialized SDL: dsp: No such audio device
I've got a headset with microphone attached to the Pi 5 via USB port. Is it because I need an external soundcard/other hardware like in your video? Any ideas what the issue could be?
Yes, there is an ongoing issue, which I am working on fixing: github.com/AIWintermuteAI/whispercpp/issues/88
I had heard about faster whisper on other channels but thought it couldn't work on an SBC because it uses GPU which an SBC doesn't have. I have no idea how you did this. Thanks!
Interesting. No, it certainly can run on CPU - I made a follow-up on this video, explaining more about faster-whisper specifically, you can find it on my channel.
Hello, the same benchmark results in 5925.774 ms computation time on my RPi 5 currently. Should I do anything differently? The audio file I used is 10 seconds long, the same JFK speech.
One thing I could have improved about my little benchmark script is taking multiple measurements. The first run is always the slowest. Is 5925 ms the time for the first run only, or for subsequent runs as well?
@@Hardwareai Ooh that was it, now I get ~600ms. Thanks! Also I got 1.218 sec computation for a 145 seconds talk, I don't know how it works but segmentation takes much longer
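The warm-up effect discussed in this thread can be handled by timing several runs and reporting the first one separately. The sketch below is not the benchmark gist from the description; `benchmark` and `real_time_factor` are illustrative helpers, and the workload is a stand-in for a real transcription call:

```python
import time


def benchmark(fn, runs=5):
    """Time fn() several times; the first (warm-up) run is reported separately."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    warmup, rest = timings[0], timings[1:]
    # Average only the runs after the warm-up.
    steady = sum(rest) / len(rest) if rest else warmup
    return warmup, steady


def real_time_factor(compute_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return compute_seconds / audio_seconds


if __name__ == "__main__":
    # Stand-in workload; replace with a call that transcribes an audio file.
    warmup, steady = benchmark(lambda: sum(range(100_000)))
    print(f"warm-up: {warmup * 1000:.1f} ms, steady: {steady * 1000:.1f} ms")
    # Figures quoted in the comments above, for a 10-second clip.
    print("first run RTF:", real_time_factor(5.925774, 10))
    print("later run RTF:", real_time_factor(0.6, 10))
```

With the numbers from the thread, the first run takes roughly 0.59 of the clip's duration, while later runs drop to about 0.06.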
hi,
What should I do to make it understand in more than one language? Is this possible?
Use tiny model instead of tiny.en. Do keep in mind the quality of recognition is likely to be worse with multi-language model.
Can I use an INMP441 I2S microphone module instead of the reSpeaker 2-mics Pi HAT for real-time transcription? If yes, what will my pin configuration be for that? And will there be any changes to the code?
In theory you can use any audio input device. In practice your mileage will vary; some hardware choices will be more difficult to work with from a software perspective. For pin configuration you can have a look at INMP441-related docs. The code uses SDL for audio capture, so if the INMP441 can work with that, there should be minimal to no code changes. Can't say for sure tho until you try :)
@@Hardwareai oh understood! So I have to select that microphone which supports SDL!?
If you want minimum code changes - yes. Otherwise, you could of course re-write the code to support any audio input device - whisper model by itself is obviously device agnostic, as long as you can provide audio in a specified format supported by the model.
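As an illustration of "audio in a specified format": Whisper models work on 16 kHz mono audio, usually passed as floating-point samples in the range -1.0 to 1.0. The sketch below assumes the capture device delivers raw little-endian signed 16-bit mono PCM already at 16 kHz. The function name is hypothetical, and resampling is not covered:

```python
import array
import sys

WHISPER_SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def pcm16_to_float32(raw_bytes):
    """Convert little-endian signed 16-bit mono PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian
    # Scale by 32768 so the most negative sample maps to exactly -1.0.
    return array.array("f", (s / 32768.0 for s in samples))


if __name__ == "__main__":
    # Three samples: silence, maximum positive, maximum negative.
    print(list(pcm16_to_float32(b"\x00\x00\xff\x7f\x00\x80")))
```

Any microphone whose driver can produce such a buffer could feed the model this way, at the cost of replacing the SDL capture path.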
Ohkii! Understood 😃 thank you!!
Oh man, please teach me the ways. Like, for real. I saw you provide 1:1 consultancy, but I need to know if your price is per meeting or for a full project.
The ways of hardware, tricky they are, young padawan...
Okay, jokes aside - I did reply in the other comment xD long story short - I'm focused on getting my YT channel back on track at the moment, at least getting back monetization would be nice (YT took it away from me). So I'm not really doing consulting - but if your project is based on my videos/tutorials, I can provide some feedback.
@@Hardwareai Oh master. Sorry I missed your last message!
Thanks for replying again, though! Oh man, sad to hear you're not doing consulting. But I still appreciate watching your incoming videos so that's a win anyway.
And yeah, your videos are the main source of inspiration for me. So it'd be amazing to get some feedback, as I'm sure I'll get stuck with something along the way, as is usual with all things computer related. May I let you know when that happens?
If you are doing something related to my projects, then yes :) QA is always welcome
I did find this video while coding the next big thing, how did you know 🤣
Magic 8 ball xD
I downloaded this on the Raspberry Pi 4, bookworm 64 bit and I got the following error:
fatal: remote error: upload-pack: not our ref c9d5095f0c64455b201f1cd0b547efcf093ee7c3
fatal: Fetched in submodule path 'extern/whispercpp/bindings/ios', but it did not contain c9d5095f0c64455b201f1cd0b547efcf093ee7c3. Direct fetching of that commit failed.
fatal: Failed to recurse into submodule path 'extern/whispercpp'. Any suggestions?
It sounds like you git cloned the upstream? I solved exactly the same issue in my fork
Can you do git log and paste the output?
@@Hardwareai I messaged you on linkedin because I think the youtube spam filter is not letting me paste the output.
Oh, all right, that is possible. For future generations, who find this comment - if it is code related, creating an issue in GH is preferable.
When I run it on a Raspberry Pi, it raises an error: AttributeError: module 'os' has no attribute 'add_dll_directory'
I built it just today and it works as expected. If you are still struggling, it's best to open an issue in my fork on GH.
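For context on that error: `os.add_dll_directory` exists only on Windows (Python 3.8 and later), so calling it unguarded on Linux raises exactly this AttributeError. The sketch below shows a generic guard and is not the fix applied in the fork; the helper name is hypothetical:

```python
import os
import sys


def add_dll_dir_if_supported(path):
    """Register a DLL search directory on Windows; do nothing elsewhere."""
    # os.add_dll_directory exists only on Windows (Python 3.8+).
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        os.add_dll_directory(path)
        return True
    return False


if __name__ == "__main__":
    # On Raspberry Pi OS (Linux) this prints False and raises nothing.
    print(add_dll_dir_if_supported(os.getcwd()))
```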
What is the full command line at 2:35 ?
python stream.py --model_name tiny.en-q5_1
I get an error when running pip install build:
error: externally-managed-environment
I think you missed one step in my video - regardless, here it is stackoverflow.com/questions/75608323/how-do-i-solve-error-externally-managed-environment-every-time-i-use-pip-3.
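Recent Raspberry Pi OS releases refuse system-wide pip installs with this error. A common remedy is to create a virtual environment with `python3 -m venv` and activate it before installing. Whether that matches the exact step shown in the video is an assumption. The small check below reports whether the current interpreter is running inside such an environment:

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the system Python.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("System Python in use; pip may refuse with 'externally-managed-environment'.")
```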
Does this require Internet?
Nope. Completely offline.
How to correct SDL error of audio device not found and what mic are you using???
Mic - reSpeaker 2-mic hat for Raspberry Pi. Your SDL troubles will heavily depend on the device you are trying to run this on ....
@@Hardwareai which file in whispercpp do i alter to use usb mic as input?
Since it relies on SDL2 for sound capture, theoretically you don't have to change anything...
It's not real time. it's from a file, if you want to test real-time stream it and get output back as you speak
It is though? I start with whisper.cpp streaming example, which is also real-time for quantized model.
Yes, but you keep repeating "real time" while using a pre-recorded file sent to the cloud. That's not the definition of real time; although the module is real time, the method and technique used are not. @@Hardwareai
/scratching the head/ are we talking about the same video? at 2:30 I run whisper model on my voice in real-time, not from file, but from respeaker mic.
My bad, you're right, it's a different video. lol. @@Hardwareai
It would be awesome if you combined this with piper(rhasspy) to make a hardware device capable of STT to TTS. It would be a Zentreya-style portable voice changer.
The TTS part of it would be more computationally expensive - but you can have a try!
@hardwareai what raspberri pi are you sing?
I normally sing raspberri pi tenor, but I can do raspberri pi falsetto as well for comic effect xD
Okay, I guess you asked what raspberry pi was I using, not singing.
For this video it was Raspberry Pi 4. There is another newer video where I was using Raspberry Pi 5 as well, ua-cam.com/video/3yLFWpKKbe8/v-deo.html
I was trying to use this on an Intel-based mini PC running Ubuntu 22.04 and ran into audio issues. When I run python stream.py --model_name tiny, I get
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
Traceback (most recent call last):
File "/home/souvik/whispercpp/examples/stream/stream.py", line 30, in main
transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
File "/home/souvik/whisper/lib/python3.10/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/souvik/whispercpp/examples/stream/stream.py", line 100, in
transcriber.main(**vars(args))
File "/home/souvik/whispercpp/examples/stream/stream.py", line 32, in main
assert transcription is not None, "Something went wrong!"
AssertionError: Something went wrong!
The audio system is fine, I can record audio using parecord, arecord etc....and SDL libraries are all installed.
Yeah, these issues can be tough to diagnose unfortunately. The problem as you can see is not really with whisper.cpp, but rather with SDL not wanting to play nicely with your audio setup.
It threw me an error at "python3 -m build -w": PermissionError: [Errno 13] Permission denied: 'src/whispercpp/__about__.py'
ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel
Can you post this error with detailed steps preceding it and some environment info (OS, architecture) to the Github issues and tag me there?
I can't seem to get stream.py to work. It gives this error:
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
Traceback (most recent call last):
File ".../whispercpp/examples/stream/stream.py", line 30, in main
transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "..whisper/lib/python3.11/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../examples/stream/stream.py", line 100, in
transcriber.main(**vars(args))
File "..whispercpp/examples/stream/stream.py", line 32, in main
assert transcription is not None, "Something went wrong!"
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Something went wrong!
Any ideas?
I get this when I run list audio devices:
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
Hmm, I am able to find something on Google for Audio target 'pulseaudio' not available, but it is for OpenSUSE. Are you using the latest Raspberry Pi OS?
@@Hardwareai yeah i am using the latest pi os
Okay, then it is likely something specific to the mic setup. I was using the reSpeaker 2-mic Raspberry Pi HAT. The first thing to try would be to see if the mic works correctly (with arecord) and then, if it does, debug the issue with this particular mic and SDL.
To summarize, the issue is not in the stream.py code or even whisper.cpp, but rather that SDL does not seem to be working with your mic setup...
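One way to start that debugging, sketched under assumptions: SDL2 reads the standard `SDL_AUDIODRIVER` environment variable, so a different backend such as `alsa` can be requested when the default one is reported as unavailable. Whether this resolves the errors quoted in this thread has not been confirmed. The helper names are hypothetical, and the command is the one quoted earlier in the comments:

```python
import os
import shutil
import subprocess


def sdl_env(driver="alsa"):
    """Return a copy of the environment with SDL_AUDIODRIVER set."""
    env = dict(os.environ)
    env["SDL_AUDIODRIVER"] = driver  # standard SDL2 environment variable
    return env


def list_alsa_capture_devices():
    """Return the output of `arecord -l`, or None if arecord is not installed."""
    if shutil.which("arecord") is None:
        return None
    result = subprocess.run(["arecord", "-l"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Step 1: confirm the OS sees the microphone at all.
    print(list_alsa_capture_devices())
    # Step 2: show how the example could be launched with an explicit backend.
    cmd = ["python", "stream.py", "--model_name", "tiny.en-q5_1"]
    driver = sdl_env()["SDL_AUDIODRIVER"]
    print("Would run:", " ".join(cmd), "with SDL_AUDIODRIVER =", driver)
    # To actually launch it: subprocess.run(cmd, env=sdl_env())
```

If the microphone shows up in the `arecord -l` output but SDL still fails with every backend, the details belong in a GitHub issue, as suggested in the replies above.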