You won't believe how fast it is | Raspberry Pi Speech-to-Text

  • Published 15 Dec 2023
  • Faster than real-time offline speech transcription on Raspberry Pi - or any other computing system, including Orange Pi, Jetson Nano and many other Linux SBCs. A quick hands-on guide, from installing the necessary packages to running the Whisper model with whisper.cpp or faster-whisper.
    Whisper.cpp Python bindings repository:
    github.com/AIWintermuteAI/whi...
    faster-whisper:
    github.com/SYSTRAN/faster-whi...
    Benchmark gist:
    gist.github.com/AIWintermuteA...
  • Science & Technology

COMMENTS • 88

  • @Hardwareai
    @Hardwareai 2 months ago +2

    The follow-up video is also live on YouTube - find it on my channel.
    Support my work on making tutorials and guides on Patreon!
    www.patreon.com/hardware_ai

  • @C0ldSpace
    @C0ldSpace 5 months ago +9

    I need this because I'm building a translator for my sister. There's a new person in her class who can only speak Spanish, so I'm making this.

  • @brianmeyer107
    @brianmeyer107 5 months ago +2

    Love this video! I rarely find myself pausing and rewinding, but here the details were coming fast enough that I became the weak link. Love this.

    • @Hardwareai
      @Hardwareai 5 months ago

      Glad to hear I found the right pace. Thank you for the feedback!

  • @newtownsmells
    @newtownsmells 6 months ago +3

    Hey, this is incredible. Really appreciate your work.

    • @Hardwareai
      @Hardwareai 6 months ago +1

      Thank you so much 😀

  • @TomanswerAi
    @TomanswerAi 2 months ago +1

    Very cool guide. Thank you.

  • @tribelessa
    @tribelessa 6 months ago

    Hello! Great work, I will try to test it. Your projects are interesting (for me, ever since the Kendryte K210).

    • @Hardwareai
      @Hardwareai 6 months ago

      Thanks! I see you have been following my channel for a while :)

  • @emanuelepapa3548
    @emanuelepapa3548 2 months ago

    I'm using your repository. Thank you.

    • @Hardwareai
      @Hardwareai 2 months ago

      Thanks for the feedback!

  • @exploring-electronic
    @exploring-electronic 6 months ago +3

    Thanks for the work done fixing the whisper.cpp python bindings! I'll check them out.

  • @markantinozzi4970
    @markantinozzi4970 25 days ago

    I'm going to try to install it.

    • @Hardwareai
      @Hardwareai 22 days ago

      There is a known issue at the moment: github.com/AIWintermuteAI/whispercpp/issues/88#issuecomment-2171120795
      I'll be fixing it once I get back from traveling, beginning of July.

    • @markantinozzi4970
      @markantinozzi4970 14 days ago

      @@Hardwareai Certainly let me know. I'm excited to try it. Happy traveling!! Be safe!
      >M

  • @levbereggelezo
    @levbereggelezo 6 months ago +1

    Well done! Was whisper.cpp compiled with BLAS optimizations?

    • @Hardwareai
      @Hardwareai 6 months ago +1

      No, it wasn't. It is a possible way to slightly improve the results, but at least on Raspberry Pi it will not change the outcome too much: faster-whisper will still be faster. The Jetson series, on the other hand, might take advantage of cuBLAS, so it is more interesting there.

  • @antoniorodriguez-ynyestosa5907
    @antoniorodriguez-ynyestosa5907 5 months ago +1

    Hi! This is amazing! Thank you very much! Just a quick question, should it work on Windows? Because I get an error when I run "python -m build -w":
    * Building wheel...
    running bdist_wheel
    Building pybind11 extension...
    error: [WinError 193] %1 not a valid Win32 app
    ERROR Backend subprocess exited when trying to invoke build_wheel

    • @Hardwareai
      @Hardwareai 5 months ago +2

      Thank you for the feedback!
      While theoretically it SHOULD run on Windows as well, I only tested it on Raspberry Pi (so Debian Linux) and macOS...

  • @newtownsmells
    @newtownsmells 5 months ago +1

    Would you consider showing how to implement live real-time streaming with faster-whisper? Seems like that would be a huge step forward.

    • @Hardwareai
      @Hardwareai 5 months ago

      Yes, this is much requested. So stay tuned.

  • @phillipreay
    @phillipreay 4 months ago

    How hard would it be to add a continuous background search process taking keywords from the conversation? I wanna have a screen in my office that's supporting the dialogue with more right brain material. Of course, they need to interrupt and follow the sauce for resource would be important.

    • @Hardwareai
      @Hardwareai 4 months ago

      follow the sauce for resource? Very interesting.
      Anyways, this is already shown in the example here:
      github.com/AIWintermuteAI/whispercpp/blob/e46fd2da91bab8cfd98a0af886230cc773afd982/examples/stream/stream.py#L18
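
A minimal sketch of the keyword side of this idea, assuming the transcribed text arrives as a plain string (for instance from the callback in the linked stream.py example, whose exact signature is not shown in this thread). The `extract_keywords` helper and its small stopword list are hypothetical and not part of whispercpp.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = frozenset({
    "the", "and", "for", "that", "this", "with", "you", "are", "was", "have",
})


def extract_keywords(text, top_n=3, stopwords=STOPWORDS):
    """Return the most frequent non-trivial words in a transcript chunk."""
    words = re.findall(r"[a-z']+", text.lower())
    # Drop very short words and stopwords before counting.
    counts = Counter(w for w in words if len(w) > 2 and w not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]
```

The returned words could then be handed to whatever search backend drives the office screen; that part is outside the scope of this sketch.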

  • @user-nf2pe4kr3n
    @user-nf2pe4kr3n 5 months ago

    Can the program be modified so that all recognized texts are consolidated into a single paragraph upon exiting the program?

    • @Hardwareai
      @Hardwareai 4 months ago

      Append strings to the list and then concatenate and print them at the end?
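
The suggestion above can be sketched roughly as follows. `TranscriptCollector` is a hypothetical helper, not something from the repository; it assumes each recognized chunk is available as a string.

```python
class TranscriptCollector:
    """Collect recognized text chunks and join them into one paragraph."""

    def __init__(self):
        self.chunks = []

    def add(self, text):
        # Skip empty chunks so the final paragraph has no double spaces.
        text = text.strip()
        if text:
            self.chunks.append(text)

    def paragraph(self):
        return " ".join(self.chunks)


# Usage idea: call collector.add(...) wherever a chunk is recognized, and
# print the paragraph in a `finally:` block so it also runs on Ctrl+C:
#
#     collector = TranscriptCollector()
#     try:
#         run_transcription(collector.add)   # hypothetical main loop
#     finally:
#         print(collector.paragraph())
```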

  • @sephtronics
    @sephtronics 5 days ago

    Hey, thanks for the video. I'm encountering an error though around 2:33, if you've any suggestions please let me know.
    stream.py: error: argument --model_name: expected one argument
    After changing the command to: python stream.py --model tiny
    I now see this error:
    ERROR: Failed to initialized SDL: dsp: No such audio device
    I've got a headset with microphone attached to the Pi 5 via USB port. Is it because I need an external soundcard/other hardware like in your video? Any ideas what the issue could be?

    • @Hardwareai
      @Hardwareai 2 days ago +1

      Yes, there is an ongoing issue, which I am working on fixing: github.com/AIWintermuteAI/whispercpp/issues/88

  • @bens4446
    @bens4446 2 months ago

    I had heard about faster-whisper on other channels but thought it couldn't work on an SBC because it uses a GPU, which an SBC doesn't have. I have no idea how you did this. Thanks!

    • @Hardwareai
      @Hardwareai 2 months ago

      Interesting. No, it certainly can run on CPU. I made a follow-up to this video explaining more about faster-whisper specifically; you can find it on my channel.
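
For readers wondering what CPU-only use looks like, here is a minimal sketch with the faster-whisper Python package. The function names, the default model name, and the `int8` compute type are choices made for this illustration, not settings confirmed by the video.

```python
import time


def speed_factor(audio_seconds, compute_seconds):
    """How many times faster than real time the transcription ran."""
    return audio_seconds / compute_seconds


def transcribe_on_cpu(path, model_name="tiny.en"):
    # Imported lazily so the helper above works without the package;
    # this part needs `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    # device="cpu" keeps everything off the GPU; int8 is an assumed setting.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")

    start = time.perf_counter()
    segments, info = model.transcribe(path)
    # `segments` is a generator: the actual decoding happens while iterating,
    # so the timer has to stay running until the text is collected.
    text = " ".join(segment.text.strip() for segment in segments)
    elapsed = time.perf_counter() - start

    return text, speed_factor(info.duration, elapsed)
```

A speed factor above 1.0 means the audio was transcribed faster than real time.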

  • @ptsckts6123
    @ptsckts6123 2 months ago

    Hello, the same benchmark results in 5925.774 ms computation time on my RPi 5 currently. Should I do anything differently? The audio file I used is 10 seconds long, the same JFK speech.

    • @Hardwareai
      @Hardwareai 2 months ago +1

      One thing I could have improved about my little benchmark script is taking multiple measurements. The first run is always the slowest. Is 5925 ms for the first run only, or for later consecutive runs as well?

    • @ptsckts6123
      @ptsckts6123 2 months ago

      @@Hardwareai Ooh, that was it, now I get ~600 ms. Thanks! Also, I got 1.218 s of computation for a 145-second talk. I don't know how it works, but segmentation takes much longer.
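
The point about the first run being the slowest can be handled by discarding warm-up runs and timing several repeats. This is a generic sketch using only the standard library; it is not the benchmark gist from the video description, and the function names are made up for illustration.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5):
    """Call fn() `warmup` times untimed, then return `repeats` timings in ms."""
    for _ in range(warmup):
        fn()  # warm-up: model loading and caches make the first call slow

    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Report min/median/max so a single outlier is easy to spot."""
    return {
        "min": min(timings_ms),
        "median": statistics.median(timings_ms),
        "max": max(timings_ms),
    }
```

Passing a zero-argument function that runs one full transcription (including reading all of its output) as `fn` gives per-run numbers comparable to the ~600 ms figure mentioned above.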

  • @abdullahdogan5822
    @abdullahdogan5822 4 months ago

    Hi,
    What should I do to make it understand more than one language? Is this possible?

    • @Hardwareai
      @Hardwareai 4 months ago

      Use the tiny model instead of tiny.en. Do keep in mind that the quality of recognition is likely to be worse with the multi-language model.
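
To make the naming convention concrete: Whisper's English-only checkpoints carry an `.en` suffix, while the multilingual ones do not. The sketch below pairs a small, hypothetical name helper with faster-whisper's language auto-detection. The reply above does not say which backend is meant, so using faster-whisper here is an assumption, as are the `device` and `compute_type` values.

```python
# Sizes that have an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def whisper_model_name(size="tiny", english_only=True):
    """Return e.g. 'tiny.en' for English-only or 'tiny' for multilingual."""
    if not english_only:
        return size
    if size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"no English-only variant for size {size!r}")
    return size + ".en"


def transcribe_any_language(path, size="tiny"):
    # Lazy import: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    name = whisper_model_name(size, english_only=False)
    model = WhisperModel(name, device="cpu", compute_type="int8")
    # With no `language` argument, the language is detected from the audio.
    segments, info = model.transcribe(path)
    text = " ".join(segment.text.strip() for segment in segments)
    return info.language, info.language_probability, text
```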

  • @yashvishah9315
    @yashvishah9315 4 months ago

    Can I use an INMP441 I2S microphone module instead of the reSpeaker 2-mics Pi HAT for real-time transcription? If yes, what will my pin configuration be for that? And will there be any changes to the code?

    • @Hardwareai
      @Hardwareai 4 months ago

      In theory you can use any audio input device. In practice your mileage will vary: some hardware choices will be more difficult to work with from a software perspective. For pin configuration you can have a look at INMP441-related docs. The code uses SDL for audio capture, so if INMP441 can work with that, there should be minimal to no code changes. Can't say for sure though until you try :)

    • @yashvishah9315
      @yashvishah9315 4 months ago

      @@Hardwareai Oh, understood! So I have to pick a microphone that works with SDL?

    • @Hardwareai
      @Hardwareai 4 months ago

      If you want minimum code changes - yes. Otherwise, you could of course rewrite the code to support any audio input device. The Whisper model itself is obviously device-agnostic, as long as you can provide audio in a format supported by the model.
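
As an illustration of "audio in a format supported by the model": to my knowledge Whisper works on 16 kHz mono audio, and whisper.cpp takes the samples as 32-bit floats in the range -1.0 to 1.0. The helpers below are a hypothetical sketch of converting raw 16-bit PCM captured from some other device; resampling to 16 kHz is not covered.

```python
import struct

WHISPER_SAMPLE_RATE = 16000  # Hz; the capture device should record at this rate


def pcm16_to_float32(raw):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[:count * 2])
    return [s / 32768.0 for s in samples]


def stereo_to_mono(samples):
    """Average interleaved left/right samples into a single channel."""
    return [
        (samples[i] + samples[i + 1]) / 2.0
        for i in range(0, len(samples) - 1, 2)
    ]
```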

    • @yashvishah9315
      @yashvishah9315 4 months ago

      Ohkii! Understood 😃 thank you!!

  • @danilovaz9839
    @danilovaz9839 6 months ago

    Oh man, please teach me the ways. Like, for real. I saw you provide 1:1 consultancy, but I need to know if your price is per meeting or for a full project.

    • @Hardwareai
      @Hardwareai 6 months ago +1

      The ways of hardware, tricky they are, young padawan...
      Okay, jokes aside - I did reply in the other comment xD long story short - I'm focused on getting my YT channel back on track at the moment, at least getting back monetization would be nice (YT took it away from me). So I'm not really doing consulting - but if your project is based on my videos/tutorials, I can provide some feedback.

    • @danilovaz9839
      @danilovaz9839 6 months ago

      @@Hardwareai Oh master. Sorry I missed your last message!
      Thanks for replying again, though! Oh man, sad to hear you're not doing consulting. But I still appreciate watching your incoming videos so that's a win anyway.
      And yeah, your videos are my main source of inspiration. So it'd be amazing to get some feedback, as I'm sure I'll get stuck on something along the way - as is usual with all things computer-related. May I let you know when that happens?

    • @Hardwareai
      @Hardwareai 6 months ago

      If you are doing something related to my projects, then yes :) QA is always welcome

  • @BogdanMnikov
    @BogdanMnikov 6 months ago

    I did find this video while coding the next big thing, how did you know 🤣

  • @user-cl2og
    @user-cl2og 6 months ago

    I downloaded this on the Raspberry Pi 4, Bookworm 64-bit, and I got the following error:
    fatal: remote error: upload-pack: not our ref c9d5095f0c64455b201f1cd0b547efcf093ee7c3
    fatal: Fetched in submodule path 'extern/whispercpp/bindings/ios', but it did not contain c9d5095f0c64455b201f1cd0b547efcf093ee7c3. Direct fetching of that commit failed.
    fatal: Failed to recurse into submodule path 'extern/whispercpp'. Any suggestions?

    • @Hardwareai
      @Hardwareai 6 months ago

      It sounds like you git cloned the upstream? I solved exactly the same issue in my fork

    • @Hardwareai
      @Hardwareai 6 months ago

      Can you do git log and paste the output?

    • @user-cl2og
      @user-cl2og 6 months ago

      @@Hardwareai I messaged you on LinkedIn because I think the YouTube spam filter is not letting me paste the output.

    • @Hardwareai
      @Hardwareai 6 months ago

      Oh, all right, that is possible. For future generations, who find this comment - if it is code related, creating an issue in GH is preferable.

  • @lagkdd2913
    @lagkdd2913 3 months ago

    When I run it on a Raspberry Pi, it raises an error: AttributeError: module 'os' has no attribute 'add_dll_directory'

    • @Hardwareai
      @Hardwareai 3 months ago

      I built it just today and it works as expected. If you are still struggling, it's best to open an issue in my fork on GH.

  • @DavidPsurny
    @DavidPsurny 29 days ago

    What is the full command line at 2:35 ?

    • @Hardwareai
      @Hardwareai 22 days ago

      python stream.py --model_name tiny.en-q5_1

  • @harokk4242
    @harokk4242 4 months ago

    Error in pip install build:
    error: externally-managed-environment

    • @Hardwareai
      @Hardwareai 4 months ago +1

      I think you missed one step in my video. Regardless, here it is: stackoverflow.com/questions/75608323/how-do-i-solve-error-externally-managed-environment-every-time-i-use-pip-3.
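
One common way around the externally-managed-environment error is to install into a virtual environment instead of the system Python. Whether that is exactly the step shown in the video is an assumption; the sketch below uses only the standard-library `venv` module, and the `~/whisper` path is just an example.

```python
import sys
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_env(path="~/whisper", with_pip=True):
    """Create a virtual environment at `path` and return its location."""
    env_path = Path(path).expanduser()
    venv.create(env_path, with_pip=with_pip)
    return env_path


# After create_env(), activate it in the shell with
#   source ~/whisper/bin/activate
# and rerun `pip install build` inside the environment.
```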

  • @Tyrone-Ward
    @Tyrone-Ward 6 months ago

    Does this require Internet?

    • @Hardwareai
      @Hardwareai 6 months ago +1

      Nope. Completely offline.

  • @moneshraghu5598
    @moneshraghu5598 3 months ago

    How do I fix the SDL "audio device not found" error, and what mic are you using?

    • @Hardwareai
      @Hardwareai 3 months ago

      Mic - reSpeaker 2-mic hat for Raspberry Pi. Your SDL troubles will heavily depend on the device you are trying to run this on ....

    • @moneshraghu5598
      @moneshraghu5598 3 months ago

      @@Hardwareai Which file in whispercpp do I alter to use a USB mic as input?

    • @Hardwareai
      @Hardwareai 3 months ago

      Since it relies on SDL2 for sound capture, theoretically you don't have to change anything...

  • @LoneEntrepreneur
    @LoneEntrepreneur 6 months ago

    It's not real time. It's from a file. If you want to test real time, stream it and get output back as you speak.

    • @Hardwareai
      @Hardwareai 6 months ago

      It is though? I start with the whisper.cpp streaming example, which is also real-time for the quantized model.

    • @LoneEntrepreneur
      @LoneEntrepreneur 6 months ago

      @@Hardwareai Yes, but you keep repeating "real time" when using a pre-recorded file sent to the cloud. That's not the definition of real time: the module is real-time, but the method and technique used are not.

    • @Hardwareai
      @Hardwareai 6 months ago

      /scratching the head/ are we talking about the same video? at 2:30 I run whisper model on my voice in real-time, not from file, but from respeaker mic.

    • @LoneEntrepreneur
      @LoneEntrepreneur 6 months ago +1

      @@Hardwareai My bad, you're right, it's a different video. lol.

  • @40centuriones
    @40centuriones 4 months ago

    It would be awesome if you combined this with piper(rhasspy) to make a hardware device capable of STT to TTS. It would be a Zentreya-style portable voice changer.

    • @Hardwareai
      @Hardwareai 3 months ago

      The TTS part of it would be more computationally expensive - but you can have a try!

  • @jlbciriaco3142
    @jlbciriaco3142 2 months ago

    @hardwareai what raspberri pi are you sing?

    • @Hardwareai
      @Hardwareai 2 months ago

      I normally sing raspberri pi tenor, but I can do raspberri pi falsetto as well for comic effect xD
      Okay, I guess you asked what raspberry pi was I using, not singing.
      For this video it was Raspberry Pi 4. There is another newer video where I was using Raspberry Pi 5 as well, ua-cam.com/video/3yLFWpKKbe8/v-deo.html

  • @SouvikPal-notionvidz
    @SouvikPal-notionvidz 5 months ago

    I was trying to use this on an Intel-based mini PC running Ubuntu 22.04 and ran into audio issues. When I run python stream.py --model_name tiny, I get
    ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
    Traceback (most recent call last):
    File "/home/souvik/whispercpp/examples/stream/stream.py", line 30, in main
    transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
    File "/home/souvik/whisper/lib/python3.10/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
    RuntimeError: Failed to initialize audio capture device.
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "/home/souvik/whispercpp/examples/stream/stream.py", line 100, in <module>
    transcriber.main(**vars(args))
    File "/home/souvik/whispercpp/examples/stream/stream.py", line 32, in main
    assert transcription is not None, "Something went wrong!"
    AssertionError: Something went wrong!

    • @SouvikPal-notionvidz
      @SouvikPal-notionvidz 5 months ago

      The audio system is fine, I can record audio using parecord, arecord etc....and SDL libraries are all installed.

    • @Hardwareai
      @Hardwareai 4 months ago

      Yeah, these issues can be tough to diagnose unfortunately. The problem as you can see is not really with whisper.cpp, but rather with SDL not wanting to play nicely with your audio setup.

  • @torstenaltmann62
    @torstenaltmann62 1 month ago

    It threw me an error at "python3 -m build -w": PermissionError: [Errno 13] Permission denied: 'src/whispercpp/__about__.py'
    ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel

    • @Hardwareai
      @Hardwareai 1 month ago

      Can you post this error with detailed steps preceding it and some environment info (OS, architecture) to the Github issues and tag me there?

  • @newtownsmells
    @newtownsmells 6 months ago

    I can't seem to get stream.py to work. It gives this error:
    ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
    Traceback (most recent call last):
    File ".../whispercpp/examples/stream/stream.py", line 30, in main
    transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "..whisper/lib/python3.11/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
    RuntimeError: Failed to initialize audio capture device.
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "../examples/stream/stream.py", line 100, in <module>
    transcriber.main(**vars(args))
    File "..whispercpp/examples/stream/stream.py", line 32, in main
    assert transcription is not None, "Something went wrong!"
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    AssertionError: Something went wrong!
    Any ideas?
    I get this when I run list audio devices:
    ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available

    • @Hardwareai
      @Hardwareai 6 months ago

      Hmm, I am able to find something on Google for Audio target 'pulseaudio' not available, but it is for OpenSUSE. Are you using the latest Raspberry Pi OS?

    • @newtownsmells
      @newtownsmells 6 months ago

      @@Hardwareai Yeah, I am using the latest Pi OS.

    • @Hardwareai
      @Hardwareai 6 months ago +1

      Okay, then it is likely something specific to the mic setup. I was using the reSpeaker 2-mic Raspberry Pi HAT. The first thing to try would be to see if the mic works correctly (with arecord) and then, if it does, debug the issue with this particular mic and SDL.
      To summarize, the issue is not in the stream.py code or even whisper.cpp, but rather that SDL does not seem to be working with your mic setup...
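
A rough sketch of the two debugging steps suggested above: first confirm that ALSA sees the microphone with `arecord -l`, then try pointing SDL at a specific audio backend through the `SDL_AUDIODRIVER` environment variable. Whether `alsa` is the right backend for a given setup is an assumption, it may not resolve the linked issue, and both helper names are hypothetical.

```python
import os
import shutil
import subprocess


def list_capture_devices():
    """Return the output of `arecord -l`, or None if arecord is not installed."""
    if shutil.which("arecord") is None:
        return None
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def env_with_sdl_driver(driver="alsa", base_env=None):
    """Copy an environment and select the audio backend SDL should use."""
    env = dict(os.environ if base_env is None else base_env)
    env["SDL_AUDIODRIVER"] = driver
    return env


# Usage idea, with the command quoted earlier in this thread:
#   subprocess.run(
#       ["python", "stream.py", "--model_name", "tiny.en-q5_1"],
#       env=env_with_sdl_driver("alsa"),
#   )
```

If `arecord -l` shows no capture device at all, the problem sits below SDL and changing the driver is unlikely to help.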