You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT

  • Published 13 Apr 2024
  • Run live speech transcription on a Raspberry Pi 5 with faster-whisper and WhisperLive, see the transcription results as they are processed, and send the final output to an LLM or TTS. Less finicky than SDL2, WhisperLive uses PyAudio for audio capture instead. Tested with two microphones: ReSpeaker 2-Mics Pi HAT and ReSpeaker USB Mic Array.
    How to make whisper.cpp transcribe faster? (audio_ctx explanation)
    / how-to-make-cpp-102337630
    Microphones:
    www.seeedstudio.com/ReSpeaker...
    www.seeedstudio.com/ReSpeaker...
    Barebone single-threaded implementation of FasterWhisper transcription from the microphone
    gist.github.com/AIWintermuteA...
    My fork of whisper.cpp Python bindings
    github.com/AIWintermuteAI/whi...
    My fork of Whisper live - git clone this to follow along the video
    github.com/AIWintermuteAI/Whi...
    faster-whisper repository
    github.com/SYSTRAN/faster-whi...
    Piper TTS
    github.com/rhasspy/piper
  • Science & Technology
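The barebone single-threaded gist linked above boils down to a capture-and-transcribe loop like the following sketch (assumptions: `pip install pyaudio faster-whisper` on the Pi; the 5-second window and `int8` compute type are illustrative choices, not the gist's exact parameters):

```python
# Minimal sketch: capture mic audio with PyAudio, transcribe chunks
# with faster-whisper. Window size and model settings are examples.
import numpy as np

RATE = 16000        # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 5   # transcribe in 5-second windows (illustrative)

def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert 16-bit little-endian PCM to the float32 [-1, 1] array
    that faster-whisper's transcribe() accepts directly."""
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

def main() -> None:
    import pyaudio
    from faster_whisper import WhisperModel

    model = WhisperModel("tiny.en", device="cpu", compute_type="int8")
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=RATE // 10)
    try:
        while True:
            # read CHUNK_SECONDS worth of frames, then transcribe them
            raw = stream.read(RATE * CHUNK_SECONDS, exception_on_overflow=False)
            segments, _info = model.transcribe(pcm16_to_float32(raw), language="en")
            for seg in segments:
                print(seg.text.strip())
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

# call main() to start streaming from the default microphone
```

Being single-threaded, this blocks on transcription between reads, so audio spoken during a transcribe call can be dropped — which is exactly the limitation WhisperLive's threaded design addresses.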

COMMENTS • 33

  • @dad2979
    @dad2979 1 month ago +2

    I have been following you for a long time and you have done such a fantastic job of crafting your style and keeping your content relevant. Great job Dmitry!

    • @Hardwareai
      @Hardwareai  1 month ago

      Thank you for leaving this comment!
      To tell the truth, I'm still refining my style. One thing I've been successful at recently (I think) is keeping my videos more to the point, with a good flow of information. Looking back, I was blabbering way too much in my older videos at times. I now cut a lot of material in post-processing if I feel a video is overloaded.
      I plan to make some more storytelling-oriented robotics content in the next half-year; stay tuned and see how it goes.

  • @exploring-electronic
    @exploring-electronic 2 months ago +1

    Thank you for making this follow up!

    • @Hardwareai
      @Hardwareai  2 months ago

      Appreciate your support

  • @georgeknerr
    @georgeknerr 2 months ago

    Excellent work, keep it up!!! Shared on Twitter too.

  • @jackwarner5445
    @jackwarner5445 25 days ago +1

    I'm trying to make an AI voice assistant and would be completely lost without your videos. Thanks so much!

  • @justquicker5044
    @justquicker5044 2 months ago

    Thank you so much! You’ve really helped me speed up my project. I normally don’t like and subscribe but I made an exception 🙃. Keep it up!!

    • @Hardwareai
      @Hardwareai  2 months ago

      Thank you for your support!

  • @MrTubertub
    @MrTubertub 1 day ago +1

    Hi there, could you please advise on the best and easiest way to transcribe MP3 speech recordings to text with no coding experience at all? Thank you

  • @MarkD-p2h
    @MarkD-p2h 12 hours ago

    Thank you for sharing your knowledge. I'm trying to do "float16" STT transcription with diarization using WhisperX on an 8GB Pi5, but "the ctranslate2 package does not compile with CUDA support." Per the whisperx readme, I tried to install pytorch v11.8 from the PyTorch pip command, and then I tried the current version, before trying to install whisperx with no joy. Apologies if this is a silly question, but is there a CUDA version that works on a Pi5 GPU (Broadcom VideoCore VII), or must I only use CPU CUDA? What do you recommend? Thanks!

  • @ameetkarn
    @ameetkarn 8 days ago

    This is too good... I think this should fit directly into one of my projects. Do you have any recommendations for real-time TTS?

    • @Hardwareai
      @Hardwareai  6 days ago

      Hopefully!
      I used espeak before for other projects... it is pretty horrible by modern standards, but it does its job.
      For this example I used Piper TTS - much better quality, but not as fast as espeak.
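For reference, a minimal sketch of driving the piper CLI from Python (the voice model filename below is an example placeholder; binaries and voices come from the piper repository linked in the description):

```python
# Sketch: synthesize speech with the piper CLI from Python, then play
# the result with aplay. The voice model path is an example only.
import subprocess

def piper_cmd(model: str, wav_path: str) -> list:
    """Build the piper command line: piper reads text on stdin and
    writes a wav file to --output_file."""
    return ["piper", "--model", model, "--output_file", wav_path]

def speak(text: str, model: str = "en_US-lessac-medium.onnx",
          wav_path: str = "/tmp/tts.wav") -> None:
    # piper takes the text to synthesize on standard input
    subprocess.run(piper_cmd(model, wav_path),
                   input=text.encode("utf-8"), check=True)
    subprocess.run(["aplay", wav_path], check=True)  # play via ALSA
```

On a Pi, synthesis latency grows with the voice model size, which is the speed trade-off against espeak mentioned above.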

  • @inout3394
    @inout3394 2 months ago

    Thx

    • @Hardwareai
      @Hardwareai  2 months ago

      My pleasure! Thanks for commenting!

  • @ameetkarn
    @ameetkarn 1 day ago

    Hi,
    I am getting the following error while running the fork... any ideas?
    A module that was compiled using NumPy 1.x cannot be run in
    NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
    versions of NumPy, modules must be compiled with NumPy 2.0.
    Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
    If you are a user of the module, the easiest solution will be to
    downgrade to 'numpy
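(The error above is NumPy's standard 1.x/2.x binary-incompatibility warning: a dependency was compiled against NumPy 1.x and crashes under 2.x. Assuming nothing else in the environment requires NumPy 2, the usual workaround the message itself points to is pinning NumPy below 2.0:)

```shell
# Downgrade NumPy to the 1.x series so modules compiled against
# NumPy 1.x keep working until they ship NumPy-2-compatible wheels.
pip install "numpy<2"
```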

  • @domesticatedviking
    @domesticatedviking 2 months ago

    Hey, just wanted to say I really appreciated your last two videos. Will you please be my sensei? Thank you!!

    • @Hardwareai
      @Hardwareai  2 months ago

      I appreciate your appreciation! xD
      I'd say that I'm already a sensei of sorts... You can always support me on Patreon for some extras, but otherwise simply stay tuned for more videos!

  • @shakhizatnurgaliyev9355
    @shakhizatnurgaliyev9355 2 months ago

    Like!!! Dima, awesome content. What do you think about the VOSK API, and how does it compare to Whisper? Great example of Piper TTS. Thank you!

    • @Hardwareai
      @Hardwareai  2 months ago

      Thanks, appreciate it!
      I'll try it out and compare it - I don't think I'll make a video about it, but maybe a blog article :)

  • @bystander85
    @bystander85 28 days ago

    I've been trying to find a way to make the end-of-speech flag more intelligent than just detecting a pause. I often find that I have a mental blank, or misspeak, and the delay in my speech incorrectly triggers end of speech. It would be interesting if STT systems could continue listening after a pause when they detect an incomplete sentence. Any ideas?

    • @Hardwareai
      @Hardwareai  22 days ago +1

      That's a hard one. I don't think this is solved even in commercial STT engines - e.g. Google Assistant or Siri.
      It would require understanding of sentence context. We might be getting somewhere with multi-modal models, such as GPT-4o, but I don't think there is anything like that available to run on a Raspberry Pi-format computer.
      Also, as a shortcut, perhaps it would be possible to either run a classifier or modify the Whisper model to output the probability of a sentence being finished... It's just an idea though; finding out how well it will work is another thing entirely.

  • @glikoz
    @glikoz 12 days ago

    Please advise on a hardware setup for offline RAG, TTS, and STT

    • @Hardwareai
      @Hardwareai  10 days ago

      Hard to estimate without knowing the details.

  • @Onlyindianpj
    @Onlyindianpj 9 hours ago

    A real implementation would use a WebSocket.
    The idea is:
    The app transmits raw 16 kHz PCM audio
    The WS server captures those audio packets
    It sends them to Whisper to get a transcription and returns it to the app as JSON
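The flow described in that comment could be sketched roughly as follows (assumptions: the third-party `websockets` package for the server, a stub in place of a real faster-whisper call, and an illustrative 5-second buffering window):

```python
# Sketch of the app -> WS server -> Whisper -> JSON flow described
# above. transcribe_stub() stands in for a real faster-whisper call.
import asyncio
import json
import numpy as np

RATE = 16000
WINDOW_BYTES = RATE * 2 * 5   # 5 s of 16-bit mono PCM (illustrative)

def transcribe_stub(audio: np.ndarray) -> str:
    # placeholder; swap in faster_whisper.WhisperModel.transcribe() here
    return f"{len(audio) / RATE:.1f}s of audio received"

async def handle(ws):
    """Buffer incoming raw PCM frames; once a full window has arrived,
    transcribe it and send the result back as JSON."""
    buf = b""
    async for frame in ws:
        buf += frame
        if len(buf) >= WINDOW_BYTES:
            window, buf = buf[:WINDOW_BYTES], buf[WINDOW_BYTES:]
            audio = np.frombuffer(window, dtype=np.int16).astype(np.float32) / 32768.0
            await ws.send(json.dumps({"text": transcribe_stub(audio)}))

async def main():
    import websockets
    async with websockets.serve(handle, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

# asyncio.run(main()) to start the server
```

This is essentially what WhisperLive's client/server split does, with its own protocol on top.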

  • @Andriu66
    @Andriu66 2 months ago

    do you use the 8gb raspberry pi?

    • @Hardwareai
      @Hardwareai  2 months ago +2

      Yes, a Raspberry Pi 5 with 8 GB - but RAM is hardly relevant here for the tiny.en model.

    • @isaacfranklin2712
      @isaacfranklin2712 1 month ago

      @Hardwareai Thinking of getting the Pi 4 with 1 GB RAM. Shouldn't be an issue to replicate, hopefully.

  • @Hazar-bt6nf
    @Hazar-bt6nf 4 days ago

    Can the Raspberry Pi 5 run Whisper using Python?

  • @levbereggelezo
    @levbereggelezo 2 months ago

    Thx