AI Chatbot on Raspberry Pi | Offline and Real-time, but ...

  • Published 27 Jun 2023
  • Speech to Text, Language Model, and then on to Text to Speech! Run the OpenAI Whisper speech-to-text model on a Raspberry Pi 4 to transcribe user input, then feed it into a language model and get the output back.
    Github:
    Language model - WIP
    Fixed whisper.cpp wrapper - github.com/AIWintermuteAI/whi...
    TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
    arxiv.org/abs/2305.07759
    Future of DeepSpeech / STT after recent changes at Mozilla
    discourse.mozilla.org/t/futur...
    A new frog in (speech tech) town
    discourse.mozilla.org/t/a-new...
    Media Credits:
    Evaluating Language Models
    lena-voita.github.io/nlp_cour...
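The STT → LLM → TTS loop described above can be sketched roughly as below. This is a minimal sketch, not the video's actual code: the binary paths, model filenames, and the espeak TTS choice are all assumptions; whisper.cpp and llama.cpp ship command-line programs with `-m`/`-f`/`-p` style flags, which the sketch shells out to.

```python
import subprocess

def transcribe(wav_path, whisper_bin="./whisper-main", model="ggml-base.en.bin"):
    """Run a whisper.cpp binary on a WAV file and return the transcript text."""
    out = subprocess.run(
        [whisper_bin, "-m", model, "-f", wav_path, "--no-timestamps"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def generate(prompt, llama_bin="./llama-main", model="model.gguf"):
    """Feed the transcript to a llama.cpp model and return its continuation."""
    out = subprocess.run(
        [llama_bin, "-m", model, "-p", prompt, "-n", "128"],
        capture_output=True, text=True, check=True,
    )
    # llama.cpp echoes the prompt, so drop it from the front of the output
    return out.stdout[len(prompt):].strip()

def speak(text):
    """Very simple TTS via espeak (any TTS engine could go here instead)."""
    subprocess.run(["espeak", text], check=True)

def assistant_turn(wav_path, stt=transcribe, llm=generate, tts=speak):
    """One user turn: audio in, spoken reply out; returns the reply text."""
    question = stt(wav_path)
    reply = llm(question)
    tts(reply)
    return reply
```

Passing the three stages in as callables keeps the pipeline testable and makes it easy to swap any stage (e.g. a different TTS engine) without touching the rest.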
  • Science & Technology

COMMENTS • 8

  • @alx8439
    @alx8439 1 year ago

    As an LLM, you can give orca mini 3b a try. They say it runs just fine on weak hardware like an RPi 4.

    • @Hardwareai
      @Hardwareai  1 year ago +1

      I checked orca mini 3b from the model card here: huggingface.co/TheBloke/orca_mini_3B-GGML
      It would only run on the largest-RAM version of the Pi,
      and is still much slower.
      www.reddit.com/r/LocalLLaMA/comments/14lo34l/orca_mini_3b_on_a_pi_4_in_real_time/
      It says real time, but from the video it hardly is...
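A rough back-of-the-envelope check supports the RAM point: the weight file alone for a 4-bit quantized 3B model is around 1.6 GiB before the KV cache and OS overhead, which rules out the 1 GB and 2 GB Pi models. The ~4.5 bits/weight figure is an approximation for q4-style GGML quantization, not an exact number for that specific file.

```python
def quantized_weight_gib(n_params_billion, bits_per_weight):
    """Approximate size of a quantized weight file in GiB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# q4-style quantization stores roughly 4.5 bits per weight (incl. scales)
size = quantized_weight_gib(3, 4.5)
print(f"~{size:.1f} GiB of weights")  # before KV cache and OS overhead
```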

    • @alx8439
      @alx8439 1 year ago

      @@Hardwareai it's up to you how you code the dialog logic. It could be something like:
      - {wakeword} ask AI {question}
      - sure, let me think for a moment
      ... (after some time) ...
      - here is what AI replied: {reply}
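The flow in this comment, acknowledge first and answer when ready, could look something like the sketch below. It is hypothetical: the `say` callback and the placeholder LLM function are made up for illustration.

```python
import threading

def slow_llm(question):
    """Placeholder for the actual (slow) LLM call on the Pi."""
    return f"Here is what AI replied: {question!r} ..."

def handle_question(question, say, llm=slow_llm):
    """Acknowledge immediately; deliver the LLM reply in the background."""
    say("Sure, let me think for a moment")          # instant acknowledgement
    t = threading.Thread(target=lambda: say(llm(question)))
    t.start()
    return t  # caller can join() later, or keep listening for the wake word
```

Because generation runs on a worker thread, the main loop stays free to keep listening for the wake word while the model grinds away.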

  • @dukezacks
    @dukezacks 8 months ago

    Hello, good job! Can I ask which language model you are using?

    • @Hardwareai
      @Hardwareai  8 months ago

      Hi! Thanks!
      I was using gpt4all-7B/gpt4all-lora-quantized - but that was a while ago already. It did generate a coherent synthetic dataset, but as I recently discovered, it is utterly unreliable for grading the output of another LLM. I have now switched to text-generation-webui for the server, llama.cpp for the model loader, and llama-2-13b-chat.Q5_K_M.gguf for the model. The 7B also worked, but was worse.

  • @alx8439
    @alx8439 1 year ago

    Btw, the biggest issue with open-source voice assistants, as I see it, is that people don't care about proper software architecture:
    - the core should allow multitasking (music can keep playing in the background while it answers your question AND it keeps listening for a wake word to stop)
    - use piped (streaming) processing instead of batched processing to reduce delay (this was partially solved in the most recent version of Rhasspy)
    - an extendable architecture that lets people develop their own "skills" without scratching their left ear with their right leg
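The "skills" extensibility point from the last bullet could be as simple as a registry mapping recognized intents to handler functions. This is a toy sketch, not any particular assistant's API; the intent names and handlers are invented for illustration.

```python
SKILLS = {}

def skill(intent):
    """Decorator that registers a handler function for a spoken intent."""
    def register(fn):
        SKILLS[intent] = fn
        return fn
    return register

@skill("time")
def tell_time(_query):
    from datetime import datetime
    return datetime.now().strftime("It is %H:%M")

@skill("echo")
def echo(query):
    return query

def dispatch(intent, query):
    """Route a recognized intent to its skill, with a graceful fallback."""
    handler = SKILLS.get(intent)
    return handler(query) if handler else "Sorry, I don't know that one."
```

Third-party skills then only need to import the decorator and register themselves; the core never has to know about them in advance.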

    • @Hardwareai
      @Hardwareai  1 year ago

      Good points!
      Although for the first one, I think the main obstacle is that it'd be rather hard to get good-quality speech recognition with music blasting out of the speaker close to the microphone :)

    • @alx8439
      @alx8439 1 year ago

      @@Hardwareai That's why most commercial voice assistant speakers turn the music volume down when they hear the wake word.
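That "ducking" behavior is straightforward to sketch. The mixer object here is hypothetical; `volume` stands in for whatever control your audio stack actually exposes (ALSA mixer, PulseAudio sink volume, etc.).

```python
from contextlib import contextmanager

@contextmanager
def ducked(mixer, duck_to=0.2):
    """Temporarily lower playback volume while listening for a command,
    then restore it even if recognition raises an exception."""
    previous = mixer.volume
    mixer.volume = duck_to
    try:
        yield
    finally:
        mixer.volume = previous

# usage: with ducked(mixer): text = listen_for_command()
```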