AI Chatbot on Raspberry Pi | Offline and Real-time, but ...
- Published 27 Jun 2023
- Speech to Text, Language Model, and then on to Text to Speech! Run the OpenAI Whisper speech-to-text model on a Raspberry Pi 4 to transcribe user input, then feed it into a language model and get the output back.
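The three-stage pipeline described above (STT, then LLM, then TTS) can be sketched as plain function composition. The stage functions below are hypothetical stand-ins: a real build would call whisper.cpp for transcription, a local language model for generation, and a TTS engine for playback.

```python
# Sketch of the STT -> LLM -> TTS pipeline. The three stage functions are
# stand-ins; in a real build they would call whisper.cpp (transcription),
# a local LLM (generation), and a speech synthesizer (playback).

def transcribe(audio: bytes) -> str:
    """STT stage: stand-in for a whisper.cpp call."""
    return audio.decode("utf-8")  # pretend the audio is already text

def generate_reply(prompt: str) -> str:
    """LLM stage: stand-in for a local language model."""
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    """TTS stage: stand-in for a TTS engine."""
    return text.encode("utf-8")

def assistant_turn(audio: bytes) -> bytes:
    """One full assistant turn: audio in, spoken reply out."""
    text = transcribe(audio)
    reply = generate_reply(text)
    return synthesize(reply)

if __name__ == "__main__":
    print(assistant_turn(b"what time is it").decode())
```

The point of keeping the stages as separate functions is that any one of them can be swapped (e.g. a different STT backend) without touching the others.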
Github:
Language model - WIP
Fixed whisper.cpp wrapper - github.com/AIWintermuteAI/whi...
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
arxiv.org/abs/2305.07759
Future of DeepSpeech / STT after recent changes at Mozilla
discourse.mozilla.org/t/futur...
A new frog in (speech tech) town
discourse.mozilla.org/t/a-new...
Media Credits:
Evaluating Language Models
lena-voita.github.io/nlp_cour...
As an LLM, you could give Orca Mini 3B a try. They say it runs just fine on weak hardware like the Raspberry Pi 4.
I checked Orca Mini 3B from the model card here: huggingface.co/TheBloke/orca_mini_3B-GGML. It would only run on the largest-RAM version of the Pi, and even then it is still much slower.
www.reddit.com/r/LocalLLaMA/comments/14lo34l/orca_mini_3b_on_a_pi_4_in_real_time/
The title says real-time, but judging from the video it hardly is...
@Hardwareai it's up to you how you code the dialog logic. It could be something like:
- {wakeword} ask AI {question}
- sure, let me think for a moment
... (after some time) ...
- here is what AI replied: {reply}
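The deferred-reply flow sketched above (acknowledge immediately, compute in the background, speak the answer when ready) can be expressed with a worker thread. `slow_llm` and `speak` are hypothetical stand-ins for real LLM inference and TTS playback.

```python
# Sketch of the deferred-reply dialog flow: acknowledge right away, run the
# slow LLM call in a background thread, speak the answer when it is ready.
# `slow_llm` and `speak` are stand-ins, not a real assistant API.
import threading
import time

def slow_llm(question: str) -> str:
    time.sleep(0.1)  # stands in for seconds of LLM inference on a Pi
    return f"42 (answer to: {question})"

def speak(text: str, out: list) -> None:
    out.append(text)  # stand-in for TTS playback

def handle_question(question: str, out: list) -> threading.Thread:
    """Acknowledge immediately, then answer asynchronously."""
    speak("sure, let me think for a moment", out)

    def worker():
        reply = slow_llm(question)
        speak(f"here is what AI replied: {reply}", out)

    t = threading.Thread(target=worker)
    t.start()
    return t  # caller can join, or keep listening for the wake word

if __name__ == "__main__":
    spoken = []
    t = handle_question("what is the meaning of life?", spoken)
    t.join()
    print(spoken)
```

Because the main thread returns right after the acknowledgement, it stays free to keep listening for the wake word while the answer is being generated.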
Hello, good job! Can I ask which language model you are using?
Hi! Thanks!
I was using gpt4all-7B/gpt4all-lora-quantized, but that was a while ago already. It did generate a coherent synthetic dataset, but as I recently discovered, it is utterly unreliable for grading the output of another LLM. I have now switched to text-generation-webui for the server, llama.cpp for the model loader, and llama-2-13b-chat.Q5_K_M.gguf for the model. 7B also worked, but was worse.
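Worth noting for anyone reproducing this: the llama-2-chat family of models expects its input wrapped in the Llama 2 chat prompt template (`[INST]`/`<<SYS>>` markers), and quality drops noticeably without it. A minimal single-turn prompt builder, as a sketch (the system message below is just an example):

```python
# Minimal builder for the Llama 2 chat prompt template used by
# llama-2-13b-chat models. Single-turn only; multi-turn conversations
# repeat the [INST] ... [/INST] blocks with prior replies in between.

def llama2_chat_prompt(user_msg: str, system_msg: str) -> str:
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_msg}\n"
        "<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

if __name__ == "__main__":
    print(llama2_chat_prompt(
        "What is a Raspberry Pi?",
        "You are a helpful voice assistant.",  # example system message
    ))
```

Frontends like text-generation-webui apply this template for you when the model's chat mode is selected, but if you drive llama.cpp directly you have to build it yourself.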
Btw, the biggest issue with open-source voice assistants, as I see it, is that people don't care about proper software architecture:
- the core should allow multitasking (music can keep playing in the background while it answers your question AND still listens for the wake word to stop)
- piped (streaming) processing instead of batched processing, to reduce delay (this was partially solved in the most recent version of Rhasspy)
- an extendable architecture that lets people develop their own "skills" without scratching their left ear with their right leg
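The "extendable skills" point above is usually solved with a plugin registry: each skill declares which intent it handles, and the core only ever calls the registry. This is a hypothetical sketch of that pattern, not Rhasspy's actual API.

```python
# Sketch of an extendable "skills" registry: the assistant core dispatches
# by intent name and never needs to know about individual skills.
# Hypothetical pattern, not any real assistant's API.
from typing import Callable, Dict

SKILLS: Dict[str, Callable[[str], str]] = {}

def skill(intent: str):
    """Decorator registering a handler for a given intent name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        SKILLS[intent] = fn
        return fn
    return register

@skill("time")
def tell_time(utterance: str) -> str:
    return "It is noon."  # a real skill would read the clock

@skill("music")
def play_music(utterance: str) -> str:
    return "Playing music in the background."

def dispatch(intent: str, utterance: str) -> str:
    """Core entry point: route an intent to its skill, if any."""
    handler = SKILLS.get(intent)
    return handler(utterance) if handler else "Sorry, I can't do that yet."

if __name__ == "__main__":
    print(dispatch("time", "what time is it"))
    print(dispatch("weather", "will it rain"))
```

Adding a new skill is then just defining one decorated function; nothing in the core has to change.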
Good points!
Although for the first one, I think the main obstacle is that it'd be rather hard to get good-quality speech recognition with music blasting out of a speaker close to the microphone :)
@Hardwareai this is why most commercial voice assistant speakers turn the music volume down when they hear the wake word.
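That "duck the music on wake word" behaviour boils down to a small piece of state: save the current volume, lower it while listening, restore it afterwards. A tiny sketch (hypothetical; real assistants do this inside the audio mixer):

```python
# Sketch of wake-word volume ducking: save the volume, lower it while the
# assistant is listening, restore it afterwards. Hypothetical Player class,
# not a real audio API.

class Player:
    def __init__(self, volume: float = 1.0):
        self.volume = volume
        self._saved = volume

    def duck(self, factor: float = 0.2) -> None:
        """Wake word heard: drop the volume so the mic can hear the user."""
        self._saved = self.volume
        self.volume = self.volume * factor

    def restore(self) -> None:
        """Interaction finished: bring the music back up."""
        self.volume = self._saved

if __name__ == "__main__":
    p = Player(volume=0.8)
    p.duck()      # volume lowered while listening
    print(p.volume)
    p.restore()   # back to the pre-duck level
    print(p.volume)
```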