Great video, Kevin!
Is there a way to give it a voice, and speech recognition?
Is there anything like this that can do image recognition? I do wound care and have always wondered when my computer could help with wound assessment - provide measurements, describe wound bed, maybe even temperature with the right camera.
I'm only at 15:30, but I get the impression you are not using a TPU? I am just getting into Raspberry Pi and I'm looking for an excuse to get one. There only seem to be a couple of TPU options and I'm leaning toward the USB 3 version, but I have no idea how to judge them.
pretty good stuff, i'm planning to install ollama on a nas xd, hope your channel grows up
Great show!
THX OLLAMA!
Thanks, Kevin! I have been running Ollama on a Pi5 for a few weeks now. But I look forward to following your docker example and trying out the GUI. My use case is robotics and still trying to solve how to run a Pi5 at full power on a DC battery. (harder than it seems)
I am waiting for my pi5, and I plan to also use it for robotics and potentially run some local LLM. I would be curious to hear how you solve the power. Not much info online about it. Seems like the pi5 is picky? Not having it in person though, I can't test. I was hoping to just use a buck converter and power the pi5 in some way through that. I haven't seen any definitive videos or articles on this.
@@evanhiatt9755 True. There is almost nothing on the Internet right now about running a Pi5 from a DC battery source (as of Feb 2024). My current solution for running all CPU cores pedal to the metal is a small automotive inverter powered by a 36 V Li-ion battery via a DC-DC buck converter that makes 12 V DC, with the OEM Pi wall wart plugged into the inverter. It is the only way I have discovered to get max power. Even $150, 200 W power banks claiming to support USB PD will not deliver more than about 10-12 W to the Pi5. Very frustrating, as a car inverter on a robot is obviously highly wonkified, not to mention that going around three sides of a square in voltage conversion takes an efficiency toll.
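To put a rough number on the "three sides of a square" efficiency toll, here is a minimal sketch that multiplies per-stage efficiencies along the battery -> buck -> inverter -> wall wart chain. The stage efficiencies and the 25 W load are assumed placeholder values for illustration, not measurements from the setup described above.

```python
# Rough efficiency estimate for a battery -> buck -> inverter -> wall wart -> Pi 5 chain.
# Every number below is an assumed placeholder, not a measurement.
from math import prod

STAGE_EFFICIENCY = {
    "dc-dc buck (36 V -> 12 V)": 0.90,  # assumed
    "12 V automotive inverter": 0.85,   # assumed
    "Pi wall wart (AC -> 5 V)": 0.85,   # assumed
}


def chain_efficiency(stages):
    """Multiply the per-stage efficiencies to get the end-to-end efficiency."""
    return prod(stages.values())


def battery_draw_watts(load_watts, stages):
    """Power pulled from the battery to deliver load_watts at the Pi."""
    return load_watts / chain_efficiency(stages)


if __name__ == "__main__":
    load = 25.0  # hypothetical Pi 5 load in watts when running flat out
    print(f"end-to-end efficiency: {chain_efficiency(STAGE_EFFICIENCY):.0%}")
    print(f"battery draw: {battery_draw_watts(load, STAGE_EFFICIENCY):.1f} W")
```

With these assumed figures roughly a third of the battery energy is lost before it reaches the Pi; swapping in measured efficiencies would give a more trustworthy figure.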
Hi, what are your use cases in robotics, if you can share? Thank you.
@@dariovicenzo8139 Sure. Happy to share. I am running Ollama on the Pi5 using a local large language model. Running a local LLM really taxes the Pi5 to max CPU while it is processing, so any amount of throttling due to low power slows down the processing time. The processing time of a 3B LLM is acceptable, but the outputs are dodgy. So I'm trying to run 7B models... Their outputs are acceptable, but processing time is noticeably slower. Every second counts when trying to have a conversation with a robot and have any hope of it feeling natural. Side note: a Pi4 handles animatronic eyes, speech to text and text to speech. I am working on passing this data via MQTT to the Pi5, whose sole job is to run the LLM. So I am already dividing labor as best I can. I might have to just give up on the Pi5 for the LLM and use a NUC. Still trying to resolve the power issue.
Thanks for the demo and info, have a great day
Is it possible to remake this video with the new Hailo-8L AI HAT?
nice video !
Nice tutorial. Is it possible to deploy Ollama as a stack in a Docker Swarm cluster to improve performance?
It would be awesome if you could tell it to SSH into other devices and run commands.
Do you have a seed on your prompt? If not, you will generate the same output each time; without some form of randomization a transformer model is more or less deterministic with most modern schedulers. If you do have a seed, you might have it cached.
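A minimal sketch of the point about seeds, using a toy sampler rather than any real model: the same seed reproduces the same sequence of sampled tokens, while leaving the seed unset usually gives a different sequence each run. The logits and the function name are made up for illustration.

```python
import math
import random


def sample_tokens(logits, n, temperature=1.0, seed=None):
    """Draw n token indices from softmax(logits / temperature)."""
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable state
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalised softmax
    return rng.choices(range(len(logits)), weights=weights, k=n)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four tokens
    print(sample_tokens(logits, 8, seed=42))
    print(sample_tokens(logits, 8, seed=42))  # identical to the line above
    print(sample_tokens(logits, 8))           # usually differs run to run
```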
Fantastic video. Thank you very much. I have a Coral TPU. Would it be possible to use it with Ollama on the Pi 5?
Large language models need a lot of ram to run, so things like the Coral are not really suited to them.
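As a back-of-the-envelope illustration of the RAM point, this sketch estimates the size of the weights alone from the parameter count and the bits stored per weight. It ignores the context cache and runtime overhead, so real usage will be higher; the model sizes and bit widths in the loop are just example values.

```python
def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for params in (3, 7, 13):
        for bits in (4, 8, 16):
            print(f"{params}B @ {bits}-bit: ~{weights_gib(params, bits):.1f} GiB")
```

Even a 7B model at 4 bits per weight needs a few GiB just for the weights, which is why a small accelerator with very little on-board memory is a poor fit.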
Thanks for this. I'm having trouble getting started. I have a fresh install of bookworm on a RPi5. I installed Docker and created the compose.yaml file as shown in your blog. I'm getting an error when I try to run "docker-compose up -d". "yaml line 1:did not find expected key". Do you know what I'm missing?
Can you share your compose file? It's YAML, so it's very particular about spacing and formatting.
@@kevinmcaleer28 I keep replying with the text from the file but I guess YouTube is blocking it. I'm using the file you posted on your blog and made sure that the file contains Linux-style line feeds instead of Windows ones.
@@BillYovino Have you joined our Discord server (it's completely free)? It's easier to troubleshoot there as you can share files and screenshots etc. It's hard for me to troubleshoot without looking at the file. There are YAML validator sites you can run the file through; they will also tell you what is wrong.
@@kevinmcaleer28 I'm not getting the confirmation email from Discord after signing up from your link. I've checked all of my email boxes including spam and trash.
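For anyone hitting a similar YAML error, here is a small sketch that scans a compose file's text for invisible characters that commonly upset YAML parsers: tabs in the indentation, Windows line endings, non-breaking spaces (often picked up when copying from a web page) and a byte-order mark. It is only a heuristic check under those assumptions, not a YAML validator, and the function name is made up.

```python
def find_yaml_whitespace_problems(text):
    """Return (line_number, problem) pairs for characters that often break YAML."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append((1, "byte-order mark at start of file"))
    for number, line in enumerate(text.split("\n"), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        if "\u00a0" in line:
            problems.append((number, "non-breaking space"))
        if line.endswith("\r"):
            problems.append((number, "Windows line ending (CRLF)"))
    return problems


if __name__ == "__main__":
    sample = "services:\n\tollama:\r\n"  # deliberately broken example text
    for number, problem in find_yaml_whitespace_problems(sample):
        print(f"line {number}: {problem}")
```

To check a real file, read it with open(path, encoding="utf-8", newline="") so that Python does not silently convert the line endings before the check runs.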
The first time you send a prompt, the WebUI loads the model into memory, so the first response always takes the longest.
If I already have a service running on localhost:8080, is there a way to have Ollama run on a different port?
@kevinmcaleer, How would you run this on a proxmox server? would you use a lxc or full vm running ubuntu? Thank you
I've not got a lot of experience with LXC (Linux Containers, for anyone else who's reading). It's supposed to be faster than Docker, but I wouldn't know how to create an LXC container to run Ollama in. Ollama needs quite a few Python libraries, so any bare-bones Linux with Python installed should be OK. Ollama is VERY resource intensive, so it may steal all the compute cycles from anything else running on Proxmox. Worth trying though.
I was just wondering the same thing since I just installed proxmox on my home server
I've installed it in an LXC container on Proxmox. I'm using Proxmox on a 13th gen i7 machine that has hybrid cores. In Proxmox it's easy to dedicate certain cores to VMs but a bit more faffing to say which E cores or P cores to assign to LXC containers. (I have a Windows VM on Proxmox and pass the GPU through to it as my daily driver, so the GPU isn't normally available to my containers; when I want GPU-powered LLM models I normally use LM Studio in Windows and its server.)
All I've done so far is clone a privileged Debian 12 LXC container (not very secure lol - should there be supply chain attacks etc. in the Python libraries, for example), and I did the one-liner Ollama install in that and then tested with Postman.
You can limit in the Proxmox GUI how much memory and how many CPU cores to give to the LXC container. (It only gives the container what it asks for up to the limit, and of course the container doesn't need memory for its kernel as that's shared with the host.) I gave this container a limit of 8 cores (leaving Proxmox to decide which) and I've found that the container only maxes out 4 of them when I send Ollama a request from Postman.
--
My plan is to set up an unprivileged container for Ollama. I still want to install it directly in the LXC rather than in a Docker container. I plan to use Podman for the WebUI - hosting Podman in the same container and using Portainer with Podman. I have another unprivileged LXC container using Cockpit stuff to run as an SMB NAS, and using "pct set" in Proxmox I can make the underlying ZFS dataset from that NAS container available in my Ollama container. That's useful for storing models, so I only need to store them once (on fast NVMe) and they are available to the various utilities that might consume them.
That's the plan anyway. Always limited by time. :)
I try to script everything now-a-days because I can't remember a thing lol. I'm hoping LLMs with RAG and memory and agency will help me out there!
Videos / channels like these really help with getting going quickly and can save a huge amount of time (thanks Kevin!). :)
NB: I hope I can run Podman in an unprivileged LXC container (with nesting) ... I can't remember off the top of my head what problems might get in the way and affect my planned implementation - it was so long ago when I last looked.
Just for experiments I'd use a VM for kernel separation and a privileged LXC container in that - efficient and easy to SSH into using Visual Studio Code remote explorer etc., so I can mess about with Ollama and run stuff beside it within the same "scope". I'd rather have some sort of container than a Python venv or Anaconda etc. ... just less to think about in terms of isolation.
As a "commodity" sort of thing, just running / listening on my hardware, 24/7, I like the idea of using an LXC container for it rather than in docker just to strip away some of the management complexity and use Proxmox, and while idle I want it to use very little resource. It's for that scenario that reducing the security footprint if possible makes sense.
minute 5:03, how can you display linux activity % like that, sir?
Hi, I used the 'htop' command, it will show all the processes, and CPU, Memory and swapfile usage
Is there any way to reliably increase the performance of a Raspberry Pi 5 to support the larger models in Ollama?
Not that I'm aware of - the smaller models will run a lot faster; the trade-off is in the accuracy/depth of knowledge contained in the model
installation starts at 19:20
@Clipzz.z I added chapters!
Thanks for liking my comment❤️❤️❤️
@ that’s very helpful. Thank you for that
Can I run ollama on a powerful system and serve the webui on another system?
Yep - you sure can
@@kevinmcaleer28 can you help me with it.
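A minimal sketch of talking to Ollama on a different machine over the network, using only the Python standard library. It assumes Ollama's HTTP API is reachable on its default port 11434 and that the server has been configured to listen on the network rather than only on localhost (for example via the OLLAMA_HOST environment variable). The IP address and model name in the usage note are placeholders.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt, port=11434):
    """Build a POST request for Ollama's /api/generate endpoint on a remote host."""
    url = f"http://{host}:{port}/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask(host, model, prompt, timeout=300):
    """Send the prompt to the remote Ollama server and return the reply text."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling ask("192.168.1.50", "tinyllama", "Why is the sky blue?") from the second machine should return the reply as a string, provided that address and model exist on your network. The WebUI reaches Ollama over HTTP in the same way, so pointing it at the remote address should work; check its documentation for the exact setting name.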
How about a video about this topic.
This looked interesting so I went to the github page and followed the instructions to run the docker version and the first thing I saw was a sign in screen with the need to create an account! Not really sure this counts as private at this point
That account is only stored on your machine (it’s not a cloud account). This is so you can provide this to many users, such as your family, school or business
@@kevinmcaleer28 Thanks, it wasn't clear. Quite a few of the "run you own server" apps seem to need you to log into them with no real necessity so I noped out. But lets try that again 👍
Is it possible to run Ollama on the NPU of an Orange Pi 5 Plus?
Is there a way to update the Ollama it builds? I followed the video and it is working fine, but Gemma just came out and it only works with the latest Ollama, 1.26... and when I followed the video it installed 1.23... What do I have to erase, or what do I do?
Simply rebuild it using docker-compose build --no-cache and then docker-compose up -d
Can it run in a cluster on multiple RPi's?
ooh looks like it can on a kubernetes cluster!!!
Q how often will you need updates.
You don’t need to update it ever, if you want newer and better versions of the model then you can download them whenever you want (when they are released)
How are you accessing the Raspberry Pi from another computer?
I was able to resolve this issue, thanks
Phenomenal
It is probably version v0.1.119, but it is not working anymore. It doesn't have the option to download and returns a server 500 error.
Are you talking about ollama.com? I've just checked and the website is up and running
@@kevinmcaleer28 The WebUI doesn't have the option to download the model anymore.
It shows an error when you type dolphin-phi, for example.
The /ollama/api endpoint returns status 500 and I was not able to use a local language model...
Can you clarify? Maybe the new version of the WebUI doesn't allow downloads anymore, or the docker-compose file needs to be updated.
Got another reason to get a few Pi5's.
Been there, done that. 15-minute response times are just not good. I'm building myself a small LLM box that's going to go against my credo of low-power, high-efficiency servers, but it's the only way to acceptably run LLMs and other inference models.
👍👍👍👍
Gonna nitpick here, ollama is not an LLM - from Langchain "Ollama allows you to run open-source large language models, such as Llama 2, locally."
Fair enough
Thanks,
Where's the link to the blog post with the instructions, please?
Too slow
Which model are you using? The larger the model, the slower it is.
wtf !? container? docker? what happened to the simple exe file that runs with a mouse click? Most people are using windows, not linux. There is ollama for windows now. Is there any chance to install the WebUI without any additional program on windows? I dont want to have all that docker crap and dependencies on my machine.
Three things - 1) this video is about running Ollama on a Raspberry Pi, which doesn’t run Windows desktop or server - most people who use a Raspberry Pi run Linux on it, 2) Docker makes it easier to add and remove stuff on your computer without leaving a trace, 3) the WebUI is just the front end; it needs Ollama to run in the background - I’m sure you’ll find a Windows package 📦 for this. Hope this helps
People serious about self-hosting don't use Windows Server. And docker works on Windows too. Containerized environments are awesome and you obviously don't want to learn. Otherwise, stop complaining and do the research yourself.
Keep in mind, when you install an exe, all it has done is take the same steps you would follow going the long route and wrap them in a single GUI-type install to make you feel more comfortable and make things a little more streamlined.
I get this error
step 5/31 : FROM --platform=$BUILDPLATFORM node:21-alpine3.19 as build
failed to parse platform : "" is an invalid component of "": platform specifier component must match "^[A-Za-z0-9_-]+$": invalid argument
ERROR: Service 'ollama-webiu' failed to build : Build failed
What am I doing wrong?
you need to download the ollama webui first - step two: www.kevsrobots.com/blog/ollama.html
@@kevinmcaleer28 Yeah, I followed the instructions but still get the error
Hi - this is due to a change in the WebUI code (for some reason they have switched to using arguments in the Dockerfile, which means it doesn't run automatically). I've updated the script that installs the WebUI (now called open-webui), so that should fix things. Let me know how you get on (you'll need to do a git pull in the folder where you run this to get the new changes).
@@kevinmcaleer28 I am sorry, I am quite new to GitHub and I have only used the clone function. I tried the pull function but I am not sure I am using the right arguments. From the documentation I understand you need to write it as "git pull webui webui", as an example, but I am not sure.