I tried running the 7B model on my phone, a Redmi Note 13 Pro 5G with 12 GB of RAM, and the speed is pretty amazing: I'm getting around 2 tokens/s for long answers.
Nice video as always 👏👏
On Gentoo, even on the Pi, you can install it from the repos if you pull in GURU. open-webui is also available via pip; you can pull it into a venv that way instead of using Docker.
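For reference, enabling GURU and installing from it looks roughly like this (a sketch: it assumes eselect-repository is installed, and the ollama package category/name in GURU is my guess, so double-check it):

```sh
# Enable the GURU overlay and sync it (needs app-eselect/eselect-repository)
eselect repository enable guru
emerge --sync guru
# Package category/name is an assumption -- verify with `emerge --search ollama`
emerge --ask app-misc/ollama
```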
Any significant performance gain over Docker? Any big simplification over Docker? I don't know anything about Gentoo, so I'm not trying to knock it; I genuinely don't know.
Is Gentoo angling to be very AI-integrated?
@frederickwood9116 To be clear, open-webui is on pip, which is available on every distro; it's not a Gentoo thing, just Python's built-in package manager. (You use it with a venv to auto-fetch all the dependencies into a self-contained folder.) Performance-wise it shouldn't be TOO different, but you can easily run updates through your normal workflow and aren't reliant on someone else building images for a closed ecosystem.
As for the Gentoo/ollama part, there's generally a decent performance bump from doing a native build, since Gentoo is less a distro and more a build system. The GURU repo is for bleeding-edge stuff.
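For anyone curious, the venv route is only a few commands (a minimal sketch, assuming the open-webui package on PyPI and its serve subcommand):

```sh
# Create an isolated venv so open-webui's dependencies stay self-contained
python3 -m venv ~/open-webui
source ~/open-webui/bin/activate
pip install open-webui
# Start the UI (listens on port 8080 by default)
open-webui serve
```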
Good speed on the Pi 5
Well, it's dramatically reduced, but yeah, overall impressive performance.
I'm still loving your organic and upfront style. Thanks for sharing this one too; the AI space is very topical. Here's something I'm looking for: a locally installed AI tool that can call out to online AI services for more complex problems, using anonymisation to maintain privacy. The focus would be local system help: troubleshooting and potentially fixing things, so the inevitable tangle of replacing window managers or desktop environments will no longer be so finger-burning. It would need to read log files and config files, with a database for that data, and I'd love for the AI tool to "learn" something about the system to reduce effort as time goes on. Possibly the install lives on a separate host like a Raspberry Pi or an Orange Pi 5 with 16 GB or more of RAM. I don't know if that's feasible; perhaps you or someone in the community can comment! Thanks again.
I did exactly this last night!! One thing I ran into with the Raspberry Pi was that when pulling model files, the download would stop and restart a ton of times (I guess it couldn't write to the SD card quickly enough). You just need to hit Ctrl-C every so often so it saves the progress, or you can build a little shell script to do it for you. Fun little project. I tried 8b as well just for laughs 😂
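That little shell script can be as simple as this (a sketch; it leans on ollama resuming partial downloads after an interrupt, and uses the 1.5b tag as an example):

```sh
#!/bin/sh
# Kill and restart the pull every 60 seconds until it completes;
# ollama resumes partial downloads, so progress is saved each round.
until timeout 60 ollama pull deepseek-r1:1.5b; do
    echo "Pull interrupted, resuming..."
    sleep 2
done
```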
Amazing, thank you! I will try it
Did this on a Pixel 8 Pro through Termux 😅
thanks
U guys all are persistent 😂
How was the performance on the cellphone model?
@nightcrows787 Depends on the question; I've had as high as 19 and as low as 4 tokens/s. Definitely usable, but limited with the 1.5b parameters. It seems like a good option though if you need info and have no internet connection.
it would probably be faster
The lightest DeepSeek model has absolute pudding for brains, but the 22 GB version is actually of some use!
Yeah but sadly it's not so easy to run.
I think Alpaca is pretty great, much easier to install as a Flatpak, and you can simply point it at your locally-running Ollama if you want to use one of the Ollama CLI's models (instead of the built-in Alpaca-Ollama models).
I'm surprised to see Ollama already has the model up on their page.
No, I NEED to build a cabin in the woods.
Valid
The government will Ruby Ridge you
Which Pi model were you using?
I ran it on an i5-6500T :) and asked it to show me how to install Docker on Ubuntu 24.04. The results aren't as accurate as the original DeepSeek, but it's still an awesome exercise! :)
Thank you!
Awesome brother
If I use OpenAI or any other USA technology, I can't help but be reminded of the USA's torture camp at Guantanamo and its genocide in the Middle East... not to mention its intense security-state surveillance via NSA/PRISM.
Valid.
And also, America wants a corporate monopoly.
Will this work on a Raspberry Pi 4 with 4 GB?
Is there a way to use the internet with the local model, for web search etc.?
Nope. Think!
If you can program, it shouldn't be too hard to integrate it with a scraper.
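A bare-bones version of that idea (a sketch, not a real search integration: it assumes a running local Ollama, does crude tag-stripping, and uses example.com as a placeholder URL):

```sh
#!/bin/sh
# Fetch a page, crudely strip the HTML tags, and hand the text to the
# local model as context for a question.
URL="https://example.com"
PAGE=$(curl -s "$URL" | sed 's/<[^>]*>//g')
printf 'Using this page as context, summarize it:\n\n%s\n' "$PAGE" \
    | ollama run deepseek-r1:1.5b
```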
I am not a fan of AI, but like the content that you put out. Passing on this one, but watch all of your other stuff. Thanks for sharing what you do.
Sums up how I feel. No AI garbage.
Let me guess, you’re an experienced developer but LLMs are already writing better code than you?
@jonjohnson2844 Not everyone is talentless enough to be scared of AI, you know? Maybe they just hate its environmental impact. Either way, being condescending gets you nowhere.
@TheuppercaseM You can be very talented and still have your job taken by AI. You're behind the times, ma boy.
Cool stuff. What is the advantage of that convoluted web UI install over Alpaca, which works great in GNOME?
If you haven't used it: it's got much cleaner output with its Markdown support, a full plugin engine for custom tools, integration with things like AUTOMATIC1111 for image generation, and a pretty nice manager interface for saving presets with different knowledge sets, system prompts, etc. It has a community with a pretty large number of plugins you can just drop in. It basically does a LOT more than just give you a chat interface. I'd prefer a non-GTK/non-Qt native app that did all this, but right now there just isn't one that's more than the chat functionality. Being web-based does make it nice and easy to host on a separate machine on your network, though, so that's at least one plus.
@1Raptor85 Ah those are some clear diffs. Thanks for the reply.
It's fun to run a local LLM, but I don't think there's much point without a GPU. I tried the r1 1.5b model and it's not very useful in the end, so I'll still use their web version, since I can't run anything better than 1.5b. I don't get the argument about data going to China either; it's not like ChatGPT data isn't logged on a server. I'll probably try r1 again when I can run at least the 8b model or bigger.
Does this make any sense?
I like having local LLMs to support me with coding ideas or to remember commands that do specific stuff on Linux machines.
What annoys me about DeepSeek is that I see its whole thinking process 😅 Qwen and Llama don't do that, which feels more like a conversation and less like talking to someone with a split personality.
open-webui hides it by default, though tbh half the fun of this model is seeing its scatterbrained approach, lol.
I would never even think to use it for code though; codellama is way, way better for that.
Think of it like a verbose programming language: you don't care about the verbosity until something goes wrong, and then you can debug the LLM's chain of thought to see where it went off track.
Very good
Why is the whole industry calling this "open source"? It's not. You're just able to download the binary brain of the LLM. The code and data needed to create the LLM itself aren't open at all, or am I mistaken?
The code is available on GitHub!!
It's truly disheartening to see a tech YouTuber I admire discussing politics without fact-checking. Why, even in 2025, do people in the West remain so arrogant?
I'd be more worried about all the stuff you type that goes to your own government. You know, the ones who can actually use it against you.
Nobody should run AI from a web user interface you don’t control, otherwise servers in AmeriKKKa or other countries will be logging it.
What if you use DeepSeek over the internet? I don't understand shit about this, sorry.
0:20 ChatGPT literally does the same thing 😂
And OpenAI is not open-source.
You need CONTROL over your data. OpenAI is NOT open. Everything you type is recorded on servers in AmeriKKKa.
Love you 0:25
Nah, 1.5b is dumb as hell; it repeats itself too much and is only interesting for a few minutes.
Why do we need to run stupid AI?
👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍
You NEED to run from DeepSeek!
FIRST
dang it
Your mom
I get that deepseek-r1 is the "hot new thing", but if you can't run much more than the 1.5b, you're better off running llama3.2. There's a 1b and a 3b, and the 3b is remarkable.
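If you want to try it, those variants are pulled by tag (assuming the llama3.2:1b / llama3.2:3b tags on the Ollama registry):

```sh
# Grab the 3b variant and chat with it; swap in llama3.2:1b for the smaller one
ollama pull llama3.2:3b
ollama run llama3.2:3b
```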
Bro, give it a rest; everyone hates AI. Nobody likes it. It's made with the intent to destroy humanity.
This is probably the first time I have even mentioned AI directly.
Hail AI!!!
Blasphemer! Bow down to your new overlords!!!
When you can "video call" your AI to give you realtime directions on how to cook something ...
So-called "AI" models are programmed statistical learning methods. Fancy word-salad mixers. Effin' hate 'em meself.