2 videos in 1 day? Woah! Thanks
Indeed, and the second will reduce views of the first, so it wasn't for self-benefit to upload 2.
Interesting tutorial with Web UI and Ollama, Thanks!!!
Excellent explanation!!!!! Simple and straight to the point, as they say here in my country.
Careful with Open WebUI. Really, it's evil. There are better ones: gpt 4 all, follamac, alpaca.
Big thanks from Palestine
❤💚🖤
be safe
Stay strong
Stay safe from evil.zionists
Love from USA free Palestine
Great one Dan! Keep us updated on the AI stuff!
In the beginning you asked "why" use a local LLM; I think you forgot "no online connectivity". I do sometimes take my laptop to places where I have no WiFi or do not think the WiFi is secure, but I still want to use an LLM to analyze text and scripts.
How can I connect my local llama3 to the WebUI? My WebUI couldn't find the locally running llama3.
same problem
From the home page of your WebUI (localhost:3000) in your browser, click on your account name in the lower left, then click Settings, then "Models". You can pull llama3.1 by typing it in the "pull" box and clicking the download button. When it completes, close WebUI and reopen it. Then I had the option to select 3.1 8B from the models list.
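If the WebUI route is fiddly, pulling the model straight from Ollama's REST API works too. A minimal sketch, assuming Ollama is listening on its default port 11434 and the requests library is installed:

```python
# Pull llama3.1 through Ollama's REST API instead of the WebUI "pull" box.
import json
import requests

with requests.post("http://localhost:11434/api/pull",
                   json={"name": "llama3.1"}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            # Each line is a JSON status object reporting download progress.
            print(json.loads(line).get("status", ""))
```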
@@MURD3R3D I found that happens due to Docker networking.
I faced a similar problem. Restarting the system, starting Ollama, then starting Docker Desktop and the container solved the issue for me.
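For what it's worth, when Open WebUI runs in Docker and can't see Ollama on the host, the container usually needs to reach Ollama via host.docker.internal rather than localhost (that is what Open WebUI's OLLAMA_BASE_URL setting points at). A rough sketch, run from wherever Open WebUI is running, to see which base URL actually responds; both URLs here are assumptions based on the default ports:

```python
# Check which Ollama base URL responds; the one that lists your models is the
# one Open WebUI should be pointed at (e.g. via OLLAMA_BASE_URL).
import requests

for base in ("http://localhost:11434", "http://host.docker.internal:11434"):
    try:
        r = requests.get(f"{base}/api/tags", timeout=3)  # lists pulled models
        print(base, "->", [m["name"] for m in r.json().get("models", [])])
    except requests.RequestException as exc:
        print(base, "-> unreachable:", exc)
```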
Hello! Which software is used to make this video?
thanks in advance
Dan, what are the specs for your local machine?
Hello, thank you for your video.
Could you please let me know if I can use Llama 3.1 on my laptop, which only has an NVIDIA GeForce MX330?
I am running llama3.1 on my Alien R17 without issue.
In Ollama, is there an admin dashboard for tuning the model, sir?
Note for 405B:
We are releasing multiple versions of the 405B model to accommodate its large size and facilitate multiple deployment options:
MP16 (Model Parallel 16) is the full version of BF16 weights. These weights can only be served on multiple nodes using pipelined parallel inference. At minimum it would need 2 nodes of 8 GPUs to serve.
MP8 (Model Parallel 8) is also the full version of BF16 weights, but can be served on a single node with 8 GPUs by using dynamic FP8 (Floating Point 8) quantization. We are providing reference code for it. You can download these weights and experiment with different quantization techniques outside of what we are providing.
FP8 (Floating Point 8) is a quantized version of the weights. These weights can be served on a single node with 8 GPUs by using static FP8 quantization. We have provided reference code for it as well.
The 405B model requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.
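As a back-of-the-envelope check (my own arithmetic, not from the release notes), the node counts follow from the raw weight sizes, assuming nodes of 8 x 80 GB GPUs and ignoring KV cache and activation memory:

```python
# Rough weight-memory arithmetic for the 405B model; real deployments need
# extra headroom beyond the bare weights.
params = 405e9
node_vram_gb = 8 * 80                 # assumed: one node of 8 x 80 GB GPUs

bf16_gb = params * 2 / 1e9            # 2 bytes per weight -> ~810 GB
fp8_gb = params * 1 / 1e9             # 1 byte per weight  -> ~405 GB

print(f"BF16: {bf16_gb:.0f} GB, nodes needed: {bf16_gb / node_vram_gb:.2f}")  # ~1.27 -> 2 nodes
print(f"FP8:  {fp8_gb:.0f} GB, nodes needed: {fp8_gb / node_vram_gb:.2f}")    # ~0.63 -> 1 node
```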
And what about 70B? How could it be served? Could some of Llama 3.1 be used on a simple 16-core laptop with an integrated GPU and 32GB of RAM?
When you say "we" do you work for meta?
@@isaac10231 I'm reprinting from the release notes.
Understand?
Love your terminal, which tools do you use to customize it?
Loved this 🤩😍
Ollama should integrate a feature like Artifacts that allows you to test your HTML/CSS code in a mini webview.
You should integrate a monthly 1000 dollar payment into my bank account... that's a good idea too. I'm afraid LLMs are just the way of inputting and outputting; it is other applications, software, and hardware that do stuff like that, i.e. a browser to display CSS. The web UI and LLMs use Markdown, not HTML, so they cannot do stuff like YouTube embeds. Besides, F12 on most browsers will give you that anyway.
Five Stars ***** - Thanks for sharing.
Does Open WebUI support creating an API endpoint for AI models or is it just a chat UI?
Does it expose the models as a RESTful API?
No, but Ollama does. See the docs.
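To illustrate, a minimal sketch of hitting Ollama's REST API directly, assuming the default endpoint at localhost:11434 and that llama3.1 has already been pulled:

```python
# One-shot, non-streaming completion against Ollama's /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1", "prompt": "Why run an LLM locally?", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])
```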
Finally set up Open WebUI thanks to you. I'd approached it, seen "Docker", and left it on my to-do list for weeks/months. I'm running gemma2 2b on my GTX 1060 with 6GB VRAM. Any suggestions on good models for my size?
I have spent all morning trying to get up and running. I can get Ollama running and also Open WebUI on port 3000, but there are no models in the web UI.
If you've got Ollama installed you need to install a model. What happens if you run ollama list?
@@DanVega
deepseek-coder-v2:latest 63fb193b3a9b 8.9 GB 2 hours ago
llama3.1:latest 42182419e950 4.7 GB 6 hours ago
This is my desktop.
@@DanVega
Ollama list shows deepseek-coder-v2 id:63fb
Llama3.1:latest id:4218
Any idea why the port isn't popping up for me in Docker? Tried the generic address numbers in Chrome and couldn't find the web UI.
Despite my model being listed with ollama list, it unfortunately doesn't show up in the web UI as an option. Not sure what to do since I am not skilled in such things.
Thank you
Is WebUI a replacement for aider?
Do you mind telling us what your MacBook specs are?
How did you connect to the API?!
Hello, any idea how to set keep_alive when running the Windows exe?
Hey,
It's nice.
Can you list all the specs of your machine for running an 8B/9B model?
Is it normal for Docker to take up 15GB of RAM on your machine?
Is there an integration for Open WebUI + Spring AI?
Thanks so so much for this
I'd been struggling with it for so long.
So I usually have this problem where it's really slow, and if I try to reference a document like you did, it just keeps loading and never responds. I did everything you did except that I use the phi model instead of llama 3.1. Could this be the reason?
Thanks in advance😊
How can we tune a model with custom data?
Hello. After installing Open WebUI, I am unable to find Ollama under 'Select a Model'. Is this due to a specific configuration? For information, my system is running Ubuntu 24.04.
Hey Dan, can you help me out? I have an issue I can't figure out. I used to host Ollama WebUI locally and online on a server, but I'm not sure why it's not working anymore.
Thanks DAN, good video!
It runs so smoothly. Sorry, I am a new subscriber.
I want to know what your computer hardware is, for my reference. Many thanks!!
A perfect tutorial.
Very nice
Is it possible to build it out to monetize?
I installed this under WSL on Windows 11 and it's really slow. Is it because it's under WSL and not native on my Windows box?! I have a 3080 Ti GPU and an i9 processor, and yours is MUCH faster than mine.
This is great. Thank you
Good stuff Big Dawg!
Thank you, I tried it but it is very slow, running it on a laptop with 16GB RAM!
My Ollama running the same model is dead slow, on a laptop with an 11th gen i5, no GPU, and 26GB of RAM.
Is it because there is no dedicated GPU?
No GPU is a deal breaker for perf.
Where do I get spring-boot-reference.pdf?
This is really helpful.
Would you make a video on how to integrate Llama 3 into a WordPress website, making a chatbot or copilot?
Hey, could you make a video on how to edit the login page? I want to make the login page to my liking.
Ask your LLM to restyle it for you... same as when you want to know the time: you don't ask your friend, you look at your phone.
Which program stores the local user data, Ollama or Web UI? Data like improvements to the model and chat history. How do multiple users work, and which program handles that? Can different users access other users' data? Does one user "improving" the model affect other users' conversations? How can you completely reset the whole environment?
Bro you the G
Finally my GPU has a task other than gaming.
You skipped the configuration of WebUI. It's unfair. 😢 Excellent video, but without this important thing it will not work. 👎
Talk too much.
6 months behind everyone else.