Thanks, Matt! Now, I'm one step closer to "All your base are belong to us!"
Respect!
Thank you so much for making this course! This will definitely get anyone who's interested started with Ollama.
This is pretty close to exactly the level of abstraction I need. Looking forward to more!
Thanks, Matt!
Thanks Matt! Very helpful!
Love it. Looking forward to your next videos!
What do you mean, "this video is already long enough"? I could have gone on for another hour. Love your work, Matt!
Best introductory video. One bit at a time for dumb people like me.
I know I've been "cautious" of you and some of your videos, but I do appreciate what you are doing, including this course. Thank you.
Cautious? What does that mean in this context?
@technovangelist Just spidey senses, with nothing to back them up.
Umm ok. There are a lot of folks just eager to do the right thing without getting much out of it. All I get is a cut of the ad revenue for ads YouTube is going to show you anyway. And if I were in it for the money, it would be easier to make more cooking fries at the local McDonald’s.
@technovangelist See, he was right!!!! You're shilling for McDonald’s!
Thank you, Matt. I was eagerly anticipating the initial video to start the course. I really enjoy the videos you create.😄
Waiting for the TypeScript part. Thank you for making this course!
Wonderful, Matt! Thanks a lot for this.
Great! Can’t wait for more. 🎉
Short, sweet and to the point.
Great first course video! Thank you.
I really enjoy your videos, thank you for this introduction to Ollama.
The Discord link you mention in the video is invalid; I managed to join via the link in the description.
What GUI do you use in general? I'm a relative novice, so I won't be building super sophisticated products yet.
I don't. I just use Ollama because the CLI is always faster than anything else.
Great video again, thanks.
Can you please make an instruction video about installing RouteLLM, using Llama 3.1 8B locally with Ollama and Groq (or OpenAI or Claude) over the internet?
Much appreciated, Matt. Keep it up!
This is the missing link.
Would you be able to cover how we can install a model from a local repository?
Can you tell me more? Install what model, from what local repo, from what source?
@technovangelist Yes, sure, thanks for asking. Basically, if you already have a model downloaded but no internet connection, you can install Ollama, but you can't really add any models (as far as I can tell). Is there a way to "restore" or add models from an existing download in this scenario? Or do we need to download the manifest and other files too?
I feel like someone asked this exact question somewhere else today.
@technovangelist Interestingly, I couldn't find anything about it. For example, I've had to redownload 40 GB models through ollama pull/run even though I already had the model, then delete the original and make links so the other app uses the newly downloaded model.
If you have the model in Ollama and don’t remove it, you won’t have to download it again. Sometimes folks run ollama serve at the CLI, and then you have a second user running Ollama that downloads to a different directory. If you download a model for a different tool, however, Ollama will never see it until you add it to Ollama. Most other tools distribute only the weights, while Ollama distributes the entire model.
Perhaps this is out of scope for Ollama, but can Ollama run a local model on a laptop using both the GPU and the CPU? GPU: NVIDIA RTX 4090 with 16 GB VRAM. CPU: 32 cores with 128 GB RAM. I notice that small models automatically run on the GPU when they fit in VRAM, while larger models just run on the CPU/RAM, not the GPU. Can the GPU be used to buffer/process models larger than VRAM concurrently with the CPU/RAM?
The GPU is orders of magnitude faster than the CPU, so you want as much as possible on the GPU. If any of the model runs on the CPU, it’s because you don’t have enough resources on the GPU.
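To make weights that another tool downloaded visible to Ollama, one option is a Modelfile whose FROM line points at the file, registered with ollama create. This is a minimal sketch, assuming the other tool saved the weights as a single GGUF file, that the ollama CLI is on PATH, and that the Ollama server is running. The function names are hypothetical helpers for illustration.

```python
import subprocess
from pathlib import Path


def build_modelfile(weights_path: str) -> str:
    """Return Modelfile text whose FROM line points at an existing GGUF file."""
    # Use an absolute path so the Modelfile works regardless of where it is read from.
    return f"FROM {Path(weights_path).expanduser().resolve()}\n"


def import_into_ollama(name: str, weights_path: str, workdir: str = ".") -> None:
    """Write a Modelfile and register the weights under `name` with `ollama create`."""
    modelfile = Path(workdir) / "Modelfile"
    modelfile.write_text(build_modelfile(weights_path))
    # Assumes the ollama CLI is installed and the server is running.
    subprocess.run(["ollama", "create", name, "-f", str(modelfile)], check=True)
```

For example, import_into_ollama("my-local-model", "~/Downloads/my-model.gguf"), with a hypothetical name and path, would write ./Modelfile and then make the model available to ollama run my-local-model. Unlike a model pulled from Ollama's library, a bare weights file may not carry a chat template or default parameters; those can be added with further Modelfile instructions such as TEMPLATE and PARAMETER.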
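Ollama can place part of a model on the GPU and the rest on the CPU when it does not fit in VRAM, and ollama ps shows the resulting split. The same information is available from the local API. This is a minimal sketch, assuming the server listens on the default http://localhost:11434 and that the /api/ps response lists each loaded model with its total size and the portion in VRAM (size_vram), both in bytes.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if you changed OLLAMA_HOST


def gpu_share(ps_response: dict) -> dict[str, float]:
    """Map each loaded model's name to the fraction of its memory resident in VRAM."""
    shares = {}
    for model in ps_response.get("models", []):
        size = model.get("size", 0)
        # size_vram is the part of the loaded model that sits on the GPU.
        shares[model["name"]] = model.get("size_vram", 0) / size if size else 0.0
    return shares


def fetch_ps(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the currently loaded models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.load(resp)
```

Calling gpu_share(fetch_ps()) while a model is loaded gives 1.0 for a model fully on the GPU; anything lower means some layers are running on the CPU.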
Awesome info. I understood what a model is, but one thing I still don’t get: what exactly is Ollama itself? An AI model interface/manager?
A model by itself doesn’t do anything. It needs software to run it. That’s Ollama.
@ Ah, like the infrastructure to run models. Like the relationship between an OS and its applications.
Maybe. Or like extensions in Chrome.
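To make "the software that runs the model" concrete: Ollama serves models over a local HTTP API, and any program can send it a prompt. This is a minimal sketch, assuming the default address http://localhost:11434 and a model such as llama3.1:8b that has already been pulled; it asks /api/generate for a single non-streaming reply.

```python
import json
import urllib.request


def build_generate_request(
    model: str, prompt: str, base_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.load(resp)["response"]
```

With the server running, generate("llama3.1:8b", "Why is the sky blue?") returns the reply text. Building the request in its own function keeps it easy to inspect without a server.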
Keep going. 🥳
Thanks Matt!!!!
Thank you, Matt. I am using Ollama with llama3 8b (4.7 GB) for our RAG system on a Tesla T4 (1 GPU, 16 GB memory). It is slow when answering questions, and I'm not sure how to improve its performance. Please let me know if you can help.
Get a better GPU.
@technovangelist Thank you. I will try.
Nice. I find running the models offline a bit inconsistent: sometimes they run, and other times they try to download again. I'm getting 128 GB of RAM soon so I can test the 33b model; I tried it on my 32 GB machine and it ran so slow, lol.
Once you download a model, you will never have to download it again unless you delete it.
@technovangelist Strange, then. I have gigabit, so it's only mildly inconvenient.
If you run it as different users, it will download the same model twice. For example, if you run it as the default service and another time run ollama serve from the command line, you will get two copies.
@technovangelist Thanks. The update prompt popped up in the taskbar, and after updating it stopped doing that, so maybe it was a known issue or a file-integrity problem. I installed 3.1 last night. 405b won't run (understandably, lol), 60b is slow at 1 word every 20 seconds, and 8b runs but often displays an error and breaks: "Error: template: :28:7: executing "" at <.ToolCalls>: can't evaluate field ToolCalls in type *api.Message"
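Before buying new hardware, it can help to see where the time goes. A non-streaming /api/generate response includes token counts and durations in nanoseconds (prompt_eval_count, prompt_eval_duration, eval_count, eval_duration). This sketch converts them to tokens per second, assuming you already have the parsed JSON response; in a RAG setup, a long retrieved context can make prompt processing a noticeable share of the wait.

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds into tokens per second."""
    return count * 1e9 / duration_ns if duration_ns else 0.0


def throughput(resp: dict) -> dict[str, float]:
    """Prompt-processing and generation speed from a non-streaming /api/generate response."""
    return {
        # prompt_eval_*: time spent reading the prompt (in RAG, mostly the retrieved context).
        "prompt_tokens_per_s": tokens_per_second(
            resp.get("prompt_eval_count", 0), resp.get("prompt_eval_duration", 0)
        ),
        # eval_*: time spent generating the answer tokens.
        "gen_tokens_per_s": tokens_per_second(
            resp.get("eval_count", 0), resp.get("eval_duration", 0)
        ),
    }
```

If prompt processing dominates, trimming the retrieved context may help; if generation is slow, the GPU itself is the likely limit.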
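Each user, including the background service, can have its own model store, which is why running ollama serve under another account downloads models again. This sketch checks which models a given store holds, assuming the default location ~/.ollama/models unless the OLLAMA_MODELS environment variable overrides it, and a manifests/<host>/<namespace>/<model>/<tag> layout inside it. On a typical Linux service install, the service's store sits under the ollama user's home (commonly /usr/share/ollama/.ollama/models), so it is worth comparing that with your own.

```python
import os
from pathlib import Path


def models_dir() -> Path:
    """Model store for the current user: OLLAMA_MODELS if set, else ~/.ollama/models."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"


def installed_models(root: Path) -> list[str]:
    """List model:tag names found under <root>/manifests (assumed layout)."""
    manifests = root / "manifests"
    if not manifests.is_dir():
        return []
    names = []
    for path in sorted(manifests.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(manifests).parts
        if len(parts) != 4:
            continue  # not a <host>/<namespace>/<model>/<tag> manifest
        _host, namespace, model, tag = parts
        # Official library models are shown without the "library/" prefix.
        name = model if namespace == "library" else f"{namespace}/{model}"
        names.append(f"{name}:{tag}")
    return names
```

Running installed_models(models_dir()) as each user, and again against the service's directory, shows whether the same model has been stored twice.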
Thanks 🌹
I have a problem downloading models because my internet speed is slow.
I have a PC in another location with high-speed internet.
Can I copy a downloaded model, along with its manifest files, from that PC to mine?
If so, please explain how 🙏🙏🌹
If you have a laptop, visit a coffee shop or McDonald's to do it. Their speeds aren't bad.
1. I don't have a laptop 😊
2. There's no internet café or McDonald's in my region 😢
If you have a machine on a better connection, you can pull the models there and copy them over. Make sure all the files listed in the manifest make it over.
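When copying a pulled model to another machine, the manifest file and every blob it references need to land in the same relative places under the target's models directory. This sketch checks that nothing was left behind. It assumes manifests are JSON with a config digest and a list of layer digests, and that blobs are stored as blobs/sha256-<hex> (the digest with ':' replaced by '-'); older Ollama versions may name blob files differently.

```python
import json
from pathlib import Path


def required_blobs(manifest_path: Path) -> list[str]:
    """Digests of the config and every layer referenced by a manifest."""
    manifest = json.loads(manifest_path.read_text())
    digests = [manifest["config"]["digest"]]
    digests += [layer["digest"] for layer in manifest.get("layers", [])]
    return digests


def missing_blobs(models_root: Path, manifest_path: Path) -> list[str]:
    """Return the digests whose blob file is absent from <models_root>/blobs."""
    blobs = models_root / "blobs"
    missing = []
    for digest in required_blobs(manifest_path):
        # Assumed naming: "sha256:<hex>" is stored on disk as "sha256-<hex>".
        if not (blobs / digest.replace(":", "-")).is_file():
            missing.append(digest)
    return missing
```

After copying, running missing_blobs(Path.home() / ".ollama" / "models", manifest_path) on the target machine, with manifest_path pointing at the copied manifest, should return an empty list if every referenced file arrived.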
MATT the MAN!
Discord invite looks invalid 🙁
Sorry about that. It’s fixed now. Thanks for pointing it out.
Ah, that Discord link says the invite expired?!
There is a different one in the description:
discord.gg/uS4gJMCRH2
Nice!
Why does he talk like that?