I just did the ollama install yesterday, you are awesome for being able to produce these so quickly.
Thanks, Ollama.
I picked up a 12GB 3060 from my local Micro Center for a pretty good price, and am now running Ollama with Open WebUI for the frontend; they have a community repository for prompts and Modelfiles. The biggest hurdle was passing the GPU through my Proxmox host to my VM.
Definitely has a docker vibe. I like it!
ollama is now available for windows (windows 10 or above)
nice video. I managed to create a customized model by watching this.
Another great flag, Sam. I would be interested to see this running so you can make API calls... currently using Text Gen Web UI as a server, and this looks like it would be a good alternative.
Nice, thank you. It runs so much faster than Text Gen Web UI! I wish they'd make it easier to add a custom choice of models though (that is a real drawback).
I have a video coming that shows how to do exactly that. It's actually pretty easy.
Windows users can use WSL.
Win 10 was the last Windows for me; I found the perfect Linux in PikaOS, an Ubuntu without Ubuntu snaps or other crap.
Peace be upon you and God's mercy and blessings.
I'm glad to write to you asking for your help with how to use ollama to work with PDF books. Could you also point me to sites for that? Thank you very much.
Thanks Sam, very interesting. It's amazing how fast the whole LLM ecosystem is moving.
It's not just about being technical. It's also about being productive. Do you want to spend your time building something useful, or spend it trying to figure out how a badly maintained and documented piece of software works?
this.
I'm "technical" and Ollama made it easy for me to get an LLM up and running so I could focus on the actual core of my project.
Great video :) Thanks for sharing. I am experimenting with different models now.
I'd be interested to learn how to build a RAG system or local LLM agent with tools like Ollama, LM Studio, LangChain etc.
Thank you for the introduction!
Llama2 uncensored was quite surprising for me x)
By the way, THANK YOU FOR YOUR VIDEO.
Every time I need to use Ollama I go back to your video to double-check the "ollama run" command, hahah.
It looks like the uncensored versions are a lot better; then it isn't always giving you a paragraph about why it can't do what you ask it to do.
definitely a great place to start with Ollama "Hello World"
Thanks Sam, great video. These are some of the best videos on AI tools. I need to master this for my work, and your approach to communication really works for me. Cheers. Keep up all things LangChain, please.
Nice and to the point video. Appreciate it!
Ollama for Windows is out and available for download. I am testing it and it works fabulously, but it's very slow for me on Windows. I don't think it's using my Nvidia GPU and I can't seem to find a way to hook the GPU in under Windows. But I've just got started, and I love the fact that it serves to a local HTTP port as well as the command line.
Great video, as usual :) I have been using Ollama on Linux and it has been working great. I know that Ollama can be used via its API, but I was wondering whether that API is compatible with the OpenAI API and can be used as a replacement for it inside LangChain. Looking forward to more videos about Ollama. Thank you.
There are dedicated LangChain and LlamaIndex connectors for Ollama.
Ollama's API is different from OpenAI's... better IMO.
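For anyone wanting to try the LangChain route mentioned above, here is a minimal sketch using the dedicated connector; it assumes the langchain-community package is installed and that a model (llama2 here, as a placeholder) has already been pulled with Ollama:

```python
# Minimal sketch: calling a local Ollama model through LangChain's connector.
# Assumes `pip install langchain-community` and `ollama pull llama2` have been run.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama2", temperature=0.1)

# .invoke() sends the prompt to the Ollama server running on localhost
# (default port 11434) and returns a chat message object.
response = llm.invoke("Explain in one sentence what a Modelfile is.")
print(response.content)
```

Newer Ollama releases also expose an OpenAI-compatible endpoint on the same port, so the standard OpenAI client can often be pointed at it directly; check the current docs, since this has been changing quickly.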
Best video ever. Thank you VERY much for making it, it helped this noob so much! Unfortunately, it says it won't install the uncensored version, blah blah... "As a responsible AI language model, I am programmed to follow ethical and..."
Thank you Sam for posting this video. Very accessible, clearly explained. Question: what I could not see is whether Ollama lets you choose the model size, i.e. whether you want llama2 7b, 13b or 70b, for example.
3:37
You can; specify it with
ollama run llama2-uncensored
Just go to the models page, then click one. It'll tell you the command if you're using the CLI.
Yes, you can pick this; take a look at the models page.
Fantastic video Sam, thank you! Ollama looks great, but the big 70B models still remain beyond the reach of typical RAM. Do you know of any way (be it API or other) to get access to Llama 70B and be able to run arbitrary tokens on the model? There are some APIs like TogetherAI, but they only let you run endpoints like /prediction, not much more.
What are the pros and cons of using such "local" Ollama models on Colab Pro with 2 TB of Drive?
Hi. How do I uninstall one of them off my MacBook Pro? I am using it in terminal.
ollama rm llama3
If you just type ollama in the command line you should be able to see all the commands
There are a bunch of similar tools (simple to use for non-technical people). The most prominent is GPT4All, yeah, from the guys who fine-tuned the first LLaMA back in March/April on their own handcrafted datasets.
The Ollama folks were definitely inspired by Docker, based on the syntax and architecture :)
a few of the maintainers were early Docker employees
Excellent intro
when is the windows version coming out? O.o
Awesome! I was hoping to use a custom model but didn't fully understand :(
I used the Docker container that was just released and it works on Windows.
Can you provide the Docker container link? Where did you download it from?
Great demo ! Thank you !!
Hi Sam, good video. My little question: did your llama2 model run on your CPU?
Pretty sure it was running on Metal and using the Apple Silicon GPUs. It is certainly a quantized model though, which helps.
You CAN run any llama.cpp tool on CPU… though it is MUCH slower than on GPU.
MacOS Metal GPU is surprisingly fast…
Thanks, Sam. Do you know what tricks ollama uses to make it run so smoothly locally?
They are using quantized models, and on macOS they are using Metal, etc.
Ollama uses llama.cpp under the hood… the fastest LLM Inference Engine. Period.
Many other apps also use llama.cpp: Kobold, Oobabooga, etc.
Many other apps use Python inside… easier to build, but much, much slower performance.
@@IanScrivener They have llama.cpp running on Metal on Macs, right? It feels like it is more than just CPU, etc. Honestly I haven't looked under the hood much.
LM Studio for Windows and Mac is a great way to achieve the same with a lot less setup! Also has a great internal model browser which suggests what models might run on your machine.
It can also run as an API with a click in the UI. Definitely been the easiest way for me to test out a load of different LLMs locally, nice user interface with history and markdown support.
LM Studio is proprietary software. God only knows what else it is doing on your PC: gathering and sending out your data while you are sleeping, mining bitcoins, using your PC as a Tor exit node, keylogging everything you type. You can only guess.
Agree, LM Studio is great.
It can be run in OpenAI API mode, which replicates OpenAI's API format, and so can easily be used with LangChain, LlamaIndex, etc.
@@IanScrivener Yeah, I'm hoping to get it set up to use the API via LM Studio with Microsoft AutoGen, which provides a multi-agent workflow with a code interpreter.
@@AndyAinsworth this is what I want to do also! Have you had any progress or success with this?
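To illustrate the OpenAI API mode mentioned a few comments up, here is a rough sketch using the official openai Python client pointed at a local server; the port, API key, and model name below are assumptions (your local server's defaults may differ), not something from the video:

```python
# Sketch: talking to a local OpenAI-compatible server (e.g. LM Studio's
# local server mode) with the standard openai client.
# The base_url, api_key and model name are placeholder assumptions --
# use whatever your local server actually reports.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # local server, not api.openai.com
    api_key="not-needed",                 # local servers typically ignore the key
)

completion = client.chat.completions.create(
    model="local-model",  # placeholder; use the model id your server lists
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(completion.choices[0].message.content)
```

Because it speaks the same wire format, tools like LangChain's ChatOpenAI or AutoGen can usually be pointed at the same base_url.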
Hi, I installed llama 3.2 two days ago and had no problems at all, but yesterday I got the following message when I tried to run it: zsh: command not found: ollama
I reinstalled, it worked fine, and today I get the same zsh command-not-found thing. Any ideas?
Thanks for the video! How can I make Ollama run the 13GB tar file I downloaded locally?
Can we give this power to n8n? Connect our local Ollama with our self-hosted n8n?
Thanks for the video, keep it up
I'm new to this topic and I just binged your videos. How does this compare to vLLM from your previous video? I get that Ollama is more user-friendly, but I'm more curious about the performance.
vLLM is more for serving full-resolution models in the cloud, and Ollama is more for running things locally. vLLM shines when you have some strong GPUs to use, etc.
@@samwitteveenai Got it! Thanks!
# New Software
In the past: It only runs on Windows, but maybe in a few years it will be available on macOS, and one day, but probably never, on Linux.
Today: At the moment it supports macOS and Linux, but apparently Windows support is coming soon as well.
"Its Llama for those who dont have technical skills" .. the PC version is currently only available on Linux... xD
Windows version is now available. ^^
Windows version is out now.
you mention it is local, but where are the logs stored?
I asked it but it says it cannot give me this information.
Hi, I have been using Ollama for the past 2 months. Yes, it's giving good results, but what I need to know is: is it possible to set a configuration file for Ollama, like setting the parameters, to get the most accurate results? Can you make a video about how to set custom parameters?
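Until there is a dedicated video: the usual route is PARAMETER lines in a Modelfile, but parameters can also be passed per request. A small sketch using the ollama Python package, where the specific values are purely illustrative assumptions, not recommended settings:

```python
# Sketch: overriding generation parameters per request with the `ollama`
# Python package (pip install ollama). The values are illustrative only.
import ollama

reply = ollama.chat(
    model="llama2",  # any model you have pulled locally
    messages=[{"role": "user", "content": "Summarise RAG in two sentences."}],
    options={
        "temperature": 0.2,  # lower = more deterministic output
        "top_p": 0.9,
        "num_ctx": 4096,     # context window size in tokens
    },
)
print(reply["message"]["content"])
```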
If we get these locally in our cloud, is there a best practice to keep them updated?
It's taking more time for me to get responses from the local model; it's not as fast as yours. Can you please tell me what processor you are using?
What are the minimum hardware requirements to run LLM models and get faster responses?
Can I use this to install AI Town? The default method was too complex for me.
It seems like it's Docker :D Same feeling... Ollama will capitalize on cloud-native software engineers.
If the model in use talks to an API, how is this local usage?
I'd like to know where the prompt data goes. Does it go to a database that the model loads from afterwards? Or is the model hosted separately in a monitored environment?
My basic question is: who gets the data from the input prompt?
The data is only on your machine; it is all running locally. It can run an API on your machine, and you can then expose that if you want to use it from somewhere else. If you are just using it on your machine, all data stays on your machine.
@@samwitteveenai Wow! That's a complete game changer! Thanks! I'll sub, insane content!
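To make the "everything stays on your machine" point concrete, this is roughly what a call to the local API looks like; the request only ever goes to localhost unless you deliberately expose the port (model name and prompt are placeholders):

```python
# Sketch: calling the Ollama server running on localhost. No external
# service is involved; the prompt never leaves this machine.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local port
    json={
        "model": "llama2",             # any model you've pulled
        "prompt": "Why is the sky blue?",
        "stream": False,               # return a single JSON object
    },
    timeout=120,
)
print(resp.json()["response"])
```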
super helpful.
Also another question. Do you really run this on a mac mini? If so, how much ram does your machine have?
32gb of RAM
Thank you so much. I am having problems running models downloaded from Hugging Face that come as safetensors files. I have these files in oobabooga/text-generation-webui and I want to use them with Ollama. I followed everything, even created a Modelfile with the path to the safetensors directory, but it is not running >> ollama create model_name -f modelfile. Please help me.
feels like docker
In fact, it was quite easy to install ollama on Windows 10 using Windows Subsystem for Linux (WSL). In a Windows command prompt:
wsl --install -d Ubuntu (this downloads and runs the Ubuntu distribution giving a Linux prompt)
ollama pull llama2:13b (this downloads the selected model)
ollama run llama2:13b (this runs the selected model)
At this point you can write user text that will be sent to the model. This did not work for me; the keyboard input is not correctly directed to the application. This is possibly a compatibility issue with this Linux emulation. But I could fully use the downloaded models from simple Python programs, directly or through LangChain.
Can you make a video of Ollama interaction using voice input, with it replying back, like with Whisper?
Interesting Idea!
Any idea how to load a model that is already on my disk?
How do you make your own language model? For example, I want to take some texts and force the AI to use only this text to answer my questions.
Let me know when you've got it figured out. I'm curious about this as well.
Does it require VRAM or just regular RAM?
Can you run fine tuned models?
Yes, and your own Loras…😊
What port does the web server run on? Can I set that port?
Would an 11th-gen i3 with 8GB of RAM and UHD 630 graphics be enough?
honestly not sure. It will probably run but you may get very slow tokens per second
Please, more. Más por favorrrr
I am sure Windows users can probably install it under WSL.
I was wondering about this. I asked one of my staff to give it a quick try but he couldn't get it working.
Would you consider not referring to models like Llama and Mistral as "open-source?" It sets a precedent. "Freeware," maybe?
It's a good question how we should refer to such models. It's not 100% FOSS-compliant because of the restrictions that kick in if you have something like 700 million users, if my memory serves me well. But that's more of a restriction for a couple of companies like MS, Google, TikTok. Who cares about them? Or am I missing something bigger?
Mistral is completely open
Do not mix Llama and Mistral together. Mistral has a truly open license, Llama is the Facebook/Meta poison.
It's not open-source because you can't reproduce it without the source (training data)... Just making the equivalent of binaries available for commercial use doesn't make something "open-source..."
nice one
thanks
Thanks
My system is very slow when I am running Ollama. My system is a Mac with an M2. Is this an issue?
Depends which model you are trying to run. The video was done on an M2 Mac Mini.
@@samwitteveenai Ohh, thanks for the reply. Mine is also a Mac Air with the M2 chip, but it was slow. I will check.
Interesting!
Can I upload and work with documents with Ollama?
Yes, you will need to code it to do a custom RAG.
@@samwitteveenai Thanks, good man, but what's a custom RAG?
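For anyone else wondering: RAG (retrieval-augmented generation) means indexing your own documents, retrieving the chunks relevant to a question, and having the model answer from those chunks. A rough sketch of what that could look like with the langchain-community connectors and Chroma; the package choices, chunk sizes, and PDF path are all assumptions, not the setup from the video:

```python
# Rough RAG sketch over a local PDF, using Ollama for both embeddings and
# generation. Assumes: pip install langchain langchain-community chromadb pypdf
# and that `ollama pull llama2` has been run. Paths and sizes are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.chat_models import ChatOllama
from langchain.chains import RetrievalQA

# 1. Load the document and split it into overlapping chunks.
docs = PyPDFLoader("my_book.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks locally and store them in a vector database.
vectordb = Chroma.from_documents(chunks, OllamaEmbeddings(model="llama2"))

# 3. Answer questions using the retrieved chunks as context.
qa = RetrievalQA.from_chain_type(
    llm=ChatOllama(model="llama2"),
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "What is the main argument of chapter one?"})["result"])
```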
I’m trying to run it in a python environment if possible to build on top of it
I have another vid there on Ollama's Python SDK etc
@@samwitteveenai Thanks, I'll check it out.
Why this new model
good video :)
windows version is here
I run only locally, and cloud services are blocked in our region anyway (quite a lot of people don't have access to them, more than 2 billion: China plus a dozen other countries, mostly for political, non-scientific reasons). And the hardware allows it, thanks to China, which recycles servers and brings to market some fairly obscure Intel chips, like a 22-core Xeon that was never released outside the enterprise market and costs only 150 bucks. My motherboard, an ASRock X99 Extreme4, has become the de facto standard in China for that socket, also 150 bucks, and can be filled with 256GB of RAM. I bought mine in 2020, during the GPT-2 era; it was impossible to run GPT-2 locally at its max size of 1558 million parameters, since none of the current tools existed. I was able to run the 774-million-parameter version on the GPU from the terminal, and it was a mess of text.
Chaaarrrliee
oobabooga?
Obama
~ % ollama pull 01-ai/Yi-VL-6B
pulling manifest
Error: pull model manifest: file does not exist
would say oobabooga is still the way to go
Yeah. Or h2oGPT / LLM Studio.
I'm coming from text-generation-webui; how can I use that model folder with Ollama?