Ollama - Local Models on your machine

  • Published 14 Nov 2024

COMMENTS • 126

  • @chautauquatrail
    @chautauquatrail Рік тому +20

    I just did the ollama install yesterday, you are awesome for being able to produce these so quickly.

  • @VincentVonDudler
    @VincentVonDudler Рік тому +45

    Thanks, Ollama.

  • @richardchiodo2200
    @richardchiodo2200 8 місяців тому +2

    I picked up a 12GB 3060 from my local Micro Center for a pretty good price, and am now running Ollama with Open WebUI for the frontend; they have a community repository for prompts and modelfiles. The biggest hurdle was passing the GPU through my Proxmox host to my VM.

  • @DanielSpringer
    @DanielSpringer Рік тому +7

    Definitely has a docker vibe. I like it!

  • @AlphaSynapse
    @AlphaSynapse 8 місяців тому +6

    Ollama is now available for Windows (Windows 10 or above).

  • @何骁-q9q
    @何骁-q9q 8 місяців тому +2

    Nice video. I managed to create a customized model by watching this.

  • @theh1ve
    @theh1ve Рік тому +8

    Another great flag, Sam. I'd be interested to see this running as a server so you can make API calls... I'm currently using Text Gen Web UI as a server, and this looks like it would be a good alternative.

  • @jidun9478
    @jidun9478 Рік тому +4

    Nice, thank you. It runs so much faster than Text Gen Web UI! I wish they'd make it easier to add a custom choice of models though (that is a real drawback).

    • @samwitteveenai
      @samwitteveenai  Рік тому

      I have a video coming that shows how to do exactly that. It's actually pretty easy.

  • @MohamedElmardi123
    @MohamedElmardi123 Рік тому +8

    Windows users can use WSL.

    • @fontenbleau
      @fontenbleau Рік тому +3

      Win 10 was the last Windows for me; I found the perfect Linux in PikaOS, an Ubuntu without snaps or other crap.

    • @mohamedbaghdadbrahim1326
      @mohamedbaghdadbrahim1326 2 місяці тому

      Peace be upon you.
      I am happy to write to you asking for your help with how to use ollama to work with PDF books. Could you point me to some sites for that? Thank you very much.

  • @kenchang3456
    @kenchang3456 Рік тому

    Thanks Sam, very interesting. It's amazing how fast the whole LLM ecosystem is moving.

  • @jeffsteyn7174
    @jeffsteyn7174 9 місяців тому +15

    It's not just about being technical. It's also about being productive. Do you want to spend your time building something useful, or trying to figure out how a badly maintained and documented piece of software works?

    • @kryptodash5480
      @kryptodash5480 2 місяці тому

      This.
      I'm "technical", and Ollama made it easy for me to get an LLM up and running so I could focus on the actual core of my project.

  • @haxoAI
    @haxoAI Місяць тому

    Great video :) Thanks for sharing. I am experimenting with different models now.

  • @Knowledge_Nuggies
    @Knowledge_Nuggies 8 місяців тому +1

    I'd be interested to learn how to build a RAG system or local LLM agent with tools like Ollama, LM Studio, LangChain etc.

  • @kp_kovilakam
    @kp_kovilakam Рік тому +1

    Thank you for the introduction!

  • @sitrakaforler8696
    @sitrakaforler8696 Рік тому +1

    Llama 2 uncensored was quite surprising for me x)
    By the way, THANK YOU FOR YOUR VIDEO.
    Every time I need to use Ollama I rewatch your video to be sure of the command "ollama run" haha

    • @willi1978
      @willi1978 11 місяців тому

      It looks like the uncensored versions are a lot better; then it's not always giving you a paragraph on why it can't do what you asked.

  • @khenghuatlim7264
    @khenghuatlim7264 Місяць тому +1

    Definitely a great place to start with Ollama: "Hello World".

  • @paulmiller591
    @paulmiller591 10 місяців тому

    Thanks Sam, great video. These are some of the best videos on AI tools. I need to master this for my work, and your approach to communication really works for me. Cheers. Keep up all things LangChain, please.

  • @nexuslux
    @nexuslux Рік тому +1

    Nice and to the point video. Appreciate it!

  • @photorealm
    @photorealm 7 місяців тому

    Ollama for Windows is out and available for download. I am testing it and it works fabulously, but it's very slow for me on Windows. I don't think it's using my NVIDIA GPU, and I can't seem to find a way to hook the GPU in under Windows. But I just got started, and I love the fact that it is serving a local HTTP port as well as the command line.
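
    For reference, Ollama serves its local HTTP API on port 11434 by default. A minimal sketch of calling it from Python (assuming the requests package and a locally pulled llama2 model; any other pulled model tag would work the same way):

    ```python
    # Minimal sketch: call the local Ollama HTTP API from Python.
    # Assumes Ollama is running on its default port 11434 and that
    # `ollama pull llama2` has already been done.
    import requests  # pip install requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama2",            # assumption: any locally pulled model tag
            "prompt": "Why is the sky blue?",
            "stream": False,              # one JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])        # the generated text
    ```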

  • @Leonid.Shamis
    @Leonid.Shamis Рік тому +2

    Great video, as usual :) I have been using Ollama on Linux and it has been working great. I know that Ollama can be used via API but I was wondering whether its API is compatible with OpenAI API and can be used as a replacement for OpenAI API inside LangChain. Looking forward to more videos about Ollama. Thank you.

    • @IanScrivener
      @IanScrivener Рік тому +2

      There are dedicated LangChain and LlamaIndex connectors for Ollama.
      Ollama's API is different to OpenAI's... better IMO.
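
      A minimal sketch of the LangChain connector mentioned above (assuming `pip install langchain-community` and a locally pulled llama2 model; the exact import path has moved between LangChain releases):

      ```python
      # Minimal sketch: use Ollama through LangChain's dedicated connector.
      # Assumes `pip install langchain-community` and `ollama pull llama2`.
      from langchain_community.llms import Ollama

      llm = Ollama(model="llama2")  # talks to the local Ollama server on port 11434
      print(llm.invoke("Name three uses of a local LLM."))
      ```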

  • @LisajanaLynn
    @LisajanaLynn 2 місяці тому

    Best video ever. Thank you VERY much for making it, it helped this noob so much! Unfortunately, it says it won't install the uncensored version, blah blah... As a responsible AI language model, I am programmed to follow ethical and

  • @mbottambotta
    @mbottambotta Рік тому +1

    Thank you Sam for posting this video. Very accessible, clearly explained. Question: what I could not see is whether Ollama lets you choose the model size, i.e. whether you want llama2 7B, 13B or 70B, for example.

    • @brando2818
      @brando2818 Рік тому +2

      3:37
      You can specify it with:
      ollama run llama2-uncensored
      Just go to the models page, then click one. It'll tell you the command if you're using the CLI.

    • @samwitteveenai
      @samwitteveenai  Рік тому +2

      Yes, you can pick this; take a look at the models page.
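
      For example, the model size is just a tag on the Ollama models page; assuming the llama2 listing, the commands look like:

      ```
      ollama run llama2:7b
      ollama run llama2:13b
      ollama run llama2:70b
      ```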

  • @FreakyStyleytobby
    @FreakyStyleytobby Рік тому +2

    Fantastic video Sam, thank you! Ollama looks great, but the big 70B models still remain beyond the reach of typical RAM. Do you know of any way (be it API or other) to get access to Llama 70B and be able to run arbitrary tokens on the model? There are some APIs like TogetherAI, but they only let you run endpoints like /prediction, not much more.

  • @attilavass6935
    @attilavass6935 Рік тому +1

    What are the pros and cons of using such "local" Ollama models on Colab Pro with 2 TB of Drive?

  • @BionicAnimations
    @BionicAnimations 6 місяців тому +1

    Hi. How do I uninstall one of them off my MacBook Pro? I am using it in terminal.

    • @samwitteveenai
      @samwitteveenai  6 місяців тому

      ollama rm llama3
      If you just type ollama in the command line you should be able to see all the commands

  • @alx8439
    @alx8439 Рік тому +2

    There are a bunch of similar tools (simple to use for non-technical people). The most prominent is GPT4All, yeah, from the guys who fine-tuned the first LLaMA back in March/April on their own handcrafted datasets.
    These guys from Ollama were definitely inspired by Docker, based on the syntax and architecture :)

    • @technovangelist
      @technovangelist Рік тому +2

      a few of the maintainers were early Docker employees

  • @sandrocavali9810
    @sandrocavali9810 8 місяців тому

    Excellent intro

  • @XiOh
    @XiOh Рік тому +2

    When is the Windows version coming out? O.o

  • @xdasdaasdasd4787
    @xdasdaasdasd4787 Рік тому

    Awesome! I was hoping to use a custom model but didn't fully understand :(

  • @iainattwater1747
    @iainattwater1747 Рік тому

    I used the Docker container just released, and it works on Windows.

    • @VijayChintapandu
      @VijayChintapandu 7 місяців тому

      Can you provide the Docker container link? Where did you download it from?

  • @sonurocks341
    @sonurocks341 6 місяців тому

    Great demo ! Thank you !!

  • @liji8672
    @liji8672 Рік тому +1

    Hi Sam, good video. My little question: did your llama2 model run on your CPU?

    • @samwitteveenai
      @samwitteveenai  Рік тому +2

      Pretty sure it was running on Metal and using the Apple Silicon GPUs. It is certainly a quantized model though, which helps.

    • @IanScrivener
      @IanScrivener Рік тому +1

      You CAN run any llama.cpp tool on CPU… though it is MUCH slower than GPU.
      The macOS Metal GPU is surprisingly fast…

  • @guanjwcn
    @guanjwcn Рік тому +1

    Thanks, Sam. Do you know what tricks ollama uses to make it run so smoothly locally?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      They are using quantized models, and on macOS they are using Metal etc.

    • @IanScrivener
      @IanScrivener Рік тому +2

      Ollama uses llama.cpp under the hood… the fastest LLM inference engine, period.
      Many other apps also use llama.cpp: Kobold, Oobabooga, etc.
      Many other apps use Python inside… easier to build, but much, much slower performance.

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      @@IanScrivener They have llama.cpp running on Metal on Macs, right. It feels like it is more than just CPU etc. Honestly, I haven't looked under the hood much.

  • @AndyAinsworth
    @AndyAinsworth Рік тому +3

    LM Studio for Windows and Mac is a great way to achieve the same with a lot less setup! Also has a great internal model browser which suggests what models might run on your machine.

    • @AndyAinsworth
      @AndyAinsworth Рік тому +1

      It can also run as an API with a click in the UI. Definitely been the easiest way for me to test out a load of different LLMs locally, nice user interface with history and markdown support.

    • @alx8439
      @alx8439 Рік тому

      LM Studio is proprietary software. God only knows what else it is doing on your PC: gathering and sending out your data while you sleep, mining bitcoins, using your PC as an exit node for Tor, keylogging everything you type. You can only guess.

    • @IanScrivener
      @IanScrivener Рік тому +1

      Agree, LM Studio is great.
      It can be run in OpenAI API mode, which replicates OpenAI's API format, and so can be easily used with LangChain, LlamaIndex, etc.

    • @AndyAinsworth
      @AndyAinsworth Рік тому +2

      @@IanScrivener Yeah, I'm hoping to get it set up to use the API via LM Studio with Microsoft AutoGen, which provides a multi-agent workflow with a code interpreter.

    • @scitechtalktv9742
      @scitechtalktv9742 Рік тому

      @@AndyAinsworth this is what I want to do also! Have you had any progress and success with this?

  • @PedroLourenco-v4x
    @PedroLourenco-v4x 3 дні тому

    Hi, I installed llama 3.2 two days ago and had no problems at all, but yesterday I got the following message when I tried to run it: zsh: command not found: ollama
    I reinstalled, it worked fine, and today I get the same zsh command-not-found thing. Any ideas?

  • @马爷-m8i
    @马爷-m8i 8 місяців тому

    Thanks for the video! How can I make Ollama run the 13GB tar file I downloaded locally?

  • @GrecoFPV
    @GrecoFPV 7 місяців тому

    Can we give this power to n8n? Connect our local Ollama with our self-hosted n8n?

  • @riflebird4842
    @riflebird4842 Рік тому

    Thanks for the video, keep it up

  • @Shawn-lk2ze
    @Shawn-lk2ze 11 місяців тому

    I'm new to this topic and I just binged your videos. How does this compare to vLLM from your previous video? I get that Ollama is more user-friendly, but I'm more curious about the performance.

    • @samwitteveenai
      @samwitteveenai  11 місяців тому +1

      vLLM is more for serving full-resolution models in the cloud, and Ollama is more for running them locally. vLLM shines when you have some strong GPUs to use, etc.

    • @Shawn-lk2ze
      @Shawn-lk2ze 11 місяців тому

      @@samwitteveenai Got it! Thanks!

  • @NoidoDev
    @NoidoDev Рік тому +1

    # New Software
    In the past: it only runs on Windows, but maybe in a few years it will be available on macOS, and one day (but probably never) on Linux.
    Today: at the moment it supports macOS and Linux, but apparently Windows support is coming soon as well.

  • @wendten2
    @wendten2 Рік тому +1

    "Its Llama for those who dont have technical skills" .. the PC version is currently only available on Linux... xD

  • @vrstary
    @vrstary 2 місяці тому

    Windows version is now available. ^^

  • @NabinAndNajira
    @NabinAndNajira 2 місяці тому

    Windows version is out now.

  • @joewooks3935
    @joewooks3935 3 місяці тому

    You mention it is local, but where are the logs stored?
    I asked the model, but it says it cannot give me this information.

  • @PrasannaVenkatesh-j3j
    @PrasannaVenkatesh-j3j 11 місяців тому

    Hi, I have been using Ollama for the past 2 months. Yes, it's giving good results, but what I need to know is whether it is possible to set a configuration file for Ollama, like setting parameters, to get the most accurate results. Can you make a video about how to set custom parameters?
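
    For what it's worth, Ollama's generation parameters are usually set per model in a Modelfile (or per request through the API) rather than in one global config file. A minimal sketch, with illustrative values:

    ```
    # Modelfile: sketch of custom parameters (values are illustrative)
    FROM llama2                  # a base model that is already pulled locally
    PARAMETER temperature 0.2    # lower = more deterministic answers
    PARAMETER num_ctx 4096       # context window size in tokens
    SYSTEM "You are a concise assistant."
    ```

    Then build and run it with `ollama create mymodel -f Modelfile` and `ollama run mymodel`.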

  • @morespinach9832
    @morespinach9832 8 місяців тому

    If we get these locally in our cloud, is there a best practice to keep them updated?

  • @nelavallisivasai8740
    @nelavallisivasai8740 8 місяців тому

    It's taking more time for me to get responses from the local model, not as fast as yours. Can you please tell me what processor you are using?
    What are the minimum hardware requirements to run LLM models and get faster responses?

  • @Thelgren00
    @Thelgren00 5 місяців тому

    Can I use this to install AI Town? The default method was too complex for me.

  • @MarcellodeSales
    @MarcellodeSales Рік тому

    It seems like it's Docker :D Same feeling... Ollama will capitalize on cloud-native software engineers.

  • @samyio4256
    @samyio4256 9 місяців тому

    If the model you use talks to an API, how is this local usage?
    I'd like to know where the prompt data goes. Does it go to a database that the model loads it from afterwards? Or is the model hosted separately in a monitored environment?
    My basic question is: who gets the data from the input prompt?

    • @samwitteveenai
      @samwitteveenai  9 місяців тому

      the data is only on your machine. It is all running locally. It can run an api on your machine and you can then expose that if you want to use it from somewhere else. If you are just using it on your machine all data stays on your machine.

    • @samyio4256
      @samyio4256 9 місяців тому

      @@samwitteveenai Wow! That's a complete game changer! Thanks! I'll sub, insane content!

  • @nembalage
    @nembalage 11 місяців тому

    super helpful.

  • @samyio4256
    @samyio4256 9 місяців тому

    Also, another question: do you really run this on a Mac Mini? If so, how much RAM does your machine have?

  • @parthwagh3607
    @parthwagh3607 4 місяці тому

    Thank you so much. I am having problems running models downloaded from Hugging Face as safetensors files. I have these files in oobabooga/text-generation-webui, and I need to use them with Ollama. I followed everything, even created a Modelfile with the path to the safetensors directory, but it is not running: ollama create model_name -f modelfile. Please help me.
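
    One likely reason this fails: ollama create has traditionally expected either a base model name or a GGUF file in the Modelfile's FROM line, so a raw safetensors checkpoint usually needs converting to GGUF first (for example with llama.cpp's convert_hf_to_gguf.py script); only newer Ollama releases can import some safetensors architectures directly. A sketch under those assumptions, with a placeholder path:

    ```
    # Modelfile: sketch for importing a converted GGUF file (path is a placeholder)
    FROM ./models/my-model.Q4_K_M.gguf
    ```

    followed by `ollama create my-model -f Modelfile` and `ollama run my-model`.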

  • @abhijitkadalli6435
    @abhijitkadalli6435 Рік тому +2

    Feels like Docker.

  • @ghrasko
    @ghrasko 11 місяців тому +1

    In fact, it was quite easy to install Ollama on Windows 10 using Windows Subsystem for Linux (WSL). In a Windows command prompt:
    wsl --install -d Ubuntu (this downloads and runs the Ubuntu distribution, giving a Linux prompt)
    ollama pull llama2:13b (this downloads the selected model)
    ollama run llama2:13b (this runs the selected model)
    At this point you can write user text that will be sent to the model. This did not work for me; the keyboard input is not correctly directed to the application. This is possibly a compatibility issue with the Linux emulation. But I could fully use the downloaded models from simple Python programs directly or through LangChain.
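
    Note that the steps above assume Ollama has already been installed inside the Ubuntu shell; the usual install step (the one-line script from ollama.com) is:

    ```
    curl -fsSL https://ollama.com/install.sh | sh
    ```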

  • @foolcj9999
    @foolcj9999 8 місяців тому

    Can you make a video of Ollama interaction using voice input, where it replies back, like Whisper?

  • @pensiveintrovert4318
    @pensiveintrovert4318 11 місяців тому

    Any idea how to load a model that is already on my disk?

  • @abobunus
    @abobunus Рік тому

    How do you make your own language model? For example, I want to take some texts and force the AI to use only this text to answer my questions.

    • @volkanazer9997
      @volkanazer9997 8 місяців тому

      Let me know when you've got it figured out. I'm curious about this as well.

  • @ronelgem23
    @ronelgem23 9 місяців тому

    Does it require VRAM or just regular RAM?

  • @Ryan-yj4sd
    @Ryan-yj4sd Рік тому +1

    Can you run fine tuned models?

  • @Gerald-iz7mv
    @Gerald-iz7mv 7 місяців тому

    What port does the webserver run on? Can I set that port?

  • @merselfares8965
    @merselfares8965 8 місяців тому

    Would an 11th-gen i3 with 8GB of RAM and UHD 630 graphics be enough?

    • @samwitteveenai
      @samwitteveenai  8 місяців тому

      honestly not sure. It will probably run but you may get very slow tokens per second

  • @franciscojlobaton
    @franciscojlobaton Рік тому

    Please, more. Más por favorrrr

  • @kevinehsani3358
    @kevinehsani3358 Рік тому

    I am sure Windows users can probably install it under WSL.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      I was wondering about this. I asked one of my staff to give it a quick try, but he couldn't get it working.

  • @DaeOh
    @DaeOh Рік тому +2

    Would you consider not referring to models like Llama and Mistral as "open-source?" It sets a precedent. "Freeware," maybe?

    • @alx8439
      @alx8439 Рік тому

      It's a good question how we should refer to such models. It's not 100% FOSS-compliant because of the restrictions which come into play if you have something like 700 million users, if my memory serves me well. But this is more of a restriction on a couple of companies like MS, Google, TikTok. Who cares about them? Or am I missing something bigger?

    • @spirobel2.0
      @spirobel2.0 Рік тому +2

      Mistral is completely open

    • @clray123
      @clray123 Рік тому +1

      Do not mix Llama and Mistral together. Mistral has a truly open license, Llama is the Facebook/Meta poison.

    • @DaeOh
      @DaeOh Рік тому

      It's not open-source because you can't reproduce it without the source (training data)... Just making the equivalent of binaries available for commercial use doesn't make something "open-source..."

  • @twobob
    @twobob Рік тому

    nice one
    thanks

  • @VaibhavPatil-rx7pc
    @VaibhavPatil-rx7pc Рік тому

    Thanks

  • @VijayChintapandu
    @VijayChintapandu 7 місяців тому

    My system is very slow when I am running Ollama. My system is a Mac with M2. Is this an issue?

    • @samwitteveenai
      @samwitteveenai  7 місяців тому

      depends which model you are trying to run. The video was done on a M2 Mac Mini

    • @SharePointMaster
      @SharePointMaster 7 місяців тому

      @@samwitteveenai Ohh, thanks for the reply. Mine is also a Mac Air with the M2 chip, but it was slow. I will check.

  • @tusharbokade8378
    @tusharbokade8378 Рік тому

    Interesting!

  • @stanTrX
    @stanTrX 7 місяців тому

    Can I upload and work with documents with Ollama?

    • @samwitteveenai
      @samwitteveenai  7 місяців тому

      Yes, you will need to code it to do a custom RAG.

    • @stanTrX
      @stanTrX 7 місяців тому

      @@samwitteveenai Thanks, good man, but what's a custom RAG?
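
      (A "custom RAG" here means retrieval-augmented generation you wire up yourself: embed your documents, retrieve the chunks most relevant to a question, and ask the local model to answer using only those chunks. A toy sketch with the ollama Python package; the embedding model name and document contents are illustrative assumptions.)

      ```python
      # Toy sketch of a custom RAG flow against a local Ollama server.
      # Assumes `pip install ollama`, `ollama pull llama2`, and
      # `ollama pull nomic-embed-text` (embedding model is an assumption).
      import ollama

      docs = [
          "Ollama serves models over a local HTTP API on port 11434.",
          "Models are customised with a Modelfile and built via `ollama create`.",
      ]

      def embed(text):
          return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

      def cosine(a, b):
          dot = sum(x * y for x, y in zip(a, b))
          norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
          return dot / norm

      question = "What port does Ollama listen on?"
      doc_vecs = [embed(d) for d in docs]
      q_vec = embed(question)
      context = max(zip(docs, doc_vecs), key=lambda dv: cosine(q_vec, dv[1]))[0]

      answer = ollama.chat(
          model="llama2",
          messages=[
              {"role": "system", "content": "Answer using only the provided context."},
              {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
          ],
      )
      print(answer["message"]["content"])
      ```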

  • @HitopFaded
    @HitopFaded 8 місяців тому

    I’m trying to run it in a python environment if possible to build on top of it

    • @samwitteveenai
      @samwitteveenai  7 місяців тому +1

      I have another vid there on Ollama's Python SDK etc

    • @HitopFaded
      @HitopFaded 7 місяців тому

      @@samwitteveenai Thanks, I'll check it out.
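
      A minimal sketch of that Python SDK route (assuming `pip install ollama` and a locally pulled llama3 model), streaming tokens as they arrive:

      ```python
      # Minimal sketch: build on a local Ollama model from Python via its SDK.
      # Assumes `pip install ollama` and that `ollama pull llama3` has been run.
      import ollama

      stream = ollama.chat(
          model="llama3",
          messages=[{"role": "user", "content": "Summarise what Ollama does in one sentence."}],
          stream=True,  # yield partial responses as they are generated
      )
      for chunk in stream:
          print(chunk["message"]["content"], end="", flush=True)
      print()
      ```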

  • @kunalr_ai
    @kunalr_ai Рік тому

    Why this new model

  • @zacharymacaroni7649
    @zacharymacaroni7649 6 місяців тому

    good video :)

  • @LITTLEFREDOX2
    @LITTLEFREDOX2 8 місяців тому

    windows version is here

  • @fontenbleau
    @fontenbleau Рік тому +6

    I run only locally, and cloud services are blocked in our region anyway (quite a lot of people don't have access to them, more than 2 billion: China plus a dozen other countries, mostly for political, non-scientific reasons). And the hardware allows it, thanks to China, which recycles servers and brings to market quite secret chips from Intel, like a 22-core Xeon that was never released outside the enterprise market; it costs only 150 bucks. My motherboard, an ASRock X99 Extreme4, has become the de facto standard in China for that socket, also 150 bucks, and can be filled with 256GB of RAM. I bought mine in 2020, during GPT-2, which was impossible to run locally at its max size of 1558 million parameters; there weren't any of the current tools, and I was only able to run the 774-million version on GPU from the terminal, and it was a mess of text.

  • @GenericWebMarketingChannel
    @GenericWebMarketingChannel 3 місяці тому

    Chaaarrrliee

  • @Timur-u5z
    @Timur-u5z 4 місяці тому

    oobabooga?

  • @anispinner
    @anispinner Рік тому +1

    Obama

  • @antonpictures
    @antonpictures 9 місяців тому

    ~ % ollama pull 01-ai/Yi-VL-6B
    pulling manifest
    Error: pull model manifest: file does not exist

  • @astronosmage3722
    @astronosmage3722 Рік тому +3

    Would say oobabooga is still the way to go.

    • @alx8439
      @alx8439 Рік тому +1

      Yeah. Or h2oGPT / H2O LLM Studio.

  • @rookandpawn
    @rookandpawn 5 місяців тому

    I'm coming from text-generation-webui; how can I use that model folder with Ollama?