Run Llama 3.1 405B with Ollama on RunPod (Local and Open Web UI)

  • Published 15 Sep 2024
  • In this tutorial video, I demonstrate how to run the Llama 3.1 405B large language model on RunPod using Ollama. I cover the entire process, from setting up your environment to accessing the model locally in the terminal and using the Ollama Open Web UI on localhost:3000.
    Make sure to like, comment, and subscribe for more tutorials and updates on the latest in GenAI.
    Other Related Videos:
    1. • Deploy LLMs using Serv...
    2. Open WebUI Locally: • Private LLM Inference:...
    Join Discord: / discord
    Join this channel to get access to perks:
    / @aianytime
    To further support the channel, you can contribute via the following methods:
    Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
    UPI: sonu1000raw@ybl
    #meta #ai #llm
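
    A minimal sketch of the workflow from the video, assuming Ollama is already installed and serving on the RunPod pod with HTTP port 11434 exposed; the proxy hostname format and <POD_ID> are placeholders, so use the connection details your own pod actually shows:

        # On the pod: pull and start the model (needs a GPU stack with roughly 230+ GB of VRAM).
        ollama pull llama3.1:405b
        ollama run llama3.1:405b

        # On your local machine: point the Ollama CLI at the pod instead of localhost.
        export OLLAMA_HOST=https://<POD_ID>-11434.proxy.runpod.net   # placeholder URL
        ollama run llama3.1:405b "Hello from my laptop"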

COMMENTS • 14

  • @RedCloudServices
    @RedCloudServices 2 days ago

    RunPod has an Open WebUI + Ollama template, and it now supports both OpenAI-compliant and non-compliant LLM API endpoints 😊 as well as the models loaded on your pod. I'm trying just a single 4090 at $0.22 per hour and hope it works well. Can you link a custom domain to your Open WebUI?

  • @muhammedajmalg6426
    @muhammedajmalg6426 1 month ago

    It's a great video, thanks for sharing.

  • @unclecode
    @unclecode 1 month ago +3

    Why should you run Docker on your local machine and set the Ollama host to your RunPod pod when you can run the same Open Web UI on the pod and simply expose its port? Open Web UI is just a lightweight app, and I don't see the benefit of having it locally when the main component, the model, is in the cloud. And if you prefer not to use RunPod as your base API server, you don't even need to expose port 11434. May I know the reason?

    • @Larimuss
      @Larimuss 1 month ago +1

      Data privacy concerns, such as testing apps on business data. Local is 100% free aside from the electricity costs. Especially when you're learning or experimenting with LoRA runs that take hours, it's a very cost-effective solution that offers everything RunPod does, except for one thing: the expensive, insanely large GPU stacks. That's all you're really paying for. If your job can run on a 12 GB VRAM local GPU, you should be running it on your 12 GB VRAM local GPU.
      These cloud machines cost something like $25 a week to run on the cheap end, maybe $10 if you only use them 5 hours a day. That's still a lot to spend just to experiment.

    • @unclecode
      @unclecode 1 month ago +2

      @Larimuss OK, I see your point. It makes sense when experimenting and playing around, as it definitely saves costs and allows for flexibility with trial and error. Thanks for the explanation.

    • @Larimuss
      @Larimuss 1 month ago +1

      @unclecode Yeah, it's pretty useless for production, but it makes a good home test lab that you can try things on, and it's much cheaper for me to learn on. I only use RunPod for large-model tests and some LoRA runs that require more power. For production, Azure and AWS would be ideal.

    • @unclecode
      @unclecode 1 month ago

      @Larimuss Agreed. The thing about AWS is, when a company already has its major servers running in AWS, why run the LLM backend somewhere else?! Same clusters, same security groups and …
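
      For reference, a minimal sketch of the two setups discussed in this thread; the RunPod proxy URL is a placeholder, and the port mapping follows Open WebUI's usual Docker instructions (host 3000 → container 8080):

        # Option A (as in the video): Open WebUI running locally, model on RunPod.
        # Requires port 11434 to be exposed on the pod.
        docker run -d -p 3000:8080 \
          -e OLLAMA_BASE_URL=https://<POD_ID>-11434.proxy.runpod.net \
          ghcr.io/open-webui/open-webui:main

        # Option B (as suggested above): run Open WebUI on the pod itself and expose
        # only its web port; port 11434 then stays internal to the pod.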

  • @karthikb.s.k.4486
    @karthikb.s.k.4486 1 month ago +1

    Nice tutorial. May I know what laptop configuration is recommended for this LLM? And could you share your system configuration for running it on a local machine?

    • @TJ-hs1qm
      @TJ-hs1qm 1 month ago +2

      A laptop that can host a 405B model doesn't exist 🤨
      The best you can do, AFAIK, is an MBP M3 with 64 GB of shared memory to run Llama 3.1 70B, or invest the time and money in a desktop machine with 2x RTX 3090s.
      The mobile variant of the RTX 4090 comes with only 16 GB of VRAM, while 405B needs something like 250 GB of shared memory.
      I own a 16 GB M1 and run Llama 3.1 8B with the CodeGPT plugin for IntelliJ/PyCharm, but ChatGPT 4o mini is fast and cheap, so I mostly use that.
      That's why we have to pay companies like RunPod for the big models.
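
      A back-of-envelope check of the ~250 GB figure above, assuming 4-bit quantized weights at roughly 0.5 bytes per parameter; KV cache and runtime overhead come on top of this:

        python3 -c "print(405e9 * 0.5 / 1e9, 'GB of 4-bit weights alone')"   # ~202.5 GB
        # Adding KV cache and runtime overhead lands in the 230-250 GB range,
        # far beyond any single consumer GPU or a 64 GB laptop.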

  • @user-ry7lz9kf1f
    @user-ry7lz9kf1f 1 month ago

    Hello, I have run this but get an error:
    ollama run llama3.1:405b
    Error: timed out waiting for llama runner to start - progress 0.00 -
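
    A hedged troubleshooting sketch for this timeout: the 405B weights take a long time to load into GPU memory, so first confirm the pod actually has enough free VRAM, and only then try raising Ollama's load timeout (the OLLAMA_LOAD_TIMEOUT variable below is an assumption about recent Ollama releases; check ollama serve --help or the docs for your version):

        nvidia-smi      # total free VRAM across the pod's GPUs should comfortably exceed ~230 GB
        ollama ps       # shows what is currently loaded and how much memory it uses

        # Assumption: your Ollama version supports this variable; restart the server
        # with a longer load timeout so the runner is not killed while weights load.
        OLLAMA_LOAD_TIMEOUT=30m ollama serve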

  • @taras5942
    @taras5942 1 month ago

    Any information about GPU VRAM consumption with the described method?
    nvidia-smi?

  • @mvdiogo
    @mvdiogo 1 month ago

    How much memory does it consume? nvtop?
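
    To answer the two memory questions above, a small monitoring sketch; nvidia-smi and nvtop are standard GPU tools, and ollama ps reports the loaded model size as Ollama sees it:

        watch -n 5 nvidia-smi     # periodic VRAM usage per GPU and per process
        nvtop                     # interactive per-process GPU view, if installed on the pod
        ollama ps                 # loaded model name, size, and whether it sits in GPU or CPU memory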

  • @Raj-df6us
    @Raj-df6us 1 month ago

    Just curious: what are the specs of your PC?

  • @smartinsilicon
    @smartinsilicon 1 month ago

    I need an AI to mask out that Manchester U flag