Run Llama 3.1 405B with Ollama on RunPod (Local and Open Web UI)

  • Published 15 Sep 2024
  • In this tutorial video, I demonstrate how to run the Llama 3.1 405B large language model on RunPod using Ollama. I cover the entire process, from setting up your environment to accessing the model locally in the terminal and using the Ollama Open Web UI on localhost:3000.
    Make sure to like, comment, and subscribe for more tutorials and updates on the latest in GenAI.
    Other Related Videos:
    1. • Deploy LLMs using Serv...
    2. Open WebUI Locally: • Private LLM Inference:...
    Join Discord: / discord
    Join this channel to get access to perks:
    / @aianytime
    To further support the channel, you can contribute via the following methods:
    Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
    UPI: sonu1000raw@ybl
    #meta #ai #llm
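
    A minimal sketch of the workflow from the video, assuming Ollama is already installed and serving on the RunPod pod with HTTP port 11434 exposed; the proxy hostname format and <POD_ID> are placeholders, so use the connection details your own pod actually shows:

        # On the pod: pull and start the model (needs a GPU stack with roughly 230+ GB of VRAM).
        ollama pull llama3.1:405b
        ollama run llama3.1:405b

        # On your local machine: point the Ollama CLI at the pod instead of localhost.
        export OLLAMA_HOST=https://<POD_ID>-11434.proxy.runpod.net   # placeholder URL
        ollama run llama3.1:405b "Hello from my laptop"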

COMMENTS • 14

  • @RedCloudServices
    @RedCloudServices 2 days ago

    RunPod has an Open WebUI + Ollama template, and it now supports both OpenAI-compliant and non-compliant LLM API endpoints 😊 as well as the models loaded on your pod. I'm trying just a single 4090 at $0.22 per hour and hope it works well. Can you link a custom domain to your Open WebUI?

  • @muhammedajmalg6426
    @muhammedajmalg6426 1 month ago

    It's a great video, thanks for sharing.

  • @unclecode
    @unclecode 1 month ago +3

    Why should you run Docker on your local machine and set the Ollama host to your RunPod pod when you can run the same Open Web UI on the pod and simply expose its port? Open Web UI is just a lightweight app, and I don't see the benefit of having it locally when the main component, the model, is in the cloud. And if you prefer not to use RunPod as your base API server, you don't even need to expose port 11434. May I know the reason?

    • @Larimuss
      @Larimuss 1 month ago +1

      Data privacy concerns, such as testing apps on business data. Local is 100% free aside from the electricity costs. Especially when you're learning or experimenting with LoRA runs that take hours, it's a very cost-effective solution that offers everything RunPod does, except for one thing: the expensive, insanely large GPU stacks. That's all you're really paying for. If your job can run on a 12 GB VRAM local GPU, you should be running it on your 12 GB VRAM local GPU.
      These cloud machines cost something like $25 a week to run on the cheap end, maybe $10 if you only use them 5 hours a day. That's still a lot to spend just to experiment.

    • @unclecode
      @unclecode 1 month ago +2

      @Larimuss OK, I see your point. It makes sense when experimenting and playing around, as it definitely saves costs and allows for flexibility with trial and error. Thanks for the explanation.

    • @Larimuss
      @Larimuss 1 month ago +1

      @unclecode Yeah, it's pretty useless for production, but it makes a good home test lab that you can try things on, and it's much cheaper for me to learn on. I only use RunPod for large-model tests and some LoRA runs that require more power. For production, Azure and AWS would be ideal.

    • @unclecode
      @unclecode 1 month ago

      @Larimuss Agreed. The thing about AWS is, when a company already has its major servers running in AWS, why run the LLM backend somewhere else?! Same clusters, same security groups and …
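
      For reference, a minimal sketch of the two setups discussed in this thread; the RunPod proxy URL is a placeholder, and the port mapping follows Open WebUI's usual Docker instructions (host 3000 → container 8080):

        # Option A (as in the video): Open WebUI running locally, model on RunPod.
        # Requires port 11434 to be exposed on the pod.
        docker run -d -p 3000:8080 \
          -e OLLAMA_BASE_URL=https://<POD_ID>-11434.proxy.runpod.net \
          ghcr.io/open-webui/open-webui:main

        # Option B (as suggested above): run Open WebUI on the pod itself and expose
        # only its web port; port 11434 then stays internal to the pod.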

  • @karthikb.s.k.4486
    @karthikb.s.k.4486 1 month ago +1

    Nice tutorial. May I know what laptop configuration is recommended for this LLM? And could you share your system configuration for running it on a local machine?

    • @TJ-hs1qm
      @TJ-hs1qm 1 month ago +2

      A laptop that can host a 405B model doesn't exist 🤨
      The best you can do, AFAIK, is an MBP M3 with 64 GB of shared memory to run Llama 3.1 70B, or invest the time and money in a desktop machine with 2x RTX 3090s.
      The mobile variant of the RTX 4090 comes with only 16 GB of VRAM, while 405B needs something like 250 GB of shared memory.
      I own a 16 GB M1 and run Llama 3.1 8B with the CodeGPT plugin for IntelliJ/PyCharm, but ChatGPT 4o mini is fast and cheap, so I mostly use that.
      That's why we have to pay companies like RunPod for the big models.
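
      A back-of-envelope check of the ~250 GB figure above, assuming 4-bit quantized weights at roughly 0.5 bytes per parameter; KV cache and runtime overhead come on top of this:

        python3 -c "print(405e9 * 0.5 / 1e9, 'GB of 4-bit weights alone')"   # ~202.5 GB
        # Adding KV cache and runtime overhead lands in the 230-250 GB range,
        # far beyond any single consumer GPU or a 64 GB laptop.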

  • @user-ry7lz9kf1f
    @user-ry7lz9kf1f 1 month ago

    Hello, I have run this but get an error:
    ollama run llama3.1:405b
    Error: timed out waiting for llama runner to start - progress 0.00 -
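
    A hedged troubleshooting sketch for this timeout: the 405B weights take a long time to load into GPU memory, so first confirm the pod actually has enough free VRAM, and only then try raising Ollama's load timeout (the OLLAMA_LOAD_TIMEOUT variable below is an assumption about recent Ollama releases; check ollama serve --help or the docs for your version):

        nvidia-smi      # total free VRAM across the pod's GPUs should comfortably exceed ~230 GB
        ollama ps       # shows what is currently loaded and how much memory it uses

        # Assumption: your Ollama version supports this variable; restart the server
        # with a longer load timeout so the runner is not killed while weights load.
        OLLAMA_LOAD_TIMEOUT=30m ollama serve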

  • @taras5942
    @taras5942 1 month ago

    Any information about GPU VRAM consumption with the described method?
    nvidia-smi?

  • @mvdiogo
    @mvdiogo 1 month ago

    How much memory does it consume? nvtop?
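
    To answer the two memory questions above, a small monitoring sketch; nvidia-smi and nvtop are standard GPU tools, and ollama ps reports the loaded model size as Ollama sees it:

        watch -n 5 nvidia-smi     # periodic VRAM usage per GPU and per process
        nvtop                     # interactive per-process GPU view, if installed on the pod
        ollama ps                 # loaded model name, size, and whether it sits in GPU or CPU memory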

  • @Raj-df6us
    @Raj-df6us 1 month ago

    Just curious: what are the specs of your PC?

  • @smartinsilicon
    @smartinsilicon 1 month ago

    I need an AI to mask out that Manchester U flag