Can the Ollama API be slower than the CLI?

  • Published Sep 10, 2024

COMMENTS • 15

  • @eldaria • 9 days ago

    I came to your video searching for this, and it makes so much sense, thanks.

  • @albertbozesan • 24 days ago

    This is such a specific question, I never thought I'd get such a good video about it! Thanks!

  • @vexy1987 • 23 days ago

    I have been having this "issue" with Open WebUI. Thanks for clearing that up; I thought I had something set up incorrectly.

  • @fabriai • 26 days ago

    As usual, wonderful video, Matt. This is exactly the kind of question I ask myself when learning Ollama. And here's the answer on a silver platter.

  • @SlykeThePhoxenix • 26 days ago +1

    So I made a Discord bot that streams from the Ollama API to Discord. Discord has an API rate limit of about 3 requests per second, so I had to buffer all the stream payloads in memory and flush them to Discord in chunks (a sketch of this buffering approach follows the thread below). I did this in Node-RED if anyone wants the code. I had to reimplement the HTTP client from the TCP layer up to support streaming, but I've wrapped it up nicely into a single function node. I should also mention that it supports multiple concurrent conversations (without mixing up the streams).

    • @technovangelist • 26 days ago

      If you are getting rate limited at 3/s, your code is probably doing something wrong. Discord allows 50 requests/sec. I guess unless you have lots of bots.

    • @SlykeThePhoxenix • 26 days ago +1

      @technovangelist This is the only bot I have on my test server, lol. It's either Discord or the Node-RED plugin for Discord. It's definitely around 3/s. It's possible that it's because it's an unapproved bot I just use for testing, and the rate limit is lifted once your bot is approved (this could be to help prevent abuse).
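
    [Editor's note] A minimal TypeScript sketch of the buffering approach described above, not the commenter's Node-RED flow: it streams from Ollama's /api/generate endpoint and flushes accumulated text to a rate-limited consumer on a fixed interval. The model name, flush interval, and sink function are illustrative assumptions.

    ```typescript
    // Minimal sketch: buffer Ollama's streamed tokens in memory and flush
    // them to a rate-limited sink on a fixed interval.
    // Assumes Node 18+ (global fetch, async-iterable response bodies).

    const OLLAMA_URL = "http://localhost:11434/api/generate"; // Ollama's default port
    const FLUSH_INTERVAL_MS = 1000; // one flush per second stays under ~3 req/s

    async function streamBuffered(
      prompt: string,
      sink: (text: string) => Promise<void>, // hypothetical rate-limited consumer
    ): Promise<void> {
      const res = await fetch(OLLAMA_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "llama3", prompt, stream: true }),
      });
      if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

      let buffer = "";
      // Forwarding every token would trip the rate limit; instead, flush
      // whatever has accumulated once per interval.
      const timer = setInterval(() => {
        if (!buffer) return;
        const out = buffer;
        buffer = "";
        void sink(out);
      }, FLUSH_INTERVAL_MS);

      const decoder = new TextDecoder();
      let pending = ""; // partial JSON line carried across chunks
      try {
        // Node's fetch body is async iterable; Ollama streams one JSON object per line.
        for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
          pending += decoder.decode(chunk, { stream: true });
          const lines = pending.split("\n");
          pending = lines.pop() ?? ""; // keep the incomplete trailing line
          for (const line of lines) {
            if (!line.trim()) continue;
            const msg = JSON.parse(line) as { response?: string };
            if (msg.response) buffer += msg.response;
          }
        }
      } finally {
        clearInterval(timer);
        if (buffer) await sink(buffer); // flush the tail
      }
    }
    ```

    Keeping one buffer and timer per conversation (e.g. keyed by Discord channel ID) would support concurrent conversations without mixing up the streams.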

  • @twilkpsu • 26 days ago

    Great educational content. Bravo! 🎉🎉🎉

  • @YeryBytes • 25 days ago

    Can you explain why the Windows executable is significantly slower than running in WSL? I also found that running Ollama in Docker with the WSL2 backend is faster than just running in WSL. Why!?

    • @technovangelist • 25 days ago

      Running natively on Windows is in most cases 10-15% faster than using WSL. If it's not, there is something wrong with the install.

  • @UnwalledGarden • 26 days ago

    Keep up the great myth busting.

  • @romulopontual6254 • 27 days ago

    When accessing Ollama via the API, can we set keep alive to forever? If so, would it prevent the API from later switching models?

    • @technovangelist • 27 days ago +6

      You can set it to -1, which will keep the model in memory until you run out of memory or change models (see the sketch below).
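
    [Editor's note] For reference, a minimal sketch of such a request; the endpoint and model name are just illustrative defaults. `keep_alive` also accepts duration strings like "10m", and -1 disables the idle unload.

    ```typescript
    // Minimal sketch, assuming Ollama's default local endpoint.
    // keep_alive: -1 asks Ollama to keep the model loaded indefinitely.
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama3", // illustrative model name
        prompt: "Why is the sky blue?",
        stream: false,
        keep_alive: -1, // never unload on idle; loading another model can still displace it
      }),
    });
    console.log(((await res.json()) as { response: string }).response);
    ```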

  • @yuvrajkukreja9727 • 10 days ago

    Any sample code?