The Hidden Cost of Embeddings in RAG and How to Fix It

  • Published Jan 13, 2025

COMMENTS • 22

  • @greendsnow
    @greendsnow 4 months ago +9

    Use text-embedding-3-small + Qdrant quantization to save on storage costs.
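A minimal sketch of the idea behind int8 scalar quantization (the kind Qdrant offers); this is illustrative NumPy, not Qdrant's actual implementation, and the sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy batch of float32 embeddings (1536-dim, as with text-embedding-3-small).
emb = rng.normal(size=(100, 1536)).astype(np.float32)

def quantize_int8(v):
    """Per-vector scalar quantization: map values onto the int8 range."""
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(emb[0])
restored = dequantize(q, scale)

print(emb[0].nbytes)   # 6144 bytes as float32
print(q.nbytes)        # 1536 bytes as int8 -> 4x smaller
```

The reconstruction error is bounded by one quantization step (`scale`), which is why recall typically stays high while the vector store shrinks 4x.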

  • @BadBite
    @BadBite 4 months ago +3

    Pretty good! Very useful, as I never thought about the long-term wallet bleeding.

  • @jirikosek3714
    @jirikosek3714 4 months ago +2

    Just a question: I see you mention, e.g., the AWS X2gd EC2 instance. So if I understand correctly, you want to keep all the vectors in memory. Isn't it better to just use a storage-backed solution instead if the database is massive? E.g. Amazon OpenSearch Service. Storage should be cheap...

  • @uwegenosdude
    @uwegenosdude 4 months ago +1

    Thank you very much! This is very good to know as our app gets bigger.

  • @unclecode
    @unclecode 4 months ago

    Very interesting and important points you raised. I've seen startups completely unaware of this and, as a result, they're doomed. Many don't even use features like OpenAI's dimension reduction. Binary and scalar quantization have been around since March and are incredibly powerful. Now, with Gemini's support for PDF and long context windows, freeing up to a billion tokens a day, it raises questions about when to use embeddings and RAG, and when not to. When necessary, combining this with a long context window seems like the perfect solution. I suggest you create a video showing how to use this with Gemini to fetch and cache context, which would deliver the best balance of performance and cost.

    • @engineerprompt
      @engineerprompt 4 months ago +1

      I am also noticing the same: there are some great tools that need to be in every production pipeline, but folks are not aware of them. Funny thing: I put together a video on the topic you suggested, combining Gemini's PDF capabilities with context caching. Will be releasing it tomorrow. This is very powerful and definitely needs to be an option for developers in any retrieval task.

    • @unclecode
      @unclecode 4 months ago

      @@engineerprompt Looking forward to that, my friend. Your ability to create educational and practical content is your superpower! This embedding video should definitely be added to your course, and you should dive deep into the details. This alone is enough to convince a developer to take your course! I had a couple of interviews for an AI engineer position last week, and I asked them all, "Have you seen or followed the engineerprompt channel?" To motivate you: 2 out of 5 said yes, and, no surprise, their answers were better than those who hadn't seen it. So, as usual, I'll stay tuned for your next video.
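On the dimension reduction mentioned in the thread above: for OpenAI's text-embedding-3 models, shrinking the `dimensions` parameter corresponds to truncating the full vector and re-normalizing. A hedged sketch of that operation (pure NumPy, with an illustrative 3072-dim random vector standing in for a real embedding):

```python
import numpy as np

def shorten(embedding, dims):
    """Keep only the first `dims` values and re-normalize to unit length."""
    v = np.asarray(embedding, dtype=np.float32)[:dims]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
full = rng.normal(size=3072).astype(np.float32)  # stand-in for a full embedding
short = shorten(full, 256)

print(short.shape)  # (256,)
```

Storing 256 floats instead of 3072 cuts vector-store size 12x before any quantization is applied.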

  • @harshilpatel4989
    @harshilpatel4989 4 months ago +2

    Please make a video on hybrid search using the BM25 algorithm.

  • @xuantungnguyen9719
    @xuantungnguyen9719 4 months ago

    Your channel is so insightful

  • @aibeginnertutorials
    @aibeginnertutorials 4 months ago

    Brilliant and extremely useful and relevant information as usual. Thanks!

  • @messam1981
    @messam1981 4 months ago

    Thank you, waiting for a real tutorial for a production RAG app.

  • @MeinDeutschkurs
    @MeinDeutschkurs 4 months ago +1

    That's great! Yes, please create a video with a useful example. I'd appreciate it! 🎉🎉

  • @abdulrehmanbaber2104
    @abdulrehmanbaber2104 4 months ago

    Very helpful.
    One question: can you explain the difference between quantization as used with embedding models (here) and quantization when doing inference or fine-tuning?

    • @engineerprompt
      @engineerprompt 4 months ago

      Both use the term in the same sense. For inference, it refers to quantizing the weights (numerical values) of the model (LLM), which reduces the memory (RAM) needed when you load the model. In the case of embeddings, we are talking about the outputs of the model (again numerical values), which need to be stored somewhere (usually a vector store). You quantize them to reduce storage cost.
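The most aggressive form of the embedding-side quantization described above is binary quantization: keep only the sign of each dimension. A sketch with illustrative random vectors (the 32x figure follows from float32 → 1 bit per dimension; this is the general technique, not any one vector store's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
emb = rng.normal(size=(1000, 1024)).astype(np.float32)

# Binary quantization: one bit per dimension, packed into uint8.
bits = np.packbits(emb > 0, axis=1)   # shape (1000, 128)

print(emb.nbytes)    # 4,096,000 bytes as float32
print(bits.nbytes)   # 128,000 bytes -> 32x smaller

# Similarity search becomes Hamming distance: XOR then count set bits.
query = np.packbits(emb[0] > 0)
hamming = np.unpackbits(bits ^ query, axis=1).sum(axis=1)
print(hamming.argmin())  # 0: the query's own vector is the closest match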

  • @sashirestela8572
    @sashirestela8572 4 months ago

    Thanks for your very useful information.

  • @kordou
    @kordou 4 months ago

    Very nice video. Thanks!

  • @hsin-yusu9094
    @hsin-yusu9094 4 months ago

    Yes, this is exactly what I'm looking for

  • @mattshelley6541
    @mattshelley6541 4 months ago

    Using Qdrant on our servers, RAM will be our largest expense to maintain the database as it grows.
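A back-of-the-envelope estimate of that RAM bill, using hypothetical numbers (10M chunks, 1536-dim float32 vectors, ignoring index overhead), and what the quantization levels discussed above would save:

```python
n_vectors = 10_000_000   # hypothetical corpus size
dims = 1536              # e.g. text-embedding-3-small
bytes_float32 = n_vectors * dims * 4

print(f"{bytes_float32 / 1e9:.1f} GB float32")      # 61.4 GB
print(f"{bytes_float32 / 4 / 1e9:.1f} GB int8")     # 15.4 GB
print(f"{bytes_float32 / 32 / 1e9:.1f} GB binary")  # 1.9 GB
```

At float32 that corpus no longer fits on a commodity instance, while binary quantization brings it back within reach of ordinary RAM.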

  • @themax2go
    @themax2go 4 months ago

    What about sci-phi triplex?

  • @Suro_One
    @Suro_One 4 months ago

    Thanks!