Why vector search is not enough and we need BM25

Diffbot

18,966 views

Published 16 Nov 2024

COMMENTS • 60

@endre777 1 month ago ⁺¹⁴
Thanks for the explanation, was super clear.
We're just planning to move from vector search to hybrid, and your explanation of BM25 helps a lot to understand what edge cases it can solve. Appreciate it a lot!
Guess we will see a surge in BM25 due to Anthropic's Contextual Retrieval paper.
@ChocolateMilkCultLeader 1 month ago ⁺⁶
Great video. This is why, when building a search engine, I like to use BM25 for sparse search and use vector-based search later, once most of the corpus has been filtered out. This allows me to stay precise and efficient.
One additional thing: people often assume that you need a vector DB for vector search, but you can do completely without one. Just store the vectors in a normal DB.
@notsojharedtroll23 1 month ago
I mean, at the end of the day, the embeddings are data, period.
@stxnw 1 month ago
It should be the other way around. Most prompts may not have exact matches. Use vector search first, then BM25 and rerank the results.
@shiholololo1053 1 day ago
Waiting for the next video. I enjoyed the format.
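The thread above describes a two-stage setup: BM25 as a sparse first pass, then vector similarity to rerank whatever survives. A minimal Python sketch of that idea follows. It assumes whitespace tokenization, the common Okapi BM25 form with k1 = 1.5 and b = 0.75, a non-empty corpus, and hand-made toy vectors standing in for real embeddings. The function names are illustrative and do not come from the video. The vectors live in a plain list, in the spirit of "just store the vectors in a normal DB".

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher weight (smoothed, always positive IDF).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_stage_search(query, query_vec, docs, doc_vecs, keep=2):
    """Stage 1: keep the top BM25 hits. Stage 2: rerank them by cosine similarity."""
    sparse = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: sparse[i], reverse=True)
    # Drop documents with no lexical overlap at all.
    candidates = [i for i in ranked[:keep] if sparse[i] > 0]
    return sorted(candidates, key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)


# Toy corpus and made-up 2-D "embeddings" (hypothetical values, not model output).
docs = [
    "bm25 keyword search matches exact terms",
    "vector search compares dense embeddings",
    "mac and cheese date night",
]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

print(two_stage_search("search", [0.1, 0.9], docs, doc_vecs))  # → [1, 0]
```

The third document never reaches the reranker because it shares no term with the query, even though its toy vector is fairly close to the query vector. The reply suggesting the opposite order (vector search first, then BM25) would swap the two stages; which order works better depends on how often queries contain exact matches.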
@oncedidactic 1 month ago ⁺⁴
Nice discussion, thanks! I wish there was more structure to the video so the “why” of the title is served as the main dish, i.e. define the terms up front, explain how each works, then do the why discussion and give a teaser for the hybrid approach discussion. Instead there are some gaps and jumps around, which leaves it feeling incomplete or maybe not quite capturing the essence? I have a feeling this is partly a result of editing many clips, so don’t take this feedback too seriously. Cheers
@microburn 1 month ago ⁺⁴
Nice video. I’ve been on the opposite side of the coin, but I like hearing the balanced argument to keep me educated
@matveyshishov 1 month ago
Thanks, guys, YT recommended this video to me, a very pleasant snippet of explanation.
Trying to work through your website to understand what the service is.
@jameswigglesworth8132 1 month ago
Thank you for delving into this important topic!
@NicolasEmbleton 1 month ago
Wonderful explanation. Thank you.
@dougunderwood569 1 month ago
Great overview, thank you!
@aproperhooligan5950 1 month ago
Excellent presentation/explanation. Very useful. Thank you!
@andydataguy 1 month ago
Great video! This is one of the most misunderstood concepts. Will def share this next time it comes up!
@marka5215 1 month ago
Great explanation. Thank you so much!
@weirdsciencetv4999 1 month ago ⁺¹
Oh man you are amazing!!
Love the channel, I subscribed. Please do a video on working with such graphs using a vector database
@roopad8742 1 month ago
This is so easy to understand, thank you!
@badashphilosophy9533 1 month ago
this is an amazing explanation. I'm an instant follower
@ashraf_isb 1 month ago
that's insightful, thank you so much boss
@andrewwalker8985 1 month ago ⁺¹
Why don’t we include semantic dimensions in vectors?
@amortalbeing 1 month ago
thanks this was great!
@MLGJuggernautgaming 1 month ago
I believe a vector search is still better for RAG applications. BM25 is better for more literal matches. Also what does this have to do with LLMs doing math?
@MathsSciencePhilosophy 1 month ago
The mathematics behind ChatGPT is amazing
@broccoli322 1 month ago
Thanks for the video.
@theepicosityofpizza 1 month ago ⁺¹³
BM25 doesn't do anything to address any of the issues you bring up at the beginning of the video. TF-IDF is dumber than vector search in every aspect. It's just much cheaper to run. Not saying it doesn't have value as part of the toolkit, but not sure why you spend the first half setting all these problems with vector search up as if BM25 addresses any of them.
@stxnw 1 month ago
is English not your first language?
@BleachWizz 1 month ago
Oh no, this is going to make texts like I do!!!
ok, drama aside, I do believe this will improve things a lot.
I still see some caveats that would be left to luck, but huge amounts of data might overcome that.
I do believe we already have enough with GPT and a few previous ideas; still, improving the language model itself is always a plus.
@Howoulduknow841 1 month ago
This is something Anthropic has shared with their contextual retrieval.
@Isaacmellojr 1 month ago
Great illustration of how word2vec is not the definitive solution.
@АндрейАндреевич-з7т 1 month ago
BM25. Frequency-weighted by sponsored-definition-tag vector search. Yeah, Google search does that too, you know. If you ever did SEO for your website or some kind of SMM, you know that it works
@pratikerande4808 1 month ago
super
@themax2go 1 month ago ⁺¹
ty for the insight to "pair" numerical rep (vector) w/ BM25... can the same be achieved w/ just using a knowledge graph? i'm experimenting w/ sci/phi triplex... what do you think, do you have any preliminary ideas, or have you already tested it and found using "entities_and_triples" not as effective / not effective at all? 6 mo ago you did a vid on knowledge graphs, i haven't watched it yet, i'll check it out...
@shizheliang2679 27 days ago
wait...I think I am in love...
@tempname-dr2bm 28 days ago
Poland mentioned
@bmm8213 1 month ago
Golden nugget
@NLPprompter 1 month ago
i love this bot...
@knucker3 1 month ago ⁺⁴
TURN YOUR VOLUME UP
@815TypeSirius 1 month ago
But vs is enough to scam dummies and create a market bubble.
@ValidatingUsername 1 month ago
Try tokenizing engendered languages 😂
@rontheoracle 1 month ago ⁺¹²
Excuse me, but your volume is just too low. Just saying.
@martin777xyz 1 month ago ⁺²²
Seems fine to me
@sladeTek 1 month ago ⁺¹²
No it’s not, your device is the issue
@rontheoracle 1 month ago
@sladeTek It's just this video and a few others that play with very low volume. I try other videos on YouTube and, in general, they sound acceptably loud. Dunno why.
@rontheoracle 1 month ago ⁺¹
@sladeTek Try watching the video on YouTube with this title:
"The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!"
It is significantly louder. Just my 2 cents.
@csmac3144a 1 month ago ⁺¹
Her audio is fine. Turn up your volume.
@Ruhgtfo 1 month ago ⁺¹
Contributed 3blue1brown
