The Biggest Misconception about Embeddings

  • Published May 7, 2023
  • The biggest misconception I had about embeddings!
    My Patreon : www.patreon.com/user?u=49277905
    Visuals Created Using Excalidraw:
    excalidraw.com/
    Icon References :
    Bird icons created by Mihimihi - Flaticon
    www.flaticon.com/free-icons/bird
    Whale icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/w...
    Carrot icons created by Pixel perfect - Flaticon
    www.flaticon.com/free-icons/c...
    Kale icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/kale
    Book icons created by Good Ware - Flaticon
    www.flaticon.com/free-icons/book
    Book icons created by Pixel perfect - Flaticon
    www.flaticon.com/free-icons/book
    Sparkles icons created by Aranagraphics - Flaticon
    www.flaticon.com/free-icons/s...
    Flower icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/f...
    Feather icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/f...
    Communication icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/c...
    Student icons created by Freepik - Flaticon
    www.flaticon.com/free-icons/s...
    Lunch icons created by photo3idea_studio - Flaticon
    www.flaticon.com/free-icons/l...

COMMENTS • 46

  • @shoaibsh2872
    @shoaibsh2872 1 year ago +19

    It feels like the shorter your video is, the more informative it is 😅. You don't only explain what an embedding is but also how it can differ based on the problem statement, all in less than 5 minutes

    • @ritvikmath
      @ritvikmath 1 year ago +7

      Thanks! I’m trying to make shorter videos and learning that it can actually be more challenging than making a longer one

    • @johannestafelmaier616
      @johannestafelmaier616 1 year ago

      I'd say quality > quantity.
      Time is valuable, and that is probably one reason why shorter-form videos are becoming so successful.
      I'd also say that making shorter educational videos forces you to cut away everything that is not important, which should leave you with a clearer picture of the essence of the concept.

    • @xspydazx
      @xspydazx 10 months ago +1

      In reality: make a base model... highly tuned... and use this as your starting point for new models... preserve your base at all costs... online versions are often polluted...

  • @SierraSombrero
    @SierraSombrero 1 year ago +4

    I've never commented on any of your videos before but thought it was time to do so after this one.
    Thank you so much for all the great work!
    For me, you're the best at explaining data science and ML concepts on YouTube.
    I also love how broad your range of topics is. I've used your content to understand concepts in NLP and general data science, but also RL and Bayesian approaches to deep learning.
    Your real-life examples and intuitive explanations are really strong. Keep it up!

    • @ritvikmath
      @ritvikmath 1 year ago +2

      Hey I really really appreciate the kind words and would absolutely love more comments and feedback in the future

  • @adaoraenemuo4228
    @adaoraenemuo4228 1 year ago

    Love love your videos! Very clear with meaningful examples!

  • @polikalepotuaileva6006
    @polikalepotuaileva6006 4 days ago

    Excellent video. Thanks for taking the time to share.

  • @jfndfiunskj5299
    @jfndfiunskj5299 1 year ago +3

    Dude, your videos are so damn mind-opening.

  • @andreamorim6635
    @andreamorim6635 20 days ago

    Thanks for the explanation! Really easy to understand after watching this video!! keep up the good work

  • @baharrezaei5637
    @baharrezaei5637 5 months ago

    Best explanation of embeddings I have seen by far, thanks 🌻

  • @chupacadabra
    @chupacadabra 1 year ago +5

    There is also a misconception in the way you describe how embeddings are formed. It is not that words appearing in the same sentences are mapped to nearby embeddings, but rather words that share the same context, i.e. that appear separately with the same neighboring words.

    • @shachafporan8048
      @shachafporan8048 1 year ago

      Well yes, but actually no...
      Practically, in the common case, you are right, and this is how it is done in word2vec and other models as well: we build each word's embedding from its context.
      But if we take the message of the video and apply it here, you may also decide that this is how you want to define your word embeddings...
      I'm not sure what the benefits of this would be (it might be somehow reminiscent of LDA, for example), but we have the freedom to decide how we build our embeddings.

    • @chupacadabra
      @chupacadabra 1 year ago +1

      @@shachafporan8048 I agree that you can have different algorithmic flavors of how you derive embeddings, just as you can have different corpora you want to specialize the embeddings for (as pointed out in the video). Words can have different meanings/embeddings that are useful for different purposes. It's the old no-free-lunch theorem.
      However, in all these algorithms words end up with similar embeddings not because they co-occur, but because of the shared contexts they appear in. The words "Monday" and "Tuesday" rarely co-occur, but they end up with similar embeddings. This is true even for algorithms such as GloVe, which is based on co-occurrence but derives similar meanings/embeddings through the network effect. And it's not only word2vec; most transformers also use the same idea with masked word prediction.
      I love the video. It's just that co-occurrence is not at the heart of embeddings, and it's hard to understand other nice properties of embeddings later if you look at them that way.

    • @shachafporan8048
      @shachafporan8048 1 year ago

      @@chupacadabra nah man... I mean, we're not disagreeing... But my feeling was that his example of embeddings in a language model is just a specific manifestation of *embeddings*... If you embed an image, how would you relate it to a context? Also, if you embed words, e.g. in an explicit contrastive learning problem, you can have plenty of success without context.
      So again... in general, for language you are correct, but the world is too diverse to think of embeddings only in this specific manner
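
A quick illustration of the point debated in this thread: in a word2vec-style model, words that share contexts end up with similar embeddings even if they never co-occur. A minimal sketch, assuming gensim is installed (the toy corpus is invented for the example):

```python
# "monday" and "tuesday" never appear in the same sentence below, yet
# they end up with similar embeddings because they share contexts.
from gensim.models import Word2Vec

sentences = [
    ["the", "meeting", "is", "on", "monday", "morning"],
    ["the", "meeting", "is", "on", "tuesday", "morning"],
    ["i", "fly", "home", "on", "monday", "evening"],
    ["i", "fly", "home", "on", "tuesday", "evening"],
] * 100  # repeat the toy corpus so training has enough samples

model = Word2Vec(sentences, vector_size=16, window=3, min_count=1, sg=1, epochs=20)
print(model.wv.similarity("monday", "tuesday"))  # high despite zero co-occurrence
```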

  • @alexsischin2107
    @alexsischin2107 1 year ago +2

    Would love to see more about embeddings

    • @ritvikmath
      @ritvikmath 1 year ago +1

      Noted! Thanks for the feedback

  • @gordongoodwin6279
    @gordongoodwin6279 7 months ago

    This is a fantastic video. I found myself confused as to why NNs needed an embedding layer each time and why we didn't just import some universal embedding dictionary. This made it super simple! Parrots and carrots and kales and whales and cocks and rocks!

  • @MindLaboratory
    @MindLaboratory 1 year ago

    I'm working on embeddings for a very particular application inside a game. Lots of natural language, but also lots of game-specific language. I started by downloading GloVe, finding each word that appears both in my vocabulary and in GloVe, copying that vector into my model for the matching word, and using a random vector for words that do not appear in GloVe. Then I run an update function on a random sample of sentences each loop. Does this sound viable?
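
The initialization described here (pretrained GloVe rows where available, random vectors elsewhere) is a common pattern. A minimal sketch, where the GloVe filename and the vocabulary are placeholders:

```python
import numpy as np

EMBED_DIM = 100
vocab = ["attack", "mana", "parrot", "hello"]  # mixed game-specific + natural-language terms

# Parse a GloVe text file: each line is a word followed by EMBED_DIM floats.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

rng = np.random.default_rng(0)
embeddings = np.stack([
    glove.get(w, rng.normal(scale=0.6, size=EMBED_DIM).astype(np.float32))
    for w in vocab
])  # row i initializes the vector for vocab[i]; fine-tune from here
```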

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology 6 months ago +1

    In a RAG-based Q&A system, the efficiency of query processing and the quality of the results are paramount. One key challenge is the system’s ability to handle vague or context-lacking user queries, which often leads to inaccurate results. To address this, we’ve implemented a fine-tuned LLM to reformat and enrich user queries with contextual information, ensuring more relevant results from the vector database. However, this adds complexity, latency, and cost, especially in systems without high-end GPUs.
    Improving algorithmic efficiency is crucial. Integrating techniques like LoRA into the LLM can streamline the process, allowing it to handle both context-aware query reformulation and vector searches. This could significantly reduce the need for separate embedding models, enhancing system responsiveness and user experience.
    Also, incorporating a feedback mechanism for continuous learning is vital. This would enable the system to adapt and improve over time based on user interactions, leading to progressively more accurate and reliable results. Such a system not only becomes more efficient but also becomes more attuned to the evolving needs and patterns of its users.
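
A schematic, self-contained sketch of the pipeline described in this comment; every function here is a hypothetical stand-in, not a specific library's API (a real system would plug in an actual LLM client, embedding model, and vector database):

```python
from typing import List

def llm_rewrite(query: str) -> str:
    # Stand-in for the fine-tuned LLM that enriches vague queries with context.
    return f"In the context of our product documentation: {query}"

def embed(text: str) -> List[float]:
    # Stand-in for the embedding model used by the vector database.
    return [float(len(w)) for w in text.split()][:8]

def vector_search(vec: List[float], top_k: int = 5) -> List[str]:
    # Stand-in for the vector-database similarity search.
    return ["retrieved snippet 1", "retrieved snippet 2"][:top_k]

def answer(query: str) -> str:
    enriched = llm_rewrite(query)              # 1. context-aware query reformulation
    docs = vector_search(embed(enriched))      # 2. retrieval from the vector store
    return f"(LLM answer grounded in {docs})"  # 3. stand-in for generation

print(answer("why is it slow?"))
```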

  • @randoff7916
    @randoff7916 1 year ago +3

    When the sample size is large, do the embeddings for individual words start to converge?

  • @jordiaguilar3640
    @jordiaguilar3640 11 months ago

    Great teaching.

  • @turkial-harbi2919
    @turkial-harbi2919 1 year ago

    Simply beautiful 🙏

  • @zeroheisenburg3480
    @zeroheisenburg3480 1 year ago +2

    One thing I don't understand is why these embeddings, learned through deep learning with non-linearities in between, can be compared using linear metrics such as the most commonly used cosine similarity. I can't find a good discussion anywhere.

    • @SierraSombrero
      @SierraSombrero 1 year ago +2

      Deep learning models are trained using non-linearities to capture non-linear relationships in the data. Hence, the function (= model architecture) you use to learn the embeddings has non-linearities.
      When we train a deep learning model to obtain an embedding, most of the time we have an embedding layer as the first layer in our model. We then train the model using a specific objective (goal) that is suitable for obtaining word embeddings. After having trained the model, we just take the embedding layer out of the full model and discard the rest. You can imagine the embedding layer as a matrix of size (vocab_size x embedding_dimension). That means each word/token in our vocabulary is represented by a vector with as many numbers as the embedding dimension. The matrix (embedding layer) itself has no non-linearities; it's just a matrix. Therefore, the vectors that represent the tokens can be compared with each other using linear metrics, as you said above.
      Hope it helps :)
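
A minimal PyTorch sketch of the idea above: after training, the embedding layer is just a matrix, and its rows can be compared with a linear metric (the values here are untrained and random, purely for illustration):

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 1000, 64
embedding = torch.nn.Embedding(vocab_size, embed_dim)  # the (vocab_size x embed_dim) matrix

word_a, word_b = 42, 137          # integer token ids
vec_a = embedding.weight[word_a]  # one row of the matrix
vec_b = embedding.weight[word_b]
print(F.cosine_similarity(vec_a, vec_b, dim=0))  # plain linear comparison of two rows
```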

    • @zeroheisenburg3480
      @zeroheisenburg3480 1 year ago

      @@SierraSombrero Appreciate the response. But I think there are some critical issues lingering.
      1. The input is a matrix. It goes through linear -> non-linear -> linear transformations. Back-propagation has to go through the same steps when updating the embedding layer's weights, so it carries non-linear information over to the embedding layer, thus breaking the linear properties, right?
      2. By "the matrix (embedding layer) itself has no non-linearities", do you mean I can extract any weights before the activation unit in a neuron and use them as embeddings?

    • @SierraSombrero
      @SierraSombrero 1 year ago +1

      ​@@zeroheisenburg3480
      I'll try to answer as best as I can. I'm not sure I'll be able to answer question 1 satisfactorily, though :)
      I'll start with question 2 because I can explain it better.
      2. An embedding layer is not the same as a linear layer. It does not represent neurons and does not output activations (but rather representations).
      In a linear layer you have an input x that you multiply with the weight w, and then you add a bias b. (I don't know of any case where weights have been used as embeddings.)
      An embedding layer can usually only be the first layer in a network. You don't multiply an input x with a weight w here.
      Instead, you have a number of input classes in the form of integers (that represent e.g. words) that you can feed your model (the number of integers is your vocab size). Each of these input integers is mapped to one row of your embedding layer (vocab_size x embed_dim). You can imagine it like a table where you look up which embedding belongs to which word.
      Once you have looked up the embedding for your current word, you use it as input to the next layer in your model.
      Now, before you have trained your model, the embedding is random, and the embedding layer is updated during training using backprop just like every other layer (though differently, because it is a different mathematical operation than a linear layer).
      After training the model, the embedding layer has been changed so that every one of your input words now has a meaningful representation in the embedding space (if your training was successful).
      Now you can take the lookup table (embedding layer) out of your model, feed it a word, and it will give you the meaningful embedding belonging to that word.
      I suggest you check out the difference between the Linear and Embedding layers in PyTorch :)
      Make sure to understand what kinds of inputs you feed them and what you get as outputs.
      pytorch.org/docs/stable/generated/torch.nn.Linear.html
      pytorch.org/docs/stable/generated/torch.nn.Embedding.html
      Maybe also try to find a good explanation of how the first static embeddings were trained (CBOW, skip-gram).
      I think this should give you the intuition.
      1. It's true that non-linear operations also take place during training via backpropagation.
      However, since you're discarding all non-linear parts of the model and only keeping the embedding layer, it is definitely possible in practice to apply linear operations to the embeddings.
      If there are theoretical mathematical issues lingering in the background, then I'm certainly the wrong person to answer your question.
      But since it works so well in practice, I would personally not worry too much about it :)
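
A small sketch contrasting the two PyTorch layers linked above: nn.Linear multiplies its input by a weight matrix, while nn.Embedding only looks up rows of its weight table by integer index:

```python
import torch

linear = torch.nn.Linear(in_features=4, out_features=3)
x = torch.randn(4)
print(linear(x))             # x @ W.T + b: an actual matrix multiplication

embed = torch.nn.Embedding(num_embeddings=10, embedding_dim=3)
ids = torch.tensor([7, 2])   # token ids, not feature vectors
print(embed(ids))            # rows 7 and 2 of the weight table
print(torch.equal(embed(ids)[0], embed.weight[7]))  # True: a pure lookup
```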

  • @BlayneOliver
    @BlayneOliver 3 months ago

    How do you introduce categorical embeddings into a seq2seq model which works on sequence input_layers?

  • @cirostrizzi3760
    @cirostrizzi3760 1 year ago

    Great video, very informative and clear. Can someone tell me the names of some modern embedding models (e.g. OpenAI's) and maybe give me some sources to search and understand more about them?

  • @garyboy7135
    @garyboy7135 9 months ago

    Maybe some topics around word2vec and other popular embedding methods. And how embeddings can extend beyond text.

  • @Tonkuz
    @Tonkuz 2 months ago

    What happens to the embeddings created for one LLM if I change the LLM?

    • @mojekonto9287
      @mojekonto9287 1 month ago

      Nothing. At least in the context of a RAG system, where you use the embeddings to search through a vector database to retrieve context for the LLM.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Are there other options besides embeddings?

  • @Septumsempra8818
    @Septumsempra8818 1 year ago

    Time series embeddings? And encoders?

  • @xspydazx
    @xspydazx 10 months ago

    Really, if you train your embedding model with entity lists and topic-themed sentences, i.e. highly classified, entity-rich data paired with its associated topic or entity, then you will build the right model. This model should form your base model, and when performing tasks you "fine-tune the model" on the customized corpus so that it also updates the vocabulary from your new corpus, reassigning the terms closer together. To optimize, it would be necessary to retrain for a set of epochs (without overfitting the new data), since the pretrained model contains the data you want underneath, while the new model is polluted toward the new data corpus... hence keeping a base model unchanged gives your projects a jumpstart. Tune these models with new entity lists and topic lists, updating the new knowledge in the model, and even clean and prune the vocabulary of unwanted stop words, offensive words, and misassigned words. So a base model is your starting point: if you train a fresh model on the corpus, it will produce the results the video shows; it will essentially not be fit for purpose except the purpose it was trained for.
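
A minimal gensim sketch of the workflow described above: keep the saved base model untouched on disk, fine-tune a loaded copy on the new corpus, and update the vocabulary with the new terms (the filenames and corpus are placeholders):

```python
from gensim.models import Word2Vec

new_corpus = [["mana", "potion", "restores", "health"]] * 50  # placeholder corpus

tuned = Word2Vec.load("base_model.w2v")     # the file on disk remains the pristine base
tuned.build_vocab(new_corpus, update=True)  # add new terms to the existing vocabulary
tuned.train(new_corpus, total_examples=len(new_corpus), epochs=5)
tuned.save("tuned_model.w2v")               # save separately; never overwrite the base
```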

  • @EeshanMishra
    @EeshanMishra 1 year ago

    Google sends me to you when I am working on Llama embeddings :)

  • @micahdelaurentis6551
    @micahdelaurentis6551 1 year ago

    that was a fantastic example

  • @value_functions
    @value_functions 1 year ago +3

    I would like to point out an important distinction: The *concepts* described by the symbols in context of other symbols can have vastly different embeddings. The *symbols* themselves however need absolute/fixed embeddings. If you use multiple symbols in a sequence, like words in a sentence, you can use all the other symbols in order to give each other context.
    So the raw input embeddings are always the same. In that case, I would argue that the initial "common misconception" is actually accurate.
    Using a model like a transformer allows you to input a sequence of (fixed) symbol-embeddings and end up with contextualized embeddings in place of those symbols. The transformer then iteratively applies *transformations* on those embedding vectors depending on the *context* .
    The symbol "parrot" always starts as the same fixed embedding vector, no matter in which context it appears. But depending on the context, the repeated transformations done by the transformer eventually *map* that vector to another vector close to "parrot" if the context is a poem, or yet another vector close to "kale" if the context is a cooking recipe.
    This is why word2vec back then just was not enough. It only computed something similar to those input embeddings and then stopped there without doing those transformations.
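
This distinction can be checked directly. A sketch assuming the Hugging Face transformers library and a small BERT checkpoint, using "bank" (reliably a single token in the BERT vocabulary) in place of the parrot/kale example:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_states(sentence: str, word: str):
    enc = tok(sentence, return_tensors="pt")
    pos = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    with torch.no_grad():
        static = model.get_input_embeddings()(enc.input_ids)[0, pos]  # fixed lookup
        contextual = model(**enc).last_hidden_state[0, pos]           # context-dependent
    return static, contextual

s1, c1 = token_states("i sat on the river bank", "bank")
s2, c2 = token_states("i deposited cash at the bank", "bank")
print(torch.allclose(s1, s2))  # True: the raw input embedding is always the same
print(torch.allclose(c1, c2))  # False: the transformed, contextual vectors differ
```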

  • @meguellatiyounes8659
    @meguellatiyounes8659 1 year ago

    Embeddings are good tools for statistical discovery;
    they capture the statistical structure of how information is organized

  • @anishbhanushali
    @anishbhanushali 1 year ago

    In short: context matters