This is great. Very much looking forward to Part 2
Very good explanation of how embeddings work. After this video I got a much better understanding. Thank you very much for your video and the examples!
This is really good. 🌟⭐⭐⭐⭐
This is an amazing tutorial. Thank you!
I had to add an encoding parameter in the pdf_to_txt.py file here to prevent an error:
with open(txt_path, 'w', encoding='utf-8') as f:
f.write(text)
Thanks, yeah, depending on your Python environment that may be needed. I've just added your improvement to the repo. Feel free to create issues there if you spot further bugs.
Any current thoughts on ColBERTv2 vs regular BERT embeddings? Seems intriguing - not too hard to set up if you're not on WSL 2.
I haven't dug in but since the v2 is optimised for retrieval, it does sound intriguing!
Why are embeddings only from the first layer? Could you not pass it through N layers and attain lower dimensional embeddings that way?
Howdy! I was too categorical on that in the video. The embeddings can be from different layers. The first layers are closer semantically to the input, then they are transformed to being more abstract in the middle, and then they come back towards word meanings in the last layers as the output prediction is approached. A language model is trained to predict the next token - does that align well with forming an abstract representation for comparison with other text? Empirically it seems so, but I don't have great intuition about why a first or a middle layer would be better. Obviously you need at least one layer because you want to get into vector space.
Regarding dimensions, the hidden dimension is the same at every one of Llama's 32 layers, so an inner layer wouldn't be lower dimensional just by counting matrix sizes. Possibly the inner layers *are* lower rank, or could be represented by lower-rank matrices (in fact, probably, because Low Rank Adapters, LoRAs, work). But maybe the first layer could also be very well approximated by a lower-rank matrix.
So overall, probably you could do either of those things, but I need to learn more and see more examples to say something deeper.
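If you want to experiment with this, here's a rough sketch of pulling embeddings from an arbitrary hidden layer with Hugging Face transformers. The model name and the mean-pooling choice are my assumptions, not the notebook's code:

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "meta-llama/Llama-2-7b-hf"  # assumed; any causal LM that exposes hidden_states works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

def embed(text, layer=1):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states[0] is the token-embedding layer; hidden_states[layer] is the
    # output of transformer block `layer`. Mean-pool over tokens to get one vector per text.
    hidden = outputs.hidden_states[layer]  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)   # (hidden_dim,)

# Same call, different layer - you could compare retrieval quality between them.
v_first = embed("Touch rugby has six players per side.", layer=1)
v_middle = embed("Touch rugby has six players per side.", layer=16)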
Great clarification! Well done!
1- I would love to see how your code works on an M1/M2 MacBook.
2- Have you tried ColBERT? Would you be able to productionize this notebook?
3- Would these embeddings and sentence transformers work for i) a single PDF with hundreds of pages or ii) thousands of PDFs of various lengths?
4- For the scenario in #3, should someone use i) embeddings only, ii) PEFT / fine-tuning an LLM on the specific documents, or iii) both?
5- When are you publishing Part 2?
Thanks again!
1. With embeddings let me dig in and revert. For fine-tuning, that's trickier, I'll think about it.
2. I haven't tried ColBERT. The thing here is that the MS MARCO model is specifically trained for dot-product search, so you'd be comparing a fine-tuned MARCO dot-product model against some ColBERT approach. I'll add it to my list of things to do, although getting fine-tuning working is a higher priority.
3. Yes, embeddings will work for hundreds or thousands of pages. One issue is that there may be so many highly similar snippets returned that you can't fit them all in the prompt that goes to the LLM, so it's dataset and question dependent (see the sketch after this reply).
4. In my experience, fine-tuning is really hard. I've only gotten it to work for structured responses (like function calling) or maybe a bit for tone. Encoding information is really difficult, so my suggestion would be to try and stick to prompting and embeddings.
5. Hopefully soon - I want to publish something that is useful, and in a lot of the fine-tunings, the results are just really bad.
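On point 3, here's one way you might cap what goes into the prompt: select chunks by dot-product score but stop at a token budget rather than a fixed top-k. This is a sketch only - the helper names and the budget are made up:

import torch

def select_context(question_emb, corpus_embs, chunks, token_budget=2000):
    # corpus_embs: (num_chunks, dim) stacked chunk embeddings; chunks: matching list of texts
    scores = corpus_embs @ question_emb           # dot-product similarity per chunk
    order = torch.argsort(scores, descending=True)
    selected, used = [], 0
    for idx in order.tolist():
        cost = len(chunks[idx].split())           # crude token estimate; swap in a real tokenizer
        if used + cost > token_budget:
            break                                 # stop before the prompt overflows
        selected.append(chunks[idx])
        used += cost
    return selected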
Thanks for the video! You helped me a lot.
But no matter what, you'll have a large prompt in the end, right? I want my application to run on CPU with llama.cpp, and even though it's way faster on CPU than other tools built for GPU, it gets very slow as soon as you give it a long prompt.
I ran a test: for one phrase I get a response in less than a second, but for a one-page prompt it takes 4 minutes! So I cannot use embeddings for my use case (I have 40,000 pages of technical documentation and want answers from them). I guess I have to use fine-tuning, but everyone on the internet tells me that fine-tuning is not for that. How would you proceed?
Exactly! Embeddings mean long prompts. Fine-tuning is your option here, but it's hard.
Hi, great channel! Thanks for sharing. I have a question: do you have any recommendations regarding embeddings and models to use when dealing with Spanish for the context and questions? Thanks in advance.
Probably you could look at BETO (github.com/dccuchile/beto) and dig around from there
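As a rough, hedged sketch of what that could look like (the checkpoint id and the mean-pooling choice are assumptions, and a multilingual sentence-transformers model may work better out of the box):

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "dccuchile/bert-base-spanish-wwm-cased"  # assumed BETO checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed_es(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    mask = batch["attention_mask"].unsqueeze(-1)        # zero out padding tokens
    summed = (out.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)                     # mean pooling -> (batch, dim)

q = embed_es(["¿Cuántos jugadores hay en el campo?"])
docs = embed_es(["Cada equipo tiene seis jugadores en el campo.",
                 "El partido se juega en dos tiempos."])
print(docs @ q.T)  # dot-product scores, higher = more similar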
Great video! Could you explain why you chose MARCO instead of taking an arbitrary embedding model from the MTEB leaderboard (e.g. BGE)? What's your opinion on fine-tuning an embedding model on the domain, in your case the touch rugby dataset?
I just wanted to pick something more standard. Leaderboards can be misleading and there's a risk of grabbing something that later turns out not to be robust.
That said, I think it's a good idea to try leaderboard embeddings and probably you can get better performance.
I've never thought of fine-tuning an embedding model, but I like it! Might be a way to do better RAG!
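Something like the following could be a starting point - a sketch only, using the standard sentence-transformers training API with a made-up base model, illustrative example pairs, and arbitrary hyperparameters:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-dot-v5")  # assumed base model

# (question, passage that answers it) pairs drawn from your own documents - illustrative only
train_examples = [
    InputExample(texts=["How many players are on the field?",
                        "Each team has six players on the field at any time."]),
    InputExample(texts=["How is a touchdown scored?",
                        "A touchdown is scored by placing the ball on or over the score line."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Other passages in the same batch act as negatives, so no explicit negatives are needed.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("touch-rugby-embedder")  # hypothetical output path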
How did you manage to match the dimension of the question embedding tensor with the text tensors? I would assume that you would have to pad the dimension of the question tensor to match the dimension of the text tensors, wouldn't you?
Do you mean my test question set?
If so, that isn't a tensor but rather a list - it's not being sent through as a batch (although that would be a more efficient approach).
@@TrelisResearch I am refering to your evaluation of the embedding model on your train set. The dimension of the question embedding must match the dimension of the stacked corpus tensors to do the dot product. How did you accomplish that? I am missing something like: padding = target_len - question_emb.shape[0]
question_emb = F.pad(question_emb (0, padding))
Where target_len is the length of the longest sentence in the corpus. To make the dimension of question and corpus match
@@adriangabriel3219 No matter the length of the sentence you put into the embedding model, it will return a 1D vector whose length is the embedding dimension (not the sentence length). Does that help?
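A quick sketch to show why no padding is needed (the model name here is an assumption):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-dot-v5")  # assumed model name

short = model.encode("Six players.")
long_text = model.encode("A much longer passage about the rules of touch rugby, "
                         "substitutions, scoring, and everything else.")
print(short.shape, long_text.shape)   # both the same fixed embedding dimension, e.g. (768,)

question_emb = model.encode("How many players are on the field?")
corpus_embs = model.encode(["Each team has six players on the field.",
                            "The ball must not be kicked."])   # shape (2, embedding_dim)
scores = util.dot_score(question_emb, corpus_embs)             # shape (1, 2) - no padding involved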
great video!!!!
**Running on Mac M1 or M2**
!!! Requires at least Mac M1 or M2 with 16 GB+ !!!
I'm making available for purchase a version of the Embedding.ipynb script for Mac M1 or M2. Video demo here: www.loom.com/share/eb45fad389364c229655567dcc3aaf0d?sid=86a9fc70-37e0-4808-b016-1707f9a34c9f
Great