Your videos are always clear and really well explained. Thanks. Keep going like this!😇
Great video. I am a college student, and your videos are helping me do my projects. Thank you for such content.
Thank you and glad you are finding it useful.
Awesome, I’m subscribed
This seems like a great technique to also help with entity confusion during retrieval. Sometimes I've noticed that embedding models don't really capture nuanced but important differences between chunks that talk about one company vs another, and that ends up confusing the LLM as well.
Late chunking seems very cost-efficient compared to other approaches, thanks for sharing!
Super interesting once again 👍
Interesting! But I still think Naive RAG is a bit underrated. To properly build contextual retrieval, or any RAG system for that matter, a naive approach lays the foundation. It is also cheap and fast, and if done correctly it works very well. The only thing is that Naive RAG works quite badly for tables, but for text it can work very well.
Use a specific agent for SQL and use a router :)
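Something like this toy sketch of the router idea: send table/number questions to a SQL agent and everything else to the regular RAG pipeline. The keyword check is just a stand-in for whatever classifier you'd actually use, and the handler names are made up.

```python
# Toy router: table-ish questions go to a SQL agent, the rest to RAG.
# The keyword heuristic is a placeholder for a real classifier.
def route(question: str) -> str:
    table_hints = ("sum", "average", "total", "how many", "per quarter")
    if any(hint in question.lower() for hint in table_hints):
        return "sql_agent"    # hypothetical handler for tabular data
    return "rag_pipeline"     # hypothetical handler for free text

print(route("What is the total revenue per quarter?"))  # sql_agent
print(route("Summarize the methodology section."))      # rag_pipeline
```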
Most documents for RAG would be more than 50 pages, so I don't think there's any embedding model with such a huge context window. Please correct me if I'm wrong. I don't see this approach being effective for RAG systems.
I think this needs some clarification. The 8k max token limit doesn't mean you can only embed a document when it's shorter than that many tokens. If you have a document longer than 8k tokens, you can divide it into batches and process each one the way you would for chunking. There might be some discontinuity, but overlap is again your friend here. Hope this clarifies how you would use it.
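Here's a rough sketch of what I mean in plain Python. The `window_size` and `overlap` names are mine, not any particular library's API:

```python
# Split a long token sequence into overlapping windows so each window
# fits the model's context limit. Overlap softens the discontinuity
# at window boundaries.
def overlapping_windows(tokens, window_size=8192, overlap=512):
    step = window_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break
    return windows
```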
If we chunk into batches, even with overlap, the main property of late chunking, holding the semantic meaning of the whole context, is lost, which makes it less useful. It's an intermediate solution, because the embedding limit is still a challenge.
@engineerprompt That defeats the very purpose of late chunking, doesn't it?
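For anyone following the thread, this is the core of late chunking as I understand it: embed the whole document once at the token level, then mean-pool the token embeddings over each chunk's span. A minimal sketch, assuming the jina-embeddings-v2 model; how you compute the chunk spans is up to you and omitted here.

```python
# Late chunking sketch: one forward pass over the full document, then
# pool token embeddings per chunk. Each chunk vector is built from
# tokens that already attended to the whole document.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "jinaai/jina-embeddings-v2-base-en"  # assumed long-context model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

def late_chunk(text, spans):
    """spans: list of (start_token, end_token) pairs defining chunks."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_embs = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    # Mean-pool each chunk's token embeddings into one chunk vector.
    return [token_embs[s:e].mean(dim=0) for s, e in spans]
```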
Does the embedding dimension refer to the output length of the response?
Yes, I have the same question.
It's the length of the vector. So a dimension of 3 would be a vector like [2, 3, 5].
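Quick way to check it yourself, assuming you have sentence-transformers installed. The dimension is fixed by the model, no matter how long the input text is:

```python
# The "dimension" is the length of the vector the model returns,
# independent of input length.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("any input text")
print(len(vec))  # 384 for this model, regardless of input length
```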
So how would you combine that with VRAG and context extension in localVisionGPT?
Could we use hybrid search with late chunking? Or is late chunking enough?
@Prompt Engineering
Hybrid will always help. It's hard to beat BM25 :) It's usually really helpful when you have a lot of keywords in your dataset.
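A minimal sketch of what hybrid scoring can look like: BM25 for keywords plus dense vector similarity, blended with a weight. The 50/50 weight and the tiny corpus are arbitrary examples, and you'd bring your own embeddings for the vectors.

```python
# Hybrid scoring sketch: blend normalized BM25 keyword scores with
# dense similarity scores. alpha controls the keyword/dense balance.
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25 (assumed)

docs = ["acme q3 revenue report", "globex q3 revenue report"]
bm25 = BM25Okapi([d.split() for d in docs])

def hybrid_scores(query, query_vec, doc_vecs, alpha=0.5):
    kw = np.array(bm25.get_scores(query.split()))
    if kw.max() > 0:
        kw = kw / kw.max()        # normalize keyword scores to [0, 1]
    dense = doc_vecs @ query_vec  # cosine similarity if vectors are unit-norm
    return alpha * kw + (1 - alpha) * dense
```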
Is there an application here where this can enhance knowledge graph generation?
Thanks!
Thanks
Ah, chunking. I love the late chunking idea, but personally I've found optimizing my document formatting for a specific RAG to be the best approach, making sure it gets chunked sensibly. A pain in the ass, frankly, though it can be largely avoided with fractal structuring. But you usually can't do that. Sigh.
I totally agree with this approach and have been advocating for it for a while now with the clients I work with. None of this is magic. You have to spend time with your data to understand it and then build on top of it. The unfortunate part is that people mostly don't want to do the dirty work.