Building long context RAG with RAPTOR from scratch

  • Published 6 Jun 2024
  • The rise of long-context LLMs and embeddings will change RAG pipeline design. Instead of splitting docs and indexing doc chunks, it will become feasible to index full documents. RAG approaches will need to flexibly answer lower-level questions from single documents or higher-level questions that require information across many documents.
    RAPTOR (Sarthi et al.) is one approach to tackle this by building a tree of document summaries: docs are clustered, and clusters are summarized to capture higher-level information across similar docs.
    This is repeated recursively, resulting in a tree of summaries, from individual docs as leaves, to intermediate summaries of related docs, to high-level summaries of the full doc collection.
    In this video, we build RAPTOR from scratch and test it on 33 web pages (each ranging from 2k to 12k tokens) of LangChain docs, using the recently released Claude 3 model from Anthropic to build the summarization tree. The pages and tree of summaries are indexed together for RAG with Claude 3, enabling QA on lower-level questions or higher-level concepts (captured in summaries that span related pages).
    This idea can scale to large collections of documents or to documents of arbitrary size (up to the embedding / LLM context window).
    Code:
    github.com/langchain-ai/langc...
    Paper:
    arxiv.org/abs/2401.18059
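
To make the clustering-and-summarization loop described above concrete, here is a minimal sketch of the tree build. It assumes two hypothetical helpers that are not part of the video's code: embed(texts) returns one vector per text, and summarize(texts) asks an LLM (e.g. Claude 3) for a single summary of a group of texts. The actual RAPTOR implementation additionally reduces dimensionality with UMAP and picks the number of clusters automatically; this sketch fixes a small cluster count instead.

```python
# Minimal sketch of a RAPTOR-style tree build (not the exact notebook code).
# Assumed hypothetical helpers: embed(texts) -> list of vectors,
# summarize(texts) -> one summary string produced by an LLM.
import numpy as np
from sklearn.mixture import GaussianMixture

def build_raptor_texts(docs, embed, summarize, max_levels=3, n_clusters=5):
    """Recursively cluster texts and summarize each cluster, collecting every
    level (leaf docs plus intermediate and root summaries) for indexing."""
    all_texts = list(docs)                    # level 0: the raw pages (leaves)
    current = list(docs)
    for _ in range(max_levels):
        if len(current) <= 1:                 # reached the root of the tree
            break
        vectors = np.array(embed(current))
        k = min(n_clusters, len(current))
        labels = GaussianMixture(n_components=k, random_state=0).fit_predict(vectors)
        summaries = []
        for c in range(k):
            members = [t for t, lbl in zip(current, labels) if lbl == c]
            if members:
                summaries.append(summarize(members))
        all_texts.extend(summaries)           # summaries are indexed with the docs
        current = summaries                   # the next level clusters the summaries
    return all_texts                          # embed and index all of these for RAG
```

The returned list (leaf pages plus every summary level) is what gets embedded and indexed together for retrieval.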

COMMENTS • 38

  • @maxi-g • 2 months ago • +3

    Lance is killing it with these videos. Keep it up!

  • @danielschoenbohm • 3 months ago • +6

    That was so useful. Thanks! I'd love to see more advanced techniques like that.

  • @cnmoro55 • 3 months ago • +17

    I think this approach is very interesting, and it was very well presented; thank you for the video.
    One caveat: this works when we have a "closed" context, where we know we will query ONLY these 31 pages, let's say.
    If we are in an environment where this is dynamic, the clustering approach might not work so well.
    When we add more documents, we would have to run the clustering again rather than simply load the model and predict the cluster, because the new documents might contain completely new information. This becomes a problem when scaling up, both in the time spent and in the cost of running the summarization again.

    • @seanpitcher1102 • 2 months ago • +2

      That was my first thought. This approach seems to work well with static content, but what happens if I want to add new documents? It seems like you would need to rerun the entire process, which will get increasingly expensive over time.

    • @alchemication • 1 month ago

      Agreed, we need another paper on scalable RAPTOR ;)

  • @SimonMariusGalyan • 9 days ago

    Thank you for your awesome presentation :)

  • @MrPlatinum148 • 2 months ago

    Fantastic video. Thanks heaps for the content. It really feels like you could present a series of these talks. I want to learn more about implementation of some of these ideas.

  • @Novacasa88 • 3 months ago

    Hilarious, I just came up with this idea a few months ago for a project. It really makes me think I should get into doing research in this field, since over the last few years my ideas seem to keep becoming common concepts. 😊 Such a cool field

  • @johnnydubrovnic • 2 months ago • +4

    Excellent approach and very well explained.
    One challenge that comes to mind with this summarisation hierarchy is maintaining it as the source content changes or is revised. I am thinking of scenarios where there are hundreds of millions of documents to index.

  • @isa-bv481 • 3 months ago

    First, I want to mention I like your explanations/videos. Thanks for your great work.
    On this occasion I got blocked (but I will solve it) because of the following:
    1. Claude is not available in some regions (like mine, Belgium) - I'm on the waiting list.
    2. I tried GPT-4 as an alternative, but I forgot that you must put money on the account (I still have most of the $5 free test credit, but that's limited to GPT-3.5).

  • @paraconscious790 • 3 months ago

    This approach and implementation are amazing for alleviating the 3 issues you mentioned, thanks! One question though: have you checked the accuracy of the output against putting the entire content into a single prompt of a long-context LLM?

  • @JonWillis9 • 3 months ago • +5

    F yes, it's Lance from LangChain again; it is going to be a good day.

  • @f2f4ff6f8f0 • 3 months ago

    Great stuff

  • @8eck • 1 month ago

    Anyway, thank you for the high level explanation.

  • @gowtham-user2834 • 3 months ago

    You are a great one, champ

  • @jaysonp9426 • 3 months ago

    This is great, long context is a tool for a specific use case. Until costs and latency with long context are the same as RAG, RAG will be what most apps use.

  • @henkhbit5748 • 3 months ago • +1

    Indeed an interesting approach that is not limited by the context length of the LLM. I have some remarks:
    a) Is choosing the threshold not the same as choosing the K parameter of KNN? (Can a Kohonen map not be used? It's also unsupervised clustering...)
    b) Don't you see a performance impact when retrieving from both the long embedded texts and the summarization clusters?
    c) As already pointed out in some of the comments: how do you update efficiently when adding new docs? (Of course you can, for example, use a copy of the vector store, do the update, and switch over when done.)
    d) Have you compared the results of the "standard" method without summarization against this "RAPTOR" method, and timed the inference of both?
    Btw: using long context is NOT very cost effective if you are using the big commercial AI companies.

  • @mr_adisa • 3 months ago

    Awesome walkthrough, going to give it a try.
    One thing this approach seems to lack is the ability to include metadata (e.g. source) on the summarizations. Has anyone found a solution to this?

  • @bertobertoberto3 • 2 months ago

    Interesting idea. However, if you retrieve from an intermediate summary, would it still be possible to cite the original documents? Citations are key for most production-level deployments.

  • @jeffsteyn7174 • 3 months ago • +1

    So in the example you add a batch of 30 pages and they get clustered and summarized. What happens when you add another batch, or even just one extra doc? Is it added to an existing cluster and summary, or does it become a new cluster and summary?

  • @anhvunguyen7935 • 2 months ago

    Will you make videos about RAG over PDFs (containing not only text but also tables and images)? That would be very helpful for me. Thank you for the great work!

  • @insitegd7483 • 3 months ago

    I think the solution in the last part, to avoid exceeding the token limit, could be this:
    If we know that the first document is very large, we could embed only that whole document and add an ID in its metadata, then do the similarity search in another vector database, retrieving the documents by the ID.
    I am not sure, but I think that could solve the problem.
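
A rough sketch of that idea, assuming LangChain-style components (Chroma, OpenAIEmbeddings) and a hypothetical summarize helper, none of which are from the video: only a short text per document goes into the vector store, tagged with a doc ID, and the full (possibly very large) document is fetched from a separate plain dictionary by that ID.

```python
# Hypothetical sketch: index one small text per document, keep the full
# documents in a separate store keyed by ID. `docs` and `summarize` are assumed.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

full_docs = {}                                    # doc_id -> full document text
texts, metadatas = [], []
for i, doc in enumerate(docs):
    doc_id = f"doc-{i}"
    full_docs[doc_id] = doc
    texts.append(summarize(doc))                  # or the doc itself if it fits
    metadatas.append({"doc_id": doc_id})

vectorstore = Chroma.from_texts(texts, OpenAIEmbeddings(), metadatas=metadatas)

hits = vectorstore.similarity_search("question about the docs", k=3)
context = [full_docs[h.metadata["doc_id"]] for h in hits]   # full docs by ID
```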

  • @perrygoldman612 • 3 months ago • +1

    One key question for your approach is how to define the summary so that it offers adequate information for RAG. If the summary does not include some minor information points, it would be impossible for RAG to identify the document as relevant based solely on the summary. Moreover, if the document itself contains too much scattered information and is hard to summarize, the approach would run into many issues. I do believe in using this approach for many docs, but it does have some prerequisites.

    • @easvidi6325 • 2 months ago • +2

      I think we should shift from summaries to abstract summaries, making them more conceptual and higher level. Then, before sending a search request, the LLM should (re)formulate the question so that it is compatible with the abstract summaries, then search, then find the real texts based on the matched abstract summaries.
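
A rough sketch of that two-step flow, assuming a LangChain-style chat model and vector store; the summary_id metadata key and the sources_by_summary_id mapping are hypothetical names, not from the video.

```python
# Hypothetical sketch of: rewrite the question abstractly, search the abstract
# summaries, then answer over the underlying source texts.
def answer_via_abstract_summaries(question, llm, summary_store, sources_by_summary_id):
    # 1) Rewrite the question at a more abstract, conceptual level.
    abstract_q = llm.invoke(
        f"Rephrase this question at a higher, more conceptual level: {question}"
    ).content
    # 2) Search the index of abstract summaries with the rewritten question.
    hits = summary_store.similarity_search(abstract_q, k=3)
    # 3) Map the matched summaries back to the underlying source texts.
    context = "\n\n".join(
        text
        for hit in hits
        for text in sources_by_summary_id[hit.metadata["summary_id"]]
    )
    # 4) Answer the original question over the recovered source texts.
    return llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {question}").content
```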

  • @HealthLucid • 1 month ago

    Some of the readers have commented that we need to run the entire clustering algorithm again if we get a new set of documents, or if we need it to be dynamic.
    I do NOT think we need to do this. Here is why:
    Lance (the speaker) shows how the documents are clustered recursively until reaching n or a single cluster.
    So let us say there are 10,000 clusters and the new documents impact only 4 clusters [see at 06:33, where he talks about the Gaussian Mixture model (AFAIK, this means a point can belong to multiple clusters)]. Then we have two cases:
    1. No new clusters are created: only those 4 clusters have to be rebuilt, and their changes need to be propagated up through the chain to the root node, right? We continue to have 10,000 clusters.
    2. Say it ends up expanding the number of clusters from 4 to 6: then only the impacted clusters have to be rebuilt from that point up to the root cluster. We will now have 10,002 clusters.
    If this is true, we do not need to rebuild everything, only the clusters that get impacted. It's like rebalancing the tree.
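
A rough sketch of case 1 above (no new clusters), assuming the fitted per-level GaussianMixture is kept around; this is a hypothetical extension, not something shown in the video or the paper. embed and summarize are the same assumed helpers as in the earlier sketch.

```python
# Hypothetical incremental update: assign new docs to existing clusters and
# re-summarize only the clusters they land in; the new-cluster case is omitted.
import numpy as np

def incremental_update(new_docs, embed, summarize, gmm, members, summaries):
    """members / summaries map cluster id -> member texts / cluster summary."""
    vectors = np.array(embed(new_docs))
    labels = gmm.predict(vectors)          # assign to existing clusters only
    for doc, lbl in zip(new_docs, labels):
        members[lbl].append(doc)
    touched = set(labels.tolist())
    for lbl in touched:                    # rebuild just the impacted summaries
        summaries[lbl] = summarize(members[lbl])
    return touched                         # re-embed these and propagate the change up a level
```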

  • @YueNan79 • 1 month ago

    Hey, I've got an issue: what if the combined documents of a cluster exceed the maximum tokens of the summary chain?

  • @HashimWarren • 1 month ago

    7:56 What does it mean to "embed" the document?

  • @MattJonesYT • 3 months ago • +3

    The content is great, but the audio has a lot of echo. If you use a headset with the mic positioned below the chin to avoid plosive pop sounds, it will greatly improve the audio quality.

  • @byrondelgado • 2 months ago

    This is a more comprehensive, scalable RAG approach.

  • @dejoma. • 2 months ago

    How is running all your context through an LLM in "chunks" cheaper than throwing it all in one chunk? I think this approach is not viable for most people since it requires passing ALL the context through an LLM, either by adding it as context or by passing it through the summary prompt. Opinions?

  • @maskedvillainai • 2 months ago • +1

    Y’know what works better than all of this? Something we’ve done for centuries. Versioning the model itself in a server cache as an instance the model can prompt - using the exact same method for every instance until it finds the model that holds the summary .

    • @easvidi6325 • 2 months ago • +1

      Please elaborate

    • @peterwlodarczyk3987 • 2 months ago

      I believe he's trying to make a joke along the lines of "just fine tune the model bro lol". Which is, of course, useless advice. Impossible for most valid use cases (using e.g. GPT-4 / Claude 3). Impractical for the less popular ones (prohibitively expensive for anything above a 14B). His writing style is pretty schizo though, so I'm giving him the benefit of the doubt by assuming he was actually trying to provide some kind of constructive feedback or suggestion rather than going on a free-association word rant. He's not describing fine-tuning, but is vaguely in the neighborhood with that nonsense.

  • @8eck • 1 month ago

    Still, "k" problem haven't gone anywhere. 😅

  • @nogool111 • 3 months ago

    Can this approach solve a multi-hop question? I should try it myself. Thank you for a great video.