Would have loved to see this done with Anthropic, mostly because if you wanted to do this on larger documents, Anthropic's context caching would be ideal.
I love this type of walkthrough of the code from the LLM provider 🥰 thank you so much.
Thanks! Can you please create a video on hybrid RAG - vector + graph-based retrieval?
14:44 - would be nice to have a practical video about late chunking.
Thank you for creating this content. It's very useful for completing the bachelor's thesis I'm currently working on. I'd like to ask a question: When the chat history reaches thousands of entries, and this chatbot is potentially used for a mobile app, is a vector database needed (for storing the data)? If so, should each data query (session ID, query, and answers) be stored? Or is there something else to consider? In this case, I want the vector database to address the limitation of LLMs, which is the context window constraint.
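One common pattern for the chat-history question (not from the video, just a sketch) is to treat past turns like any other RAG corpus: embed each (session ID, query, answer) record in a vector store and retrieve only the few most relevant turns per request, rather than stuffing the whole history into the context window. A minimal sketch with Chroma; the collection name, metadata fields, and helper names are assumptions for illustration:

```python
# Sketch: store each finished chat turn in a vector DB and retrieve only the
# relevant past turns per request, instead of sending the whole history.
# Collection/field names (chat_history, session_id) are assumptions.
import uuid
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in a real app
history = client.get_or_create_collection("chat_history")

def save_turn(session_id: str, query: str, answer: str) -> None:
    # Embed the user query and answer together so later questions can match them.
    history.add(
        ids=[str(uuid.uuid4())],
        documents=[f"User: {query}\nAssistant: {answer}"],
        metadatas=[{"session_id": session_id}],
    )

def relevant_history(session_id: str, new_query: str, k: int = 3) -> list[str]:
    # Pull only the k most similar past turns for this session.
    result = history.query(
        query_texts=[new_query],
        n_results=k,
        where={"session_id": session_id},
    )
    return result["documents"][0] if result["documents"] else []
```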
Thanks for explaining how to generalize the Anthropic trick! It's very germane. And thanks for all the different RAG approaches to consider. Could you do a video on how to evaluate these different methods with metric-driven analysis? Right now I just eyeball the results, and besides being time-consuming, I'm not that good at distinguishing small improvements between RAG models.
Thank You❤
Thank You.
How costly would it be to send the same big document to some paid API many, many times, asking it to locate the next small snippet and add some context? It would be ridiculously expensive.
Yeah, sounds very inefficient, with quadratic complexity as a function of document size.
That gave me an idea, though: what if we first generate a summary of the PDF and then use that as the context, instead of the full document?
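If I follow the idea, the per-chunk contextualization call would then see a short document summary rather than the full text, keeping cost roughly linear in document size. A rough sketch of that variant (prompts, model name, and helper names are assumptions, not the code from the video):

```python
# Sketch of "summarize once, then contextualize each chunk against the summary":
# one summary call plus one short call per chunk, instead of resending the full
# document for every chunk. Prompts and model name are assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # any cheap chat model works for this step

def summarize(document: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Summarize this document in about 200 words:\n\n{document}"}],
    )
    return resp.choices[0].message.content

def contextualize_chunk(summary: str, chunk: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": (f"Document summary:\n{summary}\n\n"
                               f"Chunk:\n{chunk}\n\n"
                               "Write 1-2 sentences situating this chunk within the "
                               "document, to prepend before embedding it.")}],
    )
    return resp.choices[0].message.content

# Usage: context = contextualize_chunk(summarize(doc), chunk)
# then embed f"{context}\n\n{chunk}" instead of the bare chunk.
```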
Anthropic, Gemini, and OpenAI all support prompt/context caching, which can substantially reduce the cost by caching your document across calls.
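For reference, caching the document with Anthropic's prompt caching looks roughly like the sketch below (the model choice and prompt wording are my own assumptions). The large document block is marked cacheable, so after the first call the repeated per-chunk requests only pay full input price for the new tokens:

```python
# Rough sketch of Anthropic prompt caching for the contextual-retrieval step:
# the big document block in `system` is marked cacheable, so repeated per-chunk
# calls reuse it instead of paying full input cost each time.
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(document: str, chunk: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumption; any Claude model with caching works
        max_tokens=200,
        system=[{
            "type": "text",
            "text": f"<document>\n{document}\n</document>",
            "cache_control": {"type": "ephemeral"},  # cache the large document block
        }],
        messages=[{
            "role": "user",
            "content": (f"Here is a chunk from the document above:\n{chunk}\n\n"
                        "Give a short context situating this chunk within the document."),
        }],
    )
    return resp.content[0].text
```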
Will this work if I have JSON data instead of text documents? How would you work out contextual embeddings for JSON chunks?
What is in the JSON? Could you create flat text descriptions from your JSON and add a reference to the actual JSON in the metadata?
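Something like this sketch is the idea: flatten each JSON record into a short natural-language description for embedding, and keep a pointer to the raw JSON in the metadata so retrieval can return the original object. The field names here are invented purely for illustration:

```python
# Sketch: embed a flat text description of each JSON record, and keep a reference
# to the original JSON in metadata so a retrieval hit maps back to the exact object.
# The field names (id, name, category, price) are invented for illustration.
import json

def flatten_record(record: dict) -> str:
    # Turn structured fields into one embeddable sentence.
    return (f"{record['name']} is a {record['category']} product "
            f"priced at {record['price']}.")

records = [
    {"id": "p1", "name": "Trailblazer 2", "category": "hiking boot", "price": "129 USD"},
]

documents = [flatten_record(r) for r in records]
metadatas = [{"source_json": json.dumps(r), "record_id": r["id"]} for r in records]
# `documents` go to the embedding model / vector store; `metadatas` ride along,
# so each hit can be mapped back to the full JSON record.
```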
The whole point of RAG, at least for me, was not having to feed the LLM the whole document. Now this needs to be done for every chunk? Doesn't seem very efficient to me.
Prompt caching helps in this case, but I am also not a great fan of putting whole documents into the LLM, especially since they may still blow up the context size and processing may take a long time.
I'm facing a similar situation: if I could fit the whole document into the prompt, why would I need RAG?
Not to mention the additional time it needs for indexing; that is the dealbreaker for me.
@moin_uddin You can fit one document, or some part of all the documents, but RAG was made for scanning through thousands of documents, right? I do agree that sending the extra text to add context is a tough problem. I'm wondering whether the contextualizing wouldn't already work by adding two chunks before and two chunks after, or one chunk before and one chunk after (see the sketch below).
@rikhendrix261 Like, we could just add the first few pages of a document; even that can help in adding context.
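That neighbour-window idea is cheap to try, since it needs no extra LLM calls: embed each chunk together with its surrounding chunks, but store and return only the original chunk. A tiny sketch (the window size and helper name are arbitrary choices, not from the video):

```python
# Sketch of the cheap alternative discussed above: give each chunk local context
# by embedding it together with its neighbouring chunks (no LLM call needed).
# The stored/returned text stays the original chunk; only the embedded text grows.
def with_neighbours(chunks: list[str], window: int = 1) -> list[tuple[str, str]]:
    out = []
    for i, chunk in enumerate(chunks):
        lo = max(0, i - window)
        hi = min(len(chunks), i + window + 1)
        embed_text = "\n".join(chunks[lo:hi])  # chunk plus `window` neighbours each side
        out.append((chunk, embed_text))        # (text to store, text to embed)
    return out
```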
Shouldn't you join a summary (or summaries) to the prompt, instead of the whole document/section? Wouldn't that be even better at providing the essence of the context?
thanks:)
Have you got a GitHub link to the code used?
Place into instructions and thank me later
---
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside `<thinking>` tags, and then provide your final response inside `<output>` tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside `<reflection>` tags. Self-reflection is mandatory in every reply unless specifically stated otherwise by the user.