The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!

  • Published 26 Jan 2025

COMMENTS

  • @BinWang-b7f
    @BinWang-b7f 4 months ago +182

    Sending my best to the little one in the background!

  • @tvwithtiffani
    @tvwithtiffani 4 months ago +150

    For anyone wondering, I tried these methods (contextual retrieval + reranking) with a local model on my laptop. The RAG part works great, but importing new documents takes a while because of the chunking, summary generation, and embedding generation. Re-ranking on a local model is surprisingly fast and really good with the right model. If you're building an application using RAG, I'd suggest making document upload the very first step of onboarding, because you can then do all of the chunking etc. in the background. The user might expect a real-time drag -> drop -> ask-a-question workflow, but it won't work like that unless you're using models in the cloud. Also, remember to chunk, summarize, and generate embeddings concurrently rather than one chunk after another, since sequential processing takes longer for your end user.
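
A minimal sketch of the concurrent ingestion suggested in the comment above. It is not the commenter's code, which was not shared. `summarize` and `embed` are hypothetical stand-ins for local model calls, and the fixed-size character chunker is an assumption chosen for brevity. A thread pool is used on the assumption that the model calls block on I/O or release the GIL; whether a local model really benefits from parallel requests depends on the hardware and runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (placeholder chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize(chunk: str) -> str:
    """Hypothetical stand-in for a local LLM summary call."""
    return chunk[:40]


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a local embedding model call."""
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


def process_chunk(chunk: str) -> dict:
    """Summarize and embed one chunk, independently of all other chunks."""
    summary = summarize(chunk)
    return {
        "chunk": chunk,
        "summary": summary,
        "embedding": embed(summary + " " + chunk),
    }


def ingest(text: str, workers: int = 4) -> list[dict]:
    """Process all chunks concurrently; pool.map() preserves input order."""
    chunks = chunk_text(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

In an application, `ingest` would run in a background job started right after upload, with the job status recorded so the UI can tell the user when the document is ready to query.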

    • @kenchang3456
      @kenchang3456 4 months ago +2

      Thanks for the follow-up.

    • @TheShreyas10
      @TheShreyas10 4 months ago +7

      Can you share the code, if possible?

    • @ashwinkumar5223
      @ashwinkumar5223 4 months ago +2

      Nice

    • @ashwinkumar5223
      @ashwinkumar5223 4 months ago +1

      Will you guide us to do the same?

    • @tvwithtiffani
      @tvwithtiffani 4 months ago

      @ashwinkumar5223 Unfortunately I cannot share code, but I can advise. Just remember that everything runs locally: the language model; the embeddings model (very small compared to the LLM); the vector DB (grows by GBs as you add more docs; this is where the generated embeddings are labeled and stored); and a regular DB for ordinary CRUD work and for keeping track of the status of document-processing jobs. I went with MongoDB because it's a simple NoSQL data store that has libraries and docs for many programming languages. These DBs and models are ideally held in memory, but on resource-constrained systems you may want to orchestrate loading and unloading models as needed during your workflow. How you do that depends on the target platform you're developing for: desktop vs. native mobile vs. web. I say all of this to say: make sure you have a lot of system RAM and hard drive space. Mongo recently added some support for vectors, given the noise around LLMs lately, so there may be a bit of overlap here, but I haven't checked it out. You might not need a vector DB AND MongoDB...

  • @zeta_meow_meow
    @zeta_meow_meow 5 days ago

    LOVE THIS VIDEO !!!!!
    REALLY REALLY NICE.. PLEASE INCLUDE MORE OF SUCH TYPE OF EXPLANATION VIDEOs

  • @tomwawer5714
    @tomwawer5714 4 months ago +1

    Thanks, very interesting. Many ideas came to mind for improving RAG by enhancing chunks.

  • @seanwood
    @seanwood 4 months ago +2

    Working with this now and didn’t use the new caching method 😫. Nice to have someone else run through this 🎉😆

  • @megamehdi89
    @megamehdi89 4 months ago +45

    Best wishes for the kid in the background

  • @vikramn2190
    @vikramn2190 4 months ago +2

    Thanks for the easy to understand explanation

  • @theguy5423
    @theguy5423 18 days ago

    That's one side of contextualization. The other part is where the user query is related to the previous one. Think "Show me blue jeans" followed by a new query, "Show me more".

  • @MatichekYoutube
    @MatichekYoutube 4 months ago +9

    Do you maybe know what is going on in GPT Assistants? Their RAG is really efficient and accurate; they default to 800-token chunks with a 400-token overlap, and it seems to work really well. Perhaps they also use some kind of re-ranker? Maybe you know...
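
A minimal sketch of sliding-window chunking with the 800/400 numbers quoted in the comment above. It illustrates the general technique only and says nothing about how GPT Assistants is implemented internally. The function name is hypothetical, and tokens are assumed to be produced by whatever tokenizer you use; plain whitespace splitting would be a rough stand-in.

```python
def chunk_tokens(
    tokens: list[str], size: int = 800, overlap: int = 400
) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With a 400-token overlap on 800-token chunks, every token (away from the document edges) appears in two chunks, which doubles the embedding and storage cost in exchange for fewer sentences being cut off at a chunk boundary.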

  • @IAMCFarm
    @IAMCFarm 4 months ago +4

    Applying this to local models for large document repos seems like a good combo to increase RAG performance. I wonder how you would optimize for the local environment.

  • @stonedizzleful
    @stonedizzleful 4 months ago

    Great video man. Thank you!

  • @SunilM-x9o
    @SunilM-x9o 2 months ago +2

    What if the document is so big that it can't fit in the LLM's context window? How do we get the contextualized chunks then?
    If we break the document into smaller segments/documents to implement this approach, won't it lose some context?

  • @yt-sh
    @yt-sh 3 months ago

    really useful article and video!

  • @alexisdamnit9012
    @alexisdamnit9012 3 months ago

    Great explanation 🎉

  • @aaronjsolomon
    @aaronjsolomon 1 month ago

    I was looking all over in the house and outside for the source of that sound. I thought the neighborhood cats were having a conference on my porch!

  • @i2c_jason
    @i2c_jason 4 months ago +9

    Hasn't structured graphRAG already solved this? Find the structured data using a graph, then navigate it to pull the exact example?

    • @remusomega
      @remusomega 4 months ago

      How do you think the graph gets structured in the first place?

    • @faiqkhan7545
      @faiqkhan7545 4 months ago +2

      @remusomega Any links to read?

    • @MyBinaryLife
      @MyBinaryLife 3 months ago

      @faiqkhan7545 Check out the Microsoft GraphRAG repo, pretty useful.

    • @MyBinaryLife
      @MyBinaryLife 3 months ago

      @faiqkhan7545 Check out the Microsoft GraphRAG repository.

  • @anubisai
    @anubisai 26 days ago

    Thought it was my baby, but it was yours in the background 😂😂

  • @loudcloud1499
    @loudcloud1499 4 months ago

    very informational visualizations!

  • @AlfredNutile
    @AlfredNutile 4 months ago

    Great work!

  • @PeterJung-cx1ib
    @PeterJung-cx1ib 4 months ago

    How is the diagram generated/built at 0:48 for RAG embeddings?

  • @limjuroy7078
    @limjuroy7078 4 months ago +2

    What happens if the document contains a lot of images, like tables, charts, and so on? Can we still chunk the document in the normal way, e.g. by setting a chunk size?

    • @kai_s1985
      @kai_s1985 4 months ago

      You can use vision-based RAG, which he described in his previous video.

    • @limjuroy7078
      @limjuroy7078 4 months ago

      @kai_s1985 So we don't need to chunk our documents if we use vision-based RAG? My problem is how we are going to chunk our documents, even if the LLM has vision capabilities.

    • @kairatsabyrov2031
      @kairatsabyrov2031 4 months ago

      @limjuroy7078 It is very different from text-based RAG. But I think you need to embed images page by page. Look at his video or read the ColPali paper.

    • @awakenwithoutcoffee
      @awakenwithoutcoffee 3 months ago

      @limjuroy7078 No, you would still need to chunk/parse your PDFs into text/tables/extracted images -> store those in 2 separate databases (S3/blob storage for images) -> embed the images and the text separately -> at query time, retrieve the closest images/text from these 2 stores in parallel -> feed them to the OCR model, which will analyze the context including text and image(s).
      There are more ways to use vision models though: ColPali is one of them, discussed by the OP in a different video. The approach there is to embed each page of a PDF/source directly as a picture. It's an interesting approach, but with the drawback that the source content isn't extracted/stored/accessible directly for queries/analysis, only at run time. To get insight into your data you would need an OCR model to process the pages directly.

  • @PedroNihwl
    @PedroNihwl 27 days ago

    I don't know if I'm implementing 'prompt caching' incorrectly, but in my case, processing each chunk is taking too long (about 20s) with a 30-page file. Because of the processing time, this approach becomes unfeasible.

  • @jackbauer322
    @jackbauer322 4 months ago +18

    I think the baby in the background disagrees :p

  • @souvickdas5564
    @souvickdas5564 4 months ago +1

    How do you generate the context for each chunk without giving the LLM sufficient information about the chunk? How are they getting the information about the revenue numbers in that example? If it is extracted from the whole document, then the LLM cost will be painful.

    • @zachmccormick5116
      @zachmccormick5116 4 months ago +1

      They put the entire document in the prompt for every single chunk. It’s very inefficient indeed.

    • @karthage3637
      @karthage3637 3 months ago

      @zachmccormick5116 Well, it's not inefficient if you can cache the prompt.
      They found a way to push this feature.
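
A minimal sketch of the per-chunk contextualization discussed in this thread: the whole document goes into the prompt for every chunk, and the generated context is prepended to the chunk before indexing. The prompt wording is a paraphrase written for illustration, not the exact prompt from the article, and `generate` is a hypothetical callable wrapping whichever LLM you use. The document is placed first so that every call shares an identical prefix, which is the part a provider-side prompt cache could reuse where one is available.

```python
from typing import Callable

# Illustrative wording only; the shared document prefix comes first.
PROMPT_TEMPLATE = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document."
)


def contextualize_chunks(
    document: str, chunks: list[str], generate: Callable[[str], str]
) -> list[str]:
    """Return each chunk with an LLM-written context prepended to it."""
    contextualized = []
    for chunk in chunks:
        prompt = PROMPT_TEMPLATE.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # The combined text is what gets embedded and BM25-indexed.
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized
```

Without caching, the cost grows with (document length x number of chunks), which is the inefficiency raised above; with a cached prefix, only the short chunk-specific tail is new on each call.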

  • @samuelimanuel7643
    @samuelimanuel7643 3 months ago

    I'm still new to RAG, but I want to ask how this would differ from or fit in with GraphRAG? I heard GraphRAG is really well connected?

  • @DRMEDAHMED
    @DRMEDAHMED 4 months ago +1

    I want to add this as the default way RAG is handled in Open WebUI, but it's conflicting with other stuff. I tried to make a custom pipeline for it, but I'm struggling to make it work. Is it out of the scope of Open WebUI, or am I just not understanding the documentation properly?

  • @HarmanSingh-wp8in
    @HarmanSingh-wp8in 1 month ago

    Can someone suggest something for larger documents (above 500 pages)? Normal RAG is not so accurate, and we cannot use contextual RAG because we need to pass the whole document inside the prompt, which exceeds the token limit.

  • @konstantinlozev2272
    @konstantinlozev2272 4 months ago

    Losing the context in RAG is a real issue that can destroy all usefulness.
    I have read that a combination of chunks and graphs is a way to overcome that.
    But I have not tested it with a use case myself yet.

    • @NLPprompter
      @NLPprompter 4 months ago

      I'm interested in why graphs can be useful for helping an LLM retrieve better.

    • @konstantinlozev2272
      @konstantinlozev2272 4 months ago +1

      @NLPprompter My understanding is that graphs condense and formalise the context of a piece of text.
      My use case is a database of case law.
      There are some obvious use cases for that, such as when a paragraph cites another paragraph from another case.
      But beyond that, I think there is a lot of opportunity in just representing each judgement in a standardised hierarchical format.
      But I am not 100% sure how to put it all together from a software engineering perspective.
      And maybe one could use a relational database instead of graphs too.🤔

    • @NLPprompter
      @NLPprompter 4 months ago +1

      @konstantinlozev2272 Graphs are indeed fascinating. Maybe I don't really know what they are or how they relate to LLMs. What makes it interesting is that when grokking happens and a model becomes able to generalize from its training, it tends to create patterns from the given data, and those patterns are mostly geometric. Really fascinating, although I tried to understand that paper and can't comprehend it with my little brain... So I do believe graph RAG is somehow also meaningful/useful for LLMs.

    • @konstantinlozev2272
      @konstantinlozev2272 4 months ago +1

      @NLPprompter I guess it will have to be the LLM working with the API of the knowledge graph via function calling.

  • @martinsherry
    @martinsherry 4 months ago

    V helpful explanation.

  • @janalgos
    @janalgos 4 months ago +2

    How does it compare to HybridRAG?

  • @RedCloudServices
    @RedCloudServices 4 months ago

    Do you think visual LLMs like ColPali provide more accurate context and results than traditional RAG using text-based LLMs?

  • @LatifAmars
    @LatifAmars 4 months ago

    What tool did you use to record the video?

  • @wwkk4964
    @wwkk4964 4 months ago +4

    🎉baby voices were cute!

  • @steve-g3j6b
    @steve-g3j6b 4 months ago

    @Prompt Engineering I didn't find a clear answer to my question, so I'm asking you: as a screenplay writer, which model do you think is best for me? GPT has very short memory (not enough token memory).

    • @kees6
      @kees6 4 months ago

      Gemini?

    • @steve-g3j6b
      @steve-g3j6b 4 months ago

      @kees6 Why?

    • @lollots82
      @lollots82 4 months ago

      @steve-g3j6b It has had a 1M-token window for a while.

  • @ashutoshdongare5370
    @ashutoshdongare5370 3 months ago

    How does this compare with Graph Hybrid RAG?

  • @CryptoMaN_Rahul
    @CryptoMaN_Rahul 4 months ago

    I'm working on AI POWERED PREVIOUS YEAR QUESTIONS ANALYSIS SYSTEM WHICH WILL ANALYZE TRENDS AND SUMMARY OF PREVIOUS 5-10 YEARS PAPER AND WILL GIVE A DETAILED REPORT OF IMPORTANT TOPICS ETC.. can you tell what should be the approach to implement this ?

  • @VerdonTrigance
    @VerdonTrigance 4 months ago

    How did they put a whole doc into prompt?

    • @vaioslaschos
      @vaioslaschos 4 months ago

      Most commercial LLMs have a context window of 120k tokens or more. But even if this is not the case, you can just take much bigger chunks as context.
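
A minimal sketch of the fallback suggested in the reply above for documents that do not fit in the context window: rather than passing the whole document, pass a larger neighborhood around each chunk as its context. The function name and the `radius` parameter are hypothetical, and taking a fixed number of neighboring chunks is only one possible reading of "much bigger chunks"; a section- or chapter-based window would be another.

```python
def neighborhood_context(chunks: list[str], index: int, radius: int = 2) -> str:
    """Return the chunk at `index` plus up to `radius` chunks on each side."""
    if not 0 <= index < len(chunks):
        raise IndexError("chunk index out of range")
    start = max(0, index - radius)
    end = min(len(chunks), index + radius + 1)
    return "\n".join(chunks[start:end])
```

The returned text would take the place of the full document in the contextualization prompt. The trade-off is that context from distant parts of the document (for example, a company name stated only on the first page) can be lost.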

  • @VuNguyen-n2n1o
    @VuNguyen-n2n1o 2 months ago

    Hello, how does this compare to Microsoft GraphRAG?

  • @udaym4204
    @udaym4204 4 months ago

    Is a Multi-Vector Retriever worth it?

  • @SonGoku-pc7jl
    @SonGoku-pc7jl 4 months ago +1

    thanks!

  • @nealdalton4696
    @nealdalton4696 4 months ago

    Are you adding this to localGPT?

  • @MrGnolem
    @MrGnolem 4 months ago

    Isn't this what LlamaIndex has been doing for over a year now?

  • @andrew-does-marketing
    @andrew-does-marketing 4 months ago +1

    Do you do contract work? I’m looking to get something like this created.

    • @engineerprompt
      @engineerprompt 4 months ago +1

      Yes, you can contact me. Email is in the video description.

  • @robrita
    @robrita 4 months ago +15

    can hear baby in the background 👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶👶

  • @HawkFranklinResearch
    @HawkFranklinResearch 4 months ago

    Contextual retrieval just seems equivalent to GraphRAG (by Microsoft), which indexes knowledge context-wise.

  • @DayLearningIT-hz5kj
    @DayLearningIT-hz5kj 4 months ago

    Love the Baby ❤️ good father !

  • @ibrahimaba8966
    @ibrahimaba8966 4 months ago

    This is the best way to sell their features: prompt caching 😁.

  • @loicbaconnier9150
    @loicbaconnier9150 4 months ago +4

    If you want to do the embedding, BM25, and reranking, just use ColBERT; it's more efficient...

    • @the42nd
      @the42nd 4 months ago +1

      True, but he does mention ColBERT at 09:45.

    • @engineerprompt
      @engineerprompt 4 months ago +2

      ColBERT is great, but there are two major issues with it currently, which hopefully will be addressed soon by the community.
      1. Most of the current vector stores lack support for it. I think Qdrant has added support, and Vespa is another one, but the most widely used ones still need to add it.
      2. The size and storage requirements are another big issue with ColBERT. Quantization can help, but I haven't seen much work on it yet.

    • @loicbaconnier9150
      @loicbaconnier9150 4 months ago +1

      It's very quick at indexing documents; I use it as another retriever in LlamaIndex. I create several indexes with it to improve or check retrieved chunks.
      But you are right, the best option for keeping the index is Qdrant.

    • @PriyanshuSingh-sd2dc
      @PriyanshuSingh-sd2dc 11 days ago

      But on my structured documents, ColPali sometimes missed the relevant pieces entirely, whereas normal RAG very rarely fails. Keep in mind I am comparing vision-based and non-vision-based RAG.
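
For readers weighing the embedding + BM25 + reranker pipeline this thread compares against ColBERT, here is a minimal sketch of one common way to merge the two retrievers' results before reranking: reciprocal rank fusion (RRF). RRF is a technique swapped in here for illustration; nobody in the thread names a specific fusion method. The function name is hypothetical, and `k = 60` is the conventional default constant, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

The fused list's top entries would then be passed to the reranker, so the more expensive model only scores a short candidate set.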

  • @crashandersen602
    @crashandersen602 4 months ago

    So easy a baby could do it. Don't believe us? We have one following along in this lesson!

  • @MrAhsan99
    @MrAhsan99 4 months ago

    You can name the little one "Ahsan", just in case you are looking for names.

  • @NLPprompter
    @NLPprompter 4 months ago

    So we are going to have a chunking model, an embedding model, a graph model, and a conversation model... and they can work within a program, called by lines of code, or... they can work freely and fuzzily in an agentic way...
    I imagine a game-dev-style UI: drag and drop a PDF onto them, and they get busy working on that file, running around like cute little employees, and when they're done the user can click a PC item and then it will... ah, never mind, that would be a waste of VRAM.

  • @micbab-vg2mu
    @micbab-vg2mu 4 months ago

    interesting :)

  • @isaacking4555
    @isaacking4555 4 months ago

    The baby in the background 🤣

  • @marc-io
    @marc-io 4 months ago +2

    so nothing new really

  • @jensg8547
    @jensg8547 3 months ago +1

    Vector embedding solutions for retrieval are doomed as soon as SLMs get cheap and fast enough. Why rely on cosine similarity when you can instead query an LLM over all the search data at inference time?!

  • @finalfan321
    @finalfan321 4 months ago +1

    You sound tired, but I think I know why ;)

  • @LukePuplett
    @LukePuplett 4 months ago

    I was so astonished by how obviously terrible the original "dumb chunking" approach is that I couldn't watch the video.

  • @karansingh-fk4gh
    @karansingh-fk4gh 3 months ago

    Your voice is very low, so it's difficult to understand everything.

  • @cherepanovilya
    @cherepanovilya 4 months ago

    old news))

  • @yurijmikhassiak7342
    @yurijmikhassiak7342 4 months ago +1

    WHY NOT DO SMART CHUNKING ON CONTENT, LIKE WHEN A NEW TOPIC STARTS, A NEW SENTENCE, ETC.? YOU WOULD USE A FAST LLM TO GENERATE CHUNKS. THERE WOULD BE LESS NEED FOR OVERLAP.

    • @Autovetus
      @Autovetus 4 months ago

      Chill , dude... Sheesh 🙄

  • @hayho4614
    @hayho4614 4 months ago

    maybe speaking with a bit more energy would keep me more engaged

  • @snapman218
    @snapman218 4 months ago +1

    Good information, but having a child crying in the background is unprofessional. Of course now everyone will say I hate children, but I don’t care. I’m sick of unprofessional behavior.

    • @kerbberbs
      @kerbberbs 4 months ago +2

      Its youtube dawg, nobody cares. Just watch the overlengthed vid and move on. Most people here only came for 2 mins of what's actually important

    • @ogoldberg
      @ogoldberg 4 months ago +3

      Rude thing to say, and ridiculous. You are the one who is unprofessional.

    • @tombelfort1618
      @tombelfort1618 4 months ago +2

      Entitled much? How much did you pay him for his time again?

  • @ZukunftTrieben
    @ZukunftTrieben 4 months ago +2

    00:30