2 Methods For Improving Retrieval in RAG

  • Published Feb 7, 2025
  • Want to learn more about automating your business with AI?
    cal.com/johann...
    Connect with me on LinkedIn:
    / johannesjolkkonen

COMMENTS • 30

  • @CortezLabs
    @CortezLabs 24 days ago

    Thank you

  • @steveknows6126
    @steveknows6126 1 month ago +2

    Nice solution. Thanks for sharing.

  • @limjuroy7078
    @limjuroy7078 1 month ago +2

    A tutorial for this real-world use case is absolutely necessary. It’s highly relevant and applicable to many real-world problems.

  • @sapdalf
    @sapdalf 1 month ago +1

    A very good video showing that following the main trends isn't always profitable. Thanks.

  • @reserseAI
    @reserseAI 1 month ago +1

    I accidentally asked Claude AI through the MCP function to whip up a script for data chunking. Told it to extract specific data and format it into a certain output. Next thing I know, Claude goes ahead and writes a script that pulls metadata first and sends it off to a vector database. A few hours later, your video pops up on my homepage

  • @PierreRibardière-u7x
    @PierreRibardière-u7x 1 month ago +1

    Awesome video! Thank you for sharing your techniques. Extracting use-case-specific info and storing it in metadata before indexing is a very interesting approach. This might actually be better than regular contextualization of chunks, where you add the info to the content of your chunk instead of the metadata. Will definitely try that out. Thanks!
    Would love to see you talk about agent frameworks in the future! Especially how you could try to make something as good as the Composer Agent from Cursor.

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      Thanks! Yeah, this kind of metadata-enriching with LLMs can definitely be applied in all kinds of ways. Very versatile.
      Might make a video about LangGraph at some point, it's honestly the only agent framework I've found that could be useful - I tend to look for deterministic workflows as much as possible. Don't know about making something that could touch Composer though, the engineering team at Cursor is pretty nuts😄
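
A minimal sketch of the metadata-enrichment idea discussed in this thread, assuming the OpenAI Python client; the prompt, field names ("services", "city") and JSON schema are illustrative, not the exact setup from the video:

```python
import json
from openai import OpenAI

client = OpenAI()

def enrich_with_metadata(doc_text: str) -> dict:
    """Ask an LLM for structured, use-case-specific metadata about a document."""
    response = client.chat.completions.create(
        model="gpt-4o",  # the replies mention 4o working best for extraction
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract the services offered and the city from the document. "
                'Return JSON like {"services": [...], "city": "..."} '
                "with every value in its base (nominative) form."
            )},
            {"role": "user", "content": doc_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# The extracted fields are stored as metadata on the indexed record,
# e.g. {"id": ..., "text": doc_text, "metadata": enrich_with_metadata(doc_text)},
# rather than being appended to the chunk text itself.
```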

  • @AA-rd6nm
    @AA-rd6nm 1 month ago +1

    Excellent findings. Keep up the good work!

  • 1 month ago +1

    Interesting topic of course. I would start by using an LLM for the UX part, NLP etc., and then generate SQL against a database. The content is structured anyway. The result could then be polished with a fine-tuned model. Complete or partial results could be cached too, since we are inside a specific domain. Outliers could be caught and managed. That would be the benchmark to beat in this case. Running cost included...

  • @mdarafatiqbal
    @mdarafatiqbal 1 month ago

    I have found NotebookLM’s retrieval to be pretty accurate. How would you benchmark your method against it? And how is NotebookLM’s method different?

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      As far as I know, Google hasn't talked about their exact retrieval methods for NotebookLM anywhere, so it's hard to say. I mean, that's usually the case for any off-the-shelf RAG apps, they want to keep their secret sauce.
      NbLM does cite the retrieved sources though, so in theory you could manually run a test set of questions and see how well it fetches the correct documents for each.
      Of course it's quite tedious to do in practice, as you can't automate any of it, and you'd need to first manually upload all the documents (if I wanted to benchmark this scenario, for example, that would mean over 15 000 documents) and then run the tests.

  • @ZabavTubus
    @ZabavTubus 1 month ago +1

    That's really interesting. So, the LLM got rid of the conjugation problem by mapping all the forms to specific services? Did you also run any tests to find out how large the LLM needs to be for that functionality?

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      Yep, that's correct. Both in extracting the services, and in structuring the filters, the LLM is instructed to return the un-conjugated / nominative form of the services and cities. So we got 2 birds with 1 stone, both getting the filtering as well as eliminating the conjugation-issues (:
      We tested a couple of OpenAI models of varying sizes. They all did pretty good, but the smaller ones occasionally missed some services in the extraction. So we ended up going with a larger model (4o), which performed very well.
      But for the query rewriting/structuring, I'm pretty sure we concluded that 4o-mini was good enough for that. So as far as conjugation goes, I think smaller models should be able to do it. It was more the service extraction where we saw issues. This was of course specific to Finnish, so your mileage may vary (:
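
A rough sketch of the query rewriting/structuring step described in this reply, assuming gpt-4o-mini via the OpenAI Python client and a hypothetical {service, city} filter schema; the Finnish example in the comment is illustrative only:

```python
import json
from openai import OpenAI

client = OpenAI()

def structure_query(user_query: str) -> dict:
    """Rewrite a free-form user question into normalized metadata filters."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # the reply found the smaller model sufficient here
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Turn the user's question into search filters. Return JSON like "
                '{"service": "...", "city": "..."}, with both values in the '
                "nominative (un-conjugated) form, or null if not mentioned."
            )},
            {"role": "user", "content": user_query},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Example (illustrative Finnish query):
#   structure_query("Löytyykö Tampereelta kattoremontteja?")
#   -> {"service": "kattoremontti", "city": "Tampere"}
# The normalized values can then be matched exactly against the stored metadata,
# which sidesteps the conjugation problem entirely.
```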

  • @micbab-vg2mu
    @micbab-vg2mu 1 month ago +1

    thanks:)

  • @GayathriG-h5h
    @GayathriG-h5h 1 month ago

    Can you share any notebook?

  • @KS-kf1me
    @KS-kf1me 1 month ago

    As an early adopter I was expecting to just drop in the documents and have the LLM sort things out.
    Now, after 2 weeks of research and populating 3 Pinecone test accounts with test data, this looks more like a filter search in Airtable.
    That's the complete opposite of the "promise", and an absolute disappointment.

  • @theseedship6147
    @theseedship6147 1 month ago +1

    I second your approach. A bit strong on the agentic orchestration though; it might not fit your use case, but it still has plenty of other happy endings ;)

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      Yeah, definitely. There's just quite a lot of over-enthusiasm about taking an agentic approach wherever possible, so I want to push back on that (:

  • @timfitzgerald8283
    @timfitzgerald8283 1 month ago +1

    After a day, why only 22 likes???

  • @mtprovasti
    @mtprovasti 1 month ago

    Is the LLM BERT?

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      Nah, GPT-4o and -mini

    • @mtprovasti
      @mtprovasti 1 month ago

      @johannesjolkkonen Trying to figure out, now that ModernBERT is out, at what stage of RAG it's applied.

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      @mtprovasti Ah, right. BERT is an encoder, meaning it would be used to create the embeddings for vector search. Here we used OpenAI's ada-002 encoder for the same purpose. Until we gave up on vector search, that is.
      BERT is a popular choice when you want a fine-tuned embedding model though, to better capture the semantic similarities/dissimilarities in your specific content (and thus get better retrieval results)
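
A small sketch of where a BERT-style encoder sits in a RAG pipeline, using an off-the-shelf sentence-transformers checkpoint as a stand-in for a fine-tuned model (the video used OpenAI's ada-002 in this role before dropping vector search); the example texts are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# Any BERT-style encoder works here; this checkpoint is just a common default.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["Roofing services in Tampere", "Plumbing services in Helsinki"]
chunk_vectors = encoder.encode(chunks)           # embed documents at index time
query_vector = encoder.encode("roof repairs")    # embed the query at search time

# Cosine similarity between the query and chunks drives retrieval; a fine-tuned
# BERT encoder would simply replace the off-the-shelf checkpoint above.
scores = util.cos_sim(query_vector, chunk_vectors)
print(scores)
```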

  • @themax2go
    @themax2go 1 month ago

    why not just a KG with triples?

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago

      Not sure what the benefit would be. What do you think?

    • @themax2go
      @themax2go 1 month ago

      @johannesjolkkonen For versatility, specifically being able to get a response with global context. If you don't plan on getting sophisticated responses then it'd be wasteful, since it's more computationally expensive. So it really depends on the use case.

  • @KevinKreger
    @KevinKreger 1 month ago

    Disagree about agentic RAG. It's becoming a common feature. It's not just some grad paper. I don't know why you would say this after presenting a use case.

    • @johannesjolkkonen
      @johannesjolkkonen 1 month ago +3

      Sure, not saying it doesn't have its place. Just that in my experience, people are too quick to jump on flashy solutions instead of simpler ones that get the job done fine, and in a more robust way.
      Could you achieve similar results with an agentic approach? Maybe, but they typically come with serious trade-offs in latency, cost and unpredictability.
      Appreciate the comment though. One reason why I often speak against agents is also just how poorly the term is defined and over-used. Fine for marketing, but imo not useful when it comes to actually understanding how all this stuff works.