WHY Retrieval Augmented Generation (RAG) is OVERRATED!

  • Published 25 Jul 2024
  • Retrieval augmented generation (RAG) is over-hyped. I'll explain why this is the case, having worked on products with RAG as their core functionality.
    Read the blog post that complements this YouTube content: Coming soon!
    Need to develop some AI? Let's chat: www.brainqub3.com/book-online
    Register your interest in the AI Engineering Take-off course: • Building Chatbots with...
    Hands-on project (build a basic RAG app): www.educative.io/projects/bui...
    Stay updated on AI, Data Science, and Large Language Models by following me on Medium: / johnadeojo
    My article on RAG: / dont-build-llm-apps-be...
    Chapters
    Intro: 00:00
    Hallucinations: 02:59
    Retrieval Complexity: 07:55
    Cost of RAG: 12:27
    Making RAG practical: 20:00
  • Science & Technology

COMMENTS • 45

  • @bilalzahoor5608
    @bilalzahoor5608 3 months ago +6

    Very nice video. My experience resonates with what's discussed in the video. The weakest link is the chunking, which from my point of view is a joke the way it's done now; the opportunity is probably to develop a good AI model that can help solve the chunking problem. Thanks.
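
A minimal sketch of the naive fixed-size chunking the commenter is criticizing: splitting by character count ignores sentence boundaries, so a retrieved chunk can lose the clause that changes its meaning (the document text, sizes, and function name here are illustrative, not from the video):

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Naive fixed-size chunking: cut by character count, ignoring sentences."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "The refund policy lasts 30 days. Opened items are excluded from refunds entirely."
# A chunk matching a "refund" query may be cut mid-clause and omit
# the qualifier ("excluded ... entirely") that changes the answer.
for chunk in chunk_text(doc):
    print(repr(chunk))
```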

  • @nyx211
    @nyx211 2 months ago +4

    Thankfully, I realized the costs for RAG were going to be too expensive during testing. I spent $2.40 after only 5 minutes of testing. I then tried to do it with local open-source models, but since I didn't have a GPU the entire system was horrifically slow. It took about 5 minutes just to generate a single function call (although it did generate perfect JSON every time). It's very frustrating because it seems like it's almost usable, but it's not there yet.

  • @GeobotPY
    @GeobotPY 3 months ago +7

    Good vid! Although I think the title is a bit misleading. Considering the issues you point out are costs and the ability to retrieve different document types, I would say there are many solutions. For instance, GPT-3.5 is extremely inexpensive, and model costs will only go down, so this concern is perhaps valid for the best models now but will matter less in the future. In terms of document types and chunking, there are several techniques that can improve this, for instance RAPTOR, semantic and agentic chunking, and multi-hop RAG, among other things. For any business working with internal data or needing a strict context base, RAG is needed.
    The one thing I would say could render RAG less effective is that context lengths are increasing, making it increasingly possible to simply feed all the data into the model every time. Claude has performed well on needle-in-the-haystack experiments, perhaps making RAG less applicable in a lot of contexts.
    Nice to see a different perspective on this, though; in my opinion RAG is definitely not overhyped. I can follow what you are saying about the traditional MVP/POC with basic chunking and cosine similarity search, but definitely not about more advanced techniques.

    • @VulNicuil
      @VulNicuil 3 months ago +2

      All great points! Internal document referencing/retrieval doesn't seem to be addressed in the video. Combining these techniques with a strict citation pipeline can be very valuable for companies with massive, private corpora of data.
      Huge context lengths are not as available to companies that want internal RAG models, so I think those models will stay... until GPU prices are WAY cheaper, who knows!

    • @Data-Centric
      @Data-Centric  3 months ago

      Thanks, you raise some good points. I disagree that GPT-3.5 is extremely inexpensive; in my experience, it's been too expensive to implement in production. However, I will concede that it depends on your use case and budget. I suspect that if you're building internal tools for a large corporate, the budgets are higher versus building products for startups (lots of other things to worry about too, like pricing). I haven't yet had a chance to look into some of those methods; do you know what the cost impact is?

    • @Data-Centric
      @Data-Centric  3 months ago

      I suspect the difference in opinion here is largely driven by use cases. I'm talking more in the context of building products (for startups) rather than internal tools for businesses. There's a lot to consider by way of pricing, margins, etc. when managing costs. There's often a lot more leeway with budgets if you're developing internal tools for a large corp, for example.

    • @misterwafflezzz
      @misterwafflezzz 3 months ago +1

      GPT-3.5 is honestly horrible in a multilingual setting. It doesn't answer in the correct language, and it can't find information in non-English contexts. It's also very hard to embed non-English text in a meaningful way, although a hybrid search through the knowledge base does help in this regard. GPT-4 is really expensive, and putting the entire knowledge base into an LLM's token context is also prohibitively expensive.
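
A toy sketch of the hybrid search idea the commenter mentions, blending a lexical overlap score with an embedding cosine score; the function names, the tiny 2-d vectors, and the `alpha` weight are all illustrative assumptions, not anything from the video:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that literally appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # Blend exact-term matching (robust for names and non-English terms
    # that embed poorly) with semantic similarity from the embeddings.
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

The lexical half is what rescues queries whose key terms (product codes, non-English words) land in a weak region of the embedding space.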

    • @GeobotPY
      @GeobotPY 3 months ago

      @@Data-Centric Had a lot of promising results with agent-based chunking. Will try to create some evals for RAPTOR for big context lengths soon; seems promising.

  • @nyx211
    @nyx211 1 month ago +2

    Quite often, there's a tendency to reinvent the wheel by trying to make LLMs and RAG systems recreate existing solutions. People forget that older algorithms and database systems can still work quite well for a comparatively negligible cost.

  • @gileneusz
    @gileneusz 3 months ago +3

    I like your videos; you present a more critical, realistic, non-overhyped point of view. It would be great if you could make some kind of overview video of the current AI revolution.

    • @Data-Centric
      @Data-Centric  3 months ago

      Thank you! I wish that I could deliver more success stories. However, I'm optimistic that the technology will improve.

  • @trsd8640
    @trsd8640 3 months ago +1

    Very good video! Thank you!

  • @thunken
    @thunken 1 month ago +1

    I thought the purpose of RAG was to create more "signal" versus "noise" with respect to preparing context for inference by an LLM.
    If so, even despite the need for improvements and cost reductions, isn't it fundamentally a good idea?
    And then, what alternatives do we have?
    To give an LLM the best chance of creating a good answer, we should always expect to do some real work and prepare the data for it?
    And then even have a verification chain to double-check.

  • @jonathanholmes9219
    @jonathanholmes9219 2 months ago

    Chunking and overlap are the main issue. I found that creating structured data from unstructured data using the LLM was the answer. SQL queries through the LLM work very well. Thanks for the video and for sharing your experiences.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 2 months ago

    Brilliant hot take, best on YT, thanks for sharing your experience.

  • @khalifarmili1256
    @khalifarmili1256 3 months ago

    thx for the detailed insights

  • @jirikadlec7796
    @jirikadlec7796 3 months ago +6

    Just based on the first minute and the title, it is misleading. It is a very useful and needed technology; this is coming from someone working with it in production in big tech. There is no alternative to it. Regarding hallucinations, you can control the data flow to limit any embedded information and prefer the context from RAG. It has flaws, but it's not overhyped!

    • @underdogAI
      @underdogAI 3 months ago

      prove it

    • @jirikadlec7796
      @jirikadlec7796 3 months ago +1

      @@underdogAI Prove what and to who? 🤣

    • @Data-Centric
      @Data-Centric  3 months ago +1

      Thanks for the feedback. I think big tech really is the differentiator; you have huge budgets. When developing products outside of big tech, budgets are much tighter! I agree that there aren't any viable alternatives right now. Effectively, the range of RAG-centered products that can be built and run on a small budget is quite small for now.
      As for the method you mention regarding hallucinations, I don't think I've come across it yet.

    • @andydataguy
      @andydataguy 3 months ago

      This guy gets it 🙌🏾

  • @malikrumi1206
    @malikrumi1206 3 months ago +1

    With respect to hallucinations, are you saying that over time they become a larger and larger share of all responses, or just that they become more noticeable over time?

    • @Data-Centric
      @Data-Centric  3 months ago +2

      I'm saying the incidence increases with scale. If you're working with a prototype, you probably won't encounter hallucinations, but they will inevitably be there once you scale.

  • @user-du6zo7zp2k
    @user-du6zo7zp2k 1 month ago

    It seems that some of the data might be better off in different storage accessed by a tool (not a RAG vector store). The menu, for example, might be better off in a traditional database that the agent can query through a tool?
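
A minimal sketch of this suggestion, with the menu in SQLite so the agent can run an exact query through a tool instead of a fuzzy vector lookup; the table, rows, and query are hypothetical examples, not from the video:

```python
import sqlite3

# Hypothetical structured data that would back the agent's query tool.
menu_rows = [
    ("Margherita", "pizza", 9.50),
    ("Carbonara", "pasta", 12.00),
    ("Tiramisu", "dessert", 6.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu (name TEXT, category TEXT, price REAL)")
conn.executemany("INSERT INTO menu VALUES (?, ?, ?)", menu_rows)

# An exact SQL predicate answers "what costs under 10?" deterministically,
# which embedding similarity over menu text cannot guarantee.
cheap = conn.execute(
    "SELECT name FROM menu WHERE price < 10 ORDER BY price"
).fetchall()
print(cheap)  # [('Tiramisu',), ('Margherita',)]
```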

  • @vaioslaschos
    @vaioslaschos 2 months ago +1

    Watching your videos acts as a reality check for me. I have the same view on most of the issues. Do you have a Discord server for your audience?

    • @Data-Centric
      @Data-Centric  1 month ago

      Thanks for the feedback. I don't yet, might consider it down the line.

  • @filosophik
    @filosophik 3 months ago +2

    Wonderfully informative and honest. Thank you for the clarity that your perspective commands.

  • @mrd6869
    @mrd6869 3 months ago +2

    Too bad there isn't an automatic self-check layer within RAG, where it could score itself or keep itself within a certain accuracy threshold, like having it catch the inaccuracies before we catch them.

    • @GeobotPY
      @GeobotPY 3 months ago

      Just adjust the top-k chunks based on the cosine similarity score?
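
A toy sketch of that suggestion: rank chunks by cosine similarity, keep at most k, and drop anything below a score threshold so weak matches never reach the prompt (the vectors, `k`, and `min_score` values are illustrative assumptions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, chunks, k=3, min_score=0.75):
    """Return up to k (score, text) pairs, filtering out low-similarity chunks."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        reverse=True,
    )
    return [(s, t) for s, t in scored[:k] if s >= min_score]

chunks = [
    ("refund policy", [0.9, 0.1]),
    ("shipping times", [0.2, 0.9]),
    ("careers page", [-0.5, 0.8]),
]
# Even with k=2, the threshold keeps only the genuinely relevant chunk.
print(retrieve([1.0, 0.0], chunks, k=2))
```

The threshold is the part that acts as a crude self-check: if nothing scores above it, the system can say "no relevant context found" instead of stuffing noise into the prompt.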

  • @FloodGold
    @FloodGold 3 months ago +3

    I agree with you mostly, BUT I find it interesting that you're using hype to criticize hype.

  • @neopabo
    @neopabo 3 months ago +3

    RAG is very useful; you can use it to parse documentation for companies. Then you can pull the information out and put it into JSONs or whatever you want to use, to minimize context length.

    • @neopabo
      @neopabo 3 months ago +1

      Then you can use that for training. I've been working on a prototype for fine-tuning using JSON outputs. It does a very good job and will be perfect for reinforcement learning.

  • @markdkberry
    @markdkberry 3 months ago +2

    I thought it was just me. My tests with it have been awful; I can't trust its responses at all.

    • @Data-Centric
      @Data-Centric  3 months ago

      Not just you. Many, many others too!

  • @andydataguy
    @andydataguy 3 months ago +3

    RAG is expensive AF, for now

    • @Data-Centric
      @Data-Centric  3 months ago +1

      Yep, it is! But people will find out the hard way.

    • @iukeay
      @iukeay 2 months ago +1

      I have a RAG pipeline with around 680 GB of data in the vector store.
      I run everything locally: multiple LLMs, plus embedding.
      (It was the only way I could afford to do my project.)
      What I would say is that the learning curve for proper embedding, ranking, and hybrid search is painful, especially if your data can be tagged with metadata.
      Another hard part is learning how to maintain and update embeddings when the data changes.
      Total tokens in the RAG store are over 250 billion.
      Also, I HATE PDFs. (Had to be said.)
      I also had to learn to set up vLLM, just due to its memory efficiency over Ollama.
      If you need a cheap way to run inference locally up to 120B models, I highly recommend looking at the NVIDIA Jetson Orin boards:
      64 GB of GPU RAM for $2k (and it only draws 70 watts).
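
One common way to handle the "update embeddings when data changes" pain mentioned above is to hash document content and re-embed only what changed; a minimal sketch (the doc ids, texts, and function names are hypothetical, and the actual embedding call is omitted):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_to_reembed(docs: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return ids of docs that are new or changed since the last embedding run."""
    return [
        doc_id for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]

docs = {"a": "old text", "b": "updated text", "c": "brand new"}
stored = {"a": content_hash("old text"), "b": content_hash("stale text")}
print(docs_to_reembed(docs, stored))  # ['b', 'c']
```

Only the returned ids go through the embedding model again, which keeps incremental updates cheap on a corpus this large.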

    • @andydataguy
      @andydataguy 2 months ago

      @iukeay thanks for sharing man! 🙌🏾
      I'd love your input: have you tested (or can you calculate) the magnitude of difference H100s or the latest cards make for production systems? How much of an improvement can one reasonably expect?
      Wondering if the current enterprise GPU buying spree is substantiated, or perhaps just datacenter dick-measuring.

  • @FunwithBlender
    @FunwithBlender 3 months ago +1

    Feels like clickbait, or you're not making bespoke RAG solutions. RAG is cheap, and you can get most of the benefits from just RA.

    • @neopabo
      @neopabo 3 months ago +1

      I think he's using Cunningham's Law. (The best way to get the right answer is not to ask a question, it's to post the wrong answer.)

  • @ljanu
    @ljanu 3 months ago

    🎯 Key Takeaways for quick navigation:
    00:00 *🤖 Retrieval augmented generation (RAG) is currently over-hyped.*
    - RAG is not effective for most production use cases.
    - Developing prototypes with tools like LangChain can create a misleading impression of RAG's success.
    03:12 *🧠 The problem of hallucinations with RAG.*
    - RAG was designed to solve the hallucination problem of large language models.
    - Hallucinations can still occur when using RAG, due to conflicts between the model's weights and the retrieved context.
    08:02 *💰 The costs of running RAG in production.*
    - Implementing RAG can be financially demanding, especially at high query volumes.
    - Hardware and compute costs can be surprisingly high.
    - Careless pricing plans can cause financial problems for businesses using RAG.
    20:08 *🛠️ Future improvements to make RAG practical.*
    - Lower hardware costs would allow wider use of RAG in production.
    - Improving the capabilities of the language models themselves could improve RAG's performance.
    - The ability to train models directly on RAG tasks could increase their effectiveness and eliminate some problems, such as hallucinations.
    Made with HARPA AI

  • @tonywhite4476
    @tonywhite4476 1 month ago

    Overrated, overdone, over-priced, and I'm just over it. Now, I just block all RAG tutorials.