How RAG Turns AI Chatbots Into Something Practical

  • Published 12 Sep 2024
  • Check out ThinkBuddy using the code "BYCLOUD" in the link: thinkbuddy.ai/ltd to get your discount!
    my newsletter: mail.bycloud.ai/
    Retrieval-augmented generation (RAG) is a currently popular method that lets LLMs retrieve from a database instead of putting everything into the context window. But how does it work? Today I will walk through the most basic idea of RAG, the current meta of how RAG is used, and what it is composed of.
    some papers
    [Web + RAG] arxiv.org/abs/...
    [Vector + KG RAG] arxiv.org/abs/...
    [RAG Survey] arxiv.org/abs/...
    [Knowledge Graph for RAG] docs.llamainde...
    [LlamaIndex] www.llamaindex...
    [LlamaParse] docs.llamainde...
    [HuggingFace] huggingface.co...
    [Cohere Command R+] docs.cohere.co...
    [Cohere Rerank] docs.cohere.co...
    [Cohere Embedding Models] cohere.com/blo...
    [GraphRAG] github.com/mic...
    [RAGAS] github.com/exp...
    This video is supported by the kind Patrons & YouTube Members:
    🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Owen Ingraham, Tanaro, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford
    [Discord] / discord
    [Twitter] / bycloudai
    [Patreon] / bycloud
    [Music] massobeats - glisten
    [Profile & Banner Art] / pygm7
    [Video Editor] ‪@Askejm‬

COMMENTS • 95

  • @bycloudAI
    @bycloudAI  19 days ago +47

    Check out ThinkBuddy using the code "BYCLOUD" in the link: thinkbuddy.ai/ltd to get your discount!

  • @HanzDavid96
    @HanzDavid96 19 days ago +51

    I don't think LLMs are going to replace RAG. RAG is going to be a long-term solution for context retrieval. Even if the context window were large enough to process thousands of books, it would still be expensive, and the LLM loses precision as the input grows. The LLM should be able to focus on its specific task and not be overloaded with lots of expensive context. I also wouldn't call it a "hacky" way; it's just another type of database.

    • @LeonidasRaghav
      @LeonidasRaghav 19 days ago +1

      Maybe the context window is like the cache and the RAG store is the DB on top, just like how so many other systems in CS work

    • @Askejm
      @Askejm 16 days ago

      Well the end goal would be to get rid of RAG and do it natively. That's the long term solution. All the things you mentioned are presumptions based on the limitations of current transformer architectures. Lost in the middle has already been pretty much solved, and if we find an architecture that scales linearly or at least subquadratically, then big context wouldn't be so expensive either. This is what all the mamba hype was about

    • @HanzDavid96
      @HanzDavid96 16 days ago

      @@Askejm Your brain does not do it. I sometimes need to think pretty long to remember stuff. That means there is a multi-agentic framework running in my brain searching for information. I don't have more than 500 tokens in my short-term context when I am talking, so current LLMs are already surpassing that part. Maybe IF it scales well there might be a chance to use one LLM for it. But for solving tasks you still want to use smaller context windows, because the context would be full of unnecessary stuff otherwise. I think there is a high chance that would reduce the LLM's task-solving ability.

    • @Askejm
      @Askejm 16 days ago

      @@HanzDavid96 Your brain is not comparable to a computer, though. Your brain runs at like 100-200 Hz, and comparing that to an H100 is just not possible. That is fundamentally different. Also, with a good architecture the AI wouldn't care if its context is cluttered. That's a human presumption. It would just be able to use the information it needs, which they already do in some capacity. I definitely think the next paradigm shift in LLMs will make RAG largely obsolete, but there's no telling when it will happen.

    • @HanzDavid96
      @HanzDavid96 16 days ago

      @@Askejm We will see; you could be right, but I don't know where the practical limits are. :)

  • @nicejungle
    @nicejungle 19 days ago +48

    Couldn't agree more on the usefulness of chatbots.
    RAG is an awesome feature.
    But with the growing context windows of recent LLMs (Mistral-NeMo has a 128k-token window, for example), RAG isn't that useful now.
    It greatly depends on the size of your knowledge database

    • @bycloudAI
      @bycloudAI  19 days ago +15

      there's actually a new paper about using both RAG and long context, routing queries to RAG or to long context depending on self-reflection. Pretty cool, you can check it out:
      arxiv.org/abs/2407.16833
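
      (A minimal sketch of that routing idea, assuming hypothetical llm() and retrieve() helpers standing in for your own model and vector store; the prompts are illustrative, not taken from the paper:)

```python
# A sketch of self-reflection routing: try the cheap RAG path first,
# and only fall back to stuffing the full document into long context
# when the model judges the retrieved chunks insufficient.
# llm() and retrieve() are hypothetical stand-ins for your own stack.

def llm(prompt: str) -> str:
    raise NotImplementedError  # call your chat model here

def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # vector search over your chunked corpus

def route_and_answer(query: str, full_document: str) -> str:
    chunks = retrieve(query)
    rag_prompt = (
        "Answer using ONLY the context below. If the context is not "
        "sufficient, reply exactly UNANSWERABLE.\n\n"
        "Context:\n" + "\n".join(chunks) + "\n\nQuestion: " + query
    )
    answer = llm(rag_prompt)  # cheap path: a few short chunks
    if "UNANSWERABLE" in answer:
        # self-reflection says RAG failed -> expensive long-context path
        return llm("Document:\n" + full_document + "\n\nQuestion: " + query)
    return answer
```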

    • @macchiato_1881
      @macchiato_1881 19 days ago +19

      No. RAG is still very much needed for LLMs to produce practical output in some use cases. A 128k context window is much SMALLER than you think. Sure, it can fit around a 50-150 page document depending on the content, but some use cases easily surpass that 128k window.
      And I wouldn't just jam every piece of context I have into the context window, even in cases where it would all easily fit. Pre-filtering the context you pass to the LLM is a good practice in general, so that the model can produce relatively sharp and concise results.
      You have to remember that every token you pass in the context window has an effect on the output's probability distribution. Having a well-defined, cookie-cutter path toward the desired output's probability distribution is very, very important.
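
      (For concreteness, a minimal sketch of the pre-filtering idea above, assuming the sentence-transformers library; the model name, similarity threshold, and token budget are illustrative choices:)

```python
# A sketch of pre-filtering: keep only the chunks most similar to the
# query and cap the total token budget, instead of jamming everything in.
# Uses the sentence-transformers library; the model name, threshold, and
# budget below are illustrative choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def prefilter(query: str, chunks: list[str],
              budget_tokens: int = 4000, min_sim: float = 0.3) -> list[str]:
    q_emb = model.encode(query, convert_to_tensor=True)
    c_emb = model.encode(chunks, convert_to_tensor=True)
    sims = util.cos_sim(q_emb, c_emb)[0]  # cosine similarity per chunk
    ranked = sorted(zip(chunks, sims.tolist()), key=lambda p: -p[1])
    kept, used = [], 0
    for chunk, sim in ranked:
        cost = len(chunk) // 4  # rough chars-per-token estimate
        if sim < min_sim or used + cost > budget_tokens:
            break  # stop before the window gets cluttered
        kept.append(chunk)
        used += cost
    return kept
```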

    • @claybford
      @claybford 17 days ago

      @@macchiato_1881 100%. The quality of new tokens goes down the further into the context window you are. Also, LLMs have similar memory issues to humans, where they remember the beginning and the end best, with gaps in the middle. Maybe the big context window will still work for needle-in-a-haystack, but for quality new tokens, e.g. code? Not so much.

    • @Yobs2K
      @Yobs2K 15 days ago

      I think RAG could be used to get rid of hallucinations. Like having all of Wikipedia's pages (just an example, probably not the best source of information) inside a knowledge base and giving factual answers only after retrieving documents from it. There is no way you could fit the whole of Wikipedia into any model's context.

  • @Steamrick
    @Steamrick 19 days ago +108

    Lifetime for $130? Your sponsor is banking on LLMs getting cheaper real hard.

    • @Askejm
      @Askejm 19 days ago +8

      loss leader, probably

    • @jaydeep-p
      @jaydeep-p 19 days ago +18

      or they are just lying

    • @Niiwastaken
      @Niiwastaken 19 days ago +12

      If you have a basic understanding, then you would know that's not a foolish thing to bank on

    • @pigeon_official
      @pigeon_official 19 days ago +12

      LLMs are already so cheap it's basically free: a few pennies per million tokens for most open-source models, and even the best of the best proprietary models cost like $3 per million tokens

    • @mmmm768
      @mmmm768 19 days ago

      google Groq

  • @Laszer271
    @Laszer271 18 days ago +3

    LLMs are like cars: if one stands in the middle of a deep forest, we can point at it and laugh at how stupid it is and how it's better to just walk through the forest.
    RAG and tools (as in tool-calling for LLMs) are the infrastructure, comparable to roads. Many people don't realize that once the "car" gets on a proper "road", it is all of a sudden very efficient at what it does.
    We don't need faster cars (e.g. GPT-5); infrastructure is all we need right now.

  • @kylebroflovski6382
    @kylebroflovski6382 19 days ago +29

    Long time no see bycloud

  • @SperkSan
    @SperkSan 19 days ago +10

    Your thumbnail reminds me of The Code Report

    • @mine.moment
      @mine.moment 18 days ago +8

      Because he's literally copying Fireship? Like, that's his whole gimmick.

  • @Neomadra
    @Neomadra 19 days ago +10

    Ngl, this lifetime access deal is sus af

    • @mine.moment
      @mine.moment 18 days ago

      It's obviously a lie. They can shut down at any moment, take your money and run away and there's nothing you can do about it.

  • @juriaan5786
    @juriaan5786 19 days ago +50

    At this point we are creating different parts of a brain. This is literally how our brain works. Amazing! Keep up the content, you know a lot about the topic and I can really find out the latest hot news

    • @sgttomas
      @sgttomas 19 days ago

      tree structure of dendrites?

    • @xomiachuna
      @xomiachuna 19 days ago +2

      But is it though? The conceptual model might be similar, but the model is not the brain. I'm pretty sure that actual brain memory works in a complicated way (i.e. short term and long term, forgetting, transfer between the memory types, the actual retrieval is unlikely to be just vector lookup and so on).

    • @w花b
      @w花b 19 days ago

      Wrong

    • @vidal9747
      @vidal9747 19 days ago +3

      @@xomiachuna The actual brain is much weirder. Every nerve in the path from a limb to the brain does some data processing. Every transfer of information from one part to another interferes with everything in the middle.

    • @TheUniverseWatchesYou
      @TheUniverseWatchesYou 18 days ago +1

      It's not how our brain works.
      For "sampling" large-scale information, the brain has all kinds of complicated electromagnetic synchronisation patterns (alpha/beta/gamma/... waves, which may also be influenced by the shape of the skull!), which are very different from how a semiconductor works.

  • @vladimirtchuiev2218
    @vladimirtchuiev2218 16 days ago

    The "what is this pokemon" of a transformer is brilliant, saved for future reference.

  • @marshallodom1388
    @marshallodom1388 18 days ago +1

    I wish I could use AI to do something productive without having to learn rocket surgery. This sounds interesting but way beyond a layman's understanding of chat AIs.

  • @kocokan
    @kocokan 19 days ago +1

    Describing all this AI news and these papers for casual mortals takes significant effort

  • @l.halawani
    @l.halawani 18 days ago

    You turn AI science into proper entertainment.
    Couldn't be more digestible! Thanks for that!

  • @RyanClark-gr9yb
    @RyanClark-gr9yb 17 days ago

    Essentially, RAG gives LLMs access to whatever contextualized information you have at hand, and helps them bring more meaningful data out of it cheaper and faster.

  • @archamondearchenwold8084
    @archamondearchenwold8084 19 days ago +4

    what about llamacpp for RAG?

  • @st.dantry5051
    @st.dantry5051 19 days ago +2

    This channel is a circlejerk for people who already know enough about the topic to not need these videos. I'm a reasonably smart layman with an interest in AI and could learn nothing from this video. Too much jargon. It's the reason why this channel has so few subscribers. It should have millions, but the information is not packaged into easily understandable bits.

  • @user-hk2gk7pj8h
    @user-hk2gk7pj8h 10 days ago

    7:47 "what sets STINK BUDDY apart..."

  • @ghaith2580
    @ghaith2580 19 days ago

    Thank you for the high-quality information, you have no idea how much headache and time you saved me

  • @KoZM7
    @KoZM7 18 days ago

    Can you share the link for the Knowledge-Nexus RAG example you used for GraphRAG?

  • @aykutakguen3498
    @aykutakguen3498 19 days ago

    I think the future of LLMs will be a system where an LLM primarily traverses a RAG-like system and uses that to build rules; it keeps building rules until all requirements are fulfilled and then sends back the result. Models like these would be poor creatives but insanely factual and logically strong

  • @StephenRayner
    @StephenRayner 19 days ago +1

    You didn't talk about the best chunk size / overlap

    • @bycloudAI
      @bycloudAI  19 days ago

      I coulddd, and there's so much more I wanted to talk about too, but I think it'd get too long for a "conceptual" video about RAG...
      thanks for letting me know tho
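
      (Since the thread asks about it: a minimal sketch of fixed-size chunking with overlap; the 512/64 defaults are common illustrative values, not recommendations from the video:)

```python
# A sketch of fixed-size chunking with a sliding-window overlap, so a
# fact that straddles a boundary still lands whole in at least one chunk.
# 512 chars / 64 overlap are illustrative defaults, not from the video.

def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Bigger chunks keep more context per hit but dilute each embedding;
# smaller chunks retrieve precisely but can cut sentences in half.
```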

  • @jimwagner7817
    @jimwagner7817 18 days ago

    love your content, but the current meta is deeper; we're all beyond LlamaIndex

  • @andydataguy
    @andydataguy 19 days ago

    It's so simple! Thanks for breaking it down like this 💜

  • @rail_hail6625
    @rail_hail6625 19 days ago

    Thanks, I needed this video.

  • @EntropyOnTheCrosshair
    @EntropyOnTheCrosshair 17 days ago

    Is this realistic fireship?

  • @WolfeByteLabs
    @WolfeByteLabs 18 days ago

    Paying for lifetime access to an AI service based on GPT-4 in 2024 is wild.

  • @ujjwalmishramusic
    @ujjwalmishramusic 18 days ago +1

    could you please share the simulator used at 2:30?

    • @Parallaxxx28
      @Parallaxxx28 1 day ago

      Yes, it looks like a very good data visualisation tool for understanding tensors

  • @clapclapapp
    @clapclapapp 18 days ago

    So, indexing is what search engines do, right?
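
    (For reference, a toy sketch of the kind of inverted index a keyword search engine builds; RAG pipelines typically use an embedding index over the same documents instead:)

```python
# A toy sketch of the inverted index a keyword search engine builds
# (word -> documents containing it); RAG pipelines typically swap in
# an embedding (vector) index over the same documents instead.
from collections import defaultdict

def build_inverted_index(docs: list[str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = ["RAG retrieves relevant documents", "context windows keep growing"]
print(build_inverted_index(docs)["documents"])  # {0}
```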

  • @dan2800
    @dan2800 17 days ago

    TL;DR: How to fix LLMs:
    Add another five application-specific, specially tuned models

  • @pauljones9150
    @pauljones9150 19 days ago

    Love you buddy

  • @msnp1
    @msnp1 19 days ago

    Great content 💯

  • @fluffsquirrel
    @fluffsquirrel 19 days ago

    What do you think about AnythingLLM?

  • @benshulz4179
    @benshulz4179 19 days ago +27

    So LLMs will rewrite complex practical questions into a simplified, general form... losing the entire complex question in the process...
    Seriously, what's the practical use case of this? AI already just spouts nonsense about generalist topics; this just removes the ability to ask specialized questions.

    • @liuandy_
      @liuandy_ 19 days ago +12

      Seems like you missed a ton of the video? The rewriting stage is just a potential adjustment that can be added in one stage (the indexing stage) and would only be introduced if the person building the system wanted it. As one example, maybe you know your application serves users who don't phrase questions well.
      The overall point of RAG is to help the LLM be LESS of a generalist by introducing contextual knowledge to draw on. Optimizing the indexing stage is just one potential way to improve this specialization.

    • @Yobs2K
      @Yobs2K 15 days ago

      As I understand it, the simplified form of the question is used only in the document-search stage. In the end, the LLM gets both the retrieved context (found with the simplified query) and the question in its original form. The simplified question may just work better with vector search in the knowledge base.
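
      (A minimal sketch of that flow, assuming hypothetical llm() and retrieve() helpers: the rewritten query is used only for vector search, and the model answers the original question:)

```python
# A sketch of the flow this reply describes: the rewritten query is used
# ONLY for retrieval, while the model answers the original question.
# llm() and retrieve() are hypothetical stand-ins for your own stack.

def llm(prompt: str) -> str:
    raise NotImplementedError  # your chat model

def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # vector search over the knowledge base

def answer(question: str) -> str:
    # the simplified form only needs to match the embedding space well
    search_query = llm(
        "Rewrite this as a short, keyword-style search query:\n" + question
    )
    context = "\n".join(retrieve(search_query))
    # the original, fully detailed question is what the model answers
    return llm("Context:\n" + context + "\n\nQuestion: " + question)
```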

  • @TegraZero
    @TegraZero 18 days ago

    Video went from RAGs to Riches

  • @StephenRayner
    @StephenRayner 19 days ago

    Excellent 🎉

  • @julsezer651
    @julsezer651 19 days ago +2

    Hugee video

  • @nomadshiba
    @nomadshiba 19 days ago +1

    so this is what ruined the AI and made it act like a bot rather than AI

  • @blocko9701
    @blocko9701 19 days ago +3

    gyatt

  • @Ruhgtfo
    @Ruhgtfo 17 days ago

    What is RAG?

  • @murromanden
    @murromanden 19 days ago

    Noticed some Danish text in the video, are you Danish?

    • @Askejm
      @Askejm 19 days ago

      the editor is

  • @KeinNiemand
    @KeinNiemand 15 days ago

    RAG is useless without some kind of data, which I don't have.

  • @winterknight1159
    @winterknight1159 19 days ago

    It's funny, I just got fed up with coding on my research, which is in optimizing RAG lol. I must say Hugging Face's retriever, which is BERT-based, is great, but for the LLM I am using Mistral-7B. Combining these components and doing end-to-end fine-tuning is a challenge! I wish there were proper containers that would just fit in. Although in my case I am creating k prompts to pass through Mistral and then marginalizing the output over the documents. This sort of way to backprop is somewhat contrived but it seems to work haha. Let's see where my shit goes. Kinda lost right now
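
    (A minimal sketch of the marginalization this comment describes, in the style of the original RAG paper's objective; toy NumPy numbers, and the real setup backprops through both retriever and generator:)

```python
# A toy sketch of marginalizing over k retrieved documents in the style
# of the original RAG paper's objective: weight each document's answer
# likelihood by the retriever's (softmaxed) score. Numbers are made up;
# the real setup backprops through both retriever and generator.
import numpy as np

def marginal_log_likelihood(doc_scores: np.ndarray,
                            answer_logprobs: np.ndarray) -> float:
    # doc_scores: retriever logits over the k docs
    # answer_logprobs: log p(answer | question, doc_i) for each doc
    log_prior = doc_scores - np.logaddexp.reduce(doc_scores)  # log-softmax
    # log sum_i p(doc_i) * p(answer | doc_i), done stably in log space
    return float(np.logaddexp.reduce(log_prior + answer_logprobs))

scores = np.array([2.0, 0.5, -1.0])    # retriever scores for k = 3 docs
ans_lp = np.array([-5.0, -9.0, -12.0]) # answer log-likelihood per doc
print(marginal_log_likelihood(scores, ans_lp))
```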

  • @WaffleSSSSSPLUS
    @WaffleSSSSSPLUS 19 days ago

    too bad AI still can't work properly with Microsoft Project

  • @unused-account-a
    @unused-account-a 19 days ago

    rah