How to Make RAG Chatbots FAST

  • Published 11 Sep 2024

COMMENTS • 72

  • @WinsonDabbles
    @WinsonDabbles 1 year ago +33

    I really appreciate that all, if not most, of your collabs don't use LangChain at all. Really like to see what goes on under the hood to learn from a first-principles perspective.

  • @hughesadam87
    @hughesadam87 10 months ago +1

    These videos are such a high-quality collection of content for app developers in the AI space who are building apps and aren't AI experts (nor really care about the AI itself, just wanting to use it).

  • @xflory26x
    @xflory26x 1 year ago +2

    Been anticipating this video since seeing the notebook on your github! Thank you so much for your detailed explanations! Would be keen to see your implementations of NeMo Guardrails' moderation pipelines :)

  • @chrismcdannel3908
    @chrismcdannel3908 11 months ago +1

    Great dissection on the "wrapper" visualization to simplify the relationship between the Agent and the Model. I'm going to borrow it, with backlinks of course.
    Oh, and thanks for the composure in your thumbnails my guy. It's nice to see some professionalism getting the merit it deserves instead of some assclown with his jaw on the floor and pulling his hair up like a chimp on drugs. Classy AF bro. Keep up the good work.

  • @realCleanK
    @realCleanK 6 months ago

    Really appreciate you putting this together 🙏

  • @AlgoTradingX
    @AlgoTradingX 1 year ago +1

    You glowed up like crazy in your video content! It's so cool!!!!

    • @jamesbriggs
      @jamesbriggs  1 year ago

      thanks Sajid - means a lot coming from you :)

  • @cmars7845
    @cmars7845 1 year ago

    Thanks for the intro to NeMo Guardrails! I kept expecting you to say tools like ... 'google' ... but you seemed to pause and then not say it 😂

  • @drwho8576
    @drwho8576 1 year ago +2

    Excellent video as always. Thanks for sharing. Is there a way to set up Colang for an "anything but" scenario? So far, I only seem to be able to program what to detect for a workflow. But can I set up a 'default deny' type thing? Anything different from the topic my bot is designed to handle returns an "I'm sorry, Dave. I'm afraid I can't do that"...

  • @joshualee6559
    @joshualee6559 11 months ago +1

    I want to build this for my research lab so that we can query information about our protocols, standards, etc. This seems really useful.
    I presume it wouldn't be that hard to then embed it into a slack chatbot?

  • @yarikbratashchuk3386
    @yarikbratashchuk3386 11 months ago +1

    Would this approach work if the vectorized data was shop inventory? And the question was something like "how many items do you have?" Or about specifics of a group of items?

  • @uctran1169
    @uctran1169 1 year ago +1

    Can you make a video tutorial on creating data from Wikipedia?

  • @ylazerson
    @ylazerson 1 year ago +1

    Great video as always!

  • @pavellegkodymov4295
    @pavellegkodymov4295 1 year ago

    Thanks, James, very very useful. Will try to include guardrails in our corporate RAG chatbot.

  • @shortthrow434
    @shortthrow434 1 year ago +1

    Excellent, thank you James.

  • @shaheerzaman620
    @shaheerzaman620 1 year ago

    awesome video James!

  • @sandorkonya
    @sandorkonya 1 year ago +1

    Thank you for the super video. I wonder how we can do chain-of-thought (CoT) or tree-of-thoughts with Guardrails without LangChain?

  • @RichardBurgmann
    @RichardBurgmann 11 months ago +1

    Hi James, enjoying your series greatly. A question or suggestion for a future video: I've been seeing a lot of articles on the use of graph data structures to build knowledge graphs to address issues such as hallucinations and weaknesses in logical reasoning in LLMs. I've only found one person who has actually done this, and they had mixed results as far as addressing these issues. Wondering what your experience has been in this area? Do you have an opinion? From what I can see, there is not much evidence (yet) that it gives a better result than well-crafted semantic search.

    • @jamesbriggs
      @jamesbriggs  11 months ago +2

      I never tried it myself, but everyone I know who tried said it was hard to do and the results were either the same as or worse than using vector search - so I haven't had much reason to look into it
      Maybe at some point, if I see it being useful for a particular use-case and it makes sense given the trade-offs, I'll try it out

  • @mobime6682
    @mobime6682 7 months ago

    Great show, thank you. Question - it seems awfully similar to your (more recent?) videos about the semantic router, or have I got the wrong end of the stick? I know I should do a similarity search on the text for each, I guess 😉! Thanks again.

  • @unperfectbryce
    @unperfectbryce 10 months ago +1

    Can't you just do kNN with your embeddings to make sure the query isn't out of distribution? Isn't this a pretty quick Euclidean distance operation? Why bother with guardrails? Thanks for the great video! Keep it up.

    • @VikasChaudhary-x1y
      @VikasChaudhary-x1y 2 months ago

      1. Not all queries are straightforward. Complex queries might need more nuanced understanding and contextual analysis, which kNN might not handle well.
      2. Guardrails can adapt to new rules and policies quickly, while kNN models might need retraining with new data.
      3. Guardrails can provide more interpretable reasons for why a query is out-of-distribution or not appropriate, aiding in better understanding and transparency.
      However, using both of these together might be more robust.
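The kNN check suggested above really is a cheap distance operation. A minimal sketch, assuming a synthetic corpus of embeddings; the `max_dist` threshold and the random data are placeholders for real embeddings and a tuned value:

```python
import numpy as np

def is_out_of_distribution(query_vec, corpus_vecs, k=5, max_dist=1.0):
    """Flag a query as out-of-distribution when the mean Euclidean
    distance to its k nearest corpus embeddings exceeds a threshold."""
    dists = np.linalg.norm(corpus_vecs - query_vec, axis=1)
    return float(np.sort(dists)[:k].mean()) > max_dist

# Synthetic "corpus": 100 embeddings clustered near the origin.
rng = np.random.default_rng(0)
corpus = rng.normal(0.0, 0.1, size=(100, 8))

print(is_out_of_distribution(np.zeros(8), corpus))      # False: inside the cluster
print(is_out_of_distribution(np.full(8, 5.0), corpus))  # True: far from every point
```

As the reply notes, this gives a yes/no signal but no interpretable reason, which is where a guardrails layer can add value.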

  • @aravindudupa957
    @aravindudupa957 1 year ago +1

    What is the difference in accuracy between reasoning (whether to retrieve) using embedding similarity vs giving it to an LLM?

  • @RichardHamnett
    @RichardHamnett 1 year ago

    Brilliant mate, also don't forget this could be a massive cost optimizer along with speed :)

  • @kaustubhnegi1838
    @kaustubhnegi1838 3 months ago

    🎯 Key points for quick navigation:
    00:00 *🔍 Introduction to retrieval augmented generation with guardrails for building chatbots.*
    00:27 *📂 Utilizing vector database (Pinecone), embedding model (RoBERTa), and documents for retrieval.*
    00:54 *🕸️ Two traditional approaches to RAG: naive approach and agent approach.*
    02:25 *⌛ Agent approach is slower but potentially more powerful with multiple thoughts and external tools.*
    05:23 *🛡️ Guardrails approach: Directly embedding query, checking similarity with defined guardrails, and triggering retrieval tool if needed.*
    07:42 *🧩 Guardrails approach combines query and retrieved context, then passes to language model for answer generation.*
    08:23 *⚡ Guardrails approach is significantly faster than agent approach while still allowing tool usage.*
    09:03 *📋 Step-by-step implementation details, including data indexing, embedding, and Pinecone setup.*
    13:12 *🔄 Defining retrieve and RAG functions as guard actions for guardrails.*
    14:46 *🚫 Guardrails config to avoid talking about politics.*
    15:15 *🤖 Defining guardrail for user asking about LLMs to trigger RAG pipeline.*
    17:10 *🔥 Demonstrating RAG pipeline via guardrails, showing its effectiveness in answering LLM-related queries.*
    18:04 *🆚 Comparing guardrails without RAG, which lacks information for LLM-related queries.*
    19:55 *💡 Guardrails approach allows agent-like tool usage without slow initial LM call, making it faster for triggered tools.*
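The guardrails flow summarized in the timestamps above (embed the query, compare it against the guardrail's example utterances, trigger the retrieval tool only on a match) can be sketched roughly as follows. This is a toy illustration: the trigram-hashing `embed` function stands in for a real embedding model, and the 0.3 threshold is an arbitrary assumption, not a recommended value:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashes character
    trigrams into a fixed-size unit vector."""
    vec = np.zeros(256)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Canonical utterances defining a "user asks about LLMs" guardrail.
GUARDRAIL_EXAMPLES = [embed(t) for t in (
    "what is a large language model",
    "tell me about llama 2",
    "how do llms work",
)]

def should_retrieve(query: str, threshold: float = 0.3) -> bool:
    """Trigger the RAG tool when the query embedding is close enough
    (cosine similarity; vectors are unit-normalized) to any guardrail
    example; otherwise skip straight to the plain LLM call."""
    q = embed(query)
    return max(float(q @ e) for e in GUARDRAIL_EXAMPLES) >= threshold

print(should_retrieve("how do llms work"))  # True: matches a guardrail utterance
```

The speed win described at 19:55 comes from this check being a handful of dot products rather than an initial LLM call.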

  • @user-jj2mo5sl7p
    @user-jj2mo5sl7p 1 year ago

    useful work!

  • @fabianaltendorfer11
    @fabianaltendorfer11 11 months ago

    Great video! Any idea how to deal with screenshots in the documents?

  • @elrecreoadan878
    @elrecreoadan878 11 months ago

    When should one opt for RAG, fine-tuning, or just a Botpress knowledge base linked to ChatGPT? Thank you!

  • @OlivierEble
    @OlivierEble 1 year ago +1

    I want to start using RAG but I want something fully local. What could be an alternative to Pinecone?

    • @rabomeister
      @rabomeister 1 year ago

      Except for Pinecone, almost all of the vector stores are open source. Also, I don't know about Pinecone since it's not free, but the others are mostly similar. I use ChromaDB for my personal projects since I started working on LLMs recently, and it is very user friendly. You will handle it; the problematic part is the data.

    • @satyamwarghat1305
      @satyamwarghat1305 1 year ago +1

      Use Deep Lake - I have been using it for my projects and it is pretty good

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      Yeah, if you want fully local there are open-source alternatives like Qdrant or Weaviate - for the comment above, Pinecone is free, they have a free/standard tier :)

    • @drwho8576
      @drwho8576 1 year ago

      Using pgvector here, directly on top of good ol' Postgres. Works like a charm.

  • @ThangTran-rj8gt
    @ThangTran-rj8gt 1 year ago

    Hey! I am researching the topic of answering questions from an open domain, so how can I get data for that domain? Thank you

  • @andriusem
    @andriusem 1 year ago +1

    This is what I was searching for! Thanks James, your videos are very informative and easy to follow with Google Colab! My question would be: can we use extracted information from the vector DB to be analyzed by an LLM to provide insights, or to compare different documents, using guardrails or an agent? Thanks, keep up the great work!

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      It depends on what you’re comparing, but I see no reason as to why it couldn’t work! You can select an existing doc at random, perform a semantic search for similar docs and feed them into your LLM with instructions on what you’re comparing - there may be other ways of doing it too - I hope that helps!

  • @guanjwcn
    @guanjwcn 1 year ago +1

    Thanks very much for the sharing, James. May I seek your advice on how I can estimate infrastructure requirements, e.g. number of GPUs, assuming I need to host an open-source model with a size of 70B on premise and the number of concurrent users being 1000 at most? Thank you very much.

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      You can calculate number of parameters * bytes required for the data type of each parameter - people do keep asking about this, so I think I can go into more detail in a future video
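A rough worked version of that calculation. Note this covers the weights only; the KV cache, activations, and batching overhead for 1000 concurrent users would add substantially more on top:

```python
# Back-of-envelope VRAM for model weights alone: params * bytes per param.
params = 70e9  # a 70B-parameter model

for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{dtype}: {params * nbytes / 1e9:.0f} GB")
# fp16/bf16 -> 140 GB: at least two 80 GB GPUs before any serving overhead
```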

    • @reknine
      @reknine 11 months ago

      Would really appreciate that! @jamesbriggs

  • @rabomeister
    @rabomeister 1 year ago

    What do you think about the accuracy and other related metrics while using guardrails? It really sounds nice, but if you use LLMs in fields with high risks (finance), does it promise accuracy too, at least similar to standard approaches? Great videos by the way, I guess I implemented almost all of them. And it's always nice to learn from a professional.

    • @rabomeister
      @rabomeister 1 year ago +1

      Also, (if you are ok with that, since you also work for a company) if you can make a video about the hardware side of LLMs and DBs, that would be great. Because at some point, there is enough information about coding and software (of course, not enough yet, but one can implement something somehow), but the hardware side really requires theoretical knowledge. I don't want to just check the tables and go buy some NVIDIA GPU, I want to know why. Thanks in advance.

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      It’s hard to guarantee accuracy, LLM and the broader field of NLP is generally non-deterministic so there’s always that level of randomness, I’m still figuring out the best way of dealing with it myself - we try to add metrics, or extra LLM analysis steps (like asking “is this answer using information from these sources…”) - but it’s a difficult problem
      I like the GPU hardware idea, would love to jump into it

    • @aravindudupa957
      @aravindudupa957 1 year ago

      @James Are there any good 'deterministic' ways to check the accuracy of information in the reply (by going through the reply and checking claims, for example) against that in the context? I've heard of SelfCheckGPT, which takes multiple iterations, but it's not 'deterministic'. It would be great to have such a technique!

    • @chrismcdannel3908
      @chrismcdannel3908 11 months ago

      @@rabomeister Outside of highly specialized & sensitive use cases requiring procurement of a commercial-grade GPU or TPU, and the talent & skill to use it effectively in a business process, there is no real advantage in spending $15-$20K or more on the HW - unless you just have the insatiable desire & urge to do it for the hell of it and because you want to have your own, and that's ok too my friend. Unfortunately the cloud giants have structured the market in a way that makes getting compute from them still more economically prudent than buying even 1 of the ASICs they have hundreds of thousands or millions of.

  • @georgekokkinakis7288
    @georgekokkinakis7288 1 year ago +1

    Very informative video. Thanks. Is there any chance that you know of any open-source LLM that supports the Greek language for retrieval augmented generation?

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      Cohere has a multilingual embedding model - it probably covers Greek, and there will also be multilingual sentence transformers you can use too :)

    • @georgekokkinakis7288
      @georgekokkinakis7288 1 year ago

      Thanks for your response @jamesbriggs. For the embedding part I have found a multilingual model which does an excellent job of retrieving the document that is most relevant to the question asked. What I cannot find is an open-source LLM for the generation part, which will generate the answer to the user's query based on the retrieved document (I am talking about the Greek language). The OpenAI tokenizer is very expensive since, from what I have noticed, it tokenizes Greek words at the character level. So using their model does not fit my task at hand. Anyway, if you ever notice any generative model which supports Greek, please mention it in your upcoming videos, which by the way I have to say have helped me a lot.

    • @ashraymallesh2856
      @ashraymallesh2856 1 year ago +1

      @@georgekokkinakis7288 What about doing the RAG pipeline in English and then translating to Greek for your users? :P

    • @georgekokkinakis7288
      @georgekokkinakis7288 1 year ago

      @@ashraymallesh2856 If I am not mistaken, please correct me if I am wrong, applying the RAG pipeline in English would first require translating the documents from Greek to English. As I mentioned in a previous post, the documents contain mathematical definitions and terminology. Using a translation model or the Google Translate API wouldn't work because, for example, Google Translate translates the words παραπληρωματικές and συμπληρωματικές both as "supplementary", which is not correct. On the other hand, having a human translate all the documents would be a tedious task. That's why I am looking for an open-source LLM which supports the Greek language. Any ideas are welcome 😁.

  •  1 year ago

    Have you tried to set this up with gpt-4? I'm getting some errors switching from davinci to gpt-4

    • @jamesbriggs
      @jamesbriggs  1 year ago

      Hey Andre! I usually avoid generating output with the built-in LLM function; I usually just use guardrails as a mid-decision layer and then use actions to call LLMs like GPT-4

  • @humayounkhan7946
    @humayounkhan7946 1 year ago

    This is awesome, thanks James. Out of curiosity, do you know if this can be integrated with LangChain?

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      absolutely, LangChain is code, and we can execute code via actions like we did with our RAG pipeline here

  • @eightrice
    @eightrice 1 year ago

    does this have message history? Does the context carry over from one input to the next?

    • @jamesbriggs
      @jamesbriggs  1 year ago

      In this example no, but you can bring in a few previous interactions for embedding

    • @eightrice
      @eightrice 1 year ago

      @@jamesbriggs why would you use embeddings on the previous interactions? Can you just use the ChatCompletion endpoint and pass the array of previous messages as `chat_history` ?

    • @jamesbriggs
      @jamesbriggs  1 year ago

      @@eightrice ​ ChatCompletion endpoint is more effective, and is what you do for the "agent approach to RAG" - it's just slower.
      In real-world use-cases I have always used the pure agent approach, but I recently began experimenting with a mix of both, so I try to capture obvious queries ("user asks about LLMs") with guardrails and send the single query directly to the RAG pipeline, but for more general-purpose queries I direct them to the typical agent endpoint (and include conversation history)
      I'm still experimenting with the best approach, but so far this system seems to be working well for speeding up a reasonable portion of queries

    • @eightrice
      @eightrice 1 year ago

      @@jamesbriggs yup, that hybrid architecture seems optimal if you need both normal chatbot functionality and subject matter knowledge with low latency. Thank you so much for this, I feel like I should be paying a lot for your code and tutorials :)

    • @jamesbriggs
      @jamesbriggs  1 year ago

      @@eightrice yeah so far I've liked this approach - haha no worries, I'm happy it's useful :)

  • @user-ib1st1tm9w
    @user-ib1st1tm9w 1 year ago

    How do you use guardrails and RAG with other LLMs, like Falcon or Llama?

    • @jamesbriggs
      @jamesbriggs  1 year ago

      You can modify the model provider and name in the config.yaml file - they have docs on it in the guardrails GitHub repo :)
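For reference, the shape of that config.yaml change looks roughly like the below; the engine and model names here are illustrative placeholders, so check the NeMo Guardrails docs for the exact provider strings your version supports:

```yaml
# config.yaml (sketch) - swap the built-in model for an open-source one
models:
  - type: main
    engine: huggingface_hub              # placeholder provider name
    model: meta-llama/Llama-2-13b-chat-hf  # placeholder model name
```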

  • @deter3
    @deter3 1 year ago +1

    This method looks very simple in a toy example, but you need lots of hard work in a real business environment to build and test whether it's really working or not. Using simple sentences + embedding distance for decision making is not really a reliable solution.

    • @jamesbriggs
      @jamesbriggs  1 year ago +5

      I use it in production, it can be more reliable at times than LLMs if you define the semantic vector space that should trigger an action well - typically I view prompt engineering as the broad stroke, and guardrails as the fine-tuning of your chatbot behavior, so when you have specific RAG workflows like "refer to HR docs", "refer to eng docs", "refer to company Y DB", guardrails can be very helpful
      But you're very right, it needs a lot of work, testing, and iterating over the guardrails to get something reliable

  • @AbhayKumar-yh9zs
    @AbhayKumar-yh9zs 9 months ago

    For implementing LangChain agents with NeMo Guardrails, do we need to do the below?
    In the Colang file, first define the action which calls the function that runs the agent, like this:
    $answer = execute custom_function(query=$last_user_message)
    and then register it like:
    rag_rails.register_action(action=custom_function, name="custom_function")
    Am I on the right track?
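For context, that matches the pattern NeMo Guardrails uses for custom actions. A sketch of the two pieces together, where `custom_function`, `rag_rails`, and the flow wording are the commenter's own placeholder names, not fixed API names:

```
# Colang (.co) file: the flow executes the registered action
define flow answer question
  user ask question
  $answer = execute custom_function(query=$last_user_message)
  bot $answer

# Python: register the function under the name the flow refers to
rag_rails.register_action(action=custom_function, name="custom_function")
```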

  • @EarningsNest
    @EarningsNest 1 year ago +2

    Did u smoke something before recording this?

  • @prashanthsai3441
    @prashanthsai3441 10 months ago +1

    Why should I use guardrails? @jamesbriggs
    I have Dialogflow, which has all the intents and flows (like in a Colang file) - I will check the intent confidence, and if it is high then I will trigger the corresponding intent flow, and if it is low then I will retrieve the data from the source using a naive retrieval method?