Chatbot Memory for Chat-GPT, Davinci + other LLMs - LangChain #4

  • Published Dec 4, 2024

COMMENTS • 97

  • @decodingdatascience
    @decodingdatascience 1 year ago +4

    Thanks James for elaborating on LangChain Memory. For the viewers, here are some 🎯 Key Takeaways for quick navigation (sketched in code after the list):
    00:00 🧠 Conversational memory is essential for chatbots and AI agents to respond coherently to queries in a conversation.
    01:23 📚 Different memory types, like conversational buffer memory and conversational summary memory, help manage and recall previous interactions in chatbots.
    05:42 🔄 Conversational buffer memory stores all past interactions in a chat, while conversational summary memory summarizes these interactions, reducing token usage.
    14:13 🪟 Conversational buffer window memory limits the number of recent interactions saved, offering a balance between token usage and remembering recent interactions.
    23:05 📊 Conversational summary buffer memory combines summarization and saving recent interactions, providing flexibility in managing conversation history.
    We are also doing lots of workshops in this space, looking forward to talking more
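
    A minimal sketch of the four memory types listed above, assuming the classic (pre-0.1) LangChain API used in the video; the class names are real, while the model name and parameter values are illustrative:

    ```python
    from langchain import OpenAI
    from langchain.chains import ConversationChain
    from langchain.memory import (
        ConversationBufferMemory,
        ConversationBufferWindowMemory,
        ConversationSummaryBufferMemory,
        ConversationSummaryMemory,
    )

    llm = OpenAI(model_name="text-davinci-003", temperature=0)

    buffer = ConversationBufferMemory()            # every turn kept verbatim
    summary = ConversationSummaryMemory(llm=llm)   # rolling LLM-written summary
    window = ConversationBufferWindowMemory(k=5)   # only the last k turns
    hybrid = ConversationSummaryBufferMemory(      # recent turns verbatim,
        llm=llm, max_token_limit=650               # older turns summarized
    )

    chain = ConversationChain(llm=llm, memory=hybrid)
    chain.predict(input="Hi there!")
    ```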

  • @MrFiveDirections
    @MrFiveDirections 1 year ago +3

    Super! For me, it is one of the best tutorials on this subject. Much appreciated, James.

    • @jamesbriggs
      @jamesbriggs  1 year ago

      thanks, credit to Francisco too for the great notebook

  • @GergelyGyurics
    @GergelyGyurics 1 year ago

    Thank you. I was way behind on LangChain and had no time to read the documentation. This video saved me a lot of time. Subscribed.

  • @kevon217
    @kevon217 1 year ago +1

    Another masterpiece of a tutorial. You’re an absolute gem James!

  • @daharius2
    @daharius2 1 year ago +11

    Things really seem to get interesting with the knowledge graph! Saving the things that really matter, like relation context, along with a combination of the other methods, starts to sound very powerful. Add in some embedding/vectorDB and wow. The other commenter's idea about a system for bots evolving sentiment, or even personality, over time is worth thinking about as well.
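
    As a hedged illustration of that knowledge-graph idea: LangChain's ConversationKGMemory stores (subject, relation, object) triples instead of raw text. The class is real; the example inputs are made up:

    ```python
    from langchain import OpenAI
    from langchain.memory import ConversationKGMemory

    llm = OpenAI(temperature=0)
    kg_memory = ConversationKGMemory(llm=llm)

    # The memory uses the LLM to extract triples such as (Sam, works at, Acme),
    # which can be recalled later without replaying the whole conversation.
    kg_memory.save_context(
        {"input": "My name is Sam and I work at Acme."},
        {"output": "Nice to meet you, Sam!"},
    )
    ```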

    • @jamesbriggs
      @jamesbriggs  1 year ago +3

      yeah this is fascinating to me, looking forward to working on these

    • @Jordy-t8y
      @Jordy-t8y 1 year ago

      Very powerful!
      Any ideas or resources on how to add an embedding/vectorDB to this?
      I would like this memory chatbot to be able to reference my own data stored in the vectorDB, but I can't seem to make them work together.
      Either the chatbot has memory OR it references the embeddings; I can't seem to combine the two.

    • @vintagegenious
      @vintagegenious 1 year ago

      @@Jordy-t8y It's done in video #9

  • @davidmoran4623
    @davidmoran4623 1 year ago

    Great explanation of memory in LangChain; when you show the charts it is much clearer for me.

  • @cloudshoring
    @cloudshoring 1 year ago

    Cool! This video addressed the question I had posed in your earlier (1st) video about token size limitations due to adding conversational history. The charts provide a good intuition of the workings of the memory types. Two takeaways: 1. when to use which memory type; 2. how to do performance tuning for a chatbot app, given the overhead posed by token tracking, memory appending, and so on.

  • @adumont
    @adumont 1 year ago +2

    If I understand the graphs correctly, what is represented is the tokens used per interaction; in the case of the buffer memory (the quasi-linear one), the 25th interaction is about 4k tokens. But the price (in tokens) of the whole conversation up to the 25th interaction is the sum of the price of all the interactions up to the 25th. So basically the price of the conversation, in each case, is the area under the curve you showed, not the highest point it reached. For the summarized conversations, the flat tendency towards the end means the price just keeps adding almost the same number of tokens per new interaction, not that the price of the conversation has reached a cap.

    • @fire17102
      @fire17102 1 year ago +1

      If my math isn't off, that should be 25/2 * 4k = 12.5 * 4k = 50k tokens after 25 interactions. At $0.002 per 1k tokens (on turbo), that is $0.10, or one dime, for that whole conversation.
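
      The same back-of-the-envelope math as a runnable sketch (all numbers are the assumptions from the comments above):

      ```python
      interactions = 25
      final_prompt_tokens = 4_000  # ~tokens used at the 25th interaction
      # Roughly linear growth, so the conversation's total cost is the area
      # under the curve, i.e. approximately a triangle: n/2 * peak.
      total_tokens = interactions / 2 * final_prompt_tokens  # 50,000
      price_per_1k = 0.002  # gpt-3.5-turbo pricing at the time
      print(total_tokens / 1_000 * price_per_1k)  # ~0.10 dollars
      ```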

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      yeah, your logic is correct, the graphs ended up like this as I wanted to show the limit of buffer memory (ie hitting the token limit) - we had intended to include cumulative total graphs but I didn't get time, planning on putting together a little notebook to show this in the coming days
      token math checks out for me - it adds up quickly

  • @DavidGarcia-gd2vq
    @DavidGarcia-gd2vq 1 year ago

    Thanks for your content! Looking forward to watching the knowledge graph video :)

  • @PrinceCyborg
    @PrinceCyborg 1 year ago +2

    Oh wow, you just destroyed my project lol. I gave ChatGPT long-term memory, autonomous memory store and recall, speech recognition, audio output, and self-reflection. Thought I was the only one working on stuff like this. Well, I'm basically trying to build a sentient AI; I need vision though. Hopefully GPT-4 is multimodal, because I'm struggling to give my project vision recognition.

    • @jamesbriggs
      @jamesbriggs  1 year ago

      yeah I think you might be in luck for multimodal GPT-4 :) - that's awesome though, I haven't done all of that yet, very cool!

    • @ericgeorge7667
      @ericgeorge7667 1 year ago

      Great work bro! Keep it up! 👍

  • @matheusrdgsf
    @matheusrdgsf 1 year ago

    Thanks for this content James, awesome!

  • @goelnikhils
    @goelnikhils 1 year ago

    Amazing Content

  • @gutgutia
    @gutgutia 1 year ago

    James - are you still planning to work on the KG video? Seems like a powerful method that solves for scale and token limits.

  • @THCV4
    @THCV4 1 year ago +1

    Check out David Shapiro's latest approach with salient summarization when you get a chance. Essentially: the summarizer can more efficiently pick and choose which context to preserve if it is properly primed with specific objectives/goals for the information.
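
    A rough sketch of objective-primed ("salient") summarization; the prompt wording is illustrative, not David Shapiro's exact approach:

    ```python
    from langchain import LLMChain, OpenAI, PromptTemplate

    salient_prompt = PromptTemplate(
        input_variables=["objectives", "history"],
        template=(
            "You are summarizing a conversation. Preserve only the details "
            "that matter for these objectives:\n{objectives}\n\n"
            "Conversation:\n{history}\n\nSalient summary:"
        ),
    )
    summarizer = LLMChain(llm=OpenAI(temperature=0), prompt=salient_prompt)
    summary = summarizer.run(
        objectives="book a flight; remember the user's budget",
        history="...past messages...",
    )
    ```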

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      fascinating, love Dave's videos they're great!

  • @jason_v12345
    @jason_v12345 1 year ago

    Skimming through the docs, LangChain seems like a complicated abstraction around what's essentially auto copy and paste.

    • @jamesbriggs
      @jamesbriggs  1 year ago

      the simpler stuff yes, but they have some other things like knowledge graph memory + agents that I think are valuable

  • @TomanswerAi
    @TomanswerAi 1 year ago

    Great demo James

  • @sysadmin9396
    @sysadmin9396 9 months ago

    Hi James, how do we keep the conversation context of multiple users on different devices separate?

  • @FCrobot
    @FCrobot 1 year ago

    In the scenario of conversational bots, how do you limit the token consumption of the entire conversation?
    For example, once consumption reaches 1,000 tokens, it should report that the tokens for this conversation have been used up.
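
    One possible approach, sketched under the assumption of the classic LangChain API: track usage with get_openai_callback and cut the conversation off once a budget is exhausted. The budget value and function name are made up:

    ```python
    from langchain.callbacks import get_openai_callback

    TOKEN_BUDGET = 1_000
    tokens_used = 0

    def ask(chain, text):
        global tokens_used
        if tokens_used >= TOKEN_BUDGET:
            return "The tokens for this conversation have been used up."
        with get_openai_callback() as cb:  # counts tokens for calls inside
            reply = chain.predict(input=text)
        tokens_used += cb.total_tokens
        return reply
    ```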

  • @Davipar
    @Davipar 1 year ago

    Thank you! Awesome work!! Appreciate it!

  • @Sciencehub-oq5go
    @Sciencehub-oq5go 1 year ago

    James, thanks so much!

  • @jianleichen7750
    @jianleichen7750 1 year ago

    Just curious, what's the OpenAI cost to complete this course if you choose the pay-as-you-go plan?

  • @kevinkate4500
    @kevinkate4500 1 year ago

    @jamesbriggs why are transformers stateless?

  • @m.branson4785
    @m.branson4785 1 year ago

    Great video! I love the graphs for token usage. I kept meaning to graph the trends myself, but I was too lazy!

    I was talking to Harrison Chase as he was implementing the latest changes to memory, and it's had me thinking about other unique ways to approach it. I've been using different customized summarizers, and I can bring up any subset of the message history as I like, but I'm thinking also to include some way to flag messages as important or unimportant, dynamically feeding the history. I also haven't really explored my options in terms of local storage and retrieval of old chat history.

    One note that I might make for the video too... I noticed you're using LangChain's usual OpenAI class and just adjusting your model to 3.5-turbo. My understanding is that we have been advised to use the new ChatOpenAI class for now when interacting with 3.5-turbo, since that's where they'll be focusing development and can address changes without breaking other stuff; this is necessary since the new model endpoint takes a message list as a parameter instead of a simple string.
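
    The class switch the comment describes, as a minimal sketch with that era's LangChain API:

    ```python
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory

    # ChatOpenAI targets the chat endpoint (message list), unlike the plain
    # OpenAI class, which targets the completion endpoint (single string).
    chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    chain = ConversationChain(llm=chat, memory=ConversationBufferMemory())
    chain.predict(input="Hello!")
    ```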

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      dynamically feeding the memory sounds cool, would you do this explicitly or implicitly?
      langchain moves super fast, I haven't seen the new ChatOpenAI class, thanks for pointing this out!

    • @omnipedia-tech
      @omnipedia-tech 1 year ago

      @@jamesbriggs My notions are to create a chat client where the bot is controlling the conversation, instead of the user, for the purpose of guided educational experiences - like a math lesson performed with the Socratic method, where you want to elicit the solution from the user rather than just provide it to them. I'm imagining I'll need an internal model of the user's cognition and an outline of the lesson, then implicitly determining the importance of any interaction or lesson detail by how logically connected it is to both, feeding only the immediately relevant context to the external facing LLM. I'm really still brainstorming, and I just started a month-long vacation to play with the idea.

  • @SaifBattah
    @SaifBattah 11 months ago

    What if I want to use it for my own fine-tuned gpt-3.5 model?

  • @binstitus3909
    @binstitus3909 10 months ago

    How can I keep the conversation context of multiple users separate?
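
    One common pattern (a sketch, not an official LangChain feature): keep one memory object per user or session id.

    ```python
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory

    memories = {}  # user_id -> that user's private memory

    def chat(llm, user_id, text):
        memory = memories.setdefault(user_id, ConversationBufferMemory())
        chain = ConversationChain(llm=llm, memory=memory)
        return chain.predict(input=text)
    ```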

  • @adityaroy4261
    @adityaroy4261 1 year ago

    Can you please please please make a video on how to connect MongoDB with LangChain?

  • @vinaynaman5697
    @vinaynaman5697 1 year ago

    How do I use this conversational memory in a custom chatbot along with LangChain?

  • @sanakmukherjee3929
    @sanakmukherjee3929 1 year ago

    Do you have a substitute for LangChain?

  • @ylazerson
    @ylazerson 1 year ago

    you are awesome - thanks again!

  • @bwilliams060
    @bwilliams060 1 year ago

    Hi James, great video. This is probably a stupid comment, but here goes... Could you not just ask the LLM to capture some key variables that summarise the completion for the prompt, and then feed that (rather than the full conversation) as 'memory' for subsequent prompts? I'm imagining a 'ghost' question being added to each prompt, like 'Also capture key variables to summarise the response for future recall', and then this being used as the assistant message (per GPT-3.5 Turbo) rather than all of the previous conversation?
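
    A rough sketch of that "ghost question" idea using the raw OpenAI chat API; the VARIABLES convention and prompt wording are made up for illustration:

    ```python
    import openai

    state = "none yet"

    def ask(user_text):
        global state
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": f"Key variables so far: {state}"},
                {"role": "user", "content": user_text
                    + "\n\nAlso output a final line 'VARIABLES: ...' with the "
                      "key facts to remember for future turns."},
            ],
        )
        text = resp.choices[0].message.content
        answer, _, vars_line = text.partition("VARIABLES:")
        if vars_line:
            state = vars_line.strip()  # feed only this forward, not full history
        return answer.strip()
    ```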

  • @Sciencehub-oq5go
    @Sciencehub-oq5go 1 year ago

    How is the model able to judge when it needs to come to the conclusion "I don't know"?

  • @adamsardo
    @adamsardo 1 year ago

    Love the video! A question about putting this behind a UI: how hard would that process be?

  • @souvickdas5564
    @souvickdas5564 1 year ago

    How do I use memory with ChatVectorDBChain, where we can specify vector stores? Could you please give a code snippet for this? Thanks

  • @isaacyimgaingkuissu3720
    @isaacyimgaingkuissu3720 1 year ago

    Great content, thanks for that.
    I'm working on a tweet-summarization use case, but I don't want to break the overall corpus into pieces, build a summary for each one, and combine those summaries into a larger one. I want something more clever.
    Suppose I have 10 tweets: 6 are related (same topic) and the last 4 are different from each other. I think I can build a better summary than the "LangChain summary" by only summarizing the 6 related tweets and adding the 4 raw tweets. This can help avoid losing context for the future.

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      I'm not sure how exactly to implement this, but possibly:
      1. embed the tweets
      2. when looking to summarize, embed the current query and perform semantic search to identify tweets over a particular similarity threshold to return
      3. summarize those retrieved tweets
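
      Those three steps as a hedged sketch; the embedding model, threshold, and helper names are assumptions:

      ```python
      import numpy as np
      import openai

      def embed(texts):
          resp = openai.Embedding.create(
              model="text-embedding-ada-002", input=texts
          )
          return np.array([d["embedding"] for d in resp["data"]])

      def related_tweets(tweets, query, threshold=0.8):
          vecs = embed(tweets)   # 1. embed the tweets
          q = embed([query])[0]  # 2. embed the query...
          sims = vecs @ q / (
              np.linalg.norm(vecs, axis=1) * np.linalg.norm(q)
          )
          # ...and keep tweets above the similarity threshold
          return [t for t, s in zip(tweets, sims) if s >= threshold]

      # 3. summarize only the retrieved tweets; keep the rest raw.
      ```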

  • @satvikparamkusham7454
    @satvikparamkusham7454 1 year ago

    These lectures are really helpful, thanks a lot!
    Is there a way to use conversational memory along with VectorDBQA (generative question answering over a database)?

  • @billykotsos4642
    @billykotsos4642 1 year ago

    I swear you have the coolest shirts!
    Make a drip video too! Would watch!

  • @jashwanthl9618
    @jashwanthl9618 1 year ago

    How would I be able to use this with a Pinecone vector DB for context?

  • @ObservingBeauty
    @ObservingBeauty 1 year ago

    Helpful! Thanks

  • @max4king
    @max4king 1 year ago

    Does anyone know the difference between the run and predict methods? They seem the same to me.
    If there is a difference, which one is better?

  • @isaiahsgametube2321
    @isaiahsgametube2321 1 year ago

    Thank you, great topic

  • @agritech802
    @agritech802 1 year ago

    Can someone let me know where I can get an off-the-shelf LLM with long-term memory? I need it to be able to remember things I tell it, remember where I put stuff, etc. I don't mind paying for it.

  • @bagamanocnon
    @bagamanocnon 1 year ago

    Hey James, can you share the Colab notebook for this?

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      Yes, it's the chat notebook here: github.com/pinecone-io/examples/tree/master/generation/langchain/handbook

  • @huppahd5101
    @huppahd5101 1 year ago +1

    Hi, great content, but the gpt-3.5 model already has its own conversation memory, so instead of davinci you can use that. It is also 10 times cheaper 😊

    • @jamesbriggs
      @jamesbriggs  1 year ago +3

      thanks for sharing, gpt-3.5-turbo is great! We even demo it in this video during the first example :)
      - the reason I share this tutorial anyway is that gpt-3.5-turbo (using the direct OpenAI API) is restricted to the equivalent of `ConversationBufferMemory`; it doesn't do the summary, window, or summary + window memory types
      We didn't really cover it here, but there's also the knowledge graph memory; we'll cover that in the future
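
      For reference, this is all the direct chat endpoint gives you: you resend the full message list on every call, which is the equivalent of `ConversationBufferMemory` (a sketch with the pre-1.0 openai client):

      ```python
      import openai

      messages = [{"role": "system", "content": "You are a helpful assistant."}]

      def chat(user_text):
          messages.append({"role": "user", "content": user_text})
          resp = openai.ChatCompletion.create(
              model="gpt-3.5-turbo", messages=messages
          )
          reply = resp.choices[0].message.content
          messages.append({"role": "assistant", "content": reply})
          return reply
      ```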

    • @heymichaeldaigler
      @heymichaeldaigler 1 year ago +1

      @@jamesbriggs I see, so even if we want to use the turbo model because it is cheaper than davinci, we would still want to explore one of these LangChain memory types?

    • @fire17102
      @fire17102 1 year ago

      @@jamesbriggs Graph memory looks really interesting; I'd love to see it utilized with turbo or the ChatGPT API. Also wondering if/when OpenAI will start caching tokens for users on their end, meaning you would only pay for new data added to the conversation.

  • @thedailybias5408
    @thedailybias5408 1 year ago

    Hello James, this method would not work for chat models anymore, right? The code would have to be adjusted to work with the new chat models from LangChain. Could you make a new video to cover that?

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      it works for normal LLMs, not for chatbot-only models - but yes I'll be doing another video on this

    • @thedailybias5408
      @thedailybias5408 1 year ago

      @@jamesbriggs awesome! Thank you so much for all the work you put in. You got me back to coding :)

  • @AlbusDumbledore-fr3qg
    @AlbusDumbledore-fr3qg 1 year ago

    Make a video on using this kind of long-term-memory-based chat for semantic search on local files like txt, please

  • @antoniosalzano6235
    @antoniosalzano6235 1 year ago

    I know that OpenAI's text embeddings measure the relatedness of text.
    I am new to this field, so probably for some of you this question is trivial. Anyway, I was wondering if it is possible to use this technique with source code.
    I was trying to figure out a way to analyse source code, but due to the token limitation, one way to save prior knowledge could have been this.
    For example, if I have a list of source files, I can search for similarities within the list.
    Any advice? Is it possible, or am I just blathering on?

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      interesting question, I'm not sure as I haven't seen this done before, but generally speaking these language models are just as good (if not better) at generating good code as good natural language, so I'd imagine generating embeddings for code *might* work
      For dealing with token limits, you can try comparing chunks of code rather than the full code - if your use-case allows for that
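
      A naive illustration of "comparing chunks of code": split the source into fixed-size overlapping chunks before embedding (the sizes are arbitrary assumptions):

      ```python
      def chunk_code(source: str, max_chars: int = 1_500, overlap: int = 200):
          """Split source code into overlapping character chunks."""
          chunks = []
          start = 0
          while start < len(source):
              chunks.append(source[start:start + max_chars])
              start += max_chars - overlap
          return chunks
      ```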

  • @did.dynamics8504
    @did.dynamics8504 1 year ago

    No example???

  • @superchiku
    @superchiku 1 year ago +5

    Make James Famous ....

  • @younginnovatorscenterofint8986

    Hello, this was interesting. I am currently developing a chatbot with LlamaIndex, model_name="text-ada-001" or "davinci-003". So, based on thousands of documents (external data), the user will ask questions, and the chatbot must respond. When I tried it with just one document, the model performed well, but when I added another, the performance dropped. Could you please advise on a possible solution to this? Thank you in advance

  • @eduardomoscatelli
    @eduardomoscatelli 1 year ago

    The big problem is that so far I haven't found a solution that doesn't need to insert the entire schema into the prompt itself so that ChatGPT understands how to organize and structure the data.
    Explaining my need better: I extracted information from sales pages via web scraping, and I would like ChatGPT to organize the collected data based on my SCHEMA structure so that I can save it in the database with the fields I created.
    I wouldn't want to add instructions on how to sort the data in the ChatGPT prompt every time.
    DOUBT:
    The million-dollar question 😊: how to "teach" the schema to ChatGPT only once and be able to validate infinite texts, without having to spend tokens inserting the schema in the prompt and without having to train the model via fine-tuning?

    • @RatafakRatafak
      @RatafakRatafak 11 months ago

      For this kind of question you should try more advanced LLM channels

  • @dallasurban9676
    @dallasurban9676 1 year ago

    So a large language model is simply a specialized transformer model, for words.
    Stable Diffusion, and all the others, are specialized transformer models for images.
    Etc. Right now companies are developing their own specialized transformer models.

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      for large language models yes, they're essentially specialized and very large transformer models
      Stable Diffusion does contain a transformer or two in the pipeline, but the core "component" of it is the diffusion model, which is different. The input to this diffusion model includes embeddings generated by something like CLIP (which contains a text transformer and a vision transformer, ViT)
      Generally yes, transformers are everywhere, with a couple of other models (like diffusers) scattered around the landscape

    • @TLabsLLC-AI-Development
      @TLabsLLC-AI-Development 9 months ago

      Yeah. I count the transformer and diffusion layers to be separate aspects of it but I see what you mean. It's getting so crazy.

  • @did.dynamics8504
    @did.dynamics8504 1 year ago

    It's not DIALOGUE, it's a SERIES of questions.... The AI must converse like you do with a friend.

  • @RyushoYosei
    @RyushoYosei 1 year ago

    And yet ChatGPT needs some of this badly as I have seen it massively forget things that it said literally just one or two comments previously.

  • @uncletan888
    @uncletan888 1 year ago

    ChatGPT-4 charges high fees and people should not support it.

  • @VoyceAtlas
    @VoyceAtlas 1 year ago

    We should have a dedicated AI that summarizes old chats based on what you are talking about now, and then gives back less-recent convos. A bit of both.

    • @jamesbriggs
      @jamesbriggs  1 year ago

      I think this is similar to the summary + buffer window memory?