Thanks James for elaborating on LangChain Memory. For the viewers, here are some 🎯 Key Takeaways for quick navigation:
00:00 🧠 Conversational memory is essential for chatbots and AI agents to respond coherently to queries in a conversation.
01:23 📚 Different memory types, like conversational buffer memory and conversational summary memory, help manage and recall previous interactions in chatbots.
05:42 🔄 Conversational buffer memory stores all past interactions in a chat, while conversational summary memory summarizes these interactions, reducing token usage.
14:13 🪟 Conversational buffer window memory limits the number of recent interactions saved, offering a balance between token usage and remembering recent interactions.
23:05 📊 Conversational summary buffer memory combines summarization and saving recent interactions, providing flexibility in managing conversation history.
We are also doing lots of workshops in this space, looking forward to talking more.
Super! For me, it is one of the best tutorials on this subject. Much appreciated, James.
thanks, credit to Francisco too for the great notebook
Thank you. I was way behind on LangChain and had no time to read the documentation. This video saved me a lot of time. Subscribed.
Another masterpiece of a tutorial. You’re an absolute gem James!
Things really seem to get interesting with the knowledge graph! Saving things that really matter, like relation context, along with a combination of the other methods, starts to sound very powerful. Add in some embedding/vectorDB and wow. The other commenter's idea about a system for bots evolving sentiment, or even personality, over time is worth thinking about as well.
yeah this is fascinating to me, looking forward to working on these
Very powerful!
Any idea or resources on how to add an embedding/vectorDB to this?
I would like this memory chatbot to be able to reference my own data stored in the vectorDB but I can't seem to make it work together.
Either the chatbot has memory OR it references the embeddings; I can't seem to combine the two.
@@Jordy-t8y It's done in video #9
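In the meantime, here is a rough sketch of one way to wire the two together with a ConversationalRetrievalChain (just a sketch, not the video's exact code; exact class names may differ between LangChain versions, and the sample documents are placeholders):

```python
# Sketch: conversational memory + retrieval over your own embedded data.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# build (or load) a vector store containing your own data
vectorstore = FAISS.from_texts(["your documents go here"], OpenAIEmbeddings())

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# memory holds the running chat history that is passed back into the chain each turn
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
)

print(qa({"question": "What does my data say about X?"})["answer"])
print(qa({"question": "And how does that relate to my previous question?"})["answer"])
```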
Great explanation of memory in LangChain; seeing the charts made it much clearer for me.
Cool! This video addressed the question that I had posed in your earlier (1st) video about the token size limitations due to adding conversational history. The charts provide a good intuition of the workings of the memory types. Two takeaways: 1. When to use which memory type. 2. How to do performance tuning for a chatbot app, given the overheads posed by token tracking, memory appending and so on.
If I understand the graphs correctly, what is represented is the tokens used per interaction; in the case of the Buffer Memory (the quasi-linear one), the 25th interaction is about 4k tokens. But the price (in tokens) of the whole conversation up to the 25th interaction is the sum of the prices of all the interactions up to the 25th. So basically the price of the conversation, in each case, is the area under the curves you showed, not the highest point reached. For the summarized conversations, with the flat tendency towards the end, it means the price just keeps adding almost the same number of tokens per new interaction, not that the price of the conversation has reached a cap.
If my math isn't off, that should be 25/2 * 4k = 12.5 * 4k = 50k tokens after 25 interactions. At $0.002 per 1k tokens (on turbo), that is $0.10, or one dime, for that whole conversation.
yeah your logic is correct, the graphs ended up like this because I wanted to show the limit of buffer memory (i.e. hitting the token limit) - we had intended to include cumulative total graphs but I didn't get time, planning on putting together a little notebook to show this in the coming days
token math checks out for me - it adds up quickly
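For anyone who wants to sanity-check it, here is a quick back-of-the-envelope version of that cumulative cost (assuming token usage grows roughly linearly to ~4k tokens by interaction 25, and gpt-3.5-turbo pricing of $0.002 per 1k tokens):

```python
# Rough cumulative-cost estimate for buffer memory: sum the per-interaction token counts
# (the area under the curve), then apply the per-1k-token price.
interactions = 25
max_tokens = 4000
price_per_1k = 0.002

tokens_per_interaction = [max_tokens * (i + 1) / interactions for i in range(interactions)]
total_tokens = sum(tokens_per_interaction)        # ~52k tokens, close to the 50k estimate above
total_cost = total_tokens / 1000 * price_per_1k   # ~$0.10

print(f"total tokens: {total_tokens:.0f}, total cost: ${total_cost:.2f}")
```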
Thanks for your content! looking forward to watching the knowledge graph video :)
Oh wow, you just destroyed my project lol. I gave ChatGPT long-term memory, autonomous memory store and recall, speech recognition, audio output, and self-reflection. Thought I was the only one working on stuff like this. Well, I'm basically trying to build a sentient agent, but I need vision. Hopefully GPT-4 is multimodal, because I'm struggling to give my project vision recognition.
yeah I think you might be in luck for multimodal GPT-4 :) - that's awesome though, I haven't done all of that yet, very cool!
Great work bro! Keep it up! 👍
Thanks for this content James, awesome!
you're welcome
Amazing Content
James - are you still planning to work on the KG video? Seems like a powerful method that solves for scale and token limits.
Check out David Shapiro’s latest approach with salient summarization when you get a chance. Essentially: The summarizer can more efficiently pick and choose which context to preserve if it is properly primed with specific objectives/goals for the information.
fascinating, love Dave's videos they're great!
Skimming through the docs, LangChain seems like a complicated abstraction around what's essentially auto copy and paste.
the simpler stuff yes, but they have some other things like knowledge graph memory + agents that I think are valuable
Great demo James
thanks Tommy I appreciate it!
Hi Sam, how do we keep the conversation context of multiple users on different devices separate?
In the scenario of conversational bots, how do we limit the token consumption of the entire conversation?
For example, once the consumption reaches 1,000, it will prompt that the tokens for this conversation have been used up.
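One possible way to wire that up (a sketch only, using LangChain's `get_openai_callback` for token counting; the 1,000-token budget and the `chat` helper are just illustrations of the idea):

```python
# Sketch: cap the total token spend of a single conversation at a fixed budget.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.callbacks import get_openai_callback

llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm)

budget = 1000     # total tokens allowed for this conversation
tokens_used = 0

def chat(user_input: str) -> str:
    global tokens_used
    if tokens_used >= budget:
        return "The tokens for this conversation have been used up."
    with get_openai_callback() as cb:   # counts prompt + completion tokens for this call
        reply = conversation.predict(input=user_input)
    tokens_used += cb.total_tokens
    return reply
```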
Thank you! Awesome work!! Appreciate it!
thanks!
James, thanks so much!
Just curious, what's the openAI cost to complete this course if you choose the pay as you go plan?
@jamesbriggs why are transformers stateless?
Great video! I love the graphs for token usage. I kept meaning to graph the trends myself, but I was too lazy!
I was talking to Harrison Chase as he was implementing the latest changes to memory, and it's had me thinking about other unique ways to approach it. I've been using different customized summarizers, and I can bring up any subset of the message history as I like, but I'm thinking also to include some way to flag messages as important or unimportant, dynamically feeding the history. I also haven't really explored my options in terms of local storage and retrieval of old chat history.
One note that I might make for the video too... I noticed you're using LangChain's usual OpenAI class and just adjusting your model to 3.5-turbo. My understanding is that we have been advised to use the new ChatOpenAI class for now when interacting with 3.5-turbo, since that's where they'll be focusing development and they can address changes there without breaking other stuff; this is necessary since the new model endpoint takes a message list as a parameter instead of a simple string.
dynamically feeding the memory sounds cool, would you do this explicitly or implicitly?
langchain moves super fast, I haven't seen the new ChatOpenAI class, thanks for pointing this out!
@@jamesbriggs My notion is to create a chat client where the bot controls the conversation, instead of the user, for the purpose of guided educational experiences - like a math lesson performed with the Socratic method, where you want to elicit the solution from the user rather than just provide it to them. I'm imagining I'll need an internal model of the user's cognition and an outline of the lesson, then I would implicitly determine the importance of any interaction or lesson detail by how logically connected it is to both, feeding only the immediately relevant context to the external-facing LLM. I'm really still brainstorming, and I just started a month-long vacation to play with the idea.
What if I want to use it with my own fine-tuned GPT-3.5 model?
How can I keep the conversation context of multiple users separately?
Can you please please please make a video on how to connect MongoDB with LangChain?
How can I use this conversational memory for a custom chatbot along with LangChain?
Do you have a substitute for LangChain?
you are awesome - thanks again!
Hi James, great video. This is probably a stupid comment but here goes… Could you not just ask the LLM to capture some key variables that summarise the completion for the prompt and then feed that (rather than the full conversation) as 'memory' for subsequent prompts? I'm imagining a 'ghost' question being added to each prompt, like 'Also capture key variables to summarise the response for future recall', and then this being used as the assistant message (per GPT-3.5 Turbo) rather than all of the previous conversation?
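For concreteness, that 'ghost question' idea might look roughly like this with the ChatCompletion endpoint (an untested sketch; the prompt wording, the VARIABLES: marker, and the parsing are just placeholders):

```python
# Sketch of the "ghost question" idea: ask the model to emit key variables with each answer,
# then carry only those variables forward as the memory for the next prompt.
import openai

memory_vars = ""   # compact state carried between turns instead of the full conversation

def chat(user_input: str) -> str:
    global memory_vars
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "assistant", "content": f"Key variables so far: {memory_vars}"},
        {"role": "user", "content": user_input + "\n\nAlso output a final line starting with "
            "'VARIABLES:' listing key variables that summarise this response for future recall."},
    ]
    res = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    text = res["choices"][0]["message"]["content"]
    answer, _, vars_line = text.partition("VARIABLES:")
    if vars_line:
        memory_vars = vars_line.strip()   # replace the memory with the latest key variables
    return answer.strip()
```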
How is the model able to judge whether it needs to come to the conclusion: "I don't know."
Love the video! Question about wanting to put this behind a UI, how hard would that process be?
How do I use memory with ChatVectorDBChain, where we can specify vector stores? Could you please give a code snippet for this? Thanks
Great content, thanks for that.
I'm working on a tweet summarization use case, but I don't want to break the overall corpus into pieces, build a summary for each one, and combine those summaries into a larger one. I want something more clever.
Suppose I have 10 tweets. 6 are related (same topics) and the last 4 are different from each other. I think I can build a better summary than the plain LangChain summary by only summarizing the 6 related tweets and adding the 4 raw tweets. This helps avoid losing context for the future.
I'm not sure how exactly to implement this, but possibly (rough sketch below):
1. embed the tweets
2. when looking to summarize, embed the current query and perform semantic search to identify tweets over a particular similarity threshold to return
3. summarize those retrieved tweets
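I haven't tried this end to end, but a minimal sketch of those three steps might look like the following (the embedding model, similarity threshold, and query are just assumptions to illustrate the idea):

```python
# Sketch: summarize only the tweets related to a topic, keep the unrelated ones raw.
import numpy as np
import openai

def embed(texts):
    res = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [np.array(d["embedding"]) for d in res["data"]]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

tweets = ["tweet 1 ...", "tweet 2 ..."]   # your 10 tweets here
query = "the shared topic you want the summary to focus on"

tweet_vecs = embed(tweets)
query_vec = embed([query])[0]

threshold = 0.8   # illustrative; tune for your data
related = [t for t, v in zip(tweets, tweet_vecs) if cosine(query_vec, v) >= threshold]
others = [t for t, v in zip(tweets, tweet_vecs) if cosine(query_vec, v) < threshold]

# summarize only the related tweets
summary = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize these tweets:\n" + "\n".join(related)}],
)["choices"][0]["message"]["content"]

final_context = [summary] + others   # summary of related tweets + the raw unrelated ones
```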
These lectures are really helpful, thanks a lot!
Is there a way to use Conversational Memory along with VectorDBQA (generative question answering on a database)?
I swear you have the coolest shirts!
Make a drip video too! would watch !
Thanks Billy! A drip video??
How would I be able to use this with a Pinecone vector DB for context?
Helpful! Thanks
Does anyone know the difference between the run vs predict method? Cause they seem the same to me.
If there is a difference, which one is better?
thank you great topic
glad you liked it!
Can someone let me know where I can get an off-the-shelf LLM with long-term memory? I need it to be able to remember things I tell it, remember where I put stuff, etc. I don't mind paying for it.
Hey James, can you share the Collab notebook for this?
Yes it’s the chat notebook here github.com/pinecone-io/examples/tree/master/generation/langchain/handbook
Hi, great content, but the gpt-3.5 model already has its own conversation memory, so you can use that instead of davinci. It is also 10 times cheaper 😊
thanks for sharing, gpt-3.5-turbo is great! We do demo it in this video during the first example even :)
- the reason I share this tutorial anyway is because gpt-3.5-turbo is (using the direct openai api) restricted to the equivalent of `ConversationBufferMemory`, it doesn't do the summary, window, or summary + window memory types
We didn't really cover it here but there's also the knowledge graph memory, we'll cover that in the future
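For reference, a rough sketch of what the summary + buffer combination looks like with the turbo model in LangChain (a sketch only; class names may have moved between LangChain versions, and the 650-token limit is just an example value):

```python
# Sketch: gpt-3.5-turbo with LangChain's summary + buffer memory.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# keep the most recent ~650 tokens verbatim and summarize everything older
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)

conversation = ConversationChain(llm=llm, memory=memory)
conversation.predict(input="Hi, I'm building a chatbot that has long conversations.")
conversation.predict(input="Remind me what I said I was building?")
```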
@@jamesbriggs I see, so even if we want to use the turbo model because it is cheaper than davinci, we would still want to explore one of these LangChain memory types?
@@jamesbriggs graph memory looks really interesting, would love to see it utilized with turbo or the ChatGPT API. Also wondering if/when OpenAI will start caching tokens for users on their end, meaning you would only pay for new data added to the conversation.
Hello James, this method would not work for chat models anymore, right? The code would have to be adjusted to work for the new chat models from langchain. Could you make a new video to cover that?
it works for normal LLMs, not for chatbot-only models - but yes I'll be doing another video on this
@@jamesbriggs awesome! Thank you so much for all the work you put in. You got me back to coding :)
Make a video on using this kind of long-term memory based chat for semantic search on local files like .txt please
planning to do it soon!
I know that OpenAI’s text embeddings measure the relatedness of text.
I am new to this field, so this question is probably trivial for some of you. Anyway, I was wondering if it is possible to use this technique with source code.
I was trying to figure out a way to analyse source code, but due to the token limitation, one way to save prior knowledge could have been this.
For example, if I have a list of source files, I can search for similarities within the list.
Any advice? Is it possible or I am just blathering on?
interesting question, I'm not sure as I haven't seen this done before, but generally speaking these language models are just as good (if not better) at generating good code as good natural language, so I'd imagine generating embeddings for code *might* work
For dealing with token limits, you can try comparing chunks of code, rather than the full code - if your use-case allows for that
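This is untested, but the chunk-level comparison could look roughly like this (the chunking strategy and file names are purely illustrative):

```python
# Sketch: embed chunks of source code and compare them by cosine similarity.
import numpy as np
import openai
from sklearn.metrics.pairwise import cosine_similarity

def chunk_code(source: str, max_chars: int = 1500):
    # naive chunking by character count; splitting on function/class boundaries would be smarter
    return [source[i:i + max_chars] for i in range(0, len(source), max_chars)]

def embed(texts):
    res = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in res["data"]])

code_a = open("module_a.py").read()   # illustrative file names
code_b = open("module_b.py").read()

vecs_a = embed(chunk_code(code_a))
vecs_b = embed(chunk_code(code_b))

# similarity between every chunk of file A and every chunk of file B
sims = cosine_similarity(vecs_a, vecs_b)
print(f"most similar chunk pair: {sims.max():.3f}")
```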
No example???
Make James Famous ....
He already is
Hello, this was interesting. I am currently developing a chatbot with LlamaIndex, with model_name="text-ada-001" or "text-davinci-003". So, based on thousands of documents (external data), the user will ask questions, and the chatbot must respond. When I tried it with just one document, the model performed well, but when I added another, the performance dropped. Could you please advise on a possible solution to this? Thank you in advance.
My documents are in PDF form.
The big problem is that so far I haven't found a solution that doesn't need the entire schema inserted into the prompt itself so that ChatGPT understands how to organize and structure the data.
To explain my need better: I extracted information from sales pages via web scraping, and I would like ChatGPT to organize the collected data based on my SCHEMA structure so that I can save it in the database with the fields I created.
I wouldn't want to add instructions on how to sort the data in the ChatGPT prompt every time.
The million-dollar question 😊: how do I "teach" the schema to ChatGPT only once and then validate infinite texts, without having to spend tokens inserting the schema into every prompt and without having to train the model via fine-tuning?
For this kind of question you should try more advanced LLM channels
So large language models are simply specialized transformer models, for words.
Stable Diffusion and all the others are specialized transformer models, for images.
Etc. Right now companies are developing their own specialized transformer models.
for large language models yes, they're essentially specialized and very large transformer models
Stable diffusion does contain a transformer or two in the pipeline, but the core "component" of it is the diffusion model, which is different. But the input to this diffusion model includes embeddings which are generated by something like CLIP (which contains a text transformer and vision transformer, ViT)
Generally yes, transformers are everywhere, with a couple of other models (like diffusers) scattered around the landscape
Yeah. I count the transformer and diffusion layers to be separate aspects of it but I see what you mean. It's getting so crazy.
It's not DIALOGUE, it's a SERIES of questions... the AI must dialogue like you do with a friend.
And yet ChatGPT needs some of this badly as I have seen it massively forget things that it said literally just one or two comments previously.
ChatGPT-4 charges high fees and people should not support it.
We should have a dedicated AI that summarizes old chats based on what you are talking about now and then gives back the less recent convos. A bit of both.
I think this is similar to the summary + buffer window memory?