RAG in Production - LangChain & FastAPI

  • Published 29 Jan 2025

COMMENTS • 75

  • @Challseus
    @Challseus 11 months ago +4

    Thankful for channels like this that go above and beyond the standard tutorials 💪🏾

  • @saurabhjain507
    @saurabhjain507 11 months ago +6

    Another helpful video! Please create more videos on LangChain in production.

  • @delgrave4786
    @delgrave4786 3 months ago

    So I have a doubt about the digest generation. The code generates a digest for every document uploaded to it, right? The advantage is that the same document will generate the same digest, which can be checked against existing digests and excluded. Wouldn't a better way be to generate a digest for each page of a PDF, in case a new PDF is uploaded with only a single page changed?
    Currently the code does not handle cases like this, right? Unless I've missed something, it only generates the digest as part of the metadata and stores it without checking anything.

    • @codingcrashcourses8533
      @codingcrashcourses8533 3 months ago

      I generally agree. That is a different approach, since you need to use some kind of observer pattern, which is not so easy: you have to rely on your data provider to offer that.
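
The per-page digest idea discussed above can be sketched with the standard library alone (a minimal sketch; the `generate_digest` name echoes the video's helper, but this dedup logic is an illustrative assumption, not the repo's actual code):

```python
import hashlib

def generate_digest(text: str) -> str:
    # Stable content hash: identical text always yields the same digest.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def pages_to_ingest(pages: list[str], known_digests: set[str]) -> list[str]:
    # Return only the pages whose digest has not been stored yet.
    fresh = []
    for page in pages:
        digest = generate_digest(page)
        if digest not in known_digests:
            known_digests.add(digest)
            fresh.append(page)
    return fresh

# Re-uploading a PDF with a single changed page re-embeds only that page.
known: set[str] = set()
first_upload = pages_to_ingest(["page one text", "page two text"], known)
second_upload = pages_to_ingest(["page one text", "page two CHANGED"], known)
```

With per-document digests, the second upload would look entirely new; hashing per page keeps the unchanged page out of the embedding pipeline.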

  • @nicolascr181
    @nicolascr181 2 months ago

    Hello, I don't understand how to upload the document, since the endpoint receives only JSON. Thanks for your content!

    • @codingcrashcourses8533
      @codingcrashcourses8533 2 months ago

      Document is just a class.
      It looks like this:
      Document(content="xxx", metadata={})
      You can serialize this to:
      {
      "content": "xxx",
      "metadata": {}
      }
      then you have the JSON format.
      Code looks like this (dummy implementation):

      class Document:
          def __init__(self, content, metadata):
              self.content = content
              self.metadata = metadata

          def to_dict(self):
              return {
                  "content": self.content,
                  "metadata": self.metadata
              }

      Does this help? :)
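
The round-trip behind that reply can be shown end to end with the standard library (a sketch; this toy `Document` and its `from_dict` helper are illustrative, not LangChain's class):

```python
import json

class Document:
    # Same toy class as in the reply above, plus from_dict for the round-trip.
    def __init__(self, content, metadata):
        self.content = content
        self.metadata = metadata

    def to_dict(self):
        return {"content": self.content, "metadata": self.metadata}

    @classmethod
    def from_dict(cls, data):
        return cls(data["content"], data.get("metadata", {}))

doc = Document("xxx", {"source": "example.pdf"})
payload = json.dumps(doc.to_dict())                  # JSON string the client POSTs
restored = Document.from_dict(json.loads(payload))   # object the endpoint rebuilds
```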

  • @omaralhory8065
    @omaralhory8065 11 months ago +1

    Hi,
    I am following your codebase, and I really like it.
    I am still unsure why we need to update the data via an API when we could have an ETEL (Extract, Transform, Embed, Load) data pipeline that runs on a schedule whenever new data comes in.
    Why do we give such access to the client, and why does the API allow deleting records?
    What would you do differently here? Would you develop a CMS to maintain the relationship between the client and the DB?

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago +1

      You could also do it that way, but in this repo I don't have a pipeline or anything. There is more than one way to do it :-). I currently have no good solution for updating data without the additional API layer.

    • @omaralhory8065
      @omaralhory8065 11 months ago

      @@codingcrashcourses8533 Thank you for being responsive! Your channel is a gem, btw.
      Usually RAG data sources aren't predictable; a data lake (Delta Lake by Databricks) can be quite beneficial here. You can use PySpark for the pipeline, and it works great when connected to Airflow, for example, for scheduling.

    • @awakenwithoutcoffee
      @awakenwithoutcoffee 6 months ago

      Did you find a solution?

  • @mcdaddy42069
    @mcdaddy42069 10 months ago +1

    Why do you have to put your vector store in a Docker container?

    • @codingcrashcourses8533
      @codingcrashcourses8533 10 months ago +1

      Containers are just the way to go. You don't have to, but it makes everything so much easier.

  • @zendr0
    @zendr0 11 months ago +1

    Have you thought about a caching implementation in RAG-based systems? Curious.

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      Yes, but currently it is one of the lower-priority topics, since when you use the whole conversation history, the number of times you can use the cache is not very high. Have you worked with a cache before?

    • @zendr0
      @zendr0 11 months ago

      I have used an in-memory cache. Could we do something like this: use the cache to store embeddings, then compute cosine similarity between the new input query embedding and the ones in the cache. If the score is above a threshold, it is fairly obvious the query has been asked before, so we just answer from the cache. What do you think?
      @@codingcrashcourses8533

  • @DePhpBug
    @DePhpBug 8 months ago

    Still new to all the concepts here. I saw the video about having an API on top of the model's API; is this correct? It's about having an abstraction layer on top of the model.
    Am I correct to say my model needs to sit in, let's say, server A, and then I need to create the API on server B to connect to A?

    • @codingcrashcourses8533
      @codingcrashcourses8533 8 months ago +1

      Exactly. Adding one layer is normally crucial; adding more can, but need not, make sense for your use case.

    • @DePhpBug
      @DePhpBug 8 months ago

      @@codingcrashcourses8533 Thanks!

  • @navaneeth44
    @navaneeth44 3 months ago

    Great content! But I have a question: does the POST method accept large files as input?

    • @codingcrashcourses8533
      @codingcrashcourses8533 3 months ago

      @@navaneeth44 No, I used application/json. But FastAPI also has classes which allow you to accept files. What do you mean by large? The default is not large, a few MB.

  • @daniel_avila
    @daniel_avila 10 months ago

    Hi, thanks for this! I have a question about the digest specifically.
    I understand it would be a great way to compare page_content for changes, but I'm not sure where to do this programmatically, or where to inspect whether it is happening already. As far as I know, it is not, and more on this would be helpful to someone new to pgvector.
    Following how documents are added, it seems embeddings are created regardless.

    • @codingcrashcourses8533
      @codingcrashcourses8533 10 months ago

      There is the indexing API to do this. Or do you mean visually, like a git diff?

    • @daniel_avila
      @daniel_avila 10 months ago

      @@codingcrashcourses8533 I was unaware this would involve the indexing API, but that makes sense. However, there's no official async pgvector implementation for the indexing manager: langchain-ai/langchain, issue #14836

  • @alchemication
    @alchemication 11 months ago

    Very nice. Did you consider LangServe before trying an in-house solution? Just curious.

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      LangServe is more about prototyping, in my opinion :)

    • @alchemication
      @alchemication 11 months ago +1

      Interesting take on it. I think they promote it as a production-ready API, but as usual, without actually trying it for real you won't know 😅 Best!

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago +1

      @@alchemication Well, I am quite good with FastAPI and have used it for a very long time, so in general I would prefer not to add an abstraction layer on top of it. My first glance at it was: "OK, that's quick, but robust code is something different."

  • @Pure_Science_and_Technology
    @Pure_Science_and_Technology 11 months ago

    When processing a file for RAG, I save its name, metadata, and a unique ID in a structured database. This unique ID is also assigned to each chunk in the vector database. If a file needs updating or deleting, the unique ID in the database is used to modify or remove the corresponding entries in the vector database.
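
The bookkeeping described in that comment can be sketched with two in-memory stores standing in for the structured database and the vector database (the `FileRegistry` name and dict-backed stores are illustrative assumptions, not the commenter's actual stack):

```python
import uuid

class FileRegistry:
    """Structured store: file name -> unique ID -> chunk IDs in the vector DB."""

    def __init__(self):
        self.files: dict[str, dict] = {}      # file_id -> {"name": ..., "chunks": [...]}
        self.vector_db: dict[str, dict] = {}  # chunk_id -> {"file_id": ..., "text": ...}

    def ingest(self, name: str, chunks: list[str]) -> str:
        file_id = str(uuid.uuid4())
        chunk_ids = []
        for chunk in chunks:
            chunk_id = str(uuid.uuid4())
            # Each chunk entry carries the file's unique ID, as in the comment.
            self.vector_db[chunk_id] = {"file_id": file_id, "text": chunk}
            chunk_ids.append(chunk_id)
        self.files[file_id] = {"name": name, "chunks": chunk_ids}
        return file_id

    def delete(self, file_id: str) -> None:
        # Remove every vector-DB entry that belongs to this file.
        for chunk_id in self.files.pop(file_id)["chunks"]:
            del self.vector_db[chunk_id]

reg = FileRegistry()
fid = reg.ingest("catalog.pdf", ["chunk a", "chunk b"])
reg.delete(fid)   # both chunks disappear from the vector store
```

Updating a file then becomes delete-plus-reingest under the same bookkeeping, which is why the author calls this pattern robust.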

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      Yes, very robust solution :)

    • @RobertoFabrizi
      @RobertoFabrizi 11 months ago

      Just to check that I understood you right: let's assume you have a file (product catalog, functional software specification, your pick) that is a doc with 100 pages. You use a document loader to load it, split it with a recursive character text splitter (chunk size 1000, overlap 100), then embed those chunks and store them in a vector DB, saving thousands of chunks all created from that one file. Then a single line near the start of the file changes, and that has repercussions for all later chunks: even though they are technically the same data, they are partitioned differently than before (assuming the change shifted the chunk boundaries, e.g. the modified row is longer than before). How do you efficiently update your vector DB in this scenario? Thank you!

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago +2

      @@RobertoFabrizi You won't just read a whole catalog into memory at once. You should keep each page separate as raw data, then split each page into smaller chunks. I would even argue against a fixed chunk size, but that is something I will cover in my next (small) video.

  • @pvrajanrk
    @pvrajanrk 10 months ago

    Great video.
    Can you share your thoughts on state management for maintaining the chat window across different chat sessions? This is another area I see as a gap for LangChain in production.

    • @codingcrashcourses8533
      @codingcrashcourses8533 10 months ago

      I did another video about this and cover it in my Udemy course. The answer for me is Redis, where you set key-value pairs: the key is the conversation ID and the value is the stringified conversation.
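
The Redis pattern in that reply can be sketched with a plain dict standing in for the key-value store (the dict is only a stand-in; with real Redis you would call `set`/`get` on a client object instead):

```python
import json

store: dict[str, str] = {}   # stand-in for Redis: key -> stringified value

def save_conversation(conversation_id: str, messages: list[dict]) -> None:
    # Key is the conversation ID, value is the stringified conversation.
    store[conversation_id] = json.dumps(messages)

def load_conversation(conversation_id: str) -> list[dict]:
    raw = store.get(conversation_id)
    return json.loads(raw) if raw else []

save_conversation("conv-42", [{"role": "user", "content": "Hi"},
                              {"role": "assistant", "content": "Hello!"}])
history = load_conversation("conv-42")
```

Because the FastAPI endpoints stay stateless, any worker can load the history by conversation ID before calling the chain and save it back afterwards.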

  • @yazanrisheh5127
    @yazanrisheh5127 11 months ago

    Can you show us how to implement memory with LCEL and, if possible, caching responses? Thanks!

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      The memory classes from LangChain are not a good way to work in production; they are just for prototyping. In real-world apps you probably want to handle all of that in Redis.

  • @say.xy_
    @say.xy_ 11 months ago +1

    Best best best!!!

  • @Pure_Science_and_Technology
    @Pure_Science_and_Technology 11 months ago

    Will Gemini 1.5 and beyond kill RAG?

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      I highly doubt it with Gemini 1.5, but beyond that, hopefully. Currently answers are still bad when your context is larger than 20 documents or so.

    • @xiscosmite6844
      @xiscosmite6844 11 months ago

      @@codingcrashcourses8533 Curious why you think answers are bad beyond that size and how Gemini could solve that in the future. Thanks for the great video!

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago

      @@xiscosmite6844 I don't trust Gemini after I tried it myself :)

  • @swiftmindai
    @swiftmindai 11 months ago

    As always, excellent content. I learned about the LangChain indexing API (SqlRecordManager) from your previous content; now I've learned about using a hashing function (generate_digest). I believe both serve the same purpose. I'm wondering which one would be better, since I don't see a way to measure the performance of either methodology. I'd appreciate your suggestion.

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago +1

      Thank you! I think it's just important to understand WHY LangChain introduces something like that and to learn about the limitations. I found the indexing API hard to use when there is a large number of documents.

    • @swiftmindai
      @swiftmindai 11 months ago +1

      It took me literally a few days to understand and implement the indexing API concept. I even had to switch to PGVector from the vector store provider I was using earlier, since the indexing API is only applicable to SQL-based vector stores. But now I love PGVector more than any other. Thanks a lot for your production implementation video; I literally use it as the basis of my latest project.

  • @entrepreneurialyt
    @entrepreneurialyt 11 months ago

    Thank you for the videos! Can you please make a video about tools that can be used for both performance measurement and accuracy tracking? Basically, how to build a test environment for a bot before releasing it to production.

    • @codingcrashcourses8533
      @codingcrashcourses8533 11 months ago +1

      RAG performance? Performance of the service with a load test? What would interest you?

    • @entrepreneurialyt
      @entrepreneurialyt 11 months ago

      @@codingcrashcourses8533 Performance of the service with a load test would be super cool!

    • @entrepreneurialyt
      @entrepreneurialyt 11 months ago

      @@codingcrashcourses8533 RAG performance will be cool!

    • @entrepreneurialyt
      @entrepreneurialyt 11 months ago

      @@codingcrashcourses8533 RAG performance will be great!

    • @entrepreneurialyt
      @entrepreneurialyt 11 months ago

      @@codingcrashcourses8533 Attempting to respond to your question for the 101st time: please make a video about RAG performance (please YouTube, don't ban my reply). I am developing a bot that answers questions based on the transcript of a video lecture and other course materials, to speed up the learning process. If I'm not mistaken, the first one will be more relevant?
      How can I decide that my bot is ready for production? Thank you :)

  • @dswithanand
    @dswithanand 9 months ago

    How do I integrate LangChain chat memory history with FastAPI?

    • @codingcrashcourses8533
      @codingcrashcourses8533 9 months ago

      You don't. You normally want your API to be stateless.

    • @dswithanand
      @dswithanand 9 months ago

      @@codingcrashcourses8533 I understand that. I am working on an SQL bot and using FastAPI along with it, but the bot is not able to retrieve the context memory. Can you help with that? LangChain has a ChatMemoryHistory library which can be used for this.

  • @sskohli79
    @sskohli79 11 months ago +1

    Great video, thanks! Can you please also add a requirements.txt to your repo?

  • @picklenickil
    @picklenickil 9 months ago

    Just came to comment that maintaining a backend for this will be hard!

    • @codingcrashcourses8533
      @codingcrashcourses8533 9 months ago +2

      What exactly do you mean?

    • @YerkoMuñoz-q7u
      @YerkoMuñoz-q7u 1 month ago

      Isn't it quite the opposite? Having a backend like this allows you to have a maintainable infrastructure.

  • @no-code-po-polsku
    @no-code-po-polsku 11 months ago

    Blur your OpenAI API key from your .env