I always see these comments under basically every tutorial video, even the ones that give no really useful info, so any praise under any educational video is kind of devalued to me, which I suppose might be the case for the creator themselves too.
I've never written such a comment, partly for that reason, and MAN, I just want you to know that this video is the most appreciated of the ones about LangChain to me. I literally can't begin to describe how useful it is for me; I've been searching for it for a really long time. You deserve the happiest life a human can imagine, and I just hope some day I'll run into you to buy you a beer or two. Thank you.
Thank you so much for that comment. Really makes my day to read something like this :)
You did a containerized, modular deployment of a LangChain implementation. Very nice! Man here made me smash that 'like'. He made me.
Excellent! Production readiness score: 8/10
This is absolute gold! You are helping me make my dream project right now, thanks so much!
Thank you for that nice comment :)
You are the best my man! Such a great video
Very well done. I'm glad I found this channel, at least now.
This is very cool! Would be keen to see more of this!
Great work! Finally a real-world app on YouTube
thank you very much :-)
Nice work!!! There are a lot of notebooks but not so much information for the real world. It's really valuable!!!
Thank you. Will soon release a video on deployment on Azure
This makes sense for very large projects; a monolith is 100 percent fine for most use cases. Microservices are for an app that is used constantly by many users and makes a lot of money. Also, why would a restaurant pay monthly for so many containers? For a chain it is fine, but for a small business this is very impractical. Following best practices for their own sake, when they provide no benefit and make no sense from a business perspective, is not a good idea.
BTW the design is good and the tutorial was presented very well; just don't tell people to use microservices for a restaurant app.
Yes, this implementation is for a restaurant chain, of course. That's why I made that statement about hybrid search.
Excellent work, it's great to see another approach more oriented to a real application, I wonder if you could make a video of the deployment in web services like AWS and GCP, there are many tools and I don't know which one to choose in these cases, it would help a lot to see how you execute it. Greetings from Colombia.
I will release a project on Azure soon :)
@codingcrashcourses8533 Did you release it already?
@codingcrashcourses8533 Did you also create a video about this? :)
@joepsterk5176 I did a full Udemy course and released a small video
How did you build and connect the Postgres DB?
Great job!! Thank you very much!
Thanks a lot! Quick question - where did you install pydantic, langchain, and the other libraries that are not in the Dockerfile?
I installed them directly with pip install inside Docker. Most dependencies have sub-dependencies which then get installed on the fly
@codingcrashcourses8533 There are multiple Docker containers; in which container do you install this package? I run the Python file and get an error about writing to the database
Subscribed to your channel, waiting for more full project-based tutorials on AI, LangChain, Python, data science. Keep up the good work ❤👍❤
Could you make an updated version with LCEL, especially how to implement memory?
How do your pgvector and Redis scale? You can't have multiple Docker containers of these services.
A scalable cloud service for Redis and pgvector, which can handle huge load, might be more desirable
You can, at least for Redis. Redis has Redis Sentinel, a cluster solution with master and slave agents. For pgvector I am not sure how much load it can handle, but Postgres is a well-established DB which can normally handle a lot of traffic with a single instance.
Is there any risk of PII leakage in this implementation if I use another Hugging Face LLM instead of OpenAI?
Any comments on why you need to separate the conversation ID management into a separate service from the service making the call to OpenAI?
Good question! Well it depends what you want to do. In the company I work for we store the conversations from multiple channels, Audio and text. So we use this architecture to store all conversations in a single place
@codingcrashcourses8533 Thank you
Thank you, man, SO useful! Regarding scalability: how scalable is this? Should it be able to support an arbitrary number of users? I am trying to build something very similar, but I am concerned about how requests to the chatbot will be handled when multiple users make requests at once. Generating the LLM response takes some time, so how is it that one user does not have to wait for another user's response?
Well, you normally scale horizontally and spin up multiple containers if the load increases. But to be honest: our bot in production works with a single pod/container. How many users do you assume will use your bot concurrently?
@codingcrashcourses8533 I was thinking say 100-1000 users, for an organization's internal use. How many users does your bot support in production? My concern is with concurrent requests to the LLM. I don't see how, if multiple requests arrive at the same time, the LLM will be able to respond to each independently rather than serially, causing the last request to take very long to be answered.
@felipenavarro3004 How many requests do you assume will happen concurrently? That's the only critical variable in this equation ;-)
@codingcrashcourses8533 Hmm, true... A couple hundred at peak time, I would imagine. One last question: I was thinking of doing this but using a function calling agent from LlamaIndex instead of just a bare LLM. I know you recommend not doing this at the beginning of the video, but is it because it won't work or because it makes the process suboptimal? Thank you for your replies, mate, they have been very helpful
@felipenavarro3004 We could easily handle 400 concurrent requests with a single pod. Make sure you load test your app before bringing it into production, and make sure your code is non-blocking. Regarding the framework: do whatever you prefer and whatever makes it easy to build such an application.
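The non-blocking point above is what lets a single container serve many concurrent users: while one LLM call is waiting on the network, the event loop handles the other requests. A minimal sketch with plain asyncio (the `fake_llm_call` name and the 0.1 s latency are made up for illustration; an async FastAPI endpoint would follow the same pattern):

```python
import asyncio
import time

async def fake_llm_call(user_id: int) -> str:
    # Stand-in for a network call to the LLM; the sleep simulates
    # generation latency without blocking the event loop.
    await asyncio.sleep(0.1)
    return f"answer for user {user_id}"

async def handle_many(n: int) -> list[str]:
    # All n requests are awaited concurrently, so total wall time stays
    # close to one call's latency instead of n sequential calls.
    return await asyncio.gather(*(fake_llm_call(i) for i in range(n)))

start = time.perf_counter()
answers = asyncio.run(handle_many(10))
elapsed = time.perf_counter() - start
```

With blocking calls the ten requests would take roughly ten times as long; here `elapsed` stays close to a single call.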
What if I want to use a model like Llama 2? How can I swap the OpenAI API for Llama 2?
My team is recommending that we deploy our embeddings along with our code inside the Azure Function App. I understand that using a vector DB would be a better approach. Can you tell me the limitations of keeping the embeddings inside the Function App instead of a separate vector store, so that I can make them understand?
How do you retrieve the embeddings and related documents without a vectorstore? You can do it on your own, but why?
Thank you sir! Could you possibly do a similar one with current 2024 context?
This video aged well in my opinion, the architecture can be used 1:1 still today :)
How can you make the AI call external functions and get input for those functions from the user, such as when the AI asks for an order ID?
I would try to achieve that with prompting and the use of OpenAI function calling. What might you want to know? Delivery status of a package?
@codingcrashcourses8533 Yes, that would be the perfect example. I actually run an e-commerce store, and I am looking to implement a chatbot to assist users with their requests more effectively. Regular customers often inquire about the estimated delivery time for their orders. To assist them, our chatbot will require their order ID to call internal functions that get the delivery time.
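For the delivery-time example above, the shape of OpenAI function calling can be sketched without the API itself: you register a JSON schema for your function, the model replies with a function name plus JSON-encoded arguments, and your code parses and dispatches them. All names here (`get_delivery_time`, the schema contents) are hypothetical, and the actual chat-completion call is omitted; only the local dispatch side is shown.

```python
import json

# Hypothetical schema you would register with the chat completions API
# so the model can request an order lookup.
GET_DELIVERY_TIME_SCHEMA = {
    "name": "get_delivery_time",
    "description": "Look up the estimated delivery time for an order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "The customer's order ID."}
        },
        "required": ["order_id"],
    },
}

def get_delivery_time(order_id: str) -> str:
    # Placeholder for the internal lookup; a real implementation would
    # query your order system here.
    return f"Order {order_id} arrives in 2 days"

def dispatch(tool_call: dict) -> str:
    # The model returns the function name and a JSON string of arguments;
    # parse the arguments and route them to your own code.
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "get_delivery_time":
        return get_delivery_time(args["order_id"])
    raise ValueError(f"unknown tool: {tool_call['name']}")

# Simulated model response asking to call the tool:
result = dispatch({"name": "get_delivery_time", "arguments": '{"order_id": "A-123"}'})
```

If the user hasn't provided an order ID yet, the model simply answers in text asking for it instead of emitting a function call; that behavior comes from the prompt, not from code.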
Have you worked with LangChain's agents at all in production? I've been trying to spin up a ReAct agent with access to tools for querying a Postgres DB and Pinecone vector store, but the implementation has been sort of lackluster. I'm using their ConversationalBuffer memory populated from a DynamoDB message history, but the agent really struggles with follow-up questions and using context from the conversation to form queries.
I'm wondering if it's better to just unroll everything down to a simpler LLM chain and forgo the Memory class like you've done here.
Follow-up questions are worth a video by themselves. I've got no solution I am totally happy with. In my company we tried going back in the history n-1 and adding that again, since similarity search won't give you back that data.
In general I am not a big fan of agents; they are just unreliable. I would probably use OpenAI function calling for querying a SQL database. My agents were too dumb to write good SQL. But try it yourself, I am also just a random dude trying out stuff
Amazing 👍👍👍
Nice tutorials. May I know the VS Code theme used?
Material theme
Amazing! For some reason, though, I'm having trouble getting this to work with the LangChain (Python) Functions Agent. Does it need to be done differently? By chance do you have any examples?
What did you change? If you replace the LLMChain with your agent it should work as well
@codingcrashcourses8533 Hmmm, then I think my frontend setup is now broken. Does the above work for streaming? My frontend is SvelteKit, which is where my error might be..
Well done!
Nice video. I can echo the sentiment of others: most videos miss out on important things or basics for an inexperienced beginner. Do you think you could add a video on actual production (live production)? Second thing, just feedback: it would be great if the data flow and handshake between services were shown with numbers and different colors (the flow, basically).
I have a full Udemy course on that :)
@codingcrashcourses8533 I'll have to look for that. Have you also experimented with Claude (from Anthropic)?
Can I substitute Redis with Chroma or FAISS?
No, those are vector databases, which would substitute pgvector. For storing conversations you could use any other document DB like MongoDB, but Redis is great since it's so fast and easy to use.
@@codingcrashcourses8533 Ah okay thanks!
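The Redis-as-conversation-store idea above can be sketched with an in-memory dict standing in for Redis. A real client would `set`/`get` the same keys with the same JSON payloads; the `conversation:` key prefix and function names are assumptions for illustration.

```python
import json

# In-memory stand-in for Redis: one key per session, value is the
# JSON-serialized message list. With redis-py the same pattern would be
# r.set(key, json.dumps(history)) and json.loads(r.get(key)).
store: dict[str, str] = {}

def append_message(session_id: str, role: str, content: str) -> None:
    key = f"conversation:{session_id}"
    history = json.loads(store.get(key, "[]"))
    history.append({"role": role, "content": content})
    store[key] = json.dumps(history)

def load_history(session_id: str) -> list[dict]:
    # Unknown sessions simply start with an empty history.
    return json.loads(store.get(f"conversation:{session_id}", "[]"))

append_message("abc", "user", "What are your opening hours?")
append_message("abc", "assistant", "We open at 11 am.")
```

Because each session lives under its own key, horizontally scaled app containers can all share one Redis instance and still see the same conversation.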
Is there any particular reason why you use pgvector over FAISS? I saw benchmarks where people proved that FAISS is much faster
Yeah, but does it really matter if your app is 1 ms faster or slower? I prefer being able to use other tables for different stuff and use SQLAlchemy to build an API layer around my database. FAISS, for me, is some kind of big black box
I get an error message "exec /wait-for-postgres.sh: no such file or directory" when I build using docker-compose up. Do you have any idea how I can fix this?
I guess that you used Windows formatting for your file. Change it to Unix. You can do that in most editors or with dos2unix (cmd). Just make sure it is formatted as LF, not CRLF
@@codingcrashcourses8533 Thanks. It resolved the issue.
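If dos2unix isn't available, a few lines of Python do the same CRLF-to-LF conversion. Reading and writing in binary mode keeps Python from translating newlines itself; the demo file name below is arbitrary.

```python
import os
import tempfile

def to_unix(path: str) -> None:
    # Rewrite the file with Windows CRLF line endings collapsed to
    # Unix LF, which is what dos2unix does.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))

# Demo on a throwaway copy of a shell script:
demo = os.path.join(tempfile.mkdtemp(), "wait-for-postgres.sh")
with open(demo, "wb") as f:
    f.write(b"#!/bin/sh\r\necho ready\r\n")
to_unix(demo)
with open(demo, "rb") as f:
    fixed = f.read()
```

CRLF in a script's shebang line is exactly what produces the "no such file or directory" error, since the kernel looks for an interpreter whose name ends in a carriage return.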
How do I replace old embeddings stored in pgvector with new ones? I have new embeddings stored in pgvector, but the chatbot still queries the old embeddings..
Outstanding work! This is one of the best tutorials I've found on how to build real-world LLM apps. I think it's really cool how you use Redis for preserving conversation history. However, I'm wondering if there might be issues with exceeding the OpenAI token limit? Do you have any ideas to overcome this issue? Keep up the good work!
VERY good point. You could handle that by cutting off the conversation if it grows longer than 10 interactions (or any other useful number). It would be worth looking at how ChatGPT handles that
Thanks for your response! I completely agree, keeping the recent X number of interactions might be sufficient for smaller projects. However, for more complex, production-ready apps, I believe using something like ConversationSummaryBufferMemory from Langchain might be a more robust solution. It would be really fascinating to see how this functionality could be incorporated into ConversationService. Looking forward to future updates with this in mind.
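The simple cutoff strategy discussed above is only a few lines. This sketch assumes one interaction means a user/assistant message pair, so the last N interactions are the last 2N messages; the function name is made up.

```python
def trim_history(messages: list[dict], max_interactions: int = 10) -> list[dict]:
    # Keep only the most recent user/assistant pairs so the prompt
    # stays under the model's token limit. One interaction = 2 messages.
    return messages[-2 * max_interactions:]

# 30 alternating messages; trimming keeps the newest 20.
history = [{"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
           for i in range(30)]
trimmed = trim_history(history, max_interactions=10)
```

A summary-based approach (as suggested above with something like ConversationSummaryBufferMemory) would instead compress the dropped messages into a running summary rather than discarding them.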
Once again, excellent work with this super-detailed tutorial. If you were to create a similar system for several restaurants, what would the architecture be like? Or would you simply create an instance for each restaurant?
Thank you. I would probably use a single vectorstore and save the restaurant name or ID as metadata, and use that info to filter before the vector search. Each website would then send that info in a cookie or the request header
@@codingcrashcourses8533 Thanks for the quick reply, excellent suggestion. Do you have any means of communication where we can exchange ideas other than youtube?
@Almopt Currently I don't have any other channels :)
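The metadata-filtering idea above can be sketched with toy 2-D embeddings and brute-force cosine similarity; pgvector would do the same thing with a WHERE clause on a metadata column before the vector distance ranking. All names, texts, and vectors here are made up.

```python
import math

# Toy documents tagged with a restaurant_id, as suggested above.
docs = [
    {"text": "vegan menu", "embedding": [1.0, 0.0], "restaurant_id": "r1"},
    {"text": "wine list",  "embedding": [0.0, 1.0], "restaurant_id": "r1"},
    {"text": "kids menu",  "embedding": [1.0, 0.0], "restaurant_id": "r2"},
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list[float], restaurant_id: str, k: int = 1) -> list[dict]:
    # Filter on the metadata first, then rank the remaining docs by similarity,
    # so one restaurant never sees another restaurant's documents.
    candidates = [d for d in docs if d["restaurant_id"] == restaurant_id]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return ranked[:k]

hits = search([0.9, 0.1], restaurant_id="r1")
```

Note that "kids menu" has the same embedding as the top hit but belongs to r2, so the metadata filter keeps it out of r1's results entirely.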
Good video.
I'm already going down this path.
Except I'm forced to use local model.
rip
Great video, but it would be nice if you replaced OpenAI with an open-source LLM.
Just deploy the model and change the class you use. That's what LangChain is for
Super helpful
Hey, is this you? Algovibes?
Haha, I thought the same thing
No, that is just that typical German accent :p
@@codingcrashcourses8533 oh damn! you really had me fooled. You guys should collaborate on something as a publicity stunt.
@jayglookr Haha, I never heard of him before, to be honest. But I know the German accent when I hear it ;)
@@codingcrashcourses8533 Well, it's not exactly 'that german accent', not exclusively. It was also the top quality, very succinct, informative content. Your channels are easily mistakable in that regard as well. Truly great.