Hi there. Quick question. I know the documentation says to open and modify "cfg.py", but I noticed it was replaced with app_config.yml and the other configuration is handled in "load_config.py". I am receiving an error with the values I supplied, so I was wondering what format and values the following settings expect. I am using OpenAI, not Azure. Thanks!
openai.api_type = os.getenv("OPENAI_API_TYPE")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
Hi Mike. You are right, the correct configuration file is app_config.yml. Whenever you make changes to this YAML file, the updates should seamlessly propagate throughout the project via the load_config.py script, which handles the distribution of configuration values.
To add a new configuration parameter, you'll need to follow these steps:
1. Introduce the new parameter in the app_config.yml file.
2. Update load_config.py to ensure that the new parameter is loaded correctly.
3. Access the new parameter in your project's modules by creating an instance of the configuration loader, like so:
```
APPCFG = LoadConfig()
your_new_parameter = APPCFG.new_argument
```
Regarding the OpenAI credentials, it's crucial to handle them securely. I use environment variables to store these credentials, which are not included in the GitHub repository, to maintain security. You should create a .env file within your project directory and populate it with your OpenAI credentials. Here's an example of what that might look like:
OPENAI_API_KEY=
Since you're not utilizing Azure, you may not require all four of the credential arguments I use. Just include the necessary ones provided by OpenAI. I suggest checking these links and making a simple API call to OpenAI to ensure you understand the process:
* www.datacamp.com/tutorial/using-gpt-models-via-the-openai-api-in-python
* platform.openai.com/docs/api-reference/streaming?lang=python
Alternatively, if you prefer not to use a .env file, you can directly insert your OpenAI credentials into the load_config.py module, although this is less secure and not recommended for sensitive information.
I hope this clarifies the configuration process for you.
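If it helps, here is a minimal sketch of what steps 2 and 3 can look like together. It is illustrative only: the class name matches the usage above, but the YAML path, the section name, and the new_argument key are placeholders rather than the project's actual code.
```
import os
import yaml
from dotenv import load_dotenv


class LoadConfig:
    def __init__(self, config_path: str = "configs/app_config.yml"):
        # Pull the OPENAI_* credentials from the .env file into the environment
        load_dotenv()
        with open(config_path) as f:
            cfg = yaml.safe_load(f)
        # Expose a YAML entry as an attribute so other modules can read APPCFG.new_argument
        self.new_argument = cfg["llm_config"]["new_argument"]
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
```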
@airoundtable Thank you for your reply. I ended up creating a .env file but am still getting the following error. I even left the API endpoint out, since you mentioned it was only needed for Azure.
APIConnectionError: Error communicating with OpenAI: Invalid URL 'None/engines/gpt-35-turbo-16k/chat/completions': No scheme supplied. Perhaps you meant None/engines/gpt-35-turbo-16k/chat/completions?
@@mikew2883 No problem! Well, that means the API call itself is now ok, but the code that generates the response from the GPT model is not compatible with OpenAI. The issue is that I am using something like:
```
openai.ChatCompletion.create(
    engine=gpt_model,
    messages=[
        {"role": "system", "content": llm_system_role},
        {"role": "user", "content": prompt}
    ],
    temperature=temperature,
)
```
This is the code that works with Azure OpenAI, while for those who use OpenAI directly, the code is something like this:
```
client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": llm_system_role},
        {"role": "user", "content": prompt}
    ],
)
```
So, in order to fix the problem:
1. Check the project schema and find where the GPT models generate a response.
2. Find that code in the project and change it to the OpenAI format.
I suggest you first generate a response from a GPT model using your API key and make sure that the code you are using works as expected. Check this link for more info on how to generate the response with OpenAI:
platform.openai.com/docs/api-reference/streaming?lang=python
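For reference, here is a minimal, self-contained version of that second pattern with the openai>=1.x Python client; the model name and the two message strings are placeholders so the snippet runs on its own.
```
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

llm_system_role = "You are a helpful assistant."  # placeholder system role
prompt = "Say hello."                             # placeholder user prompt

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": llm_system_role},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```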
This is the one video to watch if you want to learn and build an advanced RAG project. The other videos are equally great, and I love how organized you are in your videos; your code quality is just WOW.
This is an incredibly informative and well-structured video! The detailed breakdown of the RAG-GPT chatbot, along with the time-stamped sections, makes it easy to navigate and understand. The inclusion of real-time document uploads and summarization requests showcases the versatility of this chatbot. The GitHub links and references to the main libraries used are very helpful for those who want to delve deeper. Keep up the great work! Looking forward to more content like this. 👏👏👏
@@usamaahmed8075 Hi. What do you mean by the uploaded db? If it said the vectordb does not exist, it is because you have to run this module first: upload_data_manually.py
I explained it completely in the video. If you are planning to upload a document and then start chatting with it, you need to first upload it using the upload button and make sure that you see the confirmation message on the screen.
Very well done, @Farzad. Great explanation. This is exactly the concept I was looking to understand and implement. You are simply 100x amazing. I am highly excited to listen to your other videos as well. Thanks for keeping this channel so informative. One suggestion from my side: next time, please use local LLMs like Ollama Llama 3.1 so those who cannot afford it will benefit.
Thanks! I appreciate the kind words and I am glad that the content was helpful. Thanks for the suggestion. I have almost the identical project using open source LLMs. Please check out: ua-cam.com/video/6dyz2M_UWLw/v-deo.htmlsi=u-QWc-Mz5oOA17LS
What are your suggestions on cleaning company docs before chunking? Some of the challenges are how to handle the index pages across multiple PDFs, as well as the headers and footers. You should definitely make a video on cleaning a PDF before chunking; it is much needed.
Well, handling company documents for integration into a RAG system is indeed a complex task. It's often so detailed and requires such a hands-on approach that I would strongly suggest treating the document preparation as a separate project from the RAG chatbot development. Even that project by itself can be divided into two main flows:
1. Cleaning and preparing existing documents
2. Establishing a standard format for all the new documents for easier future integration
Since the RAG system is going to perform a vector search across the entire document set, I suggest removing the unnecessary or duplicate content (for instance, I cannot think of any possible way that a separate index would add value to the conventional RAG strategies and vector search techniques, unless you design a complex RAG system that incorporates hierarchical graph methodologies).
Finally, if your documents contain domain-specific abbreviations that general language models may not recognize, you can think of implementing an advanced RAG system with a fine-tuned LLM on your specific domain data (there is another video in the channel that explains how to fine-tune an LLM on company documents, which might give you some good ideas).
And thanks for the suggestion! I'll consider creating a tutorial video to address this issue.
I just discovered your video today in my feed. This is an excellent project with great attention to detail. Very well done. I cloned it and saw a project in your bullet list called "Open Source LLMs" along with the note that it is coming soon. Do you have any idea when that might be? This is important for those of us wanting to run LLMs with RAG locally on our machines. Very much looking forward to seeing this. Thanks for your work.
Thank you very much for the positive feedback @doctorbill37. I am glad to hear you liked the video. For the open-source RAG project, I have good news: I have already started recording the video. It will be uploaded in the next couple of days.
You did a great job, but the videos are so small that I have to constantly expand them to read the text. It would be nice to be able to read the text without going full screen all the time.
Thanks @RetiredVet! You are right. I have to find a way to increase the size of the contents for an easier read. That is actually why, in the langchain vs llama-index video, I omitted the PowerPoint and showed everything on screen, including each command that I was executing. However, I am constantly looking for ways to improve the quality, as I just recently started to upload videos on UA-cam.
@@airoundtable I enjoyed your video and think your code is great. The code and explanations are the important part; you can learn the video production side much more easily. I've looked at a lot of langchain videos and your explanations are very clear.
Unfortunately, I am an intermediate Python programmer and I had no idea that requirements files were so different between Windows and Linux. I cannot use your requirements files, and when I try installing langchain with pip these days, it never works. If the UA-cam video is a week old, the requirements have changed. I try to downgrade to the recommended versions, but then langchain installs packages that don't work. I am learning a lot more about package management than I ever wanted to.
Langchain is a very interesting project, but it is moving so fast that it is difficult for me to keep up.
Keep up the good work.
Thanks for the great tutorial! Just out of interest, would it also be possible to use streamlit as a user interface or are there any technical issues? Thanks again.
Thanks for the feedback! I am glad that you liked the video. Sure, you can use Streamlit as well; in my opinion, using Streamlit is a bit easier than Gradio. One of the goals of this series was to show how to use Streamlit, Gradio, and Chainlit, and I used each of them in a separate video. If you check the channel you will see a chatbot that I designed with Streamlit:
"Connect a GPT agent to duckduckgo search engine".
Feel free to reach out if you have any other questions.
@@airoundtable Thanks! You also mentioned the issue of data flow management. Let's assume that I upload ten documents in advance to the database, and then there is another one that I upload while using the chatbot. Will the chatbot use all eleven documents to answer my question? Thanks again for your help!
That is a good question @@tobiasbuchmann6972. No, this chatbot treats the documents that were prepared in advance differently from the ones that you upload while using the chatbot. So, in your example, it creates an index for those 10 documents that you preprocessed earlier, and it creates a separate index for the single document that you passed to it while using it. Also, let's say that while using the UI you upload documents in multiple steps: every time you upload a new set of documents, it creates a new index for them and points the chatbot to the most recent index. Finally, whenever you run the UI, it makes sure that all the indexes created for uploaded documents during the previous user's session are removed, cleans up the disk, and gives you a fresh start.
But this is just one way of doing it. At the end of the day, all these functionalities can be adjusted based on your needs.
Great video. I'm testing out the project, but it seems that the chatbot also takes information from the web, as it accesses websites when there are no uploaded docs. I would like to have it only interact with the PDFs/uploaded docs... Any fixes?
Thanks Thomas! The chatbot does not have access to the internet, but it does have access to the pre-trained knowledge of the GPT model. Overall, it works in 3 different ways:
1. If you have already preprocessed some documents and start using the chatbot, it will give you answers based on those documents (this is the chat with pre-processed docs feature).
2. If you select the chat with upload docs feature and upload documents, it will start giving you answers based on the uploaded documents (until you change the setting back to pre-processed docs again).
3. In case the user's question is not related to any documents, the chatbot will use its own knowledge, but in a limited way, to just act as a friendly chatbot.
If you would like to restrict it even more, you can change the llm_system_role in the config folder (configs/app_config.yml: llm_system_role argument). That is where I instruct it and explain how it should behave. I explain it in the video at: 00:40:45 LLM system role
I hope this helps you solve the problem.
Great video on RAG. One quick question: can we add documents from the UI for preprocessing and chat with those, rather than adding documents to the data folder from the backend? I mean, add functionality in the UI that allows me to add documents to the data folder and preprocess them so that I can chat. Thank you
Thanks! Yes, the chatbot has that capability. Follow these steps:
1. In the "Rag with" dropdown, choose chat with upload docs.
2. Use the "Upload doc" button and select your documents.
3. Wait a few seconds until the chatbot tells you that the documents are processed and you can start asking questions.
If you mean whether you can add/remove/modify the files of the vectorstore that you created, the answer is yes, you can. You can easily find information on it; I just did a quick search and saw this tutorial: www.datacamp.com/tutorial/chromadb-tutorial-step-by-step-guide
But I'd search the ChromaDB documentation to find all the details.
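For a rough idea of what editing a Chroma collection looks like, here is a small sketch with the chromadb client; the path, collection name, ids, and documents are made up for illustration, and the project's own vector store may be organized differently.
```
import chromadb

# Open (or create) a persistent store on disk
client = chromadb.PersistentClient(path="data/vectordb")
collection = client.get_or_create_collection("docs")

# Add a new chunk to the existing collection
collection.add(
    ids=["doc-42-chunk-0"],
    documents=["Text of the new chunk goes here."],
    metadatas=[{"source": "new_document.pdf"}],
)

# Remove a chunk that is no longer needed
collection.delete(ids=["doc-42-chunk-0"])
```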
Hi, yes. You have to pay for it to be able to make the API calls. If you go to the OpenAI website you will see how to get the API key from there. Also keep in mind that the project is currently using Azure OpenAI; in case you want to use OpenAI directly, a couple of modifications are required. I have pinned a comment here (you can see it at the top), where I explained all the steps in detail.
Hello there, I am running into the following error: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.
Hello. The project is using Azure OpenAI. Therefore, it needs access to the API key, endpoint, and deployment names of the GPT model and embedding model on Azure, and for that you need to deploy them in Azure OpenAI Studio first. This error arises in two scenarios:
1. The model has not been deployed yet.
2. The model is deployed, but the deployment name that was passed to the chatbot is not the same as the deployment name that was used in Azure OpenAI Studio.
In case you want to use OpenAI directly, instead of Azure OpenAI, please read the pinned comment for a step-by-step description of the changes that are required for that.
@@tonysingh9426 Create a file in the project directory and name it .env. In there, create these arguments:
OPENAI_API_TYPE=azure
OPENAI_API_VERSION=
OPENAI_API_KEY=
OPENAI_API_BASE=
gpt_deployment_name=
embed_deployment_name=
Also, in the config folder, in llm_config there is the name of the GPT model in the engine argument; change that as well. Then run the chatbot. The project automatically loads the .env file and extracts all of this information from it.
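To make the role of each credential concrete, here is a rough sketch of how an Azure-style call wires them together with the legacy openai<1.0 SDK that the project's snippets follow; the values are placeholders you would replace with your own.
```
import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_type = os.getenv("OPENAI_API_TYPE")        # "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")        # your Azure OpenAI endpoint URL
openai.api_version = os.getenv("OPENAI_API_VERSION")  # the API version shown in Azure OpenAI Studio
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine=os.getenv("gpt_deployment_name"),  # the deployment name, not the base model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["choices"][0]["message"]["content"])
```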
Hi, I am using OpenAI entirely and not Azure. I changed the chat completion function as per your solution in the comments, but I am getting the error: TypeError: 'ChatCompletion' object is not subscriptable in response["choices"][0]["message"]["content"]. Please suggest a resolution for the same. Also, the application stops fetching answers after the first question. Please help. @AI RoundTable
Hi @taylorfans1000, this problem is not hard to debug.
1. Based on OpenAI's website, even when using their models directly you should be able to extract the response from response["choices"][0]["message"]["content"]. Here is the reference: platform.openai.com/docs/guides/text-generation/chat-completions-api
2. To make sure that your GPT model is working as expected, test it separately in a notebook. Use your API key and the chat completions client from openai and make a successful API call with the GPT model that you are using.
3. Don't use response["choices"][0]["message"]["content"] yet; directly print(response) itself and make sure that you are getting the whole JSON response from OpenAI.
4. Once you've got the API call working, then you can try to get the specific message content by using response["choices"][0]["message"]["content"] and make sure that you can extract the response content from OpenAI's JSON response.
5. After you have gone through these steps successfully, apply the result to the following files and lines in your project:
src/utils/chatbot.py, lines 68 to 78
src/utils/summarizer.py, lines 110 to 118
My guess is that the fetching problem will be solved as well after you fix this. I hope this helps; feel free to let me know in case you have more questions.
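One additional note, in case you are on the newer openai>=1.x SDK: that client returns a typed object rather than a dict, which is exactly what produces the "object is not subscriptable" error with bracket-style access. A minimal sketch of both access styles (placeholder model and prompt):
```
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

# openai>=1.x returns an object, so use attribute access:
print(response.choices[0].message.content)

# The dict-style response["choices"][0]["message"]["content"] only works with the
# legacy openai<1.0 SDK (or after converting the new object with response.model_dump()).
```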
That is the million dollar question. There has been a lot of effort around evaluating RAG systems, but the challenging part is that there is no single metric that can tell you how accurate your system's response is. Instead, to evaluate RAG systems, we usually use the help of LLMs themselves. But before that, we need to understand what the challenges in a RAG system are. Here is a brief summary of some of the key steps:
1. In the data preparation pipeline: data quality + chunking strategy + embedding quality
2. On the retrieval side: user's query quality + search quality + relevance of the contents of the retrieved documents to the query
3. On the synthesis side: context overflow + LLM hallucination + answer relevance
These are the components that need to be adjusted and evaluated in a RAG system. For the evaluation pipeline itself, you can either use the frameworks that are being developed for this purpose (e.g. TruLens, LangSmith, Galileo; my recommendation: LangSmith), or you can design a custom pipeline depending on your goal and use case. I have a video on the channel called "Langchain vs Llama-index" where I design an end-to-end pipeline and evaluate the performance of 5 different RAG techniques; there I go into much more detail about this topic. Overall, this task requires a good amount of testing and iteration, especially if the requirements are very specific and complex.
Thanks. Yes, you need to convert the OpenAI call sections and use another framework, like Ollama for example. I have received multiple messages from the community saying they successfully used Ollama with the projects on the channel.
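For a rough idea of what swapping the OpenAI call for a local model can look like, here is a minimal sketch with the ollama Python package. It assumes a locally running Ollama server with the model already pulled, and it is not the project's code; the model name is just a placeholder.
```
import ollama  # assumes `ollama serve` is running and the model has been pulled

response = ollama.chat(
    model="llama3.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
)
print(response["message"]["content"])
```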
I ran this project on Windows, but it does not matter; the project is designed in a way that it can be executed on any OS without needing any modification. If you are getting this error while installing requirements.txt, just remove the library that is causing the issue and try again.
This project is only for .txt and PDF files, and it is easily extendable to docx files (you just need to add the extension to the list of acceptable files in the code; see the sketch below). If your documents contain images and tables, that will not hurt the performance on text, but for implementing RAG with images and tables this project needs to be upgraded.
RAG on images requires image embeddings and vector search on those embeddings. There is still no solid approach that can handle various types of images (e.g. technical drawings); there are a few preliminary solutions out there that work on generic images, so the industrial application is still very limited.
Tables, on the other hand, require specific approaches that are able to extract the contents of a table properly. The "Unstructured" library has been working on this aspect and Langchain has adapted it within its framework. So, since I used Langchain in this project, you can easily modify it and add that approach. But the problem is that handling tables with that approach takes a very long time (in my experience), which makes it impractical for industrial purposes. That leaves the door open to custom solutions that suit a specific business need, which can vary widely.
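As a sketch of that docx extension, the dispatch could look roughly like this; the helper function and its structure are hypothetical, and only the Langchain loader classes are real.
```
from langchain_community.document_loaders import Docx2txtLoader, PyPDFLoader, TextLoader


# Hypothetical helper: pick a Langchain loader based on the file extension
def load_document(file_path: str):
    if file_path.lower().endswith(".pdf"):
        loader = PyPDFLoader(file_path)
    elif file_path.lower().endswith(".txt"):
        loader = TextLoader(file_path)
    elif file_path.lower().endswith(".docx"):
        loader = Docx2txtLoader(file_path)  # the new branch for Word documents
    else:
        raise ValueError(f"Unsupported file type: {file_path}")
    return loader.load()
```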
@@shahnaz9026 Technically you can. You need to do a lot of refactoring of the code, but it is doable if you want to run it on Jupyter. Keep in mind that this code is currently using Azure OpenAI; in case you want to use OpenAI directly, the chat completion functions need to be modified as well.
Sure, you can use other LLMs, but you have to modify the code for that. The code currently uses the OpenAI GPT 3.5 model (with API calls) for inference. If you want to change the LLM and run the code:
1. Use a powerful LLM for a good performance (consider the context length, chunk sizes, and the instructions that you want to give to the model, along with the available computational power that you have at hand).
2. You need to change the code wherever it is getting the response from the GPT model, which happens in two locations:
a. src/utils/chatbot.py - response function - lines 68 to 72
b. src/utils/summarizer.py - get_llm_response function
3. Depending on the model that you are using, you may need to process the response in a different way as well (e.g. llama2's output will contain both the query and the response along with some special characters that need to be processed for a neat user experience).
Very detailed explanation, and thank you for making it open source. Are there any plans to advance this application? For example:
1. An advanced RAG pipeline which can extract text, table data, or images based on the user's question
2. Creating a vector DB based on text, image, and table data
3. Providing a login and admin panel to track information like the number of tokens used by different users
4. Using React/Node for a better app experience
5. A complete deployment process
Thanks! I am glad that you liked the video. These are all great points. For some of them, yes, there will be a video soon, and for some I still have no plan. I am looking into solutions for taking into account tables and unstructured documents along with images. There are already some solutions out there (the Unstructured library and image vector databases) but none of them are practical yet in my opinion. For instance, the available approach that Langchain and Unstructured proposed for processing the tables in documents is super slow and technically not practical. So, I will make a video as soon as I see an approach that can be applied in real-world scenarios.
The next two videos should be interesting for you, I guess. The next one is a multimodal chatbot that uses 5 different models in the background and is able to answer questions asked about the context of an image as well. And the one after is an advanced RAG chatbot that uses a knowledge graph and takes into account more detailed relationships between the content of a document and related chunks.
Items 3, 4, and 5 have crossed my mind, but I still have not planned a video for them. I will keep them in mind and think about it after the next two videos. Thanks for the suggestions @kunalsatpute8379!
@@airoundtable Thank you for replying; excited for your videos. One question: will these videos be an extension or enhancement of this application, or will they be entirely separate videos?
@@kunalsatpute8379 That would be an expansion. Besides LLM applications, one of the main ideas behind this series was to walk through all the necessary steps required for an advanced multimodal chatbot. I started by explaining function calling and vector search, and using them I designed multiple projects. The next video would be:
1. A combination of all the chatbots that I have designed and uploaded on the channel so far (RAG-GPT + connecting the GPT to the search engine + chatting with and summarizing websites)
2. We will use the concept that I showed in the open-source RAG video for creating a web server for serving models
3. We will add more abilities: the user can interact with the chatbot by sending voice, text, and images, and the chatbot will respond in voice and text + we can ask questions about a specific image that we uploaded and the chatbot will be able to answer questions about the image's content + we can ask the chatbot to generate images for us as well.
So it would be an any-to-any chatbot:
input: voice, text, image
output: voice, text, image
functionalities: normal AI chatbot + RAG with documents + RAG with websites + search the web using a search engine + summarize documents + summarize websites + understand images both for answering questions and for generating them
So, the RAG-GPT project would be one arm of that chatbot, and I am thinking of giving the user the ability to work with around 9 or 10 different Gen AI models (all open source except GPT). In that video I will just briefly touch on RAG-GPT and the other parts that I have already covered in the previous videos, and the focus will be on explaining the multimodal side of it and how the whole chatbot was designed. That is a huge project.
Hi, I am really not sure how the models would perform on Arabic. You can give it a try, or search Arabic forums and see what models they suggest for Arabic. Which key are you referring to?
It is hard to tell without seeing the code, and it depends on how you are calling the model and generating the response. In the code that I put in my GitHub repository, the model does not stream the response. But in case you are using different code with Langchain, check these links:
- python.langchain.com/docs/modules/model_io/chat/streaming
- python.langchain.com/docs/modules/model_io/llms/streaming_llm
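If the goal is simply to stream tokens as they arrive when calling OpenAI directly (outside the repository's code), here is a minimal sketch with the openai>=1.x client; the model name and prompt are placeholders.
```
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write one sentence about RAG."}],
    stream=True,  # ask the API to send the answer chunk by chunk
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```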
This is a great video that clearly explains the whole RAG development process. One quick question, I created the .env file to store the OPENAI_API_KEY. But it still could not find it. where should I put the OPENAI_API_KEY?
Thanks, I am glad that you liked the video! To make that work: create a raw file, name it ".env", put it in the parent folder of the project (the RAG-GPT folder), and add your arguments like this:
OPENAI_API_TYPE=azure
OPENAI_API_VERSION=
OPENAI_API_KEY=
OPENAI_API_BASE=
Then, to test whether it is working properly, open a notebook or a raw .py module and run these commands:
import os
from dotenv import load_dotenv, find_dotenv
# This line automatically finds the .env file in your environment
_ = load_dotenv(find_dotenv())
openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
openai_api_key = os.getenv("OPENAI_API_KEY")
Then you can print the values and make sure it got them right:
print(openai_api_type)
If you see the values by printing them, then you are good to go.
Hi, I got an error saying there are no files inside data/docs:
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'data/docs'
But I didn't change anything in your repo; I just cloned it and ran it. Can you give some guidance on this issue?
Hi, it shouldn't be the case. This directory is part of the repository, and I am using the 'pyprojroot' library to manage the directories in the project automatically. Without the full traceback of the error, I cannot understand why it is happening. In case you still get the error, feel free to share the full traceback and I will let you know the source of the problem.
Not with this version. For performing Q&A and RAG with CSV files, please have a look at my LLM agent videos. For images you would need more complex approaches, and the ones presented by Langchain and Unstructured are not ready for production; they take a very long time to process the images. Although in my next video, I am aiming to show how to fine-tune multimodal LLMs on custom image datasets, and those models can be used to perform RAG on images. Here is the link to my video describing how to chat with SQL and tabular databases: ua-cam.com/video/ZtltjSjFPDg/v-deo.htmlsi=bh9xdkJqufFBMrBI
Hi, I love the whole project, but I would be happy if you went more in-depth on the following statement in the repo: "It is strongly recommended to design a more robust and secure document handling process for any production deployment." Do you mean improving the security of the documents and restricting access to the app, and implementing such steps, or something else?
Hi, thanks. No, I would suggest handling the access level on the chatbot side. In general, for RAG projects, I suggest separating the document processing pipeline from the chatbot itself. And for the data pipeline there are many factors that need to be taken into account. If I assume the company is mid-size or bigger:
1. Document cleaning and transformation (for instance, if you are dealing with .txt, .docx, and .pdf, after preprocessing you can convert everything to PDF documents for a manageable workflow).
2. Content validity checks. This step can be managed on a division level or on multiple levels, depending on the size of the company and the complexity of the documents (the verification teams should verify the contents and be responsible for what is in there).
3. Avoid duplicates (sometimes different divisions have very similar documents but for different use cases and purposes; this can cause confusion in a RAG project).
4. Address specific cases (e.g. if a PDF file was created using a scanner, you would not be able to perform RAG on it).
5. Manage security (who can add/remove a document from the pipeline itself, and also the access level).
6. Implement a CI/CD pipeline for automating the workflow and managing scale (e.g. in case a document is added to a database, your pipeline should be able to add only that document to the vectorDB rather than recreating the vectorDB from scratch).
7. Secure the pipeline.
8. Test and improve.
These are some of the key aspects of the data pipeline for RAG projects.
Dude!!!! INSANE!! Such a good tutorial, you rock. My one question is about credits: will the vector function save credits? E.g. I want to build a legal document reader & Q&A, and some docs are 100 pages long. Won't each doc cost hundreds in API credits? Or is that what vectorisation & DBs are for?
Thanks! I am glad you liked the video. Storing the vectorized documents in a vectorDB can definitely save costs, and it would not be efficient to build such a system without the use of vector databases (especially since you can use many of them for free). About the pricing itself, vectorization is not a very costly task unless you are dealing with thousands of documents. According to the OpenAI documentation, text-embedding-3-large is priced at $0.00013 / 1k tokens. Keep in mind that text-embedding-3-large is their newest and most expensive embedding model (also more expensive than the one that I used in the video). Source: openai.com/blog/new-embedding-models-and-api-updates
So, imagine a document with 100 pages that you turn into, say, 500 chunks: you will make 500 API calls to this model and it will roughly cost you something around $0.03 (approximate). I considered each chunk to be around 1500 characters, which makes it roughly 400-500 tokens. Also, keep in mind that you can manage the vectorDBs by editing them (adding/removing documents) rather than re-creating them on every small change.
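To spell out the arithmetic behind that estimate (same assumptions as above: 500 chunks of roughly 400-500 tokens each, at the text-embedding-3-large price quoted):
```
chunks = 500
tokens_per_chunk = 450            # ~1500 characters per chunk, rough token estimate
price_per_1k_tokens = 0.00013     # USD, text-embedding-3-large (per the pricing cited above)

total_tokens = chunks * tokens_per_chunk            # 225,000 tokens
cost = total_tokens / 1000 * price_per_1k_tokens    # roughly $0.03
print(f"Estimated embedding cost: ${cost:.4f}")
```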
Great project, but in the end we are using the OpenAI API, which does the "real" AI magic here. The question is: if we already plan to build a customized bot and train it on our data, wouldn't we also want to train our own open-source LLM model such as Llama 3 and remove the dependency on online API calls (which make the project strongly bound to public access to our private documents, right)?
It very much depends on the requirements of your project. If using closed-source models like GPT is not an option, then using a powerful open-source model like Llama is the way to go. You can definitely do that. I have another video in which I implemented this same project using open-source models. You can check it out here: ua-cam.com/video/6dyz2M_UWLw/v-deo.htmlsi=Lfn9h2Y9pl5zRUd9
Hi, I am using a Windows machine and getting an error while running the upload_data_manually.py file. It's giving me: RuntimeError: your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0. Then I checked the sqlite version using sqlite3 -version and it shows 3.41.2, which is greater than 3.35 🤔
Hi, please try to replicate the project using the exact library versions that I included in requirements.txt and let me know if the problem appears again.
OpenAI GPT and embedding models need an API key. If I got your point correctly, you want to use open-source LLMs. You absolutely can. Please check the Open Source RAG video on my channel; that is almost the identical project with open-source models. ua-cam.com/video/6dyz2M_UWLw/v-deo.htmlsi=W9dW4JNbC2KH_tHs
I have been using Python 3.11 and never faced any issue with RAG projects. I also included a requirements.txt file in the project root directory where you can check the versions of all the libraries that I used for this chatbot.
Absolutely, please check out my latest video called: Open Source RAG Chatbot with Gemma and Langchain | (Deploy LLM on-prem). I took RAG-GPT and replaced the GPT models with Google Gemma 7B. I also replaced OpenAI's embedding model with an open-source model. I would never suggest using a 7B LLM for deployment, but my main goal was to show how you can have the same pipeline (RAG-GPT) with an open-source model on-prem.
My URL is generated for the UI, but nothing is displayed. I checked that the vector DB is also created under the chroma folder for the documents already stored under the docs folder. I am using Azure OpenAI credentials. What could be the reason?
It is hard to tell without seeing the traceback of the problem. Whatever is happening, you can see it from the terminal. In case the problem is not solved yet, feel free to open an issue on the GitHub repository and post the traceback there.
To add .epub files, modify the "prepare_vectordb.py" module and add a condition for .epub, then use the following links to prepare the Langchain loader and pass the loaded files on for chunking:
js.langchain.com/docs/integrations/document_loaders/file_loaders/epub
python.langchain.com/docs/integrations/document_loaders/epub
api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.epub.UnstructuredEPubLoader.html
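Based on the loader from the last link, the new branch could look roughly like this; the file path and the surrounding condition are hypothetical and only show where the .epub case would slot in before the existing chunking step.
```
from langchain_community.document_loaders import UnstructuredEPubLoader

file_path = "data/docs/sample_book.epub"  # placeholder path

# Hypothetical extension branch: load the .epub the same way .pdf/.txt files are loaded,
# then hand the resulting documents to the existing chunking step.
if file_path.lower().endswith(".epub"):
    docs = UnstructuredEPubLoader(file_path).load()
    print(f"Loaded {len(docs)} document section(s) from the epub")
```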
This is the most comprehensive RAG tutorial video I have seen on UA-cam. What a great effort and command over the subject, sir! I am from a low-code business analyst background, so I heavily depend upon Copilot to guide me on Python script functionality. Still, I was able to set up the system as explained by you on my local PC; however, I am getting the following error on executing python src\raggpt_app.py:
import pwd
ModuleNotFoundError: No module named 'pwd'
Can you guide me on what I am missing? Many thanks
Thanks! I am happy to hear that you liked the video! It is a bit difficult for me to debug that code without the whole traceback. What operating system are you using, and can you send me the full error?
@@airoundtable I have same error, I use Windows 10 ...
File "D:\Install\PyThon\lib\site-packages\langchain\document_loaders\__init__.py", line 18, in
from langchain_community.document_loaders.acreom import AcreomLoader
File "D:\Install\PyThon\lib\site-packages\langchain_community\document_loaders\__init__.py", line 163, in
from langchain_community.document_loaders.pebblo import PebbloSafeLoader
File "D:\Install\PyThon\lib\site-packages\langchain_community\document_loaders\pebblo.py", line 5, in
import pwd
ModuleNotFoundError: No module named 'pwd'
Thanks for your kind reply. I am using Windows 10. In VS Code I have created the virtual env, installed all libraries, and followed all instructions as given in the readme (for the RAG-GPT application).
When I initialize python serve.py it's ok:
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT> cd src
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src> python serve.py
Serving at port 8000
But when I initialize python raggpt_app.py (to launch Gradio), I get the following error (truncated):
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT> cd src
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src> python raggpt_app.py
Traceback (most recent call last):
File "F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src\raggpt_app.py", line 25, in
from utils.upload_file import UploadFile
File "F:\Users\XYZ\miniconda3\lib\site-packages\langchain_community\document_loaders\pebblo.py", line 5, in
import pwd
ModuleNotFoundError: No module named 'pwd'
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src>
Hi, thanks! Sure, I will update the repository in about an hour and push that file to it. I saw you opened an issue on GitHub as well, so I will inform you there when the file is added.
This is a fantastic tutorial. Nice work! Question: sporadically, the chat history and retrieved content are appended to the chatbot output along with the question/response. Not sure if it's related, but it seems to only happen when I adjust (increase the text detail of) the prompt instructions. Any idea why? Thanks.
Thanks! I am glad that you liked the video. I am not sure if I understood your point correctly. The chat history is appended to the model's input on every query (along with the retrieved content and the user's query itself). Therefore, the GPT model should always have access to the chat history. But in case the performance is not consistent, the source can be:
1. The context length that the GPT model receives. GPT 3.5 has a 4096-token limit, and based on my experience, if it receives something over 3000 tokens there is a good chance that you will see a degradation in performance. In case you are interested, have a look at minute 8 of the Open Source RAG Chatbot video for a more technical explanation of this aspect.
2. The format of the input: in case the chunk sizes are so small that the input does not create an understandable context, the GPT model can get confused as well.
3. The model system role: the system role should clearly guide the GPT model to understand each part of the input context. A vague system role can confuse the GPT model and therefore affect its usage of the chat history.
Please let me know if this answers your question.
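If you want to check how close your combined input (chat history + retrieved chunks + user query) gets to that 4096-token limit, you can count tokens with tiktoken. A small sketch with a placeholder string:
```
import tiktoken

combined_input = "chat history + retrieved chunks + user query ..."  # placeholder text

# Use the tokenizer that corresponds to the GPT-3.5 family
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
num_tokens = len(encoding.encode(combined_input))
print(f"Input length: {num_tokens} tokens (GPT-3.5 context limit is 4096)")
```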
@@airoundtable Sorry for the confusion. After more testing, I don't think adjusting the prompt is the issue. Sporadically, the chat history, retrieved content (similar to the references sidebar), source text, and the input query are displayed in the output of the Gradio chatbot. Is this normal functionality? It doesn't happen on all questions, though; some just return the response. Thanks again for the interaction.
@@Preds23 No worries. Now I think I got your point, which is a good point indeed. On some occasions the GPT model also provides the source with the response, and in some cases you don't see it unless you click on references. Here is why: that behavior is due to the context length that I mentioned in the previous message. If the context that the GPT model receives is not too long (in a way that overwhelms the model), the model is able to pick up that piece of the instruction and show the source in the output. But if the context that the model receives is long (chat history + user prompt + retrieved content), GPT 3.5 is not powerful enough to follow all the details, so it will only focus on the main body of the instruction and miss the part where it should also provide the source. However, as you noticed, you can always see the reference and the chunks in the references bar. In case you want a more consistent behavior from GPT on that matter:
1. Use GPT 4 instead of GPT 3.5. With the current config, GPT 4 would almost always return that source at the end. Or:
2. Reduce the size of the chunks and the amount of chat history injected into the model. To do that, you have to make sure that this change is not so drastic that it degrades the model's behavior, but to some extent you can make the model's input a bit shorter so the model can pick up all the instruction's details.
To put it more simply, think of it this way: GPT 3.5 has a 4096-token limit. If you pass it an input with 3500 tokens, the model will focus more on the beginning and end of the input and start to forget (or ignore) what was said in the middle section of the input. If you pass an input with 2000 tokens, the model can understand and follow all the instructions nicely without any issue. This is an intrinsic characteristic of all LLMs. I hope this helps you understand the problem.
Very informative video; I understood all the components involved. Could you please let me know where I should define "OPEN_API_KEY"? Is it in moderation.py, or should this, along with other parameters, be defined in environment variables? Could you please help me with this?
Thanks! To keep important information like passwords or API keys safe, we put them in a special file called .env in our project. This is a common way to handle private settings. Here's a really easy guide to help you set it up:
1. Make a new file in the main folder of your project and call it .env. Inside this .env file, you can write your private information like this:
```
OPEN_AI_API_KEY=yourapikeyhere
ANOTHER_SECRET_KEY=yoursecretkeyhere
```
2. To use these settings in your code, you'll need to add a couple of lines to tell your program to read the .env file. Add these lines at the beginning of your Python script:
```
import os
from dotenv import load_dotenv

load_dotenv()  # This tells Python to read your .env file
```
3. Then, whenever you need to use the information from the .env file, you can get it like this:
```
openai_api_key = os.getenv("OPEN_AI_API_KEY")
another_secret_key = os.getenv("ANOTHER_SECRET_KEY")
```
Replace OPEN_AI_API_KEY and ANOTHER_SECRET_KEY with the actual names you used in your .env file. Now your code can use your private settings safely!
How do I register an Azure deployment? We already have some params in load_openai_cfg(self), like API_KEY, API_BASE, API_VERSION, and API_TYPE. I tried to add a parameter API_DEPLOYMENT, but it still errors: No deployment found
I hope you have already found the answer to your question. But in case you haven't: first check whether you can call your models from a notebook separate from the project. After a successful call, then try to add the models to the project. This error was raised because the project could not properly communicate with the model; that is either due to the model name or the credentials that you took from Azure OpenAI.
@@airoundtable Yes, already solved, but I ended up using OpenAI instead of Azure. Hi, I want to ask again: how can we make the chatbot understand the format of documents? For example, I have a document with the format: title, dates, content. Then I upload a new document and check whether my new document has the same format as my preprocessed document. I also altered the system prompt in app_config.yml to add the capability for the chatbot to detect typos, but it doesn't work. How can I edit the project? Thanks in advance
@@coffeepod1 Glad to hear it. Regarding your new question, I've never done it, but if I wanted to solve it, I would use a hybrid approach: combine Python libraries that are able to extract document info based on the document structure (headings, subheadings, etc.), and from there either hard-code the check to make sure the structure is correct or use an LLM to make the judgment for me. But just handing a document to an LLM would not be an effective way of achieving your goal.
About the typos: since LLMs work with tokens and not words, it is sometimes hard for them to detect typos (especially when the context length is long). The benefit is that when we interact with them, they don't care if we have typos in our queries. But on the downside, when it comes to fixing these typos, there is a chance that they miss the errors. They perform better with smaller context lengths, which is not usually the case for RAG systems.
@@airoundtable I see; we are not OpenAI (or a big AI company), so we are limited on resources indeed. But what about "prompt engineering" with a few shots about the doc structure? Or how about an NER approach, since this is similar to information extraction? And about the typos, I didn't really think about the obstacles you explained; I just assumed the GPT model (I use GPT-4o) has the capability to detect typos. Poor me, working on a hard project
@@coffeepod1 I am not sure about those suggestions; I haven't worked with them. But at this point, I would say the best strategy is to quickly test the different approaches and see which one is more effective than the others, and then spend time improving that approach.
Hi, could you give me the .env settings you use: openai_api_type, openai_api_base, and openai_api_version? I am struggling to make it work. Thanks. Great video, by the way.
Thank you for your positive feedback @agep13. Regarding the .env settings, it's important to note that these often contain sensitive information, such as API keys, which should be kept private and not shared openly to ensure security. However, I can certainly guide you on setting up your own .env file. For the openai_api_type, openai_api_base, and openai_api_version, you'll need to consult the official documentation provided by OpenAI or Azure to determine the correct values. These details are typically available in the API or developer section of your account dashboard. Here's a basic template for what your .env file might include:
OPENAI_API_KEY=your_unique_api_key
OPENAI_API_TYPE=the_type_of_api_you_are_using
OPENAI_API_BASE=azure_open_ai_endpoint_url
OPENAI_API_VERSION=version_number
Please ensure you replace the placeholders with your actual API key and the appropriate values for your use case. The OPENAI_API_TYPE will depend on the API service you've subscribed to (e.g., GPT in the video), while the OPENAI_API_BASE and OPENAI_API_VERSION are generally standard URLs used for accessing OpenAI's API. Remember to keep your .env file secure and avoid uploading it to public repositories to prevent any unauthorized use of your API keys.
@dtable Hello, great video. I know the OPENAI_API_KEY is in your OpenAI account and the openai_api_type would be something like GPT-3.5 or 4, but for the rest I'm still confused about where to get the base and version. How do I check which URLs to use?
Hi @@ShadowScales, thanks! Your confusion is on point, because to use GPT models from OpenAI directly you won't need to insert the endpoint and the api_version. You probably missed that I am using the GPT model from Microsoft Azure; that is why I have 4 credentials for it. In order to understand how to adjust the code to the OpenAI API directly, please read the comments under @mikew2883 down below (it is the comment with the 7 replies). There we had a full discussion on how to properly modify the project. I hope this helps. In case you have any questions along the way, please let me know.
Hello, thanks for the video, but when I try to run the app I get this error: "openai.error.InvalidRequestError: Invalid URL (POST /v1/engines/gpt-35-turbo/chat/completions)". This is because I don't have access to the Azure OpenAI API yet; I'm using the OpenAI API. Would you be able to help me with this? Thank you
Hi Arian. Thanks. I just pinned a message at the top of the comment section where I discussed this issue with @mikew2883. Please read that discussion; I provided all the necessary guidance for changing the OpenAI API call from Azure to OpenAI itself. Let me know if you have any other questions.
@@airoundtable Thank you for your reply. I made a few additional adjustments, and now it works. Really appreciate your awesome work and the effort you put in. Damet Garm 👊
Great work! Would love to see this with LiteLLM as an option and some sort of basic user login system…along the lines of open webui
Thanks! That is indeed a great combo. I haven't looked at it closely yet, but I will definitely check it down the road.
You did a great job, but the videos are so small, I have to constantly expand them to read it. It would be nice if you could read the text without going full screen all the time.
Thanks @RetiredVet! You are right. I have to find a way to increase the size of the content for an easier read. That is actually why in the Langchain vs Llama-index video I omitted the PowerPoint and showed everything on screen, including each command that I was executing. However, I am constantly looking for ways to improve the quality as I only recently started to upload videos on UA-cam.
@@airoundtable I enjoyed your video and think your code is great. The code and explanations are the important part; you can learn the video stuff much more easily. I've looked at a lot of langchain videos and your explanations are very clear.
Unfortunately, I am an intermediate Python programmer and I had no idea that requirements files were so different between Windows and Linux. I cannot use your requirements files, and when I try installing langchain with pip these days, it never works. If the UA-cam video is a week old, the requirements have changed. I try to downgrade to the recommended versions, but then langchain installs packages that don't work. I am learning a lot more about package management than I ever wanted to.
Langchain is a very interesting project, but it is moving so fast, it is difficult for me to keep up.
Keep up the good work.
Thanks for the great tutorial! Just out of interest, would it also be possible to use streamlit as a user interface or are there any technical issues? Thanks again.
Thanks for the feedback! I am glad that you liked the video. Sure, you can use streamlit as well; in my opinion, using streamlit is a bit easier than Gradio. That was also one of the goals of this series: to show how to use streamlit, gradio, and chainlit. I used each one of them in a separate video. If you check the channel you will see a chatbot that I designed with streamlit:
"Connect a GPT agent to duckduckgo search engine".
Feel free to reach out if you have any other questions.
@@airoundtable Thanks! You mentioned also the issue of data flow management. Let's assume that I upload ten documents in advance to the database and then upload another one while using the chatbot. Will the chatbot use all eleven documents to answer my question? Thanks again for your help!
That is a good question @@tobiasbuchmann6972. No, this chatbot treats the documents that were prepared in advance differently from the ones that you upload while using the chatbot. So, for your example, it creates an index for those 10 documents that you preprocessed earlier, and it creates a separate index for that single document that you passed to it while using the UI. Also, let's say that while using the UI you upload documents in multiple steps: every time you upload a new set of documents, it creates a new index for them and points the chatbot to the most recent index. Finally, whenever you run the UI, it makes sure that all the indexes created for uploaded documents during the previous user's session are removed, cleans up the disk, and gives you a fresh start.
But this is just one way of doing it. At the end of the day, all these functionalities can be adjusted based on your needs.
Great video. I'm testing out the project but it seems that the chatbot also takes information from the web, as it accesses websites when there is no uploaded docs. I would like to have it only interact with the PDFs/uploaded docs... Any fixes?
Thanks Thomas! The chatbot does not have access to the internet, but it does have access to the pre-trained knowledge of the GPT model. Overall, it works in 3 different ways:
1. If you have already preprocessed some documents and start using the chatbot, it will give you answers based on those documents (this is the chat with pre-processed docs feature).
2. If you select the chat with upload docs feature and upload documents, it will start giving you answers based on the uploaded documents (until you switch the setting back to pre-processed docs).
3. In case the user's question is not related to any documents, the chatbot will use its own knowledge, but in a limited way, to just act as a friendly chatbot. If you would like to restrict it even more, you can change the LLM system role in the config folder (configs/app_config.yml: llm_system_role argument). That is where I instruct it and explain how it should behave (a rough sketch of a stricter role is below this reply). I explain it in the video at:
00:40:45 LLM system role
I hope this helps you solve the problem.
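For illustration, here is a rough sketch of what a stricter system role could look like (this is my own example wording, not the project's actual llm_system_role text); in practice you would paste a string like this as the llm_system_role value in configs/app_config.yml:
```
# Hypothetical example of a stricter system role (not the original text from app_config.yml).
llm_system_role = (
    "You are a document assistant. Answer ONLY using the retrieved document "
    "chunks provided in the prompt. If the answer cannot be found in that "
    "content, reply exactly with: 'I could not find this in the provided documents.' "
    "Do not answer from your own general knowledge."
)
```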
Great video on RAG. One quick question. Can we add documents from UI for preprocessing and chat with that rather than adding documents to data folder from backend? I mean..add a functionality in UI that will allow me to add documents in data folder and preprocess it so that i can chat? Thank you
Thanks! Yes, the chatbot has that capability. Follow these steps:
1. In the "RAG with" dropdown, choose chat with upload docs.
2. Use the "Upload doc" button and select your documents.
3. Wait a few seconds until the chatbot tells you that the documents are processed, and then you can start asking questions.
Is there a similar ready-made solution on the site "poe"? I am a beginner and want to work with such a model rather than build it myself.
Check out my video below:
ua-cam.com/video/8iMIGVWMPPQ/v-deo.htmlsi=ryvfD6m65Jyro205
This is a free RAG app that you can use
Can we have the option to upload files to the vectore store to update the assistant? like upload files to the vector store of openai?
If you mean whether you can add/remove/modify the files of the vectorstore that you created, the answer is yes, you can. You can easily find info on it; I just did a quick search and saw this tutorial:
www.datacamp.com/tutorial/chromadb-tutorial-step-by-step-guide
But I'd search the ChromaDB documentation to find all the details.
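As a rough illustration of editing a Chroma collection (the path, ids, and file names below are made up, and the exact API can vary slightly between chromadb versions):
```
import chromadb

# Assumed path; point this at the directory where your vector database was persisted.
client = chromadb.PersistentClient(path="data/vectordb")
collection = client.get_or_create_collection("docs")

# Add new chunks (ids must be unique strings).
# Note: in a real pipeline you should pass an `embeddings=` argument computed with the
# same embedding model used for the rest of the collection; otherwise Chroma falls back
# to its default embedding function.
collection.add(
    ids=["new_file-chunk-0", "new_file-chunk-1"],
    documents=["First chunk of the new file...", "Second chunk of the new file..."],
    metadatas=[{"source": "new_file.pdf"}, {"source": "new_file.pdf"}],
)

# Remove all chunks belonging to a document you no longer want indexed.
collection.delete(where={"source": "old_file.pdf"})
```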
Hello sir, is the OpenAI API key paid? Do we have to pay for it in order to access and use it?
Hi, yes. You have to pay for it to be able to make the API calls. If you go to the OpenAI website you will see how to get the API key. Also keep in mind that the project is currently using Azure OpenAI; in case you want to use OpenAI directly, a couple of modifications are required. I have pinned a comment here (you can see it on top) where I explained all the steps in detail.
Hello there, I am running into the following error: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.
Hello. The project is using Azure OpenAI. Therefore, it needs access to the API key, endpoint, and deployment name of the GPT model and the embedding model in Azure. And for that you need to deploy them in Azure OpenAI Studio first. This error arises in two scenarios:
1. The model has not been deployed yet
2. The model is deployed but the deployment name that was passed to the chatbot is not the same as the deployment name that was used in the OpenAI studio.
And in case you want to use OpenAI directly, instead of Azure OpenAI, please read the pinned comment for a step by step description of the changes that are required for that.
@@airoundtable thank you, how do i pass the deployment name to the chat bot?
@@tonysingh9426 create a file in the project directory and name it: .env
in there create these arguments:
OPENAI_API_TYPE=azure
OPENAI_API_VERSION=
OPENAI_API_KEY=
OPENAI_API_BASE=
gpt_deployment_name=
embed_deployment_name=
Also, in the config folder, under llm_config the name of the GPT deployment is set in the engine argument; change that as well.
Then run the chatbot. The project automatically loads the .env file and extracts all this information from it.
Thank you, sir
@@airoundtable do i need a second deployment for the embedding model? and if so should it be the same engine as the gpt_deployment?
Hi, i am using Openai entirely and not azure, i change the chat completion function as per your solution in the comments, but i am getting the error : TypeError: 'ChatCompletion' object is not subscriptable in response["choices"][0]["message"]["content"]. Please suggest some resolution for the same. Also, the application stops fetching answers after the first question. Please help. @AI RoundTable
Hi @taylorfans1000, this problem is not hard to debug.
1. Based on OpenAI's website, even by using their model directly you should be able to extract the response from response["choices"][0]["message"]["content"]. Here is the reference: platform.openai.com/docs/guides/text-generation/chat-completions-api
2. But to make sure that your GPT model is working as expected, test it separately in a notebook. Use your API key and the client.chat.completions.create method from openai and make a successful API call with the GPT model that you are using.
3. Don't use response["choices"][0]["message"]["content"]; print(response) itself directly and make sure that you are getting the whole JSON response from OpenAI.
4. Once you've got the API call working, try to extract the specific message content using response["choices"][0]["message"]["content"] and make sure that you can pull the response content out of OpenAI's JSON response.
5. After you have gone through these steps successfully, apply it to the following files and lines in your project:
src/utils/chatbot.py, lines 68 to 78
src/utils/summarizer.py, lines 110 to 118
My guess is the fetching problem will be solved as well after you fix this. I hope this helps; feel free to let me know in case you have more questions.
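For reference, here is a minimal sketch of steps 2 and 3 with the newer openai>=1.x Python SDK (the model name is just an example). One thing worth noting: this newer client returns an object, so the content is read with attribute access (response.choices[0].message.content); the dict-style response["choices"][0]["message"]["content"] only works with the older 0.x SDK, which is the most likely cause of the "not subscriptable" error.
```
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    temperature=0,
)

print(response)                             # inspect the full response object
print(response.choices[0].message.content)  # extract just the answer text
```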
Hey! Were you able to debug this issue? ....i'm facing the same! Thanks in advance!
How can we evaluate the responses generated by the RAG system?
That is the million dollar question. There has been a lot of effort around evaluating RAG systems, but the challenging part is that there is no single metric that can tell you how accurate your system's response is. Instead, to evaluate RAG systems, we usually use the help of LLMs themselves. But before that, we need to understand what the challenges in a RAG system are. Here is a brief summary of some of the key areas:
1. In the data preparation pipeline: data quality + chunking strategy + embedding quality
2. On the retrieval side: user's query quality + search quality + relevance of the retrieved documents' content to the query
3. On the synthesis side: context overflow + LLM hallucination + answer relevance
These are the components that need to be adjusted and evaluated in a RAG system. For the evaluation pipeline itself, you can either use one of the frameworks being developed for this purpose, e.g. TruLens, Langsmith, or Galileo (my recommendation: Langsmith),
or you can design a custom pipeline depending on your goal and use case. I have a video on the channel called "Langchain vs Llama-index" where I design an end-to-end pipeline and evaluate the performance of 5 different RAG techniques. There I go into much more detail about this topic.
Overall, this task requires a good amount of testing and iteration, especially if the requirements are very specific and complex.
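As a toy illustration of the LLM-as-a-judge idea (the prompt wording and model name are placeholders, not taken from any of the frameworks above), a custom relevance check could look roughly like this:
```
from openai import OpenAI

client = OpenAI()

def judge_answer(question: str, context: str, answer: str) -> str:
    """Ask an LLM to grade how well the answer is supported by the retrieved context."""
    prompt = (
        "You are grading a RAG system.\n"
        f"Question: {question}\n"
        f"Retrieved context: {context}\n"
        f"Answer: {answer}\n"
        "On a scale of 1 to 5, how well is the answer supported by the context? "
        "Reply with the score and one sentence of justification."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```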
Great tutorial. I have a question: Can I learn and run all your projects without openai api key?
Thanks. Yes, you need to convert the OpenAI call sections and use another framework like Ollama, for example. I have received multiple messages from the community saying they successfully used Ollama with the projects on the channel.
module error pwd. Are you running this on a unix-like system? What modifications must I make for Windows pls?
I ran this project on Windows, but it does not matter; the project is designed so it can be executed on any OS without modification. If you are getting the error while installing requirements.txt, just remove the library that is causing the issue and try again.
I have a doubt... Is this project only for text PDFs? Or can it be used for PDFs which contain images and tables as well?
This project is only for .txt and PDF files and easily extendable for docx files (you just need to add it to the list of acceptable files in the code). If your documents contain images and tables, that would not hurt the performance on text. But for implementing RAG with images and tables this project needs to be upgraded.
RAG on images requires image embeddings and vector search on those embeddings. There is still no solid approach that can handle various types of images (e.g. technical drawings), but there are a few preliminary solutions out there that work on generic images. So, the industrial application is still very limited.
Tables, on the other hand, require specific approaches that can extract the contents of a table properly. The "Unstructured" library has been working on this aspect and Langchain has adopted it within its framework. So, since I used Langchain in this project, you can easily modify it and add that approach. The problem is that handling tables this way takes a very long time (in my experience), which makes it impractical for industrial purposes. That leaves the door open to custom solutions that suit a specific business need, which can vary widely.
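As a rough sketch of that Langchain/Unstructured route (the file name is made up and the exact options depend on your installed versions), loading a PDF in "elements" mode lets you separate table elements from ordinary text before chunking:
```
from langchain_community.document_loaders import UnstructuredPDFLoader

# "elements" mode returns one Document per detected element (text, title, table, ...)
loader = UnstructuredPDFLoader("example_report.pdf", mode="elements")
docs = loader.load()

tables = [d for d in docs if d.metadata.get("category") == "Table"]
other = [d for d in docs if d.metadata.get("category") != "Table"]
print(f"Found {len(tables)} table elements and {len(other)} other elements")
```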
@@airoundtable thank you so much for replying
One more doubt: can I write all this code in Jupyter notebooks and run it?
@@shahnaz9026 Technically you can. You need to do a lot of refactoring to the code, but it is doable if you want to run it on Jupyter. Keep in mind that this code is currently using Azure OpenAI; in case you want to use OpenAI directly, the chat completion functions need to be modified as well.
Can I use any other model? Or point me to the section where I can use a gguf file.
Sure, you can use other LLMs, but you have to modify the code for that. The code is currently using the OpenAI GPT 3.5 model (via API calls) for inference. If you want to change the LLM and run the code:
1. Use a powerful LLM for a good performance (consider the context length, chunk sizes, and the instructions that you want to give to the model along with the available computational power that you have at hand)
2. You need to change the code wherever it gets the response from the GPT model, which happens in two locations:
a. src/utils/chatbot.py - response function - lines 68 to 72
b. src/utils/summarizer.py - get_llm_response function
3. Depending on the model that you are using, you may need to process the response in a different way as well (e.g. llama2's output will contain both the query and the response along with some special characters that need to be processed for a neat user experience).
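If by gguf you mean running a local model file, one option (just a sketch; the file path and parameters are placeholders and this is not part of the current project code) is Langchain's LlamaCpp wrapper:
```
from langchain_community.llms import LlamaCpp

# Placeholder path to a local gguf model file (requires llama-cpp-python installed)
llm = LlamaCpp(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,       # context window; match it to your chunk sizes and history length
    temperature=0.0,
    verbose=False,
)

answer = llm.invoke("Answer based on the retrieved context: ...")
print(answer)
```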
@@airoundtable excellent, will give it a try. Thank you 💯💯💯
Very detailed explanation and thank you for making it open source.
Is there any plan for advancing this application? For example:
1. An advanced RAG pipeline which can extract text, table data, or images based on the user's question
2. Creating a vector DB based on text, image, and table data
3. Providing a login and admin panel to track information like the number of tokens used by different users, etc.
4. Using React/Node for a better app experience
5. A complete deployment process
Thanks! I am glad that you liked the video.
These are all great points. For some of them, yes, there will be a video soon, and for some I still have no plan. I am looking into solutions for taking tables and unstructured documents into account along with images. There are already some solutions out there (the unstructured library and image vector databases), but none of them are practical yet in my opinion. For instance, the available approach that "langchain" and "unstructured" proposed for processing tables in documents is super slow and technically impractical. So, I will make a video as soon as I see an approach that can be applied in real-world scenarios.
The next two videos should be interesting for you, I guess. The next one is a multimodal chatbot that uses 5 different models in the background and is able to answer questions about the content of an image as well. And the one after that is an advanced RAG chatbot that uses a knowledge graph and takes into account more detailed relationships between the content of a document and related chunks.
3, 4, and 5 have crossed my mind, but I still have not planned a video for them. I will keep them in mind and think about it after the next two videos.
Thanks for the suggestions @kunalsatpute8379!
@@airoundtable Thank you for replying, and excited for your videos. One question: will these videos be an extension or enhancement of this application, or will they be entirely separate videos?
@@kunalsatpute8379 That would be an expansion. Besides LLM applications, one of the main ideas behind this series was to walk through all the necessary steps required for an advanced multimodal chatbot. I started by explaining function calling and vector search, and using them I designed multiple projects. The next video would be:
1. A combination of all the chatbots that I have designed and uploaded on the channel so far (RAG-GPT + connecting the GPT to the search engine + chatting and summarizing websites)
2. We will use the concept that I showed on open-source RAG for creating a web server for serving models
3. We will add more abilities: the user can interact with the chatbot by sending voice, text, and images, and the chatbot will respond in voice and text + we can ask questions about a specific image that we uploaded and the chatbot will be able to answer questions about the image content + we can ask the chatbot to generate images for us as well.
So it would be an any-to-any chatbot
input: voice, text, image
output: voice, text, image
functionalities: Normal AI chatbot + RAG with documents + RAG with websites + search the web using a search engine + summarize documents + summarize websites + understand image both for answering questions and for generating them
So, the RAG-GPT project would be one arm of that chatbot, and I am thinking of giving the user the ability to work with around 9 or 10 different Gen AI models (all open source except GPT). So, in that video I will just briefly touch on RAG-GPT and the other parts that I have already covered in previous videos, and the focus will be on explaining the multimodal side of it and how the whole chatbot was designed. That is a huge project.
Facing an issue: when I run my chatbot it does not give an answer and shows an error; when I process a file it also shows an error.
As I mentioned, please check the repository issues and open a new one if needed
Hi, I have to ask some beginner questions:
does it support Arabic documents as well?
and is the key free to use?
Hi,
I am really not sure how the models would perform on Arabic. You can give it a try or search in Arabic forums and see what models they suggest for Arabic.
Which key are you referring to?
How do I stop the streamed response from the LLM in Langchain? We are using X-Accel-Buffering.
It is hard to tell without seeing the code, and it depends on how you are calling the model and generating the response. In the code in my GitHub repository, the model does not stream the response. But in case you are using different code with langchain, check these links:
- python.langchain.com/docs/modules/model_io/chat/streaming
- python.langchain.com/docs/modules/model_io/llms/streaming_llm
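As a rough sketch (the model name is a placeholder), streaming in Langchain is controlled by how you call the model: llm.stream(...) yields chunks incrementally, while a plain llm.invoke(...) returns the whole response at once, so switching to invoke effectively turns streaming off:
```
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)  # placeholder model name

# Streaming: tokens arrive incrementally
for chunk in llm.stream("Tell me a short joke"):
    print(chunk.content, end="", flush=True)

# Non-streaming: a single complete response
response = llm.invoke("Tell me a short joke")
print(response.content)
```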
This is a great video that clearly explains the whole RAG development process.
One quick question: I created the .env file to store the OPENAI_API_KEY, but it still could not find it. Where should I put the OPENAI_API_KEY?
Thanks, I am glad that you liked the video!
To make that work:
Create a plain text file named ".env", put it in the parent folder of the project (the RAG-GPT folder), and add your arguments like this:
OPENAI_API_TYPE=azure
OPENAI_API_VERSION=
OPENAI_API_KEY=
OPENAI_API_BASE=
Then to test if it is working properly, open a notebook or a raw .py module and run this command:
import os
from dotenv import load_dotenv, find_dotenv
# This line automatically finds the .env file in your environment
_ = load_dotenv(find_dotenv())
openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
openai_api_key = os.getenv("OPENAI_API_KEY")
Then you can print and make sure that it got it right:
print(openai_api_type)
If you see the values by printing them, then you are good to go.
Hi,
I got an error saying there are no files inside data/docs:
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'data/docs'
but I didn't change anything in your repo; I just cloned it and ran it. Can you give me some guidance on this issue?
Hi,
It shouldn't be the case; this directory is part of the repository, and I am using the 'pyprojroot' library to manage the directories in the project automatically. Without the full traceback of the error, I cannot tell why it is happening. In case you still get the error, feel free to share the full traceback and I will let you know the source of the problem.
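For reference, this is roughly how pyprojroot resolves directories like data/docs relative to the repository root (and a quick way to check whether the path actually exists on your machine):
```
from pyprojroot import here

# here() walks up from the current working directory until it finds the project root,
# so relative paths like data/docs resolve correctly from anywhere inside the repo.
docs_dir = here("data/docs")
print(docs_dir, docs_dir.exists())
```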
Would it also be possible to add .csv files or images?
Not in this version. For performing Q&A and RAG with CSV files, please have a look at my LLM agent videos. For images you would need more complex approaches, and the ones presented by Langchain and Unstructured are not ready for production; they take a very long time to process the images. In my next video, though, I am aiming to show how to fine-tune multimodal LLMs on custom image datasets, and those models can be used to perform RAG on images. Here is the link to my video describing how to chat with SQL and tabular databases:
ua-cam.com/video/ZtltjSjFPDg/v-deo.htmlsi=bh9xdkJqufFBMrBI
Hi, I love the whole project, but I would be happy if you go more in-depth on the following statement in the repo: "It is strongly recommended to design a more robust and secure document handling process for any production deployment."
Do you mean improving the security of the documents and restricting access to the app, and implementing such steps, or something else?
Hi, thanks.
No, I would suggest handling the access level on the chatbot side. In general, for RAG projects, I suggest separating the document processing pipeline from the chatbot itself. And for the data pipeline there are many factors that need to be taken into account. If I assume the company is mid-size or bigger:
1. Document cleaning and transformation (for instance, if you are dealing with .txt, .docx, and .pdf, after preprocessing you can convert everything to PDF for a manageable workflow).
2. Content validity check. This step can be managed at a division level or on multiple levels, depending on the size of the company and the complexity of the documents. (The verification teams should verify the contents and be responsible for what is in there.)
3. Avoid duplicates. (Sometimes different divisions have very similar documents but for different use cases and purposes. This can cause confusion in a RAG project.)
4. Address specific cases (e.g. if a PDF file was created using a scanner, you would not be able to perform RAG on it).
5. Manage security (who can add/remove a document from the pipeline itself and also the access level)
6. Implement a CI/CD pipeline for automating the workflow and managing scale (e.g. when a document is added to a database, your pipeline should only add that document to the vectorDB rather than recreating the vectorDB from scratch)
7. Secure the pipeline
8. Test and improve
These are some of the key aspects of the data pipeline for RAG projects.
dude!!!! INSANE!! such a good tutorial. you rock.
My one question is: credits. Will the vector approach save credits? E.g. I want to build a legal document reader & Q&A, and some docs are 100 pages long. Won't each doc cost hundreds in API credits? Or is that what vectorization & DBs are for?
Thanks! I am glad you liked the video. Storing the vectorized documents in a vector DB definitely saves costs, and it would not be efficient to build such a system without vector databases (especially since you can use many of them for free). About the pricing itself, vectorization is not a very costly task unless you are dealing with thousands of documents. According to OpenAI's documentation, text-embedding-3-large is priced at $0.00013 / 1k tokens. Keep in mind that text-embedding-3-large is their newest and most expensive embedding model (also more expensive than the one that I used in the video).
Source: openai.com/blog/new-embedding-models-and-api-updates
So, imagine a document with 100 pages: if you turn it into, let's say, 500 chunks, you will make 500 API calls to this model and it will cost you roughly around US$0.03 in total (approximate).
I considered each chunk to be around 1500 characters which makes it around 400 tokens.
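Here is the back-of-the-envelope math behind that estimate (the chunk count and token count are just assumptions):
```
chunks = 500                    # assumed number of chunks for a 100-page document
tokens_per_chunk = 400          # ~1500 characters per chunk
price_per_1k_tokens = 0.00013   # text-embedding-3-large, per OpenAI's pricing page

total_tokens = chunks * tokens_per_chunk             # 200,000 tokens
cost = total_tokens / 1000 * price_per_1k_tokens     # ~0.026 USD
print(f"Estimated embedding cost: ${cost:.4f}")
```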
Also, keep in mind that you can manage the vector DBs by editing them (adding/removing documents) rather than re-creating them on every small change.
Great project, but ultimately we are using the OpenAI APIs, which do the "real" AI magic here. The question is: if we already plan to build a customized bot and train it on our data, wouldn't we also want to train our own open-source LLM such as Llama-3 and remove the dependency on online API calls (which make the project strongly bound to public access of our private documents, right)?
It very much depends on the requirements of your project. If using closed-source models like GPT is not an option, then using a powerful open-source model like Llama is the way to go. You can definitely do that. I have another video where I implemented this same project using open-source models. You can check it out here:
ua-cam.com/video/6dyz2M_UWLw/v-deo.htmlsi=Lfn9h2Y9pl5zRUd9
Hi,
I am using a Windows machine and getting an error while running the upload_data_manually.py file.
It's giving me RuntimeError: your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.
Then I checked the sqlite version using
sqlite3 -version, and it shows me 3.41.2, which is greater than 3.35 🤔
Hi, please try to replicate the project using the exact library versions that I included in requirements.txt and let me know if the problem appears again.
@@airoundtable I downloaded the sqlite3 DLL files and copied them to the Python installation directory. This resolved the issue.
Is OpenAI commercial; does it need an API key? And is it possible to use an LLM other than third-party AI services?
OpenAI's GPT and embedding models need an API key. If I got your point correctly, you want to use open-source LLMs. You absolutely can. Please check the Open Source RAG video on my channel; that is almost the identical project with open-source models.
ua-cam.com/video/6dyz2M_UWLw/v-deo.htmlsi=W9dW4JNbC2KH_tHs
QQ: What version of Python is best here, as there are a lot of packages and hence implicit dependencies?
I have been using Python 3.11 and never faced any issue with RAG projects. I also included a requirements.txt file in the project root directory where you can check the versions of all the libraries that I used for this chatbot.
Thanks. Apologies if you had already clarified that. I will give it a go with 3.11. @@airoundtable
Thank you very much for this. Very concise in your explanations. Any chance this can be done without OpenAI, but instead use a local LLM like ollama?
Absolutely, please check out my latest video called:
Open Source RAG Chatbot with Gemma and Langchain | (Deploy LLM on-prem)
I took RAG-GPT and replaced the GPT models with Google's Gemma 7B. I also replaced OpenAI's embedding model with an open-source model. I would never suggest using a 7B LLM for deployment, but my main goal was to show how you can have the same pipeline (RAG-GPT) with an open-source model on prem.
My URL is generated for the UI, but nothing is getting displayed. I checked that the vector DB is also created under the chroma folder for the documents already stored under the docs folder. I am using Azure OpenAI credentials; what could be the reason?
It is hard to tell without seeing the traceback of the problem. Whatever is happening, you can see it from the terminal. In case the problem is not solved yet, feel free to open an issue on the GitHub repository and post the traceback there.
How can this be modified to include .epub files?
To add .epub files, modify the "prepare_vectordb.py" module, add the condition for accepting .epub, and then use the following links to prepare the langchain loader and pass the loaded files for chunking (a quick sketch follows the links):
js.langchain.com/docs/integrations/document_loaders/file_loaders/epub
python.langchain.com/docs/integrations/document_loaders/epub
api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.epub.UnstructuredEPubLoader.html
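A minimal sketch of what that could look like (the file name and chunking parameters are placeholders; UnstructuredEPubLoader needs the unstructured package with its epub extras installed):
```
from langchain_community.document_loaders import UnstructuredEPubLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = UnstructuredEPubLoader("example_book.epub")
documents = loader.load()

# Chunk the loaded documents before adding them to the vector database
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
chunks = splitter.split_documents(documents)
print(f"Loaded {len(documents)} document(s), split into {len(chunks)} chunks")
```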
Thank u Sir🙏
This is the most comprehensive RAG tutorial video I have seen on UA-cam. What a great effort and command over the subject sir!
I am from a low-code business analyst background so I heavily depend upon co-pilot to guide me on python script functionality
Still I was able to set up the system as explained by you on my local PC; however, I am getting the following error on executing python src\raggpt_app.py:
"
import pwd
ModuleNotFoundError: No module named 'pwd'
"
Can you guide me on what I am missing
Many thanks
Thanks! I am happy to hear that you liked the video! It is a bit difficult for me to debug that code without the whole traceback. What operating system are you using? and can you send me the full error?
@@airoundtable
I have same error, I use Windows 10
...
File "D:\Install\PyThon\lib\site-packages\langchain\document_loaders\__init__.py", line 18, in
from langchain_community.document_loaders.acreom import AcreomLoader
File "D:\Install\PyThon\lib\site-packages\langchain_community\document_loaders\__init__.py", line 163, in
from langchain_community.document_loaders.pebblo import PebbloSafeLoader
File "D:\Install\PyThon\lib\site-packages\langchain_community\document_loaders\pebblo.py", line 5, in
import pwd
ModuleNotFoundError: No module named 'pwd'
Thanks for your kind reply. I am using Windows 10.
On VS Code I have created the virtual env, installed all libraries, and followed all instructions as given in the readme (for the RAG-GPT application).
When I initialize python serve.py it's OK:
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT> cd src
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src> python serve.py
Serving at port 8000
But when I initialize python raggpt_app.py (to launch Gradio), I get the following error (truncated):
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT> cd src
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src> python raggpt_app.py
Traceback (most recent call last):
File "F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src
aggpt_app.py", line 25, in
from utils.upload_file import UploadFile
File "F:\Users\XYZ\miniconda3\lib\site-packages\langchain_community\document_loaders\pebblo.py", line 5, in
import pwd
ModuleNotFoundError: No module named 'pwd'
PS F:\Users\XYZ\Desktop\projects\LLM-Zero-to-Hundred\RAG-GPT\src>
Excellent video. Thanks a lot. Could you please help me get the terminal_q_and_a.py file?
Hi, Thanks! Sure, I will update the repository in about an hour and push that file in it. I saw you opened an issue on Github as well, so I will inform you there when the file is added.
@@airoundtable I got the file. Thanks a lot for your quick response and excellent video.
@@TechPuzzle_Haven You're welcome! Thanks!
This is a fantastic tutorial. Nice Work!
**Question: sporadically the chat history and retrieved content are appended to the chatbot output along with the question/response. Not sure if it's related, but it seems to only happen when I adjust (increase the text details of) the prompt instructions. Any idea why? Thanks.
Thanks! I am glad that you liked the video.
I am not sure if I understood your point correctly. The chat history is appended to the model's input on every query (along with the retrieved content and the user's query itself). Therefore, the GPT model should always have access to the chat history. But in case the performance is not consistent, the source can be:
1. The context length that the GPT model received. GPT 3.5 has a 4096-token limit, and based on my experience, if it receives something over 3000 tokens, there is a great chance that you see degradation in performance. In case you are interested, have a look at minute 8 of the Open Source RAG Chatbot video for a more technical explanation of this aspect.
2. The format of the input: in case the chunk sizes are so small that the input does not form an understandable context, the GPT model can get confused as well.
3. The model system role: the system role should clearly guide the GPT model to understand each part of the input context. A vague system role can confuse the GPT model and therefore affect its usage of the chat history.
Please let me know if this could answer your question.
@@airoundtable Sorry for the confusion. After more testing, I don't think adjusting the prompt is the issue. Sporadically, the chat history, retrieved content (similar to the references sidebar), source text, and the input query are displayed in the output of the Gradio chatbot. Is this normal functionality? It doesn't happen on all questions though; some just return the response. Thanks again for the interaction.
@@Preds23 No worries. Now I think I got your point, which is a good one indeed. On some occasions the GPT model also provides the source with the response, and in other cases you don't see it unless you click on references. Here is why:
That behavior is due to the context length that I mentioned in the previous message. If the context that the GPT model receives is not too long (in a way that overwhelms the model), the model is able to pick up that piece of the instruction and show the source in the output. But if the context that the model receives is long (chat history + user prompt + retrieved content), GPT 3.5 is not powerful enough to follow all the details, so it will only focus on the main body of the instruction and miss the part where it should also provide the source. However, as you noticed, you can always see the reference and the chunks in the reference bar. But in case you want a more consistent behavior from GPT on that matter:
1. Use GPT 4 instead of GPT 3.5. With the current config, GPT 4 would almost always return that source at the end.
or
2. Reduce the size of the chunks and the number of chat history turns injected into the model. To do that, you have to make sure the change is not so drastic that it degrades the model's behavior, but to some extent you can make the model's input a bit shorter so the model can pick up all the instruction's details.
To put it more simply, think of it this way: GPT 3.5 has a 4096-token limit. If you pass it an input with 3500 tokens, the model will focus more on the beginning and end of the input and start to forget (or ignore) what was said in the middle section. If you pass it an input with 2000 tokens, the model can understand and follow all the instructions nicely without any issue. This is an intrinsic characteristic of all LLMs.
Hope this helps you understand the problem.
Could you please open up the test file? So that I can follow the video tutorial step by step and understand the use of each method, thx!
Which test file?
@@airoundtable Your note files; like the other projects have an explore file, I need the explore file.
Very informative video; I understood all the components involved. Could you please let me know where I should define "OPEN_API_KEY"? Is it in moderation.py, or should this, along with the other parameters, be defined in environment variables? Could you please help me with this?
Thanks! To keep important information like passwords or API keys safe, we put them in a special file called .env in our project. This is a common way to handle private settings. Here's a really easy guide to help you set it up:
1. Make a new file in the main folder of your project and call it .env. Inside this .env file, you can write your private information like this:
```
OPEN_AI_API_KEY=yourapikeyhere
ANOTHER_SECRET_KEY=yoursecretkeyhere
```
2. To use these settings in your code, you'll need to add a couple of lines to tell your program to read the .env file. Here's how you do it:
First, you need to add these lines at the beginning of your Python script:
```
import os
from dotenv import load_dotenv
load_dotenv() # This tells Python to read your .env file
```
3. Then, whenever you need to use the information from the .env file, you can get it like this:
```
openai_api_key = os.getenv("OPEN_AI_API_KEY")
another_secret_key = os.getenv("ANOTHER_SECRET_KEY")
```
Replace OPEN_AI_API_KEY and ANOTHER_SECRET_KEY with the actual names you used in your .env file. Now your code can use your private settings safely!
Thanks a lot, I will try the above method.
How do I register an Azure deployment?
We already have some params in load_openai_cfg(self), like API_KEY, API_BASE, API_VERSION, API_TYPE. I tried to add the parameter API_DEPLOYMENT, but it still errors: No deployment found.
I hope you have already found the answer to your question. But in case you haven't: first check whether you can call your models from a separate notebook, outside the project. After a successful call, then try to add the models to the project. This error was raised because the project could not properly communicate with the model, which is either due to the model name or the credentials that you took from Azure OpenAI.
@@airoundtable Yes already solved. But I ended up using OpenAI instead of Azure. Hi, I want to ask again:
How can we make the chatbot understand the format of documents? For example, I have a document with the format: title, dates, content. Then I upload a new document and want to check whether my new document has the same format as my preprocessed document. I also altered the system prompt in app_config.yml to add the capability for the chatbot to detect typos, but it doesn't work. How can I edit the project? Thanks in advance.
@@coffeepod1 Glad to hear it.
Regarding your new question, I've never done it, but if I wanted to solve it, I would use a hybrid approach: combine Python libraries that can extract document info based on the document structure (headings, subheadings, etc.), and from there either hard-code the checks to make sure the structure is correct or use an LLM to make the judgment for me. Just handing a whole document to an LLM would not be an effective way to achieve your goal.
About the typos: since LLMs work with tokens and not words, it is sometimes hard for them to detect typos (especially when the context length is long). The benefit is that when we interact with them, they don't care if we have typos in our queries; but on the downside, when it comes to fixing typos, there is a chance that they miss the errors. They perform better with smaller context lengths, which is not usually the case for RAG systems.
@@airoundtable I see, we are not OpenAI (or a big AI company); limited on resources indeed. But what about "prompt engineering" with a few shots about the doc structure? Or how about an NER approach, since it is similar to information extraction? About the typos, I didn't really think about the obstacles you explained; I just assumed the GPT model (I use GPT-4o) has the capability to detect typos. Poor me, working on a hard project.
@@coffeepod1 I am not sure about those suggestions; I haven't worked with them. But at this point, I would say the best strategy is to test different approaches quickly and see which one is more effective than the others, then spend time improving that approach.
Bro, it was giving an error about modifying pip.
Open a ticket on the repo if you still couldn't solve the problem. I need to see the error
Hi, could you give me the .env settings you use: openai_api_type, openai_api_base, and openai_api_version? I am struggling to make it work. Thanks. Great video, by the way.
Thank you for your positive feedback @agep13.
Regarding the .env settings, it's important to note that these often contain sensitive information, such as API keys, which should be kept private and not shared openly to ensure security. However, I can certainly guide you on setting up your own .env file.
For the openai_api_type, openai_api_base, and openai_api_version, you'll need to consult the official documentation provided by OpenAI or Azure to determine the correct values. These details are typically available in the API or developer section of your account dashboard.
Here's a basic template for what your .env file might include:
OPENAI_API_KEY=your_unique_api_key
OPENAI_API_TYPE=the_type_of_api_you_are_using
OPENAI_API_BASE=azure_open_ai_endpoint_url
OPENAI_API_VERSION=version_number
Please ensure you replace the placeholders with your actual API key and the appropriate values for your use case. The OPENAI_API_TYPE depends on the API service you are using (e.g. "azure" for the setup in the video), while OPENAI_API_BASE is the endpoint URL of your Azure OpenAI resource and OPENAI_API_VERSION is the API version string you are targeting.
Remember to keep your .env file secure and avoid uploading it to public repositories to prevent any unauthorized use of your API keys.
@dtable Hello, great video. I know the OPENAI_API_KEY is in your OpenAI account, and I thought openai_api_type would be something like gpt-3.5 or 4, but for the rest I'm still confused about where to get the base and version; how do I check which URLs to use?
Hi @@ShadowScales, thanks! Your confusion is understandable because to use GPT models from OpenAI directly, you won't need to insert the endpoint and the api_version. You probably missed that I am using the GPT model from Microsoft Azure; that is why I have 4 credentials for it. In order to understand how to adjust the code to the OpenAI API directly, please read the comments under @mikew2883 down below (it is the comment with the 7 replies); there we had a full discussion on how to properly modify the project. I hope this helps. In case you have any questions along the way, please let me know.
Is the code no longer available?
It is available:
github.com/Farzad-R/LLM-Zero-to-Hundred/tree/master/RAG-GPT
🎉❤
Hello, thanks for the video, but when I try to run the app I get this error:
"openai.error.InvalidRequestError: Invalid URL (POST /v1/engines/gpt-35-turbo/chat/completions)" because I don't have access to the Azure OpenAI API yet; I'm using the OpenAI API. Would you be able to help me with this? Thank you
Hi Arian. Thanks. I just pinned a message at the top of the comment section where I discussed this issue with @mikew2883. Please read that discussion; I provided all the necessary guidance for changing the OpenAI API call from Azure to OpenAI itself. Let me know if you have any other questions.
@@airoundtable Thank you for your reply. I made a few additional adjustments, and now it works, really appreciate your awesome work and the effort you put in. Damet Garm 👊
@@arian2168 happy to hear it. Thanks Arian!