Idk, I just finally found the most understandable AI explanation content. Thank you Alejandro
glad to hear this :)
@alejandro_ao I want to create a multi-LLM chatbot for telecommunications. Is there a way to connect with you apart from UA-cam so that I can share the problem statement with you?
i was about to learn from the previous video, but you, brother, just bring more gold.
you’re the best
Bro, I literally came back to get your old video on PDFs and you already have an update. Thank you!
the best touch will be when you add a front-end
good job
hey! i'll add a UI for this in an upcoming tutorial 🤓
Best Explanation ever.
Thank you for building this tutorial. What about adding a RAG preprocessing stage to clean the data and redact PII before sending it to the LLM for chunking and chunk enrichment (keywords, metadata, etc.)?
What do you recommend, or how would you suggest, automating the conversion of a PDF of images (text images) to text? The problem is that traditional OCR does not always do the job well, but ChatGPT can handle difficult images.
Excellent!!!! Thank you Alejandro
Amazing Tutorial
thanks for the great content. how can we modify this to use a local LLM via Ollama, e.g. Llama 3.2 and Llama 3.2 Vision?
Good job!
when will the frontend part come? Super excited for this part
Hey dude, what are you using to screen record? The mouse sizing and movement look super smooth, I'd like to create a similar style when giving tutorials.
hey there, that's the Screen Studio app for mac, developed by the awesome Adam Pietrasiak @pie6k. check it out :)
Thank you for the video. Just curious, how do we go about persisting the multi-vector database? What data sources are available that cater to such requirements? Also, how do we go about getting an image as input from the user, so the language model can relate it to the documents and predict an answer?
The functionality you showed at the end to visualize the PDF page and the retrieved chunk seems super useful for citation display. Since I couldn't find the implementation in your colab, is there a way to find it somewhere else?
Amazing content!
Great tutorial, very detailed. Just one question: is there any option to link the text chunk that describes the image, as context for the image, to create a more accurate summary of the image?
beautiful question. totally. as you can see, the image is actually one of the `orig_elements` inside a `CompositeElement`. and the `CompositeElement` object has a property called `text`, which contains the raw text of the entire chunk. this means that instead of extracting the image alone like i did here, you can extract the image together with the text of its parent `CompositeElement` and send both when generating the summary. great idea 💪
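to make that concrete, here is a minimal sketch of what i mean, assuming `chunks` is the output of `partition_pdf(..., chunking_strategy="by_title")` with images extracted to the payload; the helper name is just for illustration:

```python
# minimal sketch: pair each extracted image with the raw text of its parent chunk
def get_images_with_context(chunks):
    pairs = []
    for chunk in chunks:
        if "CompositeElement" in str(type(chunk)):
            for el in chunk.metadata.orig_elements:
                if "Image" in str(type(el)):
                    pairs.append({
                        "image_base64": el.metadata.image_base64,  # the image itself
                        "context_text": chunk.text,                # raw text of the parent chunk
                    })
    return pairs
```

when you build the summary prompt, you can then pass both `context_text` and the base64 image to the multimodal model instead of the image alone.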
Thanks for this beautiful and comprehensive presentation. I do have one question about security. I would like to use Mistral instead of OpenAI and run the code locally with Ollama, so I would like to know your opinion about data security. Will the data remain secure? The PDF files I'm going to use as input are confidential. Thanks in advance for your response.
Well, if you're running the model (LLaMA, Mistral, or Qwen) locally, you don't need to worry. It's safe unless someone hacks your PC and steals all your data. 😛
hey there. sure thing. if you are running these models locally, then there is absolutely no data leaving your machine, so no need to worry about data leakage. that being said, consider that there are two moments where the data from your files could leave your computer:
- when you call your LLM. if you are using a local LLM with Ollama, there is no need to worry. just make sure to use a multimodal model, such as Llava, so that it can handle the images. or stick to a text-to-text model if your pdf does not require multimodality.
- when you parse your document. in this example, i used Unstructured's open source library locally. if you do the same (see the sketch below), your data never leaves your computer. but you can also use their serverless api; if you do that, your data would transit through their servers.
that being said, neither Unstructured nor OpenAI/Anthropic use the data you send them to train their models. but i understand if you still don't want your confidential data passing through external services.
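for reference, here is a minimal sketch of the fully local parsing path, assuming a file called `report.pdf` and that poppler/tesseract are installed; nothing in it calls an external api:

```python
from unstructured.partition.pdf import partition_pdf

chunks = partition_pdf(
    filename="report.pdf",
    strategy="hi_res",                    # local layout model, no serverless api
    infer_table_structure=True,
    extract_image_block_types=["Image"],
    extract_image_block_to_payload=True,  # images end up as base64 in the element metadata
    chunking_strategy="by_title",
    max_characters=10000,
    combine_text_under_n_chars=2000,
    new_after_n_chars=6000,
)
```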
Hi bro, thanks for the video. I have implemented the same pipeline using a local LLM with Ollama and the Llava model as the multimodal model. But whenever I ask questions, the pipeline is not able to retrieve images. I am stuck at this point.
Hi, I saw your comment and I am trying to do the same thing, but I am stuck with ChromaDB: my kernel keeps dying. Then I found that ChromaDB works well with OpenAI models. Do you use another vector database? If yes, which one? Thank you in advance.
bro can you please share ur code
Very nice. Is it possible to do this with a local LLM, like an Ollama model?
Yes, absolutely. just use the langchain ollama integration and change the line of code where i use ChatOpenAI or ChatGroq. Be sure to select multimodal models when dealing with images though
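a minimal sketch of that swap, assuming Ollama is running locally and you have pulled the models (e.g. `ollama pull llava`); the model names are just examples:

```python
from langchain_ollama import ChatOllama

# drop-in replacement for the ChatOpenAI / ChatGroq lines
llm = ChatOllama(model="llama3.1", temperature=0)      # text-only chains
vision_llm = ChatOllama(model="llava", temperature=0)  # image summaries
```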
Good one!! Did you see any open-source alternatives, like Markers?
Could you create a repo for running this on Windows? Great video btw!
when is the frontend part coming? it's been really fun following along so far
that would be a great addition. i'm working on that one!
# til
thanks for your video.
is it possible to use CrewAI in the same example?
any idea how to install poppler, tesseract, and libmagic on a windows machine?
Nice
Hi, it's a great video. Can you help me with how to install these dependencies on a Windows machine?
how to increase the accuracy of this RAG system?
several improvement ideas. here are 3:
- you can retrieve the image on the fly only if the retrieved chunk contains images, instead of indexing the images separately.
- you can keep the images indexed separately, but instead of sending the single image to the LLM, you can retrieve its parent chunk and send the entire chunk along with the image for better context.
- you can add a persistent document store so you don't have to re-index the whole thing every time (see the sketch below).
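for that last point, here is a minimal sketch of a persistent setup, assuming the same `MultiVectorRetriever` pattern from the video; the directory names are just placeholders:

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.retrievers.multi_vector import MultiVectorRetriever

# vectors persisted on disk instead of in memory
vectorstore = Chroma(
    collection_name="multi_modal_rag",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# parent documents (text, tables, images) persisted on disk as well
docstore = create_kv_docstore(LocalFileStore("./docstore"))

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key="doc_id",
)
```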
Why did you decide to use Groq?
just because they have a very generous free tier and pretty good models. that turns out useful for people watching who don't want to enter their credit card details to use these LLMs
Can I connect to a SQL database source?
sure thing. that would be more like structured retrieval rather than unstructured. you can check out this video: ua-cam.com/video/9ccl1_Wu24Q/v-deo.htmlsi=kSu-QkMInzq98u-u
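if you want a quick taste of that structured path, here is a minimal sketch using langchain's SQL utilities, assuming a local sqlite file called `company.db`; the model name and question are just examples:

```python
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langchain.chains import create_sql_query_chain

db = SQLDatabase.from_uri("sqlite:///company.db")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# turn a natural-language question into a SQL query, then execute it
write_query = create_sql_query_chain(llm, db)
query = write_query.invoke({"question": "How many customers signed up last month?"})
print(db.run(query))
```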
Hi bro, can you create a video on multimodal RAG: chat with video visuals and dialogues?
this sounds cool! i’ll make a video about it!
Thanks @alejandro_ao
what about mathematical equations?
in this example, i embedded them with the rest of the text. if you want to process them separately, you can always extract them from the `CompositeElement` like i did here with the images. then you could have an LLM explain the equation and vectorize that explanation (like we did with the descriptions of the images). in my case, i just kept them with the rest of the text; i feel like that gives the LLM enough context about them.
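if you do want to treat them separately, a minimal sketch could look like this, assuming `equation_elements` is a list of elements you pulled out of the `CompositeElement`s and `llm` is whatever chat model you already set up:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Explain this equation in plain language, including what each symbol means:\n\n{equation}"
)
explain_chain = prompt | llm | StrOutputParser()

# one natural-language explanation per equation, ready to embed like the image summaries
equation_summaries = [
    explain_chain.invoke({"equation": el.text}) for el in equation_elements
]
```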
@alejandro_ao thanks for the context, i was stuck on this for a week