Multimodal RAG with GPT-4-Vision and LangChain | Retrieval with Images, Tables and Text
- Published Jun 9, 2024
- In this video I will show you how to do retrieval with a multi-vector retriever. First we will discuss how we can do retrieval with images, tables, and text by creating summaries for each element (a minimal sketch of the idea follows the timestamps below).
Then we will go through the Jupyter notebook.
Code: github.com/Coding-Crashkurse/...
Timestamps:
0:00 Theory
1:27 Code walkthrough
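A minimal sketch of the summary-based multi-vector idea (not the exact notebook code; the `elements` and `summaries` values below are hypothetical stand-ins): summaries are embedded for search, while the original elements live in a docstore and are returned on a hit.

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Vectorstore holds the summary embeddings; docstore holds the raw elements.
vectorstore = Chroma(
    collection_name="summaries", embedding_function=OpenAIEmbeddings()
)
docstore = InMemoryStore()
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore, docstore=docstore, id_key=id_key
)

# Hypothetical raw elements (text, table, image) and their GPT-4(-Vision)
# generated summaries.
elements = ["some text chunk", "<raw table html>", "<base64 image>"]
summaries = ["summary of text", "summary of table", "summary of image"]

doc_ids = [str(uuid.uuid4()) for _ in elements]
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, elements)))

# Search matches against the summaries but returns the original elements.
docs = retriever.invoke("What does the table show?")
```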
Nice video. Would be nice to go into a few more examples/use cases to more strongly illustrate why multimodal RAG is useful
thank you
How can I use Pinecone instead of Chroma here?
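One possible way (not covered in the video; this assumes the langchain-pinecone package and an existing Pinecone index, both hypothetical here) is to swap only the vectorstore:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Requires the PINECONE_API_KEY environment variable to be set.
vectorstore = PineconeVectorStore(
    index_name="summaries",  # hypothetical index name
    embedding=OpenAIEmbeddings(),
)
# The rest of the MultiVectorRetriever setup stays the same; the docstore
# is independent of which vectorstore you plug in.
```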
Nicely explained, subscribed 🎉
thanks❤
Will this partitioning part work on Azure? How do you read a PDF from a storage container?
I have not tried this yet. I would use the Azure SDK, but I'm not sure whether that works the same way as reading the file from the local filesystem.
@codingcrashcourses8533 PDFs stored as blobs on Azure behave differently from local files. I tried using LangChain but was not able to read them. I then used pypdf to read the PDF as a streaming object.
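A minimal sketch of that streaming approach (assuming the azure-storage-blob and pypdf packages; the connection string, container, and blob names are placeholders):

```python
from io import BytesIO

from azure.storage.blob import BlobClient
from pypdf import PdfReader

blob = BlobClient.from_connection_string(
    conn_str="<connection string>",
    container_name="pdfs",    # hypothetical container
    blob_name="report.pdf",   # hypothetical blob
)
pdf_bytes = blob.download_blob().readall()

# pypdf can read the in-memory stream directly, no local file needed.
reader = PdfReader(BytesIO(pdf_bytes))
text = "\n".join(page.extract_text() or "" for page in reader.pages)
```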
Nicely explained and nice information as always, but I have a question: my files are stored in Azure Blob Storage and I am getting them through a blob loader. Does the multimodal approach work with them?
I don't know, to be honest, but I think it should be possible. If not, maybe try to get the files directly with the Azure SDK.
@codingcrashcourses8533 As always, thanks for replying to my comments, my mentor.
Can we return the images along with the relevant text in the response, based on the prompt passed?
Yes, but I would probably do that differently, maybe with a different embedding model. But to be honest, I don't have a good idea out of the box.
Can't we get the image as well if we use a vision model in the chain?
Currently not, at least as of the last time I checked.
Why did you use chain.invoke and not .run, .apply, or .batch? Sometimes in your videos you use run and sometimes invoke. How do you know when to use which, and what's the difference?
I thought about using batch and think it's probably better, but I tried to keep it simple and just used a loop for every call.
The difference between run and invoke is the chain interface. I try to use only the LangChain Expression Language (LCEL) in my newer videos; invoke is the implementation of the Runnable interface, while run is the implementation of the (deprecated) Chain interface.
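A small sketch of the difference on an LCEL chain (assuming langchain-openai is installed; the prompt is a made-up example):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = ChatPromptTemplate.from_template("Summarize: {text}") | ChatOpenAI()

# .invoke runs a single input; a loop over .invoke is what the notebook does.
single = chain.invoke({"text": "first element"})

# .batch runs many inputs (in parallel where possible) in one call.
many = chain.batch([{"text": "first element"}, {"text": "second element"}])
```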
Will this output images along with text as well?
No. GPT-4 only outputs text. You could pass the output to DALL-E 3.
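A sketch of handing the text output to DALL-E 3 (assuming the openai package; `answer` is a hypothetical stand-in for the chain's output):

```python
from openai import OpenAI

client = OpenAI()
answer = "A bar chart comparing quarterly revenue"  # hypothetical chain output

# Generate one image from the text the chain produced.
image = client.images.generate(
    model="dall-e-3",
    prompt=answer,
    n=1,
    size="1024x1024",
)
print(image.data[0].url)
```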
How do I store the created vectors locally so I can use them again later?
FAISS and Chroma offer methods for that. You will find them in the LangChain docs.
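A sketch of both options in recent LangChain versions (the paths are placeholders):

```python
from langchain_community.vectorstores import FAISS, Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Chroma: pass a persist_directory and the data is written to disk; reopen
# the same directory later to reuse it.
db = Chroma.from_texts(["some text"], embeddings, persist_directory="./chroma_db")
db_reloaded = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

# FAISS: save and reload the index explicitly.
faiss_db = FAISS.from_texts(["some text"], embeddings)
faiss_db.save_local("./faiss_index")
faiss_reloaded = FAISS.load_local(
    "./faiss_index", embeddings, allow_dangerous_deserialization=True
)
```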
Can you please share the notebook?
Code is in the description
Hi Markus, I am having problems downloading Tesseract; the download is really slow. Do you have another link for Tesseract?
Hello Zaid, this is another link I used before: digi.bib.uni-mannheim.de/tesseract/ Hope that helps! Best regards
@@codingcrashcourses8533 Thanks Markus!!
😊