Yes...an example on this would be helpful
Great video, a full implementation example would be awesome!
This looks like it could be huge for enterprise applications where there is a large corpus of unstructured internal information that the model needs to be able to work with. My current method is alright, but multimodal embedding spaces just are not there yet for hundreds of similar looking graphs. Personally I would be extremely interested in a video going through the implementation of this!
Great! RAG with new methods and multi-agents with better reasoning, multi-models are very useful for academia
Totally agree!
Thanks for the video! So the demo retrieved "pages"; if we want the actual paragraph- or sentence-level sources, we have to do an additional retrieval on the retrieved pages, right? I saw your Gemini PDF video and was wondering how ColPali performs compared to that.
Thank you for the video. It was very interesting.
I would really appreciate a video on implementing this locally, please🙏
Seems a very good approach. Yes, it would be nice to do an end-to-end test with a local install and local documents. Thanks for the update👍
Great explanation. Love that you tackle the details and bring, I believe, a little more clarity to the picture versus another favorite channel I rely on with a more theoretical bent (Code your own AI). Please do the follow-up video, as this sounds like a promising standard as compute grows. Is there any mention of better retrieval feeding into the LLM? I wonder if feeding dense PDF pages of tables etc. into the interpretation LLM distracts from the original similarity patches cited.
This was a great video to help understand the complex architecture of ColPali in a simple way. Thanks. At the end of the video I had a query: is there a way we can see which parts of the retrieved page ColPali is focusing on, like they showed in the paper? If there is, then a video on that would be very helpful as a next part.
That's a good point. I haven't looked into it, but I think there will be an implementation somewhere. Will explore it
@@engineerprompt thanks so much.
Example usage on own data will be heavily appreciated.
Thank you so much for sharing! Would love to see an example!
looking forward to more on this
Would love to see more about this
Yes please, more examples👍
Thanks for your videos! Would love to see a guide to run this locally.
+1, an implementation will be helpful
This work is impressive! Thanks for sharing.
I hope you can dive deeper into this
Very interested!
Hi bhaiya!!
I'm working on my final year project; basically it has 2 ideas:
1) An AI-powered previous-year paper analysis system and sample paper generation from current trends
2) AI-powered notes generation from textbook content.
There are 7 engineering departments in my college
I'm a little bit confused about what to use where: agentic RAG, fine-tuning, or something else?
Please help me clear my confusion.
Thanks!!
Does the speed of indexing depend on the GPU? Is there any way to speed up indexing by parallelizing?
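For context on the question above: indexing time is usually dominated by the encoder's forward passes, so the standard speedup is to embed pages in batches rather than one at a time. A minimal sketch of that idea (`index_pages` and `embed_fn` are hypothetical names, and `embed_fn` stands in for whatever model call is actually used):

```python
def index_pages(pages, embed_fn, batch_size=8):
    """Embed page images in batches instead of one at a time,
    so each (GPU) forward pass is amortized over several pages.
    embed_fn is a placeholder for the real model call and must
    accept a list of pages and return one embedding per page."""
    embeddings = []
    for i in range(0, len(pages), batch_size):
        # one forward pass per batch of pages
        embeddings.extend(embed_fn(pages[i:i + batch_size]))
    return embeddings

# toy check with a fake embedder that "embeds" a page as its length
fake_embed = lambda batch: [len(p) for p in batch]
print(index_pages(["aa", "b", "ccc"], fake_embed, batch_size=2))  # [2, 1, 3]
```

Larger batch sizes help only while the GPU has memory headroom; on CPU the gains are much smaller.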
ColPali is better with fewer documents. I tried it on 500 documents and the results are not good. Maybe it's because, instead of their custom evaluator, I am using a vector DB. Can you suggest any vector DB that supports late interaction or multi-vector embeddings?
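For readers wondering what "late interaction / multi-vector support" means here: ColPali scores a page with ColBERT-style MaxSim, where each query-token embedding is matched to its most similar page-patch embedding and those maxima are summed, so the store must keep many vectors per document. A minimal NumPy sketch of the scoring (the function name is illustrative):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT/ColPali-style late-interaction score.
    query_vecs: (num_query_tokens, dim) token embeddings
    doc_vecs:   (num_patches, dim) page-patch embeddings
    Each query token keeps only its best-matching patch; the
    per-token maxima are summed into one relevance score."""
    sims = query_vecs @ doc_vecs.T       # (tokens, patches) similarity matrix
    return float(sims.max(axis=1).sum()) # best patch per token, summed

# toy example: two query tokens, three page patches
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(round(maxsim_score(q, d), 2))  # 1.7  (0.9 + 0.8)
```

Any store that can hold a matrix of vectors per document and run this scoring (natively or at rerank time) can serve as the backend; checking the current docs of the major vector DBs for "multivector" or "late interaction" support is the safest route, since this feature landscape changes quickly.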
Please make it. We are interested in it.
llamaindex did a session on this and has a notebook you can improve on if you do make a follow-up!!
Nice, please share that. Would love to make a follow up
❤
interesting:)
In the demo, I could see that it fetched images according to the query. But if we want to get an actual answer using gen AI, how can we do that?
You can feed the images into a multimodal model like GPT-4o or Gemini to generate the final response
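A minimal sketch of that suggestion, assuming the retrieved page has been saved as a PNG and an OpenAI-style chat-completions client is available (the helper names `png_to_data_url` and `answer_from_page` are illustrative, not from the video):

```python
import base64

def png_to_data_url(path: str) -> str:
    """Encode a retrieved page image as a base64 data URL,
    the format multimodal chat APIs accept for inline images."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

def answer_from_page(client, question: str, page_png_path: str,
                     model: str = "gpt-4o") -> str:
    """Send the retrieved page image plus the user's question to a
    multimodal model (e.g. an OpenAI client) and return its answer."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": png_to_data_url(page_png_path)}},
            ],
        }],
    )
    return resp.choices[0].message.content
```

For multiple retrieved pages, append one `image_url` part per page to the same `content` list; a Gemini client would use its own request shape but the same pattern.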
@@engineerprompt Thanks. Can you do a video on implementation using a local dataset? It would be more helpful.
Also, it's taking too much time (nearly 4 minutes) to process a 10-page PDF file, since the device I am using is torch.device("cpu"). Is there a way to make it faster for local use?
Can I do a video with you? I made it work, and can show you how I followed your instructions and how it helped with my legal corpus, which is 120 GB.
would love to see that. Please email me
Which vector DB have you used?
I would be interested in a demo of your implementation
@@jeelanshahtlyr6076 OK, I've done it. And today's GPT release is insane
Please implement it!