Nice work, first of all!!
RAG (Retrieval-Augmented Generation) systems work well for general applications where summarizing content is more important than getting highly accurate results. However, when you need precise or deterministic answers, RAG systems might not be reliable, especially when dealing with large amounts of data. They can struggle with accuracy and scale, making them less suitable for scenarios where correctness is crucial.
Agreed (so far) .. and that's why I like a text-to-SQL agent with automatic error correction 😋
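For anyone curious, the error-correction loop is easy to sketch. This is a hypothetical illustration: the generate_sql helper stands in for an LLM call, it is not a real library function.

```python
# Hypothetical sketch of a text-to-SQL agent with automatic error correction:
# generate SQL, execute it, and on failure feed the database error back to
# the model so it can repair its own query.
import sqlite3

def generate_sql(question: str, error: str | None = None) -> str:
    """Placeholder for an LLM call; when retrying, the previous error
    message is included in the prompt as feedback."""
    raise NotImplementedError("wire this up to your LLM of choice")

def ask(db: sqlite3.Connection, question: str, max_retries: int = 3):
    error = None
    for _ in range(max_retries):
        sql = generate_sql(question, error)
        try:
            return db.execute(sql).fetchall()
        except sqlite3.Error as e:
            error = str(e)  # loop again, now with the error as context
    raise RuntimeError(f"could not produce valid SQL: {error}")
```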
Great stuff. Once I saw ColPali, I thought this would be the future of RAG.
thanks :)
My issue with LocalGPT was the inability to upload multiple files. But I see you've fixed that now. Interesting...
Very useful, and it includes a very nice and clever idea.. thanks for the good content and thanks for LocalGPT Vision... It deserves a very big star 🌹🌹🌹
thank you :)
Good start! Will it work for images, such as photographs, or is it currently mainly limited to document data?
It will work for images as well. I need to test the pipeline, but in theory it converts the PDF files into images before processing.
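For reference, that conversion step can be sketched with the pdf2image package (an assumption on my part; the project's actual code may do it differently):

```python
# Hypothetical sketch of the PDF -> page-images step using pdf2image
# (requires poppler installed); LocalGPT Vision's actual code may differ.
from pdf2image import convert_from_path

# Each PDF page becomes one PIL image that the vision retriever can index.
pages = convert_from_path("report.pdf", dpi=150)
for i, page in enumerate(pages, start=1):
    page.save(f"report_page_{i}.png", "PNG")
```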
Impressive! Nice Job. Thanks.
thanks :)
Where do you store your vision vectors?
Very good question, and it is a dangerous one 🤔
It's under a folder called .byaldi in the project directory.
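If you want to see it happen, here is a minimal indexing sketch based on byaldi's documented API (details may vary by version); the index_name below is just an example:

```python
# Minimal byaldi indexing sketch; the persisted vision vectors land under
# .byaldi/<index_name>/ in the working directory.
from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v0.1")
model.index(
    input_path="docs/",      # folder of PDFs/images to index
    index_name="my_docs",    # stored at .byaldi/my_docs/
    store_collection_with_index=False,
    overwrite=True,
)
```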
Awesome work
It would be really cool to have the whole RAG as an API server where I can upload documents and query them. That way we would be able to upload as many documents as we want into the document store and then query them whenever we want. This could then be used in all kinds of applications.
Btw, I wonder how to make sure that not a fixed number of documents is retrieved, but instead only as many as needed 🤔
The API server is a neat idea and I will have a look into it. At the moment, I am not sure if the model can decide dynamically how many documents/pages to return, but you can always return more and then filter them out as a secondary step.
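Something like this, as a rough sketch; the calls follow byaldi's documented API, and the 0.9 score ratio is an arbitrary illustrative threshold, not a tuned value:

```python
# Hypothetical "retrieve more, then filter" step: over-retrieve the top 10
# pages, then keep only those whose score is close to the best hit.
from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_index("my_docs")  # previously built index
results = model.search("What was Q3 revenue?", k=10)
top_score = results[0].score
relevant = [r for r in results if r.score >= 0.9 * top_score]
# 'relevant' now has a query-dependent length instead of a fixed k
```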
Hi! How does ColPali handle cases where the information starts on one page and continues on the next? I ran a test with a query where the answer is spread across 7 items on two consecutive pages, but ColPali only recognizes that the answer is on the first page, where the context of the question is provided.
I don't think ColPali will be able to handle this specific situation, but I think there is a way around it. The idea would be that when a page is returned, we can take the page before and after it and send those as context as well. This would cover the case where the answer spans more than one page.
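As a sketch (plain Python, a hypothetical helper, not something in the repo yet):

```python
# Hypothetical neighbor-page expansion: for every retrieved page, also send
# the page before and after it to the vision LLM as extra context.
def expand_with_neighbors(page_nums: list[int], last_page: int) -> list[int]:
    expanded = set()
    for p in page_nums:
        expanded.update({p - 1, p, p + 1})
    # drop out-of-range pages, return in reading order
    return sorted(p for p in expanded if 1 <= p <= last_page)

# e.g. retrieved pages [4, 9] of a 10-page doc -> [3, 4, 5, 8, 9, 10]
print(expand_with_neighbors([4, 9], last_page=10))
```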
Too expensive to run anything at scale, but cool stuff.. this is the future
Agreed, it can be expensive, but with the reduction in prices, hopefully it will become a viable solution.
Thanks, but does it support using Ollama instead of a paid one?
LocalGPT Vision depends on vLLM for LLM inference.
vLLM does support open-source, free LLMs that can be used in LocalGPT Vision: Qwen2-VL-7B-Instruct (it is on HuggingFace) and colqwen2-v0.1 (it is on HuggingFace).
Ollama depends on llama.cpp, which does not support Qwen2's vision architecture YET, and I do not think Ollama supports colqwen2-v0.1.
So you can use LocalGPT Vision for free, but without Ollama .. it will download the needed open-source, free LLMs 😁
You can use Ollama, but as far as I know, Ollama doesn't support the multimodal llama3.2 yet (ollama.com/blog/llama3.2)
Does it actually learn from the info submitted, or is it just a searchable database made from those PDFs? Can it reason about that info?
No, it is basically a search mechanism. There is no learning involved.
Does this use the Claude / Anthropic killer RAG approach?
This uses vision models for everything. It's very different from the text-based RAG approach.
Great effort and a novel approach to RAG. Theoretically it looks nice! BUT in a real-world production environment, the resources and performance for such a system will NOT be as cost-effective as text-based RAG.
Vision models in general will always be more expensive to run and scale than text models, in the range of double to triple per token.
For the implementation demonstrated, there is room for improvement: vLLM is one of the worst-performing LLM management tools out there, so maybe that could be replaced. Also, Conda is heavy on resources compared to other environments.
We will see how the project evolves, and hopefully it will gain traction and improve over time.
What are other good tools for LLM management and better inference?
@@time8553 In many benchmarks of performance and workload types across many LLM interfaces, llama.cpp seems to have the best overall scoring for most use cases.
Hrm, for some reason colqwen2 is using the CPU to index the images... strange. Anyone else having this problem? Looks like all the requirements installed... cuda is true... is there some setting I'm missing?
This seems to be coming from the byaldi package, which I am using under the hood for indexing (github.com/AnswerDotAI/byaldi/issues/35).
@@engineerprompt I have byaldi working in another conda env, so I don't think that's the problem.. I will dig into it. While using my GPU, colqwen2 takes less than a second to index a page, and Qwen2-VL-7B then takes about 3-4 seconds to answer a question given a particular page. Maybe scaling is different or something... or it's being called with "cpu" instead of "cuda"? But I can tell it's not using the GPU in localGPT-vision: VRAM isn't taken up and utilization doesn't go anywhere, just the CPU is doing the work.
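One way to narrow it down: sanity-check what PyTorch sees, then force the device explicitly. The device argument to byaldi's from_pretrained is an assumption here; it exists in recent releases, but check the version you have installed:

```python
# Sanity-check CUDA visibility, then pin the model to the GPU explicitly.
import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # confirm it is the GPU you expect

from byaldi import RAGMultiModalModel

# 'device' is supported in recent byaldi releases; check your version.
model = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v0.1",
                                           device="cuda")
```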
What's the inference time?
It takes 5-10 seconds if you are using one of the APIs. For Qwen2, I am getting about 15 seconds on my M2 Pro. This will be significantly lower if you are using NVIDIA GPUs and if you can get the quantized models to run.
Can this be used with Ollama instead of OpenAI?
You could, but Ollama doesn't have support for the multimodal Llama yet (ollama.com/blog/llama3.2)
You included Molmo, cool....
it's a pretty nice model :)
Could you make a video on building a small language model from scratch? ⚙️🛠️
This would need a very, very long video or a series of videos.. to be honest, it will not be practical, and it will need huge computing resources plus a long training time.. but if you would like to know how this can be done (for learning), then do some searching and you will find some tutorials and GitHub repos dealing with your request 🙂
Karpathy has one, I think. Probably a better teacher than me :)
Sad that developers always forget Windows users in their tutorial commands. :/
That is a valid statement. I don't have a windows machine on which I can test this.
But bro, you kept skipping the 'waiting' part!!!! How long does it take to retrieve responses for these simple short-answer questions you are submitting?!? Time is the number 1 factor that determines how good a RAG system is!!!! In this video it sounds like it takes forever, and that's why you were skipping it!?!
Just try it mfker
Why do you just complain here? Someone built something and is sharing it with the world for free. Be grateful, and if you have something to contribute, then improve the speed if you can.
Indexing is taking forever, even for 1 small image.
Edit: I do acknowledge the potential of this technology and hope that soon we'll see a solution that can do all this in a matter of seconds/ms.
The title was a bit misleading; nevertheless, it's an informative video and I certainly had fun testing it!
Looking forward to more such videos!
@@prasannakarthik7721
I know! It takes a long time for a regular text file, so I would imagine it takes longer for an image! I mean, this is assuming it would capture all the info in the image accurately/correctly!
@@richardkuhne5054
Relax, dude! Haven't you ever been in any open discussion forums or work-related meetings/environments!?
Stick around to learn!
Can you make a video on using LocalGPT in Open WebUI?
? what?
I hope there's no triple 'what' after this
Once you find a video on how to drive a Mercedes that is inside a BMW, then he might do it for you! 😅
this is getting more and more absurd 😁
@@RickySupriyadi 😅
Doesn't anyone install any of this crap, or is it total 'looks cool' while not installing any of it? I get errors over and over on Win 11. Pretty common with his 'local GPT' stuff that he posts on here that NEVER runs on Windows, but he says 'run the install on the requirements.txt' and wastes your time downloading garbage packages that don't always install right. It's become a waste of time following this channel on Windows (basically anything he does code-wise), while he NEVER says 'and this has been tested on Windows'. Because it never is. This channel is almost bait-and-switch clickbait with how he continuously does this.
I've had it with this B__ll S__t. Unsubscribed. Drop this dude and demonetize the clickbait channel