Nice work, first of all!!
RAG (Retrieval-Augmented Generation) systems work well for general applications where summarizing content is more important than getting highly accurate results. However, when you need precise or deterministic answers, RAG systems might not be reliable, especially when dealing with large amounts of data. They can struggle with accuracy and scale, making them less suitable for scenarios where correctness is crucial.
Agreed (so far) .. and that's why I like a text-to-SQL agent with automatic error correction 😋
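For anyone curious, the error-correction loop is easy to sketch. This is a hypothetical illustration: the generate_sql helper stands in for an LLM call, it is not a real library function.

```python
# Hypothetical sketch of a text-to-SQL agent with automatic error correction:
# generate SQL, execute it, and on failure feed the database error back to
# the model so it can repair its own query.
import sqlite3

def generate_sql(question: str, error: str | None = None) -> str:
    """Placeholder for an LLM call; when retrying, the previous error
    message is included in the prompt as feedback."""
    raise NotImplementedError("wire this up to your LLM of choice")

def ask(db: sqlite3.Connection, question: str, max_retries: int = 3):
    error = None
    for _ in range(max_retries):
        sql = generate_sql(question, error)
        try:
            return db.execute(sql).fetchall()
        except sqlite3.Error as e:
            error = str(e)  # loop again, now with the error as context
    raise RuntimeError(f"could not produce valid SQL: {error}")
```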
Great stuff. Once I saw ColPali, I thought this would be the future of RAG.
thanks :)
My issue with LocalGPT was the inability to upload multiple files. But I see you've fixed that now. Interesting...
Very useful, and it includes a very nice and clever idea.. thanks for the good content and thanks for LocalGPT Vision... It deserves a very big star 🌹🌹🌹
thank you :)
Good start! Will it work for images, such as photographs, or is it currently mainly limited to document data?
It will work for images as well. I need to test the pipeline, but in theory it converts the PDF files into images before processing.
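For reference, that conversion step can be sketched with the pdf2image package (an assumption on my part; the project's actual code may do it differently):

```python
# Hypothetical sketch of the PDF -> page-images step using pdf2image
# (requires poppler installed); LocalGPT Vision's actual code may differ.
from pdf2image import convert_from_path

# Each PDF page becomes one PIL image that the vision retriever can index.
pages = convert_from_path("report.pdf", dpi=150)
for i, page in enumerate(pages, start=1):
    page.save(f"report_page_{i}.png", "PNG")
```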
Impressive! Nice Job. Thanks.
thanks :)
Where do you store your vision vectors?
Very good question, and it is a dangerous one 🤔
It's under a folder called .byaldi in the project directory.
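If you want to see it happen, here is a minimal indexing sketch based on byaldi's documented API (details may vary by version); the index_name below is just an example:

```python
# Minimal byaldi indexing sketch; the persisted vision vectors land under
# .byaldi/<index_name>/ in the working directory.
from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v0.1")
model.index(
    input_path="docs/",      # folder of PDFs/images to index
    index_name="my_docs",    # stored at .byaldi/my_docs/
    store_collection_with_index=False,
    overwrite=True,
)
```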
Awesome work
It would be really cool to have the whole RAG as an API server where I can upload documents and query them. That way we would be able to upload as many documents as we want into the document store and then query them whenever we want. This could then be used in all kinds of applications.
Btw, I wonder how to make sure that not a fixed number of documents is retrieved, but instead only as many as needed 🤔
The API server is a neat idea and I will have a look into it. At the moment, I am not sure if the model can decide dynamically how many documents/pages to return, but you can always return more and then filter them out as a secondary step.
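Something like this, as a rough sketch; the calls follow byaldi's documented API, and the 0.9 score ratio is an arbitrary illustrative threshold, not a tuned value:

```python
# Hypothetical "retrieve more, then filter" step: over-retrieve the top 10
# pages, then keep only those whose score is close to the best hit.
from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_index("my_docs")  # previously built index
results = model.search("What was Q3 revenue?", k=10)
top_score = results[0].score
relevant = [r for r in results if r.score >= 0.9 * top_score]
# 'relevant' now has a query-dependent length instead of a fixed k
```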
Hi! How does ColPali handle cases where the information starts on one page and continues on the next? I ran a test with a query where the answer is spread across 7 items on two consecutive pages, but ColPali only recognizes that the answer is on the first page, where the context of the question is provided.
I don't think ColPali will be able to handle this specific situation, but I think there is a way around it. The idea would be that when a page is returned, we can take the page before and after it and send those as context as well. This would cover the case where the answer spans more than one page.
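As a sketch (plain Python, a hypothetical helper, not something in the repo yet):

```python
# Hypothetical neighbor-page expansion: for every retrieved page, also send
# the page before and after it to the vision LLM as extra context.
def expand_with_neighbors(page_nums: list[int], last_page: int) -> list[int]:
    expanded = set()
    for p in page_nums:
        expanded.update({p - 1, p, p + 1})
    # drop out-of-range pages, return in reading order
    return sorted(p for p in expanded if 1 <= p <= last_page)

# e.g. retrieved pages [4, 9] of a 10-page doc -> [3, 4, 5, 8, 9, 10]
print(expand_with_neighbors([4, 9], last_page=10))
```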
Too expensive to run anything at scale, but cool stuff.. this is the future
Agreed, it can be expensive, but with the reduction in prices, hopefully it will become a viable solution.
Thanks, but does it support using Ollama instead of a paid one?
LocalGPT Vision depends on vLLM for LLM inference.
vLLM does support open-source, free LLMs that can be used in LocalGPT Vision: Qwen2-VL-7B-Instruct (it is on HuggingFace) and colqwen2-v0.1 (it is on HuggingFace).
Ollama depends on llama.cpp, which does not support Qwen2's vision architecture YET, and I do not think Ollama supports colqwen2-v0.1.
So you can use LocalGPT Vision for free, but without Ollama .. it will download the needed open-source, free LLMs 😁
You can use Ollama, but as far as I know, Ollama doesn't support the multimodal llama3.2 yet (ollama.com/blog/llama3.2)
Does it actually learn from the info submitted, or is it just a searchable database made from those PDFs? Can it reason about that info?
No, it is basically a search mechanism. There is no learning involved.
Does this use the Claude / Anthropic killer RAG approach?
This uses vision models for everything. It's very different from the text-based RAG approach.
Great effort and a novel approach to RAG. Theoretically it looks nice! BUT in a real-world production environment, the resources and performance for such a system will NOT be as cost-effective as text-based RAG.
Vision models in general will always be more expensive to run and scale than text models, in the range of double to triple per token.
For the implementation demonstrated, there is room for improvement: vLLM is one of the worst-performing LLM management tools out there, so maybe that could be replaced. Also, Conda is heavy on resources compared to other environments.
We will see how the project evolves, and hopefully it will gain traction and improve over time.
What are other good tools for LLM management and better inference?
@@time8553 In many benchmarks of performance and workload types across many LLM interfaces, llama.cpp seems to have the best overall scoring for most use cases.
Hrm, for some reason colqwen2 is using the CPU to index the images... strange. Anyone else having this problem? Looks like all the requirements installed... cuda is true... is there some setting I'm missing?
This seems to be coming from the byaldi package, which I am using under the hood for indexing (github.com/AnswerDotAI/byaldi/issues/35).
@@engineerprompt I have byaldi working in another conda env, so I don't think that's the problem.. I will dig into it. While using my GPU, colqwen2 takes less than a second to index a page, and Qwen2-VL-7B then takes about 3-4 seconds to answer a question given a particular page. Maybe scaling is different or something... or it's being called with "cpu" instead of "cuda"? But I can tell it's not using the GPU in localGPT-vision: VRAM isn't taken up and utilization doesn't go anywhere, just the CPU is doing the work.
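One way to narrow it down: sanity-check what PyTorch sees, then force the device explicitly. The device argument to byaldi's from_pretrained is an assumption here; it exists in recent releases, but check the version you have installed:

```python
# Sanity-check CUDA visibility, then pin the model to the GPU explicitly.
import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # confirm it is the GPU you expect

from byaldi import RAGMultiModalModel

# 'device' is supported in recent byaldi releases; check your version.
model = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v0.1",
                                           device="cuda")
```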
What's the inference time?
It takes 5-10 seconds if you are using one of the APIs. For Qwen2, I am getting about 15 seconds on my M2 Pro. This will be significantly lower if you are using NVIDIA GPUs and if you can get the quantized models to run.
Can this be used with Ollama instead of OpenAI?
You could, but Ollama doesn't have support for the multimodal Llama yet (ollama.com/blog/llama3.2)
You included Molmo, cool....
it's a pretty nice model :)
Could you make a video on building a small language model from scratch? ⚙️🛠️
This would need a very, very long video or a series of videos.. to be honest, it will not be practical, and it will need huge computing resources plus a long training time.. but if you would like to know how this can be done (for learning), then do some searching and you will find some tutorials and GitHub repos dealing with your request 🙂
Karpathy has one, I think. Probably a better teacher than me :)
Sad that developers always forget Windows users in their tutorial commands. :/
That is a valid statement. I don't have a windows machine on which I can test this.
But bro, you kept skipping the 'waiting' part!!!! How long does it take to retrieve responses for these simple short-answer questions you are submitting?!? Time is the number 1 factor that determines how good a RAG system is!!!! In this video it sounds like it takes forever, and that's why you were skipping it!?!
Just try it mfker
Why do you just complain here? Someone built something and is sharing it with the world for free. Be grateful, and if you have something to contribute, then improve the speed if you can.
Indexing is taking forever, even for 1 small image.
Edit: I do acknowledge the potential of this technology and hope that soon we'll see a solution that can do all this in a matter of seconds/ms.
The title was a bit misleading; nevertheless, it's an informative video and I certainly had fun testing it!
Looking forward to more such videos!
@@prasannakarthik7721
I know! It takes a long time for a regular text file, so I would imagine it takes longer for an image! I mean, this is assuming it would capture all the info in the image accurately/correctly!
@@richardkuhne5054
Relax, dude! Haven't you ever been in any open discussion forums or work-related meetings/environments!?
Stick around to learn!
Can you make a video on using LocalGPT in Open WebUI?
? what?
I hope there's no triple 'what' after this
Once you find a video on how to drive a Mercedes that is inside a BMW, then he might do it for you! 😅
this is getting more and more absurd 😁
@@RickySupriyadi 😅
Doesn't anyone install any of this crap, or is it total 'looks cool' while not installing any of it? I get errors over and over on Win 11. Pretty common with his 'local GPT' stuff that he posts on here that NEVER runs on Windows, but he says 'run the install on the requirements.txt' and wastes your time downloading garbage packages that don't always install right. It's become a waste of time following this channel on Windows (basically anything he does code-wise), while he NEVER says 'and this has been tested on Windows'. Because it never is. This channel is almost bait-and-switch clickbait with how he continuously does this.
I've had it with this B__ll S__t. Unsubscribed. Drop this dude and demonetize the clickbait channel