ColPali: Vision-Based RAG System For Complex Documents

Published Oct 4, 2024

COMMENTS • 36

  • @manishsharma2211 · 6 days ago

    One thing I like about this man is that he gives some background on each framework and library used, making people aware of all the nuances and interactions between the projects and the researchers involved in them. Love that.

  • @israelazarkovitch5852 · 28 days ago · +7

    ColPali is an excellent technique for English documents.
    With non-English documents, though, retrieval doesn't work as well, because ColPali uses the PaliGemma model, which is relatively small and was trained mostly on an English dataset.

    • @engineerprompt · 27 days ago · +4

      Good point, but I think you can fine-tune the vision model for other languages. Qwen is probably a good option there as well. I'll see if there are any resources available and share them.

    • @vardhan254 · 26 days ago · +1

      Qwen2-VL is good for Indic languages, at least from what I have tested.

  • @saranepalashok · 27 days ago

    Excellent. Exactly what I was looking for. A fine-tuning episode for this kind of VBRAG pipeline would be a great follow-up.

  • @frag_it · 27 days ago · +3

    Could you make an end-to-end project where, instead of an index, we push the embeddings into a vector store like ChromaDB or Pinecone? That would be amazing.
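Some background on the vector-store question: ColPali does not produce one vector per page. It produces one embedding per image patch for each page (and one per token for the query) and ranks pages with ColBERT-style late interaction, often called MaxSim: each query vector is matched to its most similar page vector, and those maxima are summed. A plain single-vector store therefore needs multi-vector support or a re-ranking step. The sketch below shows only that scoring step in plain Python, assuming the embeddings have already been computed; the function names and the toy 2-dimensional vectors are hypothetical and are not part of any ColPali library API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for each query vector, keep its
    best dot product against any page patch vector, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids sorted from highest to lowest MaxSim score."""
    scores = {pid: maxsim_score(query_vecs, vecs) for pid, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 2 query vectors, 2 pages with 2 patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 0.2]],
    "page_b": [[0.5, 0.5], [0.0, 1.0]],
}
print(rank_pages(query, pages))
```

Here page_a scores 1.0 + 0.2 = 1.2 and page_b scores 0.5 + 1.0 = 1.5, so page_b ranks first. The cost grows with the number of query vectors times the number of patch vectors per page, which is one reason multi-vector stores usually add a cheaper first-stage filter before this exact scoring.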

  • @darpnpro · 2 days ago

    Thank you for sharing this!

  • @kai_s1985 · 27 days ago · +2

    Great work! Thanks! I wonder how it compares to vanilla RAG on text PDFs in terms of accuracy. Vanilla RAG struggles when the answer to the user's question has to be synthesized from different parts of the text. GraphRAG is good for those cases, but it is slow and expensive. Can this handle complex questions like those?

  • @manishsharma2211 · 6 days ago

    EXCELLENT VIDEO - THANK YOU

  • @LTBLTBLTBLTB · 27 days ago · +3

    I tried this technique with gemini-1.5-flash-exp-0827 instead, and it works fine.

  • @IdPreferNot1 · 26 days ago

    Cool find in Claudette

  • @wdonno · 26 days ago

    How do you "chunk" or parse sections out of longer documents? And what if we want to create a knowledge graph? The final analysis is done by an LLM, so we still have context-length issues, especially for local implementations. Can you extract the text itself for further processing?

  • @loicbaconnier9150 · 27 days ago · +2

    Why do we need a GPU with a lot of VRAM? Is it for ColPali or for the VLM?

  • @tecnom7133 · 27 days ago

    Maybe if you pass an image URL instead of the image bytes, you will consume fewer input tokens and therefore pay less?

  • @shobhitbishop · 27 days ago

    Will this work properly on PDFs containing detailed tabular information? And on hand-drawn images?

  • @Yes-lm9dq · 26 days ago

    Do you think one could use this to convert a PDF into a text file, which could then be used to generate a knowledge graph with Microsoft's GraphRAG?

  • @goran-ai · 27 days ago

    What is the best way to contact you for consulting with our dev company?

  • @RedCloudServices · 28 days ago

    I wonder if a VBRAG could perform math calculations on values extracted from a table in an image? 🤔 I suppose if the extracted values are accurate, they could then be passed to another agent capable of doing calculations on them?

    • @engineerprompt · 27 days ago

      Math might be a little hard, but I think it's worth trying.

  • @tecnom7133 · 27 days ago

    Thanks

  • @BACA01 · 27 days ago

    Very good content, thank you.

  • @absar66 · 27 days ago

    Many thanks for this great video… I have a set of scanned pages saved as a PDF. Will this work? Thanks.

    • @engineerprompt · 27 days ago

      Yes, I think this approach will work on scanned pages as well.

  • @amortalbeing · 28 days ago

    thanks

  • @user-uk9ls · 28 days ago

    Will this not run on a local NVIDIA RTX 4x GPU with 16 GB of RAM?

    • @engineerprompt · 27 days ago

      I think that GPU will be able to run the pipeline.

  • @neatpaul · 28 days ago · +1

    But this only works for PDFs. What about DOCX, PPTX, or EPUB files? I want to work with those files multimodally too.

    • @frazuppi4897 · 28 days ago · +6

      It works with anything that can be converted to an image, so everything.

  • @skyplanet9858 · 27 days ago

    Man, I love your videos and I'm a big fan. I just have one request, and I wonder if everyone else agrees. I get very annoyed and distracted each time your recording software zooms in and out of the page or screen you are showing. The problem is that it's inconsistent and keeps jumping around. Please, please consider keeping it static, unless you are zooming in for a longer period of time, like when writing code in VS. Thank you, and keep up the amazing work.

    • @engineerprompt · 27 days ago

      Thanks for pointing it out, that's good feedback. I'll see what I can do.

  • @ayanshproplayer5559 · 27 days ago

    Does it work offline???

    • @engineerprompt · 27 days ago

      If you watch the video, you will know the answer :)

  • @kareemyoussef2304 · 17 days ago

    None of these solutions are open source, even in your other videos. I think your video that uses Marker is the only one.