Optimizing ColPali for Retrieval at Scale, from Theory to Practice

  • Published 13 Jan 2025

COMMENTS • 14

  • @RobCaulk · a month ago · +2

    I've been waiting for a breakdown like this to help me wrap my head around ColPali. Thanks!!

  • @AmitDeshmukh-l8d · a month ago · +3

    I'm currently building with Qdrant (love the binary quantization and multi-vector approach to scaling retrieval with ColPali). I was wondering if a JS/TS example exists, because that's primarily our tech stack. If not, I'll try to put something out eventually.

    • @sabrinaesaquino · a month ago

      Thanks! We don’t have a JS/TS example yet, but we'd love to see what you create if you decide to put one together!
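There is no official JS/TS example yet, but a minimal sketch of the setup discussed in this thread could look like the following, using the `@qdrant/js-client-rest` package and assuming a Qdrant version recent enough to support multivectors (1.10+). The collection name, helper function names, and the 128-dim vector size (ColPali's output dimension) are illustrative, and the option names follow Qdrant's REST schema as I understand it, so treat this as a starting point rather than a reference implementation.

```ts
import { QdrantClient } from "@qdrant/js-client-rest";

// Assumes a Qdrant instance reachable on the default local port.
const client = new QdrantClient({ url: "http://localhost:6333" });

// One collection for ColPali page embeddings: each page is stored as a bag of
// 128-dim patch vectors compared with MaxSim (late interaction), and binary
// quantization keeps a compact copy of the vectors in RAM for fast first-stage search.
export async function createColpaliCollection(name: string) {
  await client.createCollection(name, {
    vectors: {
      size: 128,                                     // ColPali embedding dimension
      distance: "Cosine",
      multivector_config: { comparator: "max_sim" }, // late-interaction scoring
    },
    quantization_config: {
      binary: { always_ram: true },                  // binary-quantized vectors kept in RAM
    },
  });
}

// `pageEmbedding` is number[][]: one 128-dim vector per image patch,
// produced elsewhere (e.g. by the Python ColPali model) and exported as plain arrays.
export async function indexPage(
  name: string,
  id: number,
  pageEmbedding: number[][],
  payload: Record<string, unknown>
) {
  await client.upsert(name, {
    points: [{ id, vector: pageEmbedding, payload }],
  });
}

// `queryEmbedding` is number[][]: one vector per query token.
export async function searchPages(name: string, queryEmbedding: number[][], limit = 5) {
  const res = await client.query(name, { query: queryEmbedding, limit, with_payload: true });
  return res.points; // scored pages, best MaxSim match first
}
```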

  • @rodyatube · a month ago · +3

    I've used Tesseract OCR to get text from images and the result is just OK, certainly nowhere near what you've shown ColPali can do. Will certainly give it a try.

    • @MrAhsan99 · a month ago

      What approach did you take? Did you extract the images and store their text summaries? Then, during retrieval, did you use these summaries along with the original images to get the answer?

  • @nitin7554 · a month ago · +2

    Towards the end, she passed the whole image to a large 90B Llama or GPT-4o. What's the point if you have to pass the whole image instead of patches? It would be better if we could get the patches retrieved using ColPali and run some small vision model to extract the answer.

    • @EvgeniyaSukhodolskaya · a month ago

      You can, using ColPali's attention mask.

    • @MrAhsan99 · a month ago

      @EvgeniyaSukhodolskaya What does that mean? How do you do it?
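The reply above is terse, so here is a rough sketch of one way to approach it (my own assumption, not necessarily the presenter's exact recipe): ColPali's late-interaction scores already tell you how well each image patch matches each query token, so you can rank patches by their best per-token similarity and crop only the top-scoring regions for a small vision model. The sketch assumes you have exported the query-token and patch embeddings as plain, L2-normalized arrays (e.g. from the Python ColPali model); `patchRelevance` and `topPatches` are made-up helper names.

```ts
// Rank image patches by their late-interaction relevance to a query.
// `queryTokens` and `patches` are embeddings exported from the ColPali model
// as number[][]; with L2-normalized vectors the dot product is a cosine similarity.

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// For each patch, keep its best similarity against any query token.
// High-scoring patches are the regions that "answer" the query and can be
// cropped out and sent to a small vision model instead of the full page.
function patchRelevance(queryTokens: number[][], patches: number[][]): number[] {
  return patches.map((patch) =>
    Math.max(...queryTokens.map((token) => dot(token, patch)))
  );
}

// Return the indices of the top-k most relevant patches.
function topPatches(queryTokens: number[][], patches: number[][], k = 8): number[] {
  return patchRelevance(queryTokens, patches)
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Example: with a 32x32 patch grid, patch index i maps back to
// row = Math.floor(i / 32), col = i % 32, which gives the crop region
// on the original page image.
```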

  • @zishaansayyed2092 · a month ago · +2

    This is amazing. Punches down OCRs.

  • @haralc6196 · a month ago · +1

    Why is this better than asking GPT-4o to read the image?

    • @EvgeniyaSukhodolskaya · a month ago · +1

      1) It's a free model.
      2) It's optimized for retrieval.
      You can't ask GPT-4o to read 100k pages each time your user wants to find some answer among them :)

    • @haralc6196 · a month ago

      @EvgeniyaSukhodolskaya You can just ask GPT to extract the information you want and save it to the vector DB. You don't have to analyse the image every time.

    • @EvgeniyaSukhodolskaya · a month ago · +2

      @haralc6196 Well, then you need to think about every possible question you could answer with this one PDF page, and ask GPT-4o to generate all of them and answer all of them. It's not guaranteed to cover everything, and if you're doing VRAG and need the PDF page to be retrieved anyway (say, to look at the graph/chart), you'll have to save it in the DB regardless. IMO it doesn't make much sense, unless it's a specific Q&A use case, you really want to use GPT-4o for the sake of using it, or the volumes are too big for ColPali.