Multi-modal RAG: Chat with Docs containing Images

COMMENTS • 36

  • @engineerprompt
    @engineerprompt  26 days ago +1

    If you want to learn RAG Beyond Basics, check out this course: prompt-s-site.thinkific.com/courses/rag

  • @ilaydelrey3122
    @ilaydelrey3122 25 days ago +4

    A nice open-source and self-hosted version would be great.

  • @AI-Teamone
    @AI-Teamone 24 days ago

    Such insightful information. Eagerly waiting for more multimodal approaches.

  • @aerotheory
    @aerotheory 25 days ago

    Keep going with this approach, it is something I have been struggling with.

    • @waju3234
      @waju3234 24 days ago

      Me too. In my case, the answer is usually hidden behind the data, context, and images.

  • @tasfiulhedayet
    @tasfiulhedayet 25 days ago

    We need more videos on this topic

  • @RolandoLopezNieto
    @RolandoLopezNieto 26 days ago

    Lots of good info, thanks

  • @legendchdou9578
    @legendchdou9578 25 days ago +1

    Very nice video, but if you could do it with an open-source embedding model, that would be very cool. Thank you for the video.

  • @ArdeniusYT
    @ArdeniusYT 26 days ago +2

    Hi, your videos are very helpful, thank you.

  • @mohsenghafari7652
    @mohsenghafari7652 26 days ago

    Great job! Thanks

  • @ai-touch9
    @ai-touch9 25 days ago

    I appreciate your effort. Please create one on fine-tuning the model for efficient retrieval, if possible with LangChain.

  • @garfield584
    @garfield584 24 days ago

    Thanks

  • @BACA01
    @BACA01 26 days ago

    Thanks, your videos are very helpful. I have several gigabytes of PDF ebooks that I would like to process with RAG. Which approach do you think would be best, this one or GraphRAG? In my case I'm looking only at local models, as the costs would otherwise be very high. What if I converted all the PDF pages into images first, processed them with a local model like Phi-3 Vision, and then processed the result with GraphRAG, would that work out?

  • @Techn0man1ac
    @Techn0man1ac 21 days ago

    What about doing the same, but using Llama 3 or a smaller local LLM?

  • @vinayakaholla
    @vinayakaholla 26 days ago +1

    Can you please dive deeper into why Qdrant was used, and into other vector DBs' limitations for storing both text and image embeddings? Thanks.

  • @ignaciopincheira23
    @ignaciopincheira23 26 days ago +2

    It is essential to conduct thorough preprocessing of the documents before entering them into the RAG pipeline. This involves extracting the text, tables, and images, and processing the latter through a vision module. Additionally, it is crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM.
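    The coherence-preserving preprocessing this comment describes could be sketched as below. The element structure and captions are hypothetical stand-ins; a real pipeline would get them from a PDF parser such as PyMuPDF or unstructured.

```python
# Sketch: interleave extracted elements so that references to figures and
# tables stay next to the text that mentions them. The `elements` list is
# hard-coded for illustration; a real parser would produce it.

def build_chunks(elements):
    """Replace each non-text element with a placeholder carrying its
    caption, keeping the document's original reading order intact."""
    chunks = []
    for el in elements:
        if el["type"] == "text":
            chunks.append(el["content"])
        else:  # "image" or "table": keep a referenceable placeholder
            chunks.append(f"[{el['type'].upper()} {el['id']}: {el['caption']}]")
    return chunks

elements = [
    {"type": "text", "content": "Results are shown in Figure 1."},
    {"type": "image", "id": "fig1", "caption": "Accuracy vs. chunk size"},
    {"type": "text", "content": "Table 2 lists the raw numbers."},
    {"type": "table", "id": "tab2", "caption": "Raw accuracy numbers"},
]

print(build_chunks(elements))
```

    The placeholders keep "Figure 1" adjacent to its caption, so a retriever that pulls the text chunk can also surface the linked image.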

  • @BarryMarkGee
    @BarryMarkGee 8 days ago

    Out of interest, what is the application called that you used to illustrate the flows? (2:53 in the video) Thanks.

    • @engineerprompt
      @engineerprompt  8 days ago +1

      I am using mermaid code for this.

    • @BarryMarkGee
      @BarryMarkGee 8 days ago

      @@engineerprompt thanks. Great video btw 👍🏻
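    The exact diagram from the video is not reproduced here, but a minimal mermaid flowchart for a generic multimodal RAG pipeline (node names are illustrative) might look like:

```mermaid
flowchart LR
    PDF[PDF document] --> T[Text chunks]
    PDF --> I[Images]
    T --> TE[Text embeddings]
    I --> IE[Image embeddings]
    TE --> VS[(Vector store)]
    IE --> VS
    VS --> LLM[Multimodal LLM]
```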

  • @amanharis1845
    @amanharis1845 26 days ago

    Can we do this method using LangChain?

  • @JNET_Reloaded
    @JNET_Reloaded 26 days ago

    Where's the code that was used?

  • @codelucky
    @codelucky 23 days ago

    Is it better than GraphRAG? How does the output quality compare to it?

    • @engineerprompt
      @engineerprompt  22 days ago +1

      You could potentially create a GraphRAG on top of it.

  • @RickySupriyadi
    @RickySupriyadi 26 days ago

    I expect image generation will become another kind of breed... image gen based on image understanding, based on facts.

  • @redbaron3555
    @redbaron3555 25 days ago +3

    This approach is not good enough to add value yet. The pictures and text need to be referenced and linked in both vector stores to create better similarities.
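    The linking this comment asks for can be sketched with a shared `source_id` on both text and image records, so a hit in either store can pull in its counterpart. The dicts below are hypothetical stand-ins for real vector-store payloads.

```python
# Sketch: text and image records share a `source_id`, letting a retrieval
# hit in one store expand to the linked records in the other. Plain dicts
# stand in for real vector DBs (e.g. Qdrant payload metadata).

text_store = {
    "t1": {"source_id": "page3", "text": "Figure 1 shows accuracy results."},
}
image_store = {
    "i1": {"source_id": "page3", "caption": "Accuracy vs. chunk size"},
}

def expand_hit(hit_id, from_text=True):
    """Return a hit plus all records in the other store that share
    its source_id."""
    primary = (text_store if from_text else image_store)[hit_id]
    other = image_store if from_text else text_store
    linked = [r for r in other.values()
              if r["source_id"] == primary["source_id"]]
    return {"primary": primary, "linked": linked}

result = expand_hit("t1")
print(result["linked"][0]["caption"])
```

    In a real store this metadata would live in each point's payload, so the cross-references survive indexing and come back with every search result.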