Multi-modal RAG: Chat with Docs containing Images

  • Published Dec 1, 2024

COMMENTS • 47

  • @engineerprompt
    @engineerprompt  4 months ago +1

    If you want to learn RAG Beyond Basics, check out this course: prompt-s-site.thinkific.com/courses/rag

    • @jfbaro2
      @jfbaro2 2 months ago

      Does it cover how to minimize (or even eliminate) hallucinations, and how to ensure the result ALWAYS considers the content added to the RAG "database"?

  • @aerotheory
    @aerotheory 4 months ago +1

    Keep going with this approach; it is something I have been struggling with.

    • @waju3234
      @waju3234 4 months ago

      Me too. In my case, the answer is normally hidden behind the data, the context, and the images.

  • @rubencabrera8519
    @rubencabrera8519 3 months ago

    This is the best AI channel out there, PERIOD. Thanks for sharing your knowledge.

  • @ilaydelrey3122
    @ilaydelrey3122 4 months ago +7

    a nice open-source, self-hosted version would be great

  • @AI-Teamone
    @AI-Teamone 4 months ago

    Such insightful information. Eagerly waiting for more multimodal approaches.

  • @b.lem.2499
    @b.lem.2499 1 month ago

    Thanks, is there a video of the same project, but with LangChain instead of LlamaIndex?

  • @Makkar-b3v
    @Makkar-b3v 7 days ago

    Great stuff.

  • @Techn0man1ac
    @Techn0man1ac 4 months ago

    What about making the same thing, but using Llama 3 or a smaller local LLM?

  • @AyishaAshraf-s2f
    @AyishaAshraf-s2f 1 month ago

    The use case is to extract the relevant text along with the images available in a file using generative AI: when a prompt is given, the relevant text and images should be displayed as the response. (A sketch of one way to do this follows below.)
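
A minimal sketch of that use case with LlamaIndex's multi-modal index backed by Qdrant, since the video builds on LlamaIndex and Qdrant. Paths, collection names, and top-k values are illustrative assumptions, and the exact imports can shift between llama-index releases:

```python
# Sketch: retrieve both relevant text and relevant images for a prompt.
# Assumes llama-index with the Qdrant vector store and a multimodal (CLIP)
# image embedding package installed; names below are illustrative.
import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(path="qdrant_db")  # local, file-backed
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# ./data holds the source file plus its extracted images.
documents = SimpleDirectoryReader("./data").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Retrieval returns a mix of text nodes and image nodes; render both.
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
for hit in retriever.retrieve("What does the architecture diagram show?"):
    print(type(hit.node).__name__, hit.score)
```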

  • @tasfiulhedayet
    @tasfiulhedayet 4 months ago

    We need more videos on this topic

  • @BACA01
    @BACA01 4 months ago

    Thanks, your videos are very helpful. I have several GB of PDF ebooks that I would like to process with RAG. Which approach do you think would be best, this one or GraphRAG? In my case I'm looking only at local models, as the costs would otherwise be very high. What if I convert all the PDF pages into images first, process them with a local model like Phi-3 Vision, and then run the result through GraphRAG; would that work out?
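
One hedged way to test the page-as-image idea locally: render each PDF page to a PNG with PyMuPDF, then caption it with a local vision model served by Ollama. The "llava" model name and file paths below are illustrative stand-ins (swap in Phi-3 Vision or whatever is pulled locally); the captions come out as plain text that GraphRAG or any text-only pipeline can then index:

```python
# Sketch: convert PDF pages to images and caption them with a local VLM.
# Assumes PyMuPDF (pip install pymupdf) and a running Ollama server with a
# vision model pulled; "llava" stands in for the local VLM of your choice.
import fitz  # PyMuPDF
import ollama

doc = fitz.open("book.pdf")  # illustrative path
captions = []
for i, page in enumerate(doc):
    png_path = f"page_{i}.png"
    page.get_pixmap(dpi=150).save(png_path)  # render the page to an image
    resp = ollama.chat(
        model="llava",
        messages=[{
            "role": "user",
            "content": "Describe the text, tables, and figures on this page.",
            "images": [png_path],
        }],
    )
    captions.append(resp["message"]["content"])

# The page descriptions are plain text and can be fed to GraphRAG or any
# other text-only RAG pipeline.
```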

  • @ArdeniusYT
    @ArdeniusYT 4 months ago +2

    Hi, your videos are very helpful, thank you.

  • @BarryMarkGee
    @BarryMarkGee 4 months ago

    Out of interest, what is the application called that you used to illustrate the flows? (2:53 in the video) Thanks.

    • @engineerprompt
      @engineerprompt  4 months ago +1

      I am using Mermaid code for this.

    • @BarryMarkGee
      @BarryMarkGee 4 months ago

      @@engineerprompt thanks. Great video btw 👍🏻

  • @RolandoLopezNieto
    @RolandoLopezNieto 4 months ago

    Lots of good info, thanks

  • @vinayakaholla
    @vinayakaholla 4 months ago +1

    Can you please dive deeper into why Qdrant was used, and into other vector DBs' limitations for storing both text and image embeddings? Thanks.

    • @engineerprompt
      @engineerprompt  4 months ago

      Will see if I can create a video on it (a sketch of the idea follows below).
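
One reason Qdrant fits this setup: it supports multiple named vectors per point, so a single collection can hold a text embedding and an image embedding side by side. A minimal sketch, with vector sizes, collection name, and payload fields as illustrative assumptions:

```python
# Sketch: one Qdrant collection holding both text and image embeddings
# via named vectors. Sizes and names below are assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process instance for demonstration
client.create_collection(
    collection_name="docs",
    vectors_config={
        "text": VectorParams(size=1536, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Each point can carry both embeddings plus a payload linking to the source.
client.upsert(
    collection_name="docs",
    points=[PointStruct(
        id=1,
        vector={"text": [0.0] * 1536, "image": [0.0] * 512},  # placeholders
        payload={"doc_id": "report.pdf", "page": 3},
    )],
)

# Query against either vector space by name.
hits = client.query_points(
    collection_name="docs",
    query=[0.0] * 512,
    using="image",
    limit=3,
)
print(hits.points)
```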

  • @legendchdou9578
    @legendchdou9578 4 months ago +2

    Very nice video, but if you could do it with an open-source embedding model it would be very cool. Thank you for the video.

  • @ai-touch9
    @ai-touch9 4 months ago

    I appreciate your effort. Please create one on fine-tuning the model for efficient retrieval, if possible with LangChain.

  • @avinashnair5064
    @avinashnair5064 2 months ago

    Can you make it using completely open-source models?

  • @RedCloudServices
    @RedCloudServices 3 months ago

    Do you think all of this is now replaced by Gemini?

  • @ScottzPlaylists
    @ScottzPlaylists 3 months ago +2

    Need to do it all in open source. No API keys.

  • @ignaciopincheira23
    @ignaciopincheira23 4 months ago +2

    It is essential to conduct thorough preprocessing of the documents before ingesting them into the RAG pipeline. This involves extracting the text, tables, and images, and processing the latter through a vision module. Additionally, it is crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM. (A sketch of such preprocessing follows below.)
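
A minimal sketch of that preprocessing step using PyMuPDF, tagging every extracted element with its page number so cross-references between text and images can be preserved downstream; the describe_image() hook is a hypothetical placeholder for whatever vision module is used:

```python
# Sketch: separate text and images from a PDF before ingestion, tagging each
# element with its page so references stay coherent. Assumes PyMuPDF
# (pip install pymupdf); describe_image() is a hypothetical vision hook.
import fitz  # PyMuPDF

def preprocess(pdf_path: str) -> list[dict]:
    doc = fitz.open(pdf_path)
    elements = []
    for page_num, page in enumerate(doc):
        # Plain text for this page.
        elements.append({
            "type": "text", "page": page_num, "content": page.get_text(),
        })
        # Embedded images; a vision module would turn these into text,
        # e.g. describe_image(image_bytes).
        for img in page.get_images(full=True):
            xref = img[0]
            image_bytes = doc.extract_image(xref)["image"]
            elements.append({
                "type": "image", "page": page_num, "content": image_bytes,
            })
    return elements
```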

  • @mohsenghafari7652
    @mohsenghafari7652 4 months ago

    It's a great job! Thanks.

  • @amanharis1845
    @amanharis1845 4 months ago

    Can we do this method using LangChain?

  • @codelucky
    @codelucky 4 months ago

    Is it better than GraphRAG? How does the output quality compare to it?

    • @engineerprompt
      @engineerprompt  4 months ago +1

      You could potentially create a GraphRAG on top of it.

  • @cristiantironi296
    @cristiantironi296 1 month ago

    What if the user query contains text + an image?

    • @engineerprompt
      @engineerprompt  1 month ago

      You can use a VLM to generate descriptions of the images and send those as part of the text query (see the sketch after this thread).

    • @cristiantironi296
      @cristiantironi296 1 month ago

      @@engineerprompt yeah, as I expected, but what if I pass an image the VLM doesn't understand, for example a personal image not available online? I should first fine-tune the VLM on my images and then do what you said, right?
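
A minimal sketch of the caption-then-query approach from this thread, using OpenAI's vision-capable chat API; the model name, prompt, and file path are illustrative assumptions:

```python
# Sketch: turn the image half of a text+image query into a caption, then
# run an ordinary text-only RAG query. Names below are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def image_to_caption(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one paragraph."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

user_text = "Find documents related to this diagram."
query = user_text + "\n\nImage description: " + image_to_caption("diagram.png")
# `query` can now go through the usual text retriever.
```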

  • @garfield584
    @garfield584 4 months ago

    Thanks

  • @JNET_Reloaded
    @JNET_Reloaded 4 months ago

    Where's the code used?

  • @redbaron3555
    @redbaron3555 4 months ago +4

    This approach is not good enough to add value on its own. The pictures and text need to be cross-referenced and linked in both vector stores to produce better similarity matches. (A sketch of such linking follows below.)
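
A hedged sketch of the linking this comment asks for: store a shared doc_id and page in the payloads of both vector stores so a text hit can fetch the images that appeared alongside it. Collection and field names are assumptions:

```python
# Sketch: cross-link text and image points through shared payload fields so
# a hit in one store can fetch its siblings from the other.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(":memory:")

def sibling_images(client: QdrantClient, doc_id: str, page: int):
    """Fetch image points from the same document page as a text hit."""
    points, _next = client.scroll(
        collection_name="image_collection",
        scroll_filter=Filter(must=[
            FieldCondition(key="doc_id", match=MatchValue(value=doc_id)),
            FieldCondition(key="page", match=MatchValue(value=page)),
        ]),
        limit=10,
    )
    return points

# After a text search returns a hit with payload {"doc_id": ..., "page": ...},
# sibling_images() pulls the figures that appeared alongside that text.
```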

  • @RickySupriyadi
    @RickySupriyadi 4 months ago

    I expect image generation will become another kind of breed... image generation based on image understanding, grounded in facts.