Multi-Modal RAG: Chat with Text and Images in Documents

  • Published 21 Oct 2024

COMMENTS • 26

  • @engineerprompt  3 months ago

    If you want to learn RAG Beyond Basics, check out this course: prompt-s-site.thinkific.com/courses/rag

  • @wtcbretburstjk3726  3 months ago  +2

    Thank you, keep it coming chief, great work!

  • @aa-xn5hc  3 months ago  +2

    These RAG videos are super interesting.

  • @stressrelaxationmusicchann4638  3 months ago  +4

    Hey, this is amazing. I kindly request you to upload some videos on how we can work with PDF extraction of text, tables, images, graphs, etc. in documents for a RAG application.

  • @IdPreferNot1  3 months ago

    Such great code explanation and layout... so many Gist-able functions... thanks!!

  • @declan6052  1 month ago

    Is there a cost to using the API keys here? Wondering if this can be built into an application at scale

  • @roip429  3 months ago

    Excellent tutorial!
    Can you share the .ipynb, please?

  • @alpcan3777  2 months ago

    Thanks for the great video. Is it possible to take both an input image and text from the user and query with them? For example, the user uploads a picture of their car and asks about similar cars with the lowest price based on the uploaded image. Then the system retrieves related car images and text from the database.
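
One common pattern for the question above (not shown in the video) is a CLIP-style model that embeds images and text into the same vector space, so an uploaded photo and a short text query can be fused into a single retrieval query. A minimal sketch, assuming the sentence-transformers package; the file names and the query string are placeholders:

```python
# Hedged sketch: joint image + text retrieval with a CLIP-style shared embedding space.
# Assumes `pip install sentence-transformers pillow`; all file names are placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # maps images and text into one vector space

# Index: embed the catalog images (text metadata could be embedded alongside them).
catalog = ["car_a.jpg", "car_b.jpg", "car_c.jpg"]
catalog_embs = model.encode([Image.open(p) for p in catalog], convert_to_tensor=True)

# Query: embed the uploaded image and the user's text, then fuse them by averaging.
img_emb = model.encode([Image.open("uploaded_car.jpg")], convert_to_tensor=True)[0]
txt_emb = model.encode("similar car with the lowest price", convert_to_tensor=True)
query_emb = (img_emb + txt_emb) / 2

# Retrieve the most similar catalog entries by cosine similarity.
hits = util.semantic_search(query_emb, catalog_embs, top_k=3)[0]
for hit in hits:
    print(catalog[hit["corpus_id"]], round(hit["score"], 3))
```

Averaging the two embeddings is the simplest fusion; weighting text and image differently, or running two searches and merging the results, are common variations.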

  • @amanharis1845  3 months ago  +1

    Hi, I had a small doubt. Don't LangChain's document loaders extract images from the document?

    • @engineerprompt  3 months ago  +1

      No, by default, they do not. You can use something like unstructured.io, which can extract images and tables (a sketch follows this thread). Will create a video on it soon.

    • @amanharis1845  3 months ago  +1

      @engineerprompt I have actually built a RAG chatbot using LangChain for my organisation. The PDFs that we load usually contain lots of tables and a few images. So far it is giving good responses from those PDFs. But yeah, if there is a method to extract this non-text data more efficiently, I'll definitely want to integrate it with my chatbot.

    • @aadarshunniwilson8517  3 months ago

      @engineerprompt Any updates on this?

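For reference, a minimal sketch of the unstructured.io route mentioned in the reply above, assuming the unstructured package with its PDF extras; "report.pdf" is a placeholder, and the image-extraction parameter names may differ slightly between releases:

```python
# Hedged sketch: extract text, tables, and images from a PDF with unstructured.io.
# Assumes `pip install "unstructured[pdf]"`; "report.pdf" is a placeholder path.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="report.pdf",
    strategy="hi_res",                             # layout model needed for tables/images
    infer_table_structure=True,                    # keep table structure, not just text
    extract_image_block_types=["Image", "Table"],  # save figure and table crops
    extract_image_block_output_dir="extracted/",   # extracted blocks land here as files
)

# Split the returned elements by category for downstream RAG indexing.
texts = [el.text for el in elements if el.category == "NarrativeText"]
tables = [el for el in elements if el.category == "Table"]
print(f"{len(texts)} text blocks, {len(tables)} tables extracted")
```

The saved image files can then be captioned or embedded alongside the text chunks before indexing.
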
  • @AEismann-d6c  3 months ago

    I wonder how long before we will be able to run this locally, and what would be a good model then. So far, from my testing, nothing compares to GPT-4... Thanks for the video.

    • @free_thinker4958  3 months ago

      Claude 3.5 Sonnet is far more performant than any other model right now.

    • @engineerprompt  3 months ago  +1

      Local vision models still have a long way to go, but hopefully we will have something "good enough" soon.

  • @VidishArvind  3 months ago

    Can you make the same thing using free API models, since the GPT API isn't free? A guide to hosting it on a cloud would also be great: an end-to-end app deployed on the cloud.

  • @pratheekbabu272  2 months ago

    Hey, will this code not run on Windows, only in Colab?

  • @Know_Ur_World  2 months ago

    Can you use a PDF containing images instead of this separate text data and image data?

  • @TheAstralftw  3 months ago  +2

    This is a nice demo, but not that useful in real-world scenarios: you can maybe extract those images from a wiki, but you cannot from a specific PDF file. It is still a nice demo and a good thing for someone who wants to learn, just not for real-world projects where you need to build a specific app.

  • @zoranProCode  3 months ago

    Why is it exactly 10x better?! Maybe it's just better?

  • @kishorethota9959  2 months ago

    Can we get the code?