Multimodal RAG: Chat with PDFs (Images & Tables) [latest version]

  • Published Dec 23, 2024

COMMENTS

  • @algatra6942 · 1 month ago +7

    Idk, I just finally found the most understandable AI explanation content. Thank you Alejandro

    • @alejandro_ao · 1 month ago

      glad to hear this :)

    • @argl1995 · 1 month ago

      @alejandro_ao I want to create a multi-LLM chatbot for telecommunications. Is there a way to connect with you apart from YouTube so that I can share the problem statement with you?

  • @onkie.ponkie · 1 month ago +4

    I was about to learn from the previous video. But you, brother, just bring more gold.

  • @ZevUhuru · 1 month ago

    Bro, I literally came back to get your old video on PDFs and you already have an update. Thank you!

  • @whouchekin · 1 month ago +2

    the best touch is when you add a front-end
    good job

    • @alejandro_ao · 1 month ago +4

      hey! i'll add a ui for this in a coming tutorial 🤓

  • @nirmesh44 · 24 days ago

    Best explanation ever.

  • @jaggyjut · 1 day ago

    Thank you for building this tutorial. What about adding a preprocessing stage to the RAG pipeline to clean the data and redact PII before sending it to the LLM for chunking and chunk enrichment (keywords, metadata, etc.)?

  • @Diego_UG · 1 month ago +2

    How would you recommend automating the conversion of a PDF of images (text images) to text? The problem is that traditional OCR does not always do the job well, but ChatGPT can handle difficult images.

  • @jaimeperezpazo · 1 month ago

    Excellent!!!! Thank you Alejandro

  • @muhammadadilnaeem · 1 month ago +1

    Amazing tutorial

  • @ahmadsawal2956 · 1 month ago +2

    Thanks for the great content. How can we modify this to use a local LLM, e.g. Llama 3.2 and a vision model served through Ollama?

  • @suksukram1234 · 13 days ago

    Good job!

  • @uknowme4_5 · 3 days ago

    When will the front-end part come? Super excited for that part.

  • @SidewaysCat · 1 month ago +3

    Hey dude, what are you using to screen record? The mouse sizing and movement look super smooth. I'd like to create a similar style when giving tutorials.

    • @alejandro_ao · 1 month ago

      hey there, that's the Screen Studio app for mac, developed by the awesome Adam Pietrasiak @pie6k. check it out :)

  • @ronnie333333 · 1 month ago

    Thank you for the video. Just curious: how would you go about persisting the multi-vector database? What data stores are available that cater to such requirements? Also, how do we go about taking an image as input from the user, so the language model can relate it to the documents and predict an answer?

  • @davidbaur300990 · 25 days ago

    The functionality you showed at the end to visualize the PDF page and the retrieved chunk seems super useful for citation display. Since I couldn't find the implementation in your Colab, is there a way to find it somewhere else?
    Amazing content!

  • @duanxn · 1 month ago

    Great tutorial, very detailed. Just one question: is there any option to link the text chunk that describes an image as context for that image, to create a more accurate summary of the image?

    • @alejandro_ao · 1 month ago

      beautiful question. totally. as you can see, the image is actually one of the `orig_elements` inside a `CompositeElement`. and the `CompositeElement` object has a property called `text`, which contains the raw text of the entire chunk. this means that instead of just extracting the image alone like i did here, you can extract the image alongside the text in its parent `CompositeElement` and send that along with the image when generating the summary. great idea 💪
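      a minimal sketch of that idea, assuming the `unstructured` element classes used in the video and that `partition_pdf` was called with `extract_image_block_to_payload=True` (the helper name is hypothetical):

      ```python
      from unstructured.documents.elements import CompositeElement, Image

      def images_with_parent_text(chunks):
          """Pair each image's base64 payload with the raw text of its
          parent CompositeElement (hypothetical helper, sketch only)."""
          pairs = []
          for chunk in chunks:
              if not isinstance(chunk, CompositeElement):
                  continue
              # orig_elements holds the un-chunked elements behind this chunk
              for el in chunk.metadata.orig_elements or []:
                  if isinstance(el, Image):
                      pairs.append((el.metadata.image_base64, chunk.text))
          return pairs
      ```

      each (image, text) pair can then go into the summarization prompt together, instead of the image alone.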

  • @maryamesmaeili7365 · 18 days ago

    Thanks for this beautiful and comprehensive presentation. I do have one question about security. I would like to use Mistral instead of OpenAI and also use Ollama to run the code locally, so I would like to know your opinion on data security. Will the data stay secure? The PDF files I'm going to use as input are confidential. Thanks in advance for your response.

    • @MrAhsan99 · 17 days ago

      Well, if you're running the model (LLaMA, Mistral, or Qwen) locally, you don't need to worry. It's safe unless someone hacks your PC and steals all your data. 😛

    • @alejandro_ao · 8 days ago

      hey there. sure thing. if you are running these models locally, then no data leaves your machine at all, so no need to worry about data leakage. that said, there are two moments when data from your files could leave your computer:
      - when you call your LLM. if you are using a local LLM with Ollama, no need to worry. just make sure to use a multimodal model, such as LLaVA, so that you get the images. or stick to a text-to-text model if your pdf does not require multimodality.
      - when you parse your document. in this example, i used Unstructured's open-source library locally. if you do the same, your data never leaves your computer. but you can also use their serverless api; if you do that, your data transits through their servers.
      that said, neither Unstructured nor OpenAI/Anthropic use the data you send them to train their models. but i understand if you still don't want your confidential data transiting through foreign services.
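      for reference, a rough sketch of the fully local parsing path, assuming the `unstructured` open-source library is installed along with poppler and tesseract (the file path is a placeholder):

      ```python
      from unstructured.partition.pdf import partition_pdf

      # everything below runs on your machine; no data leaves it
      chunks = partition_pdf(
          filename="confidential.pdf",          # placeholder path
          strategy="hi_res",                    # layout-aware parsing
          infer_table_structure=True,           # keep table HTML in metadata
          extract_image_block_types=["Image"],
          extract_image_block_to_payload=True,  # images as base64 in metadata
          chunking_strategy="by_title",
      )
      ```

      pair this with a local multimodal model in Ollama (see the sketch in a reply further down) and the whole pipeline stays on your machine.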

  • @saivikas96 · 25 days ago +1

    Hi bro, thanks for the video. I implemented the same algorithm using a local LLM via Ollama, with LLaVA as the multimodal model. But whenever I ask questions, the algorithm is not able to retrieve images. I am stuck at this point.

    • @maryamesmaeili7365 · 11 days ago

      Hi, I saw your comment and I am trying to do the same thing, and I am stuck with ChromaDB: my kernel keeps dying. Then I found that ChromaDB works well with OpenAI models. Do you use another vector database? If yes, which one? Thank you in advance.

    • @uknowme4_5 · 3 days ago

      bro can you please share your code

  • @Pman-i3c · 1 month ago +1

    Very nice. Is it possible to do this with a local LLM, like an Ollama model?

    • @alejandro_ao · 1 month ago +2

      yes, absolutely. just use the langchain ollama integration and change the line of code where i use ChatOpenAI or ChatGroq. be sure to select multimodal models when dealing with images, though.
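      a minimal sketch of that swap, assuming the `langchain-ollama` integration and a multimodal model already pulled in Ollama (the model name and the `image_b64` value are placeholders):

      ```python
      from langchain_ollama import ChatOllama
      from langchain_core.messages import HumanMessage

      # drop-in replacement for ChatOpenAI / ChatGroq
      llm = ChatOllama(model="llava", temperature=0)

      image_b64 = "<base64-encoded image from the parsing step>"  # placeholder

      # multimodal prompt: a text part plus a base64 image part
      message = HumanMessage(content=[
          {"type": "text", "text": "Describe this image from the PDF."},
          {"type": "image_url",
           "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
      ])
      print(llm.invoke([message]).content)
      ```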

  • @GowthamRaghavanR · 1 month ago

    Good one!! Did you see any open-source alternatives, like Markers?

  • @ChristopherFoster-McBride · 28 days ago +1

    Could you create a repo for running this on Windows? Great video btw!

  • @akagi937 · 10 days ago

    When is the front-end part coming? It has been really fun following along so far.

    • @alejandro_ao · 8 days ago

      that would be a great addition. i'm working on that one!

  • @julianomoraisbarbosa · 1 month ago

    # til
    thanks for your video.
    is it possible to use crewAI in the same example?

  • @blakchos · 1 month ago

    any idea how to install poppler, tesseract, and libmagic on a windows machine?

  • @alexramos587 · 1 month ago +1

    Nice

  • @AjayKumar-p2f1y · 18 days ago

    Hi, it's a great video. Can you help me with how to install these dependencies on a windows machine?

  • @MrAhsan99 · 17 days ago

    How can I increase the accuracy of this RAG system?

    • @alejandro_ao · 8 days ago

      several improvement ideas. here are 3:
      - you can retrieve the image on the fly only when a retrieved chunk contains images, instead of indexing the images separately.
      - you can keep the images indexed separately, but instead of sending the single image to the LLM, retrieve its parent chunk and send the entire chunk along with the image for better context.
      - you can add a persistent document store so you don't have to re-index the whole thing every time (see the sketch after this list).
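      a minimal sketch of that last idea, assuming LangChain's MultiVectorRetriever as in the video, a persisted Chroma collection, and an on-disk key-value docstore (the paths, collection name, and embedding choice are placeholders):

      ```python
      import uuid
      from langchain_chroma import Chroma
      from langchain_openai import OpenAIEmbeddings
      from langchain.storage import LocalFileStore, create_kv_docstore
      from langchain.retrievers.multi_vector import MultiVectorRetriever
      from langchain_core.documents import Document

      # vector store persisted to disk, so embeddings survive restarts
      vectorstore = Chroma(
          collection_name="multimodal_rag",   # placeholder name
          embedding_function=OpenAIEmbeddings(),
          persist_directory="./chroma_db",    # placeholder path
      )

      # on-disk key-value store for the original chunks
      docstore = create_kv_docstore(LocalFileStore("./docstore"))

      retriever = MultiVectorRetriever(
          vectorstore=vectorstore, docstore=docstore, id_key="doc_id",
      )

      # index a summary; keep the original chunk retrievable by the same id
      doc_id = str(uuid.uuid4())
      vectorstore.add_documents(
          [Document(page_content="summary text", metadata={"doc_id": doc_id})]
      )
      docstore.mset([(doc_id, Document(page_content="original chunk text"))])
      ```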

  • @texasfossilguy · 19 days ago

    Why did you decide to use Groq?

    • @alejandro_ao · 8 days ago

      just because they have a very generous free tier and pretty good models. that turns out to be useful for viewers who don't want to enter their credit card details to use these LLMs.

  • @champln · 17 days ago

    Can I connect to an SQL database source?

    • @alejandro_ao · 8 days ago

      sure thing. that would be structured retrieval rather than unstructured, though. you can check out this video: ua-cam.com/video/9ccl1_Wu24Q/v-deo.htmlsi=kSu-QkMInzq98u-u

  • @AkashL-y9q · 1 month ago +1

    Hi bro, can you create a video on Multimodal RAG: chat with video visuals and dialogues?

    • @alejandro_ao · 1 month ago +3

      this sounds cool! i’ll make a video about it!

    • @AkashL-y9q · 1 month ago

      Thanks @alejandro_ao

  • @karansingh-ce8yy · 1 month ago

    what about mathematical equations?

    • @alejandro_ao · 1 month ago +3

      in this example, i embedded them with the rest of the text. if you want to process them separately, you can always extract them from the `CompositeElement` like i did here with the images. then you could have an LLM explain the equation and vectorize that explanation (like we did with the descriptions of the images). in my case, i just kept them with the rest of the text; i feel like that gives the LLM enough context.
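      a rough sketch of that separate route, assuming `unstructured` emits `Formula` elements for your document (it does not for every PDF) and any LangChain chat model (the model choice and prompt are placeholders):

      ```python
      from unstructured.documents.elements import CompositeElement, Formula
      from langchain_openai import ChatOpenAI

      llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model choice

      def summarize_equations(chunks):
          """Pull Formula elements out of each chunk and return LLM
          explanations to vectorize instead of the raw equation text."""
          summaries = []
          for chunk in chunks:
              if not isinstance(chunk, CompositeElement):
                  continue
              for el in chunk.metadata.orig_elements or []:
                  if isinstance(el, Formula):
                      prompt = f"Explain this equation in plain language: {el.text}"
                      summaries.append(llm.invoke(prompt).content)
          return summaries
      ```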

    • @karansingh-ce8yy · 1 month ago +1

      @alejandro_ao thanks for the context, i was stuck on this for a week