Marker: Get Your PDFs Ready for RAG & LLMs | High Accuracy Open-Source Tool

  • Published 19 Dec 2024

COMMENTS • 15

  • @nnamdiodozi7713
    @nnamdiodozi7713 6 days ago

    Thanks for the video. I'm looking to extract items from company financial statements. These items are mostly in tables. Is this a good approach for this problem?

  • @abdinegara3135
    @abdinegara3135 6 months ago

    Hey man, I really appreciate your video. You actually deserve more viewers ❤

  • @mukeshkund4465
    @mukeshkund4465 6 months ago +4

    Appreciate it. How can we build RAG on top of this? If you could make a video on that, it would be very helpful.

  • @Reality_Check_1984
    @Reality_Check_1984 2 months ago

    I see how to run this from the terminal, but how do we import and run it in a Python file? I have had some issues.

  • @ignaciopincheira23
    @ignaciopincheira23 5 months ago

    Could you add a description of each image to the text, so that the result is a single Markdown file similar to the original PDF? That way, it would be possible to pass a language model one file that is readable and preserves the content.

  • @intellect5124
    @intellect5124 5 months ago

    Very informative video. Could you try to build a system that can run on a large number of PDFs and convert them to .md files, so that an LLM can query them or generate specific prompts through a UI?

  • @Mehraj_IITKGP
    @Mehraj_IITKGP 17 days ago

    This isn't supported on Python 3.13.

  • @DassS-dass
    @DassS-dass 6 months ago

    It's great 👍

  • @atomobianco
    @atomobianco 6 months ago

    Details matter: you say the index is well formatted into a table, but it seems to me that the Markdown displays two columns while the PDF index had only one column.

    • @DataEdge01
      @DataEdge01 6 months ago

      The limitations were addressed at the beginning of the video.