Thanks for the video. I'm looking to extract items from company financial statements. These items are mostly in tables. Is this a good approach to use for this problem?
Hey man, I really appreciate your video. You actually deserve more viewers ❤
Appreciate it. How can we build RAG on top of this? If you could make a video on that, it would be very helpful.
Noted, thanks
Same request
I see how to run this from the terminal, but how do we import and run it in a Python file? I have had some issues.
Could you add the description of each image to the text with the aim of having a single Markdown file, similar to the original PDF? This way, it would be possible to pass a file to a language model that is readable and maintains its content.
Noted!
Very informative video. Could you try to build a system that can run on a large number of PDFs and further convert these to .md files for an LLM to query or generate specific prompts with a UI?
Noted, thanks!
This isn't supported on Python 3.13
It's great 👍
Details matter: you say the index is well formatted into a table, but it seems to me that the Markdown displays two columns while the PDF index had only one column.
The limitations were addressed at the beginning of the video