You have the latest, high-quality content; I always wait for your videos. Thanks ❤❤
You are welcome. Glad that the videos are helpful.
Please do a video on Unstructured.io, it's much needed. Please do it with local Ollama.
Also, if we use this method, will LlamaCloud have access to our private documents, since we are parsing those documents with LlamaParse?
Sure, I will take that into account. Well, you are sending data to an API, so yes, it is stored somewhere in the cloud. If you have sensitive information, it's better to ask them how it is handled.
Your channel is a gold mine and your videos are gems!
Thank you for the great work!
BTW, what do you use to highlight web pages, like on the LlamaParse page?
Keep up the great work!
You are welcome, glad the videos are helpful. The highlighter I am using is Weava Highlighter.
Can I replace the llama2 embedding with nomic-embed-text and the Ollama model with mistral? Will it work? I actually tried it and it didn't; am I missing something?
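In case it helps, here is a minimal sketch of that swap in LlamaIndex, assuming both models have already been pulled with `ollama pull`. The usual catch is that both the LLM and the embedding model need to be switched; if only one is set, LlamaIndex quietly falls back to its OpenAI default for the other:

```python
# Minimal sketch: local Ollama LLM (mistral) + nomic-embed-text embeddings.
# Assumes `ollama pull mistral` and `ollama pull nomic-embed-text` were run first.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Swap BOTH the LLM and the embedding model, or LlamaIndex
# falls back to OpenAI for whichever one is missing.
Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What is this document about?"))
```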
I would really like to see a great local RAG.
Right now I think privateGPT + Ollama (Mixtral) + reranker + Unstructured.io + OCR + Qdrant is a good combination. As you said, garbage in, garbage out!! So preprocessing the PDF files, especially complex PDFs that have tables, pictures, diagrams, and all sorts of other stuff, is the key to getting correct answers from a RAG system.
Can you please build the most accurate local RAG platform for complex PDF files as of 2024? I think we all need a video like this; please make it.
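For anyone who wants to experiment with part of that stack before a full video exists, here is a minimal local sketch. It assumes a Qdrant instance is running locally and substitutes plain LlamaIndex glue for privateGPT; the reranker and OCR stages from the suggested combination are left out for brevity:

```python
# Minimal local RAG sketch: Ollama (mixtral) + nomic embeddings + Qdrant.
# Assumes a local Qdrant (e.g. docker run -p 6333:6333 qdrant/qdrant)
# and that both Ollama models have already been pulled.
import qdrant_client
from llama_index.core import (
    Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex,
)
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

Settings.llm = Ollama(model="mixtral", request_timeout=300.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Store the vectors in a local Qdrant collection instead of the in-memory default.
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="pdf_rag")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("pdfs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("Summarise the tables in the report."))
```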
That's a good combination of tools. But again, running locally (as the models need to be quantized), I still see some hiccups. But let's see, it might change.
@saeednsp1486 Look at AutoRAG. It's the only LlamaIndex-based "all-in-one" app I've seen that can do all inference locally (via an Ollama backend). pip install and it's up and running. Its author actually meant it for evaluating RAG pipelines, so it's still a bit programmer-heavy; it takes some Python to get working. But a by-product of that goal is that AutoRAG has everything together based on local LlamaIndex. The author's material is around on YT, Medium, and Reddit. I don't think it has LlamaParse, so maybe a feature request/PR?
Dear Sir,
Could you please make a video on integrating multiple cutting-edge technologies into a single system? Specifically, I am interested in combining the following components:
- RAG (Retrieval-Augmented Generation)
- Nomic Embedding Model
- Ollama language model
- Groq hardware accelerator
- Chainlit
Additionally, please specify which language models should be used as the base for the system. Two potential options could be:
#model_name='llama2-70b-4096'
#model_name='mixtral-8x7b-32768'
etc.
Thank you for your time and consideration. I greatly enjoyed your recent video and anticipate future content.👍
Hello, you should have checked the video posted before this one 😎
Crazy FAST RAG | Ollama | Nomic Embedding Model | Groq API
ua-cam.com/video/TMaQt8rN5bE/v-deo.html
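For reference, a minimal sketch of wiring those pieces together (minus the Chainlit UI layer), assuming a GROQ_API_KEY is exported and nomic-embed-text has been pulled in Ollama:

```python
# Minimal sketch: Groq-hosted LLM + local Nomic embeddings in LlamaIndex.
# Assumes GROQ_API_KEY is set and `ollama pull nomic-embed-text` was run;
# the Chainlit UI is omitted for brevity.
import os
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.groq import Groq
from llama_index.embeddings.ollama import OllamaEmbedding

# Either of the model names from the comment above works here.
Settings.llm = Groq(model="mixtral-8x7b-32768", api_key=os.environ["GROQ_API_KEY"])
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("Give a one-paragraph summary."))
```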
I had a naive doubt: beneath the query engine there's an associated LLM working, right? Otherwise, how are we getting responses without using an LLM?
If yes, then where is the model specified, i.e., which LLM are we using?
If no, how is such a well-framed answer produced without an LLM? Because as far as I know, it is the LLM that actually takes the relevant pieces of context and stitches them together into an answer in natural language.
Yes, LlamaParse of course uses something behind the scenes that is not revealed, as it is their service 🙂 It now supports the GPT-4o model for this, which is more expensive but better.
@datasciencebasics Yes, I too studied the documentation and found out that unless a specific model is given, LlamaIndex uses the OpenAI GPT-3.5 Turbo model by default (a quick sketch of overriding that default follows below).
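That matches the documented behaviour. A minimal sketch of pinning the LLM explicitly, assuming an OpenAI key is set:

```python
# Minimal sketch: set the LLM used by LlamaIndex query engines explicitly,
# instead of relying on the GPT-3.5 Turbo default. Assumes OPENAI_API_KEY is set.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# The query engine now synthesizes answers with this model.
Settings.llm = OpenAI(model="gpt-4o")

documents = SimpleDirectoryReader("data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()
print(query_engine.query("What does the document say about pricing?"))
```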
Just another quick question: are there any downsides to LlamaParse? For me it works well at parsing and extracting data from both text and tables in a pretty satisfactory manner.
Why, then, are people using pypdf, Apache PDF extraction tools, or even OCR engines like PaddleOCR for text extraction, and not simply this library?
Additionally, LlamaParse can be integrated with LangChain chains as well (as sketched below), which means it is not restricted to LlamaIndex only, so why the other frameworks?
Please clarify this doubt; I am new to this field.
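To illustrate that framework-independence point, here is a minimal sketch of feeding LlamaParse output into a LangChain pipeline, assuming LLAMA_CLOUD_API_KEY is set; the file name and splitter sizes are arbitrary placeholders:

```python
# Minimal sketch: LlamaParse as a standalone parser feeding LangChain.
# Assumes LLAMA_CLOUD_API_KEY is set and "sample.pdf" exists.
from llama_parse import LlamaParse
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Parse the PDF to markdown via the LlamaParse API (no LlamaIndex pipeline needed).
parsed = LlamaParse(result_type="markdown").load_data("sample.pdf")

# Convert to LangChain Documents and chunk them for any LangChain retriever/chain.
docs = [Document(page_content=d.text, metadata=d.metadata) for d in parsed]
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
print(f"{len(chunks)} chunks ready for a LangChain vector store")
```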
Outstanding! But how to get the metadata? Thanks!
great job!!
Thanks !!
Using LlamaParse means our data is exposed to an external API, right?
Yes, it is. Before sending any sensitive information, I suggest you contact them to ask how it is handled.
Thanks, and waiting for your valuable videos. I'd like to process .docx files and get text and page-number details. I think no proper library is available to get page-number details from .docx files...
You are welcome. I hope LlamaParse will handle that soon. If I find something useful, I will make a video on it.
Very nice video, thanks. One thing I noticed is that there are additional steps after the Markdown output; could you help me understand what the following does post-markdown?
"node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8
)
nodes = node_parser.get_nodes_from_documents(documents)"
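In case it helps: MarkdownElementNodeParser walks the parsed markdown and separates plain text from embedded elements such as tables, using the given LLM (with num_workers parallel calls) to summarize each table so it can be retrieved by its summary and resolved back to the raw table. A minimal sketch of the steps that usually follow, assuming an OpenAI key is set:

```python
# Continuing from the snippet above: split the parsed nodes into plain-text
# nodes and table "objects", then index both so a query can retrieve a table
# summary and recursively resolve it back to the underlying table.
from llama_index.core import VectorStoreIndex

base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
index = VectorStoreIndex(nodes=base_nodes + objects)
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What values appear in the largest table?"))
```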
Hi, thanks for the video and the great explanation. I tried your code and I am getting this error: "Retrying llama_index.embeddings.openai.base.get_embeddings in 0.97 seconds as it raised APIConnectionError: Connection error..". Do you know how to resolve it? The error happens on the line "index = VectorStoreIndex.from_documents(documents)".
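That APIConnectionError usually means the default OpenAI embedding endpoint cannot be reached, most often because OPENAI_API_KEY is missing or a proxy/firewall is blocking the call. A minimal sanity check you could run before building the index; the specific embedding model name here is just an illustration:

```python
# Minimal sanity check: VectorStoreIndex.from_documents embeds with OpenAI
# by default, so the key must be set and the API reachable.
import os
from openai import OpenAI

assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

# One cheap round-trip to confirm connectivity before embedding a whole corpus.
client = OpenAI()
resp = client.embeddings.create(model="text-embedding-ada-002", input="ping")
print("OpenAI reachable, embedding length:", len(resp.data[0].embedding))
```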