🚨 NEW UPDATED TUTORIAL 🚨
I've created a V2 tutorial of this video here ua-cam.com/video/SXjfAIwbkZY/v-deo.htmlsi=hQugknx01XYuemqJ
I'm a medical researcher and, surprisingly, my life is all about PDFs I don't have any time to read, let alone learn the basics of code. And I think there are a lot of people in the same boat as me. Unfortunately, it's very hard to actually find an AI tool that's even barely reliable. Most of YouTube is flooded with sponsors for AI magnates trying to sell their rebranded, redundant, worthless AI-thingy for a monthly subscription, or an unjustifiably costly API that follows the same premise. The fact that you, the only one that came closest to what I actually need - and a very legitimate need - is a channel with
Thank you so much for sharing the pain points you're experiencing and the solution you're seeking. I'd like to be more helpful to you and to many more like you. My idea is to create a UI using Streamlit for the code in this tutorial, with a step-by-step explanation of how to get it running on your system. You will essentially clone the repository, install Ollama and pull any models you like, install the dependencies, then run Streamlit. You'll then be able to upload PDFs in the Streamlit app and chat with them in a chatbot-like interface. Let me know if this would be helpful. Thanks again for your feedback.
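For anyone curious, here's a minimal sketch of the skeleton I have in mind for that app (names and flow are hypothetical; the actual app may differ):

```
# minimal_app.py - hypothetical skeleton of the planned Streamlit UI
import streamlit as st

st.title("Chat with your PDF (fully local)")

uploaded_pdf = st.file_uploader("Upload a PDF", type="pdf")
if uploaded_pdf:
    if "messages" not in st.session_state:
        st.session_state["messages"] = []
    if question := st.chat_input("Ask something about the PDF"):
        # ...split/embed the PDF, retrieve context, call the local Ollama model...
        answer = "placeholder answer"  # stand-in for the real RAG chain output
        st.session_state["messages"] += [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    for msg in st.session_state["messages"]:
        st.chat_message(msg["role"]).write(msg["content"])
```

You'd run it with `streamlit run minimal_app.py` once the dependencies are installed.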
Hey, hit me up and I'll give you my RAG that supports multiple PDFs and lets you choose whichever LLM you want to use.
I'm in this space as well, and am trying to find the best way to parse PDFs. I've set up GROBID on Docker and tried that out. My work laptop is a bit garbage, and being in the world's largest bureaucracy, procuring hardware is a pain in the ass. Anyway, great video.
Use NVIDIA's Chat with RTX for PDF summarizing and querying. Purchase a cheap RTX card with a minimum of 8GB VRAM.
@tonykipkemboi I think most people's pain right now is exactly this part: "upload PDFs to service X". That's what they want/have to avoid. Anyhow, nice video you made here.
Welcome to my special list of channels I subscribe to. Looking forward to you making me smarter 😊
Thank you for that honor! I'm glad to be on your list and will do my best to deliver more awesome content! 🙏
You sir are awesome! It is easy to make things hard, yet hard to make them simple. Thanks for working so hard to make this simple. Excellent presentation. I will be coming back for more!!
Thank you @ten2the6, am glad you found it useful! 🫡
I don't subscribe easily, even at gunpoint. But I like your approach, and now I'm a subscriber. Great technical clarity is what the modern world needs.
@@daixtr am glad you enjoyed the content and thanks for the sub!
I am a layman and have been trying to figure this out for a week. I've watched a lot of videos, but yours has been by far the best. Your cadence is good, and you are direct while keeping it accessible enough to follow along at a high level. No obfuscation or assumptions, etc. Great video, thank you; have a comment, like, and subscribe.
Thank you, @maly9903, I am glad you found it useful! 🫡
Congrats man. Really useful content. Well explained and effective.
Thank you, @ISK_VAGR! 🙌
Great, man. You really support your viewers and try to solve their errors. What a great personality!
@@Lumix-o1j I appreciate the support. I owe that to my viewers tbh.
Such an amazing video. I couldn't find a single other video that got me started this easily. You are truly amazing!
@@nacksters3987 🙏
One of the best videos about RAG and LLMs I have come across!! Thanks a lot!!
Glad you found it helpful!
@@tonykipkemboi code is not working
@@Adinasa2 what exactly is not working? can you share the error message?
@@tonykipkemboi it's not working; it gives this error:
ConnectError: [Errno 61] Connection refused
Traceback:
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 75, in exec_func_with_error_handling
result = func()
^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 574, in code_to_exec
exec(code, module.__dict__)
File "/Users/adityagupta/Desktop/rag/ollama_pdf_rag/streamlit_app.py", line 278, in
main()
File "/Users/adityagupta/Desktop/rag/ollama_pdf_rag/streamlit_app.py", line 200, in main
models_info = ollama.list()
^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ollama/_client.py", line 464, in list
return self._request('GET', '/api/tags').json()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ollama/_client.py", line 69, in _request
response = self._client.request(method, url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_client.py", line 827, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_client.py", line 1015, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 232, in handle_request
with map_httpcore_exceptions():
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
raise mapped_exc(message) from exc
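For anyone hitting this: a `ConnectError: [Errno 61] Connection refused` from `ollama.list()` usually just means the Ollama server isn't running when the app starts. A quick sanity check before launching Streamlit (assuming the `ollama` Python client used in the tutorial):

```
import ollama

# If this raises ConnectError ([Errno 61] Connection refused), the Ollama
# server isn't up - start the Ollama desktop app (or run `ollama serve`
# in a terminal) and try again.
print(ollama.list())
```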
Thanks for this amazing tutorial on building a local LLM. I applied it to my research paper PDFs, and the results are impressive.
Awesome 🤩 Love to hear that! Did you experiment without using the MultiQueryRetriever in the tutorial to see the difference?
@@tonykipkemboi That's an interesting question. I tried it and found that MultiQueryRetriever works well in general when the LLM needs to connect indirect information across the document, but it can fail to surface information that is stated directly in the document. This observation could differ case by case, though.
Thank You. I have done several similar projects and I learn something new about 'local RAG' with each one !
very high quality tutorial. You explain things so well!!
@@soudaminipanda 🙏
This is by far the easiest and best tutorial for learning RAG + PDF. I'd love to see more of this topic, a bit more advanced, in the future. Thank you very much!
Thanks! More to come for sure. What topics would you like me to cover?
That's a pretty clean explanation.
Looking forward to more videos.
Thank you! Glad you like the delivery. I got some more cooking 🧑🍳
Can you make one video of RAG using Agents? Great video btw. Thanks
Sure thing. I actually have this on my list of upcoming videos. Agentic RAG is pretty cool right now; I'll play with it and share a video tutorial. Thanks again for your feedback.
I was planning on doing this as a project. If you beat me to it, I can compare notes
You are an awesome teacher. Thank you so much for explaining this in a clean and objective way :)
🙏
Wow, what a legend! Subscribed!
I just learned a new and better workflow. Great tutorial, buddy!
Good to see fellow Kenyans on AI. Perhaps the Ollama WebUI approach would be easier for beginners as one can attach a document, even several documents to the prompt and chat.
🙏 Yes, actually working on a Streamlit UI for this
This is a fun and potent project. This provides access to a powerful space. Peace be on you.
Thank you and glad you like it!
Thank you for this excellent intro. You are a natural teacher of complex knowledge and this has certainly fast-tracked my understanding. I'm sure you will go far and now you have a new subscriber in Australia. Cheers and thank you - David
Glad to hear you found the content useful and thank you 🙏 😊
Really useful content, and well explained. It would be interesting to see a video with different file types, not only PDFs - for example Markdown, PDF, and CSV all at once.
Thank you! I have this in my content pipeline.
Thanks for sharing. Great video and explanation.
Thanks for the share, quite enlightening. I will definitely build upon that. Here is the problem I have: let's say I have two documents and I want to chat with both at the same time (for instance, to extract conflicting points between the two). What would you advise here?
Thank you! That's an interesting use case for sure. My instinct, before looking up other solutions, would be to create 2 separate collections, one for each file, then retrieve from them separately and chat with them for comparison. My suggestion might not be the most efficient, though; I will do some digging and share any info I find.
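Something like this rough sketch is what I have in mind (untested; it assumes `chunks_a` and `chunks_b` are the split documents for each PDF):

```
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")

# One collection per document (chunks_a / chunks_b assumed to exist)
db_a = Chroma.from_documents(chunks_a, embeddings, collection_name="doc-a")
db_b = Chroma.from_documents(chunks_b, embeddings, collection_name="doc-b")

question = "What does each document claim about treatment efficacy?"
ctx_a = db_a.as_retriever().get_relevant_documents(question)
ctx_b = db_b.as_retriever().get_relevant_documents(question)
# Feed both contexts to the LLM and ask it to list the conflicting points.
```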
Maybe a bit late, but you can use the DirectoryLoader instead of the PDF loader; it lets you load whole folders, not just single PDFs.
This is a really good video. Thanks a lot for making it, I found it very helpful!
Glad you found it helpful.
Asante sana (thanks a lot), Kip. Working on a bank/fintech chatbot and will use this info to build it.
Thank you. Good tutorial!
@@KinoInsight thank you!
very clear, thanks tony
@@leolovetech thank you!
Very good! Easy to understand, easy to try, expandable ....
Awesome! Great to hear.
@@tonykipkemboi you deserve it. Too many LLM YouTubers are more concerned with showing off a lot of things than with making them easy to understand and reproduce. Keep up the great work!
Clear instruction, excellent tutorial. Thank you Tony!
Thank you for the feedback and glad you liked it! 😊
You're welcome Ezekiel!
Excellent bro! You just gained a new sub!
🙏
Great RAG tutorial! With Web3NS, imagine pairing YourName.Web3 with local AI pipelines like this to power secure, decentralized knowledge management in Web3. 🚀
this was super clear, extremely informative, and was spot on with the exact answers I was looking for. Thank you so much.
Glad you found it useful and thank you for the feedback!
I'm coming from the visual model and tensor world... you have my subscription. About to head over to the second video and try this out; I have some huge PDFs I'm hoping to work with.
Top-tier information here. Thank you!
🙏
Simple and well illustrated, Arap Kemboi 👍🏾👍🏾👍🏾
Asante sana (thank you very much), bro! 🙏
Dope video man! Keep them coming
Appreciate it!!
Great job
Thank you! 🙏
Would love it if you could make the Streamlit app! I am still struggling to make a Streamlit app based on open-source LLMs.
Thank you! Yes, I'm working on a Streamlit RAG app.
I have released a video on Ollama + Streamlit UI that you can start with in the meantime.
@@tonykipkemboi thanks bro! I will defo watch👌
Instead of using a single PDF file, can I point to a folder with many PDFs - like 100 PDFs - and use that as the context for my model?
Yes, you can. Make a directory and put all the PDFs inside it.
Thank you so much for this video! I am learning so much from it! I am trying to process hundreds of short PDFs (10-20 pages) and extract the same information from each of them to generate a database, so in my case I don't necessarily need a live two-way chat or the MultiQueryRetriever. Would you have a recommendation for which retriever would best fit my goal, please?
Great job. Does the file you chat with have to be a PDF or can it be a CSV or other structured file type?
🙏 thank you. I'm actually working on a video for RAG over CSV. The demo in this tutorial will not work for CSV or structured data; we need a better loader for structured data.
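If you want to experiment in the meantime, a rough sketch using LangChain's CSV loader (hypothetical file path; it yields one Document per row, which preserves the row structure better than a PDF-style text splitter):

```
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path="data/records.csv")  # hypothetical path
docs = loader.load()
```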
It would be nice to see multiple PDFs loaded, to check whether it can handle different topics at once.
This is excellent!! Thank you!!
🙏
Very helpful! Great video! 👍
🙏❤️
This is a great tutorial. Thank you
🙏
Wonderful tutorial, man! Let me ask you: what other kinds of prompts can we use? Also, is it normal for the RAG to answer questions about things not in the PDF that was loaded? For example, I tested with the prompt "what is a dog" and got an answer back. Is that because of the RAG and Ollama? Thanks a bunch.
Hi Tony.
Thanks for the video.
Can you please make a video on how to use Colpali VLM.
@@bramarambikaambati6352 yes, I got this coming.
Good one, Good luck🤞
Thanks ✌️
Really good video thank you
Thank you for sharing good content
🙏
Thank you!
Useful tip: use proper WiFi, not a mobile hotspot, while pulling the model from Ollama. I had an error with that; hope it helps someone 😊
Lol this is common sense 😂
Thank you so much for this great tutorial! It was really helpful and insightful. I have a few questions:
Could you please share what operating system you are using for this setup?
Which Python version worked for you?
If possible, could you share the specific versions of the libraries you installed? I’ve checked the requirements file on GitHub, but having the exact versions would be super helpful to avoid compatibility issues.
What GPU do you use? I have Ollama running on an Intel i5 with integrated graphics, so I'm unable to use any 3B+ models. TinyLlama and TinyDolphin work, but the accuracy is way off.
I have an Apple M2 with 16GB of memory. I noticed that larger models slow down my system and sometimes force a shutdown of everything. One way around it is deleting other models you're not using.
Nice job, thanks Tony!
🙏
Thanks. Can you please explain it step by step, and slowly? Especially the RAG part.
Thanks for asking. Which part of the RAG pipeline?
First of all, thank you for this video. I understand that running models locally is good for dealing with private data, but you are using Chroma as a vector database. Is Chroma reliable? How do I know how they use our data?
@@dudulascasas4509 thanks! Good question. The Chroma instance can run locally as well if you prefer, or you can pick another vector DB like Milvus or pgvector and spin up a localhost instance to connect to. Making it totally air-gapped is important, as you mentioned.
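A sketch of a fully local, persistent Chroma setup (assuming `chunks` are your split PDF documents, as in the tutorial):

```
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# persist_directory keeps the index in a local folder - nothing leaves your machine
vector_db = Chroma.from_documents(
    documents=chunks,  # assumed: your split PDF chunks
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="local-rag",
    persist_directory="./chroma_db",
)
```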
Thanks man. Very useful.
Hello, at 8:03 I am getting an error: OSError: No such file or directory: '/home/supriya/nltk_data/tokenizers/punkt/PY3_tab'
What does it mean, and how do I fix it? Please HELP!!
I've been given a story, the Trojan War, as a 6-page PDF (I can also use the story as plain text), along with 5 pre-decided questions to ask about it. I want to evaluate different models' answers, but I am failing to evaluate even one. Kindly help; please guide me thoroughly.
Can you please reply? I would really appreciate it.
This sounds interesting! If you're doing this locally, you can follow the tutorial to create embeddings of the PDF and store them in a vector DB, then use the 5 questions to generate output from the models. You can switch the model between responses, and you'll probably have to save each response separately so you can compare them afterwards.
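A rough sketch of that evaluation loop, assuming the `ollama` Python client and that you've already pulled the models (add your retrieved context to the messages if you want the RAG version of this comparison):

```
import ollama

models = ["llama3", "mistral", "phi3"]        # whichever models you've pulled
questions = ["Who started the Trojan War?"]    # your 5 pre-decided questions

answers = {}
for model in models:
    answers[model] = [
        ollama.chat(model=model,
                    messages=[{"role": "user", "content": q}])["message"]["content"]
        for q in questions
    ]
# Compare answers[model] across models afterwards.
```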
@@tonykipkemboi How much storage will the model take?
I don't have the greatest hardware.
Yes, there are smaller quantized models on Ollama you can use, but most of them require a sizeable amount of RAM. Check out these instructions from Ollama on the size you need for each model. You can also do one at a time, then delete the model after use to create space for the next one you pull. I hope that helps.
github.com/ollama/ollama?tab=readme-ov-file#model-library
Do you think I should use llama 3.1 8b or mistral7b for rag?
Good question. I'd even say try the new models like Llama3.2 and see how they perform
Hey there (from a new subscriber) :)
Thank you for this amazing video. I have a few questions please
1. You are talking to the MultiQueryRetriever in a way that it understands your English sentences (when you instruct it to create 5 questions). Is this MultiQueryRetriever an AI itself that understands English and has some common sense, like ChatGPT does?
2. Similarly, you create a prompt saying "Answer the question based on the context ONLY" and supply this prompt to the chain, meaning to the local_model. So the local model also has some common sense to understand your instruction in the prompt, right?
A video with details like these would be super helpful for beginners and aspirants like me; I find no videos online that explain it at a higher level.
Thanks for your work !
Thank you for your explanation. However, I am encountering an issue: after OllamaEmbeddings reaches 100%, the Jupyter kernel restarts automatically. Why does this happen? As a result, I have to run the app again.
@@tharindulakshan4782 what do you mean by automatic restart?
@@tonykipkemboi "The kernel appears to have died. It will restart automatically." This message popup on jupyter notebook after OllamaEmbeddings to 100%
Yeah facing the same problem.
"The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure. "
How to solve this ?
@@ADITYARAJPANDA-h7m I solved my issue by using the Kaggle platform and getting a GPU to run Ollama
thanks for this tony
🙏
What do you use to record your screen capture??
I use Screen Studio
Great video, thanks! But it's essentially a Ctrl+F over a vector database, right? I thought we would train an LLM with the data and then it would generate a result for a given question.
Thanks. Btw, how did you make your YouTube profile photo? It looks very nice.
Thank you! 😊
I used some AI avatar generator website that I forgot but I will find it and let you know.
Thank you
Great video! Is there any way to let an LLM read a folder on my PC and answer me using files (PDFs, .md, .doc, sheets, etc.) from that source?
Yes, you can. You can use the DirectoryLoader from LangChain, but you'll have to adjust the loading and embedding steps to accommodate the different file types.
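A sketch of what that could look like (hypothetical folder path; `UnstructuredFileLoader` handles many formats, e.g. PDF, Markdown, and Word):

```
from langchain_community.document_loaders import DirectoryLoader, UnstructuredFileLoader

# Load every file under ./my_folder, including subfolders
loader = DirectoryLoader("./my_folder", glob="**/*.*",
                         loader_cls=UnstructuredFileLoader)
docs = loader.load()
```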
thanks man this is extremely helpful!
🙏🫡
Hi y'all! I know a lot of you reported some errors in getting the current code to run.
Good news, I have updated the code and will be pushing it out today. Should I make a quick video to highlight the changes?
I managed to get the code working. Are there any tricks to speed up retrieval? I'm using a fairly modest business laptop. I have to wait 5 minutes per response.
@@thomashoddinott4537 oof yeah that's slow. TBH that's due to the bloat introduced by using LangChain. I'll try coming up with a solution and record an updated short supplementary video
yes, please
Are the libraries you used (LangChain, ChromaDB, ...) open source? And can we use any Ollama model?
yes and yes
Thanks a lot! If we have a mix of PDFs, Word, and Excel files, how can we change the RAG to support retrieval across all of them?
Glad you found it helpful. For different file types, you would consider loading/parsing and chunking strategies that fit those data types. I'm working on the next video, in which I'll go over CSV & Excel RAG.
Nice video, and very informative.
My question: I had downloaded LLMs like Gemma, Llama2, Llama3, and so on on my macOS machine, but due to a technical issue, I deleted them (e.g., $ ollama rm llama2).
Now I want them again, and I noticed that if I run "$ ollama run llama3", it **downloads the entire 4.7GB from the internet** all over again.
Is it possible to keep them downloaded somewhere so that when I want one, I can just run $ ollama run and use it, then delete it when it's not needed?
Again Thanks in advance and would appreciate a response.
Thank you. What you did earlier is the standard way of downloading, serving, and deleting Ollama models. Pulled models are cached on disk (under ~/.ollama/models on macOS), so as long as you don't run `ollama rm`, a later `ollama run` reuses the local copy instead of re-downloading.
You can also download more quantized variants of each model, which take less memory. I usually pull and then delete whenever I don't need a model or when I need space to download another one.
Great video! Thanks for sharing. I ran into an issue with a Chroma dependency on SQLite3 (i.e. RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0). The suggested solutions are not working. Is it possible to use another DB in place of Chroma?
Thank you! Yes, you can swap it with any other open-source vector database. You might also try using a more recent version of Python, which should come with a newer version of SQLite. Do you know what version you are using now?
You can also try installing the binary version in the notebook like so: `!pip install pysqlite3-binary`
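If you go the pysqlite3-binary route, the commonly suggested companion step is to alias it as the standard sqlite3 module before Chroma gets imported (a sketch; put this at the very top of the notebook):

```
# After `!pip install pysqlite3-binary`, swap it in for the system sqlite3
# BEFORE any chromadb/langchain imports run
__import__("pysqlite3")
import sys
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
```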
Which Ollama models were used? I don't want to install unnecessary models on my machine since it has limited space.
@@sulayamar8538 did you watch the video?
Thanks for the content. I'm stuck: I can't figure out how to add context/chat history back into the model.
Great delivery of material. How about fine-tuning for llama3 using your own curated dataset as a video? There are some out there, but your teaching style is very good.
Thank you and that's a great suggestion!
I'll add that to my list.
Can this model query tabular data or image data?
I assume you're talking about Llama2? Or are you referring to the Nomic text embedding model? If it's Llama2, it's possible to use it to interact with tabular data by passing the data to it (via RAG or just pasting data into the prompt), but I can't vouch for its accuracy. Most LLMs are not great at advanced math, but they're definitely getting better.
@@tonykipkemboi Does this model work on image data?
@@sasikumartist Not Llama2. You'd have to use an image or multimodal model for that; check out the LLaVA model.
Appreciate your work. I wanted to know: can I use it for confidential PDFs? Is there any chance of a data leak?
Thank you for the kind words. Yes, if you use Ollama models as we did in the video, your content will stay private and will not be sent to any online service. To be extra sure, I'd recommend turning off your WiFi or any connection once you've loaded all the dependencies and imports. You can then run the cells to load your PDF into a vector DB and chat with it. After you're done, you can delete the collection where you saved your PDF's vectors before turning your connection back on. This is an extra measure to give you peace of mind.
Good one. OK, you touched on security: you have here something that doesn't let data flow out to the internet. I saw a bunch of videos about tapping data from DBs using SQL agents, but none said anything specific about security. So the question: does using SQL agents violate data security?
You bring up a critical point and question. Yes, I believe most agentic workflows currently, especially tutorials, lack proper security and access moderation. This is a growing and evolving portion of agentic frameworks + observability, IMO. I like to think of it as people needing special access to databases at work and someone managing roles and the scope of access. So agents will need some form of that management as well.
Hello friend, thank you very much for your content. I have a question, how can I make it listen to my server within Google Collab so I don't have to use Jupyter, since my resources are a bit limited?
Hello! Nice tutorial. I was stuck on the first part, unfortunately, as I get the error:
"Unable to get page count. Is poppler installed and in PATH?"
Do you have any idea how to solve this?
I have already installed poppler using brew.
Thank you. Have you tried using ChatGPT to troubleshoot?
Thank you, nice tutorial. I work in IT for an automobile dealer. Can I use this approach to connect millions of separate invoices to Llama3? Thanks in advance.
@@krakan4383 that is possible, but since your documents contain structured data, you have to test that it parses the numbers appropriately. Unstructured has more functions for parsing structured data that you can use in the loading stage. Another thing to keep in mind is that current models are not great at math, so they might not return accurate calculations. You might consider adding an agent that does the calculation in a sandbox using something like Pandas. Look into e2b.dev or LangChain's pandas agent.
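A rough sketch of the pandas-agent idea, assuming the invoices can be exported to CSV (requires the `langchain_experimental` package; the file name is hypothetical):

```
import pandas as pd
from langchain_community.chat_models import ChatOllama
from langchain_experimental.agents import create_pandas_dataframe_agent

df = pd.read_csv("invoices.csv")  # hypothetical export of your invoice data

agent = create_pandas_dataframe_agent(
    ChatOllama(model="llama3"),
    df,
    verbose=True,
    allow_dangerous_code=True,  # the agent runs generated Python against df
)
agent.invoke("What is the total invoiced amount for 2023?")
```

This way the math runs in Pandas rather than in the LLM itself.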
Does this PDF library encode tables embedded in the PDF document?
I didn't cover that piece in this tutorial but my guess would be no.
Thanks for the video tutorial. It clearly guided us through all the key elements for a RAG system and was very helpful!!
When trying your code, I got the following errors when submitting a question. What could the root cause of this issue be? Thanks!
- ERROR - Error processing prompt: no such table: embeddings
Quite interesting, and thanks for sharing it. Can you let me know if this would run on a Core i7 processor with 32GB of RAM (CPU only), considering you are using the Mistral model?
Thank you. Yes that should be sufficient to run the program.
Is there any restriction on the size of the PDF? Is it possible to load multiple PDF files? Will the contents of the PDF be passed to the LLM, and will that use tokens?
Hi Tony. Thanks for this work. I get "ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'"
nevermind. think I got it. Thanks again!
nice one
Very detailed explanation, thanks. Can you please make the same project give responses in multiple languages and with voice output?
Thank you. Yes, that would be cool. I can see the challenge being to find an open-source model that is good at multiple languages; the ones I used are not great at that. For voice, it'd probably be easy to use an open-source TTS, or to be more granular and use 11labs for better quality, despite it not being local.
Is it compulsory to pull the Mistral model (around 4GB) from Ollama to run the project?
@@sivakumar7679 you can pick any other model.
Good video 👍👍👍
Thanks for sharing this. Very helpful. Also, what are you using to record and edit this video? I see that it zooms in on the section where your mouse cursor is! Nice video work as well. My only suggestion is to increase the gain on your audio.
I'm glad you find it very helpful. I'm using Screen Studio (screen.studio) for recording; it's awesome!
Thank you so much for the feedback as well. I actually reduced it during editing thinking it was too loud haha. I will make sure to readjust next time.
@@tonykipkemboi Btw, can you see those 5 questions that it generated before summarizing the document?
@@xrlearn, I'm sure I can. I will try printing them out and share them here with you tomorrow.
Hi @xrlearn - Found a way to print the 5 questions using `logging`. Here's the code you can use to print out the 5 questions:
```
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever.get_relevant_documents(query=question)
len(unique_docs)
```
Here are more detailed docs from LangChain that will help.
python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/
That was a really helpful video, thanks a lot! But I have one problem: it takes very long to respond, like 30 minutes. BTW, I'm using a Weaviate image in Docker as the vector DB, Nomic embeddings, and Ollama's Phi-3 as my pretrained LLM, which on its own doesn't take that much time. Could you please suggest something to make it work faster?
Thank you so much for such an impressive video. Just one point: when running the loading steps, I'm receiving an SSL certificate verification error... not sure why, or which certificate it's referring to.
@@mslashm can you share the full error log
@@tonykipkemboi Sure, it's happening while loading the PDF file:
URLError:
@tonykipkemboi, Thank you very much for the valuable video. It helped me a lot.
I was struggling to find the right LLM that can run locally.
I have a question: how do I create a persistent RAG so that the query results can be faster?
@@bhagavanprasad glad you found it useful. For this example, the speed depends on several factors, a major one being your system configuration; if you have a GPU, it will be much faster. An intermediate step would be to remove the MultiQueryRetriever, since it generates several extra questions from your prompt and then retrieves context for all of them from the vector DB, which takes time and introduces latency. You can use a plain single-question query and optimize retrieval another way, such as with a reranking model, though that goes a bit beyond what we covered in this tutorial. For persistence, you can also give Chroma a persist_directory so the embeddings aren't recomputed on every run. There's definitely a trade-off where you sacrifice accuracy for speed and vice versa.
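A sketch of the simpler retrieval path (assuming `vector_db` from the tutorial):

```
# Plain single-query retrieval instead of MultiQueryRetriever:
# faster, possibly less thorough
retriever = vector_db.as_retriever(search_kwargs={"k": 3})  # top-3 chunks only
docs = retriever.get_relevant_documents("What are the key findings?")
```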
very good!
Thank you!
Thanks for the tutorial! How can I make the model give answers in a different language?
It largely depends on the given model's ability to translate from English to the target language. You can try adding the target language to the prompt: tell it to return the results in language X.
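For example, a sketch of baking the target language into the RAG prompt (the `language` variable is an assumption you'd fill in at query time):

```
from langchain_core.prompts import ChatPromptTemplate

# Add the target language to the prompt template
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
Respond in {language}."""
prompt = ChatPromptTemplate.from_template(template)
# Then pass e.g. language="Spanish" alongside context and question.
```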