Who’s ready to build a local RAG pipeline from scratch? 🔨🔥
PS A big shout out to NVIDIA for sponsoring this video!
Be sure to check out NVIDIA GTC for the latest developments in AI, deep learning and GPU technology.
It’s running from March 18-21 in San Jose, California, but is free to attend virtually (what I’m doing): nvda.ws/3GUZygQ
Once again, an awesome video, Daniel. Two comments only, if I may: a. lose the 'tash :) b. I just got a Mac Pro and you started coding on Windows :(
Great video, I have been waiting for such video on RAG. Thanks Daniel
Great stuff, thanks for making this available for us noobs! I managed to follow it to the end with my custom source of text (a phpbb forum's posts), and although it wasn't very good at answering, it did produce some answers.
All the text was in Norwegian, and each topic and its related answers were concatenated onto a single line in a txt file.
I'm sure I could have done this much better, but for a first try I was rather satisfied!
Could not have done this without this video though, so yeah, great stuff!
Any tips on models I can use for Scandinavian/Norwegian? I tried to search around a bit, and after a quick talk with ChatGPT (lol) I found NB-BERT (Norwegian BERT), but did not manage to get it going.
It would be great if you did a video on how to attach a file to the query, just like in ChatGPT. My use case: a group of files serves as the context for the RAG system, and the query includes a document that gets compared against criteria in the other documents to evaluate the queried document's compliance.
This video is a gem. I have recently completed watching your TensorFlow and PyTorch tutorials. Those videos are great resources. Please make a video on fine-tuning LLMs. Looking forward to more amazing videos from you in the future.
A lot of times I struggle to make it through a 10 or 20 minute video on this stuff, but somehow I can watch 5 hours or even 24 hours of you. I love how you keep it light and you let us see your typos. You are a special and unique person and I am glad you do what you do. I really loved your 24-hour PyTorch video too.
There is a 25 hour PyTorch video from Daniel if you wanna try.
I've been a follower for quite some time and I'm also a big advocate for open sourcing, knowledge exchange and community building. Really happy to see you consistently push yourself and your work while never losing sight of your core values. Great job!
Finished the tutorial today and will start building something using this tremendous knowledge. Thank you so much!!
And a note: we are happy and excited about any idea that pops into your head for a new educational video.
We need tutorials on different frameworks and on building from scratch.
Hey, will an M2 Mac work for this?
@@shubhsinghal4364 Possibly not, but you can try and see how it goes.
Thank you so much! I just finished watching this video. As a beginner in ML/RAG, your teaching style is really helpful for entry-level users like me.
I am from Sri Lanka
I must give you my total gratitude for taking the time (5 hours) to explain such invaluable knowledge in baby steps.
How do you guys not give a like to this video if you watch it fully? Excellent content as always.
Machine Learning Bootcamps will never exist 'cause of Daniel 🤗
Yeah! Tutorials on these topics are rare. He could make a huge amount of money, but he teaches like an ideal teacher.
It’s always a good day when Daniel drops one of these tutorials. Love it!
💪
Reading through and manually building the local RAG took three afternoons. Thanks to the author!
Finished the whole 5 hours and 40 minutes. Thank you so much for sharing this video. I have tried the RAG system on some of the geostatistics books that I have and it comes up with really good results. If there are any advanced topics on improving RAGs, please share them. I found that the code sometimes has a hard time understanding difficult concepts and just ignores them; my assumption is that the embedding can be improved to include better understanding.
This guy is a game changer, really.
Great content. I am watching the TF course now. Thanks for your effort to make learning easy and fun.
Almost 6 hours of training, but it only took me two days to get through and build our own RAG model; it was a breeze. Very good training and it really helped me in understanding what RAG is and how to implement it. Thank you very much!
What a GREAT tutorial Daniel!! Truly step-by-step. Thanks for putting this together. I did not have major issues following it through. The decision of using Jupyter notebooks here was key..
Such a great tutorial mate, kept my interest in every minute of this long video. God bless you 🙏
I would love to see this done for graph RAG. Fantastic tutorial ❤
Hey Daniel, I have seen the TensorFlow and PyTorch code videos and I just loved them, and today I completed this video. I just want to say thank you so much. Love your content and your teaching style as well. Love from India. Good luck!
I am baffled by the sheer quality of this content, damn, thanks. Would love to see a similar tutorial on fine-tuning newer models like Llama 3.1 or the Mistral ones.
I am going to be a Graduate Researcher at my University and going to research RAG! I was worried since I was new to RAG until this video! Thanks, Daniel!
So much useful information in one video. Can't be more grateful for this awesome work.
Excellent, I've been watching your videos for a while (the Python in 24 hours one took some time to get through!). I like how you explain absolutely everything and don't jump ahead (even if some things may seem obvious); it may take longer, but by the end you get a very good understanding of the topic and all the concepts.
Thank you! Glad you enjoy!
Funny that I happened to find this video, it's gonna be very helpful for a project that I'm working on.
Good timing! Enjoy!
Jesus, almost 6 hours! 😳 Gonna have to do a rail of Adderall and dig in. 😂 Seriously though, with my ADHD I'll have to break it down into sections, but I'm excited to watch the content. Thanks for the in-depth video, brother!
Gonna watch this after buying gpu. Love your TF course and ZTM community!
Enjoy legend!
Hi Daniel, thank you very much. This is concise. I understood it and finished building this RAG.
This tutorial is awesome!!! Thank you, Daniel. You are a fantastic teacher! I look forward to future videos about creating an app with this and optimizing LLMs locally! Super cool.
Holy shit, this was awesome. First time implementing RAG, following all the steps, and I just immediately fell in love. Also, I would love to see a deployment using Gradio, with an "upload file" feature added to it. Thanks for sharing your knowledge, Daniel!!!
Thanks Daniel, so much to love about this video. Firstly I love this is true Python open-source code and not frameworks chained together. The delivery style works perfectly for me, with occasional offshoots to related, though not required, subjects of interest. Your thoughts and advice along with extensive notes build the perfect all-round package. Please Please do more on LLM optimization techniques you suggested and any other in-depth subject you care to share like embedding.
Finally! Now no more second guessing on what LLM RAG’s are. My boy Daniel got us covered!!
You will definitely know what RAG’s are after this!
Great! Instead of following the wrapped functions defined by LangChain, using this method lets you learn the details of RAG.
Loved it, Daniel. I always admire your work. Please add deployment methods; it will help a lot.
Yes, we want an extension video on how to get this all into an app
Video Stamp 3:36:50
Oh there you are Daniel, always ready to help with complex concepts.
Are you kidding me? How can you be so easy to follow? I don't mean only the technical part, because I guess not everyone can follow that; I mean making such a long, code-heavy and faceless piece of content so entertaining and useful!
This is a golden video! Thanks a lot for sharing this! :)
Always love your content, and I am going to follow throughout and build my own RAG tooo!
The best tutorial I have ever watched!
So inspiring!
Great work Daniel educate the world 🌎 Keep going 🚴♂️
If I could, I would give this video multiple thumbs up! 🥰
As always, great stuff. Thank you, Dan. Based on this, can you make a video or share resources about evaluation methods, and an additional one for deployment?
This is a great tutorial for RAG! Just amazing!
You are the goat of this planet. Thanks for the video, it's very clear.
It is always good to mention system requirements in the course. This needs a GPU to run.
Hi @mrdbourke, thanks a lot for the great stuff you provide. If you don't know what to do next ;) it would be really cool to also get some insights/a tutorial/a video on the following topic:
- Our example only focuses on text from a PDF; however, we could extend it to include figures and images from a PDF file or any other source.
Thanks a lot. Cheers from Austria :)
Harry
Thank you so much for everything, Daniel!
I came from your "Day in a Life as a Startup Founder video". Do you have a video on setting up what you have set up in your business? How to create the infrastructure, deploy models etc., in a business context. Or anything close to that.
Day 1: 42:01
Day 2: 1:31:27
Day 3: 2:22:44
Day 4: 4:03:57
This is what we were waiting for!!
Great content, All the best!!
Enjoy Amit!
Hi Dan, great content and thank you! Is it possible to have another video on the "optional" chat application for a complete end-to-end RAG pipeline? Thanks again!
Really cool video, Daniel. I really liked it!
Hello Daniel. I love your content. I just want to request that you create a video on building an LLM from scratch and Stable Diffusion image generation from scratch. Thank you again for the very good learning resources. Good luck!
Really very informative , thanks a lot
Very interesting and well made. Thank you!
Thank you so much for this great video. I have a question. Let's suppose that a PDF has a lot of images which include numbers or other information. Which library should we use to retrieve information from those images?
A lot of insights here. Please could you also show fine-tuning of the LLM for a specific domain instead of embedding? 🎉
Great video! I will definitely make my own.
Are there other ways to evaluate the LLM responses? In order to automatically evaluate many answers given by the LLM (as in supervised learning, for example), I guess we would have to pay for GPT-4 API access... Could we use embeddings and cosine similarity again to compare the generated answer to a label? Or maybe use another local LLM on bigger hardware?
Thank you so much!
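One cheap way to do that is exactly the embeddings-and-cosine idea: embed both the generated answer and a reference answer and measure how similar they are. A minimal sketch, assuming the sentence-transformers package is installed; the model name and the two example strings are placeholders, not from the video:

```python
# Minimal sketch: score a generated answer against a reference answer
# by comparing their embeddings with cosine similarity.
from sentence_transformers import SentenceTransformer, util

embedding_model = SentenceTransformer("all-mpnet-base-v2")  # example model choice

reference_answer = "The three macronutrients are protein, carbohydrates and fat."  # hypothetical label
generated_answer = "Macronutrients are protein, carbs and fats."  # output from your local LLM

# Encode both answers into dense vectors
embeddings = embedding_model.encode(
    [reference_answer, generated_answer], convert_to_tensor=True
)

# A cosine similarity close to 1.0 suggests the answers agree semantically
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Semantic similarity: {score:.3f}")
```

This only measures semantic overlap, not factual correctness, so it works best as a rough filter alongside spot checks or an LLM-as-judge pass.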
You are God, There is Daniel and then there is Jesus! Thanks for this!!!!
Hahaha that’s a very big compliment! Thank you for the kind words
This tutorial is just too good
Amazing tutorial, thanks for sharing!
Great video, Daniel! "Thank you" would be a small word. Do you think we can apply the same mechanism to build a chatbot on a CSV file or SQL database? How can we handle different columns with different data types, like strings for names, integers for ranks, counts and scores, and floats for rates and other metrics? How do we make an embedding model understand the differences?
Would love it if you could make a video on it.
I watched the whole video and I didn't expect that we'd be using a pretrained model for this 😢 I was watching because your title said "from scratch"; for that, we could directly use a model like GPT-2 or another that would work in less than 10 lines of code 😕
GPT-2 can't query the PDF used in this tutorial.
@@costa2150 Simple, we can use any model from the Hugging Face library to do that in even shorter code.
Big fan!
Amazing content as usual
Binging this asap!!
2:56:28 The power of RTX 4090
one of the best tutorials
Thank you so much for a great video!
I watched all of your deep learning and machine learning content and absolutely loved it!
Always craving more, can we expect a follow-up video teaching attention models and Transformers, or vision-related generative models such as DDPM?
Nice tutorial! Anyone else using gemma-2b-it with 4-bit precision and getting answers like "The context does not provide any information about the best source to fulfill nutritional requirements, so I cannot answer this question from the provided context."? Maybe a larger LLM will help.
Hi, same here on an old 8GB NVIDIA 3060. I tested it on two GPUs, and the same gemma-2b-it model works smoothly and painlessly on a newer 64GB Ampere card.
This is such a great and helpful video!!
Thanks for putting together an awesome walk through here. I'm thinking about using this implementation as a base and doing a video tutorial on turning it into an app or website & backing it with ChatGPT(for all the folks without crazy GPU power). Do you have any objections to me using your code for that purpose, assuming it's all properly attributed back to you with links to the channel?
Hey Robert! Feel free to adapt the code for your own purposes + tutorial :)
It should all be available at the GitHub link in the description.
Links back would be really appreciated.
Cheers
@@mrdbourke Much appreciated! I'll let you know what I come up with.
@@mrdbourke Just wanted to say I wrapped up the series. Thanks again, this was a lot of fun to build. ua-cam.com/video/P3k-KRNEL_I/v-deo.html
Awesome content! However, I got stuck with the NumPy/spaCy installation; I'm currently getting an error.
Thanks soooooo much this is a lifesaver!
Need videos on RAG agents that can handle math also.
Working with clients, I am really seeing the limitations of the RAG framework when it comes to math stuff. It just sucks.
RAG + Graph Agent is worth exploring.
A wonderful project, man, thank you so much ❤
You’re welcome! Have fun!
Hi Danny!!! Great job!!! The sky is the limit.
Amazing video, thank you. Can you do something on FAISS?
All good, but how sure are you that the LLM being used doesn't take the privately hosted proprietary data? Can you throw some light on this?
return torch.mm(a, b.transpose(0, 1))
RuntimeError: Tensor for argument #2 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for mm)
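That error usually means the two tensors passed to `torch.mm` live on different devices, e.g. the chunk embeddings were loaded onto the CPU while the query embedding sits on the GPU. A minimal, self-contained sketch of the fix; the tensor shapes and names here are stand-ins for the tutorial's variables:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-ins for the real tensors: a (1, 768) query embedding on the GPU
# and a (num_chunks, 768) matrix of chunk embeddings that ended up on the CPU.
query_embedding = torch.randn(1, 768).to(device)
chunk_embeddings = torch.randn(1000, 768)  # e.g. rebuilt from a CSV, defaults to CPU

# Move the chunk embeddings to the same device before the matrix multiply,
# otherwise torch.mm raises the "mat2 is on CPU" RuntimeError above.
chunk_embeddings = chunk_embeddings.to(device)
dot_scores = torch.mm(query_embedding, chunk_embeddings.transpose(0, 1))
print(dot_scores.shape)  # torch.Size([1, 1000])
```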
Superb (Recommended)
@mrdbourke Can you make a detailed video on LLM fine tuning?
This is awesome!! Thank you 🙌🏾💜
Enjoy Andy!
Excellent Video!
Simple RAG pipeline ---> 5.5 hours ?!
Build a RAG pipeline in just 267 steps
@@davide0965 Build a *simple* RAG pipeline in just 267 steps
Thanks @mrdbourke, this was really helpful in breaking down some LLM concepts.
this was entertaining to watch.
Thanks for the video. I see that all the optimization techniques for speeding up output generation require a good GPU. Are there any recommended techniques that work on a CPU (I don't have a GPU)? Thanks.
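Without a GPU, one common route is to skip Transformers for generation and run a quantized GGUF model on the CPU, for example with the llama-cpp-python package. A rough sketch under those assumptions; the model file path is a hypothetical example, and generation will still be noticeably slower than on a GPU:

```python
# Rough sketch: CPU-only generation with a quantized GGUF model via llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file downloaded locally;
# the model path below is a hypothetical example.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2b-it.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,    # context window to allocate
    n_threads=8,   # tune to your CPU core count
)

prompt = (
    "Answer the query using the context.\n"
    "Context: Protein, carbohydrates and fat are macronutrients.\n"
    "Query: What are macronutrients?\nAnswer:"
)
output = llm(prompt, max_tokens=128)
print(output["choices"][0]["text"])
```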
thank you daniel
Do you have a video showing how to create a phone app using an LLM from A to Z?
Hi, please bring out the optimised LLM generation video, please.
Hi Daniel, thanks a lot for your support! I'm a real beginner (art historian).... where do I put in the code??? THANKS from Vienna
such a good video
Hi, I want to watch this video, but since it is 5 hours long I first need some information:
- Does it require any paid API subscription?
- Can the models and the code also run on Google Colab?
- Does it require any training, or does it rely on pre-trained models?
Thank you!
Should I try to build this on a GTX 1650 Ti? Will it be able to run, or will it crash?
Can I run this on my MacBook M1, or do I need to purchase an NVIDIA GPU + Windows OS? Can you clarify?
You can run this on a MacBook; however, you will need to change the device to "mps" (Metal Performance Shaders) rather than "cuda". For example, `device = "mps" if torch.backends.mps.is_available() else "cpu"`, see more here: pytorch.org/docs/stable/notes/mps.html
Also, how much memory you have available will determine which LLM you can use.
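For completeness, a small device-picking snippet that covers NVIDIA, Apple Silicon and CPU-only machines (plain PyTorch, nothing specific to the video):

```python
import torch

# Pick the best available device: NVIDIA GPU, Apple Silicon GPU, or CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using device: {device}")
# Then send models and tensors to it, e.g. model.to(device), tensor.to(device)
```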
Excellent
Which one is better - a MacBook Pro with the M3 Max chip, or a setup with an RTX 4090 for tasks like creating embeddings and building an RAG system?
an RTX 4090
@@shivampradhan6101 Thanks, what if the LLM requires more than 24GB of VRAM to run?
Great tutorial 👏
Thank you!
Well done !
So, at around the 5-hour mark, where we're playing with the base prompt and connecting it to our retrieved context, I am wondering: how can I be sure that all of that fits in my LLM's context window?
Am I missing something here? Is the LLM context window much greater than, for example, that of our previous embedding model, which tokenized a max of 384 tokens per sample? (I know it's not apples to apples, since the embedding model will embed all the sequences and maybe leave some tokens out of some of them, but for the LLM context window we have to make sure we are fitting literally everything in our prompt.)
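One way to answer that empirically is to tokenize the full augmented prompt and compare its length against the model's maximum context length. A minimal sketch; the model ID follows the Gemma example used in the video, and the `max_position_embeddings` attribute is an assumption that can differ between model families:

```python
# Minimal sketch: check whether the augmented prompt fits the LLM's context window.
from transformers import AutoConfig, AutoTokenizer

model_id = "google/gemma-2b-it"  # example model, requires Hugging Face access approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Stand-in for the base prompt + retrieved context items
prompt = "Based on the following context items, answer the query.\n" + "context " * 500

num_prompt_tokens = len(tokenizer(prompt)["input_ids"])
max_context = getattr(config, "max_position_embeddings", None)

print(f"Prompt tokens: {num_prompt_tokens}, model context window: {max_context}")
if max_context is not None and num_prompt_tokens > max_context:
    print("Prompt too long: drop some context items or truncate them.")
```

In general the LLM's window (thousands of tokens) is much larger than the embedding model's 384-token limit, but it still pays to check when you stuff many retrieved chunks into one prompt.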
I built this and was following along great until we got to the step of actually generating text with local LLMs. I have a GPU with 6GB of VRAM and 16GB of system RAM. I run models in Ollama just fine, or through the Python package for Ollama, no issue. However, in this example I tried following along with the Transformers package and Hugging Face models, and my results were just horrible. It was taking minutes and minutes just to generate 256 tokens. Any explanation why? I was using Gemma-2B-it here and quantizing.
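Generation that slow usually means part of the model ended up in system RAM (or on disk) instead of VRAM, so every token round-trips over the PCIe bus. One thing worth checking is loading with an explicit 4-bit config and inspecting where the layers were placed; a sketch under those assumptions, requiring the `bitsandbytes` and `accelerate` packages:

```python
# Sketch: load Gemma-2B-it in 4-bit and check that every layer landed on the GPU.
# If the device map shows "cpu" or "disk" entries, generation will be very slow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b-it"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate place layers; inspect the result below
)

print(model.hf_device_map)  # ideally everything maps to GPU 0, not "cpu" or "disk"
```

If layers do show up on the CPU, a smaller model or a GGUF build (as Ollama uses) is usually the better fit for 6GB of VRAM.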
Is there any practical difference in approach between RAG as described here and RAG for tabular data (CSVs and Excel docs)?