Solving Real-World Data Science Problems with LLMs! (Historical Document Analysis)
- Published Jun 8, 2024
- In this video we walk through the process of analyzing historical documents using Python & Large Language Models. We start by setting up LLMs using both closed-source (OpenAI API) and open-source (Llama 2 via Ollama) options. Next, we walk through how we can leverage the LLMs to parse out entities from text. After this, we start working with our data, loading in a specific subcategory of documents from Kaggle and seeing how we can connect pages from the same documents together. Once this is completed, we repeat the entity parsing process on our actual data to extract pieces of information such as names, ages, and locations from our documents. Finally, we analyze these entities to gain insights from our document database.
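As a rough illustration of the entity-parsing step described above (turning an LLM's text reply into a Python object), here is a minimal sketch. It is not the code from the video: the function name `parse_llm_entities`, the sample reply, and the field names are all made up for illustration, and it assumes the model was prompted to answer with a single JSON object.

```python
import json
import re


def parse_llm_entities(response_text: str) -> dict:
    """Pull the JSON object out of an LLM reply and return it as a dict.

    Hypothetical helper: assumes the reply contains exactly one JSON object,
    possibly surrounded by extra prose.
    """
    # Grab everything from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


# Made-up reply text, standing in for what a model might return.
sample_reply = (
    "Here are the parsed entities:\n"
    '{"name": "Jane Doe", "age": 12, "location": "Exampleville"}'
)

entities = parse_llm_entities(sample_reply)
print(entities["name"], entities["age"], entities["location"])
```

Using `json.loads` rather than `eval` keeps the conversion safe when the model returns something unexpected, since malformed output raises an error instead of executing.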
Kaggle Dataset: www.kaggle.com/datasets/keith...
GitHub Repo: github.com/keithgalli/histori...
Project Website: freedmensbureau.info
Contributors:
Abdessalem Boukil (NLP Research & Analysis): / abdessalem-boukil-3792...
Trent Self (Kaggle Dataset Setup): / trentonself
If you enjoyed this project video, make sure to throw it a thumbs up & subscribe! Let me know in the comments if you have any questions. It would also be helpful for people to upvote the Kaggle dataset for visibility!
---------------------------
Video timeline!
0:00 - Video Overview & Reference Material
3:05 - Data & Code Setup
5:04 - Task #0: Configure LLM to use with Python (OpenAI API)
20:10 - Task #0 (continued): LLM Configuration with Open-Source Model (Llama 2 via Ollama)
27:39 - Task #1: Use LLM to Parse Simple Sentence Examples
41:22 - Sub-task #1: Convert string to Python Object
44:29 - Task #1 (continued): Use Open-Source LLM to Parse Sentence Examples w/ LangChain
56:24 - Quick note on a benefit of using LangChain (easily switching between models)
58:06 - Task #2 (warmup): Grab Apprenticeship Agreement rows from Dataframe
1:06:22 - Task #2: Connect Pages that Belong to the Same Documents
1:56:36 - Task #3: Parse out values from merged documents
2:12:44 - Task #4 (setup): Analyze Results
2:17:52 - Fixing up our results from task #3 quickly
2:20:41 - Task #4: Find the average age of apprentices in our merged contract documents
2:30:59 - Other analysis: who had the most apprentices?
-------------------------
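Tasks #2 and #4 in the timeline above (connecting pages of the same document, then averaging apprentice ages) can be sketched roughly as follows. This is only a sketch under strong assumptions: it presumes every page record already carries a document identifier and a page number, which may not match how the video actually links pages, and the record fields (`doc_id`, `page`, `text`, `age`) and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up page records; the real dataset's columns may differ.
pages = [
    {"doc_id": "A", "page": 2, "text": "second page of A"},
    {"doc_id": "B", "page": 1, "text": "only page of B"},
    {"doc_id": "A", "page": 1, "text": "first page of A"},
]


def merge_pages(records):
    """Group page records by document and join their text in page order."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["doc_id"]].append(rec)
    return {
        doc_id: "\n".join(r["text"] for r in sorted(recs, key=lambda r: r["page"]))
        for doc_id, recs in grouped.items()
    }


def average_age(entities):
    """Average the numeric 'age' values, skipping missing or non-numeric ones."""
    ages = [e["age"] for e in entities if isinstance(e.get("age"), (int, float))]
    return mean(ages) if ages else None


merged = merge_pages(pages)
print(merged["A"])

# Made-up parsed entities, e.g. one dict per merged document.
parsed = [{"age": 10}, {"age": 14}, {"age": None}, {}]
print(average_age(parsed))
```

Skipping missing ages instead of treating them as zero matters here, because LLM-parsed records will often lack a value and zeros would drag the average down.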
If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
The best part about Keith's videos is that they are completely raw: nothing is staged. He'll make mistakes, which we all do, but he also shows you how to troubleshoot them. He'll even go on Google mid-video to read documentation, blogs, etc. to help solve them, which really helps the next time you get the same error and further strengthens the concept you are learning.
I'm in the aviation industry and your Pandas tutorial taught me how to cut up and arrange Excel worksheets. I use that knowledge almost every day to make my life easier. Thank you so much!!!!
Glad you are coming back! Love your videos
You're my inspiration. I am glad you are back.
More machine learning content! This is awesome stuff Keith!
First time subscribe . I am ALL IN!!❤
My big bro am so happy that you came back just like b4.
Always happy to be here :)
Hi brother, I've never forgotten your videos, amazing... Now you're creating even more valuable ones... Amazing...
Excellent tutorial. Thank you for sharing this.
Glad you enjoyed!
first
You're the GOAT.
Curious what kind of computer/laptop you use and also what keyboard you use for the computer.
I'm currently using a MacBook Pro M2 w/ 16GB RAM and 512GB SSD. The keyboard I'm using currently is a Logitech K850.
@KeithGalli thanks! So your setup is a laptop with an external keyboard? That’s how you do these videos/work?
Keith is back!!!!!
😎😎
He's back guys!!
You know it
Quality information
Dude, thank you. You are awesome
You are very welcome! Thanks for the kind words.
🎉 amazing content
Thanks bro!!
You know, I saw this movie like 2 weeks ago called Dumb Money and for a few minutes I thought the main character was you; but no, the guy's name was Keith Gill.
Anyway, thank you for your service. You're a real human being.😁
Except he doesn’t look like him at all :)
@Lnd2345 Yeah, I know. It just took me a couple of minutes to figure it out.
Haha yeah that's not me, but I did get a lot of people thinking we were the same for a short time period when that was all going on xD
You're the best
Hi Keith,
I am getting an error while saving the OpenAI key to the Secrets in Kaggle Notebook.
ERROR: Permission 'kernelSessions.enableInternet' was denied.
Can you help me on this??
Hi, which operating system do you prefer, Windows or macOS?
Why is the frequency of your videos so low? @Keith Galli
Fantastic real-world problem, like a lot of your other videos. I've got to say that all the models on Ollama absolutely stink in comparison to OpenAI. However, I have been using a text-preprocessing function that I created with spaCy for a news article project I'm working on. I have been able to pass the transcription_text values through my function with some minor tweaking and recreate what the LLMs are doing just through code, by using the doc.ents functionality. I'm only 1:27:00 through the video at the moment and perhaps you use something similar later on, but spaCy has been a bit of a godsend if you don't/can't pay for OpenAI.
Yeah, spaCy is great for a non-LLM approach to so many NLP tasks. I didn't use it at all in this video because it was focused on LLMs, but I have used it a bunch for personal work in the past. Glad you've been enjoying the video!
🎉🎉🎉🎉🎉🎉🎉