Solving Real-World Data Science Problems with LLMs! (Historical Document Analysis)

  • Published 8 Jun 2024
  • In this video we walk through the process of analyzing historical documents using Python & Large Language Models. We start by setting up LLMs using both closed-source (OpenAI API) and open-source (Llama 2 via Ollama) options. Next, we walk through how we can leverage the LLMs to parse out entities from text. After this, we start working with our data, loading in a specific subcategory of documents from Kaggle and seeing how we can connect pages from the same documents together. Once this is completed, we repeat the entity parsing process for our actual data to get pieces of information such as names, ages, and locations from our documents. Finally, we analyze these entities to learn some insights from our document database.
    Kaggle Dataset: www.kaggle.com/datasets/keith...
    GitHub Repo: github.com/keithgalli/histori...
    Project Website: freedmensbureau.info
    Contributors:
    Abdessalem Boukil (NLP Research & Analysis): / abdessalem-boukil-3792...
    Trent Self (Kaggle Dataset Setup): / trentonself
    If you enjoyed this project video, make sure to throw it a thumbs up & subscribe! Let me know in the comments if you have any questions. It would also be helpful for people to upvote the Kaggle dataset for visibility!
    ---------------------------
    Video timeline!
    0:00 - Video Overview & Reference Material
    3:05 - Data & Code Setup
    5:04 - Task #0: Configure LLM to use with Python (OpenAI API)
    20:10 - Task #0 (continued): LLM Configuration with Open-Source Model (Llama 2 via Ollama)
    27:39 - Task #1: Use LLM to Parse Simple Sentence Examples
    41:22 - Sub-task #1: Convert string to Python Object
    44:29 - Task #1 (continued): Use Open-Source LLM to Parse Sentence Examples w/ LangChain
    56:24 - Quick note on a benefit of using LangChain (easily switching between models)
    58:06 - Task #2 (warmup): Grab Apprenticeship Agreement rows from Dataframe
    1:06:22 - Task #2: Connect Pages that Belong to the Same Documents
    1:56:36 - Task #3: Parse out values from merged documents
    2:12:44 - Task #4 (setup): Analyze Results
    2:17:52 - Fixing up our results from task #3 quickly
    2:20:41 - Task #4: Find the average age of apprentices in our merged contract documents
    2:30:59 - Other analysis: who had the most apprentices?
    -------------------------
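    For Sub-task #1 in the timeline (convert string to Python object), here is a minimal sketch of one way to turn an LLM's text reply into a dict. It is not necessarily the approach used in the video: the function name parse_llm_response, the sample reply, and the keys "name", "age", and "location" are all made up for illustration.

```python
import ast
import json


def parse_llm_response(text: str) -> dict:
    """Convert an LLM's text reply into a Python dict.

    Assumption: the reply contains exactly one JSON-like object,
    possibly surrounded by extra prose.
    """
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no object found in response")
    candidate = text[start : end + 1]

    try:
        # Strict JSON: double-quoted keys, null/true/false.
        result = json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to a Python literal (single quotes, None/True/False).
        # ast.literal_eval only evaluates literals, unlike eval().
        result = ast.literal_eval(candidate)

    if not isinstance(result, dict):
        raise ValueError("response did not contain a dict-like object")
    return result


# Hypothetical reply text, invented for this example.
reply = 'Sure! Here is the parsed result: {"name": "John Doe", "age": 12, "location": null}'
parsed = parse_llm_response(reply)
```

    Trying json.loads first and falling back to ast.literal_eval covers both JSON-style and Python-style replies without ever calling eval() on model output.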
    If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
    Practice your Python Pandas data science skills with problems on StrataScratch!
    stratascratch.com/?via=keith
    Join the Python Army to get access to perks!
    YouTube - / @keithgalli
    Patreon - / keithgalli
    *I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
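    For Task #4 in the timeline (average age of apprentices, and who had the most apprentices), here is a rough sketch of the kind of analysis involved, using only the standard library rather than the pandas workflow from the video. The records, the key names "age" and "master", and the helper to_age are invented for illustration; the real parsed output will likely look different.

```python
from collections import Counter
from statistics import mean

# Made-up parsed results, one dict per merged contract document.
records = [
    {"apprentice": "A", "age": 12, "master": "X"},
    {"apprentice": "B", "age": "14", "master": "X"},
    {"apprentice": "C", "age": None, "master": "Y"},
    {"apprentice": "D", "age": "unknown", "master": "Y"},
    {"apprentice": "E", "age": 10, "master": "X"},
]


def to_age(value):
    """Return the age as an int, or None when it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# LLM output is messy, so drop ages that cannot be read as numbers.
ages = [age for age in (to_age(r.get("age")) for r in records) if age is not None]
average_age = mean(ages) if ages else None

# Count how many apprentices are listed under each master.
master_counts = Counter(r["master"] for r in records if r.get("master"))
top_master, top_count = master_counts.most_common(1)[0]
```

    Cleaning the ages before averaging matters because an LLM may return a number, a numeric string, or free text for the same field.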

COMMENTS • 38

  • @abhishekpatil8599 2 months ago +5

    The best part about Keith's videos is that they are completely RAW. Nothing is staged: he'll make mistakes, which we all do, but also show you how to troubleshoot them. He'll even go on Google mid-video and read documentation, blogs, etc. to help solve them, which really helps the next time you get the same error and further strengthens the concept you are learning.

  • @dabunnisher29 2 months ago +5

    I'm in the aviation industry and your Pandas tutorial taught me how to cut up and arrange Excel worksheets. I use that knowledge almost every day to make my life easier. Thank you so much!!!!

  • @edsonwinnerify 2 months ago +1

    Glad you are coming back! Love your videos

  • @lucaskhoo3712 2 months ago +1

    You're my inspiration. I am glad you are back.

  • @sushibooshi 1 month ago

    More machine learning content! This is awesome stuff Keith!

  • @nelsonnjikelani4844 2 months ago

    First-time subscriber. I am ALL IN!!❤

  • @muhammadabdulsalam602 2 months ago +2

    My big bro, I'm so happy that you came back just like before.

    • @KeithGalli 2 months ago

      Always happy to be here :)

  • @bajangsekacang 2 months ago +2

    Hi brother, I've never forgotten your videos, amazing... Now you're creating even more valuable ones... Amazing...

  • @chineduezeofor2481 2 months ago +1

    Excellent tutorial. Thank you for sharing this.

  • @KeithGalli 2 months ago +7

    first

  • @aryehpaulwalter7520 2 months ago +1

    You're the GOAT.
    Curious what kind of computer/laptop you use and also what keyboard you use for the computer.

    • @KeithGalli 2 months ago

      I'm currently using a MacBook Pro M2 w/ 16GB RAM and a 512GB SSD. The keyboard I'm using currently is a Logitech K850.

    • @aryehpaulwalter7520 2 months ago

      @@KeithGalli thanks! So your setup is a laptop with an external keyboard? That’s how you do these videos/work?

  • @gulamgauskhan6933 2 months ago +2

    Keith is back!!!!!

  • @utkarshkapil 2 months ago +2

    He's back guys!!

  • @cocoarecords 2 months ago

    Quality information

  • @anthonypriest214 2 months ago +1

    Dude, thank you. You are awesome

    • @KeithGalli 2 months ago

      You are very welcome! Thanks for the kind words.

  • @wiz8058 2 months ago +1

    🎉 amazing content

  • @JonR4m 2 months ago

    You know, I saw this movie like 2 weeks ago called Dumb Money and for a few minutes I thought the main character was you; but no, the guy's name was Keith Gill.
    Anyway, thank you for your service. You're a real human being.😁

    • @Lnd2345 2 months ago +1

      Except he doesn’t look like him at all :)

    • @JonR4m 2 months ago

      @@Lnd2345 Yeah, I know. It just took me a couple of minutes to figure it out.

    • @KeithGalli 2 months ago +1

      Haha yeah that's not me, but I did get a lot of people thinking we were the same for a short time period when that was all going on xD

  • @Master_of_Chess_Shorts 2 months ago +1

    You're the best

  • @atharvasawai8309 2 months ago +1

    Hi Keith,
    I am getting an error while saving the OpenAI key to the Secrets in Kaggle Notebook.
    ERROR: Permission 'kernelSessions.enableInternet' was denied.
    Can you help me on this??

  • @ucphattruong4341 2 months ago

    Hi, which operating system do you prefer, Windows or macOS?

  • @venugopal-nc3nz 2 months ago +1

    Why is the frequency of your videos so low? @keith Galli

  • @MaxwellSmi41483 2 months ago +1

    Fantastic real-world problem, like a lot of your other videos. I've got to say that all models on Ollama absolutely stink in comparison to OpenAI. However, I have been using a text preprocessing function that I created with spaCy for a news article project I'm working on. With some minor tweaking, I have been able to pass the transcription_text values through my function and recreate what the LLMs are doing just through code, using the doc.ents functionality. I'm only 1:27:00 through the video at the moment and perhaps you use something similar later on, but spaCy has been a bit of a godsend if you don't/can't pay for OpenAI.

    • @KeithGalli 2 months ago +1

      Yeah, spaCy is great for a non-LLM approach to so many NLP tasks. I didn't use it at all in this video because it was focused on LLMs, but I have used it a bunch for personal work in the past. Glad you've been enjoying the video!

  • @sebastianalvarez1537 2 months ago

  • @Intellectualmind4 2 months ago +1

    🎉🎉🎉🎉🎉🎉🎉