Solving Real-World Data Science Problems with Python! (Predicting Healthcare Insurance Costs)

  • Published 14 Jan 2025

COMMENTS • 43

  • @t-zero8880
    @t-zero8880 3 months ago +1

    This is the 2nd project I'm doing along with you. Looking forward to the next one! Looks fun so far.

  • @Website_TV_1
    @Website_TV_1 3 months ago +1

    Wow, this is an incredible dive into solving real-world data science problems! 🔥 Loved how you broke down complex concepts like regression so clearly at 0:47, and the step-by-step coding walkthrough at 2:29 was spot on. 💻 Task #4 (41:32) on fitting a linear regression model with sklearn was especially helpful! Great work on making data science approachable for everyone. Looking forward to seeing more content like this! 👍 #DataScience #Python #RegressionModeling

  • @ameenshareef1709
    @ameenshareef1709 3 months ago +1

    Thank you so much for these uploads, hope you continue them in the future

  • @MrSagar94
    @MrSagar94 2 months ago +1

    Please do more of these, this is a perfect way to learn ML. Please please please do one for each algorithm

  • @simondechoisy779
    @simondechoisy779 3 months ago

    Keith, loved this. Very useful for refreshing on linear regression.

  • @gabrieltorres289
    @gabrieltorres289 2 months ago

    Please do more of those! So helpful!

  • @mkgeidam
    @mkgeidam 3 months ago

    Thank you so much for the real-time project, it's helpful

  • @manuelatienzo9764
    @manuelatienzo9764 3 months ago

    Awesome, was waiting for it!

  • @lisitashamatutu1140
    @lisitashamatutu1140 11 days ago

    Thanks @Keith

  • @fantasytalker
    @fantasytalker 3 months ago

    thank you so much for this tutorial!

  • @Optimus_Gaming07
    @Optimus_Gaming07 3 months ago

    Loved it🎉. 13 children moment was awesome 😂

  • @franciscoortega104
    @franciscoortega104 3 months ago +1

    Thanks keith ❤ !!

  • @rogueknight2414
    @rogueknight2414 3 months ago +1

    At 20:52 and 21:21 there's a null value in charges. I checked the raw CSV and found some '$nan' entries which didn't get dropped because we first did .dropna() and then .strip(), I think?

    • @KeithGalli
      @KeithGalli 3 months ago +1

      Yeah, looking back at what happened, those entries didn't get dropped when we did our first dropna(). Then, when we stripped the '$' and converted to a float type, they became new null values. We handle this later in the video with an additional dropna(). Good catch!
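      The ordering issue described in this reply can be reproduced with a small sketch. The DataFrame below is made up for illustration (the column names and values are assumptions, not the video's data), and it assumes pandas is installed. It shows that the string '$nan' survives the first dropna() and only becomes a real null after the '$' is stripped and the column is cast to float.

```python
import pandas as pd

# Made-up rows for illustration; "$nan" is a string, None is a true missing value.
raw = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "charges": ["$16884.92", "$nan", None, "$4449.46"],
})

# Order used in the video: dropna() first, then strip "$" and convert.
first_pass = raw.dropna().copy()
rows_after_first_dropna = len(first_pass)  # the "$nan" row is still present

first_pass["charges"] = first_pass["charges"].str.strip("$").astype(float)
nulls_after_convert = int(first_pass["charges"].isna().sum())  # "nan" is now a real NaN

# The additional dropna() mentioned in the reply removes it.
cleaned = first_pass.dropna()

# Alternative order: convert first, so a single dropna() catches both cases.
alt = raw.copy()
alt["charges"] = alt["charges"].str.strip("$").astype(float)
alt = alt.dropna()

print(rows_after_first_dropna, nulls_after_convert, len(cleaned), len(alt))
```

      Converting before dropping is one way to avoid the second dropna(); whether that fits depends on what else the cleaning step needs to do.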

    • @rogueknight2414
      @rogueknight2414 3 months ago

      @KeithGalli Yes! This walkthrough was insightful. Thank you for the content.

  • @SalehGoodarzian
    @SalehGoodarzian 3 months ago

    Always helpful and educational. I have a folder on my laptop with your name that contains the things I have learned from you. Just a quick question: how do you record your desktop?

  • @TJ-pt8ei
    @TJ-pt8ei 3 months ago +1

    Great video.
    I'm curious what presenting a model to stakeholders looks like.
    I can't seem to find that.

    • @KeithGalli
      @KeithGalli 3 months ago +1

      It varies, but I often like to spin up either a Streamlit or Shiny app that is connected to my model and can show what the model outputs for different input values. Stakeholders often like this because they can interactively understand the types of values the model produces.

    • @TJ-pt8ei
      @TJ-pt8ei 3 months ago

      @KeithGalli That's great. Thank you.

  • @venkateshkannan7398
    @venkateshkannan7398 3 months ago +1

    Hi Keith, thank you for the detailed walkthrough. One question, please: in real life, how are these models maintained and run each month? For example, in my company, if I'm running a linear regression on similar monthly data, should I just run it in a Jupyter notebook linked to Git? Please share any best practices. Thanks again!

    • @asfasdfsd8476
      @asfasdfsd8476 3 months ago

      Get a task scheduler like Apache Airflow. Enqueue tasks that do the calculations and dump the results into a database on a schedule. Wake up on Monday and retrieve the already-prepared data from your database.
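      The "compute on a schedule, dump into a database" pattern from this reply can be sketched without any scheduler at all. The example below is not Airflow code: it is a plain Python function that a scheduler (Airflow, cron, or similar) could call once a month. The table name model_runs, the function names, and the toy data are all hypothetical, and SQLite stands in for whatever database a team actually uses.

```python
import sqlite3
from datetime import date


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def run_monthly_job(conn, run_date, xs, ys):
    """Fit the model on this month's data and store the result, keyed by date."""
    slope, intercept = fit_line(xs, ys)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_runs "
        "(run_date TEXT PRIMARY KEY, slope REAL, intercept REAL)"
    )
    # INSERT OR REPLACE keeps the job safe to re-run for the same month.
    conn.execute(
        "INSERT OR REPLACE INTO model_runs VALUES (?, ?, ?)",
        (run_date.isoformat(), slope, intercept),
    )
    conn.commit()


# Toy data lying exactly on y = 2x + 1; an in-memory database for the demo.
conn = sqlite3.connect(":memory:")
run_monthly_job(conn, date(2025, 1, 1), [1, 2, 3, 4], [3, 5, 7, 9])

latest = conn.execute(
    "SELECT run_date, slope, intercept FROM model_runs "
    "ORDER BY run_date DESC LIMIT 1"
).fetchone()
print(latest)
```

      Keeping the job as an ordinary function in a Git-tracked module, rather than in notebook cells, is what makes it easy to hand to a scheduler later.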

  • @praveenm3414
    @praveenm3414 3 months ago

    Hi, this was a very nice explanation with a real-world dataset application.
    I appreciate your effort, your very clear programming skills, and your thinking
    about affordable medical charges for the population of the United States.
    Congratulations, good job done with the help of regression model analysis from machine learning.
    I like it.
    🥰👍

  • @Sawatzpa
    @Sawatzpa 1 month ago

    Hey Keith,
    great video as always. I have 2 questions:
    First: Are Jupyter notebooks used a lot in a professional setting, especially for problems involving creating models?
    Second: Why did you only dummy-encode 3 of the regions? Is there any advantage to excluding one of them, or is this just an efficiency thing?
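    The thread does not answer the second question, so the sketch below is only one common explanation, not necessarily the reasoning used in the video. With four region categories, a full one-hot encoding always sums to 1 in every row, which duplicates the intercept column of a linear model (often called the dummy variable trap). Dropping one category removes that redundancy without losing information. The region names and sample rows are assumptions made up for illustration.

```python
# Assumed category names and sample rows, for illustration only.
regions = ["northeast", "northwest", "southeast", "southwest"]
rows = ["southwest", "southeast", "northwest", "northeast", "southeast"]


def one_hot(value, categories):
    """Return a 0/1 indicator list with one slot per category."""
    return [1 if value == c else 0 for c in categories]


# Full encoding: 4 columns. Reduced encoding: drop the first category as the baseline.
full = [one_hot(r, regions) for r in rows]
reduced = [one_hot(r, regions[1:]) for r in rows]

# Every full row sums to 1, so the 4 columns together equal the intercept column.
row_sums = [sum(r) for r in full]

# The dropped column carries no extra information: it is 1 minus the other three.
recovered = [1 - sum(r) for r in reduced]

print(row_sums)
print(recovered == [r[0] for r in full])
```

    In the reduced encoding, a row of all zeros means the baseline region, and each remaining coefficient is read as a difference from that baseline.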

  • @lecturesfromleeds614
    @lecturesfromleeds614 13 days ago

    I live in Leeds 🇬🇧 I'm glad that this isn't a real-world problem for me

  • @us6ey
    @us6ey 1 month ago

    So in order to follow along, we have to purchase a subscription to the website, right? Because you chose a premium-user-only dataset?

    • @KeithGalli
      @KeithGalli 1 month ago +2

      Nope! I added the data to my GitHub and linked that in the description :)

    • @us6ey
      @us6ey 1 month ago

      @ oh my bad I didn’t see that, thank you!!

  • @odiseo-l1o
    @odiseo-l1o 2 months ago

    24:06

  • @ДмитрийКолышницын-с2л

    So cool!

  • @superfreiheit1
    @superfreiheit1 3 months ago

    Would be nice to see how to make it available on an HTML website.

  • @husan_ismoilov
    @husan_ismoilov 3 months ago +1

    Watched till the end.
    I am curious, why didn't you use ChatGPT? Also, do we have to create pipelines all the time?

  • @trippstreehouse
    @trippstreehouse 3 months ago +1

    n-dimensional hyperplane*

    • @KeithGalli
      @KeithGalli 3 months ago

      Thank you for the correction! You're right, 'n-dimensional hyperplane' is the proper terminology I should have used when describing fitting a linear regression model in a space with more than two dimensions 🙂
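      For reference, the terminology can be tied to the usual form of a linear regression model. The symbols below are conventional notation chosen for illustration, not taken from the video: with n features, the fitted surface is an n-dimensional hyperplane sitting in the (n+1)-dimensional space of features plus target. With n = 1 it is a line, and with n = 2 it is a plane.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
```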

  • @sebastianalvarez1537
    @sebastianalvarez1537 3 months ago

    love u

  • @Kurtosis3
    @Kurtosis3 3 months ago +4

    Why is it that every "data scientist" has only rudimentary statistics and econometrics knowledge? The model that you are building is highly biased. You're not even checking for heteroscedasticity?

    • @asfasdfsd8476
      @asfasdfsd8476 3 months ago +2

      Bro, this is clearly a beginner video made to get people started. Heteroscedasticity would require another 30 minutes of explaining.
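      For readers wondering what such a check might look like, here is a rough sketch. It is not a formal test and not something shown in the video: it fits a one-feature regression in closed form, then compares the residual variance in the lower and upper halves of the feature range, loosely in the spirit of the Goldfeld-Quandt test. The data is synthetic and built so that the noise grows with x.

```python
from statistics import mean, pvariance


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic data (made up for illustration): the noise term scales with x,
# alternating in sign, so the spread of y around the line widens as x grows.
xs = list(range(1, 21))
ys = [2 * x + 0.1 * x * (-1) ** x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# xs is already sorted, so the first half of the residuals belongs to small x.
half = len(xs) // 2
low_var = pvariance(residuals[:half])
high_var = pvariance(residuals[half:])
variance_ratio = high_var / low_var

# A ratio far above 1 suggests the residual spread is not constant.
print(round(variance_ratio, 2))
```

      In practice, a plot of residuals against fitted values, or a formal test from a statistics library, would be the usual next step.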

    • @Kurtosis3
      @Kurtosis3 3 months ago +3

      Then what is the intention of this video? Teaching people how to build biased models? There should be at least a mention of the possibility of bias.

    • @mohanma4270
      @mohanma4270 2 months ago +2

      @Kurtosis3 Can you please share your teaching videos?

    • @mansooralt
      @mansooralt 23 days ago

      @Kurtosis3 You are right, brother. What's the point of throwing biased data analysis at people?
      Isn't that CNN's job?