Credit Scoring Project using Machine Learning | Risk Modelling | Logistic Regression | ML Project#1

Поділитися
Вставка
  • Опубліковано 3 лип 2024
  • 🔥 In this video, we shall build a Credit Scoring Model for a Bank, enabling them to make data-driven lending decisions. We shall use Logistic Regression classifier to build our model and decile methodology to formulate a lending strategy.
    We are Skillcate!! And we are on a mission to bring you application based machine learning education. We launch new machine learning projects every month. So, make sure to subscribe to our channel to get access to all of our ML projects.
    At Skillcate, our endeavour is to take you through the end-to-end journey of solving real business problems, of which coding is just one part. So, even if you have little to no experience of coding, don’t worry. As part of this free course, we are giving you a free Do-It-Yourself toolkit, having ready-to-use python code for your hands-on learning and reuse.
    Enjoy data science!
    🔥 Course Sections
    00:00 Introduction
    03:19 Client's Business Case
    05:22 Solutioning Intuition
    06:11 ML Model Building
    13:34 Decile Methodology
    22:21 Solution Delivery to Client
    🔥 Resources
    Google Drive: drive.google.com/drive/folder...
    Data Normalization: developers.google.com/machine...
    Credit Scoring: www.investopedia.com/terms/c/...
    ROC Curve: towardsdatascience.com/unders...
    Do like, share & subscribe to our channel.
    🔥 Keep in touch
    Website: www.skillcate.com
    LinkedIn: / 67209084
    Email: skillcate@gmail.com

КОМЕНТАРІ • 46

  • @skillcate
    @skillcate  Рік тому +2

    Hey guys!! Glad to see such amazing feedback on this ML Project🤗 Need your support in reaching out to more learners by subscribing to my channel 🙂 Also, join me on my Skillcate Discord Server: discord.gg/GyMBfD4ER5 🙂 Let's talk Machine Learning ❤❤

  • @suattuncer8727
    @suattuncer8727 11 місяців тому +2

    Just for those who don't know about scaling: In the video, the standard scaling method is used, which scales values based on the data's variation itself. On the other hand, scaling between 0 and 1 is Min/Max scaling, where the maximum value is set to 1 and the minimum value to 0. It's important to note that these boundaries are determined by all your data, not on a column-by-column basis.

  • @MayankDayal1234
    @MayankDayal1234 Рік тому

    Good stuff.. thanks for sharing.

  • @luizfelipevercosa
    @luizfelipevercosa Рік тому

    Thanks a lot! I loved the threshold analysis!

    • @skillcate
      @skillcate  Рік тому +1

      Glad you liked it Luiz 😊

  • @xolanitarence8853
    @xolanitarence8853 Рік тому +1

    Wow it gives a good exposure

  • @user-qq8bc8km7e
    @user-qq8bc8km7e 11 місяців тому

    Thanks for the awsome explanation! I still couldn't understand, how you come up with the decision of taking 79.73% for keeping profitability & expansion in mind? why not 72.45% for exemple?

  • @shahwaheedullah4799
    @shahwaheedullah4799 2 роки тому +1

    Hi Sir, thanks for sharing this video, a-lots of knowledge and information.
    But Sir how we can use financial ratios dateset of industries in the logistic regression for predicting credit and investment risk for financial institutions??? Please share some videos/links to learn!
    Thanks & Regards

    • @skillcate
      @skillcate  2 роки тому +1

      Dear Waheed, thank you for your comment & apologies for the delay in response.
      It's really an excellent question. Financial ratios are a solid proxy for gauging credit worthiness of FIs. The key here would be to get/build a good labelled dataset on historical numbers for achieving this. From a ML standpoint, process essentially remains the same, with a bunch of independent variables (Financial Ratios in this case + other features) and the dependent variable Good / Bad.
      Hope this answers your query :-)

  • @NEWLIFESTYLE3
    @NEWLIFESTYLE3 8 місяців тому

    Hi, please, I have a small problem with my dataset. When I concatenate the 'prob_0' and 'prob_1' columns with the 'Actual_target' (which is my 'y_test' values), the 'Actual_target' column contains NaN values. Why is that?

    • @kevinmugo2662
      @kevinmugo2662 4 місяці тому +1

      When declaring x and y , you left (.values) at the end.

  • @rolfjohansen5376
    @rolfjohansen5376 Рік тому +1

    I sort was expected something more after the logistic regression , something like an advanced ML technique like gradient boosting, Neural nets, etc ...

    • @skillcate
      @skillcate  Рік тому +2

      Hey Rolf, thanks for the feedback. On Neural Network specifically, I have done couple of projects: one on Movie Review Sentiment Analysis using LSTM: ua-cam.com/video/oWo9SNcyxlI/v-deo.html, and second, Age-Gender-Emotion Detection using CNN: ua-cam.com/video/uovo1s1barU/v-deo.html
      If your request is specific to Credit Scoring, I am making a note to do an advance approach on this project in coming days. Happy learning to you :)

  • @rahulshukla7380
    @rahulshukla7380 Рік тому +1

    Could you please help me with how you created the pivot table and added all the column fields to it

    • @skillcate
      @skillcate  Рік тому +1

      Dear Rahul, hope you are well!
      Were you able to access our Skillcate Toolkit, link to which is shared in the Video Description? Here we have kept this Datasheet for your easy reference on the Excel Formulas used.
      Anyways, you may drop me an email on skillcate@gmail.com and I may take a quick 1:1 session with you (no charges, of course), where we may prepare this sheet together :)
      Toolkit link: Link: drive.google.com/drive/folders/1Rz7w2FuC9CegZ-o6SeJoBivj08iaLGer?usp=sharing

  • @igorgomez1002
    @igorgomez1002 Рік тому

    Hi, did you change the Dataset? Because de IDs you are using in the example don't appear in the Drive's dataset (for example, the ID 66, first row). And the results I get are totally different

    • @skillcate
      @skillcate  Рік тому

      Dear Igor, just checked the code myself. The dataset is unchanged. For Train-Test-Split function, I have added a new parameters "stratify=y" to balance out good-bad loans across train and test set. With this, I'm getting even better accuracy of 83.3%.
      Do try it out once yourself. I'm sure your results will improve.

  • @mathslearningmadefun7226
    @mathslearningmadefun7226 2 роки тому

    It was again a great video especially the last part i.e., Decile concept.
    I have a doubt regarding how to find the 'Target (predicted)' value for a new observation, other than what is already present in train and test data. Could you please help me with this?

    • @skillcate
      @skillcate  2 роки тому

      Thanks for your feedback! We have added Prediction code and supporting files in the Credit Scoring Toolkit folder: drive.google.com/drive/folders/1Rz7w2FuC9CegZ-o6SeJoBivj08iaLGer.
      Here's a crisp Read Me document on using the new files: docs.google.com/document/d/14FJ137z-zdFSmatOZa1etE34_n0-i1oqOmMXGTqdJws/edit?usp=sharing.
      With this new prediction file, you may learn how to predict Probability_of_Good for new loan applications. And once you have the probability values, you may check for decile specific Cut-off Probabilities to figure out whether to Approve or Reject a loan.
      Happy Learning! :)

    • @mathslearningmadefun7226
      @mathslearningmadefun7226 2 роки тому

      @@skillcate Thank You for making the extra effort in uploading new files to drive. It helped me a lot.

    • @skillcate
      @skillcate  2 роки тому

      @@mathslearningmadefun7226 Happy to help :) Do like, share and subscribe if you like our content. This helps us reach to more folks like yourself! Keep learning!!

  • @vishwav5753
    @vishwav5753 2 роки тому +1

    can u please provide feature selection algorithm for this credit scoring project please bro..

    • @skillcate
      @skillcate  2 роки тому +1

      Dear Vishwa, that's a great question.
      In logistic regression, you may check for multi-collinearity among input features. VIF or Variance Inflation Factor, is a proxy to gauge multi-collinearity. In simple words, Multicollinearity describes the state where the independent variables used in a study exhibit a strong relationship with each other.
      Sharing this comprehensive article that has the details and code snippets for you to implement this functionality into your Credit Scoring project: towardsdatascience.com/targeting-multicollinearity-with-python-3bd3b4088d0b
      Keep learning!! :)

  • @karthikharisamy5367
    @karthikharisamy5367 2 роки тому +1

    Can you share how to build a credit risk model end to end under IFRS 9 Regulations. It will be of great help if you can share any links as well. Thanks

    • @skillcate
      @skillcate  2 роки тому

      Hi Karthik! Really appreciate your feedback. Please reach out to us at skillcate@gmail.com with details.
      Thank you!!

  • @saurabhmeshram7315
    @saurabhmeshram7315 Рік тому +1

    how you calculated profit to business values?

    • @skillcate
      @skillcate  Рік тому

      Hey Saurabh, we used this formula for each decile: (Cumm Good * Profit from a good loan) - (Cumm Bad * Loss from a bad loan)

  • @JoanaOdtojan
    @JoanaOdtojan Рік тому

    Hi, wondering how do I apply this for financial crime scoring?

    • @skillcate
      @skillcate  Рік тому

      Dear Joana, I'm actually planning to do a project on Credit Card Fraud Detection by next week. Stay tuned for that :)

  • @mansimishra7089
    @mansimishra7089 Рік тому +1

    Where are you collecting this dataset please let me know? Thankyou!!

    • @skillcate
      @skillcate  Рік тому

      Hi Mansi. As an example this specific dataset we have got from UCI repository: archive.ics.uci.edu/ml/index.php. However, there are plenty other credit scoring dataset available where this approach is applicable as well: github.com/JLZml/Credit-Scoring-Data-Sets.

  • @omdivyatej3818
    @omdivyatej3818 Рік тому +1

    What is "No.of trade lines Unes 60 days worse or ever" in the feature part? What is Unes here?

    • @omdivyatej3818
      @omdivyatej3818 Рік тому

      And what is pct

    • @skillcate
      @skillcate  Рік тому

      Dear Om, hope you are all well. Sharing clarifications here:
      1. Trade lines are essentially the credit lines you have availed, ex: you may have a credit card + an education loan, then you have 2 trade lines open
      2. For Credit Scoring problem, timeline of Trade Lines is important as well, ex: you have recently been opening a lot many trade lines all of a sudden is a Red-flag
      3. Read 'Unes' as Lines. There's a typo there :-)
      4. 'Pct' is percentage. No. of Trade Lines 50pct utilised means, means the count of Credit Lines you have consumed over 50%
      5. 'No. of trade lines 60 days or worse ever' is the count of all active/inactive credit lines in last 60 days + the credit lines that were active prior to 60 days where you defaulted

  • @retiredman5722
    @retiredman5722 2 роки тому +1

    You need to show that 20% validation sample result based on your 80% sample scorecard. right?

    • @skillcate
      @skillcate  2 роки тому

      Hey buddy!! Thank you for your comment & apologies for the delay in response.
      That's correct!! 20% is basically unseen data, that was kept separate from the remaining 80% training data used for Model Building. So, this 20% is the one we use for generating labels from our model + formulating lending strategy..

  • @kevinmugo2662
    @kevinmugo2662 4 місяці тому

    Kindly , someome help how the decile formula was applied for all cells

    • @nelya.kulch11
      @nelya.kulch11 2 місяці тому

      it's shown in the video: in cell 61 we input formula: first row cell +1. And the stretch this formula down/ or copy this formula to all cells below / or double click the cell

  • @user-mn8qu8lv6v
    @user-mn8qu8lv6v 11 місяців тому

    could not convert string to float: '$2,327' Getting this Error Please Help Out

    • @nelya.kulch11
      @nelya.kulch11 2 місяці тому

      Maybe because if $ sign, try to delete it before

  • @vinayaktalukder2381
    @vinayaktalukder2381 Рік тому

    Hello Sir, I am getting this message " FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/1_LiveProjects/Project1_Credit_Scoring/a_Dataset_CreditScoring.xlsx' " Could you please help me out

    • @skillcate
      @skillcate  Рік тому

      Hi Vinayak, Can you check if
      1. if your google drive is mounted (check Cell 2 in the notebook)?
      2. If yes to above, check if "a_Dataset_CreditScoring.xlsx" file is present at by browsing to the location /content/drive/My Drive/1_LiveProjects/Project1_Credit_Scoring/? If not make sure to copy file at the location.

  • @pratipadakhatode1146
    @pratipadakhatode1146 Рік тому +1

    hello sir I have query with source code that you provided how can I contact you

    • @skillcate
      @skillcate  Рік тому

      Hi Pratipada, Sure. Reach out to us at skillcate[AT]gmail.com for any query.