Loan Prediction Analysis (Classification) | Machine Learning | Python

  • Published 14 Oct 2024

COMMENTS • 176

  • @HackersRealm
    @HackersRealm  3 years ago +6

    Small correction, viewers: I described the left-skewed and right-skewed distribution graphs the wrong way round. To avoid errors when converting to log values, add +1 to the column. I have updated the notebook on GitHub. Enjoy the rest of the video!!!

  • @vamsichittoor1974
    @vamsichittoor1974 3 years ago +10

    I have just started learning Machine Learning and I understood every bit you explained, and I did one project on my own similar to this. Really great explanation. I would like to know how to master Machine Learning. I am not a CSE student; I am learning this out of my own interest.

    • @HackersRealm
      @HackersRealm  3 years ago +1

      Glad it was helpful!!! Kudos to you for learning out of your own interest. Try to pick a mini project in some domain and solve it. That's a quick way to understand and learn...

  • @Umeshwarvaithy
    @Umeshwarvaithy 4 years ago +2

    Super bro while started learning ML I found your channel and started my learning and progress doing the project thanks for your interest and effort

    • @HackersRealm
      @HackersRealm  4 years ago

      Hope the videos are useful to you!!! Thanks for watching and please share it for better reach. Thank you!!!

  • @Kalyan1143
    @Kalyan1143 8 months ago +1

    Finally, what is the final output?
    I mean, loan eligible yes or no?

    • @HackersRealm
      @HackersRealm  8 months ago

      For the test data, we predict with the model and calculate a score for how well it predicts.

  • @chiragparmar3678
    @chiragparmar3678 3 years ago +1

    Bro u explained much much better than edureka I swear bro thanks!

  • @anuragupadhayay8405
    @anuragupadhayay8405 1 year ago +2

    ValueError: could not convert string to float: 'Male'. When I am using the countplot it keeps showing this.

    • @nitisht4040
      @nitisht4040 10 months ago +1

      bro did you get solution, if yes please help me out

  • @rakeshnargund570
    @rakeshnargund570 2 years ago

    Hi.. well explained. i have one question ...... why you did not drop "ApplicantIncome" even though you combined with "CoapplicantIncome" and created "Totalincome"...??

  • @gouthamkarakavalasa4267
    @gouthamkarakavalasa4267 3 years ago +4

    Bro, it looks like at 17:08 you applied the log to coapplicant income, but you visualized the graph for applicant income... For the coapplicant income, the log function throws an error as it contains zeroes. Please advise on this issue.

    • @HackersRealm
      @HackersRealm  3 years ago

      you can add +1 to the data column, it will resolve the issue

    • @mitali3j
      @mitali3j 3 years ago

      At which level does 1 need to be added?

    • @HackersRealm
      @HackersRealm  3 years ago

      @@mitali3j you can add 1 when you see some 0 values, or you can use it generally, there won't be much change in log values
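
A minimal sketch of the +1 fix discussed in this thread, assuming a pandas DataFrame `df` with a `CoapplicantIncome` column. The toy values and the output column name are made up for illustration. `np.log1p(x)` computes log(1 + x), so it is one way to write the same idea:

```python
import numpy as np
import pandas as pd

# Toy data (made up); the zeros are what break a plain np.log
df = pd.DataFrame({"CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0]})

# np.log(0) is -inf; the warning is silenced only to show the problem
with np.errstate(divide="ignore"):
    plain_log = np.log(df["CoapplicantIncome"])

# np.log1p(x) computes log(1 + x), so zeros map to 0.0 instead of -inf
df["CoapplicantIncomeLog"] = np.log1p(df["CoapplicantIncome"])
```

For large incomes the difference between log(x) and log(1 + x) is tiny, which matches the remark that adding 1 everywhere changes little.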

  • @MrJeffoneal
    @MrJeffoneal 1 year ago +1

    Thank you! Very insightful and thorough explanations.

  • @mokshsharma6943
    @mokshsharma6943 2 years ago +1

    In the exploratory data analysis section of the video, how do I use a for loop for sns.countplot()?

    • @HackersRealm
      @HackersRealm  2 years ago

      You can store it in a variable and use subplots to show multiple plots
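
One way to read the reply above, as a rough sketch rather than the code from the video: keep the column names in a list and draw one countplot per subplot axis. The toy DataFrame and the choice of three columns are assumptions. Passing `x=` and `data=` as keywords is the form suggested elsewhere in these comments for newer seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data (made up) standing in for the loan DataFrame
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Married": ["Yes", "No", "Yes", "Yes"],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
})

cat_cols = ["Gender", "Married", "Education"]
fig, axes = plt.subplots(1, len(cat_cols), figsize=(12, 3))

# One countplot per column, each drawn on its own subplot axis
for ax, col in zip(axes, cat_cols):
    sns.countplot(x=col, data=df, ax=ax)
    ax.set_title(col)

fig.tight_layout()
```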

  • @dr.mahaboobbasha1074
    @dr.mahaboobbasha1074 1 year ago

    Sir, we normalised the income data of applicants and coapplicants; where does it impact the analysis?

    • @HackersRealm
      @HackersRealm  1 year ago

      It will have an impact on model training and testing... but that comparison is not covered in the video

  • @muhammedfaizals4427
    @muhammedfaizals4427 1 year ago +1

    for i in ['LoanAmount','Loan_Amount_Term','Credit_History']:
        tr_data[i] = tr_data[i].fillna(tr_data[i].mean())
    we can use this instead of filling each column separately

  • @abhiavasthi624
    @abhiavasthi624 3 years ago +3

    i have seen that you respond to comments so i would just like to ask you,
    what changes do i have to make if my training and testing dataset are in different files already?
    for example in a kaggle project where the training and testing data are in different files, what changes in the code will i have to make?

    • @HackersRealm
      @HackersRealm  3 years ago

      For training, don't split the data; train on the whole data. After that, preprocess the test data the same way as the train data and predict on it. You can also see the video on how to predict test data in the playlist

    • @abhiavasthi624
      @abhiavasthi624 3 years ago +1

      @@HackersRealm thanks so much man, respect your timely response.
      what i did is i skipped the split part and simply preprocessed the test data as well and then used
      y_test = model.predict(x_test)
      for the prediction
      but for this case we can't check the score and all right?
      since i didn't see the loan_status column in the test data.

    • @HackersRealm
      @HackersRealm  3 years ago

      @@abhiavasthi624 yes, that's right, you only get the output results
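
The workflow described in this thread could look roughly like the following. This is a sketch under stated assumptions: the two toy DataFrames stand in for the separate train and test files, only two numeric features are used, and `Loan_Status_pred` is a made-up column name.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy frames (made up) standing in for the separate train and test files
train = pd.DataFrame({
    "LoanAmount": [120.0, None, 66.0, 141.0, 95.0, 200.0],
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y"],
})
test = pd.DataFrame({
    "LoanAmount": [110.0, None],
    "Credit_History": [1.0, 0.0],
})

features = ["LoanAmount", "Credit_History"]

# Learn fill values from the training data only, then reuse them on the test data
fill_values = train[features].mean()
X_train = train[features].fillna(fill_values)
X_test = test[features].fillna(fill_values)
y_train = train["Loan_Status"]

# No train_test_split here: fit on all labelled rows
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The test file has no Loan_Status column, so only predictions are available
test["Loan_Status_pred"] = model.predict(X_test)
```

Because the test file carries no labels, no accuracy score can be computed on it; cross-validation on the training data is the usual substitute.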

  • @sidharth_mohanty
    @sidharth_mohanty 3 years ago +2

    I'm unable to apply the correlation matrix on categorical data before label encoding.
    How did you do that ?

    • @HackersRealm
      @HackersRealm  3 years ago +1

      correlation matrix can be calculated with numbers only, not with strings.
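
A small sketch of that point, assuming a DataFrame that mixes string and numeric columns (the toy values are made up): select the numeric columns first, then compute the correlation matrix on those.

```python
import pandas as pd

# Toy data (made up): one string column and two numeric columns
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [128.0, 128.0, 66.0, 120.0],
})

# Keep only numeric columns; string columns such as Gender are left out
numeric_df = df.select_dtypes(include="number")
corr = numeric_df.corr()
```

Categorical columns only show up in the matrix after they have been encoded as numbers.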

  • @pkmisra769
    @pkmisra769 2 years ago +1

    Very nice video. Best thing is your response to people's queries (unlike others). Great Job. I have 1 suggestion. If you could also cover how to deploy this model somewhere (with fresh data coming in and how model throws output). That would be amazing. Thanks.

    • @HackersRealm
      @HackersRealm  2 years ago +1

      Thank you very much. In this video, I have explained the process for deployment ua-cam.com/video/2LqrfEzuIMk/v-deo.html

  • @funnybunnies3985
    @funnybunnies3985 3 years ago +1

    why are you using log transformation? you can normalise the data?

    • @HackersRealm
      @HackersRealm  3 years ago

      you can use any preprocessing approach. It's no issue, try to test & see how it works

  • @snehacookie4138
    @snehacookie4138 2 years ago +1

    Is this project can be done for final year project is this good topic to do

    • @HackersRealm
      @HackersRealm  2 years ago

      yeah many people have done this as final year project

    • @snehacookie4138
      @snehacookie4138 2 years ago

      @@HackersRealm tq u
      Like this itself we can present ryt

    • @HackersRealm
      @HackersRealm  2 years ago +1

      @@snehacookie4138 yes

    • @snehacookie4138
      @snehacookie4138 2 years ago

      @@HackersRealm bro is this project good for jobs when u put in resume is this good for getting selected in a company pls say bro

    • @HackersRealm
      @HackersRealm  2 years ago

      @@snehacookie4138 Well that completely depends on the recruiter, but students said they used for resume

  • @SanyAnnieJohn
    @SanyAnnieJohn 3 years ago +1

    Hi Sir, Logistic regression gave the best score, then why chose Random forest for hypertuning?

  • @akshaykrishnan7985
    @akshaykrishnan7985 3 years ago +1

    Hi Ashwin. Could you please upload videos on model deployment with flask using heroku?

    • @HackersRealm
      @HackersRealm  3 years ago +1

      Hello, deployment of models, I will cover in later videos for sure, now just covering the basic concepts for better understanding!!!

    • @akshaykrishnan7985
      @akshaykrishnan7985 3 years ago +1

      Thanks a lot 😊

  • @snehamagadum1342
    @snehamagadum1342 2 years ago

    Sir, I did not get the conclusion of this project. After the heat map, how can we tell whether the loan is approved or not?

    • @HackersRealm
      @HackersRealm  2 years ago

      Are you asking about the model training and results section?

  • @michaelk765
    @michaelk765 3 years ago +1

    Great explanation of your model building. Thank you!

  • @SovannLy-h3s
    @SovannLy-h3s 9 months ago

    Hello Sir
    I followed your code up to the 'Exploratory Data' section. I replaced the missing values with df['Gender']=df['Gender'].fillna(df['Gender'].mode()[0])
    The next line of code is
    sns.countplot(df['Gender'])
    and the result is
    ValueError: could not convert string to float: 'Male'
    Could you please advise me how to correct the code?
    Thank you

    • @HackersRealm
      @HackersRealm  9 months ago

      Try this: sns.countplot(x='Gender', data=df)... It's due to an update in the seaborn package.

  • @rodsdesignestudio
    @rodsdesignestudio 1 year ago

    hi, thanks for the vids but i want ask: why u did use LabelEncoder to the input values (['Gender',"Married","Education",'Self_Employed',"Property_Area","Loan_Status","Dependents"])? thx

    • @HackersRealm
      @HackersRealm  1 year ago

      We have to convert strings to numeric values so the model can accept the input. Label encoding is one such technique.

    • @afserali450
      @afserali450 1 year ago

      @@HackersRealm how to convert male in gender column to float

    • @HackersRealm
      @HackersRealm  1 year ago

      @@afserali450 In the video, I used label encoder or one hot encoder to do that. You can use whichever method is feasible
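
A short sketch of the two encodings named in this thread, using scikit-learn's `LabelEncoder` and pandas' `get_dummies`. The toy DataFrame and the new column names are assumptions, and using `get_dummies` for the one-hot step is a choice made for this sketch, not necessarily what the video does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data (made up) with two string columns
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Label encoding: each category becomes an integer (classes are sorted alphabetically)
le = LabelEncoder()
df["Gender_encoded"] = le.fit_transform(df["Gender"])

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["Property_Area"], prefix="Property_Area")
```

Label encoding implies an order between categories, which is harmless for a two-value column like Gender; one-hot encoding avoids that at the cost of extra columns.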

  • @anilsailakhinana94
    @anilsailakhinana94 3 years ago +1

    I'm subscribed ur channel for this clear explanation 👍 it was so helpful

  • @rameshkannan1075
    @rameshkannan1075 3 years ago

    Can u explain the credit history in data mentioned 0 and 1. Can u post video or tutorial link how cibil data are analysed to get credit history values

    • @HackersRealm
      @HackersRealm  3 years ago

      If the person has credit history, it's 1 or else its 0. I will try analysing cibil data if possible

    • @rameshkannan1075
      @rameshkannan1075 3 years ago

      @@HackersRealm I need to know there will be n no of customers. These customers cibil how to extract to single excel file. Then based on past repayment we can decide the probability of default.

  • @shellm1447
    @shellm1447 3 years ago

    Have you also covered hmeq dataset for loan default prediction

  • @SanyAnnieJohn
    @SanyAnnieJohn 3 years ago +1

    Hi Sir, When I am plotting for Gender, why my x axis not giving the labels, as Male and Female. Instead it is displaying 0 and 1

    • @HackersRealm
      @HackersRealm  3 years ago

      If you have done some transformation on that column, it will show like that

    • @SanyAnnieJohn
      @SanyAnnieJohn 3 years ago +1

      @@HackersRealm Thanku, got it....

  • @oushnik
    @oushnik 3 years ago

    Can I segregate and train the model instead of using log function? Or else It's necessary to use Log function in this whole project. And 1 more confusion as I'm new so what is the agenda of this whole project? I know it sounds like silly but please explain me.

    • @HackersRealm
      @HackersRealm  3 years ago

      We are trying to predict whether a person can get loan or not from the bank. And log transformation is not compulsory, you can use other methods

    • @oushnik
      @oushnik 3 years ago

      @@HackersRealm hmm so I used the same as previous then it's ok...another thing why feature scaling is not working here???
      I'm getting error like this
      "TypeError: float() argument must be a string or a number, not 'StandardScaler'"

  • @dr.mahaboobbasha1074
    @dr.mahaboobbasha1074 1 year ago

    Sir..will it possible to get the python code..of this and other videos

    • @HackersRealm
      @HackersRealm  1 year ago

      It's available in the github repo, link in the description

  • @afreen2806
    @afreen2806 2 years ago

    except for logistic regression, all other models accuracy and cross-validation is changing if I run it more than once. Can u explain y?

    • @HackersRealm
      @HackersRealm  2 years ago

      You can set random_state in order to get the same results on rerunning
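
A minimal sketch of that advice, assuming a random forest and a train/test split. The synthetic data from `make_classification` stands in for the loan dataset, and `run_once` is a hypothetical helper written for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the loan dataset
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

def run_once(seed):
    # Fix the seed for both the split and the model so reruns match
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

first = run_once(42)
second = run_once(42)  # same seed, so the same score as `first`
```

Tree-based models draw random samples and features during training, which is why their scores move between runs when no seed is set.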

  • @diff008
    @diff008 1 year ago

    While plotting the countplot I keep getting "ValueError: could not convert string to float". Any idea why? The dataset was downloaded from your Kaggle link. No changes (although it looks like the file names have now changed).

    • @HackersRealm
      @HackersRealm  1 year ago

      try to check the values you're plotting, that may be the issue.

  • @sakshituteja3841
    @sakshituteja3841 4 years ago

    This video is a great explanation of this project. I have just one doubt. From where I took the data set, Test data has a separate file of around 350 observations. How do I make use of that ?

    • @HackersRealm
      @HackersRealm  4 years ago +1

      Glad you liked this video!!! You can use the test data to predict the output and submit it, if there is a competition. For practice, there won't be much use to it.

    • @PravinKumar-zc2eq
      @PravinKumar-zc2eq 4 years ago

      @@HackersRealm how to do it??

  • @MEGAMINDLIVE
    @MEGAMINDLIVE 3 years ago +1

    14:38 you are saying distribution is left skewed but its right skewed.

    • @HackersRealm
      @HackersRealm  3 years ago

      Sorry, I mispronounced the skewed data

  • @nathanthadmalla9268
    @nathanthadmalla9268 3 years ago

    Where can we get the main dataset? The link is leading to only the train and test datasets. Where can we get the first dataset that you used in your video?

    • @HackersRealm
      @HackersRealm  3 years ago

      that is the train data. you can use that

  • @DhirajKrGupta-ke7xn
    @DhirajKrGupta-ke7xn 2 years ago

    What tech skills you learnt from the project
    • Why did you pick that domain?
    • Where can we use your tech skills / software’s learnt during project
    • Reason for working on that project
    Sir Please Help me for Interview preparation

  • @kartiksolanki9390
    @kartiksolanki9390 1 year ago +2

    Very helpful

  • @shellm1447
    @shellm1447 3 years ago +1

    Amazing explanation

  • @ranjangowda9878
    @ranjangowda9878 3 years ago

    Hello,
    Can use this project as my mini project.??

  • @sodiqrafiu9072
    @sodiqrafiu9072 4 years ago +2

    Please, come up with more projects

  • @vedgadge8659
    @vedgadge8659 2 years ago

    At 40:36 dependents is already in numeric form why does it require label encoding?

    • @HackersRealm
      @HackersRealm  2 years ago +1

      yes, we don't need to include that

    • @vedgadge8659
      @vedgadge8659 2 years ago

      @@HackersRealm hey man I tried that but if we don't include dependents it gives an error while classifying. It is the same error as in the video
      ValueError: could not convert string to float: '3+'. I'm not understanding this

    • @HackersRealm
      @HackersRealm  2 years ago

      @@vedgadge8659 Oh yeah, I forgot that; it is represented as a string, that's why I used the label encoder. But you can remove that + and convert the string to an integer
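
The alternative mentioned in the reply above (strip the "+" and cast to integer) could be sketched like this. The toy column is made up, and filling missing values with the mode before casting is an assumption made so that the cast does not fail.

```python
import pandas as pd

# Toy column (made up): values are stored as strings, including "3+"
df = pd.DataFrame({"Dependents": ["0", "1", "2", "3+", None]})

# Fill missing values first (mode is an assumption), then drop the "+" and cast
df["Dependents"] = df["Dependents"].fillna(df["Dependents"].mode()[0])
df["Dependents"] = df["Dependents"].str.replace("+", "", regex=False).astype(int)
```

Note that "3+" becomes plain 3 here, so the "or more" meaning is lost.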

    • @vedgadge8659
      @vedgadge8659 2 years ago +1

      @@HackersRealm okay sure I'll try thanks man

  • @LoneWolfff07
    @LoneWolfff07 4 years ago +1

    bro how can i get accuracy more than 80.42
    which algorithm should i use

    • @HackersRealm
      @HackersRealm  4 years ago +1

      It depends on every factor, not only algorithm, Check out other projects in the tutorial series, so you can get additional insights on increasing accuracy.

  • @oushnik
    @oushnik 3 years ago

    Another question...why feature scaling is not working here?

    • @HackersRealm
      @HackersRealm  3 years ago

      we can use feature scaling too. There are various preprocessing methods to use and get insights.
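
A minimal feature-scaling sketch with scikit-learn's `StandardScaler`, using made-up numbers. The failing code is not shown in the comments, so this is only a guess at the cause: an error like "float() argument must be a string or a number, not 'StandardScaler'" can appear when the scaler object itself is passed where an array is expected, instead of the array returned by `fit_transform`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (made up): two numeric columns
X = np.array([[5849.0, 128.0], [4583.0, 128.0], [3000.0, 66.0], [2583.0, 120.0]])

# Create a scaler instance, then call fit_transform on the data;
# the scaler object itself is not the scaled data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

When there is a train/test split, the usual pattern is `fit_transform` on the training features and `transform` on the test features.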

  • @brit_indi1930
    @brit_indi1930 2 years ago +1

    U JUST EARNED THE SUB

  • @avishkaravishkar1451
    @avishkaravishkar1451 3 years ago +1

    Excellent video, found it very helpful!

  • @nandinijain4461
    @nandinijain4461 3 years ago

    From where we can download the dataset can you provide link or dataset in zip format

  • @saisudhir5005
    @saisudhir5005 3 years ago +2

    How to increase accuracy?

    • @HackersRealm
      @HackersRealm  3 years ago

      using different models, hyperparameter tuning, etc., watch other projects of mine to learn more techniques

  • @iamrahul2944
    @iamrahul2944 3 years ago

    sir, i am not able to add new column getting error as
    my code: data['total_income']=data['ApplicantIncome']+['CoapplicantIncome']

    • @HackersRealm
      @HackersRealm  3 years ago

      The second term should be data['CoapplicantIncome'], please check the syntax

  • @snrmedia8965
    @snrmedia8965 3 years ago

    How you directly fill with mean in loan amount why not check outlier

    • @HackersRealm
      @HackersRealm  3 years ago

      To handle outliers, I used log transformation

  • @kumarsanjibray9415
    @kumarsanjibray9415 3 years ago

    sns.distplot is working but not showing the graph properly ..could u tell me what to do??

    • @HackersRealm
      @HackersRealm  3 years ago

      try specifying the x, y values properly

    • @kumarsanjibray9415
      @kumarsanjibray9415 3 years ago

      @@HackersRealm How to specify them ??...Tell me If u can

    • @HackersRealm
      @HackersRealm  3 years ago

      @@kumarsanjibray9415 seaborn.pydata.org/generated/seaborn.distplot.html try this documentation

  • @lalithkishorep2618
    @lalithkishorep2618 6 months ago

    How u say that imputed with mean??

    • @HackersRealm
      @HackersRealm  6 months ago

      which part you're referring?

  • @rahulgaddam7110
    @rahulgaddam7110 4 years ago +1

    How do I remove -inf in total income / coapplicant income? I tried but couldn't resolve it. Please help

    • @HackersRealm
      @HackersRealm  4 years ago

      If you are using log transformation, try like this - np.log(1+df['name']), it will solve the problem

    • @akhilkrishna8521
      @akhilkrishna8521 4 years ago

      np.seterr(divide = 'ignore')
      train['CoapplicantIncomeLog'] = np.where(train['CoapplicantIncome']>0, np.log(train['CoapplicantIncome']), 0)
      this will solve your problem

    • @mitali3j
      @mitali3j 3 years ago

      But after adding 1 then in the graph generated, I can see 2 bell curves....
      What does that mean?

  • @mohmmedshahrukh8450
    @mohmmedshahrukh8450 3 years ago +1

    but in your result doesnt shown any where who are eligble or not

  • @zainabkhalil268
    @zainabkhalil268 2 years ago

    is there any way of connecting with you via email etc?

    • @HackersRealm
      @HackersRealm  2 years ago

      you can reach me via linkedin or instagram, links are in the description

  • @quincykao749
    @quincykao749 2 years ago

    Is it possible if you can add subtitles

    • @HackersRealm
      @HackersRealm  2 years ago

      It may be automatically generated by YouTube

    • @quincykao749
      @quincykao749 2 years ago

      @@HackersRealm it is not available for some reason

  • @PravinKumar-zc2eq
    @PravinKumar-zc2eq 4 years ago

    Can u tell how to train LogisticRegression model??🙏

    • @HackersRealm
      @HackersRealm  4 years ago

      i think i have explained how to train logistic regression also, could you please check the video again.

    • @PravinKumar-zc2eq
      @PravinKumar-zc2eq 4 years ago

      @@HackersRealm sorry I mean to say that how to tune the LogisticRegression model

    • @HackersRealm
      @HackersRealm  4 years ago

      ok, i didn't cover hyperparameter tuning, it will take a complete video for that. I will try to post the videos for that in future
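
Since tuning is not covered in the video, the following is only a generic sketch of one common approach, a grid search over the regularisation strength `C` with scikit-learn's `GridSearchCV`. The synthetic data and the candidate values are assumptions made for illustration, not settings from the video.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for the preprocessed loan features
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Candidate values for the inverse regularisation strength C (illustrative only)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated search, scored by accuracy
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["C"]
best_score = search.best_score_
```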

  • @niklausmikealson3115
    @niklausmikealson3115 3 years ago +1

    I didn't understand where it's shown how many people are approved for loan and already

    • @HackersRealm
      @HackersRealm  3 years ago

      In the dataset itself, it is clearly mentioned, please use head function to see the labels

    • @niklausmikealson3115
      @niklausmikealson3115 3 years ago

      What is the goal pls tell

    • @HackersRealm
      @HackersRealm  3 years ago

      @@niklausmikealson3115 based on the attributes of the person, we need to find whether they are eligible for loan

  • @HossainRabin
    @HossainRabin 4 years ago +1

    Excellent tutorial but you mispronounced left-skewed and right-skewed data. Appreciate your effort.

    • @HackersRealm
      @HackersRealm  4 years ago

      Yes, you are right. I will correct it next time. Thanks for watching the video

  • @siddharthlasiyal4037
    @siddharthlasiyal4037 3 years ago +3

    Thank uuuuu boss

  • @VickyKumar-sg3jc
    @VickyKumar-sg3jc 3 years ago +1

    so helpful

    • @HackersRealm
      @HackersRealm  3 years ago +1

      Glad you liked it!!!

    • @VickyKumar-sg3jc
      @VickyKumar-sg3jc 3 years ago

      @@HackersRealm thankyou sir for responding
      I am getting error on preprocessing labelencoder
      Typeerror:not supported between instances of str and float

    • @HackersRealm
      @HackersRealm  3 years ago

      @@VickyKumar-sg3jc I think in one column you have float and string values, Please check the type of data

  • @naveengodara6777
    @naveengodara6777 3 years ago

    hi...needed some help for loan prediction workshop...could you please help

    • @HackersRealm
      @HackersRealm  3 years ago

      please reach me via insta or linkedin

    • @naveengodara6777
      @naveengodara6777 3 years ago

      @@HackersRealm texted on instagram...please have a look

  • @varunrokade1617
    @varunrokade1617 4 years ago

    can some one tell me what is the currency of applicant income and the other amount (currency) in this data set

    • @PravinKumar-zc2eq
      @PravinKumar-zc2eq 4 years ago +1

      It's in dollars

    • @varunrokade1617
      @varunrokade1617 4 years ago

      @@PravinKumar-zc2eq thank you !!

    • @be_it_b_76_saurabhyadav36
      @be_it_b_76_saurabhyadav36 3 years ago

      @@PravinKumar-zc2eq Is it in dollars after log transformation? because before log transformation for example in 1st row applicant income was 5489 then it became 8.67. What if i want income like it was in original dataset? im guessing it was in rupees before log. kindly help if u know praveen.

    • @HackersRealm
      @HackersRealm  3 years ago

      No, it's in dollars all the time. I applied a log transformation during preprocessing; that's why the values are small afterwards. That helps in getting good results
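
On getting the original income back: a log-transformed value is on a different scale, not in a different currency, and the transform can be inverted. A minimal sketch, assuming the natural log was used (the income figure is made up):

```python
import numpy as np

income = 5000.0  # example value, made up for illustration

# Natural log and its inverse
log_income = np.log(income)
recovered = np.exp(log_income)

# If log(1 + x) was used instead, np.expm1 is the matching inverse
recovered_1p = np.expm1(np.log1p(income))
```

Keeping the original column next to the log column in the DataFrame avoids having to invert at all.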

  • @akhilkrishna8521
    @akhilkrishna8521 4 years ago

    at line number 23 u havent done sns.distplot for coaplicant so u have done wrong ??

    • @HackersRealm
      @HackersRealm  4 years ago

      I have done for coapplicant income, check 16th minute of video. But mistakenly plotted applicant income, sry for that.

    • @binduskumar3201
      @binduskumar3201 4 years ago

      Hi,
      You are doing a good job....thanks for the video....
      there is a mistake while plotting the distplot of 'CoapplicantIncome'
      Instead of 'CoapplicantIncome' you have chosen 'ApplicantIncome'....

    • @binduskumar3201
      @binduskumar3201 4 years ago +1

      And one more thing, we cannot apply log function to 'CoapplicantIncome' since it contains zero value....

    • @HackersRealm
      @HackersRealm  4 years ago +2

      If you are using log transformation, try like this - np.log(1+df['name']), it will solve the problem

    • @HackersRealm
      @HackersRealm  4 years ago

      Yes, my mistake. Sorry for the error

  • @mdtahsinkhan242
    @mdtahsinkhan242 3 years ago

    Sir Plzz provide the data set

  • @lumdevsawarkar4497
    @lumdevsawarkar4497 1 year ago

    outliers detection

    • @HackersRealm
      @HackersRealm  1 year ago

      There is a separate video in ml concepts playlist, You can check that out!!!

  • @gautamranafounderofbexpert6539
    @gautamranafounderofbexpert6539 4 years ago +1

    Thank you , I found helpful same

  • @Mandarpatil091
    @Mandarpatil091 1 year ago

    check cell no. 23