Tutorial 28- Ridge and Lasso Regression using Python and Sklearn

  • Published 29 Jan 2025

COMMENTS • 113

  • @abhishekchatterjee9503
    @abhishekchatterjee9503 4 years ago +4

    Watched the 2nd part just now... You're like a savior to me, as I have some deadlines due tomorrow and this helped me a lot, sir. Thank you very much. 💯💯

  • @anishyekhande6520
    @anishyekhande6520 3 years ago

    Thanks!

  • @shindepratibha31
    @shindepratibha31 4 years ago +7

    I feel the steps for the regression process should be like this:
    1) Split the data into train and test sets.
    2) Use the train data for cross-validation and find the parameter with the minimum MSE.
    3) Use the same parameter on the test data and compare the accuracy of the different models.

    • @ilirsheraj2092
      @ilirsheraj2092 4 years ago +2

      You are right; he has actually overfitted on the test data because the model was already trained with it. Splitting and then doing regularization is the correct way, but he probably didn't want to go into details here.
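[Editor's note] The leakage-free workflow these comments describe can be sketched as follows. This is a minimal illustration, not the video's code: the dataset is synthetic (the Boston data is no longer bundled with sklearn) and the alpha grid is an arbitrary example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for the Boston dataset used in the video
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=42)

# 1) Split first, so the test set stays completely unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 2) Tune lambda (called alpha in sklearn) by cross-validation on the training set only
params = {"alpha": [1e-3, 1e-2, 1e-1, 1, 5, 10, 20]}
search = GridSearchCV(Ridge(), params, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)

# 3) Evaluate the tuned model once on the held-out test set
test_mse = mean_squared_error(y_test, search.predict(X_test))
print(search.best_params_, round(test_mse, 2))
```

This way the test MSE is an honest estimate for unseen data, which is the point the thread is making.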

  • @alakshendrasingh3425
    @alakshendrasingh3425 4 years ago +40

    You have already trained the model with the whole data first and then split it for prediction. Nothing wrong with that, but I don't think it's an ideal technique.

    • @ektamarwaha5941
      @ektamarwaha5941 4 years ago +1

      agreed

    • @danielwang977
      @danielwang977 3 years ago

      What would be a better technique? Obviously it's best to train with the whole data set right?

    • @sidkapoor9085
      @sidkapoor9085 3 years ago +3

      @@danielwang977 Train-test split is the way to go. You can overfit the model if you train on the entire dataset.

    • @bhanuteja9408
      @bhanuteja9408 3 years ago

      @@danielwang977 No.
      1. What is the use of testing again (we should not trust that accuracy) if the same dataset was already used to build the model? It is more of an overfitting case.
      2. The dataset has to be split first; then one part is trained on and the other is tested on.
      I think 2 is better: we get to know how the model works on unseen data.
      Imo


  • @vishalvishwakarma7603
    @vishalvishwakarma7603 2 years ago

    Beautifully explained. Really Helpful. Thank you!

  • @iftekarpatel123
    @iftekarpatel123 5 years ago +1

    I'm the first to watch your video... I was watching your bias and variance video when the notification came for this... Big fan of yours, sir...

  • @lotmoretolearn-dataanalyti9312
    @lotmoretolearn-dataanalyti9312 4 years ago

    Great explanation. Very nice and simple explanation of ridge and lasso. Both the theoretical and practical parts are good. Keep making such videos.

  • @Kumar-oh2jl
    @Kumar-oh2jl 4 years ago +3

    I think a good example for regularization would be to show that the model's accuracy on training data is excellent while accuracy on test data is bad, i.e. overfitting; then we can apply regularization and compare the results.

  • @rajeshthumma5930
    @rajeshthumma5930 5 years ago

    Hi Krish, you explained the concepts in a clear and easy-to-understand manner. If possible, can you please explain the other machine learning algorithms the way you explained the linear regression algorithm? I mean a theoretical explanation of all the important machine learning algorithms. Thank you.

  • @bassamal-kaaki3253
    @bassamal-kaaki3253 4 years ago

    Excellent straightforward video :)

  • @sinister_deamon
    @sinister_deamon 4 years ago

    Your videos are really nice, keep it up!

  • @ayankoley5358
    @ayankoley5358 4 years ago +16

    I'm following 'the complete machine learning playlist', but you're jumping steps and skipping many details, saying 'hope you know this'. I love your teaching, though. Can you make a properly complete playlist?

    • @hasanzaman9783
      @hasanzaman9783 4 years ago +3

      Agreed.. I'm also following the complete list.

    • @utkarshsalaria3952
      @utkarshsalaria3952 3 years ago

      Yes, you said it right. The explanations are good, but it is difficult to follow the videos in order even with the playlist available. Some topics need to be rearranged and some need to be added so that the path becomes easy to follow. Still, it's the best material available on YouTube; you can find all the topics, but you need to search for them manually on his channel.

  • @esakkiponraj.e5224
    @esakkiponraj.e5224 4 years ago +5

    A small suggestion: better differentiate the terms alpha (for Lasso) and lambda (for ridge). It is confusing.

  • @apoorvshrivastava3544
    @apoorvshrivastava3544 4 years ago

    Sir, you are a savior.

  • @economicslover9534
    @economicslover9534 2 months ago

    Thank you so much, sir ❤

  • @kushswaroop7436
    @kushswaroop7436 2 years ago +1

    How do we select the value of alpha/lambda? What is the ideal value?
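[Editor's note] There is no single ideal value; alpha/lambda is data-dependent and is normally chosen by cross-validation. A minimal sketch using sklearn's built-in RidgeCV/LassoCV estimators (the data and the alpha grid here are illustrative, not from the video):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=1)

# RidgeCV / LassoCV evaluate each candidate alpha by cross-validation
# and keep the one with the lowest validation error
alphas = np.logspace(-3, 2, 30)
ridge = RidgeCV(alphas=alphas).fit(X, y)
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)

print("best ridge alpha:", ridge.alpha_)
print("best lasso alpha:", lasso.alpha_)
```

The chosen value depends entirely on the dataset and the noise level, which is why it is tuned rather than fixed.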

  • @Amir-English
    @Amir-English 9 months ago

    Why did we split the data into train & test "after" doing fitting in lasso regression? I mean, shouldn't it be like splitting the data before fitting and creating a model? 8:34

  • @MrPiickel
    @MrPiickel 5 years ago +5

    Is the method correct?
    In my understanding we should:
    1. Split up the data into train and test sets (maybe, depending on the data's units, standardize before).
    2. Do the hyperparameter optimization on the train set with CV -> best model for the training data.
    3. Predict with the best model on the test set -> realistic result for unseen data.
    You did:
    1. Hyperparameter optimization on all data using cross-validation -> best model for all data.
    2. Split up all data into training and test sets.
    3. Predict with the best model on the test set -> in my opinion your "best model" has already seen the data in the test set. Hence the result should not be quite realistic. What do you think?

    • @madhavanrangarajan6097
      @madhavanrangarajan6097 4 years ago

      True... was thinking the same.

    • @jaysoni7812
      @jaysoni7812 4 years ago

      The main aim of this video is to explain ridge and lasso regression; these small things we can do ourselves. So just do it yourself instead of pointing out this silly kind of mistake.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Superb explanation. Need to get my hands dirty in jupyter notebook. Thanks

  • @shashankverma4044
    @shashankverma4044 4 years ago

    Excellent !!

  • @Emotekofficial
    @Emotekofficial 3 years ago +3

    Krish, thank you for this very informative video. I have a question, though. Don't you think predicting (x_test, y_test) from a model that was trained on (x, y) would predict a memorized value? Wouldn't the prediction be more accurate and realistic for testing purposes if the model were trained on x_train and y_train rather than x, y?

  • @2828jordan
    @2828jordan 4 years ago +10

    Good videos. So far so good. In most of the videos, I feel the inference part is missing. What can we infer from the plots?

  • @surajsoren637
    @surajsoren637 4 years ago +1

    Can we use another scoring method rather than neg_mean_squared_error to solve the problem? If so, please suggest one. Please help me out.

  • @3pandya
    @3pandya 4 years ago +2

    Hi.
    I just want to know, if I am not wrong, that we need to use the train_test_split method before training the data, right?
    But you trained on the data and then split it into train & test, which surely does not give us an accurate estimate for future predictions.
    Please correct me if I am wrong.
    Thank you.

    • @shashankkhare1023
      @shashankkhare1023 4 years ago

      Hi, he used cross-validation first, which in itself does a train-test split. cv=5 means splitting into train-test 5 times, checking the scoring (neg MSE) on each test set, and showing the mean value. So cross-validation itself takes care of testing on unseen data.
      Doing a train-test split manually later on is an optional choice. Hope this answer helps.

    • @shindepratibha31
      @shindepratibha31 4 years ago

      @@shashankkhare1023 I think it is better to split the data into train and test first and then go for cross-validation using the train data. Cross-validation splits the train data into train and validation sets.

  • @raghavsinghal22
    @raghavsinghal22 3 years ago

    Is alpha the lambda which we are trying to find out in ridge?

  • @preranav7149
    @preranav7149 1 year ago

    Very helpful.

  • @sanjanaprakash6847
    @sanjanaprakash6847 10 months ago

    The concept is well explained. However, as the ridge and lasso models are trained with the entire dataset and not x_train, data leakage has happened: the model has seen the test inputs well before testing.

  • @onlydilip8124
    @onlydilip8124 3 years ago

    cross_val_score is the cost function, as you said in the previous video.

  • @arindamghosh3787
    @arindamghosh3787 4 years ago

    How do we check which variables are left as the best ones for the model?

  • @shubhamkundu2228
    @shubhamkundu2228 3 years ago

    Is it necessary to use GridSearchCV for ridge and lasso regression?

  • @dhivya_animal_lover
    @dhivya_animal_lover 4 years ago +1

    Thanks for your previous video, Krish. I am not getting whether Lasso or Ridge is better at the end of the code. Also, I referred to a blog on this topic and found that Elastic Net is more efficient. Can you kindly explain this?

  • @madhu1987ful
    @madhu1987ful 5 years ago +1

    Krish,
    Just the video I wanted, to see how ridge and lasso can be used in Python. Thanks a ton.
    Can you please explain what the inference is from the last 2 dist plots? I didn't get it.

  • @kks47
    @kks47 4 years ago +2

    Hello Krish,
    I am confused at the end: which one performed well in this case, Lasso or Ridge?
    Please give some feedback.

    • @viveksingh881
      @viveksingh881 4 years ago +1

      Ridge performed well... the closer to zero, the better the performance.

  • @yadavsanderamniwas
    @yadavsanderamniwas 4 years ago +1

    I have a question: what do you mean by "stable" in the last sns distplot graphs? Both ridge and lasso look the same to me. How is one more stable than the other?

    • @nabiltech1366
      @nabiltech1366 4 years ago

      As you can see at 4:10, he said that the closer the data is to zero, the better the model you have. In the histogram you can see that more of the data falls near zero with Ridge than with Lasso.

  • @adarshnamdev5834
    @adarshnamdev5834 4 years ago

    Hello Krish, thanks for making this wonderful video. Could you also please make a video on SVM and its underlying aspects like Kernelization etc...

  • @adityay525125
    @adityay525125 4 years ago +1

    Sir, after everything you covered, I did not get the things related to cross-validation. Can you please explain it briefly?

  • @vinayakpawar2460
    @vinayakpawar2460 4 years ago +1

    Hi Krish, amazing videos... On what basis is the alpha parameter list decided? Also, can you please explain the 2 plots at the end in more detail? Thanks.

  • @ejikeozonkwo1705
    @ejikeozonkwo1705 2 years ago

    Hi Krish, is there a reason you trained on the whole dataset before doing a train-test split?

  • @abinsharaf8305
    @abinsharaf8305 3 years ago

    Once we find out the best parameters, do we use them? Where are we using them?

  • @daspopdsa
    @daspopdsa 1 year ago

    Sir, there is an error saying 'load_boston' has been removed from scikit-learn since version 1.2.
    What should I do?
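[Editor's note] Since scikit-learn 1.2, `load_boston` is gone, so the video's loading cell fails. Any other regression dataset works as a drop-in for following along. A minimal sketch using the bundled `load_diabetes` data (`fetch_california_housing` is another common substitute, but it downloads files on first use):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# load_boston was removed in scikit-learn 1.2; load_diabetes ships with
# sklearn, needs no download, and is also a plain regression problem.
data = load_diabetes()
X, y = data.data, data.target

# The rest of the video's workflow applies unchanged
model = Lasso(alpha=0.1).fit(X, y)
print(X.shape)  # (442, 10)
```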

  • @mitulkoul1909
    @mitulkoul1909 3 years ago

    I had a doubt:
    what would happen if the best-fit line has slope 0, i.e. is parallel to the x-axis? How would ridge and lasso regression help overcome overfitting in that case?

  • @RandomGuy-hi2jm
    @RandomGuy-hi2jm 4 years ago +1

    But sir, you have not scaled the values?

  • @barax9462
    @barax9462 4 years ago

    Why am I getting -1.34e+23 when I compute mean_mse for linear regression? Is that bad?

  • @techbenchers69
    @techbenchers69 5 years ago +2

    Is the deep learning playlist complete? If not, please complete it 🙏🙏

  • @kapilchand6017
    @kapilchand6017 4 years ago

    Hi Krish, can you please explain why lasso is better than ridge in the histogram prediction? I wasn't able to follow the last minutes of the video. It would be great if you could clarify.

  • @deepalisharma1327
    @deepalisharma1327 3 years ago

    Can someone please explain why we are subtracting y_test from prediction_ridge or prediction_lasso?

  • @JaiSreeRam466
    @JaiSreeRam466 5 years ago +1

    Please explain elastic regression as well.
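[Editor's note] Elastic Net, asked about here and in another comment, simply combines the L1 (lasso) and L2 (ridge) penalties, and sklearn exposes it directly. A small sketch on synthetic data; the `alpha` and `l1_ratio` values are purely illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# sklearn's penalty: alpha * l1_ratio * ||w||_1
#                  + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
# l1_ratio=1 recovers Lasso, l1_ratio=0 recovers Ridge.
model = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("nonzero coefficients:", (model.coef_ != 0).sum())
```

Like lasso, it can zero out coefficients, while the ridge part keeps correlated features from being dropped arbitrarily.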

  • @shubhamkundu2228
    @shubhamkundu2228 3 years ago

    cross_validation is not explained clearly. What is it? Why is it needed to perform linear regression here, and what are those hyperparameters, e.g. cv, used in cross_val_score?
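[Editor's note] A minimal sketch of what `cross_val_score` does, on synthetic data; the `cv` and `scoring` values mirror the video but the dataset is illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=250, n_features=8, noise=3.0, random_state=2)

# cv=5: split the data into 5 folds; train on 4 folds, score on the 5th,
# rotating so every fold serves as the held-out fold once -> 5 scores.
# "neg_mean_squared_error" is MSE with the sign flipped, so that
# higher is always better (sklearn's scoring convention).
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=5)
print(len(scores), scores.mean())  # 5 fold scores; their mean is the CV estimate
```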

  • @sobhagyashri
    @sobhagyashri 4 years ago

    Thank you so much.. :)

  • @easewithjapanese1844
    @easewithjapanese1844 3 years ago

    Hi Krish. Nice video, but I'm not getting my histogram graph; it throws a ValueError. Please help.

  • @krunalpatel9952
    @krunalpatel9952 4 years ago +1

    Hi Krish,
    At 5:58 in the video, you said that this best score helps us find out which lambda value is suitable, but the question is: how? You mentioned those values as alpha values, and an alpha value as a learning rate should not be a very high number; instead it should be very small in order to reach the global minimum.
    Regards,
    Krunal

    • @rajathbk6915
      @rajathbk6915 4 years ago +1

      Hey, we used alpha for ridge and lasso, not linear regression. Don't confuse the learning rate (alpha) with lambda; those alpha values stand for lambda.

    • @nabiltech1366
      @nabiltech1366 4 years ago

      That's for gradient descent.

    • @venkatasubbaraomandaleeka4973
      @venkatasubbaraomandaleeka4973 3 years ago

      I also got the same issue. He is saying lambda is selected using CV, so why did he supply alpha values and print the best alpha value?

    • @jeevan88888
      @jeevan88888 1 year ago

      @@venkatasubbaraomandaleeka4973 He just typed lambda as alpha in the code for simplicity, I guess.

  • @ali013579
    @ali013579 5 years ago +4

    I just don’t know where to start. By the way nice videos

    • @ali013579
      @ali013579 5 years ago

      Wise guy, you can do many things; for example, write code to drive your car while you are sleeping in it :)

  • @NEHASHARMA-tm9gu
    @NEHASHARMA-tm9gu 3 years ago

    Just a small doubt, since I'm new to data science:
    you used the cross_val technique for Linear Regression but Grid Search for Ridge and Lasso?

  • @Datacrunch777
    @Datacrunch777 4 years ago

    How can I find a summary of the data in ridge regression, as in OLS (estimate, standard error and p-values)? Kindly help, please.

    • @arjunkadam71
      @arjunkadam71 4 years ago +1

      Use the statsmodels GAM library.

    • @Datacrunch777
      @Datacrunch777 4 years ago

      Can you tell me the proper command?

    • @arjunkadam71
      @arjunkadam71 4 years ago +1

      @@Datacrunch777 Explore the statsmodels library, bro.

  • @RitwikDandriyal
    @RitwikDandriyal 5 years ago +1

    Just curious, but shouldn't we be standardizing our data when using linear regression? Or does sklearn automatically take care of that?

    • @hipraneth
      @hipraneth 5 years ago

      I guess it needs to be standardized before applying the algorithm.
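[Editor's note] sklearn's LinearRegression, Ridge, and Lasso do not standardize for you, and scaling matters for penalized models because the penalty treats all coefficients alike regardless of feature units. A sketch of doing it without leakage via a Pipeline, on synthetic data (the alpha value is illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# The Pipeline fits the scaler on the training data only, then applies
# the same (frozen) transformation to the test data -> no leakage.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))  # R^2 on held-out data
```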

  • @nagashishsv843
    @nagashishsv843 4 years ago

    Sir, if I use cv=10 the MSE comes out even lower. So how do we choose the cv value appropriately? Does it depend on the dataset?

  • @sufiyanansari1739
    @sufiyanansari1739 4 years ago

    "Cannot clone object. You should provide an instance of scikit-learn estimator instead of a class."
    What is this error?

  • @manikantasai4766
    @manikantasai4766 5 years ago +3

    Can you please make a video on sentiment analysis on Twitter?

  • @galymzhankenesbekov2924
    @galymzhankenesbekov2924 4 years ago

    Krish, as always, amazing video. But why did you decide to use cv=5, and how did you come up with the alpha values? Thanks.

    • @krishnaik06
      @krishnaik06  4 years ago +1

      You can choose any value for cv, and alpha usually ranges between 0 and 1.

    • @galymzhankenesbekov7242
      @galymzhankenesbekov7242 4 years ago

      Krish Naik, thanks! Could I also suggest a project for you to do? Loan approval with an online platform where you enter the client's credentials and it says whether to approve the loan or not. Is it just a classification problem?

  • @garima2158
    @garima2158 1 year ago

    The error I'm getting:
    it seems I'm encountering an issue because the load_boston function has been removed from scikit-learn since version 1.2 due to ethical concerns related to the dataset.

  • @nikheleshpanigrahi4640
    @nikheleshpanigrahi4640 4 years ago

    Hello Krish,
    can you tell me how you are selecting the alpha (lambda) values?

    • @danielwang977
      @danielwang977 3 years ago

      The algorithm cycles through each of the parameters and uses cross validation to comprehensively test how accurate each parameter is.

  • @arvindtechnical940
    @arvindtechnical940 5 years ago +1

    Sir, the AirIndex project's second video?

  • @Prachi_Mayank
    @Prachi_Mayank 5 years ago

    Sir, this mse = cross_val
    line is showing an error in my system, sir... as per this, 'neg_mean
    is showing an error...
    What do I do, sir?

  • @sudarshansharma8647
    @sudarshansharma8647 4 years ago

    Sir, will Ridge and Lasso regression always give us better results than linear regression or polynomial regression?

  • @hrshtmlng
    @hrshtmlng 9 months ago

    Now the load_boston dataset isn't available in sklearn; instead I'm using the california_housing dataset.

    • @kyubi-tv2bq
      @kyubi-tv2bq 7 days ago

      Are you able to achieve similar results with cali dataset?

  • @narotian
    @narotian 4 years ago +3

    Your theory videos are good, but I don't like the coding part; it looks very different from what I do. You should have tried doing it from the start with some new dataset. I'm good with theory, but now I'm messing up my mind with the coding part (everyone has their own way of coding).

  • @rajusrkr5444
    @rajusrkr5444 5 years ago +1

    Please provide the code as well, sir; when practicing, it takes more time to watch the video and type the code into the editor.

    • @adityay525125
      @adityay525125 4 years ago

      It's on his GitHub profile.

    • @sasikumar-fz1zy
      @sasikumar-fz1zy 4 years ago

      Refer to the GitHub link in the description of this video.

  • @d39-nischithhegde65
    @d39-nischithhegde65 10 months ago

    The Boston dataset has been deprecated.

  • @adarshmamidi334
    @adarshmamidi334 5 years ago +1

    Reply on Insta.

  • @beautyofnature1541
    @beautyofnature1541 3 роки тому

    Thank you for sharing your knowledge. I have recently started watching and have learned a lot. I was just practicing the above code on Colab and it's giving me the following error: AttributeError: 'Series' object has no attribute 'prediction_lasso' on the line sns.distplot(y_test.prediction_lasso) and also at sns.distplot(y_test.prediction_ridge). Can anyone please help with how to fix this?