How to select the best model using cross-validation in Python

  • Published 24 Oct 2024

COMMENTS • 79

  • @VVV-wx3ui
    @VVV-wx3ui 5 years ago +1

    I have done a course and also did a bit of prediction using ML and DL, including ANN, CNN and LSTM. However, now I understand which libraries to use for different cases. Thanks for coming up with such videos. Please do more; I am your subscriber.

  • @vaibhavkhobragade9773
    @vaibhavkhobragade9773 3 years ago

    This video helped me clear all my doubts regarding cross-validation and data leakage.

  • @srinivasanganesan9294
    @srinivasanganesan9294 3 years ago +2

    Krish, a very simple explanation of how CV can be used in algorithm selection. Very well done.

  • @drvren030
    @drvren030 3 years ago

    Got an exam in a couple of hours, and this video cleared up a LOT of things! Thank you for going into the concept and using it to explain what's going on in your code. Kudos, man, kudos.

  • @pranavakailash8751
    @pranavakailash8751 3 years ago +2

    That is clean AF! Thanks for this video, really appreciated.

  • @Nikhil-jj7xf
    @Nikhil-jj7xf 5 years ago +1

    Explained with simplicity.
    Thanks Krish.

  • @infidos
    @infidos 4 years ago +1

    Awesome and clean, simple explanation.

  • @sivakumarprasadchebiyyam9444

    Hi, it's a very good video. Could you please let me know if cross-validation is done on the training data or on the total data?

  • @muskangupta5873
    @muskangupta5873 4 years ago

    Best video 🙌
    Keep posting, sir, you are awesome.

  • @anubabu4187
    @anubabu4187 3 years ago

    Nice video, sir. How do you find the cross-validation score for a non-standard metric, for example specificity?
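
    A minimal sketch (illustrative data, assuming scikit-learn and a binary 0/1 target) of how specificity could be scored during cross-validation by wrapping it with make_scorer:

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import make_scorer, recall_score
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=200, random_state=0)

      # Specificity is the recall of the negative class, so pos_label=0 does the job.
      specificity = make_scorer(recall_score, pos_label=0)

      scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                               cv=5, scoring=specificity)
      print(scores.mean())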

  • @animemodeactivated6404
    @animemodeactivated6404 4 years ago +1

    Hi Krish, after selecting the model, how do you select the best chunk of data for training, since different splits of the data will give different accuracies? It would be very helpful if you could post a video on this.

    • @EcommerceAdvices
      @EcommerceAdvices 3 years ago

      Hi Anime,
      I think after selecting the best model with a good average accuracy, we don't need to split further again, i.e. now train on the whole dataset and make/save a model. What say?

  • @asadkhan-kk2ru
    @asadkhan-kk2ru 1 year ago

    Excellent

  • @chinedumezeakacha1604
    @chinedumezeakacha1604 4 years ago

    Very apt and straight to the point. Thanks for sharing.

  • @tarabalam9962
    @tarabalam9962 1 year ago

    Great teaching.

  • @borngenius4810
    @borngenius4810 3 years ago

    Excellent explanation. So if I am using cross_val_score instead of train_test_split, I don't need to work out and analyse metrics like precision, recall, F1 and ROC? Just getting accuracy.mean() is good enough? P.S. I am new to DS, so I have probably mixed up a few things.
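
    A minimal sketch (illustrative data, assuming scikit-learn): cross_validate can report precision, recall, F1 and ROC AUC alongside accuracy, so you are not limited to accuracy.mean():

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_validate

      X, y = make_classification(n_samples=300, random_state=0)
      metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]

      # One dictionary with a test_<metric> array of per-fold scores for each metric.
      results = cross_validate(LogisticRegression(max_iter=1000), X, y,
                               cv=5, scoring=metrics)
      for m in metrics:
          print(m, results[f"test_{m}"].mean())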

  • @l2mbenop346
    @l2mbenop346 5 years ago +3

    Can you increase the font size of the editor? It's very small and straining to read on mobile.

  • @akashpoudel571
    @akashpoudel571 5 years ago +1

    Thank you, sir, for a lucid explanation.

  • @divyanshuanand9990
    @divyanshuanand9990 5 years ago +1

    Excellent.
    Thanks for the video.

  • @laxminarasimhaduggaraju2671
    @laxminarasimhaduggaraju2671 5 years ago

    I am following your videos.
    The way you explain is simply awesome, and many happy thanks for sharing the information and knowledge about DS.

  • @VenkatDinesh02
    @VenkatDinesh02 3 years ago

    Krish, for a logistic regression problem we should use the mode, right? You used the mean here. Why?

  • @TJ-wo1xt
    @TJ-wo1xt 2 years ago

    Great explanation.

  • @amankapri
    @amankapri 4 years ago

    Very good explanation.

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago +1

    I have a question. In cross-validation we perform multiple experiments based on the cv value. In K-fold we also do the same thing.
    What is the difference between these two?
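
    A minimal sketch (illustrative data, assuming scikit-learn): an integer cv and an explicit KFold splitter drive the same splitting machinery, so they are not two different procedures; the integer form is just shorthand (and for classifiers it actually defaults to stratified folds):

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import KFold, cross_val_score

      X, y = make_classification(n_samples=200, random_state=0)
      model = LogisticRegression(max_iter=1000)

      # Shorthand: cv=5 builds the folds for you.
      print(cross_val_score(model, X, y, cv=5).mean())

      # Explicit splitter: same idea, but you control the K-fold object.
      print(cross_val_score(model, X, y, cv=KFold(n_splits=5)).mean())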

  • @salvador9431
    @salvador9431 5 years ago +3

    Is it OK to use your train_x and train_y data in your cross-validation? Or is it better to use your whole X and y variables?

    • @generationwolves
      @generationwolves 5 years ago +1

      The X and y variables. The whole point of using cross-validation techniques is to try various combinations of train and test sets from your original dataset and find out how effective your algorithm is on any of these combinations.

  • @malkitsingh2654
    @malkitsingh2654 3 years ago

    Don of the data science community

  • @mekalamadhankumar3224
    @mekalamadhankumar3224 3 years ago

    It is difficult to work out which model is suitable for the data, because we can try all the models to check which gives good accuracy. This leads to a big problem in the coding part.

  • @arunkumaracharya9641
    @arunkumaracharya9641 4 years ago

    You said that if CV = 10, then ten experiments are conducted, but you did not say in what ratio the train and test samples are split. Also, there are different interpretations of random_state on the internet: if random_state = None then the sample changes, and if random_state = any integer then the sample does not change, irrespective of which integer you choose. But in your case the sample did change when a different integer was used. Please clarify.

    • @maynorhernandez746
      @maynorhernandez746 1 year ago +1

      In the case of CV = 10 the ratio is train = 0.9, testing = 0.1; that's because you split the "cake" into 10 pieces. Let's take CV = 5, for instance: the cake is split into 5 pieces, so you have 4 pieces for training and 1 piece for testing. So you will have train = 4/5 = 0.8 and testing = 1/5 = 0.2.
      For CV = 4: training = 3/4 = 0.75 and testing = 1/4 = 0.25.
      I hope this clarifies it.

  • @Raja-tt4ll
    @Raja-tt4ll 4 years ago

    Very nice video. Thank you.

  • @YavuzDurden
    @YavuzDurden 2 years ago

    Sir, how can I access the datasets that were validated? Why are we applying cross-validation if we can't select the data from the highest-scoring split? Thank you.

  • @asadkhan-kk2ru
    @asadkhan-kk2ru 1 year ago

    Very good

  • @swethanandyala
    @swethanandyala 2 years ago

    Hi sir, when we are using cv=10 it simply applies K-fold sampling. Can we import StratifiedKFold and pass cv=StratifiedKFold(...) while working with a dataset which has class imbalance, since stratified sampling gives the same ratio of classes in the train and validation data?

  • @Hellow_._
    @Hellow_._ 1 year ago

    How do you use other CV techniques in code, like stratified CV, time-series CV and leave-one-out CV?
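
    A minimal sketch (illustrative data, assuming scikit-learn): any splitter object can be passed as the cv argument, which covers stratified, time-series and leave-one-out cross-validation:

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import (LeaveOneOut, StratifiedKFold,
                                           TimeSeriesSplit, cross_val_score)

      X, y = make_classification(n_samples=120, random_state=0)
      model = LogisticRegression(max_iter=1000)

      # Stratified folds keep the class ratio in every split.
      print(cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5)).mean())

      # Time-series splits always validate on data that comes after the training window.
      print(cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5)).mean())

      # Leave-one-out uses a single observation as the validation set each time.
      print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())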

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @SumitKumar-uq3dg
    @SumitKumar-uq3dg 5 years ago

    In cross-validation we are running different models and taking the mean of all the accuracies. So which model will be our final model?

  • @simplify1411
    @simplify1411 2 years ago

    Sir, what if the total number of observations is something like 107 or 191, or any prime number? How do you split using k-fold CV?
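
    A minimal sketch (assuming scikit-learn): KFold does not need the row count to divide evenly, it simply makes folds of nearly equal size, so a prime number of observations is fine:

      import numpy as np
      from sklearn.model_selection import KFold

      X = np.arange(107).reshape(-1, 1)

      # 107 rows with 5 folds gives fold sizes 22, 22, 21, 21, 21.
      print([len(test) for _, test in KFold(n_splits=5).split(X)])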

  • @Hiyori___
    @Hiyori___ 3 years ago

    Great tutorial.

  • @denischo2133
    @denischo2133 3 years ago

    What should I do if I want to apply MinMaxScaler or StandardScaler, fitting on the train folds and only transforming the test fold, inside cross_val_score? The rule of thumb is to apply these techniques to train and test separately, so how can I do this? cross_val_score doesn't have a specific argument for it.
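
    A minimal sketch (illustrative data, assuming scikit-learn): wrapping the scaler and the model in a Pipeline is one common way to do this, because cross_val_score then fits the scaler on each training fold only and merely transforms the matching validation fold, so nothing leaks:

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      X, y = make_classification(n_samples=200, random_state=0)

      # Scaler and model travel together, so scaling is refit inside every fold.
      pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
      print(cross_val_score(pipe, X, y, cv=5).mean())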

  • @ravikshdikola6089
    @ravikshdikola6089 3 years ago

    If the difference between the train scores and the cross-validation scores is a negative value, does that mean the model performs very well?

  • @akpovoghoigherighe964
    @akpovoghoigherighe964 6 years ago

    This is very useful.

  • @sunnysavita9071
    @sunnysavita9071 5 years ago +2

    Sir, you didn't define the test_size in train_test_split().

    • @rohithn2056
      @rohithn2056 4 years ago

      If you don't define it, the train_test_split function automatically uses a 75:25 ratio.
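
      A minimal sketch (illustrative data, assuming scikit-learn) confirming that default 75:25 split:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=100, random_state=0)

        # With no test_size given, 25% of the rows are held out by default.
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        print(len(X_train), len(X_test))  # 75 25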

  • @tiagosilvacorrea9004
    @tiagosilvacorrea9004 5 years ago

    Very Good! Thanks

  • @auroshisray9140
    @auroshisray9140 3 years ago

    Thank you sir

  • @SararithMAO
    @SararithMAO 1 year ago

    If I just want to apply K-fold cross-validation, then I don't need to do a train-test split, right?

  • @shahadiqbal176
    @shahadiqbal176 5 years ago

    Have you done it using decision tree, random forest and naive Bayes?

  • @nagandranathvemishetti9247
    @nagandranathvemishetti9247 3 years ago

    Sir, will it work for a multi-class problem?

  • @BiranchiNarayanNayak
    @BiranchiNarayanNayak 6 years ago

    When should you use train_test_split() and when cross_val_score() on the dataset? I have seen that most programs use train_test_split with a 70%/30% or 60%/40% train/test split and fit the model. So which is the best approach?

    • @janvonschreibe3447
      @janvonschreibe3447 5 years ago

      There is not really a neat rule. A rule of thumb is to use the same ratio as your test/train set ratio.

  • @MasterofPlay7
    @MasterofPlay7 4 years ago

    Can you output the model summary and the confusion matrix using cross_val_score?

  • @kareemel-tantawy8355
    @kareemel-tantawy8355 4 years ago

    Is k-fold cross-validation used to decide which model is best for regression and classification only,
    or can I use it to decide which model is best for clustering?

  • @markmorillo2954
    @markmorillo2954 3 years ago

    Great

  • @ANILKUMAR-qd8lx
    @ANILKUMAR-qd8lx 6 years ago

    Please can you explain feature selection in a model?

  • @niteshsrivastava6504
    @niteshsrivastava6504 4 years ago

    Does the cross_val_score function use hyperparameters and stratified folds?

  • @yogendrasaikiran4486
    @yogendrasaikiran4486 3 years ago

    I am unable to use that cross-validation function on my system.

  • @shaiksajid613.
    @shaiksajid613. 2 years ago

    What is that accuracy? Is it the train accuracy or the test accuracy?

  • @nguyenluu3082
    @nguyenluu3082 2 years ago

    Can you explain the effect of random_state on the accuracy?

    • @jackdaws7125
      @jackdaws7125 2 years ago +1

      Each random state is a different randomization of the train-test split of the data. So the reason the accuracy is changing is that in each case the split was done differently and led to different results, which is why it's quite unreliable, and cross-validation helps us solve that.
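
      A minimal sketch (illustrative data, assuming scikit-learn): the same model scored on splits made with different random_state values reports different accuracies, which is exactly the variance cross-validation averages out:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=200, random_state=0)

        for seed in (0, 1, 42):
            # A different seed shuffles the rows differently before splitting.
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
            model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
            print(seed, model.score(X_te, y_te))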

  • @rajusrkr5444
    @rajusrkr5444 4 years ago

    Excellent video

  • @AmitVerma-yg8pp
    @AmitVerma-yg8pp 4 years ago

    I am a little confused about cv folds and the number of values in the X and y datasets.

    • @KrishnaMishra-fl6pu
      @KrishnaMishra-fl6pu 3 years ago

      If you take your k-fold value as 5, then the CV will perform 5 experiments.
      Suppose there are 50 records and you took the k-fold value as 5.
      Then the size of the test data would be 50/5, i.e. 10:
      Exp1 ==> test data = df[0:10,:]
      Exp2 ==> test data = df[10:20,:]
      Exp3 ==> test data = df[20:30,:]
      Exp4 ==> test data = df[30:40,:]
      Exp5 ==> test data = df[40:50,:]
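
      A minimal sketch (assuming scikit-learn) reproducing those five experiments with KFold on 50 rows; with shuffling left off, the test folds are exactly the contiguous slices listed above:

        import numpy as np
        from sklearn.model_selection import KFold

        X = np.arange(50).reshape(50, 1)

        for i, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X), start=1):
            # Each experiment tests on one block of 10 rows and trains on the other 40.
            print(f"Exp{i} ==> test rows {test_idx[0]}..{test_idx[-1]}")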

  • @piyushaneja7168
    @piyushaneja7168 3 years ago

    Sir, I am confused about this: we are selecting a part of the dataset for testing and the rest for training, and in the next iteration we are selecting a part for testing that was already used for training in the previous iteration? If so, then it won't give a correct accuracy, as the model has already seen the data? Or am I missing some point?

    • @zulfiquarshaikh3461
      @zulfiquarshaikh3461 3 years ago +2

      Bro, in the second iteration the data that goes into training and testing is picked differently; it is not the same as in the first iteration, and the second iteration has unseen data for testing.

    • @piyushaneja7168
      @piyushaneja7168 3 years ago

      @@zulfiquarshaikh3461 Okay, thank you bro!

  • @markmorillo2954
    @markmorillo2954 3 years ago

    Nice video

  • @akashpoudel571
    @akashpoudel571 5 years ago

    Sir, does cross-validation come first and then parameter tuning, always, in general?

    • @krishnaik06
      @krishnaik06  5 years ago

      First hyperparameter tuning, then cross-validation.

    • @akashpoudel571
      @akashpoudel571 5 years ago

      @@krishnaik06 OK sir.

    • @akashpoudel571
      @akashpoudel571 5 years ago

      @@krishnaik06 Could you upload some more algorithms with the meaning of their params? Just a video on hyperparameters for the logistic algorithm and regression, if you have time, sir.

  • @Data_mata
    @Data_mata 1 year ago

    ❤❤

  • @samueleboh8965
    @samueleboh8965 5 years ago

    Thanks

  • @prashanthpandu2829
    @prashanthpandu2829 5 years ago

    I have a doubt: should you use cross_val_score on the train dataset or on the whole dataset?

    • @venkilfc
      @venkilfc 4 years ago

      @@generationwolves If you use cross-validation to tune your hyperparameters and improve your model, then you shouldn't apply cross-validation on the entire dataset but only on the training data. The test data must always be independent; otherwise it will result in data leakage. If you just want an overall look at the scores of the splits, then you can apply it on the whole dataset.
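
      A minimal sketch of that workflow (illustrative data, assuming scikit-learn): hold out a test set first, cross-validate only on the training portion, and keep the test set for one final, independent check:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score, train_test_split

        X, y = make_classification(n_samples=300, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            random_state=0)

        # Model selection / tuning happens here, on the training data only.
        print(cross_val_score(LogisticRegression(max_iter=1000),
                              X_train, y_train, cv=5).mean())

        # The untouched test set gives the final, leakage-free estimate.
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        print(model.score(X_test, y_test))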

  • @MasterofPlay7
    @MasterofPlay7 4 years ago

    Can you output the model summary and the confusion matrix using cross_val_score?

    • @chinedumezeakacha1604
      @chinedumezeakacha1604 4 years ago

      No, I wouldn't think so. I think cross-validation is a quick way of determining which ML algorithm is most suitable. When you use whichever one returns a high CV score, you can then do the model summary and confusion matrix using the confusion-matrix library.

    • @MasterofPlay7
      @MasterofPlay7 4 years ago +1

      @@chinedumezeakacha1604 Actually, I tried it and you can do it.
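
      One way this can work (a minimal sketch on illustrative data, assuming scikit-learn): cross_val_predict returns one out-of-fold prediction per sample, and those predictions can feed a confusion matrix:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import confusion_matrix
        from sklearn.model_selection import cross_val_predict

        X, y = make_classification(n_samples=200, random_state=0)

        # Every sample is predicted by a model that never saw it during fitting.
        y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
        print(confusion_matrix(y, y_pred))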