Machine Learning Tutorial Python 12 - K Fold Cross Validation

  • Published 30 Jun 2024
  • Many times we face a dilemma over which machine learning model we should use for a given problem. KFold cross validation allows us to evaluate the performance of a model by creating K folds of a given dataset. This is better than a traditional train_test_split. In this tutorial we will cover the basics of cross validation and KFold. We will also look into the cross_val_score function of the sklearn library, which provides a convenient way to run cross validation on a model.
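The workflow the description outlines can be sketched in a few lines. This is a minimal sketch assuming a recent scikit-learn; the digits dataset is the one used in the video, and cv=3 mirrors the folds shown there:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

digits = load_digits()

# cross_val_score trains and evaluates the model once per fold;
# cv=3 matches the folds used in the video (newer sklearn defaults to cv=5)
scores = cross_val_score(LogisticRegression(max_iter=1000),
                         digits.data, digits.target, cv=3)
print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the folds
```

Averaging the per-fold scores (scores.mean()) is the usual way to compare models, since any single fold's score depends on how the data happened to be split.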
    #MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #MachineLearningModel #sklearn #sklearntutorials #scikitlearntutorials
    Code: github.com/codebasics/py/blob...
    Exercise: Exercise description is available in the above notebook towards the end
    Exercise solution: github.com/codebasics/py/blob...
    Topics that are covered in this Video:
    0:00 Introduction
    0:21 Cross Validation
    1:02 Ways to train your model (use all available data for training and test on the same dataset)
    2:08 Ways to train your model (split available dataset into training and test sets)
    3:26 Ways to train your model (k fold cross validation)
    4:26 Coding (start) (use handwritten digits dataset for kfold cross validation)
    8:23 sklearn.model_selection KFold
    9:10 KFold.split method
    12:21 StratifiedKFold
    19:45 cross_val_score
    Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
    Next Video:
    Machine Learning Tutorial Python - 13: K Means Clustering: • Machine Learning Tutor...
    Popular Playlists:
    Data Science Full Course: • Data Science Full Cour...
    Data Science Project: • Machine Learning & Dat...
    Machine learning tutorials: • Machine Learning Tutor...
    Pandas: • Python Pandas Tutorial...
    matplotlib: • Matplotlib Tutorial 1 ...
    Python: • Why Should You Learn P...
    Jupyter Notebook: • What is Jupyter Notebo...
    Tools and Libraries:
    Scikit learn tutorials
    Sklearn tutorials
    Machine learning with scikit learn tutorials
    Machine learning with sklearn tutorials
    To download the csv and code for all tutorials: go to github.com/codebasics/py, click on the green button to clone or download the entire repository, and then go to the relevant folder to get access to that specific file.
    🌎 My Website For Video Courses: codebasics.io/?...
    Need help building software or data analytics and AI solutions? My company www.atliq.com/ can help. Click on the Contact button on that website.
    #️⃣ Social Media #️⃣
    🔗 Discord: / discord
    📸 Dhaval's Personal Instagram: / dhavalsays
    📸 Codebasics Instagram: / codebasicshub
    🔊 Facebook: / codebasicshub
    📱 Twitter: / codebasicshub
    📝 Linkedin (Personal): / dhavalsays
    📝 Linkedin (Codebasics): / codebasics
    🔗 Patreon: www.patreon.com/codebasics?fa...

COMMENTS • 592

  • @codebasics
    @codebasics  2 years ago +3

    Do you want to learn technology from me? Check codebasics.io/ for my affordable video courses.

  • @MrSparshtiwari
    @MrSparshtiwari 3 years ago +115

    After watching so many different ML tutorial videos, literally so many, I have just one thing to say: the way you teach is the best among all of them.
    Name any famous one, like Andrew Ng or sentdex: you need prerequisites to understand their videos, while yours are a treat to viewers, explained from the basics and slowly building up. And those exercises are the cherry on top.
    Never change your teaching style sir, yours is the best one.👍🏻

  • @beansgoya
    @beansgoya 5 years ago +30

    I love that you go through the example the hard way and introduce the cross validation after

  • @AltafAnsari-tf9nl
    @AltafAnsari-tf9nl 3 years ago +12

    Couldn't ask for a better teacher to teach machine learning. Truly exceptional!!!! Thank you so much for all your efforts.

  • @rajnichauhan1286
    @rajnichauhan1286 4 years ago +63

    What an amazing explanation. Finally! I understood the cross validation concept so clearly. Thank you so much.

  • @knbharath5947
    @knbharath5947 4 years ago +6

    Great stuff indeed. I'm learning machine learning from scratch and this was very helpful. Keep up the good work, kudos!

  • @nuraishahzainal1660
    @nuraishahzainal1660 2 years ago +6

    Hi, I'm from Malaysia. I came across your video and I am glad I did. Super easy to understand, and I'm currently preparing to learn deep learning. I've already watched your Python and Pandas videos, and currently the ML ones. Thank you for making all these videos; you're making our lives easier, Sir.
    Sincerely, your student from Malaysia.

  • @mastijjiv
    @mastijjiv 4 years ago +10

    Your videos are AMAZING man!!! I have already recommended these videos to colleagues at my university who are taking a Machine Learning course. They are also loving it...!!! Keep it up champ!

    • @codebasics
      @codebasics  4 years ago +2

      Mast pelluri, I am glad you liked it and thanks for recommending it to your friends 🙏👍

  • @AlphaGodzilla1
    @AlphaGodzilla1 3 years ago +3

    I have never seen anyone who can explain Machine Learning and Data Science so easily..
    I used to be scared in Machine Learning and Data science, then after seeing your videos, I am now confident that I can do it by myself. Thank you so much for all these videos....
    👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏

  • @tatendaVIDZ90
    @tatendaVIDZ90 2 years ago +1

    That approach of doing the manual method of what cross_val_score is doing in the background and then introducing the method! Godsend! Brilliant. Brilliant I say!

  • @parikshitshahane6799
    @parikshitshahane6799 3 years ago +1

    Probably the best machine learning tutorials out there... Very good job
    Thanks!

  • @codebasics
    @codebasics  4 years ago +4

    Exercise solution: github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/Exercise/exercise_kfold_validation.ipynb
    Complete machine learning tutorial playlist: ua-cam.com/video/gmvvaobm7eQ/v-deo.html

    • @hemenboro4313
      @hemenboro4313 4 years ago

      We needed to use mean() with cross validation to get the average accuracy score. I'm guessing you forgot to add it. Anyway, the video is pretty good and in depth. Keep producing such videos.

  • @layanibandaranayake9406
    @layanibandaranayake9406 3 years ago

    The best and the simplest explanation for cross validation I could find after so much searching! Keep up the good work!

  • @imposternaruto
    @imposternaruto 2 years ago +1

    My teacher is frustratingly bad. I am learning from your videos so that I can get a good grade in my class. Thank you for taking some time to demonstrate what is happening. When you showed me with the example at 10:47, I finally understood.

  • @apeculiargentleman6925
    @apeculiargentleman6925 5 years ago +5

    You make exquisite content, I'd love to see more!

  • @synaestheticVI
    @synaestheticVI 4 years ago +1

    What an excellent video, thank you! I got lost in other written tutorials, this was finally a clear explanation!

    • @codebasics
      @codebasics  4 years ago

      Hey, thanks for the comment. Keep learning. 👍

  • @nicoleluo6692
    @nicoleluo6692 1 year ago +1

    🌹 You are way, way... way better than all of my machine learning professors at school!

  • @christiansinger2497
    @christiansinger2497 4 years ago +4

    Thanks man! You're really helping me out finishing my university project in machine learning.

    • @codebasics
      @codebasics  4 years ago +3

      Christian I am glad to hear you are making progress on your university project 😊 I wish you all the best 👍

  • @vishalrai2859
    @vishalrai2859 3 years ago +1

    The only channel with pure quality, not beating around the bush. Thanks Dhaval sir for your contribution.

  • @pablu_7
    @pablu_7 4 years ago +6

    Thank you Sir for this awesome explanation. Iris Dataset Assignment Score
    Logistic Regression [96.07% , 92.15% , 95.83%]
    SVM [100% , 96.07% , 97.91%] (Kernel='linear')
    Decision Tree [98.03 %, 92.15% , 100%]
    Random Forest [98.03% , 92.15% , 97.91%]
    Conclusion: SVM works best for me.

    • @pranjaysingh4161
      @pranjaysingh4161 6 months ago

      pretty ironic and yet amusing at the same time

  • @Suigeneris44
    @Suigeneris44 4 years ago +1

    Your videos are really good! The explanation is crisp and succinct! Love your videos! Keep posting! By the way, you may not realize it, but you are changing peoples' lives by educating them! Jai Hind!

  • @ricardogomes9528
    @ricardogomes9528 3 years ago +1

    Finally a video explaining the X_train, X_test, y_train, y_test split. Thank you!

  • @zunairnoor2745
    @zunairnoor2745 11 months ago +2

    Thanks sir! Your tutorials are really helpful for me. Hope I'm gonna see all of them and make my transition from mechanical to AI successful 😊.

  • @kmchentw
    @kmchentw 2 years ago +2

    Thanks for the very useful and free tutorial series. Salute to you sir!

  • @KK-rh6cd
    @KK-rh6cd 3 years ago

    I watched several videos on CV but your video is the best explained. Thank you, thank you very much sir, keep uploading videos sir.

  • @alerttrade2356
    @alerttrade2356 4 years ago +1

    Thank you. This video solved so many questions at once. Nicely done.

  • @mohammadpatel2569
    @mohammadpatel2569 5 years ago +1

    Your videos on machine learning are way better than any online paid videos. So keep growing..

  • @learnerlearner4090
    @learnerlearner4090 4 years ago

    Great tutorials with easy to understand examples. Thanks so much!

  • @pablu_7
    @pablu_7 4 years ago +22

    After Parameter Tuning Using Cross Validation = 10 and taking average
    Logistic Regression = 95.34%
    SVM = 97.34%
    Decision Tree = 95.34 %
    Random Forest Classifier = 96.67 %
    Performance = SVM > Random Forest > Logistic ~ Decision

    • @manu-prakash-choudhary
      @manu-prakash-choudhary 2 years ago

      After taking cv=5 and C=6, SVM is 98.67%.

    • @sriram_cyber5696
      @sriram_cyber5696 1 year ago

      @@manu-prakash-choudhary After 50 splits 😎😎
      Score of Logistic Regression is 0.961111111111111
      Score of SVM is 0.9888888888888888
      Score of RandomForestClassifier is 0.973111111111111

  • @barackonduto5286
    @barackonduto5286 3 years ago

    You are a great instructor and explain concepts in a very understandable and relatable manner. Thank you

    • @codebasics
      @codebasics  3 years ago

      I am happy this was helpful to you.

  • @josephnnodim8244
    @josephnnodim8244 3 years ago

    This is the best video I have watched on Machine learning. Well done!

  • @helloonica8515
    @helloonica8515 2 years ago

    This is the most helpful video regarding this topic. Thank you so much!

  • @learnwithfunandenjoy3143
    @learnwithfunandenjoy3143 4 years ago

    AWESOME AWESOME..... Excellent video you have created. I've been learning ML for more than a year and have watched almost 400 videos. Your videos are AWESOME.... Please make a complete series on ML... Thanks.

    • @codebasics
      @codebasics  4 years ago

      Pankaj I am happy it has helped you. :) And yes, I am in the process of uploading many new tutorials on ML. Stay tuned!

  • @silentboy2292
    @silentboy2292 5 years ago

    great work..waiting for more videos on DEEP LEARNING!!

  • @kunalsoni_
    @kunalsoni_ 5 years ago

    Keep up the good work. Cheers man!

  • @oscarmuhammad4072
    @oscarmuhammad4072 3 years ago

    This is an EXCELLENT explanation. Straightforward and simplified.... Thank you.

  • @piyushbarthwal1722
    @piyushbarthwal1722 9 months ago

    Don't have any words; your teaching style and knowledge are amazing ✨...

  • @abdulazizalqallaf1704
    @abdulazizalqallaf1704 4 years ago

    Best Explanation I have ever seen. Outstanding job!

    • @codebasics
      @codebasics  4 years ago

      I am happy this was helpful to you

  • @arijitRC473
    @arijitRC473 5 years ago

    Sir please post more videos regarding ML...... It's really useful for me and others... Thank you so much for your contribution...

  • @simaykazc1508
    @simaykazc1508 3 years ago

    Thanks for creating rather authentic content on this topic compared to others. It is much clearer!

  • @nnennaumelloh8834
    @nnennaumelloh8834 3 years ago

    This is such an amazing, clear explanation. Thank you so much!

  • @panagiotisgoulas8539
    @panagiotisgoulas8539 2 years ago +2

    For the parameter tuning this helps. Just play a bit with the indexes, since lists start from 0 and n_estimators from 1, to match up the indexes.
    scores = []
    avg_scores = []
    n_est = range(1, 5)  # example
    for i in n_est:
        model = RandomForestClassifier(n_estimators=i)
        score = cross_val_score(model, digits.data, digits.target, cv=10)
        scores.append(score)
        avg_scores.append(np.average(score))
        print('avg score: {}, n_estimator: {}'.format(avg_scores[i-1], i))
    avg_scores = np.asarray(avg_scores)  # convert the list to an array
    print('Average accuracy score is {} for n_estimators={} calculated from following accuracy scores: {}'.format(
        np.amax(avg_scores), np.argmax(avg_scores) + 1, scores[np.argmax(avg_scores)]))
    plt.plot(n_est, avg_scores)
    plt.xlabel('number of estimators')
    plt.ylabel('average accuracy')
    44 was the best for me

  • @shamsiddinparpiev51
    @shamsiddinparpiev51 3 years ago +1

    Greatly explained man. Thank you

  • @pappering
    @pappering 4 years ago +10

    Thank you very much. Very nice explanation. My scores, after taking averages, are as follows:
    LogisticRegression (max_iter=200) = 97.33%
    SVC (kernel = poly) = 98.00%
    DecisionTreeClassifier = 96%
    RandomForestClassifier (n_estimators=300) = 96.67%

  • @leelavathigarigipati3887
    @leelavathigarigipati3887 4 years ago

    Thank you so much for the detailed explanation.

  • @yoyomovieclips8813
    @yoyomovieclips8813 3 years ago

    You solved one of my biggest confusions..... Thanks a lot sir

  • @anirbanc88
    @anirbanc88 1 year ago

    Thank you so much, I am so grateful for a teacher like you.

  • @Hiyori___
    @Hiyori___ 3 years ago +2

    Your tutorials are saving my life

  • @mvcutube
    @mvcutube 3 years ago

    Great. You made things look very easy and it boosts confidence. Thank you.

  • @vikamiller1
    @vikamiller1 4 years ago

    Amazing explanation ! Thank you :)

  • @hpourmamih
    @hpourmamih 4 years ago +5

    This is one of the best explanations of KFold Cross Validation!!!
    Thank you so much for sharing this valuable video. :))

  • @naveenkalhan95
    @naveenkalhan95 4 years ago +7

    At 20:39 in the video I noticed something interesting: by default the cross_val_score() method generated 3 folds... but the default has since changed from 3 to 5 :))

    • @gandharvsaxena8841
      @gandharvsaxena8841 3 years ago +2

      Thanks man, I was worried when mine was showing results for 5 folds. I thought something was wrong with my code.

    • @khalidalghamdi6303
      @khalidalghamdi6303 2 years ago

      @@gandharvsaxena8841 Me too lol, that's why I am getting 5

    • @aadilsstatus8895
      @aadilsstatus8895 1 year ago

      Thank you man!!

  • @akshyakumarshrestha5551
    @akshyakumarshrestha5551 5 years ago +1

    The way you teach is awesome! I request you to make tutorials on Neural Networks if you are in that field. Thank you!

    • @codebasics
      @codebasics  5 years ago

      Akshya I started making videos on neural nets. Check my channel, I posted the first two already.. once TF 2.0 is stable I will add more.

  • @akashraj.srinivasan
    @akashraj.srinivasan 5 years ago +1

    What else can I say, AWESOME ❤

  • @tech-n-data
    @tech-n-data 1 year ago

    Thank you sooooo much. You simplified that beautifully.

  • @anujvyas9493
    @anujvyas9493 4 years ago +43

    14:15 - Here instead of kf.split() we should use folds.split(). Am I correct??

    • @codebasics
      @codebasics  4 years ago +14

      Yes. My notebook has a correction. Check it on the GitHub link I have provided in the video description.

    • @veenak108
      @veenak108 4 years ago +12

      Yes, and just to add to it: StratifiedKFold requires both X and y labels for its split method. Stratification is done based on the y labels.
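A minimal sketch of the difference these replies describe, on toy data (not from the video): KFold.split only needs X, while StratifiedKFold.split also needs y so it can balance class proportions in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])

kf = KFold(n_splits=3)
skf = StratifiedKFold(n_splits=3)

# KFold slices by position; StratifiedKFold balances the classes
plain_folds = [test for _, test in kf.split(X)]
strat_folds = [test for _, test in skf.split(X, y)]
print(plain_folds)  # first test fold is [0, 1]: both samples of class 0
print(strat_folds)  # every test fold holds one sample of each class
```

With plain KFold on sorted labels like these, an entire test fold can consist of a single class, which is exactly what stratification avoids.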

  • @vijayotnm
    @vijayotnm 4 years ago

    Excellent tutorial, thank you!

  • @Gamesational1
    @Gamesational1 2 years ago +1

    Useful for identifying many different types of categories.

  • @piousringlearning
    @piousringlearning 5 years ago

    Nice Explanation. Thank you Sir

  • @humayunnasir6261
    @humayunnasir6261 3 years ago

    Wonderful explanation. Great tutorial series

  • @kewtomrao
    @kewtomrao 4 years ago

    Best tutorial. Liked and subscribed!!!

  • @jinks3669
    @jinks3669 2 years ago

    Thank you, Sir. May God always keep you healthy and happy.
    You are my god.

  • @someshkb
    @someshkb 4 years ago +2

    Thank you very much for the nice explanation. I have one question in this context: isn't it necessary to use 'random_state' in the train_test_split method to get the same score for any model?

  • @mohankrishna2188
    @mohankrishna2188 2 years ago

    Kudos to you, this was the most crystal clear explanation I have seen so far.
    But one small query: how do you get the train accuracy in cross validation?
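On the query above: cross_val_score only reports test-fold scores, but its sibling cross_validate (also in sklearn.model_selection) can return train scores as well. A small sketch, using iris as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
res = cross_validate(DecisionTreeClassifier(random_state=0),
                     iris.data, iris.target,
                     cv=5, return_train_score=True)
print(res["train_score"])  # accuracy on the training part of each fold
print(res["test_score"])   # accuracy on the held-out part of each fold
```

A large gap between train_score and test_score is a quick overfitting check, which is the usual reason to ask for the train accuracy at all.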

  • @santiagovasquez7775
    @santiagovasquez7775 4 years ago

    What a nice way to do your videos. Thanks a lot, I have learned a lot.
    Keep doing it

  • @zakiasalod891
    @zakiasalod891 5 years ago +1

    Hi there!
    Excellent video! This greatly explains the concepts and is very helpful! Keep up the awesome work! I have two questions, please clarify:
    1) Since the cross_val_score method is used to score the performance of a machine learning model when using Stratified K Fold Cross Validation, is it the only performance measure? Can we also use the following, and if yes, how? Please explain with an example:
    - Accuracy
    - Precision
    - Recall
    - Specificity
    - F1_Score
    - ROC_Curve
    - Model Execution Time (How is this possible using Jupyter Notebooks?)
    2) Expanding on the content of this YouTube video, please explain with an example how to retrieve the feature importance of a machine learning model. At which stage would this be done? At the end, right? I mean, after we get the average score of a machine learning model using Stratified K Fold Cross Validation?
    Thanks a lot in advance. Much appreciated.
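On question 1: cross_val_score takes a single scoring= metric, while cross_validate accepts several at once, and its fit_time output is one way to gauge execution time per fold. A sketch (metric names come from sklearn's scorer registry; specificity has no built-in scorer name and would need a custom scorer via make_scorer):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()
res = cross_validate(SVC(), iris.data, iris.target, cv=5,
                     scoring=["accuracy", "precision_macro",
                              "recall_macro", "f1_macro"])
print(res["test_accuracy"].mean())  # averaged over the 5 folds
print(res["fit_time"])              # per-fold training time in seconds
```

On question 2: feature importance is typically read after cross validation has picked the model, by fitting that model once on the full dataset and inspecting feature_importances_ (tree ensembles) or coef_ (linear models).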

  • @amandaahringer7466
    @amandaahringer7466 2 years ago

    Great explanation, thank you!

  • @thineshsubramani
    @thineshsubramani 4 years ago

    Love your explanation, Thank you very much sir

  • @felipeacunagonzalez4844
    @felipeacunagonzalez4844 4 years ago

    Thanks!!! great tutorial!!

  • @The_TusharMishra
    @The_TusharMishra 5 months ago +4

    He did folds = StratifiedKFold() and said that he will use it because it is better than KFold,
    but at 14:20 he used kf.split, where kf is KFold.
    I think he forgot to use StratifiedKFold.

  • @pratiknarkhede1287
    @pratiknarkhede1287 3 years ago

    Really helpful, thank you so much. I was stuck on this for a long time 🙌

  • @cantanzim6215
    @cantanzim6215 4 years ago +1

    It is an amazing explanation, great job...

  • @rahuljaiswal9379
    @rahuljaiswal9379 4 years ago +1

    Very simple and lovely teaching... you are simple and great... thank you so much sir

    • @codebasics
      @codebasics  4 years ago +1

      Thanks Rahul for your kind words of appreciation

  • @davidcalahorrano8681
    @davidcalahorrano8681 4 years ago

    Thanks Man! Great Tutorial

  • @obheech
    @obheech 5 years ago

    Mind blowing explanation ..

  • @kannanv8831
    @kannanv8831 4 years ago

    Wonderful teaching. Thanks.

  • @behdadkhosraviani4131
    @behdadkhosraviani4131 4 years ago

    Honestly, this video was great! Thanks

  • @sumayachoya2738
    @sumayachoya2738 3 months ago

    Thank you for this series. It is helping me a lot.

  • @girikgarg8
    @girikgarg8 1 year ago

    Very nicely explained!

  • @kanyadharani6844
    @kanyadharani6844 3 years ago

    Super clear explanation, I have been searching for this one; this video makes it perfect for me, thank you.

  • @zeelpareshmehta1942
    @zeelpareshmehta1942 3 years ago

    Very nicely explained! Thank you

  • @user-bm9oy4gx2l
    @user-bm9oy4gx2l 5 years ago

    Thank you so much. It helped me a lot!!!

    • @codebasics
      @codebasics  3 years ago

      I am happy this was helpful to you.

  • @Afussia
    @Afussia 3 years ago

    Thank you! This saved my life

  • @maruthiprasad8184
    @maruthiprasad8184 2 years ago

    Thank you very much for excellent explanation. I got accuracy SVC=98.04% , RandomForestClassifier(n_estimators=30)=98.04%,
    LogisticRegression(max_iter=200)=96.08%

  • @bhaskarg8438
    @bhaskarg8438 2 years ago

    simply great !! 🙏

  • @ahmedelsabagh6990
    @ahmedelsabagh6990 2 years ago

    Great explanation!

  • @jeffallenmbaagborebob5869
    @jeffallenmbaagborebob5869 3 years ago

    You are the best man.... Thanks very much.

  • @AlphaGodzilla1
    @AlphaGodzilla1 3 years ago

    I am so close to finishing your videos, and then I'm going to hop into your Machine Learning and Data Science projects... 😊😊😊😊😊😊😊😊😊😊😊

  • @SupahBro535
    @SupahBro535 3 years ago

    THIS IS AMAZING THANK YOU!

  • @chukwuemekafidelis3512
    @chukwuemekafidelis3512 1 year ago

    Thank you so much. Very helpful.

  • @samcomnotification8713
    @samcomnotification8713 4 years ago

    Great explanation. Salute for this knowledge sharing and great tutorial

  • @navjotsingh-hl1jg
    @navjotsingh-hl1jg 6 months ago

    Love your teaching pattern sir

  • @MNCAMANI15
    @MNCAMANI15 3 years ago

    So simple. You're a good teacher

  • @alekhakumarambati9315
    @alekhakumarambati9315 4 years ago

    Pretty good explanation, and you have made my concept clear!!!!!!!

    • @codebasics
      @codebasics  4 years ago

      Alekha I am glad you liked it 😊

  • @milanms4593
    @milanms4593 3 years ago

    Now I understand this concept. Thank you sir😃

    • @codebasics
      @codebasics  3 years ago

      I am happy this was helpful to you.

  • @sarangabbasi2560
    @sarangabbasi2560 2 years ago +2

    Best explanation... I like the way you give examples using small data to explain how it actually works. 10:20
    No one explains like this... keep doing great work

    • @codebasics
      @codebasics  2 years ago

      Glad you liked it

    • @strongsyedaa7378
      @strongsyedaa7378 2 years ago

      @@codebasics
      I have applied K fold on a linear regression dataset.
      I used different activation functions and then got mean and SE values.
      How do I pick the best model from the k folds?
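On the question above: k-fold scores are for ranking candidates, not for keeping any single fold's fitted model. Once a candidate wins on mean CV score, it is refit on all the data. A sketch with iris as a stand-in dataset and two arbitrary candidate models:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
models = {"logistic": LogisticRegression(max_iter=1000), "svm": SVC()}

# rank candidates by mean cross-validation score
means = {name: cross_val_score(m, iris.data, iris.target, cv=5).mean()
         for name, m in models.items()}
best_name = max(means, key=means.get)

# the winner is then refit on the full dataset for actual use
final_model = models[best_name].fit(iris.data, iris.target)
print(best_name, means)
```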

  • @gezahagnnegash9740
    @gezahagnnegash9740 2 years ago

    Thanks for sharing; it's really helpful

  • @carpingnyland8518
    @carpingnyland8518 2 years ago +6

    Great video, as usual. Quick question: how were you able to get such low scores for SVM? I ran it a couple of times and was getting scores in the upper 90s. So I set up a for loop, ran 1000 different train_test_split iterations through SVM and recorded the lowest score. It came back 97.2%!
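A likely explanation (an assumption, not confirmed in the video): older scikit-learn versions defaulted SVC to gamma='auto', which scores poorly on the unscaled digits data, while the current default gamma='scale' scores in the high 90s. A quick check:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()

# gamma="auto" was the default in older sklearn; "scale" is the default now
old_default = cross_val_score(SVC(gamma="auto"),
                              digits.data, digits.target, cv=3).mean()
new_default = cross_val_score(SVC(gamma="scale"),
                              digits.data, digits.target, cv=3).mean()
print(old_default, new_default)  # "auto" scores far lower on this data
```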

  • @vardhanvishnu618
    @vardhanvishnu618 4 years ago

    Thank you very much for your class. It's very useful for beginners.

    • @codebasics
      @codebasics  4 years ago

      I am happy you liked it Vishnu :)

  • @ignaciozamanillo9659
    @ignaciozamanillo9659 3 years ago +1

    Thanks for the video! I have a question: when you do the cross validation inside the for loop, you use the same folds for all the methods. Does cross_val_score do the same? If not, is it possible to use the same folds in order to get a more accurate comparison?
    Thanks in advance