ROC Curves and Area Under the Curve (AUC) Explained

  • Published Sep 28, 2024
  • An ROC curve is the most commonly used way to visualize the performance of a binary classifier, and AUC is (arguably) the best way to summarize its performance in a single number. As such, gaining a deep understanding of ROC curves and AUC is beneficial for data scientists, machine learning practitioners, and medical researchers (among others).
    SUBSCRIBE to learn data science with Python:
    www.youtube.co...
    JOIN the "Data School Insiders" community and receive exclusive rewards:
    / dataschool
    RESOURCES:
    Transcript and screenshots: www.dataschool...
    Visualization: www.navan.name/...
    Research paper: people.inf.elte...
    LET'S CONNECT!
    Newsletter: www.dataschool...
    Twitter: / justmarkham
    Facebook: / datascienceschool
    LinkedIn: / justmarkham

COMMENTS • 642

  • @kowloon5731
    @kowloon5731 8 years ago +220

    excellent explanation, the best that I have seen so far.

    • @dataschool
      @dataschool  8 years ago +1

      Thank you!

    • @chrisolivier6415
      @chrisolivier6415 8 years ago +5

      Indeed, agreed 100% with ed lee, definitely the best explanation I have seen, much appreciated

    • @dataschool
      @dataschool  8 years ago +8

      You're very welcome!

    • @eduardojreis
      @eduardojreis 6 years ago

      I was about to type the same comment! Amazing explanation! Thank you for your contribution!

    • @iavorbotev1569
      @iavorbotev1569 6 years ago

      100% agree!!! thanks for the video

  • @prateeksingh812
    @prateeksingh812 3 years ago +1

    I have never seen an explanation of ROC-AUC better than this...thank you so much

  • @twofour1969
    @twofour1969 6 years ago +3

    Excellent work! Thanks very much Kevin, your video explaining ROC and AUC is the most intuitive one I have ever seen. Before watching this, it was still a little confusing for me , now I have a clear understanding of ROC and AUC.

  • @nkt1796
    @nkt1796 7 years ago

    This is the best video on ROC and AUC that I have seen on UA-cam. Great work Data School!

    • @dataschool
      @dataschool  7 years ago

      Awesome! Thanks for your kind comment :)

  • @krish2nasa
    @krish2nasa 5 years ago +3

    A crisp and clear explanation, Thank you very much.

  • @phector2004
    @phector2004 8 years ago

    after 10 minutes of scratching my head looking at a dozen unlabeled lecture slides, I found this video. Thanks a lot for the clear explanation!
    I also now understand why an AUROCC of 0 would be a horrible / "excellent but mislabeled" test

    • @dataschool
      @dataschool  8 years ago

      +phector2004 You're very welcome, glad to hear the video was helpful to you!

  • @NB19273
    @NB19273 5 years ago +1

    amazing explanation the amount of information you fit into 14 minutes is magical.

    • @dataschool
      @dataschool  5 years ago

      Wow! Thank you so much for your kind words! :)

  • @AstroBhavi
    @AstroBhavi 5 years ago +1

    great visualisation and explanation, made everything so much easier to understand

    • @dataschool
      @dataschool  5 years ago

      Awesome! Glad it was helpful to you!

  • @dustinarmstrong7929
    @dustinarmstrong7929 2 years ago

    Excellent content. This is by far the most concise, clear explanation I have found yet. Thanks!

  • @garriedaden4168
    @garriedaden4168 8 months ago

    Many thanks for this excellent video. You have a great gift for lucidly explaining complex concepts

    • @dataschool
      @dataschool  6 months ago

      Thank you so much! 🙌

  • @paulm6084
    @paulm6084 4 years ago +1

    Nice job. Very well explained!

  • @sidharthc5546
    @sidharthc5546 5 years ago

    Went through a couple of videos; this is by far the best explanation, with the most apt visualization to support it. Bookmarking it as reference material for the future in case I get muddled up (which I'm pretty sure I will)

  • @eyeojo
    @eyeojo 8 years ago

    Just fabulous - crystal clear explanation to something I had never really understood. Thank you!

    • @dataschool
      @dataschool  8 years ago

      Wow, thanks for your kind words! You are very welcome!

  • @manueldominguez9157
    @manueldominguez9157 3 years ago

    So clear and easy to understand. Thank you

  • @llaaoopp
    @llaaoopp 7 years ago

    Great explanation! I've been struggling with these for some time now. Apparently, all it took was a good visualisation! Thanks a lot!

  • @kabirkohli4269
    @kabirkohli4269 3 years ago +1

    Absolutely amazing and intuitive explanation. Thanks a lot

  • @jairia2
    @jairia2 8 years ago

    Very clear and easy to understand! Thanks!

  • @crazy4potatos
    @crazy4potatos 3 years ago +1

    Thank you so much. Truly. You are so appreciated.

  • @AnthonyO980
    @AnthonyO980 7 years ago

    Detailed, simple, and with great scenarios. Thank you very much for this!!!

    • @dataschool
      @dataschool  7 years ago +1

      You're very welcome! Glad to hear it was helpful to you!

  • @terigopula
    @terigopula 6 years ago

    Sometimes "less is better".
    Crystal clear.. thanks :)

  • @Alex-cn9ot
    @Alex-cn9ot 8 years ago

    Very nice practical example of the ROC; it gave me a clear idea of how I can check my classifier's performance, thank you!

    • @dataschool
      @dataschool  8 years ago

      +Alex B. You're very welcome!

  • @abokbeer
    @abokbeer 8 years ago

    Great explanation. Thanks. People like u make the world a better place

    • @dataschool
      @dataschool  8 years ago

      +Mohamed Ghoneim Wow, thank you! I'm glad it was helpful to you!

  • @mikesuri4210
    @mikesuri4210 6 years ago

    Thank you kind sir. U come to aid during dark times.

  • @prudvim3513
    @prudvim3513 6 years ago

    Best explanation I've seen for this topic. Many thanks!

    • @dataschool
      @dataschool  6 years ago

      Thanks for your kind words!

  • @andikacsui
    @andikacsui 9 years ago

    Nice way to explain ROC. Thanks very much :)

    • @dataschool
      @dataschool  9 years ago

      +Andika Yudha Utomo You're very welcome!

  • @shre81
    @shre81 8 years ago

    Super stuff. ROC finally explained the way it should be.

    • @dataschool
      @dataschool  8 years ago

      +kumtomtum Thanks, I appreciate the compliment!

  • @valeriaperez-cong9858
    @valeriaperez-cong9858 5 years ago

    I've been looking for an explanation like this one for months! Thank you!!

  • @GhaatakPhysics
    @GhaatakPhysics 7 years ago

    This explanation provides aesthetic pleasure to me

  • @samarjithsathyanarayan1576
    @samarjithsathyanarayan1576 6 years ago

    Excellent explanation!! Very helpful, thank you!

  • @hameddadgour
    @hameddadgour 2 years ago +1

    Great content! Thank you.

  • @adithyapvarma8738
    @adithyapvarma8738 18 days ago

    you are a legend, brother!

  • @miles611
    @miles611 4 years ago +1

    Thanks!
    Awesome video

  • @rameshmaddali6208
    @rameshmaddali6208 8 years ago

    Thanks a lot - you make my neurons spike again - :)

  • @capuleto126
    @capuleto126 6 years ago

    Very well explained, thanks!

  • @tianboqiu840
    @tianboqiu840 5 years ago

    finally understand AUC and ROC, excellent!

  • @rosszhu1660
    @rosszhu1660 5 years ago +1

    Excellent explanation of ROC. However, I am still struggling to understand what AUC actually means. It looks like it stands for: if you randomly choose a red point and randomly choose a blue point, then AUC is the probability that the red point is ranked ahead of the blue point. Is that correct?

  • @fusbertagaudreau4334
    @fusbertagaudreau4334 2 years ago

    The AUC metric depends on the ordering of the probabilities rather than their values.
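The ranking interpretation discussed in this thread can be checked numerically. A minimal sketch (not from the video; uses scikit-learn and made-up scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities (1 = positive class)
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

auc = roc_auc_score(y_true, y_score)

# Brute force: fraction of (positive, negative) pairs where the
# positive is scored higher than the negative (ties count as half)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
pairwise = sum(pairs) / len(pairs)  # equals the AUC
```

Both numbers come out to 8/9 here, which is why only the ordering of the scores matters, not their absolute values.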

  • @Leonustice22
    @Leonustice22 8 years ago +4

    Best explanation I've ever seen!

  • @dennislixin
    @dennislixin 5 years ago +1

    thanks. very clear explanation.

  • @juliaanethmbalilaki1760
    @juliaanethmbalilaki1760 7 years ago

    Very well explained! The best explanation.

  • @viniciusmoreira2374
    @viniciusmoreira2374 8 years ago

    Excellent explanation.

    • @dataschool
      @dataschool  8 years ago

      +Vinícius Moreira Thanks!

  • @Ewerlopes
    @Ewerlopes 9 years ago

    Wonderful tutorial, Thank you very much!

    • @dataschool
      @dataschool  9 years ago

      You're welcome! I'm glad it was helpful to you and appreciate the compliment! Are you currently studying machine learning or another field that uses ROC curves?

    • @Ewerlopes
      @Ewerlopes 9 years ago

      Ow, thanks for replying to my comment. Yes, I'm currently studying a lot of machine learning and artificial intelligence for my master's degree. I definitely like ML and AI... Keep going with those awesome tutorials! Very, very clear and helpful! Best regards from Brazil.

    • @dataschool
      @dataschool  9 years ago

      Ewerlopes Great to hear! Many more tutorials to come :)

  • @Zoronoa01
    @Zoronoa01 3 years ago

    Very insightful thank you!

  • @alirezakhamesipour4858
    @alirezakhamesipour4858 8 years ago

    Amazing Video, Thank you very much

    • @dataschool
      @dataschool  8 years ago

      +Alireza Khamesipour You're welcome!

  • @ehsanjeyhani6607
    @ehsanjeyhani6607 7 years ago

    this is the best explanation. wow. you are awesome

    • @dataschool
      @dataschool  7 years ago

      Thanks so much for your kind comment!

  • @Aj_Bhagz
    @Aj_Bhagz 9 years ago

    Thank you. I finally understand this topic!

  • @Brickkzz
    @Brickkzz 3 years ago

    Excellent reference material, very well explained! Thanks! How do you choose a classification threshold in logistic regression in scikit-learn?
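For what it's worth, scikit-learn doesn't choose the threshold for you: `.predict()` for logistic regression uses a fixed 0.5 cutoff, so a custom threshold is applied manually to the `predict_proba` output. A sketch on synthetic data (not from the video):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data just to have something to fit
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)[:, 1]   # probability of the positive class
threshold = 0.3                        # e.g. to favor sensitivity over specificity
y_pred = (proba >= threshold).astype(int)
```

Lowering the threshold below 0.5 can only increase the number of positive predictions relative to the default `.predict()`.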

  • @Danielabit
    @Danielabit 5 years ago

    Amazing explanation! Thank you

  • @laeeqahmed1980
    @laeeqahmed1980 8 years ago

    Great Presentation. What tool you are using for the presentation?

    • @dataschool
      @dataschool  8 years ago +1

      +Laeeq Ahmed I used Camtasia Recorder for the screen capture, and did all of the editing in Camtasia Studio.

  • @louie-5749
    @louie-5749 5 years ago

    Great explanation. Just one question: it was mentioned that Logistic Regression provides prediction probabilities (predict_proba) as with Naive Bayes. Is this what distinguishes them as generative models (vs. discriminative models)?

    • @dataschool
      @dataschool  5 years ago

      That's a great question! I don't think so, but to be honest, I've never 100% understood the terms generative and discriminative.

  • @fernandojackson7207
    @fernandojackson7207 4 years ago

    Excellent job, Dataschool, upvoted. But how do you plot the curve _for all_ thresholds? Do you use, e.g., the fact that the curve is concave, above the line y=x, etc., to extend from a few values to the whole curve? Also, is there a way of having an explicit formula for the ROC curve, e.g., f(x)=x^3+x-1 (made up)? I mean, this is not even a function in the strict sense, since for one input (threshold) you get two outputs.

    • @dataschool
      @dataschool  4 years ago

      ROC is the curve for all possible thresholds! No, there is no way to create a formula for an ROC curve.
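In practice, "all possible thresholds" reduces to one threshold per distinct predicted score, which is exactly what scikit-learn's `roc_curve` returns. A sketch with made-up values (an illustration, not from the video):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per distinct score; plotting fpr vs. tpr
# traces the whole curve, so no closed-form formula is needed
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```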

  • @youngjinkim5427
    @youngjinkim5427 9 years ago

    this was really clear. thank you

    • @dataschool
      @dataschool  9 years ago

      +Young Jin Kim You're very welcome!

  • @mebratutamir1385
    @mebratutamir1385 2 years ago

    Dear instructor, how are you? I am doing a diagnostic accuracy study and proposed to use ROC. Would you send a document related to this, please? I need to understand this.

  • @davidfield5295
    @davidfield5295 8 years ago

    Great explanation

  • @tesla.8410
    @tesla.8410 8 years ago

    Can the ROC Curve cross over the diagonal line? What would this mean? When would this happen? Thanks for the awesome video!

    • @dataschool
      @dataschool  8 years ago

      Yes, it can cross over the diagonal line. That would mean that your classifier is doing worse than random guessing. This could happen if you build a model that doesn't have any informative features. Or, it could happen if you make a coding error and reverse your response values. Hope that helps, and thanks for your kind words!
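The "reversed response values" case mentioned in this reply is easy to demonstrate: flipping the labels turns an AUC of a into 1 − a, pushing the curve below the diagonal. A sketch with made-up scores (not from the video):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

auc = roc_auc_score(y_true, y_score)              # a decent classifier
auc_flipped = roc_auc_score(1 - y_true, y_score)  # same scores, labels reversed

# auc_flipped == 1 - auc: a "worse than random" curve usually means
# a decent model with a coding error, not a genuinely bad model
```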

  • @mokhadra9289
    @mokhadra9289 8 years ago

    What can I say!!! another great video …Thank you so much :)

  • @shivamkejriwal
    @shivamkejriwal 9 years ago

    Great explanation. Thanx.

    • @dataschool
      @dataschool  9 years ago

      +shivam kejriwal You're welcome! Happy to help :)

  • @Demonithese
    @Demonithese 8 years ago

    Thanks! Very well explained.

    • @dataschool
      @dataschool  8 years ago

      Thanks for your kind comment!

  • @alaamohamed7325
    @alaamohamed7325 7 years ago

    Thank you for your invaluable explanation. I wonder if I can use the ROC curve to see whether, for example, a test with a dichotomous outcome (yes vs. no) is performing well compared to the real outcome.
    Please advise!

    • @dataschool
      @dataschool  7 years ago

      Yes, it seems like an ROC curve would be good for that purpose, if I'm understanding you correctly. Hope that helps!

  • @karldall3357
    @karldall3357 6 years ago

    Great video, thank you! One question: let's say our dataset contains a further 500 papers, and those papers are of really bad quality, so the model predicts an admission probability between 0 and 0.01 for them. How would the AUC value change? From my point of view, it is easy for the model to predict that those papers will not be admitted, because they are of very bad quality. So I was wondering how this affects the AUC value. Will it increase? In other words, is there a relationship between the prediction performance of the model (AUC value) and the properties of the dataset (many samples which are "easy" to predict)?

    • @dataschool
      @dataschool  6 years ago

      I think the short answer is that yes, when a machine learning problem is "easier" (for any reason), you are likely to do better with your evaluation metric (AUC in this case).

  • @susovan97
    @susovan97 7 years ago

    Great video Kevin, thanks! But I have a question: it seems the way you choose the threshold (and not what the threshold should actually be) is dependent on the prediction probabilities, as in logistic regression. If so, how can we produce the ROC curve for, say, an SVM, which has seemingly no probabilistic interpretation, in the sense that it doesn't tell us the probability of an observation lying within a class? How do we produce the ROC then? Also, what does AUC signify for performance?

    • @dataschool
      @dataschool  7 years ago +1

      An ROC curve can be created regardless of whether your predicted probabilities are well-calibrated - all that is required is that your model can output predicted probabilities. Hope that helps!

    • @susovan97
      @susovan97 7 years ago

      @Data School: thanks Kevin, it does. But my follow-up question then would be: how can we predict these probabilities in the case of SVM or LDA? Thanks again!
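In scikit-learn at least, there are two common routes for an SVM (a sketch under that assumption, not from the video): `SVC(probability=True)` fits Platt scaling to produce probabilities, and `roc_curve` also accepts raw `decision_function` scores, since only the ranking of the samples matters.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Option 1: Platt-scaled probabilities
svm = SVC(probability=True, random_state=0).fit(X, y)
proba = svm.predict_proba(X)[:, 1]

# Option 2: an ROC curve only needs scores that rank the samples,
# so the raw margin from decision_function works just as well
scores = SVC(random_state=0).fit(X, y).decision_function(X)
fpr, tpr, _ = roc_curve(y, scores)
```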

  • @swdefrgtaqq
    @swdefrgtaqq 6 years ago

    Thank you, this helped me a lot!

  • @ramansharma8209
    @ramansharma8209 9 years ago

    great job dude

  • @alisultan7978
    @alisultan7978 9 years ago

    Wonderful, it helped me a lot.

    • @dataschool
      @dataschool  9 years ago

      Ali Sultan Great to hear!

  • @harishp6611
    @harishp6611 5 years ago

    great sir keep making more videos ...

  • @linzhu5178
    @linzhu5178 2 years ago

    Can we say the classifier is bad because of the smaller AUC? Or is it because the validation set is bad? What if we have a validation set with that much overlap?

    • @dataschool
      @dataschool  2 years ago

      Whether or not AUC is the appropriate evaluation metric depends on the objective of your model. This page might help you to decide: github.com/justmarkham/DAT8/blob/master/other/model_evaluation_comparison.md
      Hope that helps!

  • @shirleyliao9867
    @shirleyliao9867 7 years ago +3

    I like your video! Thank you:)

  • @CHIRAGPATELthelifesailor
    @CHIRAGPATELthelifesailor 7 years ago

    Thanks a lot for the video. One query I have, Is it possible to plot the ROC for Continuous Datasets, also?

    • @dataschool
      @dataschool  7 years ago +1

      ROC curves can only be used for classification problems, meaning ones in which the target value is categorical. However, it doesn't matter whether the training data is made of categorical or continuous data. Hope that helps!

    • @CHIRAGPATELthelifesailor
      @CHIRAGPATELthelifesailor 7 years ago

      Data School Thanks ... this helps ... I have also come across a few papers on VUC... there was one that wrote about a 3D ROC curve...

  • @burobernd
    @burobernd 7 years ago

    excellent, thank you

  • @nadirkamelbenamara2837
    @nadirkamelbenamara2837 7 years ago

    Hi, thank you, great explanation. I have a question: can I plot an ROC curve for a multiclass classification problem? Should I use one-vs-all or one-vs-one? I have a dataset with 50 different labels for classification.

    • @dataschool
      @dataschool  7 years ago

      With that many classes, I would use one versus all!
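With scikit-learn, the one-vs-rest averaging suggested here is a one-liner via `roc_auc_score(..., multi_class='ovr')`. A sketch (iris's 3 classes stand in for a many-class dataset; not from the video):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)   # 3 classes as a stand-in for many
model = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-rest: each class is scored against all the others,
# and the per-class AUCs are averaged
auc_ovr = roc_auc_score(y, model.predict_proba(X), multi_class='ovr')
```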

  • @BhanutejaAryasomayajula
    @BhanutejaAryasomayajula 5 years ago

    Great video!! Here is a small suggestion. The colour coding for the positive & negative classes in the "will the paper be accepted?" should be reversed, shouldn't it?

    • @dataschool
      @dataschool  5 years ago

      Glad you liked the video! I'm not sure why the color coding should be reversed?

  • @sebastiancaja5707
    @sebastiancaja5707 7 years ago

    Great video, thanks! What do you think is a good metric when you have a very unbalanced dataset, besides ROC? I always use the F1 score, but in this case I haven't been able to get a better score than 0.58. What do you recommend?

    • @dataschool
      @dataschool  7 years ago

      I think Matthews correlation coefficient (MCC) is also useful in the case of unbalanced datasets: scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html
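A sketch of MCC on a made-up imbalanced example (illustration only, not from the video); it summarizes the whole confusion matrix in a single number between −1 and +1:

```python
from sklearn.metrics import matthews_corrcoef

# Made-up imbalanced data: 3 positives out of 10
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

# MCC uses TP, TN, FP and FN together, so it stays informative
# even when one class dominates
mcc = matthews_corrcoef(y_true, y_pred)
```

Here TP=2, FN=1, FP=1, TN=6, giving MCC = (2·6 − 1·1) / √(3·3·7·7) = 11/21.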

  • @alexyu8166
    @alexyu8166 8 years ago

    Thanks for the video, it helped me a lot in learning all this stuff.
    I have a question about choosing the better model:
    --- Say we have Models A & B, with Model A having an overall larger AUC. Is it possible that at a certain threshold level we chose, Model B will in fact do a better job of predicting the results than Model A? If so, then comparing AUC alone seems not very decisive in choosing models for prediction?

    • @dataschool
      @dataschool  8 years ago +1

      +Alex Yu Great question! It is true that when choosing between models, model B may do a better job meeting your needs even if model A has a higher AUC. Though, it's not really relevant whether model A or model B is better at a specific (shared) threshold, rather the question is how well each model performs at its "best" threshold, with "best" being defined by your needs.

    • @alexyu8166
      @alexyu8166 8 years ago

      Data School For example, in a medical test setting where a Type 1 error has much more severe consequences than a Type 2 error, I would adopt a low threshold, say 0.3, to avoid Type 1 errors.
      So you mean if Model B at t = 0.3 performs better (higher accuracy in predicting the training or testing set) than Model A, then I should use Model B, even if it has a smaller AUC?

    • @dataschool
      @dataschool  8 years ago

      You shouldn't choose a threshold independent of a model. Rather, you should look at the ROC curves for both models, and then choose the combination of model and threshold that best meets your needs in terms of sensitivity and specificity.

    • @alexyu8166
      @alexyu8166 8 years ago

      I see! Thanks for your explanation!

  • @Mark-ly5mh
    @Mark-ly5mh 6 years ago

    Thank you very much :)

  • @xiaocui1917
    @xiaocui1917 8 years ago

    thank you!

    • @dataschool
      @dataschool  8 years ago

      +Xiao Cui You're welcome!

  • @ccerrato147
    @ccerrato147 8 years ago

    Thanks for the vid!

  • @shiyuwang
    @shiyuwang 7 years ago

    awesome! thanks for sharing!

  • @durairajarjunan9740
    @durairajarjunan9740 1 year ago

    Thanks

  • @AhmadKaako
    @AhmadKaako 8 years ago

    Great. Thanks!

    • @dataschool
      @dataschool  8 years ago

      +Ahmad Kaako You're welcome!

  • @sawanrai6424
    @sawanrai6424 8 years ago

    thanks a lot sir ji

    • @dataschool
      @dataschool  8 years ago

      +Sawan Rai You're welcome!

  • @amnakhan8516
    @amnakhan8516 5 years ago

    good one!

  • @junxikang1583
    @junxikang1583 6 years ago

    very good. thanks.

  • @patrickcavins3231
    @patrickcavins3231 8 years ago

    Maybe you can help direct me to where I can find something about calculating the area under the curve of something like a chromatograph in R - I am still a bit confused about how to do that.

    • @dataschool
      @dataschool  8 years ago

      +Patrick Cavins I'm sorry, I'm not familiar with a chromatograph or the curve it produces. Good luck!

  • @rajuk4773
    @rajuk4773 9 years ago

    Thank you.

  • @tomash9785
    @tomash9785 8 years ago

    great job

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago

    Thank you. ^^

  • @ashwaq-1390
    @ashwaq-1390 5 years ago

    The reason why we are here is Dr. Jamal 🌚👋🏻

  • @pareshmotiwala
    @pareshmotiwala 8 years ago

    can anybody explain the limitations of an ROC curve in general terms?

    • @dataschool
      @dataschool  8 years ago

      One limitation is that it can only be used with binary classification problems. Does that help to answer your question?

    • @pareshmotiwala
      @pareshmotiwala 8 years ago

      yes sir.

  • @MinhTran-ew3on
    @MinhTran-ew3on 5 years ago

    nice

  • @nickbohl2555
    @nickbohl2555 5 years ago +6

    I need to watch this a few more times to understand how it applies to my use-case, but this is a great overall explanation. Thank you for this!

  • @MarzukiSondoss
    @MarzukiSondoss 7 years ago

    great explanation ! thank u

  • @changli4046
    @changli4046 5 years ago

    Thank you sir!

  • @ahmedovahmed
    @ahmedovahmed 7 years ago

    great explanation, thank you!

  • @bhaumikanirban
    @bhaumikanirban 6 years ago +5

    undoubtedly one of the best explanations of the ROC curve!!

  • @metaprog46and2
    @metaprog46and2 3 years ago +3

    Likely the best explanation I've seen on ROC & AUC curves. Succinct yet thorough. The visualizations were extremely helpful. Nicely done.

    • @dataschool
      @dataschool  3 years ago +2

      Thank you so much for your kind and thoughtful comment! 🙏

  • @Tedworth307
    @Tedworth307 7 years ago +3

    Thank you so much for this video. Your logical, cumulative explanation and clear visuals have made the rationale for using ROC curves and AUC far easier to understand. I'll be subscribing to your channel immediately!

    • @dataschool
      @dataschool  7 years ago

      Wow, thanks for your very kind comment, and for subscribing! Glad the video was helpful to you :)

  • @tuckieteddy
    @tuckieteddy 8 years ago +9

    Very good explanation!

  • @quirkyquester
    @quirkyquester 20 days ago +1

    amazing video, thank you so much!

  • @zennicliffzennicliff
    @zennicliffzennicliff 5 years ago +2

    Excellent! I am addicted to watching your vids. Thank you for the amazing work! Could you make some vids on using Tensorflow please? Cheers!

    • @dataschool
      @dataschool  5 years ago

      Thanks for your suggestion, and for your kind words! 👍

  • @karimnasser7368
    @karimnasser7368 8 years ago +4

    Just to confirm: at 7:09, the 235 and 125 used as numerators were estimates. If not, how do you generate those values?

    • @dataschool
      @dataschool  8 years ago +2

      +Karim Nasser That's correct, those were estimates only.