ROC Curves and Area Under the Curve (AUC) Explained
- Published 28 Sep 2024
- An ROC curve is the most commonly used way to visualize the performance of a binary classifier, and AUC is (arguably) the best way to summarize its performance in a single number. As such, gaining a deep understanding of ROC curves and AUC is beneficial for data scientists, machine learning practitioners, and medical researchers (among others).
SUBSCRIBE to learn data science with Python:
www.youtube.co...
JOIN the "Data School Insiders" community and receive exclusive rewards:
/ dataschool
RESOURCES:
Transcript and screenshots: www.dataschool...
Visualization: www.navan.name/...
Research paper: people.inf.elte...
LET'S CONNECT!
Newsletter: www.dataschool...
Twitter: / justmarkham
Facebook: / datascienceschool
LinkedIn: / justmarkham
excellent explanation, the best that I have seen so far.
Thank you!
Indeed, agreed 100% with ed lee, definitely the best explanation I have seen, much appreciated
You're very welcome!
I was about to type the same comment! Amazing explanation! Thank you for your contribution!
100% agree!!! thanks for the video
I have never seen an explanation of ROC-AUC better than this...thank you so much
Thank you so much! 🙏
Excellent work! Thanks very much Kevin, your video explaining ROC and AUC is the most intuitive one I have ever seen. Before watching this, it was still a little confusing for me , now I have a clear understanding of ROC and AUC.
Great to hear! :)
This is the best video on ROC and AUC that I have seen on YouTube. Great work Data School!
Awesome! Thanks for your kind comment :)
A crisp and clear explanation, Thank you very much.
You're welcome!
after 10 minutes of scratching my head looking at a dozen unlabeled lecture slides, I found this video. Thanks a lot for the clear explanation!
I also now understand why an AUROCC of 0 would be a horrible / "excellent but mislabeled" test
+phector2004 You're very welcome, glad to hear the video was helpful to you!
amazing explanation the amount of information you fit into 14 minutes is magical.
Wow! Thank you so much for your kind words! :)
great visualisation and explanation, made everything so much easier to understand
Awesome! Glad it was helpful to you!
Excellent content. This is by far the most concise, clear explanation I have found yet. Thanks!
Thank you!
Many thanks for this excellent video. You have a great gift for lucidly explaining complex concepts
Thank you so much! 🙌
Nice job. Very well explained!
Thanks!
Went through a couple of videos, and this is by far the best explanation with the most apt visualization to support it. Bookmarking it as reference material for the future in case I get muddled up (which I'm pretty sure I will).
Great to hear! :)
Just fabulous - crystal clear explanation to something I had never really understood. Thank you!
Wow, thanks for your kind words! You are very welcome!
So clear and easy to understand. Thank you
You're very welcome!
Great explanation! I've been struggling with these for some time now. Apparently, all it took was a good visualisation! Thanks a lot!
You're welcome!
Absolutely amazing and intuitive explanation. Thanks a lot
Glad you liked it!
Very clear and easy to understand! Thanks!
You're welcome!
Thank you so much. Truly. You are so appreciated.
🙏
Detailed, simple, and with great scenarios. Thank you very much for this!!!
You're very welcome! Glad to hear it was helpful to you!
Sometimes "less is better".
Crystal clear.. thanks :)
You're very welcome!
Very nice practical example of the ROC; it gave me a clear idea of how I can check my classifier's performance, thank you!
+Alex B. You're very welcome!
Great explanation. Thanks. People like u make the world a better place
+Mohamed Ghoneim Wow, thank you! I'm glad it was helpful to you!
Thank you kind sir. U come to aid during dark times.
You're very welcome! :)
Best explanation I've seen for this topic. Many thanks!
Thanks for your kind words!
Nice way to explain ROC. Thanks very much :)
+Andika Yudha Utomo You're very welcome!
Super stuff. ROC finally explained the way it should be.
+kumtomtum Thanks, I appreciate the compliment!
I've been looking for an explanation like this one for months! Thank you!!
You're very welcome!
This explanation provides aesthetic pleasure to me
Thanks! :)
Excellent explanation!! Very helpful, thank you!
You are very welcome!
Great content! Thank you.
You're welcome!
you are a legend, brother!
Thank you!
Thanks!
Awesome video
Thanks!
Thanks a lot - you make my neurons spike again - :)
Ha! Great to hear :)
Very well explained, thanks!
You're welcome!
finally understand AUC and ROC, excellent!
Great to hear!
Excellent explanation of ROC. However, I am still struggling to understand what AUC actually means. It looks like it stands for: if you randomly choose a red point and randomly choose a blue point, then AUC is the probability that the red point is ranked ahead of the blue point. Is that correct?
Exactly!
The AUC metric depends on the ordering of the probabilities rather than their values.
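That ranking interpretation can be checked numerically. A minimal sketch (the labels and scores below are made up for illustration) compares the direct pairwise-ranking computation against scikit-learn's `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels (1 = positive/"red", 0 = negative/"blue") and predicted probabilities
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.5, 0.7])

# AUC as the probability that a randomly chosen positive outranks a
# randomly chosen negative, counting ties as half a win
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc_by_ranking = wins / (len(pos) * len(neg))

# Matches the area-under-the-curve computation
assert np.isclose(auc_by_ranking, roc_auc_score(y_true, y_score))
```

The two quantities are mathematically identical, which is why AUC is often described as a ranking metric: rescaling the probabilities changes nothing as long as their order is preserved.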
Best explanation I've ever seen!
Thanks very much! :)
thanks. very clear explanation.
You're welcome!
Very well explained! The best explanation.
Thank you very much!
Excellent explanation.
+Vinícius Moreira Thanks!
Wonderful tutorial, Thank you very much!
You're welcome! I'm glad it was helpful to you and appreciate the compliment! Are you currently studying machine learning or another field that uses ROC curves?
Oh, thanks for taking the time to reply to my comment. Yes, I'm currently studying a lot of machine learning and artificial intelligence for my master's degree. I definitely like ML and AI... Keep going with those awesome tutorials! Very, very clear and helpful! Best regards from Brazil.
Ewerlopes Great to hear! Many more tutorials to come :)
Very insightful thank you!
You're welcome!
Amazing Video, Thank you very much
+Alireza Khamesipour You're welcome!
this is the best explanation. wow. you are awesome
Thanks so much for your kind comment!
Thank you. I finally understand this topic!
Excellent reference material, very well explained! Thanks! How do you choose a classification threshold for logistic regression in scikit-learn?
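On the threshold question: scikit-learn's `predict` applies a fixed 0.5 cutoff, but you can apply any threshold yourself to the output of `predict_proba`. A small sketch on synthetic data (the 0.3 threshold is just an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba is the predicted probability of class 1
proba = model.predict_proba(X)[:, 1]

# The default predict() corresponds to a 0.5 threshold ...
default_preds = (proba > 0.5).astype(int)

# ... but lowering the threshold (e.g. to 0.3) trades specificity
# for sensitivity, predicting the positive class more often
custom_preds = (proba > 0.3).astype(int)
```

Which threshold is "right" depends on the relative costs of false positives and false negatives, which is exactly what the ROC curve helps you reason about.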
Amazing explanation! Thank you
You're welcome!
Great Presentation. What tool you are using for the presentation?
+Laeeq Ahmed I used Camtasia Recorder for the screen capture, and did all of the editing in Camtasia Studio.
Great explanation. Just one question: it was mentioned that Logistic Regression provides prediction probabilities (predict_proba) as with Naive Bayes. Is this what distinguishes them as generative models (vs. discriminative models)?
That's a great question! I don't think so, but to be honest, I've never 100% understood the terms generative and discriminative.
Excellent job, Data School, upvoted. But how do you plot the curve _for all_ thresholds? Do you use, e.g., the fact that the curve is concave, above the line y=x, etc., to extend from a few values to the whole curve? Also, is there a way of having an explicit formula for the ROC curve, e.g., f(x)=x^3+x-1 (made up)? I mean, this is not even a function in the strict sense, since for one input (threshold) you could get two outputs.
ROC is the curve for all possible thresholds! No, there is no way to create a formula for an ROC curve.
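To make the "all possible thresholds" point concrete: in practice only the distinct predicted scores matter as thresholds, since the (FPR, TPR) point cannot change between them. scikit-learn's `roc_curve` returns exactly one point per such threshold; this sketch uses the small made-up example from the scikit-learn documentation:

```python
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (fpr, tpr) point per distinct threshold, from strictest to loosest;
# the plotted ROC curve just connects these points
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for t, f, s in zip(thresholds, fpr, tpr):
    print(f"threshold={t}: FPR={f}, TPR={s}")
```

So the curve is a piecewise-linear object built from finitely many points, not a formula, which matches the reply above that no closed-form expression exists in general.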
this was really clear. thank you
+Young Jin Kim You're very welcome!
Dear instructor, how are you? I am doing a diagnostic accuracy study and proposed to use ROC. Would you send a document related to this, please? I need to understand this topic.
Great explanation
Thanks!
Can the ROC Curve cross over the diagonal line? What would this mean? When would this happen? Thanks for the awesome video!
Yes, it can cross over the diagonal line. That would mean that your classifier is doing worse than random guessing. This could happen if you build a model that doesn't have any informative features. Or, it could happen if you make a coding error and reverse your response values. Hope that helps, and thanks for your kind words!
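The "reversed response values" failure mode mentioned above is easy to demonstrate: flipping the labels mirrors the ROC curve across the diagonal, turning an AUC of x into 1 − x. A quick sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.2, 0.3, 0.6, 0.9, 0.4, 0.5])

auc = roc_auc_score(y_true, y_score)

# Accidentally coding the response backwards mirrors the curve,
# so a "worse than random" AUC usually signals a labeling bug
auc_flipped = roc_auc_score(1 - y_true, y_score)
assert np.isclose(auc + auc_flipped, 1.0)
```

This is also why an AUC of 0 describes an "excellent but mislabeled" classifier: it ranks every negative above every positive, so inverting its output would be perfect.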
What can I say!!! another great video …Thank you so much :)
You're very welcome!
Great explanation. Thanx.
+shivam kejriwal You're welcome! Happy to help :)
Thanks! Very well explained.
Thanks for your kind comment!
Thank you for your invaluable explanation. I wonder if I can use the ROC curve to see if for example a test with a dichotomous distribution (yes vs no) is performing well compared to the real outcome!
please advise!
Yes, it seems like an ROC curve would be good for that purpose, if I'm understanding you correctly. Hope that helps!
Great video, thank you! One question: let's consider that our dataset contains a further 500 papers. Those papers are of really bad quality, so the model predicts a probability of admission between 0 and 0.01 for those papers. How will the AUC value change then? From my point of view, it is easy for the model to predict that those papers will not be admitted, because they are of very bad quality. So I was wondering how this will affect the AUC value. Will it increase? In other words, is there a relationship between the prediction performance of the model (AUC value) and the properties of the dataset (many samples which are "easy" to predict)?
I think the short answer is that yes, when a machine learning problem is "easier" (for any reason), you are likely to do better with your evaluation metric (AUC in this case).
Great video Kevin, thanks! But I have a question: it seems the way you choose the threshold (and not what the threshold should actually be) is dependent on the prediction probabilities, as in logistic regression. If so, how can we produce the ROC curve for, say, an SVM, which has seemingly no probabilistic interpretation, in the sense that it doesn't tell us the probability of an observation lying within a class? How do we produce the ROC then? Also, what does AUC signify for performance?
An ROC curve can be created regardless of whether your predicted probabilities are well-calibrated - all that is required is that your model can output predicted probabilities. Hope that helps!
@Data School: thanks Kevin, it does. But my follow-up question then would be: how can we predict these probabilities in the case of SVM or LDA? Thanks again!
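For reference on that follow-up: an ROC curve only needs a score that ranks examples, not a calibrated probability. For SVMs, `decision_function` (the signed margin) serves directly as that score; alternatively, `SVC(probability=True)` fits Platt scaling to produce `predict_proba` at extra training cost. LDA exposes `predict_proba` natively. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=200, random_state=0)

# The SVM margin is not a probability, but it ranks examples,
# which is all that roc_curve / roc_auc_score require
svm = SVC(kernel="rbf").fit(X, y)
auc_margin = roc_auc_score(y, svm.decision_function(X))

# probability=True adds Platt scaling on top of the margin,
# yielding predict_proba at the cost of extra (cross-validated) fitting
svm_p = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
auc_proba = roc_auc_score(y, svm_p.predict_proba(X)[:, 1])
```

The two AUCs are typically very close, since Platt scaling is a monotonic transform of the margin and AUC depends only on ranking.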
Thank you, this helped me a lot!
You're welcome!
great job dude
Raman Sharma Thanks!
Wonderful, it helped me a lot.
Ali Sultan Great to hear!
great sir keep making more videos ...
Thanks!
Can we say a classifier is bad because of a smaller AUC? Or is it because the validation set is bad? How do we handle a validation set with such an overlap?
Whether or not AUC is the appropriate evaluation metric depends on the objective of your model. This page might help you to decide: github.com/justmarkham/DAT8/blob/master/other/model_evaluation_comparison.md
Hope that helps!
I like your video! Thank you:)
You're welcome!
Thanks a lot for the video. One query I have, Is it possible to plot the ROC for Continuous Datasets, also?
ROC curves can only be used for classification problems, meaning ones in which the target value is categorical. However, it doesn't matter whether the training data is made of categorical or continuous data. Hope that helps!
Data School Thanks ... this helps ... I have also come across a few papers on VUC... there was one that described a 3D ROC curve...
excellent, thank you
You're welcome!
Hi, thank you, great explanation. I have a question: can I plot a ROC curve for a multiclass classification problem? Should I use one-vs-all or one-vs-one? I have a dataset with 50 different labels for classification.
With that many classes, I would use one versus all!
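For what it's worth, scikit-learn supports the one-vs-rest approach directly: `roc_auc_score` with `multi_class="ovr"` computes one binary AUC per class and averages them. A sketch on the built-in 3-class iris dataset (the same call works with 50 classes):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)  # 3 classes
model = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-rest: one binary ROC AUC per class, macro-averaged by default
auc_ovr = roc_auc_score(y, model.predict_proba(X), multi_class="ovr")
```

Note this requires the full `predict_proba` matrix (one column per class), not hard class predictions.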
Great video!! Here is a small suggestion. The colour coding for the positive & negative classes in the "will the paper be accepted?" should be reversed, shouldn't it?
Glad you liked the video! I'm not sure why the color coding should be reversed?
Great video, thanks! What do you think is a good metric besides ROC when you have a very unbalanced set? I always use the F1 score, but in this case I haven't been able to get a better score than 0.58 ... what do you recommend?
I think Matthews correlation coefficient (MCC) is also useful in the case of unbalanced datasets: scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html
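A quick sketch of why MCC is useful here: on an imbalanced toy set, plain accuracy rewards a classifier that always predicts the majority class, while MCC exposes it (the data below is fabricated for illustration):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, accuracy_score

# Imbalanced toy labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A useless classifier that always predicts the majority class
y_pred = np.zeros(100, dtype=int)

acc = accuracy_score(y_true, y_pred)     # looks great despite learning nothing
mcc = matthews_corrcoef(y_true, y_pred)  # scikit-learn returns 0 here
```

MCC uses all four cells of the confusion matrix, so it only rewards a classifier that does well on both classes.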
Thanks for the video, it helps me a lot in learning all this stuff.
I have a question about choosing the better models:
--- Say we have Models A & B, with Model A having an overall larger AUC. Is it possible that at a certain threshold level we chose, Model B will in fact do a better job of predicting the results than Model A? If it is possible, then comparing the AUC alone seems not very decisive in choosing models for prediction?
+Alex Yu Great question! It is true that when choosing between models, model B may do a better job meeting your needs even if model A has a higher AUC. Though, it's not really relevant whether model A or model B is better at a specific (shared) threshold, rather the question is how well each model performs at its "best" threshold, with "best" being defined by your needs.
Data School For example, in a medical test setting where a Type 1 error has much more severe consequences than a Type 2 error, I would adopt a low t-value, say 0.3, to avoid Type 1 errors.
So you mean if Model B at t = 0.3 performs better (higher accuracy in predicting the training or testing set) than Model A, then I should use Model B, even if it has a smaller AUC?
You shouldn't choose a threshold independent of a model. Rather, you should look at the ROC curves for both models, and then choose the combination of model and threshold that best meets your needs in terms of sensitivity and specificity.
I see! Thanks for your explanation!
Thank you very much :)
You're welcome!
thank you!
+Xiao Cui You're welcome!
Thanks for the vid!
You're welcome!
awesome! thanks for sharing!
You're very welcome!
Thanks
You're welcome!
Great. Thanks!
+Ahmad Kaako You're welcome!
thanks a lot sir ji
+Sawan Rai You're welcome!
good one!
Thanks!
very good. thanks.
You're welcome!
Maybe you can help direct me to where I can find something about calculating the area under the curve of, say, a chromatograph in R - I am still a bit confused about how to do that.
+Patrick Cavins I'm sorry, I'm not familiar with a chromatograph or the curve it produces. Good luck!
Thank you.
Raju k You're welcome!
great job
+Tomas Hujo Thanks!
Thank you. ^^
You're welcome 😊
The reason why we are here is Dr. Jamal 🌚👋🏻
can anybody explain the limitations of an ROC curve in general terms?
One limitation is that it can only be used with binary classification problems. Does that help to answer your question?
yes sir.
nice
Thanks!
I need to watch this a few more times to understand how it applies to my use-case, but this is a great overall explanation. Thank you for this!
You're welcome!
great explanation ! thank u
You're welcome!
Thank you sir!
You're welcome!
great explanation, thank you!
You're very welcome!
undoubtedly one of the best explanations of the ROC curve!!
Thanks very much! :)
Likely the best explanation I've seen on ROC & AUC curves. Succinct yet thorough. The visualizations were extremely helpful. Nicely done.
Thank you so much for your kind and thoughtful comment! 🙏
Thank you so much for this video. Your logical, cumulative explanation and clear visuals have made the rationale for using ROC curves and AUC far easier to understand. I'll be subscribing to your channel immediately!
Wow, thanks for your very kind comment, and for subscribing! Glad the video was helpful to you :)
Very good explanation!
Thank you!
amazing video, thank you so much!
You're very welcome!
Excellent! I am addicted to watching your vids. Thank you for the amazing work! Could you make some vids on using Tensorflow please? Cheers!
Thanks for your suggestion, and for your kind words! 👍
Just to confirm: at 7:09, the 235 and 125 used as numerators were estimates. If not, how do you generate those values?
+Karim Nasser That's correct, those were estimates only.