This workshop is an absolute goldmine! Everything is very clear. But I have the same question as Matt: Is there ever a situation where we wouldn't want to calibrate a model?
There are some scenarios where you might make a decision based on a threshold that doesn't depend on a probability. For example, in fraud detection you might decide to flag the 10 riskiest transactions each day. Or you might choose a threshold based on precision / recall considerations. So it may not always be necessary. And calibrating very low probabilities (like 1 in 10,000 or 1 in 100,000) can be very difficult. Hope that helps clarify!
This playlist is fantastic! Very useful information! :)
Glad it was helpful!
Great video, thanks very much
Glad you liked it!
Thank you for the workshop. I looked through the comments for the reasons why we might not calibrate a model, and you say that can be the case, for example, in fraud detection when we want to flag the 10 riskiest transactions each day. But if I want all transactions with a probability above 0.8, I would need calibration, correct? Thank you for your answer.
Right, so my point was that if you don't need the actual probabilities, it may not be worth the trouble to calibrate. But if you want to make a decision that depends on the probability (such as the one you describe), then that would be a reason to calibrate.
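To make the contrast concrete, here is a minimal sketch (with made-up scores) of the two kinds of decision rules discussed above: a rank-based rule that only needs the ordering of the scores, and a probability-threshold rule where calibration matters.

    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.random(1000)  # hypothetical model scores for today's transactions

    # Rank-based rule: flag the 10 riskiest transactions.
    # Only the ordering matters, so calibration is not required.
    top10_idx = np.argsort(scores)[-10:]

    # Probability-threshold rule: flag everything with P(fraud) > 0.8.
    # Here the scores are read as probabilities, so they should be calibrated first.
    flagged_idx = np.where(scores > 0.8)[0]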
Many thanks for your video. A question on how to calculate the predicted confidence: are we looking at the softmax score of the predicted label to get the predicted probability (conf)? For example, suppose I have three classes (cat, tiger, and dog) and I feed my model a cat image, but the model predicts dog with a softmax score of 0.8 and cat with a softmax score of 0.2. Which softmax score do I use to assign the example to a specific bin and calculate the average confidence of that bin? Thank you!
OK, so the Coarsage algorithm outputs a vector of 17 numbers for each case. These numbers represent a probability distribution across 17 "classes", the score being 0, 1, 2, 3, ..., up to 16 (or more). So to get the probability of, say, exactly 3, you would look at the 4th number (i.e. Python index 3).
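As a minimal illustration of that indexing, with a made-up probability vector standing in for the model's output:

    import numpy as np

    # Hypothetical output for one case: probabilities for scores 0 through 16
    probs = np.full(17, 1 / 17)

    p_exactly_3 = probs[3]          # probability that the score is exactly 3
    p_at_most_3 = probs[:4].sum()   # probability that the score is 3 or less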
Many thanks for your video. A question: can I first find a model's best hyperparams using RandomizedSearchCV, then create a new model with those hyperparams, without fitting it, and use it for probability calibration? Are the hyperparams found with RandomizedSearchCV still valid if I do this?
Hi - thanks for the question. I'm not sure what you mean by "creating a new model with those hyperparams, *without fitting it*..." You can't do much with a model if it is not fit. But I'm probably missing something.
I've got a discord server called "numeristical" (just getting it started) but a question like this would be perfect for that venue. If you could join and post your question there, we can have a longer discussion. Here's a link to join: discord.gg/HagqzZa8 (will expire in 7 days). Thanks!
@numeristical Thanks for your reply. I was told that there are two ways to use CalibratedClassifierCV:
1. model = XGBClassifier(); model.fit(X_train, y_train); then create the CalibratedClassifierCV with cv='prefit' and fit it on a validation set.
2. model = XGBClassifier(), left unfitted; then create the CalibratedClassifierCV with cv=5 (for example) and fit it on the training set.
In the second case, can I use the hyperparams found with RandomizedSearchCV?
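For anyone reading along, here is a rough sketch of the two usages described in this comment, on synthetic placeholder data. The exact argument names and the availability of cv='prefit' depend on your scikit-learn version, so treat this as an illustration rather than a definitive recipe.

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    # Usage 1: fit the model first, then calibrate on a separate validation set
    model = XGBClassifier()
    model.fit(X_train, y_train)
    calib1 = CalibratedClassifierCV(model, method='isotonic', cv='prefit')
    calib1.fit(X_valid, y_valid)

    # Usage 2: pass an unfitted model and let CalibratedClassifierCV do internal CV
    calib2 = CalibratedClassifierCV(XGBClassifier(), method='isotonic', cv=5)
    calib2.fit(X_train, y_train)

    probs = calib1.predict_proba(X_valid)[:, 1]  # calibrated P(y = 1)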
Hmmm - I am not too familiar with the sklearn CalibratedClassifierCV. I remember it was unnecessarily complicated to use (looks like you are finding that as well). Instead, I would just fit your model normally (using whatever hyperparameter search works best), and then calibrate it with one of the methods I illustrate in this lesson. You can use the code in the notebook as a template. Hope this helps!
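In the same spirit as the reply above, here is one standard recipe for that workflow: fit the model with the tuned hyperparameters, then fit a separate calibrator on a held-out set. This sketch uses isotonic regression purely for illustration (it is not necessarily the method shown in the workshop notebook), and best_params is a placeholder for whatever RandomizedSearchCV found.

    from sklearn.datasets import make_classification
    from sklearn.isotonic import IsotonicRegression
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)

    best_params = {}  # placeholder for the hyperparams found by RandomizedSearchCV
    model = XGBClassifier(**best_params)
    model.fit(X_train, y_train)

    # Fit the calibrator on the model's uncalibrated probabilities for a held-out set
    raw_probs = model.predict_proba(X_calib)[:, 1]
    calibrator = IsotonicRegression(out_of_bounds='clip')
    calibrator.fit(raw_probs, y_calib)

    # At prediction time: raw model probability -> calibrated probability
    calibrated_probs = calibrator.predict(model.predict_proba(X_calib)[:, 1])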
@numeristical OK, many thanks.
What algorithms are better candidates to predict probabilities for binary outputs in multivariate complex models?
Gradient Boosting is still my "go-to" all-purpose algorithm for any kind of structured data.
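For a concrete starting point, here is a minimal sketch of fitting a gradient boosting model and pulling out binary-class probabilities, using scikit-learn's HistGradientBoostingClassifier on synthetic data. Any gradient boosting implementation with predict_proba would work the same way, and the output is still worth checking for calibration.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    clf = HistGradientBoostingClassifier()
    clf.fit(X_train, y_train)

    # Predicted P(y = 1) for each validation example; check calibration before trusting it
    probs = clf.predict_proba(X_valid)[:, 1]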
A very good explanation, thank you very much. I have an inquiry regarding probability calibration: I've read that we can find transition probabilities between states using a calibration technique, but I don't understand how it works. I would like to know more about this technique, and I would really appreciate it if you could assist me on this matter. Thank you again.
Thanks for your message. Calibration is used to "correct" probabilities when you have data. So if you have transition probabilities and then actual data, you could potentially use that data as a calibration set.
Can this be applied to real-time data such as IoT sensors? I'm having difficulty figuring out the equations I would need to code this for sensor data.
The source of the data doesn't matter, just as long as you have scores that you want to calibrate and an appropriate calibration set.
I can't run the line %matplotlib inline; I get "error: invalid syntax". Can you help me fix it?
It's only used if you are in a jupyter notebook. You can just comment it out if it is giving you trouble.
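For example, if you are running the notebook code as a plain .py script, you can drop the magic line and show the figure explicitly (a tiny sketch):

    # %matplotlib inline   # IPython/Jupyter magic; not valid Python syntax in a plain script
    import matplotlib.pyplot as plt

    plt.plot([0, 1], [0, 1])
    plt.show()  # in a script, call show() explicitly instead of relying on the magic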
@numeristical Thank you so much, I forgot to open the Jupyter notebook in VS Code.
In one of your examples with an imbalanced data set you make a bigger bin toward the right hand side of the reliability curve to account for fewer observations. The bin average might show as calibrated even though the individual predictions within that bin might be all over the place. How can one conclude in that case that that bin is well calibrated? In that case any given prediction may be far off the average which would suggest that it isn't well calibrated in that range. Am I looking at this incorrectly? Thanks in advance.
Right, so this is the fundamental problem with binning - you are averaging the results of predictions with different probabilities. The wider the bin, the more granularity you are losing. So you're right - if you don't have a lot of examples of predictions with a particular value (or in a range of values), you can't really conclude that the bin is well-calibrated. To make an analogy to hypothesis testing, the best you can say is that you "fail to reject the null hypothesis that the probabilities in the bin are well-calibrated", but your test will not have much power.
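Here's a small illustration of that point, using made-up numbers: a wide bin whose average looks well calibrated even though every individual prediction inside it is off.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical wide bin: half the predictions are 0.55 (true rate 0.75),
    # the other half are 0.85 (true rate 0.65)
    preds = np.concatenate([np.full(5000, 0.55), np.full(5000, 0.85)])
    true_p = np.concatenate([np.full(5000, 0.75), np.full(5000, 0.65)])
    outcomes = rng.binomial(1, true_p)

    print("mean predicted probability in bin:", preds.mean())     # 0.70
    print("observed event rate in bin:       ", outcomes.mean())  # also ~0.70
    # The bin average looks calibrated, yet each prediction is off by about 0.2.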
Why wouldn't you ALWAYS calibrate probabilities for models?
There are some scenarios where you might make a decision based on a threshold that doesn't depend on a probability. For example, in fraud detection you might decide to flag the 10 riskiest transactions each day. Or you might choose a threshold based on precision / recall considerations. So it may not always be necessary. And calibrating very low probabilities (like 1 in 10,000 or 1 in 100,000) can be very difficult. Hope that helps clarify!
@numeristical Yes, very helpful. Thank you.