Great to see a new video after so many days... I will watch it later... thank you, Sir!
🙏
This is brilliant
🙏
Hi, thanks for this video. Now I know why my classifier always predicted with such high confidence, whether it was correct or incorrect. Is there something other than temperature that could solve this? I would like to determine how confident the model is in its prediction. Is temperature the way to go?
Another technique is called label smoothing. It is related, but it is applied to the ground-truth labels. See proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf
There is also model calibration, but I have not yet applied it to neural networks.
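For the temperature part of the question, a minimal NumPy sketch of temperature-scaled softmax (the logits and T values below are made up for illustration; in practice T is usually fit on a held-out validation set):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: T > 1 softens the distribution, T < 1 sharpens it."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [4.0, 1.0, 0.5]         # hypothetical classifier outputs
print(softmax_with_temperature(logits, T=1.0))  # ~[0.93, 0.05, 0.03] -> very confident
print(softmax_with_temperature(logits, T=3.0))  # ~[0.60, 0.22, 0.19] -> softer
```

Note that dividing by T never changes which class is ranked highest; it only changes how sharp the probabilities are.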
Thanks a lot. This will come in handy!
Instead of using the exp function in softmax to make the logits positive, what if we shift the logits by the smallest logit value, e.g. [1, -2, 0] => [3, 0, 2]? This also preserves the relative differences between the logits.
Thanks Prasanna; I forgot to mention that the transformation should also be differentiable.
The operation is differentiable; isn’t it just an ordinary subtraction (by 2 in the example)?
@@Gaetznaa The min operation, which is required to find the minimum logit to subtract, is not differentiable, I guess.
@@krp2834 The min isn't differentiable where two logits are equal, but it is differentiable everywhere else, so that isn't really the blocker. The bigger issue is that the minimum logit will always be mapped to a "probability" of exactly zero, which may not be desirable. It also prevents you from using loss functions like KL divergence or cross-entropy, since they need log-probabilities. And the resulting values would not be "logits" anymore; I suggest you review the definition of logit.
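To make that concrete, a small sketch (using the numbers from the comment above) showing that min-shift normalization always assigns the smallest logit a probability of exactly zero, which then breaks log-based losses:

```python
import numpy as np

def min_shift_normalize(logits):
    """Proposed alternative: shift by the minimum, then divide by the sum."""
    z = np.asarray(logits, dtype=float)
    z = z - z.min()              # the smallest logit always becomes 0
    return z / z.sum()

p = min_shift_normalize([1.0, -2.0, 0.0])
print(p)                         # [0.6, 0.0, 0.4]

# Cross-entropy and KL divergence need log(p); if the true class happens to be
# the smallest logit, log(0) = -inf (NumPy warns) and training breaks.
print(np.log(p))                 # [-0.51, -inf, -0.92]
```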
Awesome explanation! Thanks
🙏
Great explanation, thanks a lot
🙏
Thanks for making it simple and clear.
🙏
You're awesome.
Good
This video was one big aha moment, thanks! A lot of weight readjusting
🙏
Amazing Explanation!
🙏
Great video! Thank you Kapil
🙏
Great explanation
🙏
Great video!
I thought temperature was like getting a fever and saying random things :)
Depends on the context. Here it is about scaling the logits. In LLM APIs it is used to control the stochasticity/randomness of sampling.
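A tiny sketch of that second usage (made-up logits, NumPy only): the logits are divided by T before sampling, so a higher temperature flattens the distribution and makes the sampled outputs more random.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_with_temperature(logits, T=1.0):
    """Divide logits by T, apply softmax, then sample an index from the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                 # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(p), p=p))

logits = [4.0, 1.0, 0.5]         # hypothetical next-token scores
print([sample_with_temperature(logits, T=0.2) for _ in range(5)])  # almost always index 0
print([sample_with_temperature(logits, T=5.0) for _ in range(5)])  # likely a mix of 0, 1, 2
```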
Great video; thank you :)