Softmax (with Temperature) | Essentials of ML

  • Published 21 Dec 2024

COMMENTS • 32

  • @ssshukla26
    @ssshukla26 2 years ago +2

    Great to see a new video after so many days... Will watch it afterwards... thank you Sir....

  • @murphp151
    @murphp151 2 years ago +2

    This is brilliant

  • @mrproxj
    @mrproxj 2 years ago +1

    Hi, thanks for this video. Now I know why my classifier always predicted with such high confidence, be it correct or incorrect. Could there be something else other than temperature to solve this? I would like to determine how confident the model is in its prediction. Is temperature the way to go?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      Another technique is called label smoothing. It is related, but it is applied to the ground-truth labels. See proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf
      There is also model calibration, but I have not yet applied it to neural networks. (A short temperature-scaling sketch follows this thread.)

    • @mrproxj
      @mrproxj 2 years ago +1

      Thanks a lot. This will come in very handy!
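
A rough, minimal sketch of the temperature scaling discussed in this thread, assuming NumPy and made-up logit values (not taken from the video):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature: T > 1 flattens the distribution, T < 1 sharpens it."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [4.0, 1.0, 0.5]
print(softmax(logits, temperature=1.0))      # peaked: the classifier looks very confident
print(softmax(logits, temperature=3.0))      # same logits, flatter (less confident) probabilities
```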

  • @krp2834
    @krp2834 2 years ago

    Instead of using the exp function in softmax to make the logits positive, what if we shift the logits by the smallest logit value, e.g. [1, -2, 0] => [3, 0, 2]? This also preserves the relative ordering of the logits. (A numerical comparison follows this thread.)

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      Thanks Prasanna; I forgot to mention that the transformation should be differentiable.

    • @Gaetznaa
      @Gaetznaa 2 years ago

      The operation is differentiable; isn’t it just an ordinary subtraction (by 2 in the example)?

    • @krp2834
      @krp2834 2 years ago

      @@Gaetznaa The min operation, which is required to find the minimum logit to subtract, is not differentiable, I guess.

    • @ssssssstssssssss
      @ssssssstssssssss 2 years ago

      @@krp2834 The min isn't differentiable everywhere, but it is differentiable at the other points. If you do that, though, the minimum value is guaranteed to always get a "probability" of zero, which may not be desirable... It also prevents you from using loss functions like KL divergence or cross entropy. And the resulting values will not be "logits"; I suggest you review the definition of logit.
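
To make the trade-off discussed in this thread concrete, here is a minimal numerical sketch (NumPy assumed; the function names are illustrative) comparing the proposed min-shift normalization with softmax on the example logits [1, -2, 0]:

```python
import numpy as np

def min_shift_normalize(logits):
    """Shift by the smallest logit and renormalize (the scheme proposed above)."""
    shifted = np.asarray(logits, dtype=float) - np.min(logits)
    return shifted / shifted.sum()

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())                  # subtract the max for numerical stability
    return e / e.sum()

logits = [1.0, -2.0, 0.0]
p_shift = min_shift_normalize(logits)        # [0.6, 0.0, 0.4]: the smallest class gets exactly 0
p_soft = softmax(logits)                     # every class keeps a non-zero probability

# A hard zero breaks log-based losses such as cross entropy and KL divergence:
true_class = 1                               # suppose the true class happens to have the smallest logit
with np.errstate(divide="ignore"):
    print(-np.log(p_shift[true_class]))      # inf: the loss is unusable
print(-np.log(p_soft[true_class]))           # finite
```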

  • @lielleman6593
    @lielleman6593 2 years ago +2

    Awesome explanation! Thanks

  • @oguzhanercan4701
    @oguzhanercan4701 2 years ago +2

    Great explanation, thanks a lot

  • @rbhambriiit
    @rbhambriiit 2 years ago +1

    Thanks for making it simple and clear.

  • @SM-mj5np
    @SM-mj5np 2 months ago

    You're awesome.

  • @behnamyousefimehr8717
    @behnamyousefimehr8717 9 months ago

    Good

  • @ninobach7456
    @ninobach7456 1 year ago +2

    This video was one big aha moment, thanks! A lot of weight readjusting

  • @abhishekbasu4892
    @abhishekbasu4892 11 months ago +1

    Amazing Explanation!

  • @peterorlovskiy2134
    @peterorlovskiy2134 1 year ago +1

    Great video! Thank you Kapil

  • @kalinduSekara
    @kalinduSekara 9 months ago +1

    Great explanation

  • @victorsilvadossantos2769
    @victorsilvadossantos2769 5 months ago

    Great video!

  • @zhoudan4387
    @zhoudan4387 6 months ago

    I thought temperature was like getting a fever and saying random things :)

    • @KapilSachdeva
      @KapilSachdeva  6 months ago

      Depends on the context. Here it is about scaling the logits; in LLM APIs it is used to control the stochasticity/randomness of sampling (a small sketch follows).
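
As a rough illustration of temperature as a sampling knob (NumPy assumed; a made-up three-token vocabulary, not an actual LLM API):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from the temperature-scaled softmax distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = [2.0, 1.0, 0.1]                     # pretend next-token scores for 3 tokens
rng = np.random.default_rng(0)
# Low temperature: almost always the argmax. High temperature: noticeably more random.
print([sample_with_temperature(logits, 0.2, rng) for _ in range(10)])
print([sample_with_temperature(logits, 2.0, rng) for _ in range(10)])
```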

  • @HellDevRisen
    @HellDevRisen 7 months ago

    Great video; thank you :)