Uncertainty (Aleatoric vs Epistemic) | Machine Learning

  • Published 30 Apr 2021
  • Machine/deep learning models have been revolutionary over the last decade across a range of fields. However, sometimes we need to consider the uncertainty of models so we can gauge how confident a model is in its predictions. The total uncertainty in a prediction arises from a combination of data (aleatoric) and model (epistemic) uncertainty. Check out the video to understand what these types of uncertainty mean and how the total uncertainty can be decomposed into these two terms.
  • Science & Technology

COMMENTS • 66

  • @1nTh3House
    @1nTh3House 29 days ago

    You explain perfectly! I've been looking for videos about uncertainty and you explained it the best!

  • @nickrhee7178
    @nickrhee7178 6 months ago +1

    Only the seed was changed to get the uncertainty area in the 2D plane, but there are many other sources of uncertainty that should be included to get a more comprehensive picture of the uncertainty.

  • @c0d1ngclips25
    @c0d1ngclips25 3 years ago

    you don't know how helpful your channel is, thank you!!!

  • @PranavAshok
    @PranavAshok 2 years ago +1

    Thanks for the great introduction to this topic! In your explanation about the model uncertainty, you were varying the seeds (and hence indirectly the weights) in order to get all the different models for the same network architecture. Did you choose to do that for the sake of simplicity? Do we also have to think about the various possible model architectures (or alternate models) as well when trying to estimate the model uncertainty more accurately?

    • @TwinEdProductions
      @TwinEdProductions  2 years ago +3

      Thanks for your comment! Yes, by varying the seeds at training time, a specific model architecture will train differently, and hence the variation in its outputs can be interpreted as its model uncertainty. But if we change the model architecture (e.g. ensemble across different models), the model uncertainty will no longer be for a given model architecture but for a framework of models. Hence, I would argue that the model uncertainty will not be more accurate; it will just be a different value, because our meaning of 'model' is now different. For example, consider the case where we ensemble two trained models that have different architectures. One architecture gives very confident predictions while the other leads to less confident predictions. These two models will have large variation in their outputs (think of the outputs as points on a simplex) and hence large model uncertainty. This model uncertainty is not more accurate than that obtained by varying the seeds of a single model architecture; it just represents the model uncertainty for a different notion of 'model' (i.e. the ensembled pair of models). Hopefully that makes sense!
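
      A minimal sketch of the seed-varying idea in code (illustrative only; the dataset, architecture and number of seeds are placeholders, not the video's actual setup):

      # Illustrative sketch: train the same architecture several times, changing only
      # the random seed, and look at how much the predicted class probabilities
      # disagree for a single test point.
      import numpy as np
      from sklearn.datasets import make_moons
      from sklearn.neural_network import MLPClassifier

      X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
      x_test = np.array([[0.5, 0.25]])  # one query point

      member_probs = []
      for seed in range(10):  # 10 "ensemble members", all with the same architecture
          model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                                random_state=seed)  # only the seed changes
          model.fit(X, y)
          member_probs.append(model.predict_proba(x_test)[0])

      member_probs = np.array(member_probs)  # shape (10, 2): points on the simplex
      print("mean prediction:", member_probs.mean(axis=0))
      print("spread across seeds (std):", member_probs.std(axis=0))  # rough model-uncertainty signal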

    • @PranavAshok
      @PranavAshok 2 years ago +2

      @@TwinEdProductions Thanks for the thought-provoking reply. I'm an ML outsider, and I'm trying to understand whether epistemic uncertainty means the same thing to the traditional risk engineering community and to the ML folks. From what I have gathered about epistemic uncertainty from reading the literature (e.g. www.sciencedirect.com/science/article/pii/S0167473008000556 from risk engineering, link.springer.com/article/10.1007/s10994-021-05946-3 from ML), systematic errors in the model, such as consistently underestimating the true value, would be considered epistemic uncertainty. If we stick to a certain architecture, could it be the case that a model under-predicts or is underconfident for whatever random seed we try, and hence we would not be able to capture the epistemic (model) uncertainty properly?
      Alternatively, as the second paper (page 7, footnote) suggests, it could be that there is no consensus on what exactly a model is. I would really appreciate any leads that could give differing views.

  • @ryanyoung6853
    @ryanyoung6853 2 years ago

    Excellent video. Great speaking tempo. Easy to follow.

  • @EigenA
    @EigenA 1 year ago +1

    Great work! Thank you for the presentation.

  • @user-pc6zi6vg1f
    @user-pc6zi6vg1f 2 years ago +3

    Great video! I was wondering, what happens if there exists an "Out of Domain Class" (a class not in the training dataset), but the model, or even the ensemble model, still gives a high-confidence prediction?

    • @TwinEdProductions
      @TwinEdProductions  2 years ago +7

      Great question! In a classification setting, if there is a distributional shift between the data at training time and at test time (e.g. an extreme example of a distributional shift is the presence of a new class at test time that didn't exist at training time), it is natural to expect that the model will be more uncertain about its prediction on an out-of-domain example at test time, as it has not encountered such an example before. Hence, it is unlikely that an ensemble model will give a high-confidence prediction for an out-of-domain class, as each member of the ensemble should not have misinterpreted the out-of-domain class in the same way. Exploring which uncertainty measures can be used for OOD (out-of-domain) detection is in fact a hot research area. We have recently released a public dataset exactly for this purpose: arxiv.org/abs/2107.07455
      I hope that somewhat answers your question!
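
      A toy sketch of the disagreement idea (the per-member probabilities and the threshold below are made up for illustration; this is not the method from the video or the Shifts paper):

      # Illustrative only: flag an input as possibly out-of-domain when the ensemble
      # members disagree strongly about its class probabilities.
      import numpy as np

      # Hypothetical per-member class probabilities for one input, shape (members, classes)
      probs_in_domain = np.array([[0.95, 0.03, 0.02],
                                  [0.93, 0.05, 0.02],
                                  [0.96, 0.02, 0.02]])
      probs_ood = np.array([[0.70, 0.20, 0.10],
                            [0.15, 0.75, 0.10],
                            [0.30, 0.25, 0.45]])

      def disagreement(p):
          """Average per-class standard deviation across ensemble members."""
          return p.std(axis=0).mean()

      THRESHOLD = 0.05  # would be tuned on validation data in practice
      for name, p in [("in-domain", probs_in_domain), ("OOD", probs_ood)]:
          score = disagreement(p)
          print(name, round(score, 3), "-> flag as OOD" if score > THRESHOLD else "-> looks in-domain")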

    • @user-pc6zi6vg1f
      @user-pc6zi6vg1f 2 years ago +1

      @@TwinEdProductions Thank you for your explanation! I will take a look at the paper :)

  • @echolee2686
    @echolee2686 5 months ago

    Thanks for the explanation! Is the model uncertainty here the variance of a Gaussian distribution? Can we define a different total uncertainty?

  • @user-ul4lj6cu9b
    @user-ul4lj6cu9b 1 year ago

    very clear explanation, thank you!!

  • @scarlet113
    @scarlet113 2 years ago

    Thanks for the very clear explanation

  • @InquilineKea
    @InquilineKea 8 months ago

    How much does VC dimension contribute to the uncertainty in each? (And does a high VC dimension adapt itself better to aleatoric uncertainty?) It sounds like a functional analysis thing.

  • @anujshah645
    @anujshah645 2 years ago

    Thanks for the lucid explanation. I have one doubt regarding aleatoric uncertainty. In the paper by Kendall and Gal, the aleatoric uncertainty is obtained by modifying the loss function, so is that aleatoric uncertainty the same as total minus epistemic?

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      Could you point me to which of Yarin's papers you are referring to here? I'll have a look at it then!

    • @casare2022
      @casare2022 10 months ago

      @@TwinEdProductions Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, 30.

  • @MrVaunorage
    @MrVaunorage 2 years ago +1

    Okay, but how do you calculate the epistemic uncertainty? How do you get the Gaussian distribution over the model predictions? Do you absolutely need to sample, or can you get it using the alpha parameters? Thank you

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      Practically, calculating epistemic uncertainty comes down to very simple expressions/algorithms for epistemic uncertainty measures such as mutual information, expected pairwise KL divergence, and reverse mutual information. Implementations of these can be found at: github.com/yandex-research/shifts/blob/main/weather/uncertainty.py
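
      For readers who want these spelled out, here is a self-contained sketch written from the standard definitions (my own illustration, not a copy of the linked uncertainty.py, so details may differ):

      # Illustrative implementation of common ensemble-based uncertainty measures.
      import numpy as np

      EPS = 1e-10

      def entropy(p, axis=-1):
          return -np.sum(p * np.log(p + EPS), axis=axis)

      def uncertainty_measures(probs):
          """probs: array of shape (members, classes) for a single input."""
          m = probs.shape[0]
          total = entropy(probs.mean(axis=0))  # entropy of expected (total uncertainty)
          data = entropy(probs).mean()         # expected entropy (aleatoric / data)
          mutual_info = total - data           # epistemic (knowledge) uncertainty
          # Expected pairwise KL divergence over ordered pairs i != j
          log_p = np.log(probs + EPS)
          pairwise_kl = [np.sum(probs[i] * (log_p[i] - log_p[j]))
                         for i in range(m) for j in range(m) if i != j]
          epkl = float(np.mean(pairwise_kl))
          rmi = epkl - mutual_info             # reverse mutual information (EPKL minus MI)
          return dict(total=total, data=data, mutual_info=mutual_info, epkl=epkl, rmi=rmi)

      # Toy example: three ensemble members that mildly disagree
      probs = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
      print(uncertainty_measures(probs))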

    • @MrVaunorage
      @MrVaunorage 2 years ago

      @@TwinEdProductions In the code you shared, all the functions use the probability outputs, not the alpha parameters of the Dirichlet distribution. In other words, they do not leverage the information behind the Dirichlet distribution to derive epistemic and aleatoric uncertainty, so I don't think it works. Correct me if I am wrong, please.

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      @@MrVaunorage Hi thanks for your comment. I have given a detailed reply to your more recent comment which hopefully answers your queries.

  • @abiramiap
    @abiramiap 2 years ago

    Thank you! This video was very useful

  • @jijie133
    @jijie133 1 year ago

    Great video!

  • @amortalbeing
    @amortalbeing 1 year ago

    Hi, thanks a lot, really appreciate it.
    What book or books should I read, or what videos/courses should I watch, to learn what you explain here?

    • @TwinEdProductions
      @TwinEdProductions  1 year ago +1

      Hi! To gain a good theoretical understanding of this area, I find it helpful to read the following PhD thesis: mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf

    • @amortalbeing
      @amortalbeing 1 year ago

      @@TwinEdProductions thanks a lot really appreciate it🙂

  • @annap1904
    @annap1904 10 months ago

    Great video!! Does the same framework apply to random forests?

    • @TwinEdProductions
      @TwinEdProductions  15 days ago

      Yes it does, as the model doesn't have to be a deep neural network

  • @AbhishekSinghSambyal
    @AbhishekSinghSambyal 2 months ago +1

    Any good resource to read in detail what you explained?

    • @TwinEdProductions
      @TwinEdProductions  15 days ago

      Try the 'prior networks' paper by Andrey Malinin or the PhD thesis of Yarin Gal

  • @paulorjr10
    @paulorjr10 3 years ago +1

    Great video! One question: is the entropy of the mean of predictions the predictive entropy (and thus the total uncertainty)?

    • @TwinEdProductions
      @TwinEdProductions  3 years ago +1

      Thanks! Yes you are correct: the entropy of the mean of the predictions (from different models) is a measure of total uncertainty.

  • @ahmedtech9590
    @ahmedtech9590 1 year ago

    thank you, great video!

  • @MohammedMomen-qn3kc
    @MohammedMomen-qn3kc 1 year ago

    Thank you very much. Very useful.

  • @casare2022
    @casare2022 10 months ago +1

    Clearly explained 😎😎. Please, how do we quantify these uncertainties?

    • @TwinEdProductions
      @TwinEdProductions  10 months ago +1

      Hi! Do you mean how we assess whether our uncertainty methods are actually sensible? If so, there is a whole wealth of literature on this, with many approaches proposed. All approaches basically want to ensure that the chosen uncertainty measure actually correlates with your errors. One popular approach for assessment is error-retention curves. I find the Shifts dataset paper explains this fairly well.
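
      A toy sketch of an error-retention curve, under the assumption that a good uncertainty score should rank the model's errors as most uncertain (synthetic data, illustrative only; see the Shifts paper for the proper protocol):

      # Illustrative only: error rate on the retained (most confident) fraction of predictions.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 1000
      errors = rng.random(n) < 0.2  # True where the model was wrong (toy labels)
      # Toy uncertainty scores that are (noisily) higher on the errors
      uncertainty = rng.normal(loc=errors.astype(float), scale=0.7)

      order = np.argsort(uncertainty)  # most confident predictions first
      for frac in (0.5, 0.7, 0.9, 1.0):
          kept = order[: int(frac * n)]
          print(f"retain {frac:.0%}: error rate {errors[kept].mean():.3f}")
      # If the uncertainty correlates with the errors, the error rate falls as we
      # retain only the most confident predictions.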

    • @casare2022
      @casare2022 10 months ago

      @@TwinEdProductions Oh okay, thank you very much

  • @kassemhussein6037
    @kassemhussein6037 2 years ago

    Could you provide a simple mathematical example of how to run the calculations? I've looked for a simple practical example of how to do the calculations but have only found complex papers.

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      Hi, I can provide you with very simple code which shows exactly how to calculate most of the popular predictive uncertainty measures including: entropy of expected, expected entropy, mutual information, reverse mutual information, expected pairwise KL divergence and (negated) confidence. Is this what you are looking for?

    • @MrVaunorage
      @MrVaunorage 2 years ago +1

      @@TwinEdProductions yes please

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      @@MrVaunorage github.com/yandex-research/shifts/blob/main/weather/uncertainty.py I also linked it directly to your other comment

  • @sak9746
    @sak9746 1 year ago

    How can we calculate both the data and model uncertainty in an already developed model?

    • @TwinEdProductions
      @TwinEdProductions  1 year ago

      Hi, what kind of output do you have from your model? i.e. are you doing regression or classification?

    • @sak9746
      @sak9746 1 year ago

      @@TwinEdProductions Classification. It's a prediction model.

    • @TwinEdProductions
      @TwinEdProductions  1 year ago

      @@sak9746 If it's a single classification model, you will have some probability distribution output over the classes. An estimate of total uncertainty can be obtained by simply calculating the entropy of this probability distribution. You cannot directly get estimates of model uncertainty here with a single model, as model uncertainty essentially measures the disagreement between models, which is impossible to measure with a single model. Hope that helps :)
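
      A tiny illustration of that single-model estimate (the probabilities below are made up):

      import numpy as np

      probs = np.array([0.85, 0.10, 0.05])                # one model's output over 3 classes
      total_uncertainty = -np.sum(probs * np.log(probs))  # predictive entropy
      print(total_uncertainty)                            # ~0.52 nats; higher means less certain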

  • @rodi4850
    @rodi4850 1 year ago

    How do you compute the total uncertainty?

    • @TwinEdProductions
      @TwinEdProductions  1 year ago

      Hi! There are some estimates of total uncertainty that can be calculated from an ensemble of predictions for classification tasks. A common example is the entropy of the expected distribution.

  • @dragolov
    @dragolov 2 years ago

    Deep respect!

  • @MrVaunorage
    @MrVaunorage 2 years ago

    I actually disagree with what you mentioned about the entropy being the total uncertainty. It seems to me that the entropy just refers to a certain type of uncertainty, the aleatoric one, because it is kind of a representation of the distance between the mean and the edges of the triangle. There is another uncertainty though, which is the variance of the Dirichlet distribution, and I do not see it in your explanation.

    • @TwinEdProductions
      @TwinEdProductions  2 years ago

      Hi! So I did not go into the details in the video, as the aim was to be introductory to the area of predictive uncertainty. There are a multitude of predictive uncertainty measures that capture either total, aleatoric or epistemic uncertainty. The entropy of the expectation of the predictions of several ensemble members is in fact a measure of total uncertainty (it might not be intuitively obvious, but it can be demonstrated mathematically). Chapter 3, and specifically Section 3.2, of the following PhD thesis explains this very well mathematically: www.repository.cam.ac.uk/handle/1810/298857
      Basically, following Chapter 3 above, predictive uncertainties based on the probability outputs of ensembles in a classification problem can capture/estimate aleatoric and epistemic uncertainties. Note that the above PhD thesis uses the terminology data and knowledge uncertainties in place of aleatoric and epistemic uncertainties. If you read the many research papers that utilise predictive uncertainties, you will see that the entropy of the expectation of model predictions is indeed a measure of total uncertainty, e.g. arxiv.org/abs/2107.07455.
      I hope that answers your questions and thank you for your comment!
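
      Written out, the decomposition being referred to is the standard one (the notation here is mine, not the video's):

      \[
      \underbrace{\mathcal{H}\Big[\tfrac{1}{M}\sum_{m=1}^{M} P(y \mid x, \theta_m)\Big]}_{\text{total: entropy of expected}}
      \;=\;
      \underbrace{\tfrac{1}{M}\sum_{m=1}^{M} \mathcal{H}\big[P(y \mid x, \theta_m)\big]}_{\text{aleatoric: expected entropy}}
      \;+\;
      \underbrace{\widehat{\mathcal{I}}\big[y ; \theta \mid x\big]}_{\text{epistemic: mutual information}}
      \]

      where the mutual-information term is the ensemble estimate of model (knowledge) uncertainty.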

  • @user-ko9wx5yx1m
    @user-ko9wx5yx1m 10 months ago +1

    Isn't this the same as deep ensembles?

    • @TwinEdProductions
      @TwinEdProductions  15 days ago

      Yes! We are capturing uncertainty here by considering deep ensembles

  • @lifewithaqs3858
    @lifewithaqs3858 2 years ago

    Good