26 - Prior and posterior predictive distributions - an introduction

  • Published 30 Jun 2024
  • This video provides an introduction to the prior and posterior predictive distributions.
    If you are interested in seeing more of the material, arranged into a playlist, please visit: • Bayesian statistics: a... Unfortunately, Ox Educ is no more. Don't fret, however, as a whole load of new YouTube videos are coming soon, along with a book that accompanies the series: www.amazon.co.uk/gp/product/1... Alternatively, for more information on this video series and Bayesian inference in general, visit: ben-lambert.com/bayesian-lect... For more information on econometrics and Bayesian statistics, see: ben-lambert.com/

COMMENTS • 41

  • @looploop6612
    @looploop6612 6 years ago +185

    The more you learn, the more confused you get

  • @NehadHirmiz
    @NehadHirmiz 7 years ago +7

    These are excellent lectures. Thank you for all your nice work

  • @jinrussell4013
    @jinrussell4013 8 years ago +2

    Fantastic. Thank you! I had been struggling to understand this until I watched this video.

  • @olichka1601
    @olichka1601 20 days ago

    Cool video! I finally understood it! There is not much about this topic on the web... thank you!

  • @AnasHawasli
    @AnasHawasli 5 months ago +1

    I keep coming back to this video.
    Really good, thank you!!

  • @ArdianUmam
    @ArdianUmam 6 years ago +3

    This is very clear and intuitive. Thanks a lot, Sir.
    By the way, what software do you use in this video? Like Notability in iOS? I need to find this kind of software, and haven't found it yet.

  • @theforestgardener4011
    @theforestgardener4011 3 years ago +7

    I don't understand what theta is, so this was confusing.

  • @sulgrave_
    @sulgrave_ 7 years ago +23

    Full Text: In this video I want to explain the concepts of prior and posterior predictive distributions. If we start off with the prior predictive distribution, what exactly do we mean by this concept? Well, it's quite simple really: it's just the distribution of data which we think we are going to obtain before we actually see the data. An example here might be, let's say we're flipping a coin 10 times, and every time a heads comes up we call that a value of 1, and every time a tails comes up we call that a value of 0. If we're flipping it 10 times and we think that the coin is relatively fair, then our frequency distribution for the sort of values we think we might obtain might look something like this yellow line which I'm drawing here, relatively centered around the 5 mark. And by frequency distribution I really just mean a probability distribution, so our PDF looks something like this. So this would be our prior predictive distribution, and it is based on our prior knowledge about the situation.

    But how can we actually calculate the prior predictive distribution? The idea is that what we're trying to obtain is the probability of our data, the probability of Y in this circumstance, and this is a marginal probability. We know that we can get to a marginal probability by integrating out all dependence on theta, our parameter, from the joint probability of Y and theta. So if we integrate over the entire range in which theta can sit (theta being an element of the set capital Theta), we're removing the theta dependence and we're just left with a marginal probability. Furthermore, we know that we can rewrite this, because the rule of conditional probability (not so much Bayes' rule) tells us that the conditional probability of Y given theta is equal to the joint probability of Y and theta divided through by the probability of theta. That means we can multiply through by the probability of theta, our prior, and that gives us our joint probability. So this allows us to rewrite the integral as the integral, across the whole range of theta, of the likelihood (the probability of Y given theta) times our prior. That's how we can get our prior predictive distribution: take the likelihood, multiply it by the prior, and integrate over all parameter choices.

    So that's the prior predictive distribution. What is meant by the posterior predictive distribution? The idea here is that this is the value of data we would expect to obtain if we were to repeat the experiment after we have seen the data from our current experiment; it's what sort of value we would predict if we were to run the experiment again. The idea is that after we have flipped our coin, let's say 10 times, and let's say it comes up heads nine of those times, that might lead us to expect that the coin is in fact biased. So if we were to flip the coin another 10 times, we might expect that the value which our coin comes up with in those 10 times might be something towards 9. So this mauve line here would be our posterior predictive distribution, and just like the prior predictive distribution it is a valid probability distribution, so the area underneath this curve should integrate to one. I know the way I've drawn them here it doesn't look like that, but they both should integrate to the same value of one.

    So how do we calculate this? The idea is that we're trying to calculate the probability of a certain value of new data, which I'm calling here Y prime, given that we have observed the current data Y. We can sort of forget about this conditioning and just remember that essentially this is a marginal probability, even though it's kind of conditional, and we can get that marginal by integrating out the joint probability of Y prime and theta (remembering that we're still conditioning on Y) across the whole range of theta. Just like before, we can use the rule of conditional probability to rewrite this as the integral of the probability of Y prime given theta and also Y, times the probability of theta given Y, integrating over the whole range of theta. And what do we have here inside our integral? Well, the second term here is just the posterior distribution which we actually obtained from doing our experiment in the first place. And what is this first term? It looks a little bit more complicated, until you realize that normally, when you condition on theta, the parameter, our new observation is independent of the old observations, because theta tells you everything you need to know about the new observations. So normally we can remove that conditioning, and now this is simply a likelihood. The idea is that if you take the likelihood, multiply it by the posterior, and integrate over all parameter ranges, that gives you the posterior predictive distribution, which is the distribution of observations that we would expect for a new experiment given that we have observed the results of our current experiment.
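    The two integrals described above have a closed form for this coin example when the prior on the heads probability theta is a Beta distribution. Below is a minimal Python sketch of that calculation, assuming a Beta(2, 2) prior and the 9-heads-out-of-10 data from the video; the prior hyperparameters are illustrative assumptions, not something the video specifies.

    from scipy.stats import betabinom

    n_flips = 10          # flips per experiment
    a0, b0 = 2.0, 2.0     # assumed Beta prior hyperparameters for theta

    # Prior predictive: p(y) = integral of p(y | theta) * p(theta) dtheta.
    # With a Binomial likelihood and a Beta prior this integral works out
    # to the Beta-Binomial distribution.
    prior_predictive = betabinom(n_flips, a0, b0)

    # Data from the current experiment: 9 heads out of 10 flips.
    heads, tails = 9, 1

    # The posterior over theta is Beta(a0 + heads, b0 + tails), so the
    # posterior predictive p(y' | y) = integral of p(y' | theta) * p(theta | y) dtheta
    # is again Beta-Binomial, now with the updated parameters.
    posterior_predictive = betabinom(n_flips, a0 + heads, b0 + tails)

    for k in range(n_flips + 1):
        print(f"heads={k:2d}  prior pred={prior_predictive.pmf(k):.3f}  "
              f"posterior pred={posterior_predictive.pmf(k):.3f}")

    Printing the two columns shows the behaviour sketched in the video: the prior predictive mass sits around 5 heads (the yellow curve), while the posterior predictive mass shifts towards 9 heads (the mauve curve).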

  • @phipag1997
    @phipag1997 2 years ago

    Great explanation, thank you!

  • @camelcamel7772
    @camelcamel7772 7 years ago +3

    Great job.

  • @thecosmetickoala7145
    @thecosmetickoala7145 5 years ago

    Great explanation. Thank you :)

  • @sepidet6970
    @sepidet6970 6 years ago

    About the posterior probability: when you applied the conditional probability rule, how did it change? Shouldn't it be divided by p(y)? I do not understand this part.

  • @PedroRibeiro-zs5go
    @PedroRibeiro-zs5go 4 years ago

    Thanks, very well explained

  • @moliv8927
    @moliv8927 3 years ago

    Explained well, thank you

  • @thischannelhasaclevername5481
    @thischannelhasaclevername5481 4 years ago +8

    Very well explained!
    Didn't get it though

  • @dough4nut
    @dough4nut 4 years ago +2

    I'm a little confused: where does P(Y | theta) come from if we're calculating the prior predictive probability distribution before seeing the data? If it is the likelihood (as is said at 2:35), that means we've already seen the data.

    • @koochibabua1904
      @koochibabua1904 4 years ago +1

      Your doubt is correct. Likelihood assumes that there is some data beforehand, and in this case the "prior" acts as the data. "Prior" does NOT mean "no result has been seen"; it means "data before the next trial". If I know before tossing a coin that the coin is biased with P(heads) = 0.6, that is data without any previous trial. If I have flipped a coin once and got 1 head, then for the next flip this data acts as the prior.

  • @briansalkas349
    @briansalkas349 1 year ago

    Is P(y, theta) the same as P(y and theta), where they typically use an upside-down U for the "and"?

  • @ywk7282
    @ywk7282 3 years ago

    1. I am confused: are P(y) and P(theta) both priors?
    2. And is p(y'|y) the posterior, or is p(theta|y) the posterior?

  • @tuber12321
    @tuber12321 6 years ago

    At 2:37 you call P(theta) the "prior" and use it to get our "prior predictive distribution." Are these two different things? It would be helpful to explain this.

    • @tuber12321
      @tuber12321 6 years ago

      Ah I see. Indeed the "prior predictive distribution" is different from the "prior distribution," and it's the whole point of this video :)

  • @hohinng196
    @hohinng196 3 years ago +1

    Isn't P(Y) the denominator in Bayes' rule?

  • @NotLegato
    @NotLegato 1 year ago

    Fantastic. I was having a bit of trouble with the manipulation of the conditioning variables; this cleared it up.

  • @mattetis
    @mattetis 7 years ago +8

    Which one is the posterior?
    p(y'|y) or p(θ|y)?

    • @fpereira77
      @fpereira77 7 years ago +4

      This is tricky as hell. I've been studying Bayesian inference for the past week for some experiments I need to do and I still get confused.
      My understanding is that p(θ|y) is the posterior distribution you actually have after you did the experiment and observed the results.
      On the other hand, p(y'|y) is the distribution of results you expect to get if you repeat the coin flip.
      So both are "posterior".

    • @tuber12321
      @tuber12321 6 years ago +8

      p(y'|y) is the "posterior predictive distribution" and p(θ|y) is the "posterior distribution."
      Similarly, p(θ) is the "prior distribution" and p(y) is the "prior predictive distribution."
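      One way to keep these objects apart is to write each of them down for the coin example. The short sketch below uses scipy and assumes a Beta(2, 2) prior on theta with 9 heads observed out of 10 flips; those numbers are illustrative assumptions, not something fixed by the video.

      from scipy.stats import beta, betabinom, binom

      a0, b0 = 2.0, 2.0               # assumed prior hyperparameters
      heads, tails, n_new = 9, 1, 10  # observed data and size of the next experiment

      prior = beta(a0, b0)                                       # p(theta): prior distribution
      likelihood = binom(n_new, 0.9)                             # p(y | theta): likelihood for one particular theta (here 0.9)
      posterior = beta(a0 + heads, b0 + tails)                   # p(theta | y): posterior distribution
      prior_pred = betabinom(n_new, a0, b0)                      # p(y): prior predictive distribution
      posterior_pred = betabinom(n_new, a0 + heads, b0 + tails)  # p(y' | y): posterior predictive distribution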

  • @lemyul
    @lemyul 4 years ago

    thank you

  • @elsidiegbelhaj2016
    @elsidiegbelhaj2016 6 years ago

    Thanks

  • @parametersofstatistics2145
    @parametersofstatistics2145 5 years ago

    Sir, I desperately need to know: if
    X1, X2, X3, X4 are iid variates with mean mu and variance 2, where mu ~ N(0, 1/2),
    and you find the posterior for mu, what will be its mode, median, and variance?

  • @MattyHild
    @MattyHild 4 years ago

    Isn't your posterior actually a predictive posterior equation, not the true posterior of the parameters?

  • @mikahebat
    @mikahebat 9 years ago +1

    Does the theta here mean the population mean?
    If so, then what does P(Y, theta) mean?

    • @MarkovChains223
      @MarkovChains223 8 years ago +1

      +Michael Surjawidjaja Theta represents any population parameter of interest. So, that could be a mean, a median, a variance, etc.
      P(Y, theta) is the *joint* probability density function of the variables Y and theta.

  • @esteelee5935
    @esteelee5935 7 years ago +5

    I don't really get it

  • @alexander53
    @alexander53 2 years ago +2

    I feel like you already need to understand Bayes' theorem to get anything from this video. This is more of a "review", not really an introduction.

  • @tachyon7777
    @tachyon7777 4 years ago +4

    Not a very good video for anyone who isn't already halfway to understanding it.

  • @glaswasser
    @glaswasser 4 years ago +3

    This is too abstract, sorry.