Bayesian Parameter Estimation

  • Published 9 Jul 2024
  • #BayesTheorem #DataScience #MachineLearning #ArtificialIntelligence

COMMENTS • 6

  • @AnubhabChakraborty-ws6kh   3 years ago   +3

    Aren't A' (a head on the (n+1)-th coin toss) and A (k heads out of n tosses) independent, and P(A'/A) = P(A') = ρ, since every coin toss is an independent event? What am I missing here?
    The notation P(A/ρ) looks a bit odd: A is an event and ρ is a probability. Aren't both supposed to be events in Bayes' theorem? Also, isn't P(A/ρ) = nCk · ρ^k · (1-ρ)^(n-k) at 9:35?

    • @EvolutionaryIntelligence   3 years ago

      Good questions! In general, the probability of getting k heads in n tosses does have the nCk factor. But here we are referring to one specific sequence of n tosses with k heads, since you performed the experiment only once and now want to predict the probability of a head on the next toss.
      In P(A/ρ), ρ represents the event that the coin's head probability takes a particular value. So the value that the random variable takes is itself another random variable! You can think of this as nested random variables.
      A' and A are independent events, but our ignorance of ρ makes them connected. If you knew the value of ρ, P(A'/A) would just be ρ. But since we do not know what value ρ takes, it has to be estimated from the event A (a small numerical sketch of this follows below).
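
Below is a minimal numerical sketch of this reply (not from the video): it assumes a uniform prior f(ρ) = 1 on [0, 1] and illustrative values n = 10, k = 7, and computes P(A'/A) by integrating ρ against the posterior f(ρ/A).

```python
import numpy as np

# Sketch assumptions: uniform prior f(rho) = 1 on [0, 1]; n and k below are
# illustrative numbers, not values taken from the video.
n, k = 10, 7                            # observed tosses and observed heads
rho = np.linspace(0.0, 1.0, 10001)      # grid over the unknown head probability

# For one specific sequence with k heads, P(A | rho) = rho^k * (1 - rho)^(n - k)
# (no nCk factor, because the order of the tosses is fixed).
likelihood = rho**k * (1.0 - rho)**(n - k)
prior = np.ones_like(rho)               # uniform prior f(rho)
posterior = likelihood * prior
posterior /= np.trapz(posterior, rho)   # normalise to obtain f(rho | A)

# P(A' | A) = E[rho | A]; with a uniform prior this equals (k + 1) / (n + 2).
p_next_head = np.trapz(rho * posterior, rho)
print(p_next_head, (k + 1) / (n + 2))   # both are ~0.6667
```

With a uniform prior this reproduces Laplace's rule of succession, (k + 1)/(n + 2), rather than the naive estimate k/n.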

  • @KiranD-xt8sq   3 years ago

    If we knew the likely nature of the event, i.e., if we had some clarity on what ρ might be, we could just use a Gaussian distribution for f(ρ), with the mean at the most likely ρ value and the standard deviation reflecting how sure we are that it is the real ρ value (a large deviation if we are very unsure, a small one if we are fairly sure). I guess this would be a better representation than the uniform distribution.

    • @EvolutionaryIntelligence   3 years ago

      Yes, it's always better to use an f(ρ) that reflects our knowledge of ρ. But for a coin-toss experiment, since the probability of a head can only be between 0 and 1, we have to choose a distribution that is non-zero only on this domain. As mentioned in the video on MAP, the Beta distribution is often used in such cases (a short sketch follows below).
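
A short sketch of this point (the prior pseudo-counts a, b and the data n, k are illustrative, not from the video): a Beta prior lives entirely on [0, 1] and is conjugate to the coin-toss likelihood, so the posterior is available in closed form.

```python
from scipy import stats

# Sketch assumptions: Beta(a, b) prior with illustrative pseudo-counts, and
# illustrative data n, k (none of these numbers come from the video).
a, b = 2.0, 2.0          # prior "pseudo heads" and "pseudo tails"
n, k = 10, 7             # observed tosses and observed heads

# Conjugacy: the posterior f(rho | A) is again a Beta distribution.
posterior = stats.beta(a + k, b + (n - k))   # Beta(a + k, b + n - k)

print(posterior.mean())          # posterior mean (a + k) / (a + b + n) = 9/14 ≈ 0.643
print(posterior.interval(0.95))  # central 95% credible interval for rho
```

A sharply peaked Beta (large a + b) plays the role of the confident Gaussian suggested in the comment above, without putting any probability outside [0, 1].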

  • @VijayChakravarty-pz4qv   3 years ago

    I don't understand the form of Bayes' theorem used in the video, with both a frequency (density) term and probability terms appearing in the same formula. This happens because ρ has been taken as a RV while A is just an event, so ρ has a frequency distribution while A has a probability. I have come across derivations of Bayes' theorem where both A and ρ are RVs or both are events. Could you attach a link where this mixed event/RV form of Bayes' theorem is derived?
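
For reference, the mixed form being asked about is usually written with a density for ρ and ordinary probabilities for the event A; the following is a sketch of the standard statement in the notation of this thread, not a quotation from the video.

```latex
% Mixed form of Bayes' theorem: a density f(rho) for the parameter,
% ordinary probabilities for the event A (the observed sequence).
\[
  f(\rho \mid A) \;=\; \frac{P(A \mid \rho)\, f(\rho)}{P(A)},
  \qquad
  P(A) \;=\; \int_{0}^{1} P(A \mid \rho)\, f(\rho)\, \mathrm{d}\rho .
\]
% The prediction for the next toss then averages rho over the posterior density:
\[
  P(A' \mid A) \;=\; \int_{0}^{1} \rho\, f(\rho \mid A)\, \mathrm{d}\rho .
\]
```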