Frequentism and Bayesianism: What's the Big Deal? | SciPy 2014 | Jake VanderPlas

COMMENTS • 90

  • @zecheng3771
    @zecheng3771 5 years ago +73

    OMG, starting at 20:50, the difference between the frequentist and Bayesian interpretations of the confidence interval is mind-blowing!

    • @kantankerousk
      @kantankerousk 3 years ago +2

      What does he mean by recipe for generating a credible region and where do credible regions get used in this kind of analysis?

    • @ArunKumar-yb2jn
      @ArunKumar-yb2jn 2 years ago +1

      @@kantankerousk recipe = model

  • @tbaiyewu
    @tbaiyewu 5 years ago +18

    Best explanation of the difference between the Bayesian and frequentist approaches. Really beautiful.

  • @bayesian7404
    @bayesian7404 7 years ago +36

    Fantastic talk. I am finally understanding the differences between the Bayesian and frequentist arguments. Thank you.

  • @MrDevilsbabe
    @MrDevilsbabe 10 years ago +16

    Great talk. Very well articulated and clear. Thanks!

  • @marceloventura6442
    @marceloventura6442 5 years ago +7

    2:14 I love you, man! How come people don't get that point straight before starting to "teach" Bayesian statistics?! Thank you, sir!

  • @MW-vg9dn
    @MW-vg9dn 4 years ago

    Jake is sharp! A very concise and clear delivery.

  • @ohssjo
    @ohssjo 3 years ago +3

    This is by far THE best explanation of the difference between frequentists and Bayesians. Thank you so much!!

  • @jehangonsal2162
    @jehangonsal2162 6 years ago +1

    Holy moly this is amazing. :)

  • @RealFrankTheTank
    @RealFrankTheTank 9 years ago +2

    Excellent talk, both stylistically and content-wise!

  • @austinhaider105
    @austinhaider105 2 years ago

    Wish you had more time! Awesome talk

  • @loicniederhauser7130
    @loicniederhauser7130 4 years ago +1

    Very good talk. A bit sad he didn't have the time to go through everything; I would have loved to hear more about that.

  • @vivek2319
    @vivek2319 4 years ago +1

    Wow, thank you so much. I'd say we have to use both approaches.

  • @hongkyukim4179
    @hongkyukim4179 3 years ago

    Best explanation.
    Fell in love with him.

  • @onderbektas6532
    @onderbektas6532 3 years ago

    Such a great talk. Thank you

  • @ahmedabbas2595
    @ahmedabbas2595 4 years ago

    Thank you, a very concise and well-formed explanation. I hoped you'd be given more time, actually, but thank you anyway; it was truly helpful!

  • @Rushil69420
    @Rushil69420 4 years ago +14

    "The little red likelihood"
    Nice

  • @govinda1993
    @govinda1993 5 years ago +1

    that was a really good talk

  • @MannISNOR
    @MannISNOR 5 years ago

    This is terrific!

  • @gonzothegreat1317
    @gonzothegreat1317 5 years ago +7

    Where can I find the paper he is talking about in the first few minutes?

  • @luisfca
    @luisfca 7 years ago +10

    Greenland (2006: 767): "It is often said (incorrectly) that 'parameters are treated as fixed by the frequentist but as random by the Bayesian'. For frequentists and Bayesians alike, the value of a parameter may have been fixed from the start or may have been generated from a physically random mechanism. In either case, both suppose it has taken on some fixed value that we would like to know. The Bayesian uses formal probability models to express personal uncertainty about that value. The 'randomness' in these models represents personal uncertainty about the parameter's value; it is not a property of the parameter (although we should hope it accurately reflects properties of the mechanisms that produced the parameter)."
    Greenland, S. (2006). Bayesian perspectives for epidemiological research: I. Foundations and basic methods. International Journal of Epidemiology, 35(3), 765-774.

    • @awesomelongcat
      @awesomelongcat 5 years ago +1

      For Bayesians, randomness is *never* a property of parameters.

  • @Max-cs1dn
    @Max-cs1dn 2 years ago

    Recommend 23:00 onwards if you don’t have time. The conclusion and Q&A responses are actually good without personal bias.

  • @flo7968
    @flo7968 2 years ago

    I could watch people explaining the difference between frequentist and Bayesian approach all day long. People are always so enthusiastic about it :)

  • @FinallyAFreeUsername
    @FinallyAFreeUsername 5 years ago +5

    Example 2 at 16:00 is VERY helpful in understanding the differences between the two philosophies. Frequentists give a range and say 95% of all the ranges they could draw (from what their dataset tells them) would contain the true value. Bayesians give a range and say that the true value is in that range with a 95% probability. The Bayesian range is what people usually think the Frequentist range is.

    • @arnoudt
      @arnoudt 3 years ago +1

      Not so. Without having data on the whole population, or knowing the distribution a priori, there is no way to give a 95% estimate of the true value.

    • @benjamindilorenzo
      @benjamindilorenzo 2 years ago

      Also, what I don't get: isn't saying "there is a 95 percent chance that the current interval I am looking at contains theta" the same as saying "there is a 95 percent chance that theta is in that interval"? Why is this such a big deal here?

    • @TangerineTux
      @TangerineTux 1 year ago

      @@benjamindilorenzo Neither of them is what the confidence interval tells you. The confidence level (95%) is the probability of generating an interval that contains the true value _when you draw new random data according to the model._ Once you have effectively generated a given confidence interval, it is fixed and there is no 95% in sight anymore, or at least not intrinsically from the fact that it’s a 95% confidence interval.
      You could use Bayesian inference to calculate the probability that the interval you got from the frequentist method contains the true value, and you might even arrive at “95%”, but that is absolutely not a guarantee and you have to check to be sure. If you’re going to use Bayesian statistics to do that anyway, you might as well use them fully to start with.
      See the excellent article “The fallacy of placing confidence in confidence intervals” for more on this.
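
A rough numerical illustration of the coverage point made in this thread (my own sketch with made-up numbers, not from the talk): the 95% describes the long-run success rate of the interval-generating procedure, while any single interval either contains the true value or it does not.

    import numpy as np

    rng = np.random.default_rng(0)
    mu_true, sigma, n, n_trials = 10.0, 2.0, 5, 100_000

    covered = 0
    for _ in range(n_trials):
        data = rng.normal(mu_true, sigma, n)        # one hypothetical experiment
        half_width = 1.96 * sigma / np.sqrt(n)      # known-sigma 95% CI half-width
        covered += abs(data.mean() - mu_true) <= half_width

    # ~0.95: a statement about the ensemble of intervals, not about any one of them
    print("fraction of intervals containing mu_true:", covered / n_trials)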

  • @ullas06
    @ullas06 4 years ago

    Simply the best one.

  • @mikelmenaba
    @mikelmenaba 7 months ago

    This is probably one of the coolest things I've ever consumed

  • @dianamorton3716
    @dianamorton3716 2 years ago

    So well explained! Very interesting and helpful

  • @heidiwinklee
    @heidiwinklee 5 years ago +2

    Awesome talk!! I'm currently trying to tackle a psych honours stats assignment and this really helped me get my head around NHST and Bayesian :)

  • @dr.alzarani
    @dr.alzarani 8 years ago +4

    Well organized, well articulated, concise and easy to follow!

  • @XianOrtiz7
    @XianOrtiz7 3 years ago

    very cool, thank you.

  • @michaelliu6323
    @michaelliu6323 2 years ago

    At 13:10, shouldn't the probabilities of other scenarios also be added to "Bob wins the next 3 in a row"? For instance, letting Win denote a point for Bob, Win-Lose-Win-Win-Win would also work; Win-Win-Lose-Win-Win would also work... quite a few...
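
A quick enumeration for this question (my own sketch; it assumes the setup from the talk, where the game is first to 6 points and Alice leads 5-3): once Bob drops a single point, Alice reaches 6 and the game is over, so the game always ends within three more points and length-3 sequences cover every case.

    from itertools import product

    def bob_wins_match(seq, alice=5, bob=3, target=6):
        """Play out a sequence of point outcomes ('W' = Bob wins the point)."""
        for point in seq:
            bob += point == "W"
            alice += point == "L"
            if bob == target:
                return True
            if alice == target:
                return False
        return False

    winning = [seq for seq in product("WL", repeat=3) if bob_wins_match(seq)]
    print(winning)   # [('W', 'W', 'W')] -- the only continuation in which Bob wins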

  • @GuillermoValleCosmos
    @GuillermoValleCosmos 5 years ago +8

    Well, a Bayesian credible interval may also not contain the true value, just like the frequentist one.
    I think in practice the main difference between the two approaches is the existence of a prior in the Bayesian one, which makes it more suitable in some situations where we do have a prior belief.

    • @wngmv
      @wngmv 2 years ago

      Idk who upvoted you, but true bayesians don't believe in "the true value". We believe that EVERYTHING is random. Bayesian credible interval, in a sense, is just another way to describe the distribution of the parameter (which is also a random variable) of interest.

    • @TangerineTux
      @TangerineTux 1 year ago

      @@wngmv Jaynes would disagree:
      “For decades Bayesians have been accused of “supposing that an unknown parameter is a random variable”; and we have denied hundreds of times, with increasing vehemence, that we are making any such assumption. We have been unable to comprehend why our denials have no effect, and that charge continues to be made.
      […]
      Orthodoxians trying to understand Bayesian methods have been caught in a semantic trap by their habitual use of the phrase “distribution of the parameter” when one should have said “distribution of the probability”. Bayesians had supposed this to be merely a figure of speech; i.e. that those who used it did so only out of force of habit, and really knew better. But now it seems that our critics have been taking that phraseology quite literally all the time.
      Therefore, let us belabor still another time what we had previously thought too obvious to mention. In Bayesian parameter estimation, both the prior and posterior distributions represent, not any measurable property of the parameter, but only our own state of knowledge about it. The width of the distribution is not intended to indicate the range of variability of the true values of the parameter, as Barnard's terminology led him to suppose. It indicates the range of values that are consistent with our prior information and data, and which honesty therefore compels us to admit as possible values. What is “distributed” is not the parameter, but the probability.
      Now it appears that, for all these years, those who have seemed immune to all Bayesian explanation have just misunderstood our purpose. All this time, we had thought it clear from our subject matter context that we are trying to estimate the value that the parameter had _when the data were taken._ Put more generally, we are trying to draw inferences about what actually did happen in the experiment; not about the things that might have happened but did not.

      - E. T. Jaynes, 1986, “Bayesian Methods: General Background”

  • @isaackagan9662
    @isaackagan9662 4 years ago

    Great talk!

  • @kamesh7818
    @kamesh7818 6 years ago +1

    Awesome explanation

  • @veritatemcognoscere1322
    @veritatemcognoscere1322 5 years ago

    Very good.

  • @HansPeterSloot
    @HansPeterSloot 5 years ago +4

    Hi,
    What I do not understand is the small pink distribution at 6:00.
    Shouldn't the area under each curve equal 1?
    Regards, Hans

    • @Diamondketo
      @Diamondketo 5 years ago +4

      No, the product of two normalized distributions is not always normalized.
      You can easily tell by multiplying two of the same Gaussian, N(0,1) * N(0,1), and seeing that the area under the product is much smaller.

    • @paullang3414
      @paullang3414 4 years ago

      The way I see it is as follows (please correct me if I am wrong):
      By multiplying all these n measurements, you get a joint probability density function in n-dimensional space. In this case, the pdf is a Gaussian centered at μ = (F_true, F_true, ... F_true). If you integrate it over the whole n-dimensional volume, you get 1.
      However, this is not what the pink curve shows. To illustrate what the pink curve shows, imagine some random point in the n-dimensional Gaussian. How would the probability density at this point change if you change F_true, i.e. shift the ball around? This is the pink curve. Note that the pink curve is a function of the parameter μ, not the data. As such, it is not a probability density function, but a likelihood function. A likelihood function cannot be integrated over the data, as the data are treated as fixed. It can be integrated over the parameters, but this integral does not have to be one (see also Figure 1 here: en.wikipedia.org/wiki/Likelihood_function#Example).
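
A rough numerical check of the point made in this reply (my own sketch; the data here are made up): the likelihood, viewed as a function of the parameter, generally does not integrate to 1 over the parameter.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    sigma = 1.0
    data = rng.normal(10.0, sigma, size=5)        # five hypothetical measurements

    mu = np.linspace(0.0, 20.0, 20_001)
    dmu = mu[1] - mu[0]
    # L(mu) = prod_i N(x_i | mu, sigma), evaluated on a grid of mu values
    likelihood = np.exp(norm.logpdf(data[:, None], loc=mu, scale=sigma).sum(axis=0))

    print("integral of L(mu) over mu:", likelihood.sum() * dmu)   # generally not 1
    # normalizing it (together with a flat prior) is what turns it into the
    # posterior density for mu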

  • @r0075h3ll
    @r0075h3ll 9 months ago

    Given that I have very little knowledge of stats, it was hard for me to comprehend this, but I enjoyed the talk anyway.

  • @kenanmorani9204
    @kenanmorani9204 3 years ago

    Thank you

  • @Multilingualtricker
    @Multilingualtricker 3 years ago

    Bit confused by the graph at 19:21. Is it showing the probability that the net is still functioning (at 10 mins the probability of still functioning drops to 0)? If that's true, why isn't the region 0-5 minutes shaded in grey, i.e. probability of the net still functioning = 1?

  • @sachit4
    @sachit4 5 years ago +1

    Why was the unbiased estimator used in the truncated exponential problem for the frequentist approach, and not the MLE? The results from the MLE would again have been similar to the Bayesian result, like in the flux example, right?

    • @vleessjuu
      @vleessjuu 3 years ago +1

      Because frequentists get mad at you if you don't use unbiased estimators. It's their pet peeve. I'm not even kidding: I've had a statistician get cross with me for using the MLE variance instead of the unbiased variance estimator. It also shows how the frequentist obsession with unbiased estimators is actually not all that useful.

    • @TangerineTux
      @TangerineTux 1 year ago

      It’s kind of beside the point. The unbiased estimator gives a valid confidence interval, therefore it is a valid illustration of why inferring that it is 95% likely to contain the true parameter is wrong. Using a “better” approach would merely hide the problem that such inferences are not justified, not remove it.
      As “The fallacy of placing confidence in confidence intervals” puts it:
      “The only way to know that a confidence interval is numerically identical to some credible interval is to _prove_ it. The correspondence cannot - and should not - be assumed.
      More broadly, the defense of confidence procedures by noting that, in some restricted cases, they numerically correspond to Bayesian procedures is actually no defense at all. One must first choose which confidence procedure, of many, to use; if one is committed to the procedure that allows a Bayesian interpretation, then one’s time is much better spent simply applying Bayesian theory. If the benefits of Bayesian theory are desired - and they clearly are, by proponents of confidence intervals - then there is no reason why Bayesian inference should not be applied in its full generality, rather than using the occasional correspondence with credible intervals as a hand-waving defense of confidence intervals.”
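
For reference, here is a small sketch of the truncated-exponential example this thread refers to (my own reconstruction; the data values and interval formulas are taken approximately from the talk/paper, so treat the numbers as illustrative). The model is p(x|θ) = exp(θ − x) for x ≥ θ, else 0.

    import numpy as np

    D = np.array([10.0, 12.0, 15.0])      # illustrative data, roughly as in the talk
    N = len(D)

    # Frequentist: E[x] = theta + 1 and Var[x] = 1, so mean(D) - 1 is an
    # unbiased estimator, and an approximate 95% confidence interval is
    theta_hat = D.mean() - 1
    ci = (theta_hat - 1.96 / np.sqrt(N), theta_hat + 1.96 / np.sqrt(N))

    # Bayesian (flat prior): posterior ∝ exp(N * theta) for theta <= min(D),
    # giving a 95% credible region that butts up against min(D)
    cr = (D.min() + np.log(0.05) / N, D.min())

    print("approx 95% confidence interval:", np.round(ci, 2))   # ~ (10.2, 12.5)
    print("95% credible region:           ", np.round(cr, 2))   # ~ ( 9.0, 10.0)
    # Every observation must exceed theta, so theta <= 10 here; the confidence
    # interval lies entirely above that, yet it comes from a valid 95% procedure.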

  • @benjamindilorenzo
    @benjamindilorenzo 2 years ago

    What I don't get: saying "I can be 95 percent sure that the interval I am looking at contains the value theta" is the same as saying "I can be 95 percent sure that the value I am looking for is in the interval I am looking at".
    Why make a difference here?

  • @mmmhorsesteaks
    @mmmhorsesteaks 3 months ago

    "the posterior is what we're interested in" - brother knows what's up!

  • @J_The_Colossal_Squid
    @J_The_Colossal_Squid 2 years ago

    Okay, if a thinking and feeling person has somehow assigned "some sort of probability to define that uncertainty in one's knowledge", how does one ascertain and make use of such a mechanism or mechanisms, without engaging in trials and calculation and without the use of human imagination, to arrive at a non-trivial level of certainty via Bayesian analysis?

  • @milianvolz5382
    @milianvolz5382 3 years ago

    I know this question comes very late, but I would really like to know: Why is the joint probability in the image shown at 15:37 not a line but an area? Doesn't the probability of B winning the game depend only on the general probability of B scoring a point?

  • @iAmTheSquidThing
    @iAmTheSquidThing 6 years ago +2

    I still can't get my head around the difference between confidence intervals and credible regions. If 95% of experiments would return a region which contains the true value, and we've done one experiment, it feels as though the probability that the true value lies within the region returned by that experiment is 95%.

    • @xnoreq
      @xnoreq 6 years ago +5

      Frequentism doesn't provide probability the way you intuitively understand probability. It doesn't give you probabilities of hypotheses. It doesn't give you probabilities of model parameters.
      Instead, it assumes model parameters' values or a hypothesis and then tells you what data the model or hypothesis will produce. It tells you how "likely" or "unlikely" (in terms of frequency) data is _given_ an assumed model or hypothesis.
      As you see, this is backwards.
      This is most obvious when doing hypothesis testing with a p-value: the null hypothesis is assumed, then the likelihood of your data or more extreme results is calculated (= the p-value), and finally if it is unlikely enough (below an arbitrary significance level, commonly 5%) the initially assumed null hypothesis is rejected and concluded to be false!
      Again, this is backwards. Similarly with confidence intervals, they are not probability distributions.
      Imagine you collect some samples and calculate a confidence interval. It may contain the true parameter value or it may not. This is not a probability. Either it does or it does not.
      If you repeat the experiment you will get a different confidence interval. It again may contain the true parameter value or it may not. In fact, the interval may be complete nonsense.
      Using a 95% confidence interval about 95 out of 100 of such intervals should contain the true value. But you have one confidence interval, which is based on a random sample, so the CI is random as well.

    • @TangerineTux
      @TangerineTux 1 year ago

      If you use Bayesian inference to compute the probability that a given confidence interval contains the true value, it may happen to be 95%, or it may not. In the example given in this video with the truncated exponential, it is clear from the data that the particular interval that we got is 0% likely to contain the true parameter. It would clearly be fallacious to claim that it is 95% on the sole ground that it is a 95% confidence interval. You have to actually check to be sure. And if you are going to do that anyway, why even bother with confidence intervals in the first place?
      In other words: does a given 95% confidence interval have a 95% probability to contain the true parameter?
      Frequentists: no. Both the confidence interval and the parameter are fixed, so the probability is either 1 if it contains it or 0 if it doesn’t. That will always be the case, there is no frequency to compute.
      Bayesians: compute the probability and you’ll know. It may be 95% but it may also be completely different.
      So neither philosophy will answer “yes”.
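
A small sketch of the "actually check" step described above (my own illustration with made-up numbers): for a Gaussian model with known sigma and a flat prior, the posterior for mu is N(x̄, σ/√n), so the posterior probability that one particular frequentist 95% interval contains mu happens to come out near 95%; in the truncated-exponential example it would come out near 0%.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    sigma, n = 2.0, 5
    data = rng.normal(10.0, sigma, n)               # one hypothetical dataset
    xbar, se = data.mean(), sigma / np.sqrt(n)

    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se     # the frequentist 95% CI
    posterior = norm(loc=xbar, scale=se)            # flat-prior posterior for mu

    print("P(mu in this CI | data) =", posterior.cdf(hi) - posterior.cdf(lo))  # ~0.95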

  • @alpa1410
    @alpa1410 5 years ago +1

    so so interesting

  • @abdulkader7104
    @abdulkader7104 2 years ago +1

    min 4:00
    Frequentists talk about models being fixed and data varying around them;
    Bayesians talk about the observed data being fixed and models varying.
    min 5:00 Frequentists use maximum likelihood:
    given the observed data, we multiply the individual likelihoods together to get the overall likelihood, and the more values we add, the closer its peak gets to the true value.
    min 7:30 In the Bayesian approach, we ask for the probability of the true flux (F) given the data observed;
    here we are interested in the posterior probability, which is the probability of my model value given the data that I observed.
    min 8:30 It is based on the likelihood and prior beliefs (the prior can be empirical, or a non-informative prior which makes all fluxes equally probable since I am ignorant about it).
    min 21:00 For frequentists, the 95% refers to the intervals themselves: 95% of the resulting ensemble of intervals will contain the correct value.
    For Bayesians, given the observed data, there is a 95% probability that the correct value lies within the credible region.
    min 25:50 The Bayesian result is correct only if your prior is correct; otherwise your result will be biased.
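
A compact sketch of the flux example summarized in these notes (my own reconstruction with invented numbers): with Gaussian errors and a flat prior, the posterior is just the normalized likelihood, so the Bayesian answer peaks where the frequentist maximum-likelihood estimate sits.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    F_true, e = 1000.0, 50.0                    # hypothetical true flux and error
    D = rng.normal(F_true, e, size=50)          # observed measurements D_i

    F = np.linspace(900.0, 1100.0, 2001)
    dF = F[1] - F[0]
    log_like = norm.logpdf(D[:, None], loc=F, scale=e).sum(axis=0)

    F_ml = F[np.argmax(log_like)]               # frequentist: maximize the likelihood

    post = np.exp(log_like - log_like.max())    # Bayesian: likelihood x flat prior,
    post /= post.sum() * dF                     # normalized over F

    print("maximum-likelihood estimate:", F_ml)
    print("posterior mean:             ", (F * post).sum() * dF)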

  • @rojanshrestha822
    @rojanshrestha822 3 years ago +1

    4:02 Frequentists talk about models being fixed!! What does that mean?

    • @samyoe
      @samyoe 3 years ago

      Underlying parameters are fixed i.e. they remain constant during this repeatable sampling process.

  • @duxgarnifex3678
    @duxgarnifex3678 8 years ago +1

    Where is a link to the paper?

    • @LeivaGerman
      @LeivaGerman 6 years ago +3

      conference.scipy.org/proceedings/scipy2014/pdfs/vanderplas.pdf

  • @adityaagrawal3576
    @adityaagrawal3576 2 years ago

    THIS VIDEO DEFINES THE ULTIMATE SATISFACTION after 20:50.

  • @MagicBoterham
    @MagicBoterham 3 years ago

    The model at 5:00 for flux surely cannot be Gaussian, since flux can't go negative.

    • @richardgamrat1944
      @richardgamrat1944 3 years ago +1

      What do you mean? The height of a human also can't go negative and is described by a Gaussian curve. Or am I just misunderstanding you?

  • @driyagon
    @driyagon 3 years ago

    What are Di and Ftrue at 5:09?

  • @karannchew2534
    @karannchew2534 3 years ago

    Study notes.
    FREQUENTIST
    There exists a model and parameters. How much do my data or observations conform to the model? How confident am I that my measured parameters fall within the range of the true parameters?
    BAYESIAN
    I don't care about the model. I'm not trying to evaluate how close my data or observations are to the model. I believe the true values of the parameters are this and that. From my data/observations, I will update my belief about the true values of the parameters.
    Both are calculating the same thing, but with different mindsets and approaches.
    F = what I think the probability or likelihood should or should not be, based on a model that is based on repeated events.
    B = what I believe the probability or likelihood should be, based on knowledge, judgement, and updated data.

  • @hello-zt8ed
    @hello-zt8ed 6 months ago

    bayesian: based on multiple sources of evidence, the chances of my keys being in the living room are 50%, laundry room 35%, bathroom 10% and lost 5%.
    me, a frequentist: The keys are 100% somewhere

  • @musiknation7218
    @musiknation7218 1 year ago

    Where is the paper?

  • @J_The_Colossal_Squid
    @J_The_Colossal_Squid 2 years ago

    Oh, I see, well that's at least simple enough conceptually.

  • @sherifbadawy8188
    @sherifbadawy8188 2 years ago

    Walter White

  • @ArchKomposer
    @ArchKomposer 5 years ago

    Frequentists envision a series of repeated trials out of which probability emerges. Bayesians envision a series of repeated thought-experiment trials out of which probability emerges. It is possible to reconcile this difference. Sometimes repeated trials in reality either make no sense to conduct or are actually impossible, especially with forensic data, while at other times repeated trials can be carried out readily or at moderate expense without ethical cul-de-sacs. The frequentist and Bayesian philosophies can be reconciled accordingly.

  • @KenpoGeoff
    @KenpoGeoff 3 years ago

    The long run performance of the 'ensemble' of intervals is what gives us confidence the observed interval indeed covers the true value. We can therefore say we are 95% confident the observed interval indeed covers the true parameter. It is nonsensical to consider the population level parameter as an unrealized or unobservable realization of a random variable that depends on the observed data just so we can make philosophical probability statements about the parameter. Unless the parameter was sampled with replacement from the prior distribution and a new value sampled from the posterior, it is absurd to attribute posterior probability statements to the parameter. While the p-value is typically not interpreted in the same manner, it does show us the plausibility of a hypothesis given the data. It is the inversion of a hypothesis test that gives rise to the confidence interval. See confidence distributions, confidence curves, and confidence densities.

    • @TangerineTux
      @TangerineTux 1 year ago

      And what does it actually mean to be 95% confident that the observed interval covers the parameter?
      The article “The fallacy of placing confidence in confidence intervals” makes a rather compelling case that confidence intervals _per se_ are best left uninterpreted. They therefore don’t appear to be very useful at all.

  • @sveu3pm
    @sveu3pm 8 years ago +2

    So many times some ugly data has killed some beautiful theory. So is it 10 or is it 18? One is right, the other is BS,
    and I presume the Bayesian one is the BS.

    • @bryanmorgan1871
      @bryanmorgan1871 7 years ago +4

      Actually, he explains that. Go to about 26:00, give or take.

    • @Diamondketo
      @Diamondketo 5 years ago +1

      The Bayesian interpretation is better for the Bayes' Billiard Game results.
      The frequentist made the assumption that p = 5/8. A more practical frequentist would never use this dataset because it's so small for an experiment that's easily repeatable. If more and more samples were taken, the frequentist would arrive closer to the true value of p and their answer would be near 1/10.

    • @anthonyrepetto3474
      @anthonyrepetto3474 3 years ago +2

      Your presumption was a biased prior; he was asked about that very point at the end of the talk, so you must not have gotten that far. *Bayes is correct; the frequentist is wrong*, because the frequentist makes the naïve assumption "I can just *average* it all, to get p = 5/8", while Bayes says "I should consider *all p* which could produce such observed data." Frequentism is only useful for evaluating the *average* outcome of a *strategy or method* over the long term. Otherwise, it is fallacious reasoning. Bayes takes care of everything else properly.
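
A Monte Carlo check of the billiard-game numbers argued about in this thread (my own sketch; it assumes the talk's setup of a first-to-6 game, p drawn uniformly, and an observed 5-3 score):

    import numpy as np

    rng = np.random.default_rng(4)
    n_games = 2_000_000

    p = rng.uniform(size=n_games)            # Alice's unknown per-point probability
    alice_first8 = rng.binomial(8, p)        # Alice's points in the first 8 rolls
    keep = alice_first8 == 5                 # condition on the observed 5-3 score

    # Bob wins the match only by taking the next three points in a row
    bob_wins = rng.uniform(size=keep.sum()) < (1.0 - p[keep]) ** 3

    prob = bob_wins.mean()
    print(f"P(Bob wins | 5-3) ~ {prob:.3f}, odds against ~ {(1 - prob) / prob:.0f} to 1")
    # ~0.09, i.e. ~10:1 against, matching the Bayesian answer; the naive plug-in
    # p = 5/8 gives (3/8)**3 ~ 0.053, i.e. ~18:1 against.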

  • @phillustrator
    @phillustrator 1 year ago

    The paper he mentioned:
    conference.scipy.org/proceedings/scipy2014/pdfs/vanderplas.pdf