The biggest beef in statistics explained

  • Published 24 Nov 2024

COMMENTS • 379

  • @very-normal
    @very-normal  2 months ago +13

    To try everything Brilliant has to offer for free for a full 30 days, visit brilliant.org/VeryNormal. You’ll also get 20% off an annual premium subscription.

  • @bificommander7472
    @bificommander7472 1 month ago +92

    At the end of one Bayesian statistics lecture, the professor ended with approximately this summary:
    "Frequentist statistics gives a mathematically rigorous answer to questions no one asked. Bayesian statistics tells you what you want to know, based on assumptions no one believes."

    • @sophigenitor
      @sophigenitor 1 month ago +5

      The reason why frequentist answers to questions no one asked are still somewhat useful is that, under some assumptions, they are a reasonable approximation of the Bayesian answers to the questions you are actually interested in.

    • @Critical-Smoke
      @Critical-Smoke 1 month ago

      @@sophigenitor examples?

    • @sophigenitor
      @sophigenitor 29 days ago +1

      @@Critical-Smoke The easiest examples are confidence intervals. What people actually want are Bayesian credible intervals. And in most practical examples, the full Bayesian treatment with flat or uninformative priors will result in credible intervals that are indistinguishable from the Frequentist confidence intervals. I have seen constructed examples where this wasn't the case, but that was caused by boundary effects of impossible parameter values.
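
    A minimal sketch of the agreement described above, in Python, for a single proportion (the counts are hypothetical):

        import numpy as np
        from scipy import stats

        k, n = 1080, 1225                     # hypothetical successes / trials
        phat = k / n

        # Frequentist 95% Wald confidence interval
        se = np.sqrt(phat * (1 - phat) / n)
        wald = (phat - 1.96 * se, phat + 1.96 * se)

        # Bayesian 95% credible interval under a flat Beta(1, 1) prior
        posterior = stats.beta(1 + k, 1 + n - k)
        credible = tuple(posterior.ppf([0.025, 0.975]))

        print(wald)      # the two intervals agree to about three decimals
        print(credible)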

  • @berjonah110
    @berjonah110 2 months ago +138

    I tend to lean toward the Bayesian approach for two reasons: It tends to be easier to build up complicated models using conditional latent variables, and the prior distribution gives a way to incorporate expert knowledge about a subject. I've worked with many subject matter experts who don't have a firm grasp of statistics, but are very knowledgeable about their own corner of the world. Having the ability to take what amounts to "vibe based reasoning" from them and quantify it using an informative prior distribution gives a lot more power than just using a flat prior.

    • @martian8987
      @martian8987 1 month ago

      calibration

    • @Bamawagoner
      @Bamawagoner 29 days ago

      Apparently you also tend to lack a fundamental understanding of these approaches

    • @martian8987
      @martian8987 29 days ago

      @@Bamawagoner tell us then
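
    A minimal sketch of the elicitation idea in berjonah110's comment above: encode an expert's "vibe" as an informative Beta prior via a guessed mean and an effective sample size (all numbers hypothetical):

        from scipy import stats

        # Expert's vibe: "about 70%, and I'd stake it on roughly 20 observations"
        mean, strength = 0.70, 20                        # hypothetical elicitation
        a, b = mean * strength, (1 - mean) * strength    # a Beta(14, 6) prior

        k, n = 9, 10                  # a small new data set: 9 successes in 10
        posterior = stats.beta(a + k, b + n - k)
        print(posterior.mean())       # ~0.77: pulled between prior 0.70 and data 0.90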

  • @dhonantarogundul1737
    @dhonantarogundul1737 2 months ago +124

    I always interpret frequentist statistics as "static statistics" and Bayesian statistics as "dynamic statistics," which works well in my field of study, robotics!

    •  1 month ago

      robotic statistics!

    • @haukur1
      @haukur1 28 days ago

      That's a really neat way of putting it

  • @hughobyrne2588
    @hughobyrne2588 2 months ago +20

    It seems like having factions that say "Pi is determined by geometry: the ratio of a circle's circumference to its diameter" and "Pi is determined by calculus: the limit of an infinite sum (pick your favourite)".

    • @very-normal
      @very-normal  2 months ago +7

      and then some people in one of the factions can’t stand it that the other one says it differently, I won’t say who

    • @LBR_Bounty
      @LBR_Bounty 1 month ago +1

      So off of an hour of YouTube math, I'm assuming the geometry side is frequentist ideals while the calculus side is Bayesian ideals. And the geometry side will be upset at the calculus side over how to interpret it. Let me know if I'm right or wrong.

  • @scepticalchymist
    @scepticalchymist 2 months ago +54

    Idealists will never stop debating which approach is more valid; pragmatists will just use one or the other, depending on which one fits best in any given situation.

    • @Kubboz
      @Kubboz 1 month ago +1

      @@sumdumbmick I mean, it isn't based on dogma. That's why the guy in your story failed, no?

  • @Zxymr
    @Zxymr 2 months ago +124

    I told my Asian parents that I was Bayesian.
    They disowned me.

    • @nunkatsu
      @nunkatsu 2 months ago +8

      Dude, worst time possible for me to read that comment. I just found out that my Asian crush (who reminds me of my Asian ex) at statistics classes in college has a boyfriend. Everything you wrote gave me PTSD.

    • @definitelynorandomvideos24
      @definitelynorandomvideos24 2 months ago +8

      ya both should've calculated the probabilities of those events happening

    • @kevinvanhorn2193
      @kevinvanhorn2193 1 month ago +1

      That's because you mispronounced "Bayesian". It's "bay-zee-uhn," not "bay-zhun."

    • @xinpingdonohoe3978
      @xinpingdonohoe3978 1 month ago

      ​@@nunkatsu PTSD over a crush. Is that normal?

    • @harlowcj
      @harlowcj 1 month ago

      ​@@xinpingdonohoe3978Probably for the statistics crowd.

  • @John-zz6fz
    @John-zz6fz 2 months ago +22

    One of the advantages of the Bayesian approach is that it feels more "natural" to incorporate non-quantitative evidence into your calculations. For example, it's pretty easy in a frequentist analysis to calculate the odds of rolling a 6 on a d6 if you have rolled it a hundred times and can see the distribution of prior outcomes (fair or loaded). If instead you're told there's no prior data but there's double-sided tape on the 1 side, you can easily "swag" that prior with a Bayesian approach and get better results. I've never actually seen a calculable advantage to either view, but if you start fudging some numbers using a frequentist approach it just feels like you are doing something wrong... I don't actually think there is a difference, or if there is, I clearly don't understand it.
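
    A minimal sketch of how that taped-die hunch might be "swagged" into a prior, assuming (hypothetically) Dirichlet pseudo-counts that down-weight the taped 1 face and up-weight the opposite 6 face:

        import numpy as np

        # Pseudo-counts expressing the hunch, not data: tape on the "1" face
        # should make 1 land up less often and the opposite "6" more often.
        prior = np.array([2.0, 6.0, 6.0, 6.0, 6.0, 10.0])

        rolls = np.array([0, 3, 4, 5, 4, 8])     # made-up counts from 24 rolls
        posterior = prior + rolls                # Dirichlet-multinomial update

        print(posterior / posterior.sum())       # updated P(face) for faces 1..6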

  • @Ethan13371
    @Ethan13371 2 months ago +28

    One option for avoiding Bayesian prior-rigging is to simply publish the calculation itself without committing to any prior. The posterior is then simply some function of the prior, which you could graph to visualize the sensitivity (see the sketch at the end of this thread).

    • @andrewharrison8436
      @andrewharrison8436 2 months ago +6

      That seems a really interesting approach - as much for the psychology as for the mathematics. It might make it easier to soften a dogmatic prior into an introspection on the evidence/knowledge/belief that underlay the prior.

    • @danielkeliger5514
      @danielkeliger5514 2 months ago

      That is kind of the frequentist interpretation of Bayesian methods: they are "just a fancy estimate" for which you can prove things like asymptotic unbiasedness and normality, etc. Fun fact, but apart from degenerate cases, Bayesian estimates are not unbiased.

    • @Tom-qz8xw
      @Tom-qz8xw 1 month ago +3

      The prior is a distribution though? How are you going to graph a functional?

    • @Y2B123
      @Y2B123 1 month ago +1

      Exactly, it is very common to have close to no idea about the prior and somehow be able to state something useful in this way.
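
    A minimal sketch of Ethan13371's suggestion above: restrict the prior to a Beta(a, b) family, so the posterior probability of a claim becomes an ordinary function of (a, b) that can be graphed (the data counts are hypothetical):

        from scipy import stats

        k, n = 42, 60                 # hypothetical data: 42 successes in 60 trials

        def prob_above_half(a, b):
            """Posterior P(pi > 0.5 | data) under a Beta(a, b) prior."""
            return stats.beta(a + k, b + n - k).sf(0.5)

        for a, b in [(1, 1), (0.5, 0.5), (5, 5), (2, 8)]:
            print((a, b), round(prob_above_half(a, b), 4))
        # sweeping a grid of (a, b) and plotting the output shows how much
        # (or how little) the conclusion depends on the choice of prior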

  • @micheldavidovich6940
    @micheldavidovich6940 2 months ago +29

    Question about the null hypothesis you selected, as I've had this issue come up as well. Why do you select H0: pi = 85%? If you want to make a decision on whether the coffee shop is good or bad, shouldn't it make more sense to assume pi ≤ 85%?

    • @BobJones-rs1sd
      @BobJones-rs1sd 2 months ago +9

      Excuse the long explanation, but I'm trying to correct a few potential fundamental misconceptions in your question first.
      First, the null hypothesis by convention is typically assumed to equal a specific value, though there are some textbooks and sources that use notation like you're suggesting for one-sided tests. The reason for the equals sign even in one-sided tests is because the actual computation of a p-value for a hypothesis test kind of requires this. (In more advanced statistics, there are ways of defining things to get around this and construct a p-value that accounts for a range of null values, but that complexity isn't necessary for this video.) The reality is that the test done in this video is basically trying to compute the p-value using a null hypothesis value that maximizes the probability of a Type I error. If you do a one-sided test, that null value will still be the one that has the equals sign, and any values "to the other side" of the null (in this case, less than 85%) would have a smaller chance of producing a Type I error. The test we're interested in produces a p-value for that maximizing case.
      To put it more simply from a math standpoint: whether you're doing a one-sided or two-sided test, you still need a specific null value (not a range) to plug into the estimation formulas for the standard deviation and used in computing the z value which is then used to calculate the p-value. The specific null value (here 85% or 0.85) is the one that maximizes that Type I error probability. Any other value below 85% would give a lower p-value and thus potentially produce an inaccurate test result if used by itself. Hence, pi = 0.85 suffices for the null hypothesis.
      I think what you're trying to ask here is why the video didn't do a one-sided test. The difference in that case would really be in the ALTERNATIVE hypothesis (not the null). I think you're arguing for an alternative of H_a: pi > 85% rather than "not equal to 85%". Arguably, you're correct that it might be more appropriate here, as he's interested in whether the proportion exceeds 85%. But even if he did a one-sided test (in this basic case), he'd still be effectively constructing distributions to test a null hypothesis for a specific value, not a range. The null still wouldn't need to be written as an inequality.

    • @micheldavidovich6940
      @micheldavidovich6940 2 months ago +1

      @@BobJones-rs1sd on the contrary, really appreciate you taking the time to answer that thoroughly. You are completely correct about the alternative hypothesis; that was wrong of me. The rest is just ignorance on my part, so really appreciate the explanation

    • @lazerbungalow
      @lazerbungalow 1 month ago +1

      @@BobJones-rs1sd I'm also thinking he left it as a two-tailed test because it was the default for the test he did in R and didn't really think about it too much. Here, it worked out that the test rejected anyway, but yeah, doing a one-tailed test, since that was the research question he was interested in, might have been preferred. But it still works for this example.

    • @otsoko66
      @otsoko66 1 month ago

      @@lazerbungalow Not necessarily -- doing a one-tailed test assumes that all of the error / variation must be in one direction, and you need to provide some proof for that assumption. If you perform a one-tailed test at p = .05 without such proof, you are really doing a test with p = .10. One-tailed versus two-tailed is not a function of your hypotheses, it is a function of how error / variation happens in the world. An example of a good one-tailed test is change in kids' height from age 10 to age 12 -- we can pretty much assume that kids don't shrink from 10 to 12, and that the error/variation will only be how much they grow. But note we MUST assume no kid-shrinkage to do the one-tailed test here.

    • @lazerbungalow
      @lazerbungalow 1 month ago

      @@otsoko66 I see what you're saying, but my initial feeling is to disagree. If he is only interested in whether it is "better," then if the sample ends up being at the end he's not interested in, he fails to reject the null. There is no type I error there. The error rate is still 0.05. If the two populations are the same, he will still only falsely reject 5% of the time.
      Now, you might possibly make the argument that it affects type II error if you are saying that the two populations could be significantly different in the opposite direction that he assumes. Because while that is not his research question, a rejection in that direction is valid science because it calls into question his basic ideas about his hypothesis. So in that case, if you feel it is a convincing argument to do a two-tailed test, then that's something to go on.
      But he is not really testing at 10% type I error rate, because in one direction he will fail to reject.
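
    A minimal sketch of the one-proportion z-test this thread discusses, in Python with hypothetical counts; the null value pi = 0.85 enters the standard error, and both the two-sided and one-sided p-values are computed:

        import numpy as np
        from scipy import stats

        k, n, p0 = 1080, 1225, 0.85   # hypothetical counts; null value pi = 0.85
        phat = k / n

        # The null value appears in the standard error, as explained above
        z = (phat - p0) / np.sqrt(p0 * (1 - p0) / n)

        p_two_sided = 2 * stats.norm.sf(abs(z))
        p_one_sided = stats.norm.sf(z)            # for H_a: pi > 0.85
        print(z, p_two_sided, p_one_sided)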

  • @davidarredondo2106
    @davidarredondo2106 2 months ago +13

    Excellent video!! I’m almost done with Bernoulli’s Fallacy myself.
    I do want to add that, for what I’ll call “reasonable” priors, the choice of prior doesn’t matter in the long run, as the data will dominate the posterior through the likelihood.
    Basically, with Bayesian statistics, we’ll find the truth if we just keep on collecting more data.
    Again, thanks for this great summary! I teach both a high school stats course and a high school Bayesian data science course, and this is the best short explanation of the difference I’ve seen. Congrats!

    • @simonpedley9729
      @simonpedley9729 2 months ago +2

      But there are many fields of science where you can't collect more data (pretty much the whole of environmental science). So then priors are critical.

    • @martian8987
      @martian8987 1 month ago +1

      @@simonpedley9729 So is this a chicken or the egg situation ? (which really doesn't make sense because eggs came first as other animals had them...like chicken's ancestors....so egg always came first!).

    • @Skyhigh91100
      @Skyhigh91100 1 month ago +2

      Ok but if you can just keep collecting more data, you can just do a frequentist analysis. If the priors eventually “drop out” of the calculation, all you’re left with is the experimental ratio.

    • @ZekeRaiden
      @ZekeRaiden 26 days ago +3

      That would seem to be a concession to the frequentist though? That is, the specific reason given for why Bayesian approaches are better is that frequentist assumptions either don't make sense (e.g. long-past or one-off events cannot be understood as having a "frequency") or refer to impossible actions (somehow collecting a brand-new, comparably-sized set and running the "experiment" again, when such data simply doesn't exist). If fixing the problem of a bad prior requires repeatedly collecting data, the Bayesian is now in exactly the same hot water as the frequentist: they both need do-overs that are impossible or nonsensical. Under those lights, the rationale seems to be in the frequentists' favor by parsimony: the Bayesian is embarked, they _must_ commit to a prior, but the frequentist does not. Instead, the frequentist commits to a particular risk of making a mistake.
      Now, I'll note that I'm a pretty firm frequentist who does not have a very positive view of Bayesian methods (not least because I find a lot of Bayesian boosters make some strident and excessive claims...), but I think the point still stands. If the Bayesians' problem with frequentist methods is that the latter requires imaginary repeats, why don't they also have a problem with the risk of bad priors for things where we're only able to update our beliefs a very small number of times?

    • @simonpedley9729
      @simonpedley9729 26 days ago +1

      @@ZekeRaiden To add to your final comment about bad priors...the Datta, Mukerjee, Ghosh and Sweeting (2000) paper shows that the error due to having the wrong prior, and the error due to not having enough data, are the same order, O(1/n).

  • @itzsnorlax6057
    @itzsnorlax6057 1 month ago +2

    As someone who lives near that Mostra coffee in San Diego, I recommend it!

  • @philipoakley5498
    @philipoakley5498 2 months ago

    great point at the end about needing to pre-identify the prior _distribution_
    (and hence how fast or slow the data will pull toward 'truthiness')

  • @Toksyuryel
    @Toksyuryel 26 days ago

    I often look at this debate through the lens of physics models, where you can have one model that is simpler and often "good enough" in most scenarios, and another model that is much more complex and able to more accurately describe a larger number of scenarios. Examples being electron orbitals vs electron cloud, or Newton vs Einstein. Here, I consider the frequentist approach to be the "simpler, good enough" form and the Bayesian approach to be the "complex, more accurate" form.

  • @BillyViBritannia
    @BillyViBritannia 1 month ago +2

    Apart from maybe being easier to understand for some people, I don't get what the bayesian approach adds.
    For problems with a very small frequency or sample size, your prior is the thing that's going to influence the outcome the most, so you are essentially guessing where the frequentist would say "no idea".
    Doesn't sound like a huge improvement to me.
    Edit: I guess if you HAVE to make a decision, saying "it's between 0 and 100" is better than saying "it's 50 but I'm probably wrong"

    • @very-normal
      @very-normal  1 month ago +1

      with something like statistics, the value in being easier to understand can't be overstated

    • @Skyhigh91100
      @Skyhigh91100 1 month ago +4

      There is an excellent 3blue1brown video that explains a situation where the Bayesian approach is very impactful: health screening tests. Imagine you have an extremely specific test for a rare disease (let’s say the “true” probability of having it is ~1/10,000 people), something that only gives a false positive 0.1% of the time, and a false negative 1% of the time as well. That’s a great test, right? We should give it to everyone to screen for this disease! What’s the harm, right? With such a low error (from the frequentist perspective), most people who test positive will have the disease and be able to be treated.
      Well, hold on though, is that assumption true? Imagine giving this test to 1,000,000 people. Since I've defined this disease to have an actual objective rate of 1/10,000, about 100 people in this group actually have the disease, with 99 of them being caught by the test and 1 missed. On the other hand, since the test has a false positive rate of 0.1%, about 1000 people have been given a false positive result.
      That means on a test with extremely low type I and II errors, your chance of actually having the disease if you get a positive on the test is only about 10%!
      That’s incredibly unintuitive from a frequentist perspective, and how one would even go about getting that number and justifying it isn’t really clear. Bayesian statistics, however, bake all of these assumptions into the calculation, so they can be interrogated and updated. That 1/10,000 number was something that I just magically knew in this example, but a Bayesian statistician can get a similar prior probability estimate from any number of sources.
      The really important part of this is that it demonstrates the need for multiple screening techniques, because they represent multiple times that the probability that you are positive for a specific disease are updated. This is why, for instance, it is no longer recommended that all women get mammograms after a certain age unless there is some other indication that boosts the probability that they have breast cancer: there were too many false positives, and false positives are not free. They cause stress and anxiety for patients, they cost additional healthcare resources, and they dilute the pool of patients who actually need care with patients who only have been told they might need care.
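
    The arithmetic in the reply above, written out as Bayes' rule (the three input numbers are the hypothetical ones from the comment):

        prevalence = 1 / 10_000       # P(disease)
        sensitivity = 0.99            # 1% false-negative rate
        specificity = 0.999           # 0.1% false-positive rate

        p_positive = (sensitivity * prevalence
                      + (1 - specificity) * (1 - prevalence))
        p_disease_given_positive = sensitivity * prevalence / p_positive
        print(p_disease_given_positive)   # ~0.09: a positive means ~9%, not ~99%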

  • @craigparker1410
    @craigparker1410 2 months ago +4

    Do you use Manim to make your visualizations ? I love how you work through the concepts and keep the canvas as clean as possible. Keep up the great work 🎉

    • @very-normal
      @very-normal  2 months ago +1

      Yee i am a manim novice

    • @tuongnguyen9391
      @tuongnguyen9391 2 months ago +1

      @@very-normal Where to learn manim from your bayesian prior ?

    • @very-normal
      @very-normal  2 months ago +1

      I’m self taught from reading documentation but I’m aware of tutorial videos on YouTube

  • @paulschmitt6703
    @paulschmitt6703 1 month ago

    Excellent, concise description of the essential differences between the Bayesian & frequentist philosophical perspectives, with examples. The frequentist methodology as used today is all too often a hybrid mess of two distinct approaches: the separate frequentist approaches of Fisher and of Neyman-Pearson have been mistakenly combined in a manner which neither Fisher nor Neyman would have approved. Bayesian findings tend to be more intuitive than frequentist results - so much so that frequentist analyses are often interpreted in a Bayesian framework! For example, most consumers of statistical information will interpret a frequentist 95% confidence interval as a Bayesian 95% credible interval - as the latter is much more intuitive to understand!

    • @douglaszare1215
      @douglaszare1215 1 month ago

      Confusing a confidence interval with a credible interval is just a common error. We can run an experiment to calculate pi and might find a confidence interval of (3.1,3.2), or (2.9,3.1). I guess some people might say that our belief is that the probability pi is in (3.1,3.2) is 95%, but these people are wrong.

  • @PerishingTar
    @PerishingTar 2 months ago +2

    You got my butt with that ad transition 😅

  • @PrParadoxy
    @PrParadoxy 2 months ago +79

    The Bayesian view doesn't apply very well to statistical physics. There is a concept called the microcanonical ensemble, where we assume the frequency of each event to be equal. From this, we can calculate entropy, and from that one can calculate other physical quantities of interest like temperature. In the frequentist view, no problem arises, as the physical system's properties are independent of the observer's knowledge. Everyone agrees on the temperature of a box of gas. However, in the Bayesian point of view, if someone (say by prior measurements) has extra knowledge about the system, they would not assign equal probabilities to the individual events, causing them to claim a different entropy, temperature, etc., which would not agree with our actual observation. I have seen some effort to fix this; however, the Bayesian view is not as natural as this video makes it out to be.

    • @very-normal
      @very-normal  2 months ago +53

      yeah that’s fair, I see what you mean. Physics is a totally different world than the biostatistics world I’m used to

    • @naturalequations
      @naturalequations 2 months ago +15

      @PrParadoxy I want to hear more about this! But I would argue that the Logical Bayesian approach has to do with the process of acquiring information on a real phenomenon and updating the status of knowledge of the observer. Its objects are propositions, not the physical system itself. On the other hand, for statistical physics and quantum mechanics, the probabilistic nature of the evolution of the system is a characteristic of the physical model of the system, has nothing to do with the observer (before measurement), and is objective in nature. While the system evolves, there's no update of knowledge to do for anyone. Moreover, when you build the concept of an ensemble, one basically starts with the idea of taking an infinite number of copies of the system. What this means is that when you go and do measurements of the system, when you want to check if your model is correct or not, the model itself is not deterministic but comes with probability distributions. But Bayesian statistics would apply in the context of characterizing statistical properties of the system (P, V, T, S, U, whatever) given the experimental data with its own uncertainties and the model, which is now non-deterministic.

    • @bjorntorlarsson
      @bjorntorlarsson 2 months ago +4

      Does "everyone agree"? Isn't that worse than subjective in that it is also collective? What about the ratio of matter to anti-matter in the universe, how does the apriori assumption work in that case? Wouldn't it be a good idea to consider adjusting the parameter, the theory, because the data doesn't fit well with the prior.

    • @PrParadoxy
      @PrParadoxy 2 months ago +10

      ​@@bjorntorlarsson Certain physical quantities are only meaningful in an absolute sense. Imagine if someone tells you the temperature of the boiling water in your kettle is in fact 0 kelvin. They reason that they know the microstate of each individual particle, so the amount of uncertainty, or the entropy if you will, that they have about the system is zero. It does not make sense, does it? It has nothing to do with parameterization, really.

    • @naturalequations
      @naturalequations 2 months ago +6

      @@bjorntorlarsson Firstly you choose your hypothesis H, which is a proposition like "This physical process can be described by this specific model which has these specific parameters". Then the prior gives a probability distribution for the parameters of that theory given all you already know, both about the physical process itself you wanna describe and the model you're trying to describe it with. Then through the likelihood what happens is that your prior knowledge about the parameters of that model is updated and you get a new probability distribution "a posteriori", that takes into account the new data. If you change the model, you consider a totally new prior associated to the new model. In the case of choosing the relative value of dark matter fraction or dark energy fraction, it's not a change in the model. If you want to be totally agnostic you choose a prior so that all the Omega_i sum to 1 but are then free to vary within the allowed range. If instead you've done multiple experiments already on the LambdaCDM model another choice could be to take as your prior the posterior given by the chain of former experiments. I don't see the issue here

  • @wayneford6504
    @wayneford6504 1 month ago

    Very clearly and skillfully explained. Maybe one day you will do a series on the logic of science material?

  • @stephenbrillhart6223
    @stephenbrillhart6223 1 month ago +1

    Did anyone else notice that the “prejudiced” prior distribution toward the end is not a valid probability density function?

    • @very-normal
      @very-normal  1 month ago +1

      manim has trouble drawing beta distributions

  • @coda-n6u
    @coda-n6u 1 month ago

    Thanks for your video! I’m by no means a statistician, but I find Bayesian inference to be interesting and valuable in and of itself when you look at statistical learning. When you need to examine and theorize about the process of learning, viewing probability in terms of a belief updating process is extremely useful. So many people get stuck on the “Bayesian stats is subjective”, but if you’re looking at a machine learning model, the point is that over time it can learn and reduce its error over a training process using belief update rules. Is there a frequentist interpretation of machine learning?

  • @Skeleman
    @Skeleman 2 months ago +24

    To understand the beta distribution:
    1. Imagine you are sending rockets to an alien planet to see what portion of the surface is covered in water.
    2. You can send probes that hit the ground, are destroyed instantly, but send back whether they landed on water or land.
    3. Let's say you send down a probe and it says it hit land.
    4. If the entire planet were water, there is a 0% chance the probe would say land. If the planet were 100% land, there is a 100% chance of it happening. If you plot the percent of land on the planet on the x axis, and the relative probability that the probe says land on the y axis, you get a line from (0,0) to (1,1). You can imagine the opposite being true if it said water: a line from (0,1) to (1,0).
    5. If you send another probe and it says water, then you can combine the two plots: multiply the land plot by the water plot, because at each possible percent of land on the planet the probabilities are being combined. You'll end up with a parabola.
    6. Keep multiplying by the right plot as each probe says water or land; slowly you'll get a bell curve whose peak is at the ratio of land to water probes. This is what the beta distribution is (sketched in code after this thread).

    • @ClementinesmWTF
      @ClementinesmWTF 2 months ago +1

      Except that this interpretation only works for positive integer parameters α, β, whereas the full scope of the distribution works for any positive real numbers α, β. An actual description of the beta distribution would require a bit more complicated "probe sampling" than described here; having trouble thinking up any alterations to the described example at 3a tho.
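
    A minimal sketch of the probe story at the top of this thread (the probe reports are made up): each probe multiplies in its line, and the product takes on the Beta shape:

        import numpy as np

        x = np.linspace(0, 1, 501)       # candidate fraction of land
        density = np.ones_like(x)        # flat before any probes arrive

        results = [1, 0, 1, 1, 0, 1, 1, 1]       # hypothetical reports, 1 = land
        for r in results:
            density *= x if r else (1 - x)       # multiply in each probe's line
        density /= density.max()                 # rescale; only the shape matters

        print(x[np.argmax(density)])     # peak at 0.75 = 6/8 land: a Beta(7, 3) shape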

  • @hashmarker4994
    @hashmarker4994 1 month ago

    > Here we are trying to understand what Mostra cafe's good rating is. We did a hypothesis t-test with 1 degree of freedom, since we evaluated only one Mostra cafe instance. Realise that we are not comparing Mostra cafe to other cafes,
    > but trying to understand the good rating for Mostra cafe in real life based on its rating on Google (which is the sample, not the population),
    using the frequentist approach.
    > We see the Google rating for Mostra cafe is around 0.88 (and we are 95% confident the actual percentage is between 0.859 and 0.899). But we don't really know the actual probability that the good rating is 0.88; we just know that the population mean is within about 2 standard errors of it.
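
    A quick check of the interval quoted above, with hypothetical review counts chosen to reproduce roughly the same numbers:

        import numpy as np

        k, n = 864, 982                   # hypothetical counts giving phat ~ 0.88
        phat = k / n
        se = np.sqrt(phat * (1 - phat) / n)
        print(phat, (phat - 1.96 * se, phat + 1.96 * se))
        # ~0.88 and ~(0.859, 0.900); the 95% describes the procedure: intervals
        # built this way cover the true proportion in ~95% of repeated samples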

  • @thinkingchristian
    @thinkingchristian 1 month ago

    Great video. I always thought the frequentist approach was actually more subjective, because what is truly considered a relevant counterfactual is open to interpretation in many cases (In my view, the Bayesian is more up front about this). Alan Hájek has a lot of great work on this-in fact Hajek is worth reading in general. What may be interesting to note is that in my field (Electrical Engineering) I find most of us are Bayesians. It may be because there is an intuitive connection between Bayesian statistics and topics in information theory like entropy and mutual information.
    Still, there is a promise for a hybrid view: as Roderick Little suggests, “inferences under a particular model should be Bayesian, but model assessment can and should involve frequentist ideas". Also it is interesting to note that Clayton's book Bernoulli's Fallacy borrows quite a bit from E.T. Jaynes (though I disagree with Clayton on a few points). Jaynes was a great statistician, but he was as hardcore a Bayesian as they come.

  • @Jk-trp
    @Jk-trp 1 month ago

    I like the channel, subscribed, keep up the good work.

  • @danielkeliger5514
    @danielkeliger5514 2 months ago +1

    Yes, I totally agree that for large sample sizes the two methods basically give the same answer. They are also compatible in the sense that Bayesians have their own interpretation of maximum likelihood, and Bayesian methods can be analysed in frequentist language. (In fact it is more natural to understand the limit theorem mentioned in the video in frequentist terms, in my opinion.)
    Still, I want to make some remarks.
    Firstly, I'm sort of a pluralist. I don't think probability stands for a single concept. Statements like "what is the probability that this previously unknown sonnet was written by Shakespeare" can be interpreted in a Bayesian way much more generally, while physical problems (see below) make more sense in the frequentist interpretation. Ultimately, there are many things that satisfy the Kolmogorov axioms that have nothing to do with randomness. (Say the ratio of votes in an election.) It is possible to do probability theory without referring to randomness at all.
    There are cases when we do actually talk about frequencies in the world. Ergodicity is a good example. Saying things like "if I know the exact initial conditions, I can calculate the exact ratio of times the coin will land on heads" and therefore "probabilities are purely epistemic" kind of misses the point. I'm not interested in this very particular initial condition. I want to show that this behaviour, where roughly 1/2 of the coin tosses land on heads, is typical for most of the initial conditions. This 1/2 number is a property of the system, and it doesn't describe the mental state of an idealised, rational observer. (With the obvious objection that of course observations themselves are model dependent, etc.)
    Lastly, all the popular interpretations have their own philosophical problems. I don't know any interpretation that is not ultimately flawed under greater scrutiny. This is actually very typical when it comes to philosophical problems. (Think about all the different schools of ethics.) I think I like the propensity interpretation of probability the most, but that is not perfect either.

  • @vagarisaster
    @vagarisaster 1 month ago

    One of the best ad segues I've ever heard. 💀

  • @notimportant2478
    @notimportant2478 1 month ago

    The frequentist approach looks like a neutral, pragmatic approach to statistics, while the Bayesian approach is more flexible and adaptive. I'm sure each has its own strengths in different situations. I believe the frequentist approach is good as a first estimate when you know nothing about the data you're studying, while the Bayesian approach allows you to get more precise results as your understanding gets better. I know what I'm saying isn't mathematically rigorous, but mathematics is always derived from "desired properties", and it very much looks like these approaches were developed for the desired properties they offer.
    If you have any objections, let me know; I'm very interested in learning more about what you think.

  • @alex_zetsu
    @alex_zetsu 2 months ago

    In my opinion, these two philosophies can be reconciled by thinking of frequentist statistics as just approaching the problem with a specific prior that is asking "am I X percent confident in posterior outcome A?"

  • @Shantanu_Dixit
    @Shantanu_Dixit 2 months ago +1

    You just gained a subscriber love your content 🌿🌿🌿

  • @CrypticManu
    @CrypticManu 2 months ago

    Man I just love your videos, keep it up!

  • @xenoduck3189
    @xenoduck3189 2 months ago +78

    Bayesian probability still has the same definition as frequentist probability!!! What you are showing is not a "definition" of probability, it is just Bayes' rule, which says NOTHING of P(A), only of P(B|A). The law of large numbers gives the definition of probability, regardless of what field of maths you study. I feel like this was really misrepresented in the video.

    • @foresthobo1166
      @foresthobo1166 1 month ago +26

      Your comment only shows one thing: you don't understand the Bayesian viewpoint.
      Back when I was doing my PhD I had a designated helper from the statistics department to coach me on methods. One day we started talking about Bayesian thinking (he was a frequentist). After trying to do some math with me being confused, he stated (again as a frequentist): if I roll a die, covering it with my hand, the probability of it being a specific number from one to six is 1/6. As a frequentist, I say it IS one of the six (it's a physical entity) with said probability; a Bayesian will say we believe it to be something, but it isn't anything until we discover more (remove the hand).
      This distinction makes very little sense for his example, but a huge difference for more advanced statistics. (and all the natural sciences that depend on it)
      As a sidenote, there is more than one kind of math and they don't always agree. Look it up.

    • @xenoduck3189
      @xenoduck3189 1 month ago +4

      @@foresthobo1166 Whether or not you calculate the probability as a frequentist would, what you are trying to estimate through Bayesian thinking is how likely a specific outcome is to happen. If given the means, you can verify your result using the law of large numbers by running a bunch of experiments. That is probability, regardless of what method of dealing with it you subscribe to. I feel like this is not particularly debatable or hard to understand.

    • @santiagobustamante6192
      @santiagobustamante6192 1 month ago

      @@xenoduck3189 I'm a probability and statistics professor for physicists, and a little bit of a quantum information scientist. Sure enough, the probability of physical events should not depend on your interpretation of probability. That is, both a Bayesian and a frequentist must agree that the probability of obtaining a particular outcome in a fair die roll is 1/6. However, the frequentist states this due to previous experience with fair dice, whilst the Bayesian does it due to a complete lack of information about the result of the die roll. Once the Bayesian sees the result (obtains information about the system), they update their probability distribution to one where the observed result of the die roll now has unit probability (a distribution with zero entropy, i.e. a state of complete knowledge).
      An instance which may help understand the difference in the interpretations is the following: for a frequentist, the question “what is the probability that God exists?” does not make any sense and cannot be answered since there is no way of performing trials for the existence of God; something that underlies the frequentist definition of probability. On the other hand, a bayesian may say there is a 50% chance that God exists, since the answer is binary (God either exists or not) and, in this case, a 50-50 probability distribution is the one which best describes the state of complete lack of knowledge (i.e. maximum entropy).

    • @plainguy3567
      @plainguy3567 1 month ago

      This is incorrect. It makes sense, but it isn't correct when you start thinking about events that are certain eventually but happen only once.
      For example the sun WILL go supernova. 100% this will occur. But the sun WON'T go supernova today. 100% it will not occur today.
      So what happens is that the Law of Large Numbers struggles with events like this. The law tells you it WON'T happen (though we know it will), because each day, as an experiment, suggests it won't; but it also somehow tells you it WILL happen, because many stars have this fate.
      So how do you deduce the odds of the sun going supernova today? If you say it's zero, which the LLN suggests, and it happens, then you were wrong; but if you say it's one, as the LLN suggests, and it doesn't happen, you are also wrong.
      Sequenced odds that are not IID do not work with the LLN. It's just completely incorrect for more complex systems.

    • @vez3834
      @vez3834 1 month ago +5

      I don't know why you need to say that Bayes' rule isn't the definition of probability? When he brings it up in the video, he is pretty clear in talking about P(A|B). He is talking about the philosophical interpretation between the two viewpoints. Bayesian thinking gives us a different way of looking at our error and at our assumptions.

  • @philipoakley5498
    @philipoakley5498 2 months ago

    What is probability? One also needs to compare and contrast it with 'statistics', as either synonyms or distinct concepts, to help with discussion.
    The frequentist 'close enough for practical purposes' get-out also isn't great from an engineering perspective ('when will the bridge fall down?', 'tracking a radar blip', etc.).
    I feel that the Bayes formula starts as a 'complicated' (tricky to visualise) formula, and that P(A & B) = P(A|B)·P(B) = P(B|A)·P(A) is an easier starting point that is just as simple as frequentist counting, with the same underlying assumptions (belief: identicality and consistency of the independent events)...

  • @RAFAELSILVA-by6dy
    @RAFAELSILVA-by6dy 1 month ago

    This video gives the impression that Bayes Theorem is exclusively part of Bayesian probability theory. It's basic set theory and applies whether you are a frequentist or Bayesian. Another issue is that the finite number of measurements applies across all of physics. You cannot, for example, calculate an instantaneous velocity. You can only measure position either side of a finite time interval - and calculate an average velocity. That does not, however, mean that you cannot use calculus in your physical model and cannot use the concept of an instantaneous velocity. Likewise, although you can only repeat an experiment a finite number of times, you can use mathematics to model an infinite number of experiments. We are free, therefore, to use the mathematics of infinite sequences in probability theory. It's not even necessary to believe that there is an absolute underlying probability: only that you can usefully model a scenario using the mathematical concepts of absolute probabilities and relative frequency as the limit of an infinite sequence of experiments. That doesn't need to be practically achievable in order to be a valid mathematical model. Otherwise, physics would have to rely solely on the mathematics of finite numbers!
    Finally, I don't agree that a Bayesian can believe that A has a 20% probability and not A a 50% probability. That would be absurd. The priors have to be consistent. In fact, both frequentists and Bayesians are essentially tied to the Kolmogorov axioms.

  • @donaldlacombe84
    @donaldlacombe84 2 months ago +9

    Another advantage of Bayesian statistics is that the joint posterior allows for the calculation of the marginal distributions for the parameters and probability statements can be made regarding these parameters.

  • @f1f1s
    @f1f1s 2 months ago +6

    The idea of a repeated experiment showcases the inherent variability in the parameter estimate, i.e. the sampling distribution. A frequentist assumes that there could be a different data set borne by the same invisible data-generating process (law). Bayesians tend to jump onto data matrices as if those n=200 observations were the one and only realisation possible, without other hypothetical scenarios occurring, as if there were no Heisenberg principle or quantum uncertainty. The frequentist approach reflects the randomness of Nature and unobservability of hypothetical outcomes better: ‘it could have been otherwise’. Finally, Bayesians often make ridiculous distributional claims: ‘assuming the prior normal distribution, the posterior distribution of the linear regression slope estimator is precisely Student with n=198 degrees of freedom', whilst frequentists are much more careful about heteroskedasticity, calibration, coverage probability, and Bartlett correction, which are essential to control the false discovery rate: ‘there is some unknown law, but we can compute some functionals thereof regardless of the joint and marginal distributions, as long as enough finite moments exist for the WLLN and CLT to work’.

    • @ucchi9829
      @ucchi9829 2 months ago +1

      Finally, a Frequentist defense.

    • @Tom-qz8xw
      @Tom-qz8xw 1 month ago

      There’s a lot of Bayesian bullshit using parametrised distributions in their posterior

  • @Velereonics
    @Velereonics 27 days ago

    I have hated the law of large numbers ever since I had the misfortune of learning about it in high school

  • @aakashparida2026
    @aakashparida2026 2 months ago

    New found love for statistics....Thank you so much!!

  • @antigonid
    @antigonid 1 month ago

    That Brilliant joke was, well, brilliant

  • @tunneloflight
    @tunneloflight 2 months ago +1

    Btw - my arguments with statistics do not mean they are useless. Rather, they are frequently and all too easily abused (intentionally or unintentionally). In every instance, the use of statistics as applied to real world analyses must be critically analyzed and scrutinized, starting with the assumptions, presumptions and desires and relative ignorance of those involved. Even when all of that is fair, statistics often goes wildly away from truth or reality.
    And researchers often fail to apply even the most basic critical analyses to the results.
    Is the population a single population? Is the population linearly, triangularly, normally, Poisson or otherwise distributed? Are there hidden variables? Is the data the result of stochastic events acting on stochastic events? Do the results violate sanity? Do the results suggest results outside the bounds of the analysis? Is the thesis or hypothesis that resulted in the data gathered biased in its own right? Etc...

  • @angrymeowngi
    @angrymeowngi 1 month ago +1

    I take offense at that remark against statisticians on their incapacity for violence.
    I'd have you know, Sir, that statisticians are just as likely to commit violent crimes but have less probability of being caught because they know how not to become a statistic.

  • @minhhungle7488
    @minhhungle7488 1 day ago

    ok, hope no one's brain was fried😂 19:42

  • @markuspfeifer8473
    @markuspfeifer8473 1 month ago

    Bayesianism is just superior. It allows for straightforward statistical connectives and gives us distributions rather than rigid numbers. It’s just a lot richer and might also lend itself more readily to generalizations of statistics once we understand them better (eg negative probabilities and so on)

  • @punditgi
    @punditgi 2 months ago +1

    Very educational video! 🎉😊

  • @AdamKuczynski322
    @AdamKuczynski322 2 months ago +13

    Surely more people have visited the cafe than have left a review? So repeating the experiment and collecting another ~1k reviews from those who simply hadn't written theirs down isn't all that far fetched. Impractical, yes, but entirely possible. The idea of 'repetition' breaks down much more when we think of data we can't really sensibly resample/remeasure (like a country's annual GDP or the employment rate).

    • @scepticalchymist
      @scepticalchymist 2 months ago +2

      The fact that more people have visited than left a review could already be a bias in the statistics, because maybe, for some weird reason, people writing reviews are all prejudiced in their judgement in the same way. I guess this could be modeled in a Bayesian approach, but a frequentist just has the plain numbers and cannot take anything else into account.

    • @dataandcolours
      @dataandcolours 1 month ago +1

      To name individuals "frequentists" or "bayesians" is probably one of the most misleading things one can do when actually trying to explain this in a helpful way.

  • @huhuboss8274
    @huhuboss8274 2 months ago +8

    Why would you risk having a wrong prior when you can simply use the frequentist approach? Genuine question

    • @very-normal
      @very-normal  2 months ago +9

      I think that for any simple problem like this, enough data will make up for any “wrong” prior.
      But I’m not sure what it means to have a wrong prior in the first place

    • @floatingblaze8405
      @floatingblaze8405 2 months ago +5

      (Talking as a complete amateur here) As I understand it, having a "wrong prior" is no less fatal than making certain assumptions that aren't applicable to your circumstances in a frequentist context (given that I strongly believe there's no such thing as "simply" using a frequentist approach... or "simply" using statistics in general😅). Both will lead to the same outcome of misinterpreted or straight up wrong numbers, so I believe this is more of a "pick your poison" situation.

    • @huhuboss8274
      @huhuboss8274 2 months ago +4

      @@very-normal If you believe you have prior knowledge that does not match reality, I would call that a wrong or bad prior, resulting in bad results.

    • @very-normal
      @very-normal  2 months ago +5

      If that’s the case, then I’ll need to rely on someone to call me out on a bad prior. If results are going to be published, they have to be vetted. Why risk someone misusing frequentist statistics when we can force them to express their beliefs via the prior

    • @WeirdPatagonia
      @WeirdPatagonia 2 months ago

      Most of the time, if you have no good guess, you actually use non-informative priors, which is equivalent to what you suggest. But even with that, some people would argue that the interpretation of the findings is different, and they would be right

  • @tahsinahmed7585
    @tahsinahmed7585 2 months ago +3

    How would you explain the choice of the likelihood distribution?

    • @very-normal
      @very-normal  2 months ago +2

      I viewed the data as binary, so it needed to come from a discrete distribution. The binomial is the most commonly used family for this, but nothing would stop me from other discrete distributions that also fit binary data. I would lose conjugacy, but there are tools for doing Bayesian things in that case

  • @Skeleman
    @Skeleman 2 months ago +2

    If anyone is interested in if there is an objective way to pick a prior probability distribution, you do it with something called "maximum entropy".
    And the entropy they refer to is the same one the physicists talk about.

    • @danielkeliger5514
      @danielkeliger5514 2 months ago

      I disagree. In the case of the p parameter of a Bernoulli, that would be the uniform distribution. That, however, depends on the coordinate system you choose, as opposed to other methods like Jeffreys' prior. Maximum entropy arguments in general rely on some assumption of a uniform distribution, even in physics. (Think about the whole combinatoric derivation with Stirling's formula.)
      Ultimately, all models depend on assumptions, be they frequentist or Bayesian. There is no such thing as "purely letting the data talk for itself".

    • @Skeleman
      @Skeleman 2 months ago

      @@danielkeliger5514 I agree that there is never a way to "let the data talk for itself". I think I misused the term "objective". There are reasons to use the maxent distribution to ensure you aren't adding any "hidden" assumptions to your analysis.

    • @danielkeliger5514
      @danielkeliger5514 2 months ago

      @@Skeleman I totally agree that uninformative priors are great tools for mitigating subjectivity. I just don't believe in logical positivism :)
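
    A minimal sketch of the maximum-entropy recipe this thread debates, using the classic die example: with no constraints the maxent distribution over faces is uniform, and adding a (hypothetical) constraint that the mean face value is 4.5 tilts it into a Gibbs/exponential-family form:

        import numpy as np
        from scipy.optimize import brentq

        faces = np.arange(1, 7)

        def mean_for(beta):
            w = np.exp(beta * faces)          # exponential tilt of the uniform
            return (faces * w).sum() / w.sum()

        # Solve for the tilt that matches the hypothetical mean constraint
        beta = brentq(lambda b: mean_for(b) - 4.5, -5.0, 5.0)
        p = np.exp(beta * faces)
        p /= p.sum()
        print(p)    # maxent distribution with mean 4.5; beta = 0 gives uniform 1/6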

  • @koonsickgreen6272
    @koonsickgreen6272 1 month ago

    I got lost. With the coffee shop exercise, "the probability it receives a 4 or 5 star review": received from whom? Does it mean it 'has' been received from past customers, or is it ideally about the coffee shop's track record at the end of time by its reviewers?

  • @__-de6he
    @__-de6he 2 months ago

    I guess probability is derived from a geometric property of our microspace (like general relativity is derived from spacetime geometry). So the frequentist approach is more relevant.

  • @thomasjalabert658
    @thomasjalabert658 1 month ago

    I would love to see another example with fewer data points, where the two results are much more different

  • @Velereonics
    @Velereonics 27 days ago

    When I use reviews I just go through the two-, three-, and four-star reviews until I find a few that are worded and written in the way that I write and speak and think. Basically I'm looking for someone who has the same personality as me, trying to judge that through the way they leave comments, which I think is actually probably a pretty robust method given the way I speak and write.
    Anyhow, I make a choice based on those few reviews alone, because I don't really care what somebody thinks about something if we have literally nothing in common; what determines if something is good or bad to that person is not going to resemble what determines if something's good or bad for me.

    • @very-normal
      @very-normal  26 days ago

      what would you do if none of the reviews talk like you

  • @joe_hoeller_chicago
    @joe_hoeller_chicago 23 days ago

    Causality and geometric inference for the win, with sometimes some Bayes. Frequency is only good for seeing what categories of things are trending in time. Nothing else. Correlation for real-world use cases doesn't translate well outside of that.

  • @froao
    @froao 1 month ago

    I didn't understand why, in the comparison where he mentioned the bootstrap, he didn't mention or do a one-sided frequentist test

    • @very-normal
      @very-normal  1 month ago

      what would doing a one-sided test have changed

  • @antoinesoonekindt9753
    @antoinesoonekindt9753 2 months ago

    Interesting video. I'm a little bit surprised, though. I'm fairly confident (let's say 0.80) that the uninformative prior for the binomial distribution is a beta distribution with parameters alpha = beta = 1/2. I'm using Jeffreys priors. If there's something I'm missing, I'd like to know.

    • @very-normal
      @very-normal  2 months ago +1

      It doesn’t matter much in this context because there’s so much data that it dominates the posterior.
      From my perspective, the prior parameters can represent “past” successes and failures, and Beta(1,1) just says we saw only one of both. Having 0.5 of a success doesn’t make as much sense, but it still works in the end. In a paper, we might justify our priors slightly differently

    • @antoinesoonekindt9753
      @antoinesoonekindt9753 2 months ago

      ​@@very-normal, I concur that the alpha and beta parameters are directly linked to the numbers of successes and failures. Jeffreys priors are proportional to the square root of the determinant of the Fisher information matrix, so they cannot be as readily interpreted. If other methods for uninformative priors exist, I'm interested. Thanks, and thanks for the video!
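
    A minimal sketch of why the two choices barely differ here: compare the flat Beta(1, 1) prior against the Jeffreys Beta(1/2, 1/2) prior on hypothetical counts of this size:

        from scipy import stats

        k, n = 1080, 1225                        # hypothetical review counts
        for a, b in [(1.0, 1.0), (0.5, 0.5)]:    # flat prior vs Jeffreys prior
            post = stats.beta(a + k, b + n - k)
            lo, hi = post.ppf([0.025, 0.975])
            print((a, b), round(post.mean(), 4), (round(lo, 4), round(hi, 4)))
        # the two rows agree to ~3 decimals: the data dominate the posterior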

  • @seriousbusiness2293
    @seriousbusiness2293 2 months ago +2

    It's very arguable whether the idea of probability itself is a fundamentally real thing. As a mini example, the digits of pi behave randomly by every single metric we know, yet they are deterministic and nothing random is happening.
    The ultimate goal of probability is modeling unknown outcomes and that can be done in many ways.
    So there is no true right option, all we care for is how accurate we can predict things and how interpretable it is to us.
    (ps in my eyes Bayesian feels more true to real life and my thinking)

    • @weetabixharry
      @weetabixharry 2 months ago +2

      I'm not sure what you mean by "real" here. Casinos make real profits. A digit of pi, selected at random, (it is believed, but not proven) has an equal probability of being any number. Meanwhile, a digit of 50/99 (in base 10), selected at random, will be either 0 or 5 with equal probability. These things seem real to me.

    • @seriousbusiness2293
      @seriousbusiness2293 2 months ago

      @@weetabixharry I meant the sequence is random but deterministic; if you pick random digits you introduce other randomness. My thinking is that multiple things appear to us as random, but if we knew the underlying dynamics we could often agree that probability theory is the wrong approach. Let's imagine an event I can only measure a single time, like "Alex immediately says yes if I ask him on a date today." The idea of doing repeated trials is not real unless I have access to parallel universes, and taking other variables into account to refine my guess, like comparing with other people I asked, gives confidence but doesn't fundamentally reflect Alex's choice then. Even if we measured every atom interaction in Alex's brain, we get into discussions of quantum and chaos theories. So even if our best models say the probability was 50%, we can't tangibly experience or measure that 50% since we only see one outcome.

    • @weetabixharry
      @weetabixharry 2 months ago +2

      @@seriousbusiness2293 I think I see roughly what you're saying... and it's uncomfortable to think about. I only feel relatively comfortable in the simple cases where the tests are repeatable and the "parallel universes" all behave the same. For example, I need 1000 dice all rolled in parallel to have the same statistical behavior as 1 die rolled 1000 times. And my dice have to have a *known* probability distribution (preferably, perfectly uniform) or I'm gonna panic.

    • @seriousbusiness2293
      @seriousbusiness2293 2 months ago

      @@weetabixharry haha 😂 I feel ya. Ya, in any case I'm sure that probability theory is an extremely good tool for reasoning and decision making, and often close to some Truth. But as soon as we get philosophical about the fundamentals, then there is room for doubt.
      I think it's comparable to the situation of going from Newton's theories to relativity theory. Having a fixed frame of reference makes the math easy and it works most of the time, but if you care about fundamentals and edge cases you need a relative model of physics.
      Thinking about dice and cards is more a clean setup like a Newton model that assumes each object has some absolute probability making for an actually very good model. But converting any probability number into a tangible real world concept may not always work and may need a more nuanced idea of what that number means, like in relativity we found that two observers can disagree on a space or time measurement but that gets fixed if you talk about the new concept of space-time.

  • @GenericInternetter
    @GenericInternetter 2 місяці тому +1

    Not a statistician, but I do have a take on this...
    The Bayesian method relies on priors which hamstrings the whole practical purpose of analysis. Instead of debating results, people instead debate priors. It just shifts the whole thing from one frying pan to the other.
    The simplistic frequentist approach you described is utterly naive. You completely missed the whole concept of random walk.
    In practice, the most reliable approach to probability is the non-naive version with a large dataset, or a large set of datasets.
    Random walk is critical to understand for the frequentist approach to make any sense.
    For example, imagine flipping a balanced coin 4 times (small example, easier to explain)
    The naive approach would assume that larger datasets tend towards 50% heads, but this doesn't make sense.
    The probabilities are:
    0% heads -> 1/16
    25% heads -> 4/16
    50% heads -> 6/16
    75% heads -> 4/16
    100% heads -> 1/16
    It's a bell curve centered at 50%. With large data sets, your chance of getting the expected 50% result is only around 6/16, but your chances of getting either 25% or 75% is 8/16... Which means the naive approach is more likely to give an inaccurate result!
    Random Walk (results steering away) is a huge topic in itself and definitely needs to be accounted for to rely on the frequentist method.
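
    A quick way to check these fractions is a minimal Python sketch (scipy assumed; fair coin assumed):

    ```python
    # Binomial PMF for 4 flips of a fair coin.
    from scipy.stats import binom

    n, p = 4, 0.5
    for k in range(n + 1):
        prob = binom.pmf(k, n, p)              # P(exactly k heads)
        print(f"{k}/4 heads: {prob:.4f} = {prob * 16:.0f}/16")
    ```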

    • @very-normal
      @very-normal  2 місяці тому

      how would accounting for it help us understand frequentist methods any better?

  • @tom-kz9pb
    @tom-kz9pb Місяць тому

    The Bayesian camp drives artificial intelligence. It is a viable approach by the grace of Big Data. It is a double-edged sword. It can sometimes ferret out subtle patterns that humans would miss, but there is risk of conflating correlation and causation.
    The frequentist approach works best if you have a theoretically perfect coin with an exact 50-50 chance of heads or tails. The Bayesian approach works best if you CANNOT be sure in advance whether a coin is loaded or honest, but want to make the best estimate as to the outcome of the next throw, regardless of the uncertain coin status.

  • @VincentKun
    @VincentKun 2 місяці тому +93

    During this video I just started to hate the frequentist approach: it simplifies everything as if it were all independent. Bayesians make a guess and can iteratively get to the right probability via Bayesian updates, taking into account all the complex stuff the world offers, while with the frequentist approach you need to run a lot of trials.

    • @AkshayKumar-vd5wn
      @AkshayKumar-vd5wn 2 місяці тому +9

      A lot of trials leads to an average of outcomes, which can be easier to analyze than a focused analysis done one time.

    • @therealjezzyc6209
      @therealjezzyc6209 2 місяці тому +20

      Does constant bayesian updating also not require a lot of experimentation and trials? Not defending frequentism, but your reasoning doesn't make sense.

    • @AkshayKumar-vd5wn
      @AkshayKumar-vd5wn 2 місяці тому +1

      @@therealjezzyc6209 That's alright.
      I use whatever; I never thought there was a beef.
      But in the real world, averages are fine.
      You cannot expect to inspect every little event or data record one by one in its details;
      hence generalization beats specialization.

    • @therealjezzyc6209
      @therealjezzyc6209 2 місяці тому

      @@AkshayKumar-vd5wn Averages aren't always fine in the real world, though, because not all distributions have finite expectation and variance. It depends on your domain. For example, the ratio of two normal variables is Cauchy, whose expected value is undefined. This means that if you build a model which ends up requiring a ratio of two samples, then you might not have any convergence in your sample means at all. You will need to use extreme-value statistics rather than expected values, and estimate the median instead. This actually happens a lot in finance and other complicated modeling, because you are working with heavy-tailed distributions, so outliers occur frequently enough to throw off your sample means. Although this is just me being pedantic; I'm sure you get the point, and a lot of things end up being normally distributed (but a lot of things also don't). Typically averages are only good when the central limit theorem holds, and from the frequentist perspective you cannot know whether your distribution has finite variance or expectation before performing your trials. Which means you might never converge to your desired probabilities and be wasting your time.
      I don't know what you meant in your last sentence about inspecting everything one by one, though.
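
      A minimal sketch of that Cauchy behavior (numpy only; the seed and sample sizes are arbitrary): the running mean never settles, while the running median does:

      ```python
      import numpy as np

      # The ratio of two independent standard normals is standard Cauchy.
      rng = np.random.default_rng(0)
      x = rng.standard_normal(100_000) / rng.standard_normal(100_000)

      for n in (100, 1_000, 10_000, 100_000):
          print(f"n={n:>6}  mean={x[:n].mean():>9.2f}  median={np.median(x[:n]):>7.3f}")
      # The means jump around as n grows; the medians settle near 0.
      ```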

    • @VincentKun
      @VincentKun 2 місяці тому +3

      @@therealjezzyc6209 Yeah, in some sense they're two faces of the same coin, so a lot of things are in common. In machine learning we love Bayesian updates, and I might be biased by my field of study, but I feel that's the right approach to problems.

  • @brashmane2749
    @brashmane2749 Місяць тому

    There is a mechanics analogue to this:
    Do you use classical mechanics or include relativistic effects? Depends. If classical is good enough, you use that, because relativity reduces to classical mechanics for simple and slow systems.
    Frequentist or Bayesian? Same reasoning. Depends. If your problem is described well enough (or perfectly) by frequentist approaches, you use that; otherwise Bayesian. Because why would you shoot yourself in the foot intentionally just to do it the more complicated way?

  • @Barteks2x
    @Barteks2x Місяць тому

    To me this seems like the frequentist approach starts with "the experiment is all we know," and therefore you calculate the probability directly from the definition, while the Bayesian approach starts with some belief about what we expect, and tries to use not just the experimental data but also other knowledge we may have.
    Wouldn't the Bayesian approach with an uninformative prior then always reproduce a (correctly done) frequentist approach? The frequentist approach is based on the implicit assumption that every possibility is equally likely; with the Bayesian approach you don't necessarily have that assumption, though you may provide it explicitly.

    • @very-normal
      @very-normal  Місяць тому

      what do you mean by every possibility

    • @hannahnelson4569
      @hannahnelson4569 Місяць тому

      This may be a dumb response. I think 'every possibility' means the support of the random parameter/variable in question.

  • @tobiaseriksson7216
    @tobiaseriksson7216 2 місяці тому +3

    How come we assume everything is Gaussian, or treat things as if they were? A lot of statistical tests rely on it, but it seems like the conditions required for those tests to be valid are often not respected.

    • @very-normal
      @very-normal  2 місяці тому +11

      central limit theorem

    • @therealjezzyc6209
      @therealjezzyc6209 2 місяці тому +1

      @@very-normal The CLT doesn't always hold, though, especially if you're working with a distribution whose higher moments diverge. In finance and physics this can happen fairly often.

    • @Impatient_Ape
      @Impatient_Ape 2 місяці тому

      @@therealjezzyc6209 Levy distributions, for instance.

    • @therealjezzyc6209
      @therealjezzyc6209 2 місяці тому

      @@Impatient_Ape My go-to example is the Cauchy distribution, because it looks normal but its expected value is undefined. It is also the ratio of two independent normal random variables, so it's actually easy to unknowingly make a model Cauchy if you start looking at ratios.

    • @ucchi9829
      @ucchi9829 2 місяці тому

      Have you heard of non-parametric statistics?

  • @chonchjohnch
    @chonchjohnch Місяць тому

    I thought probability distributions were Green's functions

  • @tuongnguyen9391
    @tuongnguyen9391 2 місяці тому +1

    When I use a machine learning algorithm to predict stuff, is it the Bayesian way or the frequentist way? Or something in between? Does it depend on the data distribution, or on the specific machine learning algorithm?

    • @very-normal
      @very-normal  2 місяці тому +1

      I think it depends on the model. For prediction, I don’t think the distinction matters all that much. I don’t work a lot with prediction but this has been my experience
      But for inference, it changes how you do statistics and interpret results

  • @qcard76
    @qcard76 Місяць тому

    “…For some reason.” 0:37 At least you're honest about your bias right off the bat.

  • @cutestbear3327
    @cutestbear3327 Місяць тому

    I know I am probably focusing on the wrong thing, but shouldn't the cafe example be using a one-tailed test? 😖

    • @very-normal
      @very-normal  Місяць тому

      it could be, but it wouldn’t really change the results of the test

    • @cutestbear3327
      @cutestbear3327 Місяць тому

      @@very-normal it wouldn't indeed. thanks for the wonderful and interesting video 🙏

  • @-NguyenDuyTanA-mh1db
    @-NguyenDuyTanA-mh1db 16 днів тому

    What program did you use to do research

    • @very-normal
      @very-normal  15 днів тому

      I’m not quite sure what you mean, but I do use Obsidian to collect and organize all my research in general

  • @QuandaleDingle-bq1on
    @QuandaleDingle-bq1on 2 місяці тому +12

    Bayesian propaganda 😂

    • @very-normal
      @very-normal  2 місяці тому +6

      propaganda for great posteriors

  • @simonpedley9729
    @simonpedley9729 2 місяці тому

    A lot of it is hammers vs wrenches. There are plenty of cases where subjective Bayesian analysis isn't appropriate at all. If a drug company ran a clinical trial and proved that their drug works, based on an analysis that involved their own subjective prior which assumed the drug works, would you believe them? If someone is trying to prove that climate change affects x, and they use their own prior which assumes that climate change affects x, would you believe them? These examples illustrate that objectivity is sometimes really important (where objectivity means reducing arbitrary decisions as much as possible... clearly nothing can be completely objective). On the other hand, there are plenty of situations where you should be including subjective prior information.
    There is also a whole field of frequentist Bayesian methods, which to some extent takes the best of both worlds: it uses Bayesian machinery but has the objectivity of frequentism.
    The real problem in statistics is over-use of maximum likelihood, which is neither frequentist nor Bayesian.

    • @very-normal
      @very-normal  2 місяці тому

      That’s fair, but to clear something up: priors in clinical trials are often done with past studies in mind and with input from field experts, they’re not often made purely from the beliefs and feelings of a sole statistician

    • @simonpedley9729
      @simonpedley9729 2 місяці тому

      @@very-normal yes…there’s a big philosophical distinction between subjective priors and priors from previous studies

  • @manosprotonotarios5187
    @manosprotonotarios5187 Місяць тому

    Your hypothesis should be one-sided in classical statistics: p >= .85

  • @ZergZfTw
    @ZergZfTw 2 місяці тому +3

    Bayesian statistics reminds me of Kalman filters to a certain degree. It also seems to me that frequentist statistics is the limit of Bayesian statistics as you gather more data points.

    • @WeirdPatagonia
      @WeirdPatagonia 2 місяці тому +2

      Or that frequentist statistics is Bayesian statistics with non-informative priors (keeping only the likelihood function)

    • @SAliGhaderi
      @SAliGhaderi 2 місяці тому +4

      The Kalman filter is a direct application of Bayes' rule. In fact, there is evidence suggesting that Laplace may have applied a similar approach in his calculations of planetary orbits.
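
      A minimal sketch of that connection (all numbers made up): the scalar Kalman measurement update is exactly a conjugate Gaussian Bayes update:

      ```python
      # 1-D Kalman measurement update = Gaussian Bayes update.
      def gaussian_update(prior_mean, prior_var, z, meas_var):
          k = prior_var / (prior_var + meas_var)            # Kalman gain
          return prior_mean + k * (z - prior_mean), (1 - k) * prior_var

      mean, var = 0.0, 1.0                  # prior belief about the state
      for z in [0.9, 1.1, 1.05]:            # noisy measurements, variance 0.5
          mean, var = gaussian_update(mean, var, z, 0.5)
          print(f"posterior: N({mean:.3f}, {var:.3f})")
      ```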

    • @xavierlarochelle2742
      @xavierlarochelle2742 2 місяці тому +2

      @@WeirdPatagonia Using non-informative priors is very different to keeping only the likelihood function. This is especially obvious when you condition your posterior on small samples.

    • @WeirdPatagonia
      @WeirdPatagonia 2 місяці тому

      @@xavierlarochelle2742 In rigor, you are right; in practice, it depends on the sample size, as you say. I haven't encountered a difference yet, but it is also true that most of my analyses are on medium/big datasets. Thanks for your comment

  • @bschrobru536
    @bschrobru536 Місяць тому +4

    I noticed that my interpretation of the frequentist confidence interval is quite Bayesian, and I have seen this often in courses as well. What is your take on this @very-normal ?

    • @very-normal
      @very-normal  Місяць тому +1

      These courses are mistaking the frequentist interpretation for the Bayesian interpretation

  • @josiaphus
    @josiaphus Місяць тому

    The basis of the science crisis

  • @6PrettiestPics
    @6PrettiestPics 2 місяці тому

    More please.

  • @Acbelable
    @Acbelable 2 місяці тому

    I love this guy

  • @haukur1
    @haukur1 28 днів тому

    The frequentist view prohibits all notions of epistemology, so it fundamentally has no meaningful way to talk about evidence or partial knowledge.
    It's the reason why meta-reviews are phrased so awkwardly compared to something like civil court cases ("judged by the weight of the evidence").

  • @johnrichardson7629
    @johnrichardson7629 Місяць тому

    The problem with a lot of the current faddish enthusiasm for Bayesian analysis is that some people are pretending to have very specific, numerical priors that are OBVIOUSLY just pulled out of thin air, at which point it is unclear what point there is to hearing out the rest of their alleged "analysis".

    • @very-normal
      @very-normal  Місяць тому

      i was not aware bayesian analysis was a fad lol

    • @johnrichardson7629
      @johnrichardson7629 Місяць тому

      @very-normal It's no doubt not a fad amongst actual statisticians but it seems to have become a mostly rhetorical gimmick in other fields, including debates over the historicity of religious figures, of all ridiculous things.

  • @BrentVis
    @BrentVis 2 місяці тому

    Can you talk more about bootstrapping?

    • @very-normal
      @very-normal  2 місяці тому +2

      I have an earlier video about it, but I think my better explanation is in my “biggest prize in statistics” video. It’s in the first chapter on Bradley Efron

  • @Tom-qz8xw
    @Tom-qz8xw Місяць тому

    The problem with Bayesianism is the assumption that the data will conform to these parametric distributions; in the real world this is never the case.

    • @very-normal
      @very-normal  Місяць тому

      i think that’s a general problem for statistical models

  • @RevolutionAdvanced1
    @RevolutionAdvanced1 2 місяці тому +1

    I have trouble when you say "you can have strange priors, but you're gonna need to justify them with evidence". There is no rigorous method of assessing whether verbal statements such as "I have a heavy prejudice against cafes like Mostra" produce valid or invalid priors. If we cannot have rigor in determining the validity of priors presented in a Bayesian analysis, then we are no longer considering logic and are instead considering rhetoric and argumentation, which the frequentists are very right to point out as being a major flaw.

  • @kellymoses8566
    @kellymoses8566 Місяць тому +1

    One major issue with frequentist statistics is that it only considers the total count of events and not their more detailed order. It would consider a coin that did 1000 heads in a row and then 1000 tails to have the same behavior as a regular coin even though that is clearly wrong.

  • @TwentyNineJP
    @TwentyNineJP 2 місяці тому +1

    Finally I have the vocabulary to describe my philosophical objections to the way that the topic of statistics is often discussed. Probabilities have no place in a world of perfect knowledge; to a hypothetical god, all probabilities would be either 1 or 0, and nothing in between.
    It is only our ignorance of outcomes that gives meaning to statistics.
    FWIW the Bayesian approach is what I studied in signal analysis. I just didn't realize that the whole of statistics was bifurcated like this.

    • @simonpedley9729
      @simonpedley9729 2 місяці тому +1

      It's not actually bifurcated. Bayesian statistics produces useful methods, while frequentist statistics is an aspiration. They are not mutually exclusive. There are plenty of methods that are neither, and plenty of methods that are both.

  • @barttrudeau9237
    @barttrudeau9237 2 місяці тому

    I really enjoy your videos and style of teaching. I found this video especially well done, and I learned a lot about a subject that's hard to understand (I'm an architect, not a mathematician, but I love this subject). I have been trying to learn statistical concepts for years, and from this video I am starting to understand that if you are a mathematician focused on the computation and not knowledgeable about the subject you are studying, a frequentist approach may be more appealing and appropriate. If you are a subject-matter expert trying to predict possible future outcomes, a Bayesian approach would be a better fit. You could say it caters to your prejudices, that's fair, but it also allows you to employ your expertise. So perhaps a Bayesian approach is more corruptible, but done properly it seems to have higher potential.

    • @very-normal
      @very-normal  2 місяці тому +1

      Thanks, I’m glad they could be helpful to you. And I think you’re right, both methods have their use.
      Something I took out of this video was the fact that both methods are necessary in my space of biostatistics. Frequentist statistics are very desirable because of error rate control. If a medicine is risky but possibly useful, we want to be as sure as possible it works. Type-I errors are a different beast when humans are involved. But when we’re trying to look for new drugs and there’s millions of candidates to vet, Bayesian methods are a little better at this because posterior updates are still valid with multiple looks/analyses of the data without having to finagle with our level each time. Sometimes I think people skip my conclusion that you need both but it is what it is lol

    • @barttrudeau9237
      @barttrudeau9237 2 місяці тому

      @@very-normal I wish I could do both, I try, but I don't have the depth of mathematics education to do it properly. I do the best I can and keep learning every day. Thank you again so much for sharing your knowledge. It's really appreciated.

  • @zecaaabrao3634
    @zecaaabrao3634 6 годин тому

    I haven't seen the video, but the solution is easy, just apply the law of big numbers

    • @very-normal
      @very-normal  5 годин тому

      I’m going to start calling it that from now on

    • @zecaaabrao3634
      @zecaaabrao3634 3 години тому

      @very-normal At least in my language, that's the name we call it in school: that for big samples, we get closer to the true value.
      "Big number then true"
      It's kind of a meme.
      en.m.wikipedia.org/wiki/Law_of_large_numbers
      Unfortunately, it has a more rigorous definition.
      Nvm, you mentioned it; I was a frequentist as a joke by accident.
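
      For what it's worth, the law is easy to watch in action (numpy assumed; the 0.85 bias is made up):

      ```python
      import numpy as np

      rng = np.random.default_rng(2)
      flips = rng.random(1_000_000) < 0.85      # biased coin flips
      for n in (10, 1_000, 1_000_000):
          print(n, flips[:n].mean())            # running proportion -> 0.85
      ```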

  • @bjorntorlarsson
    @bjorntorlarsson 2 місяці тому +2

    I never understood what's going on with this "choice of significance level" stuff. In social sciences it's often 5%, in particle physics it's 0.000something. Doesn't it imply that there is a third choice? To take an example from our unfortunate days: A soldier has to either move now, or stay put. Wouldn't a 49% significance level decision be better than the alternative?

    • @bjorntorlarsson
      @bjorntorlarsson 2 місяці тому

      Astrophysics is by the way the only instance where I've seen clever people draw conclusions based on data which in their diagrams have an error bar that is taller than the Y-axis. So there's physics, and physics. There's stuff in space that we don't know much about, for obvious reasons.

    • @calloftrading
      @calloftrading 2 місяці тому +1

      It does imply that there is a spectrum of choices. The significance level is just a measure of how rigorous you should be. When looking at a particular situation, you are probably not seeing every variable inside that system (the system is huge and complex), leading to bigger errors and higher variation in the estimated impact of each independent variable on the dependent variable.

    • @calloftrading
      @calloftrading 2 місяці тому +1

      So for more controlled environments, and when trying to verify theories and turn them into laws or verified characteristics, you need to be more certain. Therefore there is a higher level of strictness (confidence level).
      In financial forecasting it is normal to have higher randomness associated with a bigger and more complex system, especially at smaller time frames, which leads to accepting lower levels of confidence in forecasts

    • @bjorntorlarsson
      @bjorntorlarsson Місяць тому

      ​@@calloftrading It would be nice if there was a way to quantify which confidence level to use. Taking it from the other way, and simply accepting an outcome together with its confidence level, whatever it is, isn't popular. It's looked down upon.
      But if one has to make a choice, as things are in reality, then the confidence level seems to me to be as much a relevant parameter as the expected value and the spread measure. I don't quite get why the confidence level should be picked first, and only then the rest of the parameters evaluated as a binary within-or-not of such an arbitrary significance.
      Isn't all of this olden Gaussian machinery obsolete now, by the way, given that machine learning fits patterns on big data without considering constructs that were once invented only because they made data analysis simple and practical given the limits of older tools?

    • @calloftrading
      @calloftrading Місяць тому

      @@bjorntorlarsson You can always fall back on the p-value, which tells you the exact significance level at which your result would stop being significant

  • @alistairwall179
    @alistairwall179 2 місяці тому

    Is Han "Never tell me the odds" Solo a Bayesian?

  • @Kram1032
    @Kram1032 2 місяці тому +1

    I don't really know this very well (I just remember having read it somewhere, so maybe it's completely wrong), but I thought the common "uninformative" choice of the beta distribution is alpha = beta = as small as possible?
    IIRC the theoretically optimal choice is alpha = beta = 1/2 (I'm sure you'll eventually talk about that), but I've seen people argue it really should be alpha = beta = epsilon, so like 1/10 or even 1/100 basically.
    It's impossible to set both parameters to 0, but in principle I could get the *effect* of that, I think, by fixing my initial prior as "a beta distribution with alpha = beta = 0" without worrying about the issues with that, and then just following the regular update rules from there, right? It's like a truly limiting-case uninformative prior, I think?
    Or is there a good reason not to do this?

    • @very-normal
      @very-normal  2 місяці тому +2

      That’s a great question. In my experience, I’ve only seen Beta(1, 1), but most of my experience is in clinical trials, so maybe customs are different elsewhere?
      My understanding is that your initial prior parameters also influence how much the data will shape the posterior. Parameters 1 and 1 suggest you know absolutely nothing about the discrete trials. But parameters 100 and 100, while still symmetric around 0.5 (though far from flat), suggest you already had 200 trials that went both ways the same number of times. Data will influence the shape of the former much more than the latter.
      Not a complete answer but I hope it helps a little bit
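
      A small sketch of that effect, with hypothetical data (40 successes, 10 failures) and the conjugate Beta update:

      ```python
      # Posterior mean of a Beta(a0, b0) prior after 40 successes, 10 failures.
      successes, failures = 40, 10
      for a0, b0 in [(1, 1), (0.5, 0.5), (100, 100)]:
          a, b = a0 + successes, b0 + failures   # conjugate Beta update
          print(f"Beta({a0}, {b0}) prior -> posterior mean {a / (a + b):.3f}")
      # Beta(1,1) and Jeffreys' Beta(0.5,0.5) land near the MLE of 0.8;
      # Beta(100,100) drags the posterior mean toward 0.5 (prints 0.560).
      ```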

    • @Kram1032
      @Kram1032 2 місяці тому +2

      @@very-normal Reading up a bit about it now: alpha = beta = 1 is the Bayes-Laplace prior; alpha = beta = 1/2 is the Jeffreys prior and comes from a specific derivation: it is invariant under reparameterization, being (more or less) proportional to the square root of the Fisher information. That's where my suggestion about alpha = beta = 1/2 came from.
      There also is Kerman's "Neutral" prior alpha=beta=1/3 and the limiting case, Haldane's prior (alpha=beta=0)
      The higher alpha and beta are, the more the prior influences the posterior so in that sense, if you want literally no influence on the posterior, you really ought to go with Haldane's. In that case, the posterior mean equals the maximum likelihood estimate, but there are also plenty people arguing against that choice.
      For very small datasets the "uniform" choice alpha=beta=1 can be a pretty strong bias, but of course if you have LOADS of data it's gonna be fine.

    • @very-normal
      @very-normal  2 місяці тому +2

      That’s interesting, I hadn’t heard of these before. It definitely highlights the fact that choosing a good prior isn’t trivial, something I chose not to include in the video

    • @falquicao8331
      @falquicao8331 2 місяці тому

      I think the problem with setting both parameters to zero is that you're not "skeptical" of the data.
      Suppose you find a restaurant that has a single, positive review. Would you consider that to probably be a better restaurant than one where 990 people leave positive reviews and only 10 leave negative reviews?
      Ultimately it depends on how likely you consider any proportion of positive reviews to be. Personally, I'd say that parameters of beta=0.5 and alpha=2 work pretty well in this case. Ideally, you would find the exact rating distribution of any coffee place and use that.
      Also keep in mind that alpha=beta=epsilon means that you think it's either zero or one, with no middle ground. It means you don't expect the value to be a probability but merely a true/false with some accepted error.

    • @Kram1032
      @Kram1032 2 місяці тому

      there is a looong section on Wikipedia about the Beta Distribution titled Bayesian Inference where it compares a bunch of choices for uninformative prior and quotes a bunch of works by different people. Most often it seems that Jeffrey's prior is favored by theorists, at least as presented on that page

  • @lemmingsoutside
    @lemmingsoutside 2 місяці тому

    What prior could you use to account for the fact that the tails are probably heavy, i.e. that 3-star reviews are a lot rarer than 1- and 5-star reviews?

    • @very-normal
      @very-normal  2 місяці тому

      You could set the first parameter higher to reflect this, or choose the two parameters so the prior mean (alpha / (alpha + beta)) matches your sense of how rare or common those reviews are

  • @klyklops
    @klyklops 2 місяці тому

    Major problems with Bayesian stats (according to a very famous Bayesian statistician) - Gelman A, Yao Y. Holes in Bayesian statistics. Journal of Physics G: Nuclear and Particle Physics. 2020 Dec 10;48(1):014002.

    • @very-normal
      @very-normal  2 місяці тому +2

      “recognition of these holes should not be taken as a reason to abandon Bayesian methods or Bayesian reasoning”

    • @klyklops
      @klyklops 2 місяці тому +2

      @@very-normal one could say the same about the frequentist approach. My point in sharing is only that the problems with the Bayesian approach aren't just "priors"

  • @LNVACVAC
    @LNVACVAC 2 місяці тому +1

    I am not a mathematician, but I have some knowledge of biostatistics.
    The frequentist approach falls short for rare diseases because:
    1 - The definition of a rare disease is both arbitrary and normative (less than 1 in 2 thousand).
    2 - Most clinicians, even doctors, are not sufficiently informed about rare diseases.
    3 - Differential diagnosis is typically symptom-informed, not an investigation of the prevalence of specific causes. (Example: when you go to the doctor with a sore throat, the doctor doesn't order a swab before looking for bacterial plaque. However, only a subset of individuals with a bacterial infection and a sore throat will have visible plaque when seeing a doctor.)
    4 - All of this not only creates under-reporting, typically above 40%, but also means the samples of control and affected individuals will never approach infinity.
    5 - Complex adaptive natural entities often behave like loaded dice, not only in prevalence but in appearance/presentation, aggravating the problem in item 3.
    --
    Mathematicians need to understand these are not transcendental matters and that tools are instrumental, not necessary. This battle is not a first-principles contention.

  • @jceepf
    @jceepf 27 днів тому

    ua-cam.com/video/mZBwsm6B280/v-deo.html
    This is a video on Bertrand's paradox. As a physicist, I am not surprised that Jaynes was a physicist. In the video, each of the methods he describes, leading to different probabilities, could correspond to an experiment... a different experiment. This is of course important background information.

  • @jacobbartkiewicz9994
    @jacobbartkiewicz9994 2 місяці тому +1

    The Bayesian approach doesn't necessarily need to be subjective. It can simply reflect the difference between modelled and unmodelled probability. Given omniscient knowledge, you could create a perfect model of all contributing factors towards the probability of a particular event. The randomness would then be due only to unmeasurable quantum effects, i.e. Heisenberg's uncertainty principle, thus achieving a completely objective probability. Of course omniscient knowledge is impossible, so any measured probability can be expressed as the objective probability biased by unmeasured factors. I think this actually unites the two approaches, but I haven't done the math to check.

    • @therealjezzyc6209
      @therealjezzyc6209 2 місяці тому

      Heisenberg's uncertainty principle doesn't by itself give probabilities in QM. Also, if you had omniscient knowledge and could control for all degrees of freedom that determine the outcomes, you actually wouldn't need to talk about probability at all; your model would be completely deterministic. Which I think is what you meant to say, but then using Bayesian reasoning on a deterministic system is not that effective, since you'd be better off using a regular deductive approach rather than the inductive Bayesian one.

    • @danielkeliger5514
      @danielkeliger5514 2 місяці тому

      @@therealjezzyc6209I think OP tried to make a distinciton between “true” probabilites and epistemic ones. Saying a coin toss is basicaly deterministic, the main source of uncertainty comes from our lack of knowledge of initial conditions. While the spin of the photone under measurments is not coused my anything. There is no further parameters down the line that would make it possible to predict the photon’s future state. Even Laplace’s demon with complete knowledge of the world wouldn’t be able to predict the outcome of that measurment as opposed to coin flips (which of cource has some fluctuations from quantum mechanics, but they are negligible for all intent and purposes).
      Sure, there is a debate about wether these “truely random” events actually happen or not, but based on Bell’s inequality we have to get rid of either determinism or locality.

  • @piwi2005
    @piwi2005 2 місяці тому +3

    As usual for a Bayesian video, there is much bias towards complexification.
    First, the test should be one-sided: you asked for at least 85%, so please have the courtesy to do the correct one. That divides the p-value by 2 from scratch. Then, you do not need a confidence interval at all; you have the p-value. What the test tells you is that from a sample of 1074 people, there was a probability of 0.27% of getting the data you got if people were putting 4 or 5 stars _less_ than 85% of the time (by the way, this is how you got the 99.7% "that only Bayesian gives you," supposedly...). This is the frequentist approach, and it deals with facts and makes two assumptions: independence of choices between users and validity of the CLT. Then, from that p-value, you can do what you want; you are not even obliged to do anything, because so far you have only collected data and done maths. Once the computation is done, you can _finally_ go philosophical and decide you do not live in a universe where you got unlucky enough to be in the 0.27%.
    There is no binomial, no beta, no prior, no "I don't have an idea of my prior, so I'll use the uniform distribution but I will call it Beta(1,1)," no "some god of philosophy told me that 'no idea' meant the existence of a uniform distribution in the realm of ideas," etc...
    Frequentists work with facts and try, at least when they're not psychologists or marketers, to be rigorous, not forgetting the assumptions they made. They use stats to falsify theories, and they don't put probabilities on theories, which remain true or false. Bayesians do decision making, using a tool that always works, always getting an answer whatever the question was. It is very good for investors who want to use some maths and have a magical tool that allows them to propose a strategy with some appearance of seriousness, and it will work whenever they were lucky with their priors. But at the end of the day, either your posterior "probability" depends a lot on your priors, and you have only put a number on your feelings, or it doesn't, and you didn't need to go Bayesian. Frequentists don't deal with philosophy. Bayesians do and must.
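
    For what it's worth, the one-sided test described here is one line of scipy. The exact counts are hypothetical (roughly 945 of 1074 reviews at 4-5 stars, matching the ~88% quoted elsewhere in the thread):

    ```python
    from scipy.stats import binomtest

    # H0: true 4-5 star rate <= 0.85 vs H1: rate > 0.85 (one-sided).
    result = binomtest(945, n=1074, p=0.85, alternative='greater')
    print(result.pvalue)   # ~0.003, on the order of the 0.27% quoted above
    ```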

    • @very-normal
      @very-normal  2 місяці тому +1

      wow

    • @mousev1093
      @mousev1093 2 місяці тому +1

      It's a shame this isn't higher, but that's probably to be expected in a channel/comment section so heavily biased toward one approach. The number one suggested video to follow this is literally called "the better way to do statistics".
      His entire interpretation of the "frequentist perspective" was purposefully limited, and he tried to divorce it from reality and naturally occurring events. I'd go as far as to argue that his interpretation of how to report a confidence interval was bordering on incorrect. It can, and should, be phrased practically identically to the way he talked about credible intervals later. The entire point is that you can't know something perfectly to arbitrary confidence, and the estimate of the true probability can only be refined. A confidence interval is the way of quantifying this spread of uncertainty. He even contradicted himself on the definition of "repeated experiment": first he defines experiments as events that produce individual data points, and then he's purposely obtuse and redefines repeating the experiment as gathering another 1074 reviews.
      Really should have partnered with someone else to present the other side. An entire video of straw men is boring

    • @very-normal
      @very-normal  2 місяці тому

      feel free to make that video with more correct frequentist teachings, more good statistics videos wouldn’t hurt on UA-cam

    • @mousev1093
      @mousev1093 2 місяці тому +1

      @@very-normal I think that might hurt my current content algorithm stuff yknow ;)

  • @AntonioJose-ot5nf
    @AntonioJose-ot5nf Місяць тому

    Another interesting example is the number of permutations of n distinct elements, such that none of them stays in its original position. The answer happens to be the closest integer to n!/e.
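
    A quick sketch to check that claim in plain Python, using the recurrence D(n) = (n-1)(D(n-1) + D(n-2)) for the number of derangements:

    ```python
    import math

    def derangements(n):
        a, b = 1, 0                      # D(0) = 1, D(1) = 0
        for i in range(2, n + 1):
            a, b = b, (i - 1) * (a + b)  # D(i) = (i-1) * (D(i-1) + D(i-2))
        return a if n == 0 else b

    for n in range(1, 10):
        print(n, derangements(n), round(math.factorial(n) / math.e))
    # The last two columns match: D(n) is the integer nearest to n!/e.
    ```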

  • @waraiotoko374
    @waraiotoko374 2 місяці тому

    I still don't understand why the Bayesian method is not susceptible to manipulation and subjectivity. You claim that even if I arbitrarily choose the initial probability, it only makes sense if it is supported by evidence. But where does that evidence come from? From the frequentist method, right? Because if it's from the Bayesian method, then I'm stuck in a circular argument... am I not?

    • @very-normal
      @very-normal  2 місяці тому

      If a past study used a frequentist method to analyze the data, then a new prior should be formed to reflect that finding. For example, if a past study found the probability to be 70%, then my new study should probably make the prior on and around 70% more likely.
      If past studies used a Bayesian analysis, then it's even easier: the posterior from the past study becomes the prior in the new study.
      The past data helps inform the prior; whether the method was frequentist or Bayesian matters less. You're right that it can be hard and arbitrary to choose a prior, but that's not a reason to abandon the method in the first place. Classic frequentist methods don't work well with smaller sample sizes, yet people are taught to use them anyway
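
      A tiny sketch of that chaining (the counts are hypothetical):

      ```python
      # Sequential Bayesian updating: the posterior from one study
      # becomes the prior for the next.
      a, b = 1, 1                                        # flat Beta(1, 1) prior
      for successes, failures in [(70, 30), (65, 35)]:   # two hypothetical studies
          a, b = a + successes, b + failures
          print(f"posterior Beta({a}, {b}), mean {a / (a + b):.3f}")
      ```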

  • @camillebrugel2988
    @camillebrugel2988 2 місяці тому +1

    Couldn't the null hypothesis be an inequality? It would have been more logical to ask whether mu > 0.85. You have an MLE of 0.88, and with a t-test you get your p-value and confidence interval, but instead of the p-value you could report 1 - p_value to get a probability similar to the one from the Bayesian side? I know the Student distribution of the test statistic is very different from the Bayesian posterior, but I would make this kind of bad leap in reasoning intuitively. ^^ Isn't there a test statistic for inequalities?

    • @camillebrugel2988
      @camillebrugel2988 2 місяці тому

      Sorry it's pi not mu, need to practice my greek alphabet.

    • @camillebrugel2988
      @camillebrugel2988 2 місяці тому

      And it should probably be 1 - p_value/2, for the symmetry.

    • @very-normal
      @very-normal  2 місяці тому +2

      You could use a composite null hypothesis actually! You’d end up with a one-sided test. I’m aware of other tools for composite null hypotheses, but they’re usually outside the scope of what most statistics users would be familiar with

  • @sriharsha580
    @sriharsha580 2 місяці тому

    "C.I does n't tell us if it contains the true value of PI or not, you can only know that if you repeated the experiment multiple times then most of them will. " Can you explain this statement I didn't get it.

    • @very-normal
      @very-normal  2 місяці тому +1

      The definition of confidence is the proportion of intervals that contain the true parameter value. Different experiment repetitions will produce different datasets, so the ends of the intervals will change depending on the data.
      In the same way that choosing a 5% level means you only get a type-I error in 5% of experiments, the confidence interval will contain/cover the value of the true parameter in 95% of experiments. There’s no guarantee that you know the one you calculated actually contains it or not
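
      A simulation makes the coverage idea concrete (a rough sketch; the true proportion and sample size are made up):

      ```python
      import numpy as np

      rng = np.random.default_rng(1)
      true_p, n, reps = 0.88, 1074, 10_000
      covered = 0
      for _ in range(reps):
          p_hat = rng.binomial(n, true_p) / n
          half = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)   # 95% Wald interval
          covered += (p_hat - half) <= true_p <= (p_hat + half)
      print(covered / reps)   # close to 0.95, yet any single interval
                              # either contains true_p or it doesn't
      ```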

  • @ckq
    @ckq 2 місяці тому

    I'm a big prediction-markets and forecasting guy, so I essentially take Bayes for granted.
    Typically you look at past frequencies and then apply a prior to them, which can get kind of subjective. "Bayesian" as a word seems so fancy (read: objective) that I thought you could never be truly Bayesian. But really it's just fancy terminology people use to look cool and not be a frequentist dummy who doesn't understand that randomness exists.
    I never really saw it as a conflict.