Thank you for your excellent explanation! This video is like a sufficient statistic for the topic of sufficient statistics, because you don't have to go back to other sources to solve the exercises :)
This is the best explanation of the definition of sufficient statistics I have ever seen. Thank you for sharing this awesome experience.
Thank you! Fair warning: a couple of professional statisticians have contacted me saying I'm wrong on a technicality I don't really follow, namely that "parameters are not random variables". It remains true that this is a correct result and an easy way to remember it.
@@robertcruikshank4501 It is true. Thank you for the clarification. That is why I'm Bayesian. Your explanation is great.
absolutely great video
Studying for an advanced statistics exam rn, so this helped a lot, thank you! =)
Thank you for this video! I’ve been having a lot of trouble wrapping my head around sufficient statistics
Finally! This helps resolve the discrepancy between the YouTube videos trying to explain the concept and my inference book trying to define it. Good work and thanks a lot!
Glad it was helpful!
Thanks for the upload, very clear thinking!
This is the best video on this topic I have watched so far. Thanks.
Thank You
Thanks very much for this video! I spent the last couple of hours trying to wrap my brain around exactly this. Thanks for saving me from further headache!
Also, my guess as to why they flipped it is that, while the intuitive definition makes sense, it seems less straightforward to compute. If we know the underlying distribution of the data (even if theta is unknown), P(data|U) is more straightforward to calculate than P(theta|U). It would have helped if they clarified that in my textbook though.
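To make the "more straightforward to compute" point concrete, here is a minimal numeric sketch (my own example, not from the video), assuming the data are i.i.d. Bernoulli(theta) coin flips and U is their sum: P(data | U) comes out the same for every theta and is mechanical to evaluate, whereas P(theta | U) would first require choosing a prior on theta.

```python
from math import comb

def prob_data_given_u(data, theta):
    """P(data | U = sum(data)) for i.i.d. Bernoulli(theta) flips.

    Since U is a function of the data, P(data, U = u) = P(data),
    so the conditional is just P(data) / P(U = u).
    """
    n, u = len(data), sum(data)
    p_data = theta**u * (1 - theta)**(n - u)  # P(data)
    p_u = comb(n, u) * p_data                 # P(U = u), binomial probability
    return p_data / p_u                       # theta cancels: 1 / C(n, u)

data = (1, 0, 1, 1, 0)  # n = 5 flips, u = 3 heads
for theta in (0.2, 0.5, 0.9):
    # prints 1 / C(5, 3) = 0.1 every time (up to float rounding)
    print(theta, prob_data_given_u(data, theta))
```

The cancellation of theta in the last line is exactly the sufficiency of U.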
Exactly what I was looking for, thank you
Thanks Robert, really useful explanation!
THANK YOU SO MUCH
awesome explanation!
You are a godsend, bless you, sir, and thank you.
Hello, nice video. The reason probably goes back to the guy who coined the concept of sufficiency: Fisher. Your definition, while indeed more intuitive, treats theta as random. Fisher was "against Bayesianism", thus against treating unknown parameters as random. The advantage of the usual definition of a sufficient statistic is that it can be formulated in a frequentist framework.
Thanks for the interesting commentary!
@@robertcruikshank4501 Dear Robert! As far as I understand the matter, treating \theta as a random entity is not good at all. My point is that when you try to write all your arguments in a rigorous manner, you will feel that a conditional probability involving \theta as a parameter, rather than as a random quantity, does not make sense. This is my feeling and I may be wrong, although I checked the calculations. However, I appreciate your video, since it confirmed to me that my son and I are not the only ones who have trouble understanding the definition of sufficient statistics. I have a PhD in Probability, my son is a second-year bachelor's student in Statistics, and I had a tough time explaining sufficient statistics to him.
If I find time, I shall write down my explanation and send it to you. Thanks once more. Sincerely, Vladimir Belitsky.
Thank you 🙏🏾
Thank you so much for your videos! They make life a lot easier, really appreciate it
Glad you like them!
Thanks a lot!
Thanks a lot sir
I had trouble remembering this definition because to me it didn't quite match the intuitive concept that was presented before the definition was given (i.e. you don't need to go back to the data for a better inference on theta after you observe U). Thanks to you, this finally makes sense and I don't need to check my notes whenever the definition of sufficient statistic is mentioned.
However, I want to remark that I didn't find the implication "P(data | U, theta) = P(data | U) -> P(theta | U, data) = P(theta | U)" obvious, because whenever I tried to prove it with the axiomatic properties of conditional probability (the ones on the right of your whiteboard) I found it difficult to deal with the doubly conditioned probability P(data | U, theta). So I decided to write every conditional probability out by its definition and prove the implication that way (to make things clear, I wrote the first equality as P((X in A), (U in B), (theta in C)) / P((U in B), (theta in C)) = P((X in A), (U in B)) / P(U in B), and I hope this is clear). A shorter route via Bayes' theorem is sketched after this comment.
As to why the definition is given backwards, I think it is because the intuitive definition doesn't make sense in the frequentist approach: P(theta | U, data) doesn't mean anything since theta is a parameter and doesn't have a law. I hope this makes sense, thank you again for your video!
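Following up on the remark above, here is the shorter route via Bayes' theorem, under the video's Bayesian reading of theta (a sketch only, assuming every conditioning event has positive probability). Applying Bayes' theorem inside the conditional given U, and substituting the assumed equality P(data | U, theta) = P(data | U) in the middle step:

$$
P(\theta \mid U, \text{data})
= \frac{P(\text{data} \mid U, \theta)\, P(\theta \mid U)}{P(\text{data} \mid U)}
= \frac{P(\text{data} \mid U)\, P(\theta \mid U)}{P(\text{data} \mid U)}
= P(\theta \mid U).
$$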
Dude yes. I had this lightbulb moment myself a while ago. I am usually really good at figuring out the intuitive notion behind mathematical definitions, but this one took me a while. Good work.
Thank you!
Wow appreciate the great insight! Very concise too. Thanks!
My pleasure!
Super helpful...
Thanks a lot!!
Thank you so much for the insightful video
Glad it was helpful!
thanks sir
Thank you! Very useful explanation
Glad it was helpful!
amazing, thank you
When we say that a distribution doesn't depend on θ, we mean that we do not see θ in its formula. It's not the same thing as independence of random variables, so we can't really use the corresponding theorems. Plus, θ is not a random variable. (A concrete example of this is worked out after this thread.)
In Bayesian statistics, theta IS a random variable. I will have to work on figuring out how a frequentist can make sense of this definition. Thank you for pointing out this problem.
@@robertcruikshank4501 I haven't studied Bayesian statistics, so I don't know. If I don't take it too strictly, you make sense, though. Thanks for explaining it!
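Picking up the point above about θ not appearing in the formula: a standard textbook illustration (my choice of example, not from the video) takes $X_1,\dots,X_n$ i.i.d. Poisson($\lambda$) with $U=\sum_i X_i$. For any counts $x_1,\dots,x_n$ summing to $u$,

$$
P_\lambda(X_1 = x_1, \dots, X_n = x_n \mid U = u)
= \frac{\prod_{i} e^{-\lambda}\,\lambda^{x_i}/x_i!}{e^{-n\lambda}\,(n\lambda)^{u}/u!}
= \frac{u!}{x_1!\cdots x_n!}\, n^{-u},
$$

and the right-hand side contains no $\lambda$: the conditional law is one fixed formula, valid for every value of the parameter, with no need to treat $\lambda$ as random.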
Thank you 🙏🏻
You’re welcome 😊
Thanks for the explanation! I wonder if they avoided defining sufficient statistics via the posterior because of certain regularity conditions, like needing the marginal distribution to be non-zero wherever you condition on it?
It's been pointed out to me that with a frequentist interpretation my description makes no sense. I'm not 100% sure of that, but my expertise is limited. I wasted ten hours wrapping my head around it so I wanted to spare everyone else those ten hours if I could.
Thanks!
You're welcome!
Thank you, this helped a lot! :) (in Korean)
천만에요 (You're welcome according to Google Translate)
yes it was really nice
Thank you, sir! (in Somali)
Adaa mudan ("You're welcome" in Somali, according to Google Translate)
love you man
The reason why they have to switch it up in general is that the expression P(theta | U, data) = P(theta | U) does not make sense unless you're a Bayesian. If you're doing frequentist statistics, the parameter is not random, just unknown. This means that the expression P(theta) simply does not make sense.
This also goes to show how many people are intuitively Bayesian to begin with hahaha.
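For reference, the two formulations being contrasted in this thread, in my own hedged paraphrase:

$$
\text{Bayesian (}\theta\text{ random):}\quad P(\theta \mid U, \text{data}) = P(\theta \mid U);
$$
$$
\text{Frequentist (}\theta\text{ fixed, unknown):}\quad P_\theta(\text{data} \mid U = u)\ \text{is the same function of the data for every }\theta.
$$

The second statement never assigns a probability to θ itself, which is why textbooks can state it without committing to a prior.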
Yes, I have heard this argument. As far as I can understand (which is limited), it leaves math behind and dives into philosophy. Granted, the philosophy of probability theory is seriously messed up to begin with. It wasn't until I tackled advanced statistics that I realized I owed QM an apology for calling it nonsensical: it merely inherited most of its problems from probability theory. But to get back on point: if I fully understood the issue you are describing, I would have made another video about it. Sadly I must leave that to better minds than my own.
Thanks!