Explaining nonparametric statistics, part 1

Very Normal

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 19 січ 2025

КОМЕНТАРІ • 72

@ln8416 8 місяців тому ⁺³³
What a time to be alive... just open UA-cam and get educational quality content to procrastinate from your statistic lectures. Thank you!
@Sarwaan001 8 місяців тому ⁺⁵
I minored in Statistics and I always wondered how we would handle data that doesn’t follow a certain distribution. I’m glad I stumbled on this video
@sokh4887 3 місяці тому
I appreciate everything you do in these videos. A sincere thank you, you helped me a lot.
@Sir-Mighty-Mike 6 місяців тому ⁺¹
This was very well explained. Thanks for sharing your insight!
@christian7559 8 місяців тому ⁺⁴⁴
Bootstrap is life
@ilusoriob 8 місяців тому ⁺¹
Bootstrap is love.
@isaacnewtonstolemyjoy 8 місяців тому
@@ilusoriob Bootstrap is joy
@2nd_ntr 8 місяців тому ⁺²⁵
nonparametrics sounds like a branch of the SCP Foundation
@prod.kashkari3075 8 місяців тому
Cool! Excited for part 2.
@jeffreychandler8418 8 місяців тому ⁺¹¹
I've heard some people argue that rank based nonparametric methods are not very useful because you aren't measuring the data, but the ranks of the data, which is a fundamentally different problem.
What do you make of this debate?
Ive seen the wasserman "all of nonparametric statistics" cited as providing alternatives and support for that contention.
@very-normal 8 місяців тому ⁺¹³
Disclaimer: I have not thought a lot about this, but here’s my two cents.
I think it’s a fair issue to bring up, especially when the specific values of the data have real-world meaning. I.e I’d want a hypothesis test on my blood pressure to say something about my blood pressure, not its rank relative to others.
Overall, it’s still a valuable tool for people because I view working with a transform of the data to be better than totally ignoring the assumptions of a hypothesis test.
@robertwilsoniii2048 7 місяців тому
The answer is that signed ranks allow us to determine whether or not there are statistically significant effects, even when minorities are present in samples. This is something the Central Limit Theorem can't handle, because when using the Central Limit Theorem you can't tell whether or not an unlikely sample mean is caused by minorities or caused by statistically significant effects. This forces you to either decide that the minorities don't matter or that the majority doesn't matter -- it forcibly discriminates against groups that are different from one another. Signed ranks solve this problem, so that you can test hypothesis without discriminating against minorities.@@very-normal
You need to make the judgment call on which to use based on the situation. In personal medicine, parametric tests make sense because your own body doesn't benefit from signed ranks. But anything involving diverse communities of several people of different backgrounds would benefit from non parametric techniques.
Likewise, machine learning involving classification of diverse objects would benefit from non parametric techniques -- such as k-means clustering.
@chemistrycapital 8 місяців тому ⁺³
Loving the pharma twist to this video
@sujathaontheweb3740 7 місяців тому
You're a great teacher!
@robertwilsoniii2048 7 місяців тому
The Central Limit Theorem *always* applies. But, it *also* marginalizes different groups and minorities in the population. And for that reason, I do prefer non-parametric models.
@maloevain5857 8 місяців тому ⁺⁴
Excellent video but at min 4.15 it should not be the density distribution fonction rather than the cdf ? Because the cdf is strictly increasing.
@very-normal 8 місяців тому ⁺⁵
Yeah you’re right, the notation is for a general CDF. I chose to show the PDF instead since it’s easier to see the symmetry but I should have had another bit of notation there to connect that
@maloevain5857 8 місяців тому ⁺¹
@@very-normal yes it's just a detail, anyway the video is super clear
@zacsanchez7520 8 місяців тому
I'd like to know more about where such a statistic was derived from, I'm not an expert but it seems like a sort of intuitive way(almost back of the envelope-ish) to get the behaviour you described at 6:47
@Minisynapse 8 місяців тому ⁺¹
Would love some content on complex linear models, mixed linear models and all that. But maybe you'd have to start with general linear models first.
@very-normal 8 місяців тому ⁺¹
Yeahhh, it might be a while before I get to the more complex linear models, but I’ll definitely get to them since they’re so commonly used
@Minisynapse 8 місяців тому
@@very-normal Subscribed so I can catch those, keep up the good work, love the format of the videos!
@huhuboss8274 8 місяців тому
Will you cover Dempster-Shafer theory in the future?
@Inter_Are 8 місяців тому ⁺¹
Question!! How could you test if the “typical non-work watch time” was either significantly less than or greater than the 60 min?
(Let’s say you get mad at your employees for watching on the clock, but in reality they watch near 0 min which is causing the low p-value)
@very-normal 8 місяців тому ⁺²
I could specify in wilcox.test that I’d like a one sided test via one of its arguments. By default, it goes with a two-sided test
@Inter_Are 8 місяців тому
@@very-normal Faster response time than most of my professors! I appreciate you and your amazing stats content!! Thanks :)
@OneDSystems 8 місяців тому
which SW do you use to show the formulas with the animations and the graphs, curves, etc?
@very-normal 8 місяців тому ⁺¹
I use manim for those!
@Neptoid 8 місяців тому ⁺¹
What if you need to watch a UA-cam tutorial? It still would count as a non-work site wouldn’t it?
@prod.kashkari3075 8 місяців тому
You should also cover nonparametric regression stuff, like smoothing
@very-normal 8 місяців тому
I think that would be cool! You mean something like kernels or splines, yeah?
@prod.kashkari3075 8 місяців тому
@@very-normal yes
@fadhlyazka 8 місяців тому
Thanks man. Keep it up.
@blessedowo1958 8 місяців тому ⁺¹
Thank you bro. This is helpful
@conceptualprogress 7 місяців тому
AWESOME VIDEO
@johanngambolputty5351 8 місяців тому
Wait a second, was hypothesis testing P(param | data) is proportional to P(data | param) (by bayes) all along? Makes sense I suppose, you do that in maximum likelihood estimation I think, this seems like the instantaneous version, where you're judging one case before moving to a more likely param candidate? (single cost evaluation rather than whole optimisation?)
@lordzekrom2 8 місяців тому ⁺³
Pretty sure I get an entire class on these and SEMs next semester
@very-normal 8 місяців тому ⁺¹
Good luck! SEM was rough for me when I took it 💀
@samcs8927 8 місяців тому
What is SEM?
@very-normal 8 місяців тому ⁺¹
It stands for “structural equation modeling”, it’s often used with latent variables, which are common in fields like psychology
@georgessakr1 8 місяців тому
opinion abput all of nonparametric statistics by Wasserman?
Also any suggestions on bayesians / monte carlo methods??
@very-normal 8 місяців тому
I haven’t read all of it, but I have it as a reference! I like his work overall though.
Not sure about Monte Carlo, but my usual rec for Bayesian stuff is Bayesian Data Analysis by Gelman
@glebpl7068 4 місяці тому
to this day i can`t understand this "thing" with t-test, data not normally distributed and using Central Limit Theorem((( what does it mean with a big bunch of data distributed not-normally?
just bootstrap it million times and get new set of bootstrapped means, which will be normally distributed and use t-test on them?
@very-normal 4 місяці тому
What’s the thing you don’t understand?
@glebpl7068 4 місяці тому
@@very-normal is the following correct?
i have a lot of data. like 10 millions data points for group A and the same for group B. but they are not normally distributed.
do i just re-sample it... for example 10 00 times with sample size of 500, calculate the mean and get new data sets of "sample means", which by CLT will be normally distributed. and then i just apply t-test?
if that is correct - what is the threshhold for "a lot of data to use CLT" and how to correctly choose a sample size and a number of times to re-sample?
thank you kindly.
@very-normal 4 місяці тому
Ah okay I see, in that case, you’d probably be okay using a bootstrap hypothesis test instead of a t-test. Based on your resampled data, you can calculate something like the median, which will be more robust to that non-normality. The bootstrap also requires a lot of data to get good results, but it’s also a nonparametric method. As for how many to bootstrap, usually people are fine with a few thousand. These resamples are easily done by a computer so getting a lot of them is not hard, it’s that sample size that’s usually the limiting factor.
To be completely honest, when you have “a lot” of data, there are more options for you in terms of tests.
The question of how much data is enough to assume CLT works is a difficult one. Each dataset has its own quirks (effect size you want, skewness, variance, missingness, etc) so a general answer is hard to give. The best general advice I can give is to test out realistic situations in simulation to see what gives you good type-I error control.
@glebpl7068 4 місяці тому
@@very-normal the "fun" part is.... my example guided you in another direction. so i still have no clue what the assumption about CLT for t-test in your video means....(((
@very-normal 4 місяці тому
lol no worries, can you clarify for me which assumption you’re referring to? Do you mean the assumption that we have enough data to use CLT for t-test?
@lexinwonderland5741 8 місяців тому
What would you recommend for students who can't afford a license to use R or MATLAB environments?
@very-normal 8 місяців тому ⁺⁷
R is free tho! Also you could go with Python
@jeffreychandler8418 8 місяців тому ⁺⁴
R is completely free for everyone, same with Rstudio
edit: so is Python, Julia, and VSCode
@thiagonunes3183 8 місяців тому ⁺¹
great video
@joelbaptista9725 8 місяців тому
I don't have a lot of knowledge in statistics, so this question might sound dumb. The only thing we've assumed about the distribution to perform this test is that the distribution is symmetric, right?
@very-normal 8 місяців тому
Yes! And also that it’s continuous
@t0mcc 8 місяців тому
very helpful!
@mop4193 8 місяців тому
Nonparametric is not applied a lot practically and does not seem to be preferable that much in research papers. Why? Parametric analysis seem the go-to typically
@very-normal 8 місяців тому
This is my opinion, but I think a lot of it comes from unfamiliarity and unawareness that the methods even exist, especially among non-statistician researchers. I’ve seen some researchers use it, but it is not that common. There are other reasons concerning power & efficiency, but I think most people just don’t think about them
@azure-hawk 8 місяців тому
The interpretation of a non-parametric test also tends to be less intuitive and useful to researchers than a parametric test. Bootstrapping would be a good alternative, though most people tend to forget they have it in their bag of tools (including myself)
@piaveipvsenlawp7402 8 місяців тому
Love your vids, but isn't it theta-nought, as in zero, not theta-not
@very-normal 8 місяців тому
lol yeah I know, I tried going for a pronounciation type thing but I don’t think it worked out 😅
@jamesdavis3851 8 місяців тому
@@very-normal I took it as a choice reflective of the no-frills, and approachable non-elitist attitude toward a difficult subject where the content is what matters
@MKhan-zo8xo 8 місяців тому ⁺⁴
for the algo!!
@very-normal 8 місяців тому
✊
@titong_totong 8 місяців тому
for the algo!!!!
@byronwatkins2565 8 місяців тому ⁺¹
The ACTUAL name of the sgn(x) function is 'signum.' It is Latin for something that has a sign or a signature.
@very-normal 8 місяців тому ⁺¹
🆒
@qqq3230 8 місяців тому ⁺²
you are 1 week too late i already flunked my nonparametric statistics midterm exam 😅
@very-normal 8 місяців тому ⁺¹
my b, i gotchu for the final
@busbymath 8 місяців тому
fix your mic
@very-normal 8 місяців тому
on it

Наступне

Автоматичне відтворення

Explaining nonparametric statistics, part 2