Wilcoxon Rank Sum Test in Excel

Matthias Kullowatz

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 30 вер 2024

КОМЕНТАРІ • 83

@brandoncicero7497 9 років тому ⁺⁶⁸
Great information. Background "music" was terrible.
@weloveyoona693 5 років тому
totally agree. upsetting with that noise
@brucenmezi4880 9 років тому ⁺²²
Informational video, but the backgound music is SO DAMN DISTRACTING.
@MatthiasKullowatz 9 років тому ⁺⁴
Haha, yes. It drives me nuts, too. I should go back and re-record that.
@felipeoriani 9 років тому ⁺²
+Matthias Kullowatz record another video without the music! Very good video and helpful! Thank you! :)
@adanesh 2 роки тому ⁺¹
why a music background???!!! distractive
@larsmadsen377 3 роки тому ⁺¹
Great video!
What is the name of the song?
Total banger
@jochenk7884 8 років тому ⁺³
Dear Matthias. Very helpful video! Just one small question. If P-value (one -tailed test) is >0.5, what about the P-value for the two-tailed test? It can't be >1.0. I understand that the two samples are not different, as P is definitely >>0.05, but unfortunately I need to calculate the exact P-value. Thanks for your explanation.
@nathalieklip1608 7 років тому ⁺³
that guitar is realyyyyy annoying
@fuchsberger100 7 років тому ⁺²
Great vid. But how do i evaluate the test if my data is not normal distributed?
@avachavoshi574 3 роки тому ⁺¹
Thank you! :)
@brianevans3087 5 років тому ⁺¹
In the standard error calculation, do you divide it all by the smaller "n"?
@MatthiasKullowatz 5 років тому
Ah, good question. It's a bit confusing in this example because we're dividing by 12 inside the square root, and both sample sizes are 12. Who picked this stupid example anyway?
We always divide by 12 inside the square root. This can be proven through statistical theory if you're curious, but the point is, that denominator has nothing to do with sample size.
@CorrinneKarch Рік тому
Why is there a guitar playing in the background?
@naadde 8 років тому ⁺¹
How can I do it with unequal sample cases? f.ex compare 1500 samples with 1800 samples?
@MatthiasKullowatz 8 років тому
+naadde A few points about your question.
1) Technically, this Rank Sum test is designed for symmetric distributions, and thus with such large sample sizes I would typically suggest using a two-sample T-test. You can find my video for that test here: ua-cam.com/video/VH6SWucBQQc/v-deo.html
2) Despite the large sample sizes, you may still be interested in making a statement comparing medians/distributions rather than means. If you proceed with the Rank Sum test, I think you're going to have a hard time finding a table with such high values (if you've been instructed to use a table of critical values). In that case, you would use the normal approximation, which I covered in the video. Notice that I have cells reserved for n1 and n2. These are your two sample sizes, and they don't have to be the same number. All my cell formulas remain the same.
@martin48428 8 років тому ⁺¹
instead of typing $-sing u could just press F4...and u can drag the formula horizontal wise as well...
@MatthiasKullowatz 8 років тому
+Ruatfela Hmar This is something I have only recently discovered, and if you have a full keyboard, it is awesome!
@liamhoward2208 4 роки тому ⁺¹
I feel like I’m out in the desert smoking mushrooms with Jim Morrison watching this video. Good explanation though haha
@MatthiasKullowatz 4 роки тому
That's what I was going for! Finally someone gets it.
@liamhoward2208 4 роки тому
Matthias Kullowatz seriously?
@MatthiasKullowatz 4 роки тому
@@liamhoward2208 Haha, no, sorry. I was just using free software that, by default, added that music. I only realized it later.
@ericomeli2235 5 місяців тому
get rid of the music
@erickorte1519 4 роки тому
First I hope you and your family are safe and healthy in these odd times.
My questions is based off another question, do you always use this formula =NORMDIST(J12,0,1,1) when the expectation is higher than the sum and the stat is negative? Is this a rule or are there times you wouldn't do that and you would use the original formula?
@นางสาววิราญสุขจันดี Рік тому
Thank you so much
@nikitachaurasiya4375 Рік тому
My p value is 1 ?
@ava_chavoshi 2 роки тому
Thank you!
@syntrope 8 років тому ⁺¹
I am getting a p value of zero, what does it mean for this test
@MatthiasKullowatz 8 років тому
+syntrope Quick answer: A p-value of zero (or close to zero) means you should reject the null hypothesis, and claim there is evidence to suggest the two groups are different.
Longer answer: A p-value of zero (or close to zero) literally means that, assuming the two additives had the same effect on gas mileage, there is about a 0% chance that measurements from two random samples would be as different as they were in your data. The statistician would then suggest that the original assumption (that the two additives had the same effect) was not a good assumption, and she would reject it. That's what we mean by saying "reject the null hypothesis."
@roseb2105 8 років тому ⁺²
what us with the music
@MatthiasKullowatz 8 років тому ⁺¹
It's a classic story about one of those free screengrab apps that defaults to crappy music, but I'll spare you.
@terryfan7403 9 років тому
Hi Matthias Kullowatz! Thank you so much for the video! This totally helped me on statistics!
I have a quick question. For the Stat, I got a negative number from the difference in the equation:
(Sum - Expectation)/std. error
Is it still correct if I put the equation in absolute value? As in:
Abs(Sum - Expectation)/std error
Any reply from anyone would be super appreciated! (Would be awesome if there was source link to any additional information!)
@Muuip 2 роки тому
Excellent!
Much appreciated! 👍
@ravitweissman3832 4 роки тому
Thank you, very helpful and I like the music! Do you have an opposite example or can you explain when t test yields not significant difference and the Wilcoxon Rank Sum Test is significant?
@MatthiasKullowatz 4 роки тому
I can't put together another new example, but around the 10-minute mark you can see a case where you would FAIL to reject the null hypothesis at the 5% level of significance (in the case where there is a data error). The process would look the exact same, and it would come down to the p-value and your chosen level of significance (5% is common).
@thedestroyer7326 4 роки тому
Thanks, it was very helpful !
@svenjawulsch4868 7 років тому
Background music is so annoying, sorry!
@samshore8659 4 роки тому
This was super helpful! What do I do if my data is not normally distributed however?
@MatthiasKullowatz 4 роки тому
The beauty of the rank sum test is that it requires no specific underlying distribution. In fact, one of the instances in which the rank sum test might be better than the two-sample t-test (for difference in means) is when the data are NOT normally distributed.
@KeepingMyDayJob 9 років тому
Great video but I believe you need to use the rank.avg function instead of the rank function if there are any ties! Rank.avg will work without any ties too, of course.
@MatthiasKullowatz 9 років тому
Unfortunately, my 2007 version does not have the rank.avg function. That would have been nice :-)
@ajcarter8456 8 років тому
My professor stated your method in computing the std. error is "not intuitive." Can you provide a reference to support this method?
@MatthiasKullowatz 8 років тому ⁺¹
en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Normal_approximation_and_tie_correction
In that link, the "U" statistic has the same variance as in this test with our "W" statistic. But you'll notice our calculation of the mean, or expectation, is different here. That results from a shift in the distribution. Whenever you shift a distribution up or down by a constant, the variance remains unchanged. So the test cited in Wikipedia, and the test here really are the same test, and you'd get the same Z statistics either way.
@gaildrescher541 10 років тому
What if you have an unequal sample size? What do you divide by?
@MatthiasKullowatz 10 років тому
If you look at the formula for the denominator of the test stat here: en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Normal_approximation
You'll see that the two sample sizes (n1 and n2) could be different.
Is that what you were asking?
@brianschanding2364 7 років тому
Thanks for the helpful video. Can I perform this statistic if I have a) different sample sizes in the two groups I'm comparing and b) approximately 20-30% of values being zero, or many, many values of 1? Oh yeah: and can the two samples have many of the same values? In your example all the values are different (no ties).
PS: I don't hate the music...
@MatthiasKullowatz 7 років тому
Great questions:
a) Yes. The steps in the video allow for different sample sizes, though that's not immediately clear without another example. It doesn't matter which of the samples you call sample 1; the formulas in the video account for however you choose that.
b) Strictly speaking, the test was designed for continuous data with enough decimal points to differentiate all values. That said, it is often used on data with tied values (like all your 1s and 0s). When two or more values are tied, use the average ranks of those values. For example, if the values are 3, 6, 7, 7, 8, then the 7's tie for 3rd smallest and occupy the 3rd and 4th values in the ordered data. Give them both ranks of 3.5. In versions of Excel after 2007, there should be a RANK.AVG function that automatically does this for you, and that's the only thing you'd have to change.
I just simulated the test with distributions like yours--lots of tied values. The test seems to be good at avoiding Type-I error in small samples (n1 = 10 and n2 = 10). Between 3 - 4% of the randomly generated runs rejected incorrectly at the 5% level. However, type-II error was a bit high at 35 - 40%. You may recall that the researcher controls the Type-I error rate with alpha, but it's the sample size that heavily influences type-II error rate. When I doubled the sample sizes of the two samples (n1 = 20 and n2 = 20), my type-I error rate hung around 4 - 5% and my type-II error rate dropped to 5 - 10%.
The basic takeaway is that this test can still perform well with many tied values, though every situation that breaks the assumptions of the test is somewhat unique. That's like my disclaimer.
@brianschanding2364 7 років тому
Thanks for the response!
I think for ties I'm still struggling a bit. In one sample, for example, of 409 entries, there are approximately 35 zeros, maybe 30 with value of 1, etc. The other sample is smaller (286), but with similar distributions of zeros and 1s. So the average rank for the first 35 items (which are all zeros) would be the same rank? I guess it does seem strange to rank zero values.
About rank ordering: Something occurred to me, but perhaps it violates an assumption...
In my research, these values I'm using occurred in student-generated texts. So a value of 1 would mean in one text, the particular item occurred once in that text. If it has a value of 4, the item occurred 4 times in that text, etc. However, each text varies in total word count. What if instead of counting raw frequencies per text, I normalized according to total word count in each text? For example, if one text has 1 occurrence out of 380, it's new value would be .263 per 100 words (or 2.63 per 1,000). This would greatly reduce the number of ties, because each text is of a different length.
But again, perhaps things like percentages, ratios, normalized frequencies would violate the statistic guidelines. However, it seems to me more intuitively logical to count the "per 100" or "per 1000" value rather than raw value, given each writer's variation in text sizes.
What do you think?
@MatthiasKullowatz 7 років тому
First of all, zero is an important value! If we were studying points scored per game by basketball players, 0 is a possibility that represents a shitty game for that player, and it should be ranked and tested just like any other value. So yes, include the zeros.
But you bring up a good point that often normalizing/averaging values before testing them produces a more intuitive hypothesis and eventual conclusion. If you convert "observed items per text" to "observed items per 100 words per text", then you'll get a range of decimal values presumably between like 0.0 and 0.2 or 0.3, each representing items per 100 words for each text. You'll still have a lot of zeros, but you can definitely do a Rank Sum test with that.
And since you have hundreds of observations (1 observation = 1 text, I think), you might even be able to do a two-sample T-test. Even with the mass of zeros you'll have, the means of each dataset are likely to have pretty close to a normal distribution.
@butterystaff3937 7 років тому
at 4.56, should you always divide by 12 or do you divide by n1?
@MatthiasKullowatz 7 років тому
Yes, you always divide by 12 when calculating the standard deviation.
I'll repost this from below:
en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Normal_approximation_and_tie_correction
In that link, the "U" statistic has the same variance as in this test with our "W" statistic. But you'll notice our calculation of the mean, or expectation, is different here. That results from a shift in the distribution. Whenever you shift a distribution up or down by a constant, the variance remains unchanged. So the test cited in Wikipedia, and the test here really are the same test, and you'd get the same Z statistics either way.
@alexdouglass8410 7 років тому
what the difference btw my sum and expectation is negative?
@MatthiasKullowatz 7 років тому
The easy answer: very little changes! in the section where you calculate the test statistic, just take the absolute value (make it positive).
The hard answer: negative differences mean something different than positive differences. If there is a negative difference, then values in sample1 are generally smaller than values in sample2.
If you're alternative hypothesis is to look for a difference in the two populations, then just use the absolute value and double the p-value you get.
If your alternative hypothesis is to test if population1 has larger values than population2, then you can stop right now because there is NO evidence that this is true.
If your alternative hypothesis is to test if population1 has smaller values than population2, then you can take the absolute value of the test statistic and calculate the p-value in same way as shown in the video (i.e. not doubling).
@AndrewMcFetridge 8 років тому
Hey man, could I ask why in your calculation of the Standard error you divide by 12? I am trying to calculate the p-value of the difference between to medians (n1=12 and n2=10) for time taken to treatment for patients in hours and the other out of normal working hours. Is this calculation appropriate? Finally can I ask how you know the standard deviation is 1 (or is it assumed rather than calculated) in the p-value calculation and what the cumulative value of 1 means as well? The independent t-test of the means is significant at 0.006 but I really want to compare the medians to improve the validity of my study. So far my calculated p-value following the above is 0.99999 which I suspect I've gone wrong somewhere haha. I would greatly appreciate your help, thanks in advance your video is great!
@MatthiasKullowatz 8 років тому
+Andrew McFetridge Andrew, as to why you divide by 12, check out my response to Anh Nguyen down below.
I don't completely understand what two populations and sample you are comparing. It is time taken to treatment measured in hours, and the two population/samples are split up by what? Could you clarify the difference between the two groups for me?
When I use the normal approximation, I divide by the standard deviation of the Wilcoxon statistic. A normally distributed statistic divided by its standard deviation will always have a new standard deviation of 1.
Whenever you get a p-value that high when you suspect there is truly a difference in medians between the groups, it's probably because you did the opposite of measuring probability in the tail. 1 - 0.99999 = 0.00001 which is a more reasonable p-value for two samples that appear to be different. In your case, my guess is that your test statistic is negative. I probably should have stressed using the absolute value of the test statistic in the p-value calculation, which also would have naturally fixed that problem.
@AndrewMcFetridge 8 років тому
+Matthias Kullowatz Thanks for your reply Matthias! Sorry to further clarify my study is the time taken to treatment measured in unit time of hours and minutes. My two population groups are different with regard to whether the patient was treated in working hours (8am-5pm) or out of hours (5pm-8am).
Ok so in calculating my standard error I would divide my calculation by 10 as that is the number of the smallest population which I am testing?
Thank you for the explanation that makes a lot more sense. Yea I did get a negative test statistic, so is the absolute value of the test statistic just involve making it a positive number in the p-value calculation?
@AndrewMcFetridge 8 років тому
additionally in my calculation of the expectation as I have one sample smaller than the other (n=10 and n=12) do I treat the n=10 as my first group i.e. = 10x(10+12+1)/2
@MatthiasKullowatz 8 років тому
+Andrew McFetridge
Ah, I was getting confused with time to treatment in hours and the times of day that separate the two groups. Got it.
You always divide by 12 in calculating the standard error for this test, regardless of the sample sizes. There is some statistical theory behind that number that is explained in the Wiki article I cited in my reply to Anh Nguyen.
For a two-tailed hypothesis test, it doesn't matter which sample is considered sample #1, so long as you stick to it. If the n1 = 10 is your first sample, then add up the ranks of those 10 for your "sum" calculation, and subtract 10(10 + 12 + 1)/2. Otherwise, add up the ranks of the other 12, and subtract 12(10 + 12 + 1)/2. So long as you take the absolute value of your eventual test statistic, you'll get the same p-value. This hypothesis test is beautifully symmetric.
@xzyeee 9 років тому
I think this video is wonderful but I just need clarification on one thing. In the first video, you said that they were 12 cars that were given two additives. However, in this one you are saying that there are 24 cars. It sorta implies, at least to me, that in this video, you had 24 cars and split them into two samples instead of one sample of 12 given 2 additives. Can you clarify this for me? Nonetheless, you are clear in everything else. Great talent!
@MatthiasKullowatz 9 років тому
Assume that there were only 12 cars that were re-used for the second sample.
@alisonchew 9 років тому
Matthias Kullowatz hmm, but isn't the Wilcoxon rank sum test used for independent samples? If the 12 cars are re-used, then aren't the samples related?
@MatthiasKullowatz 9 років тому
You are correct. The rank-sum test has an assumption of independence. I was re-using the same data set as though it were independent to show the process, and how the two tests could potentially arrive at different conclusions if used incorrectly.
@alisonchew 9 років тому
Matthias Kullowatz oh I see now. Thank you so much for the informative video!
@JoeyOrlando 6 років тому
NO MORE MUSIC!
@lrzday1704 10 років тому
What if the Expectation is higher than the Sum? Please reply ASAP :(
@MatthiasKullowatz 10 років тому ⁺¹
If the expectation is higher than the sum, then you have two options depending on your Wilcoxon tables. If your tables have critical values in both the upper and lower tails, then you can just compare your Wilcoxon test stat to the other tail's critical value.
It sounds like you're using the normal approximation, so when you subtract the expectation from the sum, you'll get a negative number, and your normal test stat will be negative. Then the p-value is equal to =NORMDIST(teststat, 0, 1, 1).
@syntrope 8 років тому
+Matthias So, if my expectation is higher than the sum or if I get p value of zero does it mean that my data is normally distributed and I should use the above formula to redo the p-value?
@MatthiasKullowatz 8 років тому
+syntrope Quick note: This is not a test to determine if the data is distributed normally. The normal approximation part is just a p-value approximation to avoid using tables.
If the expectation is higher than the sum, that's an indication that there were higher values in group 1 relative to group 2. The p-value is a statistician's way of measuring how much higher. If the p-value were close to 0, that suggest evidence that the two additive had different effects on gas mileage.
@TheMetajose 7 років тому
WHY the music :(
@reikatsukiyomi5926 8 років тому
why is there a mean of 0?
@MatthiasKullowatz 8 років тому
+Reika Tsukiyomi If you are referring to the T-test: the hypothesized mean difference is typically 0 in a two-sample test. This just means that, according to the Null Hypothesis, the average difference between the two groups is 0. "A difference of 0" is just mathy language for "there is no difference." Recall that the null hypotheses is just an assumption against which we test. If the results seem implausible (low p-value) while assuming the null hypothesis, then we reject the null hypothesis.
If you are referring to the Z score from the Wilcoxon Test, where we used the normdist function, the answer is similar. When you subtract the mean (or "expectation") from any random variable or test statistic, its new mean is 0.
@reikatsukiyomi5926 8 років тому
Ah! Now I get it! Thanks!!!
@Kanbaster 7 років тому
Great music
@MatthiasKullowatz 7 років тому ⁺¹
Lolz.
@AnhNguyen-gn7yz 10 років тому ⁺¹
Hi, this is a noob's question: for the formula for standard error, why do we divide the product by 12?
@MatthiasKullowatz 10 років тому
To derive that formula for the theoretical standard deviation, you have to go to the definition of variance which can be found here: en.wikipedia.org/wiki/Variance#Discrete_random_variable.
In words, to get the variance you have to subtract the mean from each possible test statistic, square those values, multiply by each test statistic's probability, and then add all that up. When you calculate all that, you'll find that, regardless of the sample size, a 12 always pops out in the denominator.
@AnhNguyen-gn7yz 10 років тому
Matthias Kullowatz I see... Thanks a lot. This video really helps me understand wilcoxon rank sum better (y)
@kc7690 8 років тому
+Matthias Kullowatz
Would it be 12 no matter the sample size? For example, I am doing this for a current study of mine and my first sample is 10 and the second is 14. I divided by 10 thinking it was whatever the smaller sample size was. However, should I still divide by 12?
@MatthiasKullowatz 8 років тому
+kc7690
In the calculation of the standard error, yes. You always divide by 12.
@CF-lh5xw 6 років тому ⁺¹
Hi Matthias. I believe you can always divide by 12 as long as you have no ties in your rankings. If you do have ties in your rankings, you should instead use the formula Theoretical SD(T) = SD * sqrt(n1*n2/(n1+n2)). This information is found in "The Statistical Sleuth" by Ramsey/Schafer 3rd edition pages 92-95.
@BriannaBrady-jd2ob Рік тому
I actually love the background music it helps me focus lol.
@BryanJL 5 років тому
thank you very much but that music was stupid and annoying
@Hallowais 6 років тому
the background music makes me sleepy and it is hard to concentrate!

Наступне

Автоматичне відтворення

Statistics 101: Nonparametric Methods, Mann-Whitney-Wilcoxon Rank Sum in Excel