You can find the spreadsheets for this video and some additional materials here: drive.google.com/drive/folders/1sP40IW0p0w5IETCgo464uhDFfdyR6rh7
Please consider supporting NEDL on Patreon: www.patreon.com/NEDLeducation
Thank you! This was very helpful. Your explanation is clear and easy to understand!
Thank you so much, I'm looking for those papers. You're amazing!!!
Your videos are absolutely amazing !!
It's really helping. Please make a video on how to clean raw data before applying any statistical test.
Hi Faiza, and glad the video helped! If your data is non-normal, various procedures can be applied. If it is right-skewed and non-negative (for example, the GDP of a country or the market cap of a company), then taking a natural logarithm is common. You can also convert the data into ranks or percentiles, which amounts to investigating the empirical distribution of the data. This is the logic behind many non-parametric procedures applied in case of non-normality, including quantile regression. However, the normality assumption is very often overlooked in real-world research, so you can apply simple linear regressions even to non-normally distributed data; it is just that the results in that case are less reliable. Hope it helps!
@NEDLeducation Thank you!
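The two transformations mentioned in the reply above can be sketched in a few lines of Python. This is a minimal illustration with made-up values (assuming distinct observations, so ties are not handled):

```python
import math

# Right-skewed, non-negative data (e.g. market caps) -- illustrative values
data = [1.2, 3.5, 0.8, 120.0, 15.4, 2.1, 45.0, 7.7]

# 1. Natural log transform: compresses the long right tail
log_data = [math.log(x) for x in data]

# 2. Rank / percentile transform: maps each value to its position
#    in the empirical distribution (index works since values are distinct)
n = len(data)
order = sorted(data)
percentiles = [(order.index(x) + 1) / n for x in data]

print(log_data)
print(percentiles)
```

The largest observation lands at percentile 1.0 and the smallest at 1/n, regardless of how extreme the raw values are, which is why rank-based methods are robust to skew and outliers.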
Excellent explanation! Thanks a lot!
amazing explanation
Hi, thank you for this video. My question is regarding choosing between the Shapiro-Wilk test and the JB test. I have 1,700 observations and I was wondering which of them is better. I have read in some sources that the JB test requires at least 2,000 observations.
Hi Kian, and thanks for the question! Shapiro-Wilk is indeed shown to perform better when the sample size is relatively small. I might do a video on Shapiro-Wilk in the near future!
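For reference, both tests are available in SciPy (assuming SciPy is installed), so they can be run side by side on the same sample. A quick sketch on a simulated sample of 1,700 observations, matching the question above:

```python
import random
from scipy import stats

random.seed(42)
# Simulated normally distributed sample of 1,700 observations
sample = [random.gauss(0.0, 1.0) for _ in range(1700)]

sw_stat, sw_p = stats.shapiro(sample)       # Shapiro-Wilk test
jb_stat, jb_p = stats.jarque_bera(sample)   # Jarque-Bera test

print(f"Shapiro-Wilk p = {sw_p:.3f}, Jarque-Bera p = {jb_p:.3f}")
```

With genuinely normal input both p-values should typically be well above conventional significance levels; on real data the two tests can disagree, especially at moderate sample sizes.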
Absolute legend!
Hello and thank you,
I wondered if it's normal that I'm obtaining different results for the exact same test and parameters when I run it again and again?
Hi, and thanks for the question! If it is on the same sample and your test does not involve Monte Carlo simulations, this is not normal.
Thank you very much for your videos.
We know that as the number of observations increases, the distribution approaches normal. However, as n increases in the JB formula, the statistic moves away from the acceptance region: at very large values of n, the JB value will also grow and normality will be rejected. How can we explain this?
Hi Selim, and glad you liked the video! As for your question, this has to do with the fact that as the sample size increases, a given deviation in skewness and kurtosis from their expected values would be less likely to originate from random disturbances and points more strongly towards non-normality. This concept is very common in chi-squared tests and in goodness-of-fit tests in general. Hope it helps!
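This scaling is visible directly in the formula JB = n/6 * (S^2 + (K - 3)^2 / 4): duplicating every observation leaves the skewness S and kurtosis K unchanged but doubles n, and therefore exactly doubles the statistic. A minimal pure-Python check (sample values are made up):

```python
def jarque_bera(x):
    """Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

sample = [0.5, 1.9, -0.3, 2.2, -1.1, 0.7, 3.0, -0.4]
jb1 = jarque_bera(sample)
jb2 = jarque_bera(sample * 2)  # identical moments, twice the observations

print(jb1, jb2)  # jb2 is twice jb1
```

So the test does not claim that larger samples are "less normal"; it claims that the same observed deviation in the moments is harder to attribute to chance when n is large.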
Hi, I would like to ask if it's okay for my data to not be normally distributed if I'm going to use simple linear regression to test my hypothesis (whether variable A has an impact on variable B). Thanks!
Hi, and thanks for the question! Yes, you can: in practice and in research, the non-normality of the data is often ignored, but the results can be considered more reliable if the input data is normally distributed.
@NEDLeducation Thank you so much!
Great video! Sir, a video on Martingale Difference hypothesis tests such as the Generalized Spectral (GS) test, the Dominguez-Lobato (DL) test, and the automatic portmanteau test would be of great help.
Hi Bodhi and glad you liked the video! Those MDS tests can be quite computationally intensive, I might consider covering them in the distant future though.
Thank you so much for your videos, sir; they have helped me a lot in my project. I have a question: can we transform non-normal data into normal data?
Hi Glenn, and glad you are enjoying the channel! In theory, you can convert any data into something normally distributed by first calculating the empirical distribution function (basically, data ranks divided by the number of observations), and then plugging it into the inverse normal distribution function with desired mean and standard deviation. This, however, will remove all other properties of the data which might be important so in general I would not advise it.
@NEDLeducation Thank you for clearing that up!
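The procedure described in the reply above (a rank-based inverse normal transform) can be sketched with the standard library alone; the target mean and standard deviation here are arbitrary choices, and the sample values are made up:

```python
from statistics import NormalDist

data = [3.2, 0.1, 7.8, 1.5, 0.4, 12.9, 2.2, 0.9]
n = len(data)

# Empirical distribution function: rank of each value divided by n,
# shifted by 0.5 so every value stays strictly inside (0, 1)
order = sorted(data)
ecdf = [(order.index(x) + 0.5) / n for x in data]

# Plug into the inverse normal CDF with the desired mean and st. dev.
target = NormalDist(mu=0.0, sigma=1.0)
transformed = [target.inv_cdf(p) for p in ecdf]

print(transformed)
```

The transform preserves the ordering of the observations but discards their spacing, which is exactly the loss of information the reply warns about.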
Good video
Can I use it on a sample of 60 observations? Thanks
Hi Mark, and thanks for the question! Yes, of course you can.