Count Dooku is the best statistics teacher
Thank you so much!!! you are very clear and helpful!!
Sir, what is the difference between multicollinearity and leverage, versus perfect collinearity in the x variables?
Leverage is defined as a distance between x's, but the formula says cov(y-hat, actual)/var(y). Why so?
cov can be negative, so do we take absolute values?
Thanks!!!
Great lectures! Very clear. Too bad I never learned outlier detection in regression models, even though I studied for a B.Sc. in computer engineering. I have some questions and hope you can answer them. 1) Why should residuals be normally distributed? 2) In the Williams graph, why do we use twice the mean as a threshold? I would expect to see a multiple of stdev(lev * n/p). I watched the following lecture and saw that you calculated Cook's distance as well, but you didn't use it for filtering outliers, or did I miss it? Thank you so much for this quality content!
Oh, sorry, you have a dedicated lecture about the distribution of residuals. So it's pretty much empirical, as I understand it.
1) Residuals are often non-normally distributed, but sometimes they are normal. You should always check whether the assumption of normality makes a difference in your statistical analysis. 2) The choice of twice the average leverage as a threshold is arbitrary, but it is a convenient rule of thumb.
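For anyone following along, here is a minimal sketch of the "twice the average leverage" rule of thumb. It assumes a simple straight-line fit (p = 2 parameters: intercept and slope), for which the leverage of each point has a closed form in terms of its distance from the mean of x. The function names and the data are made up for illustration; this is not taken from the lecture slides.

```python
def leverages(x):
    """Leverage h_i of each point for a straight-line fit y = a + b*x.

    For this model h_i = 1/n + (x_i - mean(x))^2 / Sxx, so leverage grows
    with the squared distance of x_i from the centre of the x data.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


def high_leverage_points(x, factor=2.0):
    """Indices of points whose leverage exceeds factor * (p / n).

    The average leverage is always p/n; factor=2 is the rule of thumb
    discussed above. It is a convention, not a derived cutoff.
    """
    p = 2  # assumed model: intercept + slope
    threshold = factor * p / len(x)
    return [i for i, hi in enumerate(leverages(x)) if hi > threshold]


# Hypothetical data: five clustered x values and one far away.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)
flagged = high_leverage_points(x)
print("leverages:", [round(hi, 3) for hi in h])
print("flagged indices:", flagged)
```

The leverages always sum to p, which is why the mean is p/n. Changing `factor` to 3 gives a stricter cutoff that some texts also use.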
Chris Mack thank you Chris! I just saw that you have experience in the semiconductor industry :) What a coincidence! I'm analyzing correlation in CD measured on a wafer in different fields. It looks like the CD distribution is normal. At least, sometimes the Jarque-Bera test confirms it and sometimes not. Sometimes Shapiro-Wilk confirms it and sometimes not. Thank you so much for your great lectures! They are very helpful!
@muonneutrino Good luck - I've worked a lot in mapping CD across the wafer.
Chris Mack very interesting. While there are many factors contributing to CD, you can tune the mask to compensate for them, or at least most of them. In your experience, is there good correlation between different fields if CD is measured at the same locations?
What, basically, is the difference between standardized and studentized residuals?
See slides 9 and 10: "standardized" is the same as "internally studentized", which is different from "externally studentized".
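A rough sketch of that distinction, assuming a straight-line fit with p = 2 parameters and made-up data. The internally studentized (standardized) residual scales each residual by the error estimate from the full fit; the externally studentized residual uses the error estimate with that point left out. The names below are hypothetical and not from the slides.

```python
import math


def studentized_residuals(x, y):
    """Return (internal, external) studentized residuals for y = a + b*x."""
    n, p = len(x), 2  # assumed model: intercept + slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean

    # Leverage and raw residual of every point.
    h = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(ei ** 2 for ei in e)

    # Internal: residual variance estimated from all n points.
    s2 = sse / (n - p)
    internal = [ei / math.sqrt(s2 * (1.0 - hi)) for ei, hi in zip(e, h)]

    # External: residual variance estimated with point i deleted.
    external = []
    for ei, hi in zip(e, h):
        s2_deleted = (sse - ei ** 2 / (1.0 - hi)) / (n - p - 1)
        external.append(ei / math.sqrt(s2_deleted * (1.0 - hi)))
    return internal, external


# Hypothetical data with one suspicious point at x = 6.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.0, 8.1]
internal, external = studentized_residuals(x, y)
for xi, r, t in zip(x, internal, external):
    print(f"x={xi:4.1f}  internal={r:7.3f}  external={t:7.3f}")
```

The two versions agree closely for well-behaved points and diverge for a point with a large residual, because an outlier inflates the error estimate it is being compared against when it is included in the fit.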