This was the best explanation of this topic I've had. Thank you so much.
THANK YOU! I was doing some googling to try and better understand what this chart was trying to explain in R and this video is exactly what I needed.
Glad it was helpful!
Wow it's all so clear now, thank you so so much ^-^
Good video my man
Hello. I was wondering how to check the assumption that errors are independent. I believe you've explained how to check all the other assumptions (very clear explanation by the way, very helpful), except for the independence of the errors. Or have I misunderstood that? Thank you.
You're correct, that assumption is not checked by the graphical methods described. In general, testing that assumption is much more difficult. In many cases it's easier to justify it from the logic of the sampling process used to collect the data. In the sample data I used, each data point represents a single horseshoe crab. So the independence assumption means we assume that the size of one horseshoe crab is not affected by the size of another crab in the sample. That seems reasonable as long as we've collected the crabs randomly from some large population.
Clear and succinct presentation. This was helpful for me. Thanks!
Excellent explanation !!
Hello. I'm so confused. Why is RStudio producing different results when using the same call? 😢
I'm sorry you're having difficulty with your work. There are many reasons why you might run the same code and get different results. This isn't really a great forum for fixing issues like that, unfortunately. I suggest you look for help at forum.posit.co, stackoverflow.com (search for questions tagged 'R') or stats.stackexchange.com.
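For anyone else with the same symptom: one common cause (only a guess here, since the actual code isn't shown) is that the code draws random numbers without fixing the seed first. A minimal sketch of that effect, where the seed value 123 and the variable names are arbitrary choices for illustration:

```r
# Without a fixed seed, each call to a random number function returns new values.
first_draw  <- rnorm(3)
second_draw <- rnorm(3)
identical(first_draw, second_draw)  # almost certainly FALSE

# Resetting the seed to the same value before each draw makes the draws repeat.
set.seed(123)
a <- rnorm(3)
set.seed(123)
b <- rnorm(3)
identical(a, b)                     # TRUE
```

If fixing the seed doesn't explain the difference, the forums listed above are the better place to dig into it.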
Amazing explanation (from an ecology student). :)
Great video. Thank you for the clear explanation!!!
Neatly Explained! Thank you Drew :)
excellent explanation !!
Any chance of getting access to the code used to generate the plots used in this video? Thanks!
Ugh. Reproducibility fail! I reproduced those figures over at my blog: drewtyre.rbind.io/post/checking-assumptions/ hth.
Thank you for this walkthrough. I just want to get your opinion on how you view the assumptions underlying regression methods. Compared to my field [applied econometrics], would you expect the assumptions to be violated to some degree as applied in ecological statistics? Is the hope of the applied [statistician] that the model is a good-enough approximation of the process that generated the observed data? How does one distinguish that good-enough threshold? Finally, does removing outliers to attain better measured deviations increase the risk of overfitting?
Quote from George Box: "All models are wrong, but some are useful" -- wrong in the sense that the assumptions are nearly always false to some degree. I'm not familiar with much from econometrics, but what little I do know suggests that the data sets tend to be larger than in ecology, which helps a lot. Econometric data suffer from less "observation error". Distinguishing the "good enough" threshold is a matter of judgement, and you get better with experience. One way to develop a good "eye" is to simulate data and fit the model -- the simulated data meet the assumptions exactly, so you get to see what the residuals "should" look like. As far as removing outliers goes, there are many published methods for testing for outliers. Sometimes removing a data point improves the current model, sometimes not. It's another area where experience and subject matter knowledge make a big difference. Depending on how the outlier is removed, you may get a biased estimate as well. I would say it's an area where expert statistical help is a good idea.
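A rough sketch of the simulate-and-fit idea in R, for anyone who wants to try it. The sample size, coefficients, error standard deviation, seed, and variable names below are all made up for illustration; they are not taken from the video or the horseshoe crab data.

```r
# Simulate data that meet the linear model assumptions exactly,
# then look at the standard diagnostic plots for the fitted model.
set.seed(42)                        # arbitrary seed so the run is repeatable

n <- 100                            # hypothetical sample size
x <- runif(n, min = 0, max = 10)    # hypothetical predictor
# Straight-line mean plus independent, normal, constant-variance errors
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
sim <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = sim)

# Draw the four default lm diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)                             # restore the previous plot layout
```

Changing the seed (or dropping it) and rerunning a few times gives a feel for how much the plots vary even when every assumption holds, which is a useful baseline when judging real residuals.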
Great explanation. Thank You.
nice video!
Very helpful, thank you