I learned more from this guy in 12 minutes than I did six weeks from my hopeless professor in an introductory stats course lmao. thank you dr todd!!
Very pleased to know that the videos are helpful. Thanks for watching. Good luck in your class
You are amazing. Your explanations are passionate, thorough, and just beautifully explained. You've made the abstract come to life.
thank you Dr. Daniel. I do really appreciate your professional work
You are very welcome. Thanks for watching
Thank you so much, Dr. Daniel, for helping me understand when to get rid of outliers or simply keep them in the analysis.
Dear Professor Todd, thank you very much for your great clips. I was able to understand statistical concepts within a week and complete my Master's degree assignment thanks to your simplified explanation. Please take care and I wish you all the best!
You are very welcome! Thank you for the wonderful comment and well wishes
Thanks a lot, Doc (thumbs up). Excellent teaching, sir.
Thanks a lot, Dr. Daniel, for sharing. Really useful for a beginner like me.
I want to thank you for this great video. I've been thinking about how to deal with outliers in my Likert scale questions. This video is perfect! Thank you so much for sharing such useful knowledge. :-)
Good point. If you have an outlier in a single Likert item, you have a data entry error. If you have an outlier after combining multiple items, then you have a true multivariate outlier.
This is a very helpful video.
Great video!! I found the picture of the crazy outlier guy very funny though!
Thanks for the comment. I agree that humor helps reinforce the ideas and clarify what an outlier is.
Thank you for the explanation. Looking sharp.
I LOVE Dr. Todd! His voice is soothing and is helping me in grad stats. Am crushing on Dr. Todd Daniel. Smart men are dreamy...😍😍😍😍 Must focus on stats.. 🤓
This video is great! And explained just perfectly.
extremely str8 to the point and useful
Excellent! Thank you!
My analysis got done man. Thank you.
Glad to hear it!
Thank you very much!
Thank god this video exists
Thank you!
I can't thank you enough! This video was helpful.
Thanks! 💯
You are the best
Hello,
Thank you very much for this eye-opening video! I learned lots of things. I have a question. I computed univariate outliers in SPSS and found some asterisk (extreme) outliers. When I checked them in the data, these asterisk outliers range from the lowest (e.g., 1) to the highest (e.g., 7) values. They are legitimate values, although they were shown as asterisks in the box plot. In this case, should I remove these outliers, or should I keep them in the data because they represent legitimate values?
Thank you very much in advance!
Best,
That sounds correct. If your data points only go from 1 to 7, you may have a skewed variable, but you do not have extreme outliers because they have been bounded by the measurement scale (1-7). Check out this video and the one after it for more details: ua-cam.com/video/4CNLHO3xOyc/v-deo.html
Thank you for a great video. One question, if anyone could answer: when creating your chart at 03:20, you have gender on the x axis. Is there any way you can have gender (that is, male and female) as two boxplots and then a third one, "total" (that is, the whole data set), as a separate boxplot, all in the same graph?
Good question...I don't know of any way to have both a split box plot and the combined (non-split) box plot on the same graph. I usually just create them separately and then combine them in the paper into a single graph.
@@ResearchByDesign Thank you. What do you mean by "in the paper"?
What does it mean if I have values which are not shown on my boxplot? For example, I can see on my boxplot that my .4 is considered an outlier, however I have values at .7 and .8, but .7 and .8 are not even shown on the boxplot. Is this an SPSS bug? Thank you
Hmmm, not sure...the outliers may or may not be labeled, but if they are in the variable, they should be displayed on the box plot. Not sure what to tell you. Hope you get it worked out. (Do a Frequencies on the same variable and see if you notice something unusual)
Thank you Sir
Most welcome
Sir, THANK YOU!!!
You bet!
Thank you again for sharing your videos! I have a question. Is there a way to calculate whether any outlier values I identify are more than 3 times the IQR beyond the nearest quartile value? Or do I simply just rely on whether the outlier is represented by an asterisk or a circle?
Thank you again!
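A rough way to check the "3 times the IQR" rule by hand is sketched below. This is only an illustration, not SPSS's own routine: the function name `classify_outliers` is made up for this example, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from the hinges SPSS uses for its box plots.

```python
import statistics

def classify_outliers(values, mild=1.5, extreme=3.0):
    """Split values into mild and extreme outliers using IQR fences.

    Assumption: quartiles use the 'inclusive' method; SPSS box plots
    may compute the box edges slightly differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    result = {"mild": [], "extreme": []}
    for v in values:
        # Distance beyond the nearest quartile (negative if inside the box).
        distance = max(q1 - v, v - q3)
        if distance > extreme * iqr:
            result["extreme"].append(v)   # would be drawn as an asterisk
        elif distance > mild * iqr:
            result["mild"].append(v)      # would be drawn as a circle
    return result

scores = [1, 2, 3, 5, 6, 35]
print(classify_outliers(scores))  # {'mild': [], 'extreme': [35]}
```

Here Q1 = 2.25 and Q3 = 5.75, so the IQR is 3.5 and anything more than 10.5 above Q3 counts as extreme; 35 is 29.25 above Q3.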
Hi Prof, could you explain more on Winsorizing thing please? An example would be great! Great thanks!
No problem...winsorizing (after biostatistician Charles P. Winsor) is cutting the outlier down to the next most reasonable value. If the data are 1, 2, 3, 5, 6, 35... then 35 gets cut down to 7. It is still the highest value but no longer has the leverage of an outlier. Works well on income values. This new video goes into more detail: ua-cam.com/video/Mf9R-OKQUrU/v-deo.html
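The example in the reply above (35 cut down to 7) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `winsorize_high`, the 1.5 x IQR fence used to decide what counts as an outlier, and the `offset` of one unit above the largest remaining value are all choices made for this sketch. Other descriptions of winsorizing replace the outlier with the nearest non-outlying value itself (offset of 0) or use percentile cutoffs.

```python
import statistics

def winsorize_high(values, offset=1, k=1.5):
    """Pull high outliers down to just above the largest non-outlier.

    Assumptions: outliers are values above Q3 + k * IQR, and the cap is
    the largest remaining value plus `offset` (hypothetical parameters).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    inliers = [v for v in values if v <= upper_fence]
    cap = max(inliers) + offset
    # Values at or below the cap are unchanged; anything higher becomes the cap.
    return [min(v, cap) for v in values]

data = [1, 2, 3, 5, 6, 35]
print(winsorize_high(data))            # [1, 2, 3, 5, 6, 7]
print(winsorize_high(data, offset=0))  # [1, 2, 3, 5, 6, 6]
```

With the default offset, the former outlier stays the highest value in the data but no longer pulls the mean as strongly.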
@@ResearchByDesign Thank you, professor. I have not studied any statistics in my life and am currently doing a master's where no stats course is offered. However, I feel more confident in analyzing data and handling SPSS, all because of your excellent videos, and I still recommend your lectures to my colleagues. I have watched all of your videos that are relevant to me more than three times. Great thanks to you.
@@ResearchByDesign I have an age variable that has an outlier. Would it be okay to winsorize it? Thanks.
Informative video, thanks sir
So nice of you. Thanks
Thanks Dr. Daniel. I want to ask these questions.
- Does the boxplot indicate the median? Does it also take the mean into account?
- Can continuous variables be checked for outliers using boxplots?
Thank You.
The box plot just uses the median (50th percentile), and yes, boxplots are a great way to check for outliers because SPSS labels the outlier cases for you.
Hi, sir.
May I ask you sir? Is there a reference citation for the outliers with boxplot? Thank you
I don't recall what stats text I originally learned that from, but I know that it is in the Andy Field text (Discovering Statistics Using SPSS) and in the Tabachnick & Fidell text, Using Multivariate Statistics. Both are excellent resources.
Ok, thank you so much sir
How come values of 1 are considered outliers but values of 7 are not? They are both at the extreme ends of our scale, but the values for 27 and 43 were both 1, and we still have values of 7 in the data set. By the way, I love your videos and your intuitive approach. Thank you for making it easier for me to understand SPSS and stats.
The identification of an outlier depends on the other data. So the score may not be extreme compared to other scores around it. When you combine multiple items into a single score, you will be more likely to find extreme values. Thanks for watching the videos!
You really need to use an example with multiple data sets, because when you remove the outliers the data shifts, and this should be noted. Also, if the selected data are only scale, then the x axis cannot be used and the box plot is not feasible. How do you remove outliers in those situations?
Thanks for the comment. I think that both the raw and clean datasets are available in the google folder, but I will check. Link to the folder is in the description and channel art. Let me see if I can answer your other question: when you have scale data and are looking for outliers, they will show up in histograms, stem-and-leaf, and box plots. In SPSS, I use the Explore command because it has an option to look for outliers. There are also tests like Mahalanobis or just converting to z-scores that can help identify outliers in scale numeric data. Hope that helps.
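As a small illustration of the z-score approach mentioned in the reply above, the sketch below flags values whose absolute z-score exceeds a cutoff. The function name `zscore_outliers`, the cutoff of 3, and the sample numbers are assumptions made for this example; textbooks differ on the cutoff. Note that in very small samples no value can reach |z| > 3, so the rule only makes sense with a reasonable number of cases.

```python
import statistics

def zscore_outliers(values, cutoff=3.0):
    """Return the values whose absolute z-score exceeds `cutoff`.

    Assumption: z-scores use the sample mean and the sample standard
    deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / sd) > cutoff]

# Made-up scores: fifteen values near 50 plus one very large value.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49, 51, 50, 52, 48, 50, 120]
print(zscore_outliers(scores))  # [120]
```

Mahalanobis distance extends the same idea to several variables at once, but it needs a covariance matrix and is left out of this sketch.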
Does anyone know how to find how many and what percentage of values are outliers? I'm in a data mining class and am working with a very large data set.
I am working on some videos about data cleaning for the semester. Send me an email at ToddDaniel at MissouriState.edu and I will send you my notes on outliers. It's too much for a comment reply. Good luck.
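For the counting question above, one possible sketch (not taken from the author's notes) is to apply the usual box plot rule of 1.5 x IQR beyond the quartiles and report the count and percentage. The function name `outlier_summary` and the sample data are made up for this example, and the quartile method is an assumption that may not match SPSS exactly.

```python
import statistics

def outlier_summary(values, k=1.5):
    """Return (count, percent) of values outside the k * IQR fences.

    Assumption: quartiles use the 'inclusive' method of
    statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    count = sum(1 for v in values if v < lower or v > upper)
    return count, 100 * count / len(values)

sample = [10, 11, 12, 13, 14, 15, 16, 17, 18, 25]
count, percent = outlier_summary(sample)
print(count, percent)  # 1 10.0
```

The function makes a single pass over the data after sorting for the quartiles, so it should scale to large data sets that fit in memory.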
Overall, it saved me. But it would be better with more detailed information about the first option for outliers.
I'm on it...working on a new set of videos and I will make sure the new box plot script includes more detail. Thanks.
Where can I get the clickers.sav file?
I'm working on adding all data sets to the RStats Institute website. For now, email me at Missouri State University and I will send you a copy
This isn't gameonzz. I know him, hmm, brother Iman.