A very comprehensive explanation of running and interpreting the results of multiple linear regression with dummy variables in SPSS. I have never come across such a short yet informative and comprehensive lecture on this topic. I wish to show my gratitude for providing such valuable video lessons.
Dr. Nayak and your videos are mind-blowingly well-organized and informative. I owe the statistics part of my dissertation to you two. I am truly grateful.
So nice of you to say, Parisa! Best wishes!
Hello Mike, how would you conduct and interpret a multiple regression with interaction and dummy coded variables?
This is such a great explanation! It was so helpful in being able to understand and report regression results with a dichotomous independent variable. Thank you so much!
The big payoff here is a better understanding of the intercept in a multiple regression. Excellent!
Hi Greg, thanks for watching! Glad you found it useful. best wishes
Very good and clear video! Thanks a lot. A true lifesaver!
You're welcome! best wishes!
Hi Mike,
I would like to thank you very much for all of your informative videos.
I would like to ask one question regarding the interpretation of the dummy variables.
For instance, in your example, VH - dummy variable has a significant contribution.
With the VH dummy variable, we compare VH = 1 (a value of 4 in the "education level" grouping variable) with all other groups [VH = 0].
What confuses me is the interpretation at 18:51: a significant difference between the "very low" and "very high" education groups.
Don't we compare the "very high" education group (education level = 4, VH = 1) with all other groups in the data set (education levels 1, 2, and 3, i.e., VH = 0)?
Thank you for efforts and time!
I have the same question. I always want to add another dummy-coded column to test the last category versus all the others. It seems that dummy coding doesn't allow pairwise comparisons when there are more than two categories, as it is always one category versus the others. I guess we could use ANCOVA instead: that is, enter the categorical variable as an effect-coded IV and the continuously scaled variables as covariates, then run post-hoc pairwise comparisons on the categories if the IV is significant. If anyone knows whether this is acceptable, I'd love to know the answer! :-) OR, another idea: if the categories were coded 0, 1, -1 (for three groups) instead of just 0, 1, would this work in regression?
It is a good question. My guess would be that the very low education group is used as the reference (coded 0 on every dummy variable, i.e., 0,0,0) and hence each of the other education levels is compared against this reference.
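To make the reference-group reading concrete, here is a minimal sketch in plain Python. It is not the video's data or SPSS output: the scores, the level numbering, and the helper names `solve` and `ols` are all made up for illustration, and there are no other predictors in the model. Under those assumptions, the intercept equals the mean of the reference group (the one coded 0 on every dummy), and each dummy's slope equals that group's mean minus the reference mean, not its mean minus the mean of all other groups combined.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def ols(x_rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical scores for four education levels (1 = very low ... 4 = very high).
scores = {1: [10, 12, 14], 2: [13, 15, 17], 3: [14, 16, 18], 4: [20, 22, 24]}

# Level 1 is the reference group: it is coded 0 on all three dummies.
x_rows, y = [], []
for level, values in scores.items():
    for value in values:
        x_rows.append([1, int(level == 2), int(level == 3), int(level == 4)])
        y.append(value)

intercept, b_level2, b_level3, b_level4 = ols(x_rows, y)
means = {level: Fraction(sum(v), len(v)) for level, v in scores.items()}

print("intercept:", intercept, "| reference mean:", means[1])
print("slope for level 4:", b_level4, "| mean(4) - mean(1):", means[4] - means[1])
```

With covariates in the model the same logic holds, except that the slopes become differences in adjusted (conditional) means rather than raw group means.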
Thank you. Learned a lot that will be useful for my research project.
Professor, is it possible to mix nominal/binary variables with quantitative variables as independent variables in a multiple linear regression?
Hi Mike, what if you have a few categorical predictor variables? How do you interpret the slope? E.g., final grade = constant + Year 1 grade + gender + ethnicity + year of graduation. Assuming the reference category for year of graduation is 2010, is the slope for 2011 interpreted as the difference in conditional means between 2011 and 2010? And does that assume gender and ethnicity are zero? Thanks.
DEAR VIEWERS: I have uploaded a new video on dummy coding with multicategorical independent variables to my YouTube site. I hope you consider visiting: ua-cam.com/video/PVYCpeRMvp8/v-deo.html
Thank you for uploading it! :)
I am so thankful to you.
You are very welcome! Thanks for visiting!
Thank you, it is a brief and clear explanation.
Hello Mike. Thanks for the video. I tried to use Split File but it isn't working for me. I'm not sure what the problem might be. I want to split ethnicity into separate groups and run a regression model for each group separately. I have 9 levels. The ethnicity groups are dummy coded into separate categories. I tried to put all the ethnicities into the "Group" box in the Split File, but it doesn't seem to work. I would also like to split gender (with 7 levels). I am not sure how this is done with more than two levels. Please assist when you have a moment. I can provide more info, but I don't have your email address. Thanks!
How would you report the example given at the end where VL has a significant slope but the other 2 dummy levels do not? Suppose I have an output in SPSS, where a variable is in tertiles, so I have 3 groups, and I make dummies, then I have 2 of them. SPSS gives a sort of overall p-value, with no B or S.E for the two dummies and then they also have their own p-values individually. So which p-value do I look at to determine whether my independent variable is associated with my determinant or do I have to make a conclusion of that based on each p-value of the individual dummies of that independent variable?
Dear Mike, Thank you for your useful video!
Is it possible to perform multiple regression with categorical variables (some with 2 levels, such as gender, and some with multiple levels, such as faculty) and 1 ordinal variable as IVs, and 1 ordinal variable as the DV? In this particular case, should I dummy code the categorical variables and mean center the ordinal IV? Thank you in advance.
I have two sets of independent variables, each with 17 dummy variables, what do I do?
Greetings
If dummy variables are used as control variables and they come out significant, what does that mean?
This was very helpful, many thanks!
You are very welcome! Thanks for visiting!
How can we see the interactions between dummy coded categorical variables and continuous variables?
Dear Mike, thank you for your great video! Could you please tell me which regression I should use if I have several independent variables (ordinal, Likert scale) and one dependent variable (ordinal, Likert scale)?
Hi Veronika, your approach to analyzing your data really depends on how you plan on treating the Likert-scale responses -- i.e., whether you are treating variables as continuous or categorical. If your DV is being treated as ordered categorical, then an option is ordinal logistic regression. If you are treating it as continuous, then OLS regression might be the way to go. It is not uncommon for researchers to treat items based on a Likert response format as continuous, particularly when there are 5 or more response options. That is because as more ordered scale points are added, a variable may begin to "behave" more like a continuous variable. With fewer response options, a variable may behave more as ordered categorical. [In fact, that is the reasoning why LISREL will treat variables you designate as ordinal as being continuous if there are more than 18 categories on your variable.] So basically, your approach to analysis depends on the assumptions you are making regarding the measurement of your variable(s). I hope this helps!
@@mikecrowson2462 Hello! Thank you for this! Do you have a citation indicating that it is OK to treat a Likert scale with 5 or more points as continuous?
Greetings
If we have categorical control variables, do dummy variables need to be created in that case?
Any predictors (including control variables) that are included in your model and are being treated as factor variables need to be re-coded for your analysis. FYI, if you have binary predictors, they technically don't need to be recoded; however, dummy coding renders the intercept more interpretable. Cheers!
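A small sketch of the point about binary predictors, using made-up numbers and a hypothetical helper `simple_ols` (none of this comes from the video's data file). It fits the same outcome twice, once with the binary predictor coded 1/2 and once dummy coded 0/1. The slope is identical in both fits; only the intercept changes. With 0/1 coding the intercept is the mean of the group coded 0, whereas with 1/2 coding it is the predicted value at x = 0, a code that no case actually has.

```python
from fractions import Fraction

def simple_ols(x, y):
    """Return (intercept, slope) for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = Fraction(sum(x), n)
    mean_y = Fraction(sum(y), n)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical outcome scores: first three cases in group A, last three in group B.
y = [10, 12, 14, 20, 22, 24]
coded_12 = [1, 1, 1, 2, 2, 2]  # original 1/2 coding
coded_01 = [0, 0, 0, 1, 1, 1]  # dummy coding with group A as the reference

intercept_12, slope_12 = simple_ols(coded_12, y)
intercept_01, slope_01 = simple_ols(coded_01, y)

mean_group_a = Fraction(sum(y[:3]), 3)
print("1/2 coding:", intercept_12, slope_12)
print("0/1 coding:", intercept_01, slope_01, "| mean of group A:", mean_group_a)
```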
@@mikecrowson2462 thanks sir
thank you very much sir. YOU ARE EXCELLENT TEACHER
Hi Manoranjan, that's very kind of you to say. Thank you for visiting my site! Best wishes!
Lifesaver!
Thanks! It was very useful.
What if the outcome and the predictor are in the same category, both measured on a 5-point scale, and the data are to be taken from girls only? Do we have to use ordinal regression, or can we go with linear regression using dummy coding? Can you please answer me?
Hi there. I'm not sure that I understand your question (as written). Can you be more specific about what you are trying to predict, and what your independent and dependent variables are (and the scales used)? thanks.
Very informative, but I am unable to download the data file. The link leads to a page error.
Hi there. The links should be working now. Thanks for letting me know about the issue. Be sure to see my more recent video on dummy coding here: ua-cam.com/video/PVYCpeRMvp8/v-deo.html Cheers!
Would it cause a problem if we just made a dummy variable for each category of the "edlevel" variable?
The number of dummy variables you create must be equal to the number of groups minus 1. If you have 4 groups, then you should have 3 dummy variables - not 4. If you add in all possible binary codings reflecting each level into your regression, you will end up with either a warning message or one of your dummy variables being 'kicked out' of the equation, as that last dummy variable would be collinear with the others. Cheers!
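The collinearity problem can be checked directly. The sketch below uses hypothetical 'edlevel' values and a hand-written `rank` helper (both are illustrative assumptions, not anything from SPSS). It builds two design matrices: one with an intercept plus a dummy for all 4 levels, and one with an intercept plus 3 dummies. In the first, the four dummy columns add up to the intercept column, so the matrix has 5 columns but only 4 independent ones, which is why software drops a variable or issues a warning.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank_count = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank_count, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank_count], m[pivot] = m[pivot], m[rank_count]
        for r in range(rank_count + 1, len(m)):
            factor = m[r][col] / m[rank_count][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[rank_count])]
        rank_count += 1
    return rank_count

levels = [1, 1, 2, 2, 3, 3, 4, 4]  # hypothetical 'edlevel' values

# Intercept plus one dummy per level (4 dummies): one column is redundant.
full = [[1] + [int(lv == k) for k in (1, 2, 3, 4)] for lv in levels]
# Intercept plus k - 1 = 3 dummies, with level 1 as the reference group.
reduced = [[1] + [int(lv == k) for k in (2, 3, 4)] for lv in levels]

# In the full coding, the dummies sum to the intercept column in every row.
dummies_sum_to_intercept = all(sum(row[1:]) == row[0] for row in full)

print("full:", len(full[0]), "columns, rank", rank(full))
print("reduced:", len(reduced[0]), "columns, rank", rank(reduced))
```

Which level is left out only changes the reference group that the slopes are compared against; it does not change the overall fit of the model.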
@@mikecrowson2462
Hi Mike,
Noted on your reply above this comment.
In that scenario, what if I did not create the dummy variables and instead ran the linear regression model using the single 'edlevel' variable?
Would it be wrong?
I ask because the regression model would not be able to process the ordinal 'edlevel' variable.
Please advise.
Thank you.
Very helpful. Thank you.
THANK YOU SO MUCH
Very helpful
well done.
I got a headache, to be honest. Why speak so fast? Still, quite a good video.