Multiple Regression - Dummy variables and interactions - example in Excel

  • Published 22 Oct 2024

COMMENTS • 142

  • @bigermie1
    @bigermie1 9 years ago +19

    Love it when people perform at such a high level that the layperson can understand. Great job!

  • @momcilomracajac2242
    @momcilomracajac2242 6 years ago

    YOU ARE THE GREATEST PERSON ALIVE!!! I HAVE BEEN SEARCHING FOR HELP ON DUMMY VARIABLES FOR WHAT TODAY WOULD BE THE FOURTH DAY FOR MY PROJECT AND EVEN MY PROFESSOR WAS OF NO HELP! I really appreciate this video...

    • @econ_drd
      @econ_drd 6 years ago

      I'm glad you found it helpful. ;)

  • @ফকিরতালিব
    @ফকিরতালিব 5 years ago +1

    Many thanks, Dr Delaney! It would be nice if you discussed two more related issues. 1) How to interpret the coefficients in a regression without the intercept term. 2) If we define the dummies differently, how do we interpret the coefficients? For example, consider the regression y = a1*D1 + a2*D2 + a3*D3 + a4*D4 + u, where the Ds are season dummies defined differently: D1 is 1 for all observations, D2 is 1 for all observations except Spring, D3 is 1 for all observations except Spring and Summer, and D4 is 1 only for observations in Winter. Thanks again for allowing questions and discussion.
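
The following is a minimal NumPy sketch of the season coding in this question. It assumes the four seasons are Spring, Summer, Fall and Winter, and the data (`seasons`, `y`) are made up. With this coding each season's fitted mean is a running sum: Spring = a1, Summer = a1 + a2, Fall = a1 + a2 + a3, Winter = a1 + a2 + a3 + a4. So a1 is the Spring mean and each later coefficient is the change from the previous season.

```python
import numpy as np

# Hypothetical data: two years of seasonal observations.
seasons = np.array(["Spring", "Summer", "Fall", "Winter"] * 2)
y = np.array([10.0, 14.0, 13.0, 7.0, 12.0, 16.0, 15.0, 9.0])

# Coding from the question (D1 is a column of ones, so there is no separate intercept).
d1 = np.ones(len(y))                                           # 1 for all observations
d2 = (seasons != "Spring").astype(float)                       # 1 except Spring
d3 = (~np.isin(seasons, ["Spring", "Summer"])).astype(float)   # 1 except Spring and Summer
d4 = (seasons == "Winter").astype(float)                       # 1 only in Winter
X = np.column_stack([d1, d2, d3, d4])

a, *_ = np.linalg.lstsq(X, y, rcond=None)
season_means = {s: y[seasons == s].mean() for s in ["Spring", "Summer", "Fall", "Winter"]}
print(a)             # a1 = Spring mean; a2..a4 = change from the previous season
print(season_means)
```

Here the season means are 11, 15, 14 and 8, so the coefficients come out as 11, 4, -1 and -6. That is the Spring mean, then Summer minus Spring, Fall minus Summer, and Winter minus Fall.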

  • @EGaya90
    @EGaya90 10 years ago +3

    I'm into 7:02 and I've been nodding since the video began...thank you man! :)

  • @ellarichardson7804
    @ellarichardson7804 8 years ago

    THANK YOU SO MUCH. I've been so confused with how to do this for ages and now I finally understand it! I couldn't be more grateful.

  • @sfs4708
    @sfs4708 10 years ago +3

    Thank you very much for this tutorial - so intuitive, and guides us directly to what's important. My question is: do you know a similarly intuitive way to run a regression with a dummy dependent variable? I'm trying to analyze survey responses, much of which is discrete data. Thank you!
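
For a 0/1 (yes/no) dependent variable, two common approaches are the linear probability model (plain OLS on the 0/1 outcome) and logistic regression. Below is a minimal NumPy sketch of both on made-up data. `fit_logit` is a hypothetical helper that maximizes the logit likelihood with Newton-Raphson; it is not a library function. Unlike OLS, the logistic fit keeps predicted probabilities strictly between 0 and 1.

```python
import numpy as np

# Hypothetical survey data: x = a numeric predictor, y = 1 for "yes", 0 for "no".
x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

# 1) Linear probability model: ordinary least squares on the 0/1 outcome.
b_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Logistic regression, fitted by Newton-Raphson on the log-likelihood.
def fit_logit(X, y, n_iter=25):
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))             # predicted probabilities
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix (negative Hessian)
        b = b + np.linalg.solve(hess, grad)
    return b

b_logit = fit_logit(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ b_logit))           # fitted probabilities
```

In practice a statistics package would also report standard errors, for example `statsmodels` `Logit` in Python or `glm(..., family = binomial)` in R. Survey answers with more than two ordered categories usually call for an ordered logit or probit instead.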

  • @urzolarl
    @urzolarl 4 years ago +1

    Thank you! This video was great, it is explained in a way that makes a lot of sense! The book for my business analytics class made it way more complicated!

  • @sobhitc
    @sobhitc 11 years ago

    You have no idea how much this video helped me to do my thesis work! Thank you so much!

  • @floranshadzhieva7829
    @floranshadzhieva7829 9 years ago +1

    This is the best explanation that I've found so far. Thank you so much!

  • @oul1735
    @oul1735 2 years ago +1

    Thank you very much for the video. It saved my dissertation.

  • @econ_drd
    @econ_drd 10 years ago +4

    Gayathri Ravichandran: it's not letting me reply directly to your comment, but the answer is yes, you should be able to check which independent variable contributes more.
    One option is to run 8 separate regressions with 1 independent variable in each and compare the R^2 values. Another is to run 8 separate regressions, each leaving out 1 of the variables, and check how much the R^2 drops.
    Finally, you can run the full regression and see which variable has the largest coefficient (in magnitude). This runs into the problem of different scales, so you may want to measure each variable in numbers of standard deviations from its mean.
    Caveat: all of this assumes you have enough observations to run these tests without running into overfitting problems. To safely run 8 regressions here (or 17, maybe), you'll want at least 17*8*15 = 2040 observations.
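
The last suggestion in this reply is to compare coefficients after putting every variable in standard-deviation units. A minimal NumPy sketch on simulated data follows; `x1`, `x2` and their scales are made up. The raw coefficient on `x2` looks much larger only because `x2` is measured in tiny units. After standardizing, `x1` turns out to matter more.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors measured on very different scales.
x1 = rng.normal(50, 10, n)
x2 = rng.normal(0, 0.01, n)
y = 0.2 * x1 + 100.0 * x2 + rng.normal(0, 1, n)

def standardize(v):
    """Express a variable in standard deviations from its mean."""
    return (v - v.mean()) / v.std()

# Raw coefficients depend on the units of each x...
X_raw = np.column_stack([np.ones(n), x1, x2])
b_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)

# ...standardized ("beta") coefficients are all in standard-deviation units.
# No intercept column is needed because every standardized variable has mean 0.
Z = np.column_stack([standardize(x1), standardize(x2)])
b_std, *_ = np.linalg.lstsq(Z, standardize(y), rcond=None)
```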

  • @alexablanc6647
    @alexablanc6647 3 years ago +1

    Hi Jason,
    I watched your YouTube video about using dummy variables with the regression tool in Excel. I studied math in college, so I was really excited about it.
    I’m trying to use it to forecast sales and I set it up where I had my Y values as previous sales and my X values as weeks 1 to 52, where it would be a 1 if it matched the sales week and 0 otherwise. I also included holidays like Easter x week, 4th of July x week etc.
    It gave me an error that I can only have 16 columns used in the X values, so I tried it with just 16 weeks and the p values were really big. I’m wondering if you know of another way I can do this to include the seasonality from the weeks and the impact of the holidays.
    Thanks so much!
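
One way around the 16-column limit on X values that the comment runs into is to build the dummy matrix in code. Below is a minimal NumPy sketch, assuming several years of weekly data. It uses one dummy per week except a reference week (week 1, to avoid perfect collinearity with the intercept), plus a dummy for a holiday that moves between weeks. `weekly_design` and the holiday positions are hypothetical. With only one year of data there would be 52 rows for 53 coefficients, so the model could not be estimated; several years are needed.

```python
import numpy as np

def weekly_design(week, holiday):
    """Hypothetical helper: intercept + 51 week dummies + 1 holiday dummy.

    week: integers 1..52; holiday: 0/1 flags. Week 1 is the omitted reference
    week, so the design never holds all 52 week dummies plus an intercept.
    """
    week = np.asarray(week)
    n = len(week)
    week_dummies = np.zeros((n, 51))
    for j in range(2, 53):                        # weeks 2..52 get their own column
        week_dummies[:, j - 2] = (week == j)
    return np.column_stack([np.ones(n), week_dummies, np.asarray(holiday, dtype=float)])

# Three hypothetical years of weekly data -> 156 rows.
week = np.tile(np.arange(1, 53), 3)
holiday = np.zeros(len(week))
holiday[[13, 67, 118]] = 1.0   # a moving holiday (e.g. Easter) in weeks 14, 16 and 15 of years 1-3

X = weekly_design(week, holiday)
print(X.shape, np.linalg.matrix_rank(X))   # full column rank: every coefficient is identified
# With a real sales series, the coefficients would come from np.linalg.lstsq(X, sales, rcond=None).
```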

  • @pascalejacquelinepetit5131
    @pascalejacquelinepetit5131 7 years ago

    Great explanation for new users and to refresh. Much appreciated!

  • @araratghazarian2354
    @araratghazarian2354 8 years ago

    I was working on my thesis and these materials were precisely helpful. Thanks!

  • @econ_drd
    @econ_drd 11 years ago +1

    Hi Fang, it's definitely a good idea to run it, and then you can use an F test for a subset of variables to see which model is better. If you search YouTube for "F test for subset", you'll find the video that outlines the process.
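
Below is a minimal NumPy sketch of the F test for a subset of coefficients. It fits the full and the restricted model and compares their sums of squared residuals: F = ((SSR_r - SSR_u)/q) / (SSR_u/(n - k)), where q is the number of dropped coefficients and k counts every column of the full model, including the intercept. `ssr` and `subset_f_test` are hypothetical helper names, the data are simulated, and the restricted model is assumed to be nested in the full one.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

def subset_f_test(y, X_full, X_restricted):
    """F statistic for H0: the coefficients dropped from the full model are all zero."""
    n, k_full = X_full.shape
    q = k_full - X_restricted.shape[1]         # number of restrictions
    df_resid = n - k_full                      # residual degrees of freedom of the full model
    ssr_u, ssr_r = ssr(X_full, y), ssr(X_restricted, y)
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
    return f_stat, q, df_resid

# Simulated example: do fem and educ*fem add anything to a model with educ alone?
rng = np.random.default_rng(0)
n = 200
educ = rng.uniform(8, 20, n)
fem = (rng.uniform(size=n) < 0.5).astype(float)
wage = 10 + 1.0 * educ + 2.0 * fem + 0.8 * educ * fem + rng.normal(0, 1, n)

X_full = np.column_stack([np.ones(n), educ, fem, educ * fem])
X_restricted = np.column_stack([np.ones(n), educ])
f_stat, q, df_resid = subset_f_test(wage, X_full, X_restricted)
```

If SciPy is available, `scipy.stats.f.sf(f_stat, q, df_resid)` gives the corresponding p-value.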

  • @Andy1311100
    @Andy1311100 10 years ago

    Thx! You saved me a lot of time re-studying statistics.

  • @squirrellover69ify
    @squirrellover69ify 5 years ago

    I did not understand this at all, but within the first ten minutes it made so much sense. Makes me want to go play around with an actual data set. Curious if there's anyone with videos on how to do this in R?

  • @saeedrabbanifar7458
    @saeedrabbanifar7458 7 years ago

    So useful. May need to watch more than once to master, but it's worth it.

  • @TDYoung07
    @TDYoung07 5 years ago

    Great breakdown of multiple regression and helped me greatly with educated forecasting.

  • @dermarcopetermp
    @dermarcopetermp 10 years ago

    Your explanation is awesome.
    It helps me understand interaction a lot
    Thank You!

  • @user-hu7ov6fi9y
    @user-hu7ov6fi9y 5 years ago +1

    Thanks for sharing this brilliant video online. If I want to calculate the coefficient for Firefox as an independent variable, which browser should be excluded as the dummy variable? Many thanks
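
To get a coefficient for Firefox, drop a different browser instead; every coefficient is then measured relative to that new reference category. Below is a minimal NumPy sketch with made-up data. `dummy_design` is a hypothetical helper, and Chrome is chosen as the reference only for illustration.

```python
import numpy as np

# Hypothetical data: browser used and a numeric outcome.
browser = np.array(["Firefox", "Chrome", "IE", "Safari"] * 3)
y = np.array([5.0, 7.0, 4.0, 6.0, 6.0, 8.0, 5.0, 7.0, 4.0, 6.0, 3.0, 5.0])

def dummy_design(labels, reference):
    """Intercept plus one 0/1 column per category except `reference`."""
    cats = [c for c in sorted(set(labels)) if c != reference]
    cols = [np.ones(len(labels))] + [(labels == c).astype(float) for c in cats]
    return np.column_stack(cols), cats

X, cats = dummy_design(browser, reference="Chrome")
b, *_ = np.linalg.lstsq(X, y, rcond=None)
coef = dict(zip(["intercept"] + cats, b))
```

With Chrome omitted, the intercept (7) is the Chrome mean. The Firefox coefficient (-2) is the Firefox mean minus the Chrome mean.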

  • @francescoguerra8917
    @francescoguerra8917 6 years ago +1

    Hi Dr D.
    Is it possible to run a multiple regression if all my variables are categorical (both my independent variables and my dependent variable are categorical, two-level variables)?

  • @noodzie89
    @noodzie89 11 years ago

    Thank you so much for this video! I was having quite a bit of trouble grasping this concept but now I get it! Great explanation!

  • @getitdone913
    @getitdone913 2 years ago

    Wow, thank you so much!! I learned so much!

  • @anskrenes
    @anskrenes 9 years ago

    Hi Dr. Delaney, thanks for the video! Would these rules apply for moderation? For instance, if the predictor had many dummy variables, the outcome didn't, and the moderator didn't, would it work the same way? Thank you!

  • @eagles51593
    @eagles51593 9 years ago +1

    thank you so much!! best tutorial I've seen by far

  • @Andy1311100
    @Andy1311100 10 years ago

    I have a question about general (non-linear) multiple regression. I understand general MR just needs to change x and y into some functions. But my question is: do I need to change the cross dummy into the same functions as well? Take your data as an example, if I use 1/educ as the new x, for the educ*fem dummy, should it be 1/educ*1 or still educ*1? Thanks.
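
If 1/educ replaces educ as the regressor, the interaction should use the same transformed variable, (1/educ) × fem, so that women get their own slope on 1/educ. Below is a minimal NumPy sketch of the design matrix, with made-up values for `educ` and `fem`.

```python
import numpy as np

# Hypothetical data: years of education and a female dummy.
educ = np.array([10.0, 12.0, 16.0, 12.0, 14.0, 20.0])
fem = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

inv_educ = 1.0 / educ          # the transformed regressor replaces educ everywhere
X = np.column_stack([
    np.ones_like(educ),        # intercept (men)
    inv_educ,                  # slope on 1/educ for men (fem = 0)
    fem,                       # intercept shift for women
    inv_educ * fem,            # extra slope on 1/educ for women: (1/educ)*fem, not educ*fem
])
```

Keeping educ × fem while the main term is 1/educ would give women an added slope on educ itself, rather than an adjustment to the 1/educ slope.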

  • @yvesliao6004
    @yvesliao6004 5 years ago

    Thank you very much! It's really helpful! But I wonder if we can get the coefficients without a dependent variable, only with the two sets of independent dummies in the equation. And how do we apply constraints on the equation? For example, we want to examine how much of y results from factor b and how much of it results from factor c. We have a time series of y and the equation Y = a + b1*d1 + b2*d2 + ... + b50*d50 + c1*e1 + c2*e2 + ... + c34*e34, where d and e are the dummy variables. The condition is that the weighted sum of b1~b50 = 0 and the weighted sum of c1~c34 = 0. In this case, how can we get the series b1~b50 and c1~c34?
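
A y series is still needed to estimate anything. Given one, constraints such as "the weighted sum of b1~b50 equals 0" can be imposed with constrained least squares, solved through the Lagrangian (KKT) system [X'X C'; C 0][b; λ] = [X'y; d]. Below is a minimal NumPy sketch. `constrained_lstsq` is a hypothetical helper, and the toy example uses a single set of three group dummies, weighted by group size, rather than the full 50 + 34 dummy model from the question. With two dummy sets, C would simply have two rows, one per constraint.

```python
import numpy as np

def constrained_lstsq(X, y, C, d):
    """Minimize ||y - X b||^2 subject to C b = d by solving the KKT system."""
    k, m = X.shape[1], C.shape[0]
    kkt = np.block([[X.T @ X, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([X.T @ y, d])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:k]                        # drop the Lagrange multipliers

# Toy data: 6 observations in 3 groups (hypothetical).
group = np.array([0, 0, 1, 1, 1, 2])
y = np.array([1.0, 3.0, 4.0, 5.0, 6.0, 10.0])
dummies = np.column_stack([(group == g).astype(float) for g in range(3)])
X = np.column_stack([np.ones(len(y)), dummies])   # intercept + ALL three dummies

# Constraint: sum over groups of n_g * b_g = 0 (the intercept gets weight 0).
n_g = dummies.sum(axis=0)
C = np.concatenate([[0.0], n_g]).reshape(1, -1)
b = constrained_lstsq(X, y, C, np.array([0.0]))
```

With group-size weights, the intercept becomes the overall mean of y (29/6 here) and each group coefficient becomes that group's mean minus the overall mean.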

  • @Gamertime5689
    @Gamertime5689 10 years ago

    Dr. D,
    How can I create a dummy variable model using 1 and -1? For example, I want to run a bunch of observations to estimate how the market feels about each NFL team coming into every season. I want to use the Vegas point spread as the expected value of y. I want to assign 1 to the home team and -1 to the away team: bonus for home, penalty for away. I'm essentially going to run a bunch of "fake" observations with this model to figure out a rough point-differential score for each team prior to the first game. Can you help me?
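
Below is a minimal NumPy sketch of this +1/-1 setup as a least-squares rating model. Each game is one row with a 1 in a home-advantage column, +1 in the home team's column and -1 in the away team's column, and the point spread (home margin) is y. The teams, games and spreads are hypothetical. Every row's +1 and -1 cancel, so the ratings are only identified up to a constant; an extra row therefore forces them to sum to zero.

```python
import numpy as np

teams = ["A", "B", "C"]
# Hypothetical games: (home team, away team, expected home margin from the spread).
games = [("A", "B", 6.0), ("B", "C", 3.0), ("C", "A", -3.0), ("B", "A", -2.0)]

idx = {t: i for i, t in enumerate(teams)}
rows, margins = [], []
for home, away, margin in games:
    row = np.zeros(1 + len(teams))
    row[0] = 1.0                  # home-field advantage term
    row[1 + idx[home]] = 1.0      # +1 for the home team
    row[1 + idx[away]] = -1.0     # -1 for the away team
    rows.append(row)
    margins.append(margin)

# Extra row: ratings sum to zero, which pins down the arbitrary constant.
constraint = np.concatenate([[0.0], np.ones(len(teams))])
X = np.vstack(rows + [constraint])
y = np.array(margins + [0.0])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
home_adv, ratings = b[0], b[1:]
```

Ratings are in points relative to an average team, so the model's expected margin for a game is home_adv + rating_home - rating_away.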

  • @kamalbasnet8793
    @kamalbasnet8793 4 years ago

    In a regression model given as,
    $\mathit{logpgp95}_i = \gamma_0 + \gamma_1\,\mathit{avexpr}_i + \gamma_2\,\mathit{lat\_abst}_i + \gamma_3\,\mathit{africa}_i + \gamma_4\,\mathit{asia}_i + \gamma_5\,\mathit{other}_i + \nu_i$
    where logpgp95 is the log of country i's GDP per capita in 1995, africa = 1 if country i is in Africa, asia = 1 if country i is in Asia, and other = 1
    if country i is not in Asia, Africa, or the Americas.
    The regression coefficient for dummy africa is -0.9163864. How to interpret this coefficient? If I interpret "As other factors being equal, African countries have 91.6% less GDP per capita than non-African countries", is it the right interpretation?
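
The dependent variable is in logs, so a dummy coefficient is not directly a percentage. The exact percentage difference is 100 × (e^γ - 1). The comparison group is also the omitted category, countries in the Americas, not "non-African countries", with avexpr and lat_abst held fixed. Below is a minimal Python sketch using the coefficient quoted in the question.

```python
import math

gamma_africa = -0.9163864                            # coefficient from the question
exact_pct = 100.0 * (math.exp(gamma_africa) - 1.0)   # exact % difference vs. the omitted group
approx_pct = 100.0 * gamma_africa                    # "coefficient x 100" shortcut, only good for small values
print(f"exact: {exact_pct:.1f}%, approximation: {approx_pct:.1f}%")
```

So, other things equal, African countries' 1995 GDP per capita is estimated to be about 60% lower than that of countries in the Americas. The 91.6% figure comes from applying the small-coefficient approximation to a coefficient that is not small.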

  • @econ_drd
    @econ_drd 11 years ago

    You can use an MLE method to estimate it directly, or nonlinear least squares (Stata has an "nl" command for just such a purpose), but for Cobb-Douglas I'm not sure why you'd want to. If you have Q = A * K^a * L^b and you take logs, you get ln(Q) = ln(A) + a*ln(K) + b*ln(L), and you can just regress that in a straightforward fashion and get estimates for your production shares... unless you know the error distribution is wrong... but the ease of this is the whole point of using Cobb-Douglas.
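
Below is a minimal NumPy sketch of the log-linear route described in this reply. It simulates output from a Cobb-Douglas function with a multiplicative error (the values of A, a, b and the data are made up), takes logs, and runs an ordinary linear regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
K = rng.uniform(1, 100, n)          # hypothetical capital
L = rng.uniform(1, 100, n)          # hypothetical labour
A, a, b = 2.0, 0.3, 0.6             # "true" values used only to simulate the data
Q = A * K**a * L**b * np.exp(rng.normal(0, 0.05, n))   # multiplicative error term

# ln(Q) = ln(A) + a*ln(K) + b*ln(L) + error is linear in the parameters.
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
A_hat, a_hat, b_hat = np.exp(coef[0]), coef[1], coef[2]
```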

  • @limfangwen1102
    @limfangwen1102 10 years ago

    Hi Dr. D. For your example you explained interactions with a quantitative and a dummy variable, so what I understand is that the reference (Firefox or Male) is always omitted. Does this apply to interactions of 2 dummy variables? For instance, I would like to investigate if there is an interaction between Gender and Browser, so for my interactions, will Firefox and male be omitted?
    Regards,
    Fang

    • @econ_drd
      @econ_drd 10 years ago +1

      Dummy variables are just a way to account for every possible combination, to allow for a full complement of different intercepts, for example. In the case you mention, Gender (G) and Browser (B), if you want full interactions, you can see that you could have:
      G B = 0 0 (Male, Firefox)
      G B = 0 1 (Male, Chrome)
      G B = 1 0 (Female, Firefox)
      G B = 1 1 (Female, Chrome)
      If you had a third browser, say IE, you'd need to add another dummy, because there are now 6 combinations and 2 binary variables can only give you 4. Let's say we had B1 = 1 if Chrome, B2 = 1 if IE:
      G B1 B2 = 0 0 0 (Male, Firefox)
      G B1 B2 = 0 1 0 (M, Chrome)
      G B1 B2 = 0 0 1 (M , IE)
      G B1 B2 = 1 0 0 (Female , Firefox)
      G B1 B2 = 1 1 0 (F , Chrome)
      G B1 B2 = 1 0 1 (F , IE)
      You can see that we never use 011 or 111, because that would imply Chrome AND IE, which are mutually exclusive by assumption. In principle, though, you should let your intuition help you--you just want a different intercept (or slope term depending on your application) for each case.
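
The reply's coding can be checked with a fully interacted model: intercept, G, B1, B2, G×B1 and G×B2, with G = 1 for female, B1 = 1 for Chrome and B2 = 1 for IE as above. Below is a minimal NumPy sketch with made-up data, two observations per gender/browser cell. With six cells and six coefficients the fitted values equal the cell means. Each interaction coefficient measures how much a browser's gap relative to Firefox differs between women and men.

```python
import numpy as np

# Hypothetical data; the six (G, B1, B2) cells are listed twice.
G  = np.tile([0, 0, 0, 1, 1, 1], 2).astype(float)   # 1 = female
B1 = np.tile([0, 1, 0, 0, 1, 0], 2).astype(float)   # 1 = Chrome
B2 = np.tile([0, 0, 1, 0, 0, 1], 2).astype(float)   # 1 = IE (Firefox is the reference)
y = np.array([10.0, 12.0, 9.0, 11.0, 15.0, 8.0, 12.0, 14.0, 11.0, 13.0, 17.0, 10.0])

X = np.column_stack([np.ones_like(G), G, B1, B2, G * B1, G * B2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# e.g. the (Female, Chrome) cell mean = b[0] + b[1] + b[2] + b[4]
```

Here the cell means are 11, 13 and 10 for men (Firefox, Chrome, IE) and 12, 16 and 9 for women. That gives coefficients of 11, 1, 2, -1, 2 and -2.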

  • @emerekek
    @emerekek 5 years ago

    Thanks a lot Dr D. Very insightful

  • @srikarbeechu
    @srikarbeechu 8 years ago

    Hi Jason, I have a problem at hand: I don't know the exact functional form of the model, but I must find it using the dataset I have. I have three inputs and one output, and I must find the relationship between the input variables and the output. Could you give me some short guidance on this?

  • @solog10
    @solog10 11 years ago

    Excellent video. Very helpful.

  • @eddiele644
    @eddiele644 4 years ago

    So when do we actually interact our variables? Is there a way to see if it is necessary or do we just do it and then see if the coefficient on the interaction term is statistically significant?

  • @matthewthomas4620
    @matthewthomas4620 10 years ago

    Thank you so much for this video. I have not seen anything else on the web that concisely explains the underlying math, concept, and real world how to.
    Is it possible to do this type of analysis with grouped data? How would you 'weigh' the groups?
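
Suppose the grouped data are group means with known group sizes, and x is the same for everyone in a group. One common approach is then weighted least squares with the group sizes as weights. Below is a minimal NumPy sketch with made-up data; `wls` is a hypothetical helper. The WLS point estimates on the group means match OLS on the individual observations. The standard errors from the grouped fit may need adjusting, because the within-group variation is lost.

```python
import numpy as np

# Hypothetical individual-level data: 3 groups, x constant within each group.
x_ind = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3], dtype=float)
y_ind = np.array([2, 3, 4, 5, 7, 6, 8, 9, 9], dtype=float)

# The same information as grouped data: group x, group size, group mean of y.
x_grp = np.array([1.0, 2.0, 3.0])
n_grp = np.array([3.0, 2.0, 4.0])
y_grp = np.array([y_ind[x_ind == v].mean() for v in x_grp])

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - X_i b)^2."""
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

b_wls = wls(np.column_stack([np.ones(3), x_grp]), y_grp, n_grp)
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(9), x_ind]), y_ind, rcond=None)
```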

  • @Tattenlieve
    @Tattenlieve 10 years ago +1

    Brilliant video - explained really well !! You mentioned in passing that another one of your series explained some of the theory behind dummy variables. I'm interested in how contrasts can be specified, say whether there is a significant difference between each of the browsers with each other and not just with reference to Firefox as per your example? Thanks again

  • @121mohitkumar
    @121mohitkumar 3 years ago

    Thank you mate! Really helpful video

  • @nabinabi7007
    @nabinabi7007 10 years ago

    Thank you very much. I think that we can center only quantitative variables, not dummy variables. Do you have other videos about RIDGE regression or PARTIAL LEAST SQUARES regression?

  • @raghavmodi7709
    @raghavmodi7709 3 years ago

    Sir,
    In the browser case, if we introduce a fourth dummy variable for Firefox (which is against the theory), what difference will it make?
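
Adding a Firefox dummy alongside the intercept creates perfect multicollinearity, the "dummy variable trap". The four browser dummies always add up to the column of ones, so the regression has no unique solution. This is why one category (Firefox in the video) is left out. Below is a minimal NumPy sketch with made-up data, showing that the extra column does not raise the rank of the design matrix.

```python
import numpy as np

# Hypothetical browser data.
browser = np.array(["Firefox", "Chrome", "IE", "Safari", "Chrome", "Firefox", "IE", "Safari"])
names = ["Firefox", "Chrome", "IE", "Safari"]
intercept = np.ones(len(browser))
dummies = {name: (browser == name).astype(float) for name in names}

# Firefox omitted (as in the video): intercept + 3 dummies.
X_ok = np.column_stack([intercept, dummies["Chrome"], dummies["IE"], dummies["Safari"]])
# Firefox added as a fourth dummy: the four dummies sum to the intercept column.
X_trap = np.column_stack([intercept] + [dummies[name] for name in names])

rank_ok = np.linalg.matrix_rank(X_ok)       # equal to its 4 columns
rank_trap = np.linalg.matrix_rank(X_trap)   # still 4, although there are 5 columns
```

Dropping the intercept and keeping all four dummies is the other valid option; each dummy's coefficient is then that browser's own intercept.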

  • @chaopazu4951
    @chaopazu4951 9 years ago

    Hi Dr D, I am wondering whether I can look at the interaction between 2 dummy variables. Thanks.

  • @avinashpoojari9372
    @avinashpoojari9372 8 years ago

    Hello, there are 3 separate dummy variable columns for Internet Explorer, Safari and Chrome...
    Is there any way to put these 3 in a single column with discrete values like 0, 1, 2? Please help me figure this out.

  • @rachaelmorrison5584
    @rachaelmorrison5584 7 years ago

    Hi there, when I have a dummy variable, a continuous variable and interaction term, does the coefficient of the dummy variable still indicate the results of when it equals 1 (regardless of the continuous variable) unlike the coefficient for the continuous variable, which only represents the values for the continuous variable when dummy =0?
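
With an interaction term, the dummy's coefficient is the female-minus-male gap only where the continuous variable equals 0. At other values the gap is b1 + b3 × educ. Likewise, b2 is the education slope for the reference group (fem = 0), and b2 + b3 is the slope for women. Below is a minimal Python sketch with made-up coefficients; `predicted` and `female_gap` are hypothetical helpers.

```python
import math

# Hypothetical fitted model: y = b0 + b1*fem + b2*educ + b3*educ*fem
b0, b1, b2, b3 = 5.0, -2.0, 1.5, 0.4

def predicted(educ, fem):
    return b0 + b1 * fem + b2 * educ + b3 * educ * fem

def female_gap(educ):
    """Predicted female minus male outcome at a given education level (= b1 + b3*educ)."""
    return predicted(educ, 1) - predicted(educ, 0)

for years in (0, 12, 16):
    print(years, round(female_gap(years), 2))
```

This is also why a main effect can lose significance once interactions are added: it now measures the effect at educ = 0, which may lie outside the data.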

  • @mukalumasaki4558
    @mukalumasaki4558 9 years ago

    Thanks a lot for this video! Very clear and explicit. Great job. :)

  • @twolittlefish
    @twolittlefish 11 years ago

    Big thanks from Switzerland!

  • @danielweaver4372
    @danielweaver4372 10 years ago

    Can your dependent variable be categorical? For example, my hypothesis is that males are more likely to use Chrome than females (a relationship between gender and browser), both coded as categorical variables.

  • @imamsuhadisuhadi8195
    @imamsuhadisuhadi8195 2 years ago

    Thank you very much for this video .. Very clear

  • @daniaakbar5421
    @daniaakbar5421 6 years ago

    Thanks for the informative video, it really helped me. I have some questions. I want to fit a quadratic model with one categorical and one continuous variable, including an interaction term and a squared term, but Minitab did not take the squared term of the categorical variable. Can you please explain why? My second question: I want to know the theory behind model fitting with categorical variables, along with the procedure to estimate the regression coefficients. Where can I find material on this? Thanks in advance
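
A 0/1 variable squared equals itself (0² = 0, 1² = 1), so its "squared term" is an exact copy of an existing column. That is presumably why Minitab would not include it. A quadratic model that lets the curve differ by group uses x², plus interactions of the dummy with x and x², instead. Below is a minimal NumPy sketch with made-up data.

```python
import numpy as np

x = np.arange(1.0, 9.0)                    # hypothetical continuous predictor: 1..8
d = np.array([0.0, 1.0] * 4)               # hypothetical 0/1 categorical predictor

# Quadratic model with a separate intercept, slope and curvature for each group.
X = np.column_stack([np.ones_like(x), x, x ** 2, d, x * d, x ** 2 * d])

# Appending d**2 adds nothing: it is identical to d, so the rank does not change.
X_with_d2 = np.column_stack([X, d ** 2])
```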

  • @outofthebox5226
    @outofthebox5226 10 years ago

    Sir, I have one dependent variable and eight independent variables. Can I use regression to see which of the independent variables contributes most to the dependent variable?

  • @nialldevine1156
    @nialldevine1156 9 years ago

    ridiculously helpful video, thank you

  • @ajaxvi
    @ajaxvi 4 years ago +1

    Brilliant!

  • @aminurabiuladodo4127
    @aminurabiuladodo4127 11 years ago

    Thanks for the tutorial
    Please, I want to do a regression analysis between waiting time in a restaurant and profit made, to find out if an automated system can reduce the waiting time. What data do I need to collect?

    • @econ_drd
      @econ_drd 10 years ago

      You would need: Waiting time and whether the associated waiting time was using the automated system. You don't even need to use regression if it's just System A v. System B. You can make fewer assumptions and use a 2-sample t-test, or MANY fewer assumptions and use something like a Mann-Whitney (Wilcoxon) test if all you care about is the average, or a two-sample Kolmogorov-Smirnov test if you want the full distributional test.
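
Below is a minimal NumPy sketch of the two-sample comparison suggested in this reply. It uses Welch's t statistic, which does not assume equal variances, on made-up waiting times; `welch_t` is a hypothetical helper. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)`, `scipy.stats.mannwhitneyu(a, b)` and `scipy.stats.ks_2samp(a, b)` run the t-test, Mann-Whitney and Kolmogorov-Smirnov tests directly and also return p-values.

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)   # squared standard errors
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical waiting times in minutes.
manual = [12.0, 15.0, 11.0, 14.0, 13.0]
automated = [9.0, 10.0, 8.0, 11.0, 7.0]
t, df = welch_t(manual, automated)
```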

  • @lamaung
    @lamaung 11 years ago

    Thanks for your wonderful explanation !!!

  • @TritoneTelephone
    @TritoneTelephone 10 years ago

    Shouldn't the regression equation include the original educ AND browser variables when testing interactions?

  • @econ_drd
    @econ_drd 11 years ago +1

    Hi Katie,
    For qualitative variables, you want to use a dummy variable. I have several videos on the topic. I hope that helps!
    Best regards,
    Dr. D.

  • @CraftingDepths
    @CraftingDepths 8 years ago

    At 26:55, how did you/he insert the columns so quickly? What is the shortcut for that? Thx!

  • @aaf882010
    @aaf882010 10 years ago

    Hi, it's very helpful :). I want to do a regression analysis of whether home prices are affected by the bank interest rate. I also have some other variables to include, such as population and wages, but I want to check the relation between interest and prices. How can I do that? Thanks a lot

  • @econ_drd
    @econ_drd 11 years ago +3

    Thanks! I'm glad it was helpful!

  • @maimunahjohari9229
    @maimunahjohari9229 9 years ago

    Thanks a lot Dr Delaney, really helpful!

  • @arnabdada07
    @arnabdada07 9 years ago

    Hi Jason, here years of education is an independent variable right? and if that is the case, then how can we put it in the X range while doing the regression?

    • @econ_drd
      @econ_drd 9 years ago

      Independent variables all go on the right hand side (i.e. are x's). Dependent variables go on the left (i.e. are y's). If you're concerned about endogeneity (probably not a huge issue in this application), you would want to take a different modeling approach.

  • @dougmisenheimer9289
    @dougmisenheimer9289 1 year ago

    Because I’m in a Managerial Decision Making class and we have some problems to solve. I need some help! It’s a combo of statistics and business calc.

  • @nletizio
    @nletizio 9 years ago

    Excellent video, thanks!

  • @RanaSharif
    @RanaSharif 11 years ago

    Can interactions be between two dummy variables, like female × has a mobile in the example? And how can we write the equation?

  • @4620extensa
    @4620extensa 6 years ago

    Could you tell me how to find the correlation between 1500 categorical variables after dummy encoding?

  • @kathleentolentino5077
    @kathleentolentino5077 8 years ago

    A really great tool! Thank you!

  • @md.sakilmahmud4751
    @md.sakilmahmud4751 8 years ago

    It's a very good video. Thanks for the help.

  • @Crimau12000
    @Crimau12000 9 years ago

    Thank you very much. It has been very helpful

  • @tag_of_frank
    @tag_of_frank 6 years ago

    So will this work if x1 is squared, or if we take e to the power of a constant times x1, i.e. e^(A*x1)?

  • @andrewmiller9441
    @andrewmiller9441 8 years ago

    Hi, I am interested in learning how to graph a linear regression with 3 variables, e.g. weight as a function of height and thickness.

  • @ASOT666
    @ASOT666 8 years ago

    Great video !

  • @ndubuisimachebe764
    @ndubuisimachebe764 9 years ago

    Thanks. The video has been very helpful!!

  • @kathyyue370
    @kathyyue370 11 years ago

    Thanks for the video! My question is - for the later variables such as Male Female, if you are analyzing just gender, why do you still include the previous variables in the regression table? Does that make a difference? I think you said "holding all else constant"?

  • @adityavedam1174
    @adityavedam1174 6 years ago

    Jason, please share the data set used in the video if possible

  • @dutchboybmx
    @dutchboybmx 6 years ago

    Thank you so much!

  • @jrippee05
    @jrippee05 7 years ago

    Good video.
    You should have posted the data set so we could follow along.
    Thanks.

  • @limfangwen1102
    @limfangwen1102 11 years ago

    Hi Dr D,
    This is a great video! Can I ask, for the last example of everything, if we found that some variables are statistically significant, and others are not, is it a good idea to run another regression analysis of only those significant variables?
    Kind Regards,
    Fang

  • @dtumpal6671
    @dtumpal6671 11 years ago

    After putting the interactions in, I found that one of the main effects became not significant (it was previously significant). How do we interpret this? Thanks in advance.

  • @oluwatoba11
    @oluwatoba11 11 years ago

    Hey Jason,
    This is excellent and really helpful. Thanks. I'd also like to ask for more: could you please do a video on exponential regression with multiple variables, e.g. the Cobb-Douglas function? I am aware you could do a log-linear regression, but is there a way of doing this directly?

  • @NamTran-jd6lp
    @NamTran-jd6lp 2 years ago

    Lifesaver!!!

  • @YellowDog00
    @YellowDog00 7 years ago

    Hi Jason,
    First of all thank you for your great video.
    I have a question as to why we need an omitted variable? In your video, you didn't develop a dummy variable for Firefox. May I ask why?

    • @AmanyFaroun
      @AmanyFaroun 7 years ago

      I need an answer to your question too. How would I know the effect of FireFox?

  • @maimunahjohari9229
    @maimunahjohari9229 9 years ago

    Hi Dr Delaney,
    Can we have interaction like this for example, Educ x IE X FEM which means Education across Internet explorer browser and across female? Thanks.

    • @econ_drd
      @econ_drd 9 years ago +1

      Yes, and then you need to compare the value of that estimated coefficient to that of the particular comparison group you care about.

  • @5522Katie
    @5522Katie 11 years ago

    Thank-you for this video, it really helped me in my project. I have a question though: how would you do this analysis if y were qualitative (i.e., y is either yes or no?)

  • @remyrulez
    @remyrulez 10 years ago +1

    How did you do that at 10:05? Filling them down so quickly?

    • @ফকিরতালিব
      @ফকিরতালিব 5 years ago

      Suppose your cursor is at E3. Now do this step by step: 1) Ctrl+Down arrow to get to the last row of data; 2) press the Right arrow once to move to column F; 3) Shift+Ctrl+Up arrow to select all the cells above, up to cell F2, which has the formula you want to fill down; 4) Ctrl+D. All of this takes very little time once you are efficient at it.

  • @lory198
    @lory198 8 years ago

    Hey Jason, where can I get your data, or how can I automatically create data like that?

  • @sanjayserene
    @sanjayserene 11 years ago

    But can you please tell me what happens if there are two (non-categorical) independent variables and 2 dummy variables? In the example above there is only one quantitative independent variable, education, but what happens if there is another one?

  • @waleolusi8875
    @waleolusi8875 11 years ago

    This is really helpful, thanks a lot. Do you have other videos on working with EViews and stuff like that? Thanks again

  • @martaarteaga5950
    @martaarteaga5950 10 years ago

    Great video it is very helpful!

  • @naderbensheikhbrahim2195
    @naderbensheikhbrahim2195 8 years ago

    Would you help me with a regression model interpretation?
    Where can I send it to you for your review?

  • @natyelisassis
    @natyelisassis 11 years ago

    WOW! GREAT video, and you have my respect Jason! You are GOOD! =)

  • @manhoor1200
    @manhoor1200 5 years ago

    thank you for this video

  • @nabinabi7007
    @nabinabi7007 10 years ago

    Thanks for the tutorial, but what about multicollinearity? You have variables in interactions, so maybe the VIF is more than 10.

    • @econ_drd
      @econ_drd 10 years ago

      The short answer is get more data. :D
      Yeah, including interaction terms can definitely lead to higher VIFs, but it's generally not something to be concerned about with interactions of dummy variables. If you are concerned, you can recenter the variable, particularly if it's a quantitative variable you're interacting with a dummy. But dealing with collinearity is more concerning with other, ostensibly unrelated variables than with interactions, in which the relationship is explicitly stated.
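
Below is a minimal NumPy sketch of the recentering idea. It computes VIFs for educ, fem and educ×fem before and after subtracting the mean of educ. The data are simulated, and `vif` is a hypothetical helper implementing VIF_j = 1/(1 - R²_j), where R²_j comes from regressing column j on the other columns plus an intercept. Centering leaves the fit of the interaction model unchanged, since the columns span the same space. It only changes what the main-effect coefficients mean: effects at average education instead of at zero.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X excludes the intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1.0 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
n = 300
educ = rng.uniform(8, 20, n)                       # hypothetical years of education
fem = (rng.uniform(size=n) < 0.5).astype(float)    # hypothetical female dummy

raw = np.column_stack([educ, fem, educ * fem])
educ_c = educ - educ.mean()                        # recentered quantitative variable
centered = np.column_stack([educ_c, fem, educ_c * fem])

vif_raw, vif_centered = vif(raw), vif(centered)    # fem and the interaction drop sharply
```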

  • @reidclanton4081
    @reidclanton4081 4 years ago +1

    17:37 is where interactions start, for those who want to know

  • @econ_drd
    @econ_drd 11 years ago

    Hi TheMasterkyle79. Fair enough. I recommend the video on interpreting models which may help clear things up. Good luck and let me know if I can help at all!

  • @luokitty9016
    @luokitty9016 11 years ago

    Hey! I am a freshman who just studied regression and SPSS for my dissertation. I am doing questionnaire research with 45 items. I need to combine 2-4 items to get one independent or dependent variable, and if possible, combine 2-4 independent variables to get a second-order independent variable and find out its relationship with another second-order dependent variable. Is that possible to achieve with SPSS?

  • @abbas8646
    @abbas8646 10 років тому

    Thank you that is a great video

  • @somjitbanerjee3003
    @somjitbanerjee3003 3 years ago

    Thank you!

  • @kirstinegan2572
    @kirstinegan2572 11 years ago +1

    Hey Jason,
    Thank you for this video, it was very helpful! I do have a question though... how would you run a regression model if your dependent variable was also a categorical variable?
    Thanks!!

  • @econ_drd
    @econ_drd 11 years ago

    Hi Kirstin,
    You would want to use a dummy variable. If you search YouTube for "dummy variable" you should find a few videos (some of which are mine). Good luck!
    --Dr. D.

  • @lionelpipper1992
    @lionelpipper1992 11 years ago

    Thank you, very helpful!