Polynomial Regression in R | R Tutorial 5.12 | MarinStatsLectures

  • Published 12 Dec 2024

COMMENTS • 100

  • @marinstatlectures
    @marinstatlectures  5 years ago +5

    In this video tutorial we learn how to fit a polynomial regression model and assess it in R using the partial F-test, with examples. For a more in-depth explanation of linear regression, check our series on linear regression concepts and R (bit.ly/2z8fXg1). Like to support us? You can Donate (statslectures.com/support-us), Share our Videos, Leave us a Comment, Give us a Like or Write us a Review! Either way, We Thank You!

    • @hannukoistinen5329
      @hannukoistinen5329 8 months ago

      Actually it is pretty much linear. You can always use log to make it more linear and then run the tests.

  • @alfredkik3675
    @alfredkik3675 3 years ago +1

    Your videos helped me to complete my MSc degree successfully - thank you very much for your very informative videos!

  • @dioszegigergo6770
    @dioszegigergo6770 7 years ago +2

    Dear Marin and Ladan,
    hats off! Clearly explained with such a deep knowledge and human understanding! Thank you very-very much! You=lm (teacher~knowledge+I(statwizard^3)), a talent="TRUE" in your field. Enjoy your life with your family and if you find the time and opportunity, the new series on guiding Us=lm(astronauts~strayed+I(how^2)) in the space of R, is highly welcome!
    All the best,
    Gergo Dioszegi

  • @LuisFuentes1771
    @LuisFuentes1771 6 years ago +1

    Hello Mike,
    With this video I've finished your course of videos on the introduction to R and I don't have the words to express my gratitude. Thanks to your amazing work I've entered the world of data science, and I will continue diving into this wonderful technique full of possibilities. Since I'm a student of Economics this will be incredibly useful.
    You have helped me immensely without asking for anything, as I'm sure you have thousands of other people who feel equally thankful.
    The world needs more people like you, and I will try to continue the chain of helping others.
    Sincerely, from Universidad Carlos III, Madrid,
    Luis

  • @nareshpandey103
    @nareshpandey103 4 years ago +1

    Thank you for the best tutorial; you have provided the dataset, which made it even more beneficial

  • @richaagrawal6677
    @richaagrawal6677 7 years ago

    Hello Mike,
    I am glad you managed to teach all of us with such an explanatory, step-by-step approach. I watched all your Series 5 videos and I wish more people could take advantage of your knowledge and skill. Thank you so much. Looking forward to more.
    Regards
    Richa

  • @hapsyottahuang8529
    @hapsyottahuang8529 5 years ago +4

    Really excellent tutorial series. Thank you very much.

  • @anuragkumar2295
    @anuragkumar2295 7 years ago +1

    All the videos are very informative and interactive. Thanks very much, Professor. :)

  • @matgg8207
    @matgg8207 3 years ago

    This dude is a fxking life saver

  • @huangb6403
    @huangb6403 7 years ago +1

    Thanks for your linear regression series. So helpful!

  • @neofjcn01
    @neofjcn01 3 years ago

    THANKS MAN, THIS VIDEO WAS SUPER USEFUL

  • @alirostami8794
    @alirostami8794 3 years ago

    Excellent. Thank you so much for this helpful video. I'm waiting for a new tutorial video.

  • @angeld5093
    @angeld5093 8 years ago +1

    Thank you so much. You are way better than my teacher

  • @patrysia0000
    @patrysia0000 5 years ago +1

    Great video! Thank you!

  • @anuragkumar2295
    @anuragkumar2295 7 years ago +3

    Waiting for your new tutorials on R programming. :)

  • @seeklikas
    @seeklikas 5 years ago +1

    Very clear, my friend. Thumbs up

  • @md.masumbillah8222
    @md.masumbillah8222 2 years ago

    well explained, thanks for the upload

  • @munsirali1896
    @munsirali1896 6 years ago

    Hello respected Professor Mike Marin, I really appreciate your great tutorials on R. I have watched all of your lectures and am more and more grateful for this great, helpful lecture series, and I hope it will continue in the future. Wishing you a happy and healthy life. Thank you very much, stay blessed!

  • @eminaker6080
    @eminaker6080 2 years ago

    You're the best!

  • @ivanalejandrovaciohernande5512
    @ivanalejandrovaciohernande5512 2 months ago

    Hi Mike, what test should I perform on the data prior to selecting a polynomial model? great video, man

  • @user-bh2rd1dz1z
    @user-bh2rd1dz1z 4 years ago

    Superb. Helped me a lot. Thank you!

  • @siddhft3001
    @siddhft3001 3 years ago

    This video is gold!

  • @seyedomidshirdelan8609
    @seyedomidshirdelan8609 7 years ago +1

    Thanks for well-featured videos

  • @marziehsafari2440
    @marziehsafari2440 2 years ago

    That was an interesting video comparing the first and second order of polynomial for linear models; I really liked it. However, I am dealing with a mixed model right now and need to do the same comparison for the first and second order of polynomial for it, and this does not work for me. Do you have a tutorial video for mixed models as well? Thanks a lot.

  • @meeadhadi5209
    @meeadhadi5209 1 month ago

    Hi. Thank you for your great explanation. The page for Dataset & R Script doesn't exist and the provided link doesn't work.

  • @siddharthadas86
    @siddharthadas86 7 years ago

    Fantastic series. Very clear and crisp explanations. Thank you very much again for this. Would it be possible to make some videos on longitudinal data and logistic regression?

  • @dessywl
    @dessywl 5 years ago +1

    Is polynomial regression the same as orthogonal polynomial regression? Thanks!

  • @govindsharma9871
    @govindsharma9871 4 years ago

    Great help. Thanks a lot.

  • @ericdu8490
    @ericdu8490 5 years ago +1

    Thanks for the tutorial. But may I ask how to use the poly() function in multivariable regression? :D

  • @Dr_Finbar
    @Dr_Finbar 1 year ago

    When do you use an orthogonal polynomial rather than a raw polynomial?

  • @anthonyalanis8119
    @anthonyalanis8119 3 years ago

    I noticed that the summary output for the cubic model had large p-values for all the coefficients, but the multiple R-squared still seemed large, the residual error seemed low, and the overall F-statistic was large too, thus we would reject the null (all coefficients = 0).
    QUESTION: What should we say about each coefficient since their individual p-values are so high?

  • @ahmet3592
    @ahmet3592 2 years ago

    Thank you for the nicely explained tutorial. I have a question regarding the poly() function. Why do we use the argument raw=T in this case? I am trying to understand why multicollinearity is a general problem in this situation, since x and x^2 are correlated. The solution to this is usually presented as setting raw=F, therefore considering only orthogonal polynomials. But why do orthogonal polynomials solve the problem of multicollinearity? I'm lost in this field. I hope you can help me out.

  • @HeatherRoseMusician
    @HeatherRoseMusician 5 years ago +1

    Great video! Very helpful :-)

  • @gutzbenj
    @gutzbenj 7 years ago +1

    Very nice! Thank you :)

  • @chaitrabellur8743
    @chaitrabellur8743 2 years ago

    Thank you!

  • @Teodorast
    @Teodorast 6 years ago

    Is there any way to control for a variable inside the model? For example, controlling for age

  • @KhaLed-pb4pu
    @KhaLed-pb4pu 5 years ago

    what about the F-statistic's p-value (2.2e-16)... what is its significance or importance compared to the p-values of height and height^2? which one should we consider?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      to be honest, none of them are particularly enlightening. the F-stat p-value is testing overall significance of the model....that is somewhat helpful, but it is testing if ALL coefficients are 0...so essentially testing if your model is significantly better than just guessing the mean y-value for everyone (is it better than nothing).
      the p-values for height and height^2 can be misleading as those variables are correlated (as well as can be correlated with other variables in a model), and so their p-values can get inflated by this collinearity.
      the best way to test significance of variables is to compare models with/without a variable included. we have a separate video talking about this here: ua-cam.com/video/G_obrpV70QQ/v-deo.html
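The model-comparison approach recommended in this reply can be sketched in R as follows (a minimal sketch with simulated data; the names LungCap and Height echo the video's example, but the numbers here are made up for illustration):

```r
# Partial F-test: compare a model without and with the squared term.
# Simulated data for illustration only.
set.seed(1)
Height  <- runif(100, 120, 190)
LungCap <- 0.002 * Height^2 - 0.3 * Height + rnorm(100)

reduced <- lm(LungCap ~ Height)                # straight-line model
full    <- lm(LungCap ~ Height + I(Height^2))  # adds the squared term

# anova() on two nested models performs the partial F-test:
# a small p-value says the fuller model has significantly less unexplained error
anova(reduced, full)
```

Because the models are nested, `anova()` tests the extra term(s) jointly, which avoids the collinearity-inflated individual p-values mentioned above.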

  • @camila_braz
    @camila_braz 4 years ago

    thank you!!

  • @montserratbelinchon3341
    @montserratbelinchon3341 7 years ago +1

    Does anybody know what to do if I get a non-significant p-value for the first-order polynomial beta of my predictor while the second-order beta is significant?

  • @alexanderbutler2565
    @alexanderbutler2565 6 years ago +1

    Hi, I'm getting the error message "Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ" when trying to add the regression lines to the original plot. But my variables are of the same length. Any advice?

    • @marinstatlectures
      @marinstatlectures  6 years ago

      Hi, if you expand a bit on the exact commands you've entered, i may be able to figure out the issue. the issue is that one of the variables (X and Y) that you are trying to plot has more elements than the other..but without knowing the code you've entered I'm not able to figure out where you've made an error

  • @sabasaghatchi3408
    @sabasaghatchi3408 5 years ago

    Hi Mike. Thanks for your great videos. When adding a polynomial predictor, how does its interpretation change?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      Well, it becomes harder to interpret the effect of that variable, as the effect of the X variable is now being modelled using a polynomial, and so the effect is not linear....the effect of a 1-unit increase in X on Y is not the same everywhere. one way to provide an interpretation is to take a value for x, calculate Y, then calculate the value of Y for x+1....do this for a few different values of x, and this will tell you the effect of a 1-unit increase in x, for specific values of X. that's one way to go, if you wanted to talk about the effect of a 1-unit increase in X on Y.
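The "predict at x and at x+1" idea in this reply can be sketched in R (simulated data; all names hypothetical):

```r
# Effect of a 1-unit increase in x depends on where you are along x
# when the model includes a squared term. Simulated data for illustration.
set.seed(2)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x + 0.3 * x^2 + rnorm(200)
m <- lm(y ~ x + I(x^2))

# predicted change in y when x increases by 1, evaluated at a chosen x0
effect_at <- function(x0) {
  unname(predict(m, data.frame(x = x0 + 1)) - predict(m, data.frame(x = x0)))
}
sapply(c(2, 5, 8), effect_at)  # the "slope" differs across values of x
```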

  • @oacho3
    @oacho3 4 years ago

    Hi Mike and YouTubers, I need to fit two sigmoid curves to the same dataset, one for the control points and one for the treatment points. If I subset the x axis it gives me an error. If I do not subset, it gives me only one line to fit all the points. Do you have any suggestions? Thank you!

  • @khairulnizam7439
    @khairulnizam7439 7 years ago

    Hello, hi Mike Marin. I have a question for you and hopefully you can help me answer it. First, is this method of polynomial regression in R applicable if I have 3 variables (2 independent variables and 1 dependent variable), and how do I develop it?
    Second, can all data use this method, or is there a way to verify whether the data can?
    Hopefully you can help me, Mike. Thank you

  • @nickfire2k376
    @nickfire2k376 4 years ago

    thanks a lot

  • @diegovirano
    @diegovirano 6 years ago

    thanks a lot for the videos... very helpful.
    I have a question, for the community as well: is there any way to automate the process of selecting the best regression model, instead of comparing the models by hand? I have a scenario with a lot of variables.

    • @marinstatlectures
      @marinstatlectures  6 years ago

      Hi Diego, there are, but i wouldn't fully recommend them. here's a brief summary of some of that. you can read more about things like *step-wise selection* or *all-subsets*. the key word for these is *automated model selection procedures*. the "stepwise" approach alternates steps (forward and backward) adding and removing variables until you hit a steady state where you can not add/remove any variables. this also requires specifying a "maximum model" (e.g., will you consider one-way interactions, or two-way interactions, etc). the "all possible subsets" approach considers every possible model with every possible subset of variables, and chooses the one with the lowest AIC (or lowest BIC if you prefer that).
      the drawback of these is that they are purely automated and don't allow input from the user, and are sort of a "black-box" approach. as an example of what i mean by this, suppose you have a set of data for a bunch of school-aged children, and one variable is "age" and another is "grade that they are in". these variables contain almost the exact same info, but are not exactly the same. the one that is selected into your model will be based mostly on chance...i myself would prefer to have some control over which would be in the model (i would personally choose age, as i think this is more meaningful than "grade they are in").
      automated procedures also allow for variables to be included/excluded based on chance correlations. by chance, some meaningless variables will always end up correlated with your "Y" variable, and automated procedures will end up including these. i prefer an approach where if i KNOW conceptually that a variable is correlated with Y then i will want to get that into my model, and similarly if i KNOW something is not correlated with Y i want to exclude it. these automated procedures let "chance" take a lot of control over your model building and variable selection.
      MY PERSONAL STANCE is that automated selection procedures can be useful as an exploratory tool...to help discover which variables may/may-not be important, but i would always revise a model and the variables in it from there, and i wouldn't let an algorithm choose my model...i will combine "what i already know" with "what the data is telling me".
      hope that's helpful...

    • @diegovirano
      @diegovirano 6 years ago

      Hi. Thanks a lot for your answer. It is really helpful, in fact. I'm sure many people will also benefit from this answer. I will look for more information on the topic, as suggested. Using it as an exploratory tool, with a lot of variables, sounds like a good idea.
      Thanks a lot again.

  • @witsqafa
    @witsqafa 4 years ago

    Hi, thanks, your video is really useful for me. I have a question: the p-value for one of my regression coefficients is not significant when I apply both linear and non-linear regression models; do you have any suggestions for my case? Thanks in advance!

    • @witsqafa
      @witsqafa 4 years ago

      does it even matter if I keep on using the models?

  • @swiss.girl.travels3301
    @swiss.girl.travels3301 8 years ago

    Hi Mike, I just watched your playlist about regression models in R and it was very helpful!
    So far you have worked with the lm() function in R, but there are so many others, like glm(), or lmer() and glmer() from the lme4 package. What are the differences between those models? Certainly it somehow depends on your data, but how can I find out which model I should use for my analysis? It would be great if you have a tip on what I should focus on... Thank you in advance!

    • @marinstatlectures
      @marinstatlectures  8 years ago +2

      Hi +Fa Fa , the others are for entirely different regression models. *lm* is to fit a linear regression (y/outcome is numeric, and assumed normal). *glm* is for generalized linear models, which are a whole class of models on their own. for example, logistic regression is a generalized linear model (for y/outcome that is binary, and assumed binomial), Poisson regression is a generalized linear model (for y/outcome that is a count or rate, and assumed to follow a poisson distribution). and there are many other GLMs. *lme* models are linear mixed effect models, and are often used for longitudinal data. each of these are very large topics on their own.
      in a traditional stats department, there are usually multiple courses offered on GLMs, a full course on longitudinal data analysis, and so forth. so, i can not do these justice in a few short paragraphs.
      the short answer of which to use for your analysis would depend mostly on the type of data (and more importantly, the type of outcome (y) variable that you are working with). the example i use in the videos is for an outcome/y that is lung capacity (which is numeric/continuous) and assumed to be normally distributed, so I'm using linear regression.
      i hope that helps clarify some things.
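The lm/glm distinction described in this reply can be sketched in R (simulated data; logistic regression used as the GLM example):

```r
# Continuous outcome -> linear regression; binary outcome -> logistic (a GLM).
# Simulated data for illustration only.
set.seed(3)
x     <- rnorm(200)
y_num <- 1 + 2 * x + rnorm(200)      # numeric outcome, assumed normal
y_bin <- rbinom(200, 1, plogis(x))   # binary outcome, assumed binomial

m_lin <- lm(y_num ~ x)                      # linear regression
m_log <- glm(y_bin ~ x, family = binomial)  # logistic regression via glm()
family(m_log)$family                        # "binomial"
```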

  • @irondia73
    @irondia73 5 years ago

    Hi Marin, when polynomial terms are in the model, how do you interpret the coefficients in a meaningful way? Intuitively, a coefficient makes sense for just "height", but what about "height^2"? Or "height^3", and so forth... thanks!

    • @marinstatlectures
      @marinstatlectures  5 years ago

      in this case, the coefficient doesn't have a simple interpretation...that is because the relationship between X and Y isn't assumed to be something simple like a line (which has a slope with a nice simple interpretation). for X and X^2, the change in Y for a 1-unit change in X is NOT the same everywhere...and so you can't have a simple interpretation. if you want to interpret the model coefficients, then there are other options for addressing a non-linearity. one that works and maintains a simple interpretation is to "categorize" the numeric variable (to convert it from numeric into a set of categories).
      we have a separate video talking about the different ways to address a non-linearity, focussing on the concepts of it. i'm linking to that here in case you wanted to explore it: ua-cam.com/video/tOzwEv0PoZk/v-deo.html

  • @divyaakella8881
    @divyaakella8881 6 years ago +1

    What if we have around 8 independent variables? How do we determine the x^2 / x^3 terms?

    • @marinstatlectures
      @marinstatlectures  6 years ago

      I'm not sure what you mean by this. if you try clarifying, i may be able to help

  • @betzthomas9693
    @betzthomas9693 4 years ago

    In polynomial regression, do we take the log of the Y value? E.g. lm(log(Y) ~ poly(X, 2, raw = T))

    • @werekorden
      @werekorden 4 years ago

      good question, I am looking for something similar. I need to fit a polynomial regression with log10 of the x values using poly, and I can't get it to work.

  • @nirmalpurohit4067
    @nirmalpurohit4067 7 years ago

    Hi Mike,
    I have seen somewhere that we have to divide the data into two groups: one for development and a second for validation/testing. So is it necessary to validate the model before presenting it to business peers?
    Please advise.
    Regards
    Nirmal

    • @marinstatlectures
      @marinstatlectures  7 years ago +1

      Hi Nirmal, it depends on the reason for fitting a model. if you are using the model to make predictions (a predictive model) then you probably want to do some sort of validation of the model (to ensure that it does make good and reliable predictions). there are lots of packages in R to do different sorts of validation. key words to research are "cross validation" and "leave one out validation"; when you search those topics you will come across different sorts of validation methods. cross validation is probably what you want to research the most. good luck!

    • @nirmalpurohit4067
      @nirmalpurohit4067 7 years ago

      Hi Mike, thanks for your time.
      I request you to make a video on validation of the model. I know you could do it in 5 minutes; others would take hours to explain the same things.
      I hope you will consider my request.
      Thanks again for making R easy and fast to understand. Cheers from India.
      Regards,
      Nirmal

  • @anaswahid8520
    @anaswahid8520 4 years ago

    Sir,
    In LungCap vs Height, shouldn't you first check the correlation coefficient 'r'?
    If r = 0, there is no linear relationship, which would mean you can then go for polynomial regression.
    But here r is not 0, so why did you fit a polynomial regression? Why did you move to the polynomial regression concept?

    • @marinstatlectures
      @marinstatlectures  4 years ago

      Hi, polynomial regression is not for when the correlation is 0, it is an option when there is a relationship, but not a linear one (maybe a bit of a curved/non-linear relationship). We have a video that talks a bit about this: ua-cam.com/video/tOzwEv0PoZk/v-deo.html

  • @TKSGL89
    @TKSGL89 8 years ago +1

    Hi Mike Marin, I'm so sad you stopped making videos! I have a question for you and I hope you can help me (you might start a new series talking about it :)). How do I treat historical data? I have daily data for 200 years. I have to plot it all first and then plot only the maximum for each year. And what do I do if I have 365 days in some years and 366 in others? I hope you understand what I mean. Thank you in advance!

    • @TKSGL89
      @TKSGL89 8 years ago

      I checked the ts() function but I had trouble dealing with it (especially with frequency)

    • @marinstatlectures
      @marinstatlectures  8 years ago +3

      Hi +TKSGL89 , thanks! we haven't actually stopped making videos...life has just gotten busy, and we've had to slow down a bit...but we plan on continuing to make videos for the foreseeable future! we've actually got a few different ones in the works, and a list of topics we want to cover that is WAY too long...there are so many cool topics that could be covered...just no time!!
      so, that's time series data you've got there, so you'll want to be using time series methods (looks like you've started there, with the ts() function). i won't have time to make anything helpful for you anytime soon, but i'd suggest searching around for resources on time series in R.
      as for picking out the max for each year, there are different ways to do that, and some of it depends on exactly how your data is structured. but you should have the *variable* of interest, as well as a *year* variable. to find the max for each year, you would use something like *max(variable[year==2015])* , and this could be done for every year. you can do this in more efficient ways (like using apply statements, or other ways) once you've coded it in a simple way.
      hope that helps get you started!
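The "max for each year" idea from this reply can be written compactly with tapply; a sketch on toy data (the numbers are made up for illustration):

```r
# Yearly maxima without an explicit loop: tapply splits `value` by `year`
# and applies max() within each group. Toy data for illustration.
year  <- rep(2013:2015, each = 4)
value <- c(3, 9, 5, 7,   2, 8, 6, 4,   10, 1, 12, 11)

tapply(value, year, max)  # one maximum per year
```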

    • @TKSGL89
      @TKSGL89 8 years ago +2

      Great to hear you haven't stopped! Thank you for these explanations and for being so quick! I'll be waiting for the next videos :) see you soon

  • @balazsgonczy3564
    @balazsgonczy3564 5 years ago

    Why is it good to have orthogonal polynomials? Are they needed in the model?

    • @balazsgonczy3564
      @balazsgonczy3564 5 years ago

      And yeah, your link is broken. Not available.

    • @marinstatlectures
      @marinstatlectures  5 years ago

      hi, it isn't completely necessary, but what it does is reduce the collinearity between the predictors in the model...because X and X^2 will be highly correlated, and thus their SEs will get inflated...orthogonal polynomials address this
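The collinearity point in this reply is easy to see directly in R by comparing the columns that poly() builds in raw vs orthogonal form:

```r
# Raw polynomial columns (x and x^2) are highly correlated;
# orthogonal polynomial columns are constructed to have ~0 correlation.
x <- 1:100
raw_cols  <- poly(x, 2, raw = TRUE)  # columns: x and x^2
orth_cols <- poly(x, 2)              # orthogonal polynomial columns

cor(raw_cols[, 1], raw_cols[, 2])    # close to 1
cor(orth_cols[, 1], orth_cols[, 2])  # essentially 0
```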

    • @marinstatlectures
      @marinstatlectures  5 years ago

      try this one: statslectures.com/r-scripts-datasets

  • @alexslappey2290
    @alexslappey2290 6 years ago

    How do you decide what degree of polynomial you should go to?

    • @marinstatlectures
      @marinstatlectures  6 years ago

      to do a formal test of polynomial terms, you can try comparing a model that uses just "X" to one that also includes "X^2", to test if the model with "X^2" is significantly "better". if it is, then you can compare the model with X, X^2 to a model with X, X^2, X^3 and test if that model is a significant improvement...and continue until the model does not improve. to do this test you can use the "Partial F-test" or "Likelihood Ratio Test", we have a video showing that here: ua-cam.com/video/G_obrpV70QQ/v-deo.html
      you can also decide conceptually which you think makes sense and begin from there. i work in health research, and most of the time we don't want to go beyond X^2 or maybe up to X^3, as beyond that usually isn't realistic. (e.g.) some things have a sort of exponential growth, and including X^2 may be appropriate. at times including X^3 to allow for another inflection may be relevant...but past that, there aren't many things where you could conceptually justify a relationship up to the power of X^4.
      the most important part in model building is that your model is conceptually sound...don't rely purely on statistical testing, but make sure that your model also makes sense in context.
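The sequential degree-testing procedure described in this reply can be sketched in R (simulated data with a genuinely quadratic relationship; all names hypothetical):

```r
# Fit increasing polynomial degrees and compare nested models in turn.
# Simulated data: the true relationship is quadratic.
set.seed(5)
x <- runif(150, 0, 10)
y <- 1 + x + 0.5 * x^2 + rnorm(150)

m1 <- lm(y ~ x)
m2 <- lm(y ~ x + I(x^2))
m3 <- lm(y ~ x + I(x^2) + I(x^3))

anova(m1, m2)  # x^2 should be a clear improvement on these data
anova(m2, m3)  # x^3 is unlikely to improve further, so we would stop at degree 2
```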

  • @tolisBFMV
    @tolisBFMV 5 years ago

    I think there is something I am missing here. When running anova for the 2 models, I understand the null hypothesis (no significant difference), but what about the alternative? If there is no significant difference, couldn't it be that the full model is worse? My question, more clearly: with anova, are these always the hypotheses for the models? That the alternative is that the full model is better? Or could the alternative hypothesis be that the full model is worse?
    Thank you.

    • @marinstatlectures
      @marinstatlectures  5 years ago

      yes, the alternative is always that the full model is "better" (it has significantly less unexplained error). adding an unnecessary variable can never increase the SSE (the unexplained error). this test is testing if the larger model has significantly lower unexplained error (lower SSE).

  • @wakjiratesfahun3682
    @wakjiratesfahun3682 3 years ago

    Please upload new tutorials on quadratic regression.

  • @CaptainCalculus
    @CaptainCalculus 3 years ago

    I think R has changed the way it treats x^2 if you just put it into the equation, since this video was made

  • @____darrah____
    @____darrah____ 5 years ago

    What about the multivariate case?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      i'm a bit unclear on what you are asking, but if you are asking how to include a polynomial term in a model that also has many X variables, it would look like this:
      lm(y ~ X1 + I(X1^2) + X2 + X3 +...)
      the exact same as shown in this video, except including other variables in there as well.
      hope that answers your question
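A runnable version of the formula in this reply, with simulated data (all variable names and numbers are made up for illustration):

```r
# A polynomial term alongside other predictors in one lm() call.
# Simulated data for illustration only.
set.seed(6)
n  <- 100
X1 <- runif(n); X2 <- runif(n); X3 <- runif(n)
y  <- 1 + X1 + 2 * X1^2 + 0.5 * X2 - X3 + rnorm(n, sd = 0.2)

m <- lm(y ~ X1 + I(X1^2) + X2 + X3)
coef(m)  # intercept plus one coefficient per term, including I(X1^2)
```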

  • @lv274
    @lv274 6 years ago

    Doesn't the inclusion of Height^2 and Height^3 in the model cause multicollinearity?
    BTW you make excellent content, Thank You.

    • @marinstatlectures
      @marinstatlectures  6 years ago

      thanks! yes, including X^2, X^3,... would introduce collinearity between X, X^2, etc. this may or may not be an issue. first, let me mention that solutions to this are often to either "center the X variable" (i.e. include X and (centered-X)^2)...this can help reduce the collinearity between the two. you can also use "orthogonal polynomials" to reduce this.
      collinearity between X and X^2 would only really serve to inflate the SEs for the coefficients of X and X^2 (while it wouldn't really affect the coefficients themselves, or the shape of the model fit), so it is not such a big issue in this sense.
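The centering trick mentioned in this reply can be verified in a couple of lines of R:

```r
# Centering x before squaring reduces the x / x^2 correlation.
x  <- 1:100
xc <- x - mean(x)   # centered version of x

cor(x,  x^2)    # high for raw x
cor(xc, xc^2)   # ~0 here, since xc is symmetric around 0
```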

  • @shudanhao8643
    @shudanhao8643 1 year ago

    Thanks. But the data we can download is different from the data you use in the video.

  • @SyedKollol
    @SyedKollol 7 years ago

    Is it possible to find the VIF in R?

    • @marinstatlectures
      @marinstatlectures  7 years ago +2

      yes, but it's not in 'base-R' (at least to my knowledge it's not). there are packages that you can install that can get you the VIF and other related things.
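Packages such as car provide a vif() function, but the quantity itself is simple to compute in base R as 1/(1 - R²) from regressing one predictor on the others; a sketch with simulated, deliberately collinear data:

```r
# VIF for x1 by hand: regress x1 on the other predictor(s), then 1/(1 - R^2).
# Simulated data, with x2 built to be strongly correlated with x1.
set.seed(7)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.3)

vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x1  # well above 1, flagging the collinearity
```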

  • @콘충이
    @콘충이 4 years ago

    wow wow

  • @semihardbagels
    @semihardbagels 4 years ago

    your script link doesn't work.

    • @marinstatlectures
      @marinstatlectures  4 years ago

      Hi @Semihardbagels, it is fixed now. let us know if you have any trouble accessing the files.

  • @hannukoistinen5329
    @hannukoistinen5329 3 months ago

    Never studied statistics? This stuff is absolutely linear:). Not even outliers.