Checking Linear Regression Assumptions in R | R Tutorial 5.2 | MarinStatsLectures

  • Published 12 Nov 2013
  • Checking Linear Regression Assumptions in R: Learn how to check the linearity assumption, constant variance (homoscedasticity), and the assumption of normality for a regression model in R. To learn more about the Linear Regression Concept and Linear Regression with R, see (bit.ly/2z8fXg1); 💻 For the free Practice Dataset: (bit.ly/2rOfgEJ) 👍🏼Best Statistics & R Programming Tutorials: ( goo.gl/4vDQzT )
    ►► Like to support us? You can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment, Give us a Like or Write us a Review! Either way, We Thank You!
    How to test linear regression assumptions in R?
    In this R tutorial, we will first go over some of the concepts behind linear regression: how to add a regression line, how to interpret the regression line (the predicted or fitted Y value is the mean of Y given X), how to interpret the residuals or errors (the difference between the observed Y value and the predicted or fitted Y value), and the assumptions made when fitting a linear regression model.
    Then we will discuss the regression diagnostic plots in R: why we make diagnostic plots and how to produce them. You will learn to check the linearity and constant variance (homoscedasticity) assumptions for a regression model with residual plots in R, and to test the assumption of normality with QQ (quantile-quantile) plots. You will also learn to produce and interpret residual plots, QQ plots, and scatterplots for data with non-constant variance and for data with a non-linear relationship in R.
    ■ Table of Contents:
    0:00:29 Introducing the data used in this video
    0:00:49 How to fit a Linear Regression Model in R?
    0:01:03 How to produce the summary of the linear regression model in R?
    0:01:15 How to add a regression line to the plot in R?
    0:01:24 How to interpret the regression line?
    0:01:43 How to interpret the residuals or errors?
    0:01:53 Where to find the Residual Standard Error (Standard Deviation of Residuals) in R
    0:02:14 What are the assumptions when fitting a linear regression model and how to check these assumptions
    0:03:01 What are the built-in regression diagnostic plots in R and how to produce them
    0:03:24 How to use Residual Plot for testing linear regression assumptions in R
    0:03:50 How to use QQ-Plot in R to test linear regression assumptions
    0:04:33 How to produce multiple plots on one screen in R
    0:05:00 How to check constant variance assumption for data with non-constant variance in R
    0:05:12 How to produce and interpret a Scatterplot and regression line for data with non-constant variance
    0:05:40 How to produce and interpret the Residual plot for data with non-constant variance in R
    0:06:02 How to produce and interpret the QQ plot for data with non-constant variance in R
    0:06:12 How to produce and interpret a Scatterplot with regression line for data with non-linear relationship in R
    0:06:40 How to produce and interpret the Residual plot for a data with non-linear relationship in R
    0:06:52 How to produce and interpret the QQ plot for a data with non-linear relationship in R
    0:07:02 What is the reason for making diagnostic plots
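The steps in the table of contents can be sketched in a few lines of R. The data below are simulated stand-ins (the real practice dataset, with variables like LungCap and Age, is linked above), so treat the numbers as illustrative:

```r
# Simulated stand-in for the practice dataset
set.seed(42)
Age <- sample(5:18, 100, replace = TRUE)
LungCap <- 1 + 0.5 * Age + rnorm(100, sd = 1)

# Fit a linear regression model and summarize it
mod <- lm(LungCap ~ Age)
summary(mod)   # coefficients, Residual Standard Error, R-squared

# Scatterplot with the regression line added
plot(Age, LungCap)
abline(mod, col = "red")

# Built-in diagnostic plots, 4 on one screen:
# residuals vs fitted, QQ plot, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(mod)
par(mfrow = c(1, 1))   # reset the plotting window
```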
    ►► Watch More:
    ►Linear Regression Concept and Linear Regression with R Series: bit.ly/2z8fXg1
    ►Simple Linear Regression Concept • Simple Linear Regressi...
    ►Nonlinearity in Linear Regression • Linearity and Nonlinea...
    ► R Squared or Coefficient of Determination • R Squared or Coefficie...
    ► Linear Regression in R Complete Series bit.ly/1iytAtm
    ► Intro to Statistics Course: bit.ly/2SQOxDH
    ►Data Science with R bit.ly/1A1Pixc
    🤳🏽Follow MarinStatsLectures
    Subscribe: goo.gl/4vDQzT
    website: statslectures.com
    Facebook:goo.gl/qYQavS
    Twitter:goo.gl/393AQG
    Instagram: goo.gl/fdPiDn
    Our Team:
    Content Creator: Mike Marin (B.Sc., MSc.) Senior Instructor at UBC.
    Producer and Creative Manager: Ladan Hamadani (B.Sc., BA., MPH)
    These videos are created by #marinstatslectures to support some courses at The University of British Columbia (UBC) (#IntroductoryStatistics and #RVideoTutorials for Health Science Research), although we make all videos freely available to everyone, everywhere.
    Thanks for watching! Have fun and remember that statistics is almost as beautiful as a unicorn!

COMMENTS • 152

  • @marinstatlectures
    @marinstatlectures  5 years ago +4

    In this R video we learn about different ways to test regression assumptions using R. We will also talk about the regression diagnostic plots (like residual plots, qq plots, etc) and how to produce these plots with R. Don’t forget to download the practice dataset here (statslectures.com/r-scripts-datasets); For more in-depth explanation of linear regression check our series on linear regression concept and R (bit.ly/2z8fXg1); You can Donate (statslectures.com/support-us), Share our Videos, Leave us a Comment, Give us a Like or Write us a Review! Either way, We Thank You!

    • @Frenchkisssss
      @Frenchkisssss 4 years ago

      I work in a contact centre and i have a daily Service Level = 70% of the calls answered within 30 seconds... my variables X are Volume, AHT, Staffing level, OT, external support... my R = 0.50, im trying to predict future Service Levels using the multi linear regression but im getting that big cloud of residual vs fitted... what do you guys suggest? What model should i use to predict my Service Level?

    • @shashinijayakody4187
      @shashinijayakody4187 3 years ago

      Hey, I need a reply soon. Can I use regression assumptions to analyze the trends of livestock sectors over the 30 years?

  • @IRockLikeISaid
    @IRockLikeISaid 8 years ago +46

    saved my life. this video is sacred.

  • @RandolphAbelardo
    @RandolphAbelardo 8 years ago +37

    i still remember the LINE mnemonic used for the Regression Assumptions during the six sigma black belt course :)
    L- linearity
    I- independence
    N- normality
    E- equal variance (homoscedasticity)
    again, thanks so much for the video tutorials, +MarinStatsLectures !

  • @annaoctavia6487
    @annaoctavia6487 4 years ago +2

    In times like this, when learning from home is the only possibility, I'm really thankful for amazing virtual teachers like yourself. I've been looking at countless videos to learn R statistics, and yours are the ones that can make me really understand all these stuff. So thank you thank you thank you!!! (seriously, I can't express this enough) :D

  • @AnishPhilip100
    @AnishPhilip100 6 years ago +5

    This is superb. I can't stress how good the content in this video is for internalizing some of the ambiguously taught concepts in Multiple OLS estimators. Thanks Marin!

  • @vmedisetty
    @vmedisetty 5 years ago +2

    Seriously, I don't know how many times I should thank you for this video.

    • @marinstatlectures
      @marinstatlectures  5 years ago +1

      we don't have an upper limit for that, but once is sufficient ;)

  • @pat5690
    @pat5690 6 years ago +2

    this video is amazingly helpful, THANK YOU MarinStatsLectures!!!

  • @christinap802
    @christinap802 4 years ago +1

    Wow, thank you so much. Your videos are brilliant. Clear, well organized, and extremely helpful! Thank you!

  • @him4u324
    @him4u324 9 years ago +1

    This video was very helpful. Looking forward to watching more of your videos on model building. Thanks

  • @Icanflyrc2
    @Icanflyrc2 10 years ago +1

    I highly appreciate the work you are doing.

  • @jaycdan1147
    @jaycdan1147 3 years ago

    Many thanks! This video made things so crystal clear. It was also professionally edited!

  • @ilikedetectives
    @ilikedetectives 6 years ago +3

    Thank you so much! This video is a life-saver!

  • @kibagamij777
    @kibagamij777 8 years ago +2

    One of the most useful videos on linear regression I have seen. Thank you very much!

    • @marinstatlectures
      @marinstatlectures  8 years ago +1

      you're welcome +José P. Barrantes , happy to hear you found it helpful!

  • @rodzhouri
    @rodzhouri 7 years ago

    Life saver, great teaching method.
    Very well made video, learned to use R with you.
    Thanks for sharing your knowledge!

  • @TheEbinocracy
    @TheEbinocracy 5 years ago +1

    Wow perfect video! i'm glad this is the first result for lm model diagnostics! Helped me out a ton.

  • @tejasvi0claw
    @tejasvi0claw 7 years ago

    This video is a lifesaver. Thank you sir, for your work.

  • @Gary1964muslim
    @Gary1964muslim 4 years ago

    thank you, thank you, thank you!! That there are even any dislikes on your videos in this series is mind-boggling!!!!!

  • @adamdaniels9673
    @adamdaniels9673 8 years ago +1

    Nice, simply explained. More stats vids should be done this way!

  • @kalimbasimba579
    @kalimbasimba579 5 months ago

    Thanks so much for this, this is perfectly explained!!

  • @kunamate
    @kunamate 7 years ago

    Very useful video, thank you! My homework this semester must be written in R, and I've never learned R before.. I tried to write some code, but even when it worked, I didn't know what the plots meant.. Now I finally understand a bit more about what they represent :)

  • @calebterrelorellana2478
    @calebterrelorellana2478 3 years ago

    Thanks! Excellent! Very easy to understand!

  • @ellieechoes
    @ellieechoes 5 years ago +3

    This video, thank you.

  • @foedeer
    @foedeer 5 years ago +2

    thank you so much for the effort in explaining this in simple terms :)

  • @samanthakay5385
    @samanthakay5385 9 years ago

    Really great! You explained this wonderfully. Thanks.

  • @bonzai_303
    @bonzai_303 5 years ago +1

    Huge thanks!

  • @priscarogi2183
    @priscarogi2183 4 months ago

    I love you so much for this video :)

  • @wynnhunter5481
    @wynnhunter5481 9 years ago

    Great explanation of a difficult concept. Thanks, Mike!

  • @kangjingtan7312
    @kangjingtan7312 8 years ago

    Brilliant video ! very concise and well-explained !

  • @ashulouis309
    @ashulouis309 2 years ago

    Awesome tutorial

  • @marinstatlectures
    @marinstatlectures  8 years ago +2

    Hi +Liza Kittinger , you have your settings set so that I can not reply to your comment, so I hope you see my reply here. I'm not sure what type of smoother they use for the red line shown in the residual plot at 3:47 of the video, but regardless of whether they are using a spline or loess (or some other smoother), it is just there to help visualize whether there is any pattern in the residuals, and the exact type of smoother used shouldn't make any difference.
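Whatever smoother `plot(mod)` uses internally for that red line, you can draw your own on a residual plot; a minimal sketch with base R's `lowess`, on simulated data (the variable names are illustrative, not the video's):

```r
set.seed(2)
x <- rnorm(100)
y <- 1 + x + rnorm(100)
mod <- lm(y ~ x)

# Residuals vs fitted values, with a reference line at 0
plot(fitted(mod), residuals(mod))
abline(h = 0, lty = 2)

# Add a lowess smoother; it should stay roughly flat if the
# linearity assumption holds
lines(lowess(fitted(mod), residuals(mod)), col = "red")
```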

  • @basharnablsi6984
    @basharnablsi6984 10 years ago +2

    many thanks, your work helps a lot :) it's highly appreciated

  • @russell2016
    @russell2016 5 years ago +1

    Great video! Thanks so much :)

  • @counterlee251
    @counterlee251 5 years ago +1

    great video, I feel like I saved a lot of time by watching this

  • @AbrarAhmed10
    @AbrarAhmed10 8 years ago +2

    Thank you sir, I have an exam on Design of Experiments tomorrow, and this was by far the most helpful video I have seen today.

  • @liliang9462
    @liliang9462 8 years ago

    The best video on the web about learning R. I hope more videos can be displayed.

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +li liang , thanks! we currently have about 50 videos teaching R on our channel, and we plan to continually add more as we have time to develop them. thanks for your support!

  • @MOB_AMG
    @MOB_AMG 10 years ago +2

    Excellent video thanks :)

  • @muandbao
    @muandbao 4 years ago

    much clearer explanations than my professor's! thank u!

    • @marinstatlectures
      @marinstatlectures  4 years ago

      good to hear! we have videos covering pretty much all of intro stats, and more, so feel free to check out more of our content to supplement your own course lecture material :)

  • @himanshu8006
    @himanshu8006 5 years ago +1

    thanks a lot, you are to the point and very clear......

  • @eirinla1140
    @eirinla1140 4 years ago

    You are a hero

  • @alok6381
    @alok6381 8 years ago

    Thank you! Its very helpful

  • @nasimaakter4164
    @nasimaakter4164 4 years ago

    Thanks a lot, really helpful!

  • @dennis_buehl
    @dennis_buehl 3 years ago

    Absolutely brilliant - a must for students.

  • @ashwininarvekar2113
    @ashwininarvekar2113 6 years ago +1

    Videos are very useful. Would highly appreciate it if you could add ridge regression, piecewise, and step function use as well. I could survive my coursework because of your videos.
    Thanks a lot

  • @angeld5093
    @angeld5093 7 years ago

    Thank you very much!

  • @mazoum
    @mazoum 5 years ago +2

    Thank you Mike, It's great work. Can you please upload data for x, y and for xx,yy?

  • @rlpatrao
    @rlpatrao 9 years ago

    Thank you very much for the informative video

  • @yao_barna
    @yao_barna 6 years ago

    Hi Mike, thanks for the video!! I have a question related to model's assumptions:
    Is there any test I can use when running a GLM model, where I meet the homoscedasticity assumption but not normality?
    If it was a linear model like ANOVA or regression I would use a non-parametric test like Kruskal-Wallis; is there something similar for GLM?
    Thanks!!

  • @juhyeonjang4660
    @juhyeonjang4660 5 years ago +1

    You're the best teacher ever... I am paying $ 40,000 a year for nothing...

    • @marinstatlectures
      @marinstatlectures  5 years ago +1

      thanks @Juhyeon Jang. Glad that you find these videos helpful. Please share these tutorials with others and help us reach more people

    • @juhyeonjang4660
      @juhyeonjang4660 5 years ago

      Yes! I will definitely do it. Do you have videos for transformation of regression model in case the assumptions are violated?

  • @yuvenmuniandy8202
    @yuvenmuniandy8202 7 years ago

    Hi Mike, thank you for this informational tutorial. Do you have the data set for the xx and x? I am just replicating this whole tutorial in R. It would be nice to have the data sets for the non-constant variance and non-linear relationships.

  • @dr.shahdathossain1931
    @dr.shahdathossain1931 5 years ago +1

    You are great Mike

  • @gilliancheng558
    @gilliancheng558 10 years ago +2

    The explanations were very clear! Thanks! Just wondering if you have a tutorial on how to transform the data when they violate the assumptions such that we can still proceed with a regression.

    • @marinstatlectures
      @marinstatlectures  10 years ago +4

      Thanks Gillian Cheng ! At the moment, we don't have that created. It's on our to-do list, but there are so many topics we want to cover that it will take time to really round out this series. Until we get to that, here are some suggestions... if there is a non-linear relationship you can try transforming X using ln(x), or you can fit a polynomial model that uses X and X^2 (if you do, it's a good idea to centre X before squaring it), or you can also convert X into a categorical variable (say, if X=age, create age categories to use instead)... making X categorical is a bit of a "work-around", as categorical variables are not subject to the linearity assumption, although this does result in losing some info when making a numeric X into a categorical one. In a later video, I talk about how to interpret the model coefficients for categorical variables in a regression model.
      If the constant variance assumption is not met, you can try "variance-stabilizing" transformations, such as ln(y) or sqrt(y)... the downside of these is that they result in a less interpretable model, as your Y-value in the model is on a different scale than Y itself (e.g., if Y=lung capacity, your model will refer to changes in log-lung-capacity, which is not as interpretable).
      The normality assumption is the least important, and probably fine if you have a large sample size. If the Y's are really non-normal, you may try working with ln(y), or other transformations, although this suffers the same problem of a less interpretable model.
      Once you've tried one of these transformations, you can fit the model using that, then check the assumptions for that model, and see if it has "fixed the problem".
      hope that helps!
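A minimal sketch of the transformations suggested above, on simulated data with a non-linear relationship and increasing variance (the variable names are placeholders, not the video's data):

```r
set.seed(1)
x <- runif(100, 1, 10)
y <- exp(0.3 * x + rnorm(100, sd = 0.3))   # non-linear, non-constant variance

# Variance-stabilizing transformation of Y
mod.log <- lm(log(y) ~ x)

# Polynomial model: centre X before squaring it
xc <- x - mean(x)
mod.poly <- lm(y ~ xc + I(xc^2))

# Re-check the assumptions for the transformed model
par(mfrow = c(2, 2))
plot(mod.log)
par(mfrow = c(1, 1))
```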

  • @natl4519
    @natl4519 7 years ago

    Hi Mike,
    Thanks for the helpful video. However, you don't mention collinearity among independent variables. What's your take on that? Is it correct to use VIFs to remove collinear variables from the model? if so, which threshold would you recommend? I see there are very different opinions online (4, 5, 10, 20).
    Thanks in advance for your time!

  • @ChunLin_UoE
    @ChunLin_UoE 5 years ago

    Very nice video - clearly explained! Thanks very much, Mike. Could you elaborate a bit more on the third and the fourth model diagnostic graphs? Or do you have separate videos later on? Many thanks again!

    • @marinstatlectures
      @marinstatlectures  5 years ago

      Hi, we actually don't elaborate on these anywhere else at the moment. Here is a quick explanation.
      The 3rd plot is very similar to the 1st one. The x-axes are the same for the 1st and 3rd plots... the difference is that in the 3rd, the y-axis is the square root of the absolute value of the (standardized) residuals, rather than the residuals themselves as in the 1st plot. The way I think of it is that taking the absolute values of the residuals is like taking the 1st plot and folding it upward along the y=0 line. By doing this, if there is increasing variance, then the trend line fit to it (shown in red) will move upwards. It is very much the same as the 1st plot, except it makes increasing (or decreasing) variance a bit easier to identify.
      The 4th plot shows a few things... "leverage" is a measure of "influence"... high-leverage observations have a larger influence over the regression equation... the idea is like that of a "lever"... the farther an X value is from the rest of the x-values (if it is outlying in terms of its X value), the more "leverage" it has over the regression line... if it is also outlying in terms of its Y value, then it can tilt the line... this plot can help identify influential observations (those that have a large influence over the regression equation). You can read about things like leverage, influence, Cook's distance, DFBETA, and other related topics.
      Hope that helps clarify a bit... it's a bit difficult to do in only writing/text in a comment section, but hopefully that made sense...
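To study the 3rd and 4th plots on their own, `plot.lm` takes a `which` argument, and the influence measures behind the 4th plot are available directly; a sketch on simulated data (names are illustrative):

```r
set.seed(7)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
mod <- lm(y ~ x)

plot(mod, which = 3)   # scale-location: sqrt(|standardized residuals|) vs fitted
plot(mod, which = 5)   # residuals vs leverage

# Numeric measures of leverage and influence
head(hatvalues(mod))       # leverage
head(cooks.distance(mod))  # Cook's distance
head(dfbeta(mod))          # change in coefficients when each point is dropped
```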

  • @NaveenSrikanthPasupuleti
    @NaveenSrikanthPasupuleti 9 years ago

    Hi Mike, the explanations were very helpful.. highly appreciate your work !!
    Do you have any videos explaining the regression models and assumptions in detail?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Naveen Srikanth , thanks. at the moment, we don't have other videos talking about regression models, aside from the one on simple linear regression (Linear Regression in R (R Tutorial 5.1)) and multiple linear regression (Multiple Linear Regression in R (R Tutorial 5.3))

  • @randomYtuberr
    @randomYtuberr 3 years ago

    Hi Marin, which is the dataset you mention pre-loading at 5:08 in this video to work on non-constant variance, and later on for checking non-linearity? Where's the link to those datasets? Thx

  • @prabpharm07
    @prabpharm07 3 years ago

    Hi Professor, thank you for the great work you have been doing for all these years. I am trying to learn from your tutorials.
    I have two questions, one related to this tutorial, other a more general question about linear regression:
    1. In the residual or the Normal Q-Q plots, a few points were numbered. I believe these are number of observations which are kind of outliers. Is that correct? Also, do we need to do anything about these observations in our model?
    2. I have been working with a small dataset (n=15) to fit a multiple linear model. Now the output shows that F statistics is significant but t statistics of coefficients is not. I did a little research and found that it might be due to multicollinearity. I am supposed to derive a regression equation (a predictive model) based on this analysis. My question is, should I consider the variables with t statistics not significant, in my predictive model? If not, does that mean that I should remove such variables from my model and do the analysis again? Your insight will really be helpful.

  • @ibrahimKhan-nl6of
    @ibrahimKhan-nl6of 1 year ago

    Hi Mike Marin! How can we save/export the results of a regression model to a table in Excel or Word? For example, how do we export the object "mod" in your code from R?

  • @noramirahsalim2508
    @noramirahsalim2508 6 years ago

    can i test the regression when the data are all categorical?

  • @biancawiersema9016
    @biancawiersema9016 5 years ago

    Hi Marin,
    Where can i find the dataset ''dd'' with increased variance?
    Best,
    Bianca

  • @orrfrenkel5229
    @orrfrenkel5229 9 years ago +3

    Thanks a lot for the video! (and for the others you shared).
    I was also wondering: can you mark the confidence intervals around the abline? I've been trying for a long time with little success...

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Orr Frenkel , thanks! sure, you can plot those without too much difficulty. there are many different ways to do it. i will use the *predict* command to get the confidence limits, and then add a smooth line through them using the *smooth.spline* command. there are, of course, other ways to get this done. also worth noting is the difference between the confidence interval (which refers to the mean of the population) and the prediction interval (which refers to individual observations)...i will show how to do both below.
      # make the plot
      plot(x,y)
      # fit the model
      mod <- lm(y ~ x)
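The code in the reply is cut off by the comment page after the `mod` line; a self-contained sketch of the approach it describes (using *predict* for the limits and *smooth.spline* to draw them), with simulated data standing in for the video's x and y:

```r
set.seed(3)
x <- runif(60, 0, 10)
y <- 1 + 0.8 * x + rnorm(60)

plot(x, y)
mod <- lm(y ~ x)
abline(mod, col = "red")

# Confidence limits (for the mean of Y given X)
conf <- predict(mod, interval = "confidence")
lines(smooth.spline(x, conf[, "lwr"]), lty = 2, col = "blue")
lines(smooth.spline(x, conf[, "upr"]), lty = 2, col = "blue")

# Prediction limits (for individual observations): wider than confidence
pred <- predict(mod, interval = "prediction")
lines(smooth.spline(x, pred[, "lwr"]), lty = 3, col = "darkgreen")
lines(smooth.spline(x, pred[, "upr"]), lty = 3, col = "darkgreen")
```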

    • @orrfrenkel5229
      @orrfrenkel5229 9 years ago +1

      That's awesome! I can't thank you enough, really. I'd been trying to draw these lines for hours.

    • @marinstatlectures
      @marinstatlectures  9 years ago

      no problem Orr Frenkel , happy to help

  • @dimplethanvi6363
    @dimplethanvi6363 4 years ago

    Why is it that in the Residuals vs Fitted Values plot, a horizontal flat line suggests that linearity assumption is met? What's the logical explanation for it?

  • @sarbajitg
    @sarbajitg 3 years ago

    Sir, if I am not wrong, I think the 2nd assumption would be "the mean of the Y values can be expressed as a linear function of the X variable"; the term is missing @ 2:48.

  • @lizakittinger1274
    @lizakittinger1274 8 years ago

    is the red line shown at minute 3:47, the lowess line?

  • @andrestellez84
    @andrestellez84 8 years ago +3

    Thanks for the videos, are very helpful.
    Could you please upload the data for x, y, xx, yy.
    I like to repeat what you do. Thank you.

    • @user-fl4ti7ir8e
      @user-fl4ti7ir8e 3 years ago

      did you get the data set? i'm also looking for it....

  • @esabelroncon4761
    @esabelroncon4761 2 years ago

    so at 5:59 the linearity assumption is met but the relationship between x & y needing to be linear is not met?

  • @karolinakaminska8692
    @karolinakaminska8692 3 years ago

    I am having trouble getting abline to work with only 19 samples. The line is not showing up. Do you know why?

  • @faguinhosutel
    @faguinhosutel 6 years ago

    plot(x,y) returns the error: Error in plot(x, y) : object 'x' not found. Were x and y defined beforehand? [RStudio Version 1.1.383 | Macintosh; Intel Mac OS X 10_13_2]

  • @econ8134
    @econ8134 5 years ago

    You showed some nice graphics. I expected to learn more about testing the assumptions numerically.

    • @marinstatlectures
      @marinstatlectures  5 years ago

      The graphical approach is generally more useful. For example, if you were to decide that a relationship is not linear, the plots can help you decide what shape it may be. Similarly for variance... if you decide it is not constant, the plot can help you decide if there is increasing or decreasing variance. Tests are generally overly simplistic, and result in a yes/no answer, but don't offer much in terms of resolving issues.

    • @econ8134
      @econ8134 5 years ago

      True, but tests are comparable. If you work with data that you need for decision making, tests help you in the sense that they are comparable. And by the way, yes/no is mostly what you need for decisions. And I wouldn't reject any graphical approach. The two are actually complementary.
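For readers who want the numeric route discussed in this thread, base R and add-on packages do offer formal tests; a sketch on simulated data (the `lmtest` package is an assumption here, not something used in the video):

```r
set.seed(9)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
mod <- lm(y ~ x)

# Normality of the residuals (base R)
shapiro.test(residuals(mod))

# Constant variance: Breusch-Pagan test from the lmtest package
# install.packages("lmtest")   # one-time install
library(lmtest)
bptest(mod)
```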

  • @sirlujo
    @sirlujo 8 years ago

    Hi Mike, thank you very much for the great series of lectures on R. I am really enjoying the journey. May I ask in which video you created x and y variables that appears in 5.2? Would love to follow it through. Many thanks!

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +sirlujo , you're welcome! i actually didn't create the x/y variables in this video or anywhere else; they come from some other data i had, that i use in the video (i had pre-loaded it into R before recording the video). this data isn't available... only the LungCapData is available, under our video in the "SHOW MORE" section... although i believe you already have this data.

    • @sirlujo
      @sirlujo 8 years ago

      Thank you Mike for your response. Can I ask if you continued these series at other virtual places? Would love to follow your lectures. You are a great lecturer.

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Thanks sirlujo . at the moment, all of our prepared lectures are here on YouTube. we put them here because YouTube has the widest reach, and the videos can be made freely available.

  • @ishfaqkhan7184
    @ishfaqkhan7184 2 years ago

    Can you Please share the data of X and Y and XX and YY used in the video?

  • @forambarot2317
    @forambarot2317 2 years ago

    What if the assumptions are not met? do we call it non-linear model?

  • @yatingli2552
    @yatingli2552 8 years ago

    Thank you for your video. I learned a lot. One question: does the red line on the residuals vs fitted figure show the linearity? If the red line is zigzag, is the linearity assumption not met?

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Yating Li , that is correct. the red line should be approximately linear and flat. small deviations from this are ok, but larger curvature indicates there is a non-linearity. what will give you good practice in learning how to see this is to take a look at some simple x-y scatterplots that have non-linear shapes, and see how those show up in the residual vs fitted plot. the residual plot becomes useful when you have multiple X variables, and can not visualize a scatterplot (because there are too many dimensions) and so you must rely on a residual plot to identify non-linearities.

  • @Lew291
    @Lew291 10 years ago

    Hi, I would like to ask if the first residual plot in the video is considered to have no pattern? Because there seem to be lots of vertical points on the plot.

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Hi Lew291 , yes, and what I mean by that is that there is not a pattern or relationship between the values on the x-axis (the fitted values) and the values on the y-axis (the residuals). if there is a relationship between these two, then that indicates a problem... the fitted values and the residuals should not be associated, and a plot of the two should show no relationship/pattern.
      I guess you could say that there is some sort of "pattern" in that the fitted values are taking on a discrete set of values, but this is happening for a reason. the X in the model, Age, only takes on a set of discrete-integer values, and so the predicted/fitted values from the model will only take on a discrete set of values. This is why the x-axis in the plot has a discrete/limited set of values it takes on.
      if you fit a model using Height instead, and take a look at the residual plot of this model, you will not see the same happening, as Height takes on more than just a few values in this dataset.
      hope that clarifies it for you, thanks for watching!

  • @rajithadanda3796
    @rajithadanda3796 10 years ago

    could you explain how we can solve: "How well can the dependent variable be explained by a multiple regression model including the independent variables along with the interactions between them? Remove (if necessary) the non-significant covariates and/or non-significant interactions from the model until you obtain a final model whose covariates are all significant. Comment on the
    goodness-of-fit of the model."?

  • @ginkohsu
    @ginkohsu 8 years ago

    Hi Mike, how can we test the independence-of-errors assumption for an independent ANOVA??

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Soutine Coconut , you can't test this with statistical methods. it requires knowledge of the study design, and how the data was collected, in order to determine if the y-values (or the errors) are independent.

  • @plot6021
    @plot6021 9 years ago

    hello! I'm wondering if I can test the normality of my residuals using the Jarque-Bera test in R. Is there a command like that in R? thank you! very helpful videos, btw!

    • @marinstatlectures
      @marinstatlectures  9 years ago +1

      Hi Norman Jr. Pamisaran , you can, but you have to install the *tseries* package. just use *install.packages("tseries")*, and then load the library *library(tseries)*, and type *?jarque.bera.test* to see the help menu on how to use it.
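Put together, the commands in the reply look like this (x and y here are simulated placeholders):

```r
# install.packages("tseries")   # one-time install
library(tseries)

set.seed(5)
x <- rnorm(80)
y <- 1 + 2 * x + rnorm(80)
mod <- lm(y ~ x)

# Jarque-Bera test of normality (based on skewness and kurtosis),
# applied to the model's residuals
jarque.bera.test(residuals(mod))
```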

  • @williamstan1780
    @williamstan1780 1 year ago

    Hi Mike
    If a regression model has a high p-value (say 0.2) and a low adjusted R-squared (say 0.002), but its diagnostic plots all look great with no violations of any assumptions, is this regression model practically good to use for prediction?
    Thanks

    • @marinstatlectures
      @marinstatlectures  1 year ago +1

      Definitely not good. Your model can explain less than 1% of variation in the outcome. It’s pretty much the same as guessing the average Y for everyone. It’s useless for prediction.

    • @williamstan1780
      @williamstan1780 1 year ago

      @@marinstatlectures
      Thanks for the reply. What if it has a good adjusted R-squared (say 0.7) but also a high p-value (say 0.2)? Can this regression model be good for prediction?
      Many thanks

  • @akshayrao5894
    @akshayrao5894 10 years ago

    Mike, you said that R creates dummy variables automatically while doing linear regression, but does this also happen when we do non-linear regression analysis?

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Akshay Rao , yes, if you enter a variable that is categorical/factor into a regression command in R, it will create the dummy variables for it. This is true for pretty much all statistical software.
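A quick way to see the dummy coding R creates for a factor (the variable names below are made up for illustration):

```r
set.seed(4)
smoke <- factor(sample(c("no", "yes"), 100, replace = TRUE))
y <- rnorm(100) + (smoke == "yes")

# Entering a factor into lm() expands it into 0/1 dummy variables
mod <- lm(y ~ smoke)
coef(mod)                 # intercept plus a "smokeyes" dummy coefficient
head(model.matrix(mod))   # the design matrix shows the dummy coding
```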

  • @MrMehdies
    @MrMehdies 9 years ago

    Hi Mike, thanks for your nice video. I have a question: does it work for multiple linear regression too? I mean, could I use this video to check my assumptions in multiple linear regression? Best Regards, Mehdi

    • @marinstatlectures
      @marinstatlectures  9 years ago +1

      Hi Mehdi Eslamifar , yes, you can do the exact same for MLR, and interpret the plots in the exact same way. the only difference is that if you find that there is a non-linearity (when checking the residual plot), you will need to inspect the X variables to find which one(s) are non-linear with Y. but yes, it will work the exact same way.

    • @MrMehdies
      @MrMehdies 9 years ago

      MarinStatsLectures Thanks for your quick reply.

  • @fitzwilliamlyon8401
    @fitzwilliamlyon8401 8 years ago

    Great tutorial. Just one correction: error and residual are different things.

    • @marinstatlectures
      @marinstatlectures  8 years ago +1

      Hi +Fitzwilliam Lyon, yes. There is a slight difference between the two: the "error" refers to the deviation from the 'true' mean (the 'true' regression line, which is a theoretical idea), while the "residual" refers to the deviation from the 'estimated' mean (the estimated regression line). The two are often used interchangeably, but you are correct that one refers to the 'true' values (epsilon) and the other to the estimates from your data (e).

  • @TheMadKaz
    @TheMadKaz 9 years ago

    If I were looking at fertiliser effects on crop yield and I had 3 fertilisers, how would I do QQ plots for each one to check normality?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi TheMadKaz, you can use *qqnorm(x)* to produce the plot (inserting your variable name in place of x, of course), and *qqline(x)* to add the line the observations should fall on if they are normally distributed. Also worth noting: if you have a fairly large sample size, then you don't need to worry about normality, provided the mean is the estimate you're interested in.
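A minimal sketch (simulated data) of *qqnorm()* and *qqline()* applied to the residuals of a fitted model:

```r
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
model <- lm(y ~ x)

qqnorm(residuals(model))   # points should fall close to the line
qqline(residuals(model))   # reference line for a normal distribution
```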

    • @TheMadKaz
      @TheMadKaz 9 years ago

      MarinStatsLectures That's great thank you!

    • @TheMadKaz
      @TheMadKaz 9 years ago

      One more question. How do I get R to recognise Fertiliser 1, 2, etc. as individual variables so I can create the graph for each one?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi TheMadKaz, you can use square brackets to subset, e.g. *qqnorm(x[Fertilizer=="1"])* for fertilizer 1, then *qqnorm(x[Fertilizer=="2"])* for fertilizer 2, and so on.
      I have a video on using square brackets to subset data; if you want to learn more about that, you can check it out.
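A sketch of per-group QQ plots via square-bracket subsetting (the variable names `Yield` and `Fertilizer` are hypothetical, and the data are simulated):

```r
set.seed(1)
Yield      <- rnorm(60, mean = 20)
Fertilizer <- factor(rep(c("1", "2", "3"), each = 20))

# One QQ plot (with reference line) per fertiliser level.
par(mfrow = c(1, 3))
for (f in levels(Fertilizer)) {
  qqnorm(Yield[Fertilizer == f], main = paste("Fertilizer", f))
  qqline(Yield[Fertilizer == f])
}
```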

  • @user-dm8zp9ru8h
    @user-dm8zp9ru8h 2 years ago

    Bro, I just needed RStudio to solve for x in my regression model for an assignment, and now I'm somehow majoring in data science.

  • @mehdicharife2335
    @mehdicharife2335 1 year ago

    4:06

  • @Frenchkisssss
    @Frenchkisssss 4 years ago

    I work in a contact centre and I have a daily Service Level = 70% of calls answered within 30 seconds. My X variables are Volume, AHT, Staffing level, OT, and external support; my R² = 0.50. I'm trying to predict future Service Levels using multiple linear regression, but I'm getting that big cloud in residuals vs fitted. What do you suggest? What model should I use to predict my Service Level?

    • @marinstatlectures
      @marinstatlectures  4 years ago +1

      Hi, it isn't really possible for me to answer a question like this with so little information; instead, I can make some general statements. First, having a cloud of residuals is a good thing: when a pattern shows up in a residual plot, it indicates that certain assumptions are not met (depending on the shape of the pattern), and when it is just a cloud with no real pattern, this indicates that the assumptions for linear regression are met. So this isn't a problem, it's a good thing.
      Regarding the type of model, it isn't fully clear to me how your Y variable is measured. Linear regression is mainly used for a Y variable that is numeric and continuous. If instead you are looking at an outcome variable that takes on a 0/1 value (yes/no), then logistic regression is a good option to consider as a starting point.
      If you can clarify a bit more about your Y variable, I may be able to offer some more specific advice.
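A sketch of that suggestion with simulated data and hypothetical variable names: if the outcome were 0/1 (daily target met / not met), logistic regression via *glm()* would be a natural starting point instead of linear regression:

```r
set.seed(1)
staffing <- runif(200, 10, 50)
volume   <- runif(200, 100, 500)
p        <- plogis(-2 + 0.15 * staffing - 0.005 * volume)
met_sl   <- rbinom(200, 1, p)           # 1 = service-level target met that day

fit <- glm(met_sl ~ staffing + volume, family = binomial)
summary(fit)

# Predicted probabilities of meeting the target, one per day.
head(predict(fit, type = "response"))
```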

    • @Frenchkisssss
      @Frenchkisssss 4 years ago

      Hi, thank you for getting back to me. My Service Level represents the proportion of calls answered within 30 seconds; our target is 70%. We have to make sure we answer 70% of our inbound calls within 30 seconds every day.
      My Service Level is influenced by multiple variables: Call Volume, Average Handle Time, Staffing Level, Overtime, and External Support. Staffing Level is my most significant variable; if I don't have enough people staffed, I won't be able to reach my 70% goal.
      I ran a predictive analysis in R using 1 year of historical data with y(SL) = intercept + aX(Volume) + bX1(Staffing) ...
      So I have 2 questions:
      - Is it OK for me to use a multiple linear regression model to predict my future SL, assuming that I know the variables?
      - If not, do you know a better statistical model to predict that?
      Thank you again for your help.

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago

    Hi, where are you? Why don't you make new videos on R? Please post new videos. Thanks.

  • @TheGladiator29
    @TheGladiator29 7 years ago +2

    What are x, y, xx, and yy?
    Could you please reply ASAP?

  • @cyanide4u539
    @cyanide4u539 5 years ago

    Hi, if you remember, I requested videos on non-parametric tests and concepts. When will you upload those? I have an exam soon and am having difficulties with non-parametric methods. Please help.

    • @marinstatlectures
      @marinstatlectures  5 years ago +1

      Hi, we are completing the final edits on one of them, and will be releasing it today

    • @marinstatlectures
      @marinstatlectures  5 years ago

      you can watch the video here : goo.gl/d1WiXn

    • @cyanide4u539
      @cyanide4u539 5 years ago

      @@marinstatlectures I am watching it

  • @11Astonmartin
    @11Astonmartin 4 years ago

    Took me 7 minutes to understand what I didn't in 2 months.

  • @raulmartin196
    @raulmartin196 4 years ago

    Well, silly question: can anybody tell me how to type the "~" symbol?

    • @marinstatlectures
      @marinstatlectures  4 years ago

      On a QWERTY keyboard, it is the key in the upper left-hand corner. Just hold down the Shift key and hit the key with "~" on it.

  • @erchiwang1119
    @erchiwang1119 5 years ago

    The conscience of the industry (genuinely high-quality free teaching).
