Multiple Linear Regression in R | R Tutorial 5.3 | MarinStatsLectures

  • Published 7 Sep 2024
  • Multiple Linear Regression Model in R with examples: Learn how to fit the multiple regression model, produce summaries and interpret the outcomes with R! 💻 Find the free Dataset & R Script here (statslectures.... We recommend you first watch the video on simple linear regression concept ( • Simple Linear Regressi... ) and in R ( • Simple Linear Regressi... ) 👍🏼Best Statistics & R Programming Tutorials: ( goo.gl/4vDQzT )
    ►► Like to support us? You can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment, Give us a Like or Write us a Review! Either way, We Thank You!
    In this R video lecture you will learn to use the "lm", "summary", "cor", and "confint" functions, among others. You will also learn to use the "plot" function for producing residual and QQ plots in R.
    We recommend that you first watch our videos on the concept of simple linear regression ( • Simple Linear Regressi... ) and simple linear regression with R ( • Simple Linear Regressi... )
    ◼︎ Table of Contents:
    0:00:07 Multiple Linear Regression Model
    0:00:32 How to fit a linear model in R using the "lm" function
    0:00:36 How to access the help menu in R for multiple linear regression
    0:01:06 How to fit a linear regression model in R with two explanatory or X variables
    0:01:19 How to produce and interpret the summary of linear regression model fit in R
    0:03:16 How to calculate Pearson's correlation between the two variables with R
    0:03:26 How to interpret the collinearity between two variables in R
    0:03:49 How to create a confidence interval for the model coefficients in R using the "confint" function
    0:03:57 How to interpret the confidence interval for our model's coefficients in R
    0:04:13 How to fit a linear model using all of the X variables in R
    0:04:27 How to check the linear regression model assumptions in R by examining plots of the residuals or errors using the "plot(model)" function
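The steps in the table of contents can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data frame `dat` below is simulated as a stand-in for the lung capacity dataset (the numbers are made up, not the real data), and the object names `model1`, `model2`, and `ci` are hypothetical.

```r
# Simulated stand-in for the lung capacity data (hypothetical values, NOT the real dataset)
set.seed(1)
n <- 200
Age <- round(runif(n, 3, 19))
Height <- 45 + 1.6 * Age + rnorm(n, sd = 3)
LungCap <- -11 + 0.12 * Age + 0.28 * Height + rnorm(n, sd = 1)
dat <- data.frame(LungCap, Age, Height)

# Fit a linear model with two explanatory (X) variables
model1 <- lm(LungCap ~ Age + Height, data = dat)

# Coefficients, standard errors, t-tests, R-squared and the overall F-test
summary(model1)

# Pearson's correlation between the two explanatory variables
cor(dat$Age, dat$Height, method = "pearson")

# 95% confidence intervals for the model coefficients
ci <- confint(model1, level = 0.95)
ci

# Fit a model using all of the other columns as X variables
model2 <- lm(LungCap ~ ., data = dat)

# Residual and QQ plots for checking assumptions (run interactively to view them)
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(model1)
}
```

With the real data, `dat` would be replaced by the data frame read from the downloaded file.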
    ►► Watch More:
    ►Linear Regression Concept and with R bit.ly/2z8fXg1
    ►R Tutorials for Data Science bit.ly/1A1Pixc
    ►Getting Started with R (Series 1): bit.ly/2PkTneg
    ►Graphs and Descriptive Statistics in R (Series 2): bit.ly/2PkTneg
    ►Probability distributions in R (Series 3): bit.ly/2AT3wpI
    ►Bivariate analysis in R (Series 4): bit.ly/2SXvcRi
    ►Linear Regression in R (Series 5): bit.ly/1iytAtm
    ►ANOVA Concept and with R bit.ly/2zBwjgL
    ► Intro to Statistics Course: bit.ly/2SQOxDH
    ►Statistics & R Tutorials: Step by Step bit.ly/2Qt075y
    This video is a tutorial for programming in R Statistical Software for beginners, using RStudio.
    Follow MarinStatsLectures
    Subscribe: goo.gl/4vDQzT
    website: statslectures.com
    Facebook:goo.gl/qYQavS
    Twitter:goo.gl/393AQG
    Instagram: goo.gl/fdPiDn
    Our Team:
    Content Creator: Mike Marin (B.Sc., MSc.) Senior Instructor at UBC.
    Producer and Creative Manager: Ladan Hamadani (B.Sc., BA., MPH)
    These videos are created by #marinstatslectures to support some courses at The University of British Columbia (UBC) (#IntroductoryStatistics and #RVideoTutorials for Health Science Research), although we make all videos available to everyone everywhere for free.
    Thanks for watching! Have fun and remember that statistics is almost as beautiful as a unicorn!

COMMENTS • 167

  • @marinstatlectures
    @marinstatlectures  5 years ago +8

    In this tutorial, you will learn to fit a multiple linear regression model in R, produce and interpret the summary of the model fit, produce residual and QQ plots, calculate Pearson's correlation between two variables, interpret the collinearity between two variables, create confidence intervals for the model coefficients, check the linear regression model assumptions, and more. Download Practice Dataset: (bit.ly/2rOfgEJ); Like to support us? You can Donate statslectures.com/support-us or Share our Videos and help us reach more people!

  • @amitchandak3304
    @amitchandak3304 4 years ago +4

    Your videos are very insightful and easy to follow. I'm paying $$$ for an R course in college; I could have learned by following your videos instead and saved my money!

  • @AbheeBrahmnalkar
    @AbheeBrahmnalkar 3 years ago +7

    This entire series was just LOVELY! Thank you for providing clarity on interpreting the various attributes of the lm model.

  • @marinstatlectures
    @marinstatlectures  10 years ago +11

    Free video on how to fit and interpret output from a multiple linear regression model in R: Multiple Linear Regression in R (R Tutorial 4.12) #stats #rstats

    • @mdev1187
      @mdev1187 10 years ago +1

      Good series, thanks.

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Glad you enjoyed them!

    • @saikumarallaka
      @saikumarallaka 10 years ago

      MarinStatsLectures Where can I find the datasets you are using?

    • @marinstatlectures
      @marinstatlectures  10 years ago

      saikumar allaka You can find the dataset in the "About" section of the video, as well as the "About" section of our channel

    • @saikumarallaka
      @saikumarallaka 10 years ago

      MarinStatsLectures Thank you!! .. Can you explain more on logistic regression..

  • @Fletacarling
    @Fletacarling 5 years ago +4

    these videos have literally saved my life for every homework assignment!! Cheers from a linguistics grad student

  • @pschoi92
    @pschoi92 3 years ago +2

    I have struggled so much in learning R. Your tutorials made learning easy. Thank you!

  • @shakuntalasharma2775
    @shakuntalasharma2775 10 years ago +10

    I have gone through all tutorials. Very clear, well organized. Very helpful.

  • @guannanzou105
    @guannanzou105 8 years ago

    The good part of this video is that it gives very specific details and steps on how to build a model and produce plots in R. On the other hand, the video seems too complex in some simple parts, and it does not discuss much of the background information. But as a learning guide, it seems very useful and efficient for those who are just starting to get used to modelling in R.

  • @indzz20
    @indzz20 2 years ago +1

    Thanks for these videos! Saved me when I was doing my essay that required data analysis🙌

  • @DH1979amsterdam
    @DH1979amsterdam 10 years ago +3

    I've done most tutorials so far. They are clear and well made. Compliments!

  • @KJ-jq1fq
    @KJ-jq1fq 1 year ago

    I successfully tuned a linear model to an RMSE of 8 while watching this video. Thank you!

  • @bernardraath7442
    @bernardraath7442 3 years ago

    Mike, that help function you showed at start is so helpful as a beginner. Thank you.

  • @ZakharovInvest
    @ZakharovInvest 6 years ago +1

    You are just awesome!! I wish I had you as my econometrics professor at school! I am falling in love with econometrics and R language

  • @user-yz7pe3zh7d
    @user-yz7pe3zh7d 3 years ago

    it's very clear, understandable, beautiful explanation. From Japan.

  • @saulflores5052
    @saulflores5052 3 years ago

    Thank you! Much clearer than my professor.

  • @siavashaa
    @siavashaa 10 years ago +2

    Just WOW! What an amazing collection you have made! Best videos on Stats.... :)

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Thanks! We'll be working on general intro stats videos over the summer, so keep an eye out for those :-)

  • @mustafanasiri6247
    @mustafanasiri6247 7 years ago +2

    Very useful tutorial and a wonderful, clear and concise video. What if an estimate in a multiple regression model is negative while its p-value is still significant? How do you interpret a negative estimate for one of the independent variables? Could you please explain a bit?

  • @jiangyanyi1345
    @jiangyanyi1345 8 years ago

    Thank you so much for the great videos. I do not understand what it means when the p value is not significant in the example where age and height are in the model. But later there is a high value of correlation between age and height. Thank you.

  • @sharonregev7266
    @sharonregev7266 2 years ago +1

    Thank you very much!!! A quick question. Since we’re writing the hypotheses for the F-test as: H0: β1 = β2 = ... = βk = 0, is it possible to use H0:βk=0 for the t-test (instead of H0: βj=0)? Thank you :)

  • @user-hu7ov6fi9y
    @user-hu7ov6fi9y 8 years ago

    Hello Martin, Thanks a lot for your vlogs. They helped me a lot to get a grip on "R". Since I am new to "R", I have a basic question. I have to forecast the number of smart navigation system (Explained Variable) for several countries up to 2020. My explanatory variables are the number of cars and GDP per head in each given country. My explained variables are Austria_SNS, Belgium_SNS, Canada_SNS etc. with 3 data points for each country (2013,2014 and 2015). My explanatory variables are Austria_Cars, Belgium_Cars, Canada_Cars, Austria_GDP, Belgium_GDP, Canada_GDP etc with data from 2000 up to 2020. I would like to run the same model for all individual countries in one go replacing the name of countries with a string (something like X_SNS= X_Cars+X_GDP where X= Austria, Belgium, Canada etc). Thanks for your time in advance, Qmars

  • @elenourchi7492
    @elenourchi7492 3 years ago +2

    Hello there, great video! I have a question regarding when to omit a variable. On what grounds do we omit an explanatory variable for a multiple linear regression model? Also, if there is an explanatory variable that is a categorical factor will there be 2 equations for the model? Thanks for your help!

  • @yueqiu1656
    @yueqiu1656 5 years ago +1

    Thanks!!! It really helps my statistic final project!!

  • @tomanderson1588
    @tomanderson1588 5 years ago +1

    Thanks for the useful video, Mr Marin.

  • @Bonzari
    @Bonzari 10 years ago +1

    Thank you. I will be using R for work, and it was a good stat refresh as well.

    • @Bonzari
      @Bonzari 10 years ago

      I look forward to more videos on more advanced topics.

    • @marinstatlectures
      @marinstatlectures  10 years ago

      you're welcome, glad you find them useful! we have more in the works, and will be producing them as we find the time to do so :-)

    • @Bonzari
      @Bonzari 10 years ago

      MarinStatsLectures
      Any chance you can do a video on Random Forest?

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Hi Michael Tilrico , i will add it to our to-do list, and hopefully we can get to it some time sooner than later. we have a bit of a list of other video topics on the current slate, and will work on those first over the coming summer.

  • @nicholasbui5952
    @nicholasbui5952 4 years ago

    How would you interpret the multiple R-squared value if, say, height was not significant but age was?

  • @oacho3
    @oacho3 9 years ago +1

    Hi, thank you so much! So nice of you to share your understanding of stat with the public. I was running some model to explain metabolic rate as a function of 3 variables and their interactions. When i look at the summary table those variables appear not significant. Instead when i look at the ANOVA table those variables become significant. Is this normal or am I missing something? I would like to hear your perspective on this. Thanks!

  • @LemesCristiano
    @LemesCristiano 9 years ago +1

    Very helpful and well explained !

  • @Bubblegan
    @Bubblegan 5 years ago +1

    Awesome video. Very informative. Thank you.

    • @marinstatlectures
      @marinstatlectures  5 years ago

      @Liberty you are very welcome! glad you found it helpful.

  • @cassandramah9786
    @cassandramah9786 4 years ago

    How do we report the model's overall significance using the F-test? If we report using APA:
    F(2, x ) = 1938, p

  • @uttonio
    @uttonio 3 years ago +2

    Hi Marin, I wanna ask at 3:13 you said: "This is the hypothesis that the slope for Height is 0." Is that a null hypothesis? So when the p-value is below 0.05, this hypothesis is rejected right?

    • @uttonio
      @uttonio 3 years ago

      I am in dire need of your help. Research deadline comes around the corner.

    • @marinstatlectures
      @marinstatlectures  3 years ago +1

      That’s correct. Ho: the slope is 0, Ha: the slope is not 0. A small p-value allows you to reject Ho and gives you evidence that Ha is likely true.

  • @defneselenakdemir4592
    @defneselenakdemir4592 3 years ago

    I am trying to do class() but it says object not found even though it shows a summary of the dataset. What can I do? Please helpp

  • @somahousein9008
    @somahousein9008 7 years ago

    Thank you for your great videos! I am looking for your video where you mean centre height and age. I am not sure if I missed it but I have watched all the videos up to 5.12 and was not able to hear where you mention it.

  • @lexirene2286
    @lexirene2286 10 years ago +1

    could you do a video on logistic regression? and choosing variables to make-up the model? this video was extremely informative!

    • @marinstatlectures
      @marinstatlectures  10 years ago +1

      Hi Lexi Rene , thanks! We're planning on extending the series to include Logistic regression, Poisson regression, etc, but that will take some time. You can fit a logistic regression in R using the command: *model

  • @syedsaadali9558
    @syedsaadali9558 5 years ago

    Thank you so much. It's very helpful. Please explain multivariate regression analysis.

    • @marinstatlectures
      @marinstatlectures  5 years ago

      Hi, this question is way too large to be able to answer here in a comment section. If you had a specific question, I'd be happy to try and answer that.

  • @faraym5102
    @faraym5102 6 years ago

    Hi Martin, in this video you have more than one independent variable. What if we have more than one dependent variable (e.g. three dependent variables)? What kind of statistical analysis can be performed on that type of data?
    Thanks,
    Faray

  • @superherrera
    @superherrera 7 years ago

    You are really good , thanks for your videos.

  • @arinsadeghi3155
    @arinsadeghi3155 8 years ago

    Thanks a lot,
    You're a great teacher.

  • @sdgarcia34
    @sdgarcia34 5 years ago +1

    These are all great and very informative videos! Thank you very much for posting them. Still, I am struggling a bit. I would like to check the correlation of, let's say, 8000 genes in 10 different patients that I have ordered from less sick to more sick (let's say by percentage of neuronal loss). Is there any way I can get the correlation of the expression of these genes with the sickness of the patient by looking at r? (e.g. 0.99 would be a gene that correlates better with a worse pathology, right?) If so, can I check the r for each gene using R?

  • @ImranKhan-fu1fu
    @ImranKhan-fu1fu 8 years ago

    Hi Marin, thank you for the wonderful videos. You have great lecture delivery skills and your voice sounds nice. I would like to request a few videos on logistic regression (binary and multinomial) with both numeric and categorical predictors, covering interpretation of coefficients, odds ratios, and graphics. Thanks

    • @marinstatlectures
      @marinstatlectures  7 years ago

      thanks for your comment +Imran Khan , we plan on adding videos for logistic regression (and other generalized linear models) when we have time to create them. it is a topic that is high up on our priority list, although finding the time to write, record, and edit the videos is the challenge!

  • @manizhehra.8452
    @manizhehra.8452 9 years ago

    Thank you, very helpful.
    I have a question regarding the F-statistic. As you know, when a linear model has no intercept, it is treated as an nls model in R, so we won't have an F-statistic in the summary of the model. How can we calculate this statistic in R for a no-intercept linear regression?

  • @nhlanhlawandile6600
    @nhlanhlawandile6600 4 years ago

    Hi. How can I manipulate the input variables' data in MLR to get a higher adjusted R-squared?

  • @LLOOTTII19951
    @LLOOTTII19951 6 years ago

    Hi, at min 2.13 you are talking about centering the x variables (here age and height) to get a better/different intercept. Where did you explain this? It would be wonderful to get help with this! Thank you

  • @brandonsignorino7273
    @brandonsignorino7273 1 year ago

    The links in the description to the script and dataset are broken. I was able to find the LungCapData2 file on the site, but do you have a link to the script in this video?

  • @MrIt7
    @MrIt7 10 years ago +1

    Great video! What software and hardware are you using to make it? I imagine the video is recorded separately from the nice highlighting? Also, what resolution is it? It looks HD but plays back very nicely on my moderate internet connection. Thanks.

  • @caseybarringer8902
    @caseybarringer8902 4 years ago

    How do you identify the SSE, SSR, and SST values from these outputs?

  • @ashwathraj6893
    @ashwathraj6893 7 years ago

    Hi. After creating the model I want to give an input for each variable. Since height and age are numeric, they are given as numbers, BUT how do I give input for Gender, Smoke, and Caesarean, since these are strings?

  • @leearcher7459
    @leearcher7459 9 years ago

    Hello ! These video clips really helped a lot ! I want to ask a question about how to use R to deal with linear regression problems contain absolute values as following ?
    min 2x1 + 3|x2 − 10|
    s.t. |x1 +2|+|x2|≤15.

  • @mukhlesahrimawi5001
    @mukhlesahrimawi5001 4 years ago

    my Goooood you guys are awesome. thank u thank u thaaaaank you.

  • @itiabuht
    @itiabuht 7 years ago

    Hi
    I've fitted a regression model to data that has one indicator variable, coded 1 and -1. My question is: how can I show the equation for y when the indicator variable is 1 and when it is -1?

  • @91ticktock
    @91ticktock 10 years ago

    Amazing video, sir. Can you please help me out on how to find the Explained sum of squares of the model without using the ANOVA table? I know how to calculate the Residual Sum of Squares, but not the explained sum of squares. Thank you very much in advance.

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Hi Syafiq Din , sure. I'm not sure why you don't want to just read it off the ANOVA table, i.e. *anova(model)*, and look at it directly from there. But you can work it out from the model summary if you like, using the facts that *R-square = 1 - (ResidualSS/TotalSS)* and *TotalSS = ResidualSS + ModelSS*.
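The two identities in this reply can be turned into a short calculation. The sketch below uses simulated data and hypothetical object names (`model`, `residual_ss`, `model_ss`), and cross-checks the result against the ANOVA table.

```r
# Simulated example data (hypothetical, for illustration only)
set.seed(2)
x1 <- rnorm(50)
x2 <- rnorm(50)
y <- 1 + 2 * x1 - x2 + rnorm(50)
model <- lm(y ~ x1 + x2)

# Residual sum of squares and R-squared, both available from the fitted model
residual_ss <- sum(residuals(model)^2)
r_squared <- summary(model)$r.squared

# Rearranging R-square = 1 - ResidualSS/TotalSS gives TotalSS
total_ss <- residual_ss / (1 - r_squared)

# Explained (model) sum of squares from TotalSS = ResidualSS + ModelSS
model_ss <- total_ss - residual_ss

# Cross-checks: total SS computed directly, and model SS summed from the ANOVA table
# (the last row of the table is the residual row, so it is dropped)
total_ss_direct <- sum((y - mean(y))^2)
model_ss_anova <- sum(head(anova(model)[["Sum Sq"]], -1))
c(model_ss = model_ss, model_ss_anova = model_ss_anova)
```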

  • @lilyhwangathotmail
    @lilyhwangathotmail 8 years ago

    @3:00,.... Stats question, regarding the t-statistics for Age and Height (ignoring that the intercept is technically at -11.74). Does this mean technically that because the t-value for Age and Height are 3.45e-12 and

    • @marinstatlectures
      @marinstatlectures  8 years ago +1

      Hi +lilyhwangathotmail , sort of....the general idea you are getting at is correct, although there are a lot of little 'errors' in the way you're phrasing things. it's the p-values that are 3.45e-12 and

  • @hafsashahzad7480
    @hafsashahzad7480 2 years ago

    Hi. Is there any way to perform multiple linear regression on raster time series images?

  • @TheCooPeer
    @TheCooPeer 5 years ago

    Very helpful tutorials! Thanks for helping me understand R better.
    Do you know how to cluster standard errors by a certain variable easily? I struggle understanding most of the answers I can find..

    • @marinstatlectures
      @marinstatlectures  5 years ago

      Hi, I'm not exactly clear on what you mean by "cluster SEs by a certain variable"...if you can clarify that for me, I may be able to offer some suggestions.

  • @akshayrao5894
    @akshayrao5894 10 years ago

    If the t-value of age is negative and the p-value is also small, then what should we do with the variable age in our linear model?

    • @marinstatlectures
      @marinstatlectures  10 years ago

      If the p-value is small (the t-stat is large, in absolute value), then this tells you that the variable age is a significant predictor of the outcome (Y), and it would probably be kept in your model. Of course, it depends on the goal of your model (whether it is a predictive model you're building, or a causal model, etc.), but in general, the small p-value tells you that age is a significant predictor of lung capacity.

  • @panagiotisgoulas8539
    @panagiotisgoulas8539 5 years ago

    Mike, is there any command where I can use that gives me a summary of my independent variable in relation with each possible combination of independent ones, so I can examine over fitting and multi col ? Also does it give mallows cp and vifs? I don't even know how to express this so to google it, but I saw a video of someone in minitab doing that so I am pretty sure R should have that option

  • @tinaw2740
    @tinaw2740 7 years ago

    Could you explain a little bit more about the collinearity? Like in this example, since height and age are highly related, why not just use one to explain the Y?thx

    • @marinstatlectures
      @marinstatlectures  7 years ago +1

      Hi +cherry d , it is too difficult to explain a complicated concept like that in a discussion thread. But a short answer is that IF age and height are VERY highly correlated, we would/should use only one in the regression equation (if they contain almost the same information). If they are just fairly highly correlated, but not collinear (not nearly inseparable), then we can use both of them. I'd suggest reading more about collinearity (and things like VIF = variance inflation factor, Tolerance, ...).
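As a rough illustration of the VIF and Tolerance mentioned in this reply: for a given predictor, VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors, and Tolerance = 1 / VIF. The sketch below computes both by hand on simulated data; the variables and values are hypothetical, not the real dataset.

```r
# Simulated, strongly correlated predictors (hypothetical values)
set.seed(3)
n <- 200
Age <- runif(n, 3, 19)
Height <- 45 + 1.6 * Age + rnorm(n, sd = 3)

# R-squared from regressing one predictor on the other predictor(s)
r2_age <- summary(lm(Age ~ Height))$r.squared

# Variance inflation factor and tolerance for Age
vif_age <- 1 / (1 - r2_age)
tolerance_age <- 1 / vif_age

c(correlation = cor(Age, Height), VIF = vif_age, tolerance = tolerance_age)
```

With only two predictors, the R² used here is just the squared Pearson correlation between them, so both predictors share the same VIF.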

    • @rockstarvarun1995
      @rockstarvarun1995 7 years ago

      Love this..cleared my doubt as well!!

  • @HarpreetKaur-bx1ej
    @HarpreetKaur-bx1ej 2 years ago

    Is getting an R-squared of 0.0000845 good or bad? Please help

  • @melissajoyabrahams9976
    @melissajoyabrahams9976 4 years ago

    how do i find the y-intercept, when I need a certain value of age for example?

  • @yatingli2552
    @yatingli2552 8 years ago

    Hello, I have a question. What is the difference between using regression diagnosis model and shapiro.test to figure out the normality of distribution?
    Thanks.

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Yating Li , they're just two different ways of trying to do the same thing. The diagnostic plot is visual, and more open to interpretation. The Shapiro test is a formal hypothesis test (with Ho: it's normally distributed). Worth noting is that failure to reject the null is not the same as proving the null...
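To make the comparison concrete, here is a minimal sketch that applies both approaches to the residuals of a fitted model. The data are simulated and the object names are hypothetical.

```r
# Simulated data with normally distributed errors (hypothetical example)
set.seed(4)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
model <- lm(y ~ x)
res <- residuals(model)

# Formal test: Ho is that the residuals come from a normal distribution
sw <- shapiro.test(res)
sw$p.value

# Visual check: points close to the line suggest approximate normality
if (interactive()) {
  qqnorm(res)
  qqline(res)
}
```

A large p-value here only means there is no evidence against normality; it does not prove the residuals are normal.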

  • @ruthnante2322
    @ruthnante2322 4 years ago

    very helpful tutorials

  • @callumfoster10
    @callumfoster10 2 years ago

    How do you write this up in a results section?

  • @playthedoghouse
    @playthedoghouse 8 years ago

    Fantastic videos! If it weren't for this channel I would not be doing so well in my Biometry class. What about when you have continuous variables on both the X and Y? Do you have any videos where you check the assumptions of MLR with continuous data? I have a homework data set that I'm working through and when I run the autoplot command, I see on my Residuals vs Fitted plot that there is a slight wedge shape that increases along the x axis. My Q-Q plot looks good with only three outliers and my Cook's D plot looks good and shows the three outliers with values under 0.6. I'm just not sure if I need to transform my data or not.

    • @marinstatlectures
      @marinstatlectures  8 years ago

      good to hear you're finding our videos helpful +playthedoghouse ! sure, we have a video on checking the assumptions of linear regression. you can find it here: ua-cam.com/video/eTZ4VUZHzxw/v-deo.html

  • @markgoh4800
    @markgoh4800 10 years ago

    Great video. Just wondering, how do you find the residual sum of squares and estimate of the error variance (if possible) based on this summary output?Thank you!

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Hi Mark Goh , sure you can find those in the R output. Below the model coefficients you can see the "Residual Standard Error". The error-variance (which goes by many different names, like MSE, etc) would be the square of the Residual Standard Error.
      You can also use the output to work out what the Residual SumOfSquares is. You can take note that the Residual Standard Error = sqrt(ResidualSumOfSquares/DFresidualSumOfSquares). You can find the DF for the residual sum of squares right next to the Residual Standard Error in the R output.
      You can also just ask for the ANOVA table for the model using *anova(model)*, substituting in the name of the object you saved the model in. This will return the ANOVA table with all of the sum of squares, mean squares, etc.
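The relationships described in this reply can be checked numerically. The sketch below, on simulated data with hypothetical object names, recovers the error variance and the residual sum of squares from the Residual Standard Error and its degrees of freedom.

```r
# Simulated example data (hypothetical, for illustration only)
set.seed(5)
x1 <- rnorm(60)
x2 <- rnorm(60)
y <- 1 + x1 + 0.5 * x2 + rnorm(60)
model <- lm(y ~ x1 + x2)

# Residual Standard Error and its degrees of freedom, as printed by summary(model)
rse <- summary(model)$sigma
df_res <- df.residual(model)

# Error variance (MSE) is the square of the Residual Standard Error
error_variance <- rse^2

# Rearranging RSE = sqrt(ResidualSS / DF) gives the residual sum of squares
residual_ss <- error_variance * df_res

# Compare with the value computed directly from the residuals
c(from_rse = residual_ss, direct = sum(residuals(model)^2))
```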

  • @kathrynmatencio9149
    @kathrynmatencio9149 5 years ago

    What if you use Gender instead of height? How do you fit separate linear regression model for each gender?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      if you include Gender, you will get a model coefficient for gender, which essentially creates 2 separate lines, one for males and one for females.
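A small sketch of what "2 separate lines" means here, using simulated data with hypothetical values and names. With Gender entered as a main effect only, the two lines share a slope for Age and differ in their intercepts.

```r
# Simulated data (hypothetical values, not the real dataset)
set.seed(6)
n <- 100
Age <- runif(n, 3, 19)
Gender <- factor(sample(c("female", "male"), n, replace = TRUE))
LungCap <- 1 + 0.5 * Age + 0.8 * (Gender == "male") + rnorm(n, sd = 0.5)

model <- lm(LungCap ~ Age + Gender)
b <- coef(model)  # named "(Intercept)", "Age", "Gendermale"

# "female" is the reference level, so its line uses the intercept as-is
intercept_female <- b["(Intercept)"]
# The Gender coefficient shifts the intercept for males
intercept_male <- b["(Intercept)"] + b["Gendermale"]
# Both lines share the same Age slope
slope <- b["Age"]

c(female = unname(intercept_female), male = unname(intercept_male), slope = unname(slope))
```

If the slopes should also be allowed to differ by gender, an interaction term (for example `LungCap ~ Age * Gender`) would be one way to do that.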

  • @mindatadesse7122
    @mindatadesse7122 5 years ago

    THANK U FOR UR VALUABLE INF.

  • @anonymouse559
    @anonymouse559 7 years ago

    Very helpful! Thank you!

  • @sda2115
    @sda2115 3 years ago

    How do I perform a MLR with a dataset that contains continuous AND categorical variables? Or could anyone point me to a video explaining this

    • @marinstatlectures
      @marinstatlectures  3 years ago

      This video shows exactly that. Age and height are numeric, smoke, gender, and caesarean are all categorical variables

  • @Dopserados
    @Dopserados 8 years ago

    Hi Mike,
    I wonder what smoker means in this case?
    Because the ages given run from 0 to 25 years, and I don't think children below 16 years should smoke or are allowed to.
    So does Smoker=yes mean that the woman was smoking during her pregnancy?
    Or where is that data from?

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Dopserados , the Smoke variable is an indicator of whether they identify themselves as smoking: 0=no, 1=yes. The ages are 3-19. The data is simulated data from a real dataset from the 1970s, collected in. The youngest smoker in the data is 10. While children under a certain age are not allowed to legally buy cigarettes, some still do get cigarettes and do smoke. There are, of course, all sorts of issues with self-reported smoking status, and with a simple yes/no answer to this question. These are some of the things I discuss with my class about this dataset. Hope that clarifies some things for you.

    • @Dopserados
      @Dopserados 8 years ago

      Thanks +MarinStatsLectures for the explanation, i will use your data for a little seminar paper and needed some information about the data. Everything is fine now. :)

  • @nvlptl
    @nvlptl 8 years ago +1

    Great video!

  • @manizhehra.8452
    @manizhehra.8452 9 years ago

    Thank you for the video. So, the "summary" argument in R gives an evaluation for a linear regression, but what about a non-linear regression model? How can we evaluate our non-linear regressions?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Manizheh Rajabpour , i'd say that it gives more of a summary than an evaluation, as the term evaluation can mean many things, and this gives more of just the model coefficients, and a few other basic summaries. there are other ways to evaluate the fit of the model, etc.
      for a non-linear model, it depends exactly on which command you are using, but in general, the "summary" command will return a general summary for most objects in R, and will determine what type of a summary is appropriate for the object. for example, typing summary(LungCapData) will return a summary of the dataset.

    • @manizhehra.8452
      @manizhehra.8452 9 years ago

      Hi MarinStatsLectures, thank you for the response. I mean that summary(model) gives us some statistics of the linear model, like "Residual standard error", "multiple R-squared", and "F-statistic". The p-value could explain the significance of our model, right? Now what about a non-linear model?
      In fact I have some multiple regression models that are linear, but since they do not have an intercept, we have to write them as below in R:
      model1 = nls(Y ~ a*X1+b*X2, data = trainDat, start = list(a = 1, b=1))
      As you see, it is a linear model but we write it as a non-linear model, and summary(model1) does not give us the statistics.
      What is the solution for this? Or how can I evaluate the fit of both multiple linear and non-linear models in R, if there are other ways than a summary of the model?
      thank you in advance
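One point that may help with this question: `lm` can fit a model without an intercept directly, by adding `0 +` (or `- 1`) to the formula, so `nls` is not required for this case. The sketch below uses simulated data; `trainDat`, `X1`, `X2`, and `Y` follow the names in the comment, but the values are made up.

```r
# Simulated training data with no true intercept (hypothetical values)
set.seed(7)
trainDat <- data.frame(X1 = rnorm(80), X2 = rnorm(80))
trainDat$Y <- 2 * trainDat$X1 - trainDat$X2 + rnorm(80)

# No-intercept linear model fitted with lm: "0 +" removes the intercept
model_lm <- lm(Y ~ 0 + X1 + X2, data = trainDat)

# The usual summary is available, including the F-statistic
summary(model_lm)
summary(model_lm)$fstatistic

# The same model written as in the comment, using nls
model_nls <- nls(Y ~ a * X1 + b * X2, data = trainDat, start = list(a = 1, b = 1))

# Both approaches give the same coefficient estimates
rbind(lm = coef(model_lm), nls = coef(model_nls))
```

Note that for a model without an intercept, `summary.lm` computes R-squared relative to zero rather than to the mean of Y, so it is not directly comparable with the R-squared of a model that has an intercept.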

  • @ricardojunior3362
    @ricardojunior3362 7 years ago

    Do you have a tutorial for quantile regression and nonparametric quantile regression? If not, could you make one, please?

  • @STEPHCRLR
    @STEPHCRLR 9 years ago

    You said collinearity would be discussed in later videos, could you tell me which one exactly?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi STEPHCRLR , we're still working on the linear regression series, and haven't made that video yet.

  • @chavianddavid
    @chavianddavid 7 years ago

    What if I have a column with a unique ID for each person (let's say the column is called UID)? How would I get R to run a linear regression for each UID and save the coefficients for each person? So if I have 200 people, I would have 200 rows of data representing the coefficients for each person.

    • @marinstatlectures
      @marinstatlectures  7 years ago

      Hi +Dave , speaking in general, if you wanted to apply some function to the rows of a dataframe, you can use the *apply* function, e.g. *apply(x, MARGIN=1, FUNCTION)*....this would apply a specified function to the rows (margin = 1) of the dataframe x.
      It is worth mentioning that in order to fit a regression to each individual, you must have multiple data points observed for each of the variables.
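For the per-person regressions asked about above, one possible alternative to `apply` is to split the data frame by ID and fit a model to each piece. This is only a sketch on simulated data; the column names `UID`, `x`, and `y` and the object names are hypothetical.

```r
# Simulated long-format data: 5 people with 10 observations each (hypothetical)
set.seed(8)
dat <- data.frame(UID = rep(1:5, each = 10), x = rnorm(50))
dat$y <- 1 + dat$UID * dat$x + rnorm(50, sd = 0.2)

# Split into one data frame per person
by_person <- split(dat, dat$UID)

# Fit the same regression to each person's data and keep the coefficients
coef_list <- lapply(by_person, function(d) coef(lm(y ~ x, data = d)))

# Stack into a matrix with one row of coefficients per UID
coef_table <- do.call(rbind, coef_list)
coef_table
```

Each row of `coef_table` holds the intercept and slope for one UID, so 200 people would give 200 rows.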

  • @somilmehta2141
    @somilmehta2141 3 years ago

    cool explanation man :D

  • @mahendrabodas565
    @mahendrabodas565 5 years ago

    Please let me know: if my data has 2 dependent variables and 16 independent variables, how do I build a model? Shall I drop one dependent variable and build the model, and then vice versa?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      It really depends on what you are trying to model, and why, etc. One option is to model "y1" and then separately model "y2". Whether or not this is a good solution depends largely on the context of what you are trying to accomplish, and why. But sure, it is not unreasonable (as a general statement) to model these two different outcomes/dependent variables separately.

  • @sch4582
    @sch4582 9 years ago

    Hi Martin, I would like to know how to read multiple CSV files into multiple data frames by using a loop function. I'm having various CSV files i.e., user1.csv to user100.csv and I want to read all of them into different data frames so that I can access them individually. In every user data I've to analyze some columns like Skin temperatures, Heat flux like that. It would be appreciated if you explained in the code. Thanks!!

    • @marinstatlectures
      @marinstatlectures  9 years ago +1

      Hi suraj kumar . you can use the *paste* command to paste a 1,2,...100 at the end of the word user, when reading in the data. here's an example, you can modify to suit your purpose:
      for (i in 1:10){
      print( paste("user",i,".csv", sep="") )
      }
      you can see that this creates a "user1.csv", "user2.csv" and so on...
      you can remove the print from my code...that's just there to have it print on the screen so that you can see that it's doing what you want. you can incorporate this into the loop that has a read.table command in it to read in the data...
      ps. my name is Mike Marin, not Martin...
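
      One way the *paste* idea might be combined with a read command is sketched below. To keep it runnable on its own, it first writes three small made-up files to a temporary folder; the column names `SkinTemp` and `HeatFlux` and the list name `users` are assumptions, and *read.csv* is used in place of *read.table* since the files are CSVs.

      ```r
      # Hypothetical setup: create user1.csv ... user3.csv in a temporary folder
      tmp_dir <- tempdir()
      for (i in 1:3) {
        write.csv(
          data.frame(SkinTemp = c(33.1, 33.4) + i, HeatFlux = c(80, 82) + i),
          file.path(tmp_dir, paste("user", i, ".csv", sep = "")),
          row.names = FALSE
        )
      }

      # Read each file into its own element of a named list
      users <- list()
      for (i in 1:3) {
        fname <- paste("user", i, ".csv", sep = "")
        users[[paste("user", i, sep = "")]] <- read.csv(file.path(tmp_dir, fname))
      }

      names(users)           # one data frame per user
      users$user2$SkinTemp   # access a column for an individual user
      ```

      Keeping the data frames in a list, rather than as 100 separate objects, makes it easier to loop over them later.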

  • @emilylarson795
    @emilylarson795 4 years ago

    Sorry for asking a dumb question, but is the Residual Standard error output the SSR?

    • @marinstatlectures
      @marinstatlectures  4 years ago +1

      it's not a dumb question,... and no, they're related, but not the same thing.
      the Residual SE from the output is = square-root( SSR / (n-k) ) , where n-k is the number of observations minus the number of parameters in the model including the intercept....in the case of simple linear regression it would be n-2.
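
      The relationship can be checked by hand. This sketch uses R's built-in `cars` data rather than the lung capacity data from the video; the object names are made up for the example.

      ```r
      # Simple linear regression on a built-in dataset
      fit_cars <- lm(dist ~ speed, data = cars)

      ssr <- sum(residuals(fit_cars)^2)   # sum of squared residuals
      n   <- nrow(cars)                   # number of observations
      k   <- length(coef(fit_cars))       # parameters, including the intercept (2 here)

      rse_by_hand <- sqrt(ssr / (n - k))

      # Compare with the Residual standard error reported by summary()
      all.equal(rse_by_hand, summary(fit_cars)$sigma)
      ```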

    • @emilylarson795
      @emilylarson795 4 years ago

      MarinStatsLectures- R Programming & Statistics thank you! Your videos are awesome!

  • @partuzittta
    @partuzittta 10 years ago

    hello Marin
    could you share something about Comparison of two Population Proportions? i need to find a test to determine the difference between two proportions, in 2 different samples. thanks...

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Hi Lizbeth Hernandez , you can use the *"prop.test"* command in R to test for a difference in two proportions. This tests the Null hypothesis H0: p1 = p2. You can get this done in R using *prop.test(table(X,Y))*, where X and Y are both categorical variables with 2 levels, and you would like to calculate the proportions of Y, for each of the categories of X.
      To learn about changing default values for the test, you can look at my video on the 2-sample t-test here: Two-Sample t Test in R: Independent Groups (R Tutorial 4.2). This will show you how you can change to a 1-sided test, change the confidence level, etc.
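
      A small sketch of the *prop.test(table(X,Y))* call, using made-up data: the group labels, counts, and object names are assumptions. Note that for a 2-column table, *prop.test* treats the first column as the "successes", so the level order of Y matters.

      ```r
      # Hypothetical data: 50 people in group A (20 "yes"), 60 in group B (36 "yes")
      X <- factor(rep(c("groupA", "groupB"), times = c(50, 60)))
      Y <- factor(
        c(rep(c("yes", "no"), times = c(20, 30)),
          rep(c("yes", "no"), times = c(36, 24))),
        levels = c("yes", "no")   # "yes" first, so it is counted as the success
      )

      tab <- table(X, Y)
      tab

      # Two-sample test of H0: p1 = p2 (two-sided, 95% confidence by default)
      prop_res <- prop.test(tab)
      prop_res$estimate   # sample proportion of "yes" in each group
      prop_res$p.value
      ```

      Arguments such as `alternative` and `conf.level` can be changed in the same way as for the t-test.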

  • @adityak204
    @adityak204 8 years ago

    Sir, I want to know: before we perform multivariate regression using the lm function, do we have to scale the data for the various variables in order to bring them into the same range, or does lm have an inbuilt scaling function?

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Aditya Kumar Singh , you can use the *scale* function in R to scale or standardize a variable, and use this in a regression model IF this is what you want to do. it is worth mentioning that in a general case, there is no need or reason to need to scale a variable in order to fit a regression model. scaling a variable may allow for more meaningful interpretations, in certain circumstances, but for the most part there isn't really a reason or need to do this.
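
      To illustrate the point that scaling is optional, here is a sketch using the built-in `cars` data (not the data from the video). Scaling the X variable changes the units of the slope, but the fitted values and R-squared stay the same.

      ```r
      fit_raw    <- lm(dist ~ speed, data = cars)
      fit_scaled <- lm(dist ~ scale(speed), data = cars)

      # Same fit, different units for the slope
      all.equal(fitted(fit_raw), fitted(fit_scaled))
      all.equal(summary(fit_raw)$r.squared, summary(fit_scaled)$r.squared)

      # The scaled slope is the raw slope multiplied by sd(speed)
      all.equal(unname(coef(fit_scaled)[2]),
                unname(coef(fit_raw)[2]) * sd(cars$speed))
      ```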

  • @iris.west305
    @iris.west305 8 years ago

    Hello, I have a question. What do the "Residuals" for min, 1Q, median, 3Q and max mean? Are those values pertaining to the predicted values of Lung Capacity?

    • @marinstatlectures
      @marinstatlectures  8 years ago +1

      Hi +Sony M , the residuals are the difference between the observed y-value and the predicted y-value from the model (the vertical distance between each of the observations and the regression line). R is returning a summary of them...what is the smallest residual, the Q1 for them, and so on...
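
      A short sketch of where those five numbers come from, again using the built-in `cars` data as a stand-in for the lung capacity data:

      ```r
      fit_res <- lm(dist ~ speed, data = cars)

      # Residual = observed y minus predicted y
      res_by_hand <- cars$dist - fitted(fit_res)
      all.equal(unname(res_by_hand), unname(residuals(fit_res)))

      # Min, 1Q, Median, 3Q, Max of the residuals, as shown at the top of summary(fit_res)
      quantile(residuals(fit_res))
      ```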

    • @iris.west305
      @iris.west305 8 years ago +1

      +MarinStatsLectures Thank you, really helped!

  • @bramsetyadji2881
    @bramsetyadji2881 9 years ago

    MarinStatsLectures thank you for the tutorial but could you tell me why the summary didn't show all of the level variable? like smokeyes and smokeno?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Bram Setyadji , that's because the "no" is the reference, and is captured by the intercept term. the intercept is the estimated mean Y value for all X=0. in the case of the categorical variable for smoking, it is coded as no=0, yes=1, and so the non-smokers are part of the reference/intercept. in the example at the end of the video, the intercept refers to age=0, height=0, smoke=0=no, gender=0=female, and caesarean=0=no. i have a separate video that explains more about categorical variables in a linear regression, and how dummy/indicator variables are used for these. you can check that out if you need a more in depth explanation of how categorical variables work in a linear regression model: ua-cam.com/video/2s8AwoKZ-UE/v-deo.html
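
      The reference-level behaviour can be seen in a small simulated example. The variable names `smoke` and `lung` and the numbers are made up; they are not the data from the video.

      ```r
      # Hypothetical data: 20 non-smokers and 20 smokers
      set.seed(42)
      smoke <- factor(rep(c("no", "yes"), each = 20))
      lung  <- 8 - 1 * (smoke == "yes") + rnorm(40, sd = 0.5)

      fit_smoke <- lm(lung ~ smoke)

      # Only "smokeyes" appears; "no" is the reference level absorbed by the intercept
      names(coef(fit_smoke))

      # With a single categorical X, the intercept is the mean of the reference group
      all.equal(unname(coef(fit_smoke)[1]), mean(lung[smoke == "no"]))

      # relevel() changes the reference if "yes" should be the baseline instead
      fit_smoke2 <- lm(lung ~ relevel(smoke, ref = "yes"))
      ```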

    • @bramsetyadji2881
      @bramsetyadji2881 9 years ago

      Thank you for the prompt reply. I've watched almost all of your videos (especially what you've mention above) and now I understand why in multi linear regression or GLM summary the reference variable is not shown.

    • @marinstatlectures
      @marinstatlectures  9 years ago

      great to hear Bram Setyadji

  • @manizhehpourrahmati2286
    @manizhehpourrahmati2286 9 years ago

    Hi, I have a question that may not be related to this video, but I would be appreciated if you answer me if you know. Is it logical to do ANOVA between the result of a model (regardless the kind of model including linear, nonlinear, parametric, nonparametric, ...) and true measurements to see the significance of the model?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Manizheh Pourrahmati , i'm a bit unclear on what you mean by comparing results of a model and the true measurements. linear models do return an ANOVA (and F-test) for overall significance of a model, which is sort of like what you say...but I'm a bit unclear on the wording you are using. in general, you would use ANOVA for a linear regression model, but not for all models...for generalized linear models, you would use a likelihood ratio test in place of the ANOVA.
      regardless of the confusion in your question, the short answer is that no, you wouldn't use ANOVA for just any sort of model...it wouldn't be appropriate for a non-parametric model, and generally not appropriate for a non-linear model (depending of course on what type of non-linear model, as that term captures a ton of different sorts of models)

    • @manizhehpourrahmati2286
      @manizhehpourrahmati2286 9 years ago

      MarinStatsLectures, Thank you for your response. To be clear, I am building models based on a dataset and validating them using 5-fold cross validation in R. the meaning of result of model is predicted values based on test data which as a consequence of using k-fold cross validation all data are used as both training and test data and finally we would have predicted data. my models are multiple linear and non-linear regressions and also neural network and random forest. As you said f-test for linear regression gives us the significance of the model. what about other kinds of model? I thought maybe by comparing predicted values against real values (in situ measurements) and based on anova, we can explain the significance of our model. If it is not logical, which method is used for expressing the significance of models except for linear regressions (random forest model, neural network and non linear Reg.)?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Manizheh Pourrahmati , thanks for clarifying. i wouldn't consider myself an expert in neural networks or random forests, so i don't think i could offer any better advice about those than you will be able to find in other places. but i can say that in general, it sounds like you will want something other than the significance of your model (a significance test of your overall model would just tell you if your model is better than nothing). it sounds like you want to compare competing models, to decide which is better? for that, you may want to look into things like AIC/BIC to compare models that are non-nested (one does not have to be a subset of the other). the only thing for these though is that it is important that your Y variable is on the same scale for the competing models, in order to use something like AIC to compare models (e.g. if one model uses Y and the other uses ln(Y), then it is not good to compare the models using AIC or BIC)
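
      As a rough sketch of comparing two non-nested models with AIC/BIC, here is an example on the built-in `mtcars` data. The choice of data and predictors is an assumption made only for illustration; both models use the same Y (mpg) on the same scale.

      ```r
      # Two non-nested linear models for the same outcome
      m_wt <- lm(mpg ~ wt, data = mtcars)
      m_hp <- lm(mpg ~ hp, data = mtcars)

      # Lower AIC/BIC indicates the preferred model
      aic_cmp <- AIC(m_wt, m_hp)
      aic_cmp
      BIC(m_wt, m_hp)
      ```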

    • @manizhehra.8452
      @manizhehra.8452 9 years ago

      Hi MarinStatsLectures, thanks a lot for answering my questions and trying to solve them even when they are not pertinent. In fact I developed different kinds of models and used adjusted R-squared and RMSE to assess the accuracy of my models, and also AIC to compare the models. I wrote a paper based on my results. I received the following comment from a reviewer:
      "Regression analysis generates an equation to describe the statistical relationship between one or more predictor variables and the response variable. To determine whether a result is statistically significantly different, authors have to perform further analysis and then compare the model (optimum model). Statistically, present analysis does not take into consideration such tests."
      I do not know exactly what he wants, but I thought maybe he means the significance of the models. Don't you think so?

    • @marinstatlectures
      @marinstatlectures  9 years ago

      Hi Manizheh Rajabpour , it's a bit tough to tell because i don't know exactly what you had written in your paper, that the reviewer's comment is referencing. but it sounds like they might be suggesting that if you are using the adjusted R^2 to compare different models, that you might want to do a significance test, to test if the adjusted R^2 are significantly different, and not just that one is greater than the other. but like i said, it's tough to tell without knowing exactly what you had written in your paper that they are referring to.

  • @DataBites78605
    @DataBites78605 7 years ago

    amazing!!

  • @tankaixun8010
    @tankaixun8010 4 years ago

    what is the meaning of e in the p-value

    • @thegdt37
      @thegdt37 4 years ago

      it means 2.2 x 10^-16
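
      A quick way to see the "e" notation in R itself. The value 2.2e-16 that appears in regression output is R's machine epsilon, which is the smallest p-value it prints by default.

      ```r
      # "e" is scientific notation: 2.2e-16 is 2.2 times 10 to the power -16
      all.equal(2.2e-16, 2.2 * 10^-16)

      # The same notation for a less extreme number
      format(0.00000123, scientific = TRUE)

      # R reports p-values below this as "< 2.2e-16"
      .Machine$double.eps
      ```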

  • @souenleung2925
    @souenleung2925 5 years ago

    How could I do multiple linear regression if I have thousands of variables
    Also, could you teach me how to do the quartile regression as well
    thx

    • @Fletacarling
      @Fletacarling 5 years ago

      hi, i know this is old, but it's about your categories of data, not the number of data points, i.e. height vs weight, or for multiple regression, height vs weight vs strength.

  • @sunidhijain4476
    @sunidhijain4476 2 years ago

    where can i get this dataset

  • @fatmaahmad4343
    @fatmaahmad4343 8 years ago

    what about multiple linear regression simulation steps?

    • @marinstatlectures
      @marinstatlectures  8 years ago

      Hi +Fatma Ahmad , I'm not sure exactly what you are trying to ask. can you clarify your question, and i can try to help?

    • @fatmaahmad4343
      @fatmaahmad4343 8 years ago

      +MarinStatsLectures ,
      hi, thanks for your reply,
      I need to generate data (a simulation) that follows the terms of the least squares method and the multiple linear regression model, then I will study multiple linear regression analysis and multiple fuzzy linear regression analysis.

    • @marinstatlectures
      @marinstatlectures  8 years ago +1

      Hi Fatma Ahmad , i'm still not completely clear on what you mean. i will provide an answer here, and hopefully this is in line with what you are looking for. below, is some code (that you can modify) that will randomly generate an "x1" an "x2", and then generate a "y" using these, and adding some error to it.
      # randomly generate 100 observations from a normal distribution with mean 0, sd=1
      *x1
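
      The code in this reply is cut off after "x1". A minimal sketch of what the description suggests is given here; the coefficients (1, 2, -0.5), the error standard deviation, and the object names are assumptions, not the original code.

      ```r
      set.seed(123)
      n <- 100

      # randomly generate 100 observations from a normal distribution with mean 0, sd = 1
      x1 <- rnorm(n, mean = 0, sd = 1)
      x2 <- rnorm(n, mean = 0, sd = 1)

      # generate y from x1 and x2, adding some random error (hypothetical coefficients)
      error <- rnorm(n, mean = 0, sd = 1)
      y <- 1 + 2 * x1 - 0.5 * x2 + error

      # fit the multiple regression; estimates should land near the values chosen above
      sim_fit <- lm(y ~ x1 + x2)
      coef(sim_fit)
      ```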

    • @fatmaahmad4343
      @fatmaahmad4343 8 years ago

      +MarinStatsLectures
      thanks alot for your help , that's exactly what i want :) .

    • @fatmaahmad4343
      @fatmaahmad4343 8 years ago

      +MarinStatsLectures
      excuse me, I have another question: can I transfer the outputs of x1, x2 and y into another statistical program such as SPSS, Minitab or MATLAB?

  • @akshayrao5894
    @akshayrao5894 10 years ago

    can't we use "pairs()" command

    • @marinstatlectures
      @marinstatlectures  10 years ago

      Yes, you can use the "pairs" command to produce all possible pair-wise scatterplots of the variables in your data set.

  • @pidarosayoblanka8184
    @pidarosayoblanka8184 8 years ago

    how many bananas is the lung capacity measured in ?
    Really now is it US or metric system?

    • @marinstatlectures
      @marinstatlectures  8 years ago

      i can't recall the units of measurement...but it is a measure of volume displacement per unit time