Deriving the mean and variance of the least squares slope estimator in simple linear regression

  • Published 31 Mar 2019
  • I derive the mean and variance of the sampling distribution of the slope estimator (beta_1 hat) in simple linear regression (in the fixed X case). I discuss the typical model assumptions and note where each one is used as I carry out the derivations. The derivations are carried out using summation notation (no matrices).
    At the end, I briefly discuss the normality assumption and how it leads to beta_1 hat being normally distributed. While I do discuss the real deal there, I go over it fairly quickly, as the main point of the video is deriving E(beta_1 hat) and Var(beta_1 hat). (The key quantities are summarized in the sketch below the time stamps.)
    Note that any time I use "errors" or "error terms" in this video, I am referring to the theoretical error terms (the epsilons) and not observed residuals from sample data.
    Time stamps:
    0:00 Brief discussion of the simple linear regression model, assumptions, and some tools we will use.
    2:58 Deriving E(beta_1 hat)
    5:06 Deriving Var(beta_1 hat)
    8:49 Discussion of normality of beta_1 hat.
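
    For reference, here is a sketch of the setup and the two results derived in the video. This is my summary in standard notation, not a transcript; the c_i form uses the X_j-in-the-denominator indexing suggested in the comments:

```latex
% Simple linear regression with fixed X_1, ..., X_n:
%   Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,
%   E(\varepsilon_i) = 0, Var(\varepsilon_i) = \sigma^2, errors independent.
\hat{\beta}_1
  = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
  = \sum_{i=1}^{n} c_i Y_i ,
\qquad
c_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}

% The two results derived in the video:
E(\hat{\beta}_1) = \beta_1 ,
\qquad
\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
```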

COMMENTS • 48

  • @jadaliha
    @jadaliha 4 years ago +5

    Best channel ever. I love the content, the way you teach it, and how well you break down assumptions and connect topics. Excellent work. Thanks

  • @captainw6307
    @captainw6307 4 years ago

    Thank you so much for the super helpful video!!! I am so glad to find your videos whenever I struggle in my statistics courses.

  • @seanmahoney8345
    @seanmahoney8345 4 years ago

    Your explanations and presentation are perfect. Thanks brother!

  • @aminasadi1040
    @aminasadi1040 3 years ago +1

    Thank you so much; I was really excited by your perfect explanation.

  • @zhaofengzheng2923
    @zhaofengzheng2923 4 years ago +2

    Best explanatory video I have seen. Thank you for bringing dry book knowledge to life!

  • @user-xt9js1jt6m
    @user-xt9js1jt6m 4 years ago

    Very very clean and clear explanation
    God bless you sir!!🙏🙏🙏

  • @elizabethokello8764
    @elizabethokello8764 2 years ago

    Very clear explanations, bringing out the assumptions well.

  • @HaineGratuite
    @HaineGratuite 5 years ago +4

    I would love to see the same video for multiple regression (unbiasedness and variance of the estimators)! Your explanations are very clear.

    • @jbstatistics
      @jbstatistics  5 years ago +3

      Thanks for the kind words and suggestion. That video is on the list :)

  • @jgrtrx
    @jgrtrx 5 years ago +3

    I love your derivation videos. It's cool to see where the theory comes from :D

    • @jbstatistics
      @jbstatistics  5 years ago +1

      Thanks! And thanks for the feedback. I like those too, and will be doing more of them.

  • @valeriereid2337
    @valeriereid2337 1 year ago

    Excellent lecture. Thanks so much for making rigorous concepts easy to understand.

  • @tnorton314
    @tnorton314 3 years ago

    Another great one!

  • @fataishuaib4728
    @fataishuaib4728 1 year ago

    Nice breakdown. Thanks

  • @agustinlawtaro
    @agustinlawtaro 3 years ago

    Great videos.

  • @sicongzhao1497
    @sicongzhao1497 3 years ago

    Great work!
    One suggestion: at 10:11, in the denominator of c_i, maybe it is better to replace X_i with X_j to avoid confusion.

  • @andrei642
    @andrei642 2 years ago +4

    I don't understand how sum((X_i - X_bar)^2) = sum((X_i - X_bar)*X_i)
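
    For anyone else stuck on this step: it follows from the fact that the deviations from the mean sum to zero (shown in a reply further down this page). A quick sketch of the identity being asked about:

```latex
\sum_{i=1}^{n} (X_i - \bar{X})^2
  = \sum_{i=1}^{n} (X_i - \bar{X}) X_i - \bar{X} \sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} (X_i - \bar{X}) X_i
  \quad \text{since } \sum_{i=1}^{n} (X_i - \bar{X}) = 0 .
```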

  • @felipebascastro5515
    @felipebascastro5515 4 years ago +1

    Thank you very much

  • @user-eh5zz7xg3e
    @user-eh5zz7xg3e 3 months ago

    Thank you so much

  • @NazirulHzm
    @NazirulHzm 5 years ago +1

    This is excellent. What program do you use to produce this?

    • @jbstatistics
      @jbstatistics  5 years ago +3

      Thanks for the compliment. The background is a Beamer/LaTeX presentation. I record, edit, and annotate in Screenflow. In any of my videos with freehand annotation, I'm writing on a Bamboo pad using Skim.

  • @michaelsaia4381
    @michaelsaia4381 4 years ago

    Has anybody found a video like this for the regression model where the y-intercept is assumed to be 0?

  • @user-xt9js1jt6m
    @user-xt9js1jt6m 4 years ago

    I want the same explanation in matrix notation.
    I am also interested in logistic regression and estimating the SE of its coefficients.
    Can you help?

  • @lesleymatipedza2726
    @lesleymatipedza2726 1 year ago

    This is fantastic, it just saved me!

  • @user-zc8er5bu6b
    @user-zc8er5bu6b 6 months ago

    Best teacher

  • @opheliaschwuchow6977
    @opheliaschwuchow6977 2 years ago +1

    Why can we assume that the Xs are fixed? In real life they vary.

  • @yassinewaterlaw6597
    @yassinewaterlaw6597 2 years ago

    3:20: (x_i - x_bar) should be a variable; why do you consider it constant?

  • @gagangayari5981
    @gagangayari5981 4 years ago

    B0 (intercept) and B1 (slope) are constant values for a certain sample. What does taking the mean of those B0 and B1 mean?

    • @akshayprabhakant819
      @akshayprabhakant819 3 years ago

      Since the sample can change, beta-nought-hat and beta-1-hat, which are derived from the sample data, will also change. Hence the expected values of beta-nought-hat and beta-1-hat are what is being calculated.
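
      A quick way to see this concretely is to simulate: fix the X values, repeatedly generate new error terms, refit, and look at the resulting collection of slope estimates. A minimal sketch in Python/NumPy (the grid of X values, sample size, and "true" parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed-X setup, as in the video: the X values never change across samples.
x = np.linspace(0.0, 10.0, 20)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # hypothetical true parameters
sxx = np.sum((x - x.mean()) ** 2)

# Draw many samples; only the error terms (and hence Y) change each time.
slopes = np.empty(10_000)
for k in range(slopes.size):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    # Least squares slope in summation form.
    slopes[k] = np.sum((x - x.mean()) * (y - y.mean())) / sxx

print("mean of the slope estimates:    ", slopes.mean())  # approx. beta1 = 0.5
print("variance of the slope estimates:", slopes.var())   # approx. sigma^2 / Sxx
print("theoretical variance:           ", sigma**2 / sxx)
```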

  • @hiramunir2078
    @hiramunir2078 4 years ago

    Please sir, make videos on the derivation of multiple regression.

  • @noregretinthislove
    @noregretinthislove 8 months ago

    Wow bro, thank you!

  • @cassidy1762
    @cassidy1762 3 years ago

    Can you derive b0?

  • @andrewhughes5304
    @andrewhughes5304 3 years ago +1

    Yo, I don't understand how you can equate sum[(X_i - X_bar)(Y_i - Y_bar)] = sum[(X_i - X_bar)Y_i]. Isn't this only true if Y_bar is exactly equal to zero? The denominator has the same problem, except with X_bar.

    • @jbstatistics
      @jbstatistics  3 years ago +5

      sum[(X_i - X_bar)(Y_i - Y_bar)] = sum[(X_i - X_bar)Y_i] - sum[(X_i - X_bar)Y_bar]. Now take a look at the 2nd term and recognize that Y_bar is a constant with respect to the summation, so it can come outside of the summation: sum[(X_i - X_bar)Y_bar] = Y_bar sum[(X_i - X_bar)], and since sum[(X_i - X_bar)] = 0, we have the result.

    • @andrewhughes5304
      @andrewhughes5304 3 years ago

      @@jbstatistics Oh word, eh? And where does sum[(X_i - X_bar)] = 0 come from?

    • @jbstatistics
      @jbstatistics  3 years ago +4

      @@andrewhughes5304 sum[(X_i - X_bar)] = sum[X_i] - sum[X_bar] = sum[X_i] - n X_bar = sum[X_i] - n (sum[X_i]/n) = sum[X_i] - sum[X_i] = 0. (X_bar is a constant with respect to the summation.)
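
      The same two steps from this exchange in display form (just the replies above rewritten in standard notation):

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = \sum_{i=1}^{n} X_i - n \cdot \frac{1}{n} \sum_{i=1}^{n} X_i
  = 0 ,

\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
  = \sum_{i=1}^{n} (X_i - \bar{X}) Y_i - \bar{Y} \sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} (X_i - \bar{X}) Y_i .
```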

    • @mad.finance
      @mad.finance 2 years ago

      @@jbstatistics I was looking for that. Thank you very much!

    • @Celdorsc2
      @Celdorsc2 1 year ago

      Thanks. Coming from a Google search here; I also couldn't understand that. Thanks for the clarification!

  • @KARAB1NAS
    @KARAB1NAS 4 years ago

    3:04. What you say is wrong: the denominator is not constant, it is a random variable. \bar{X} is random (because the sample is a random sample) and the X_i are all random.

    • @jbstatistics
      @jbstatistics  4 years ago +2

      1:44 "We are also going to assume that the X values are fixed, and not random. X is not a random variable in these derivations. In reality, in regression settings..."

    • @akshayprabhakant819
      @akshayprabhakant819 3 years ago

      (THIS IS WRONG, PLEASE FOLLOW THE NEXT COMMENT BY @jbstatistics) @@jbstatistics The denominator basically means the variance of X, since the definition of the variance of any random variable is that formula. What we could instead say is that the denominator is being taken out as a constant under the assumption that variance(X) is constant, or to put it in better terms, that X is being sampled from a normal distribution. And I think if X defies this, then there is no way that all this is valid. That's why it's always advised to normalize feature vectors, so that all of this remains true.
      Thanks for the video, it cleared up a lot of the mess I had regarding this derivation. :D

    • @jbstatistics
      @jbstatistics  3 years ago +1

      @@akshayprabhakant819 First, that's not the definition of the variance of a random variable. Second, I'm viewing X as fixed here, so that quantity is simply a constant; sum[(X_i - X_bar)^2] is a number. Third, the variance of a random variable X being constant is a completely different notion from X having a normal distribution. In this video I view X as fixed, as in, for example, situations in which a researcher has control over the levels of X (e.g. the amount of fertilizer applied to plots of land). What I state in the video is valid and correct. If you want to derive the mean and variance of the least squares estimator in scenarios where both X and Y are random, then you are welcome to do so. In this video I look at the situation where X is fixed, which I state clearly up front.

    • @akshayprabhakant819
      @akshayprabhakant819 3 years ago

      @@jbstatistics Oh sorry, my bad. I forgot that a P(X=x) term is also included in the formula for the variance of a random variable, making it sum over i, i = 1 to n, of (X_i - X_bar)^2 * P(X = x), where P(X=x) is the probability distribution of the R.V. X. Yup, your logic fits. Thanks for replying!!
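
      For reference, the standard definition being alluded to in this exchange. Note it is centered at the true mean \mu, not the sample mean, which is part of why the formula in the comment above is still not quite the variance of a random variable:

```latex
% Variance of a random variable X with mean \mu = E(X):
\mathrm{Var}(X) = E\big[(X - \mu)^2\big]
  = \sum_{x} (x - \mu)^2 \, P(X = x) \quad \text{(discrete case)} .

% By contrast, with the X_i treated as fixed constants (as in the video),
% \sum_{i=1}^{n} (X_i - \bar{X})^2 is just a number, not a random quantity.
```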

  • @statisticsappliedmathemati810
    @statisticsappliedmathemati810 3 years ago

    I am a data analysis instructor; I want to connect with other data analysis instructors.