Standard deviation of residuals or root-mean-square deviation (RMSD)

  • Published 10 Jan 2025

COMMENTS • 37

  • @abakella
    2 years ago +1

    Once again Sal Khan saves my life on an important assignment. A tale as old as time.

  • @abimaeldominguez4126
    3 years ago +1

    What do people use RMSE for? "To figure out how much the model disagrees with the actual data." Thank you for that!

  • @galactic-nucleus
    3 years ago +11

    Two problems with this video. 1) The residual degrees of freedom for simple regression are n-2: one is used up estimating the y-intercept and one estimating the slope. So the denominator here should be 2, not 3, which gives a value of .866 rather than .707. 2) Residual standard error (the "standard deviation of residuals," as you call it) does use the degrees of freedom as the denominator. For RMSE, however, most statisticians use n in the denominator, so technically RMSE and residual standard error are not the same thing. I'll cut you some slack, though, since many software packages, SPSS included, conflate the two: SPSS reports residual standard error as RMSE.

    • @nancyj795
      2 years ago

      Yes, most spreadsheets call it the standard error of the estimate (the STEYX function), and it does come out to your number, .866.
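
The three denominators debated in this thread (n, n-1, n-2) can be compared directly. A minimal sketch, assuming a hypothetical four-point dataset (not taken from the video) whose least-squares line is ŷ = -2 + 2.5x; it leaves a sum of squared residuals of 1.5, which is what the 0.612, .707 and .866 figures quoted in these comments imply:

```python
import math

# Hypothetical data and its least-squares line (assumptions for illustration only).
xs = [1.0, 2.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, 6.0]
intercept, slope = -2.0, 2.5

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
ssr = sum(r * r for r in residuals)  # sum of squared residuals
n = len(ys)

rmse = math.sqrt(ssr / n)               # divide by n: the usual RMSE
sd_resid = math.sqrt(ssr / (n - 1))     # divide by n - 1: as in the video
residual_se = math.sqrt(ssr / (n - 2))  # divide by n - 2: residual standard error (STEYX)

print(f"SSR = {ssr}")
print(f"n:     {rmse:.3f}")
print(f"n - 1: {sd_resid:.3f}")
print(f"n - 2: {residual_se:.3f}")
```

Only the divisor changes. With just four points the n-2 version is about 41% larger than the n version, which is why the comments disagree; for large n the three values converge.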

  • @rholin0997
    4 years ago

    @4:10 I wish I could draw lines this straight freehand on my computer. Sal has some serious skill.

    • @XinhLe
      4 years ago

      He has a special device for this work, I guess.

  • @jingyiwang5113
    1 year ago

    And I have learned a lot from your channel!

  • @mickmertens3554
    5 years ago +4

    Perfectly described, thanks a lot!

  • @baptistezhong4321
    3 years ago +2

    Is there any way to normalize RMSD, like R² in linear regression? If you tell me an RMSD value, I still don't know how good the fit is, since I don't know the data set.
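
There is no single standard way to normalize RMSD, but two common conventions divide it by the range or by the mean of the observed values, giving a unitless number; R² instead compares the residual sum of squares with the total sum of squares. A sketch, assuming made-up observed values `y` and predictions `y_hat`; the function names are illustrative, not a library API:

```python
import math

def rmsd(y, y_hat):
    """Root-mean-square deviation, dividing by n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def nrmsd_range(y, y_hat):
    """RMSD divided by the range of the observations (unitless)."""
    return rmsd(y, y_hat) / (max(y) - min(y))

def nrmsd_mean(y, y_hat):
    """RMSD divided by the mean of the observations, often called CV(RMSD)."""
    return rmsd(y, y_hat) / (sum(y) / len(y))

def r_squared(y, y_hat):
    """1 - SSR/SST: share of the variation in y accounted for by the predictions."""
    y_bar = sum(y) / len(y)
    ssr = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - y_bar) ** 2 for a in y)
    return 1 - ssr / sst

# Hypothetical observations and predictions.
y = [1.0, 2.0, 3.0, 6.0]
y_hat = [0.5, 3.0, 3.0, 5.5]

print(f"RMSD         = {rmsd(y, y_hat):.3f}")
print(f"RMSD / range = {nrmsd_range(y, y_hat):.3f}")
print(f"RMSD / mean  = {nrmsd_mean(y, y_hat):.3f}")
print(f"R^2          = {r_squared(y, y_hat):.3f}")
```

Dividing by the mean becomes unreliable when the observations average close to zero, so the range-based version is often the safer default.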

  • @Vihntage
    4 years ago +1

    What would be the equations for the two lines that are one standard deviation away from the original regression line, i.e., the two gold/yellow lines parallel to the regression line?
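
The gold/yellow lines are the regression line ŷ = a + bx shifted straight up and down by one standard deviation of the residuals, s, so they keep the slope b and only the intercept changes: y = (a - s) + bx and y = (a + s) + bx. A sketch, assuming a hypothetical fitted line ŷ = -2 + 2.5x and an s computed with the video's n - 1 divisor from a sum of squared residuals of 1.5:

```python
import math

def band_lines(intercept, slope, s):
    """Return (intercept, slope) for the lines one residual SD below and above y = a + b*x."""
    return (intercept - s, slope), (intercept + s, slope)

# Hypothetical fitted line and residual SD (assumptions for illustration).
a, b = -2.0, 2.5
s = math.sqrt(1.5 / 3)  # SSR = 1.5, n - 1 = 3

(lower_a, lower_b), (upper_a, upper_b) = band_lines(a, b, s)
print(f"lower: y = {lower_a:.3f} + {lower_b}x")
print(f"upper: y = {upper_a:.3f} + {upper_b}x")
```

If you follow the n - 2 convention instead, s grows and the two lines simply move further apart; their slope never changes.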

  • @cnlowry
    6 years ago +9

    Why is it n-1 = 3 in the denominator and not n = 4?

    • @freelusion7330
      6 years ago

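      For a sample standard deviation it's standard to use n-1. It compensates for the mean being estimated from the same data, which would otherwise make the spread come out slightly too small.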

    • @syedib
      6 years ago +2

      I used the same dataset to compute the root mean squared error with the scikit-learn Python library and got 0.6123724356957945. It divides by n = 4, not 3, which is a bit confusing.

    • @mausunk
      6 years ago +5

      They are calculating the mean of a subset (a sample) of a whole population.
      When you work with a population, you use σ (sigma) for the standard deviation and μ (mu) for the mean. For a population you divide by N (capital N), here N = 4: the population mean μ is the true mean rather than an estimate, so all N deviations are free to vary and there are 4 degrees of freedom (df).
      When you work with a subset (sample) of the population, you use x̄ (x-bar) for the mean and s for the standard deviation. Because we are working with a sample, there are n-1 (here 3) degrees of freedom: we are not using the true population mean.
      Estimating x̄ uses up one degree of freedom, and the remaining n-1 data points supply the rest. Think of it as knowing x̄ beforehand and then knowing the values of 3 of your 4 data points: the last data point is forced to a specific value for the mean to come out right.
      Dividing by these n-1 degrees of freedom in the sample standard deviation gives a better (unbiased) estimate of the true population variance. Hope this clarifies. For about the same explanation in video format, check out watch?v=9ONRMymR2Eg.

    • @MrVivekc
      5 years ago

      @syedib I tried the same by hand with n = 4, and the value is the same as yours: 0.612.
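
The n versus n-1 question is the population-versus-sample distinction described in this thread. scikit-learn's `mean_squared_error` averages over all n samples, which is why its RMSE matches the n = 4 value. Python's standard library exposes both divisors: `statistics.pstdev` divides by N and `statistics.stdev` by n-1. A minimal sketch with made-up numbers, plus a seeded simulation (an illustration, not a proof) showing that divide-by-(n-1) variance estimates average out to the true population variance, while divide-by-n estimates fall short by a factor of (n-1)/n:

```python
import random
import statistics

# Made-up values (hypothetical) to compare the two divisors.
data = [1.0, 2.0, 3.0, 6.0]
print(f"pstdev (divide by N = 4):     {statistics.pstdev(data):.4f}")
print(f"stdev  (divide by n - 1 = 3): {statistics.stdev(data):.4f}")

def average_variance_estimates(sample_size=4, trials=10_000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many
    samples drawn from a standard normal population, whose true variance is 1."""
    rng = random.Random(seed)
    total_n = total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total_n += statistics.pvariance(sample)         # divides by n
        total_n_minus_1 += statistics.variance(sample)  # divides by n - 1
    return total_n / trials, total_n_minus_1 / trials

biased, unbiased = average_variance_estimates()
print(f"average of divide-by-n estimates:     {biased:.3f}")    # near (n-1)/n = 0.75
print(f"average of divide-by-(n-1) estimates: {unbiased:.3f}")  # near 1.0
```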

  • @124deaper1
    7 years ago +1

    Can you please post a video about modulus? That chapter has always gotten me.

  • @jingyiwang5113
    1 year ago

    Thank you so much for this amazing video!

  • @surajnakhate8986
    3 years ago

    Really informative, thank you.

  • @techtak5948
    5 years ago

    Perfectly described. Thank you

  • @punyipeter8174
    6 years ago

    Thanks for the good explanation

  • @tuur319
    4 years ago

    All the other formulas I see use n-2. Did you make a mistake, or does it depend on the situation?

    • @seanvespucci9788
      3 years ago

      It depends on the assumptions he's using. I've been taught that for each parameter we estimate, we subtract one from n. For the standard deviation of residuals, the slope and the intercept are both estimated, so you subtract 2 from n.
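
The rule in this reply, one degree of freedom lost per estimated parameter, fits in a single function. A sketch, assuming hypothetical residuals; `residual_standard_error` and its `n_params` argument are illustrative names, not a library API. A mean-only model estimates one parameter, so the same rule reproduces the n-1 of the ordinary sample standard deviation:

```python
import math

def residual_standard_error(residuals, n_params):
    """sqrt(SSR / (n - k)), where k is the number of estimated parameters."""
    n = len(residuals)
    df = n - n_params
    if df <= 0:
        raise ValueError("need more data points than estimated parameters")
    return math.sqrt(sum(r * r for r in residuals) / df)

# Hypothetical residuals from a straight-line fit (intercept + slope, k = 2).
line_residuals = [0.5, -1.0, 0.0, 0.5]
print(residual_standard_error(line_residuals, n_params=2))

# A mean-only model estimates one parameter (k = 1): this is the sample SD.
ys = [1.0, 2.0, 3.0, 6.0]
y_bar = sum(ys) / len(ys)
print(residual_standard_error([y - y_bar for y in ys], n_params=1))
```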

  • @nichaelladenola3808
    5 years ago +8

    this is the longest 6 minutes of my entire life

  • @alexwyler4570
    5 years ago +1

    Why are we squaring the residuals? And why are we dividing by the number of data points minus 1?

    • @Penguinian
      5 years ago

      Alex Wyler, um, you square the residuals because if you don't, they just add up to zero, I think. I don't know about the -1 thing.
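
The reply's guess holds for a least-squares line with an intercept: its residuals always sum to zero, so adding them up says nothing about the size of the errors, while squaring makes every term non-negative. A sketch on made-up data, fitting the line with the usual least-squares formulas (`fit_line` is an illustrative helper, not a library function):

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"sum of residuals:         {sum(residuals):.2e}")  # zero up to rounding
print(f"sum of squared residuals: {sum(r * r for r in residuals):.4f}")
```

Taking absolute values would also avoid the cancellation (that leads to the mean absolute error); squaring is the choice that leads to RMSE.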

  • @VyTran-sm1qp
    4 years ago

    Could you please tell me how we can find ŷ?

    • @rasoolkilani3170
      4 years ago

      If you have some data, you can enter it into MS Excel and plot it, then add a trendline and display its equation; that gives you a formula that calculates ŷ (just search for "Excel trendline display equation"). Choose linear, or whichever relation produces the highest R².

  • @HITD47
    5 years ago

    thank you sir

  • @spinLOL533
    4 years ago

    thank you!

  • @shinjirobinlopez9252
    6 years ago

    If I get 0.52807827702592223431488993480676 as a result, does that mean it's 52.8% or 0.528%? And at what value is the result considered bad?

    • @Emotekofficial
      5 years ago +3

      Well, if you want it as a percentage, you multiply the fraction by 100, so it's 52.8%.
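
RMSE is measured in the same units as y, so a result of 0.528 means 0.528 units of y; it reads as a percentage only if y itself is a percentage. Whether it counts as bad depends on the scale of y, for example compared with the range or standard deviation of the observations. A sketch with made-up heights (hypothetical data), showing that converting metres to centimetres multiplies the RMSE by 100:

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error, dividing by n; same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical heights in metres and a model's predictions (made-up numbers).
observed_m = [1.62, 1.75, 1.80, 1.68]
predicted_m = [1.65, 1.70, 1.78, 1.71]

err_m = rmse(observed_m, predicted_m)
err_cm = rmse([h * 100 for h in observed_m], [h * 100 for h in predicted_m])
print(f"RMSE = {err_m:.4f} m = {err_cm:.2f} cm")
```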

  • @gabrielkpaka806
    9 months ago

    This is helpful.

  • @sidharthgautam8989
    2 years ago

    Divided by 4, not 3

  • @sweetberries4611
    5 years ago

    amazing

  • @atharvashirsath1604
    3 years ago

    My RMSE is 3.

  • @jasmeetkaur5775
    7 years ago +9

    How many likes can this comment get? I guess none :(

    • @wajidrafique9284
      7 years ago +3

      How do you expect likes on your comment when a very complex topic is being discussed? :D