Regression diagnostics and analysis workflow

  • Published 18 Dec 2024


  • @THEPSYCHOTIC  8 months ago

    I have been trying to find workflow videos on regression analysis for a while now, and this is the first (and only) one I have found. It helped me immensely, thank you.

    • @mronkko  8 months ago

      You are welcome. It is surprising that very few people teach how to actually use the analyses in empirical research practice.

    • @THEPSYCHOTIC  8 months ago

      @mronkko That's true. Most videos cover only the interpretation of results or focus on one part of the analysis, but no one covers the whole process in a single video with a single dataset.
      Just an idea: you could consider doing a workflow series on how to analyze different combinations of explanatory and response variables. For example, one categorical explanatory variable, one categorical and one quantitative explanatory variable, two categorical explanatory variables, and so on, with the same logic for the response variable (quantitative vs. qualitative). I'm not sure if you've done it already, but it would be so helpful!
      Thanks again, and keep up the good work. I wish you good luck!

  • @BrinderSadler  6 months ago

    A very informative video that is clear and uses examples so that viewers can better follow. Thank you.

    • @mronkko  6 months ago

      You are welcome!

  • @newtonocharimenyenya2458  3 years ago

    A great piece. Simple to understand.

    • @mronkko  3 years ago

      Glad you think so!

  • @magnusjensen5867  3 years ago

    Best explanation I’ve come across on YouTube! Keep up the good work.

    • @mronkko  3 years ago

      Glad it helped!

  • @ltang  3 months ago

    Around 7:49, are farmers less prestigious than the model predicted, or more? What does sitting below the y = x line mean?

    • @mronkko  3 months ago

      They are more prestigious. Check the residual on the y-axis: anything above zero is respected more than the model predicts.

    • @ltang  3 months ago

      @mronkko So does being below the y = x line just mean that, theoretically, at that percentile we would expect the residual to be even higher? What does the theoretical percentile mean? Is it just based on rank?
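On the rank question: in a normal Q-Q plot the theoretical percentiles are indeed rank-based. The residuals are sorted, each rank i is turned into a probability such as (i - 0.5)/n, and that probability is mapped to the matching standard normal quantile. The sketch below is a minimal Python/NumPy illustration with made-up residuals, not the code used in the video; the function name `normal_qq_points` is hypothetical, and (i - 0.5)/n is one common plotting-position convention (R's `qqnorm()` uses `ppoints()`, which gives this formula for n > 10).

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    sample = np.sort(np.asarray(residuals, dtype=float))  # order residuals by rank
    n = sample.size
    ranks = np.arange(1, n + 1)
    probs = (ranks - 0.5) / n  # rank-based plotting positions (one common convention)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, sample


# Made-up residuals, for illustration only
theoretical, sample = normal_qq_points([0.3, -1.2, 2.5, -0.4, 0.9])
for t, s in zip(theoretical, sample):
    print(f"theoretical {t:+.2f}  residual {s:+.2f}")
```

Plotting `sample` against `theoretical` gives the Q-Q plot. A point below the reference line has a residual smaller than the line expects at that rank; whether the observation is under- or over-predicted by the model is still read from the sign of the residual itself.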

  • @newtonocharimenyenya2458  3 years ago

    A very great piece.

  • @bezaeshetu5454  3 years ago

    Thank you for the nice and clear explanation.

    • @mronkko  3 years ago

      You are welcome!

  • @whx2044  3 years ago

    Thank you for teaching!

    • @mronkko  3 years ago

      You are welcome.

  • @zwan1886  3 years ago

    In your AV plots around 15:00, isn't the plot showing that the women regressor doesn't add anything to the model?

    • @mronkko  3 years ago

      Yes, that is what the model shows. Also, the regression coefficient in the table at 2:58 shows that the effect of women is nonsignificant.
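The link between the AV plot and the coefficient table is exact: by the Frisch-Waugh-Lovell theorem, the least-squares slope in an added-variable plot equals that regressor's coefficient in the full multiple regression, so a flat AV plot and a near-zero coefficient are two views of the same result. A minimal NumPy sketch on simulated data follows; the names `income`, `women` and `prestige` are hypothetical stand-ins, and the data are generated, not the video's dataset.

```python
import numpy as np


def added_variable_data(y, X, j):
    """Coordinates of the added-variable (AV) plot for column j of X.

    X must already contain a column of ones for the intercept. Returns the
    residuals of X[:, j] and of y after regressing each on the other columns.
    """
    others = np.delete(X, j, axis=1)

    def residualize(v):
        coef, *_ = np.linalg.lstsq(others, v, rcond=None)
        return v - others @ coef

    return residualize(X[:, j]), residualize(y)


# Simulated data: 'women' has no true effect on 'prestige' here
rng = np.random.default_rng(0)
n = 200
income = rng.normal(size=n)
women = rng.normal(size=n)
prestige = 2.0 * income + rng.normal(size=n)
X = np.column_stack([np.ones(n), income, women])

full_beta, *_ = np.linalg.lstsq(X, prestige, rcond=None)
x_av, y_av = added_variable_data(prestige, X, j=2)

# Both residual vectors have mean zero, so the AV slope needs no intercept
av_slope = (x_av @ y_av) / (x_av @ x_av)
print(av_slope, full_beta[2])  # the two numbers agree up to rounding error
```

Plotting `y_av` against `x_av` draws the AV plot itself; its fitted line has exactly the slope of the full-model coefficient for the chosen column.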

  • @Youtube304s  7 months ago

    Subscribed. Very good

    • @mronkko  6 months ago

      You are welcome.

  • @statistikochspss-hjalpen8335

    Great video.
    My question is: what should I do when a ln transformation doesn't help?
    Imagine a regression with only Likert-scale variables (1-5): customer satisfaction as the dependent variable, and product quality and customer service as independent variables. Most customers score 4 or 5 on all variables. Almost none of the MLR assumptions are met. How should I approach the problem?
    I read about PLS as an alternative to OLS, but my coefficients are almost identical with OLS and PLS (I don't know if that's because of the fairly large dataset, n = 8000).

    • @mronkko  1 year ago

      If your scales are poorly calibrated so that you get just 4s and 5s on a 1-5 scale, then I do not think there is anything you can do except collect better data.
      How to approach "almost all assumptions are not met": I would start by looking at one specific assumption and what you can do about it. For example, if the relationships are not linear, then I would start thinking about using nonlinear functional forms.

    • @statistikochspss-hjalpen8335  1 year ago

      @mronkko Thank you for taking the time to respond. The data are real and based on real customers. The satisfaction metric (the dependent variable) is already well established in the industry. If I'm interpreting my normal probability plot correctly (the y-axis shows percent and the x-axis shows the residual), about 7% of the observations are off the line. The residuals range from -10 to +5.
      In the residuals-versus-fits plot, the residuals slope downward as the fitted value increases.

    • @mronkko  1 year ago

      @statistikochspss-hjalpen8335 If the residuals slope downward, then you might have nonlinearity and need to consider other functional forms.
      The fact that a measure is well established does not necessarily mean that the data are good. For example, if you want to assess the effect of a person's height on their weight but only measure people between 180 and 181 cm, then a normal measuring tape would not suffice because it is not precise enough. The same can happen in your data: if you have little variation in satisfaction, you might need a measure that is calibrated differently. I think I talk about measurement calibration in one of the measurement presentations, but I am not 100% sure about that.

  • @harijha6279  2 years ago

    Best explanation.

    • @mronkko  2 years ago

      Good that you liked it!

  • @rutwikkadane2409  3 years ago

    Thanks for the explanation!

    • @mronkko  3 years ago

      Glad it was helpful!

  • @kar2194  3 years ago

    Hi, thanks for the content! At 3:09 you said you have a video on the regression coefficient. I can't find it, and I would like to check it out :)

    • @mronkko  3 years ago

      Good question. The videos are from a course that I run, and I have organized them as YouTube playlists. This video is from the third study unit, and the video that I refer to is from the second unit:

    • @kar2194  3 years ago

      @mronkko Thanks!

  • @faemillongo6839  2 years ago

    Thanks. So clear.

    • @mronkko  2 years ago

      Happy that you found it helpful. Not reporting that regression diagnostics were done is a big problem in published research, and it would be so easy to fix. Pay attention to your model assumptions and justify them.

  • @auddssey  1 year ago

    I want to see the R code for the residuals-vs-leverage plot, and how the occupation outliers appear :-)

    • @mronkko  1 year ago

      The slides are linked in the video description and contain some R code in the slide notes.
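The quantities behind a residuals-versus-leverage plot can also be computed directly. The sketch below is a minimal NumPy version, not the R code from the slides: leverage is the diagonal of the hat matrix, the residuals are internally studentized (the quantity R's `rstandard()` reports), and Cook's distance combines the two (as R's `cooks.distance()` does). The data are simulated with one deliberately planted high-leverage outlier, and `influence_measures` is a hypothetical name.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage, standardized residuals and Cook's distance for an OLS fit.

    X must include a column of ones for the intercept.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(X)               # reduced QR: X = QR, Q has orthonormal columns
    leverage = np.sum(Q**2, axis=1)      # diagonal of the hat matrix H = Q Q^T
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)     # residual variance estimate
    std_resid = resid / np.sqrt(sigma2 * (1 - leverage))
    cooks = std_resid**2 * leverage / (p * (1 - leverage))
    return leverage, std_resid, cooks


# Simulated data with one planted point that is unusual in both x and y
rng = np.random.default_rng(1)
x = np.append(rng.normal(size=49), 6.0)
y = 1 + 2 * x + rng.normal(size=50)
y[-1] -= 10
X = np.column_stack([np.ones(50), x])

leverage, std_resid, cooks = influence_measures(X, y)
print("highest leverage:", np.argmax(leverage), "largest Cook's D:", np.argmax(cooks))
```

The plot itself is `std_resid` on the y-axis against `leverage` on the x-axis; observations that combine high leverage with a large residual, like the planted point here, get a large Cook's distance.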