Thanks for the video. You made it look so simple. Could you upload a video on how to get risk ratios in a multiple logistic regression model?
Since the values are 0 and 1, it probably doesn't matter. However, to be safe, it's probably a good idea to make all categorical variables, regardless of their values, factors.
Hi Josh, firstly many thanks for your videos on this topic. I have noticed very odd and conflicting results between R and SPSS with regard to entering factors (with more than 1 level) into a logistic regression model. SPSS produces a simplified output containing an odds ratio with 95% CI and p-value for each individual variable entered into a logistic regression model (rather than for the factor levels, as displayed in R). In R, I have not found a good way to do this. I have used the logistic.display command as well as exp() to get odds ratios, but they do not provide an overall value like SPSS does (instead listing these for each individual level within the factor). Do you have any idea why SPSS and R handle logistic regression differently like this? All I would like is output similar to SPSS's, where I get a single odds ratio, 95% CI, and p-value for each individual factor variable entered.
Unfortunately I've never used SPSS so I'm not really familiar with the problem you are having. That said, perhaps this will help: stats.stackexchange.com/questions/543540/different-output-for-logistic-regression-between-r-and-spss-how-to-get-correct
These videos are so amazing! Do you have a suggestion for a book that explains Logistic Regression to newbies? The videos are super awesome, but extra references may help too. Hopefully you will write your own book soon! Thanks!
I know this is probably 10 months too late, but the book "Introduction to Categorical Data Analysis" by Alan Agresti is a great book. It does a really good job explaining logistic regression and is pretty light on the math.
Awesome! Thank you so much! Please could you do a video about conditional logistic regression like clogit in R with result interpretation and how it works when using adjusted parameters.
Hi, I really like your videos, every topic is as clear as water after watching it. I've watched this one and also the three videos about logistic regression's details. If you want to go further in this topic, you could do a video explaining emmeans package for R. Many people, including me, would understand post hoc tests for glm using emmeans, if someone like you explained it. Thank you!
At 11:30, the video states, "Since we are not estimating the variance from the data (and instead deriving it from the mean) it is possible that the variance is UNDERESTIMATED." Q. How can we say that we are UNDER-estimating the value of the variance? BTW, awesome vids, music man! ;)
@@statquest It's 1am but let me see if I got this straight... Due to the nature of discrete functions (like logistic functions) they do not always vary smoothly. With discrete functions it is possible to see variances (and their corresponding probabilities) differ from (in our case) the proposed logistic model. In other words, it is conceivable to have a leptokurtic or platykurtic distribution. It is possible to see probabilities which differ from the expected probabilities due to the fact that the "real" model may be different and/or the samples may not be i.i.d. As it happens, the Bernoulli distribution tends toward the platykurtic. ...It's just the wrong dang model sometimes...
It's actually a little simpler than that. With binomial data (like logistic regression) we estimate the mean value = number of positive responses / total number of responses. Once we have the mean value estimated, we use that, and that alone, to calculate the variance. In other words, once we have calculated the mean, we do not need the data anymore to calculate the variance. This is in contrast to linear regression (or a lot of other things) where we estimate the mean with the data and then use the data again to calculate how it varies around the estimated mean. Thus, there is a possibility that with logistic regression (and other "generalized linear models") we did not correctly estimate the variance since the data were not involved in that calculation. If we overestimate the variance, that just makes the calculations more conservative and, generally speaking, that's not a problem. However, if we underestimate the variance, then that means we're more likely to say things are significantly different even if they are not, and that's no good. So the dispersion parameter takes care of that.
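Jumping in with a small sketch of the point above (Python, with made-up counts, just to make the "variance comes from the mean" idea concrete):

```python
import statistics

# Made-up example: 6 clinics, each with m = 10 patients; we count how
# many patients at each clinic are "unhealthy".
counts = [1, 2, 8, 9, 2, 8]
m = 10

# Estimate the mean proportion from the data...
p = sum(counts) / (m * len(counts))          # 30 / 60 = 0.5

# ...and the binomial model then DERIVES the variance from that mean
# alone; the data are never consulted a second time.
var_binomial = m * p * (1 - p)               # 10 * 0.5 * 0.5 = 2.5

# The variance the counts actually show:
var_observed = statistics.pvariance(counts)  # about 11.3

print(var_binomial, var_observed)
```

Here the observed variance is much larger than the model-implied one (overdispersion), which is exactly the situation where an underestimated variance would make p-values too optimistic, and where the dispersion parameter comes in.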
Hi Josh, thanks for this amazing tutorial. Would you be able to add something about interactions between predictors and random effects? I am trying to run a mixed-model logistic regression with three-way interactions but am not entirely sure how to deal with them. Thanks so much :)
I like the way you presented the information in a manner that is easily understood. I have 2 questions:
1. While doing the xtabs, what do we need to do if we find that either or both of healthy and unhealthy under cp3 is 0 or very minimal (video clip at 6:04)?
2. At 15:55 of the video clip, you mentioned using cross validation to get a better idea of how well the model might perform with new data. Do you have a separate video specifically on that topic?
Many thanks Williams
1) Unfortunately I don't understand what you're asking in this question. However, I think you are asking what do we do when one level from a categorical variable does not have strong preference for healthy or unhealthy or doesn't have much data to begin with. It really depends. You can just try it and see what happens, but you might also try removing the variable and see if that improves predictions. 2) I have a video on cross validation here: ua-cam.com/video/fSytzGwwBVw/v-deo.html
@@statquest Thanks for your prompt reply. Let me clarify my 1st question: at 6:24 in the video, you mentioned that there are 4 patients representing level 1 of the restecg category. My first question is, why can only 4 cause a problem? Is it because it is too minimal compared with the others (level 0 and level 2)? How do I know exactly that it is causing a problem when I do the analysis? And if it does cause a problem, how do I go about fixing it? Does just removing level 1 solve it? Thanks for your help
@@williamstan1780 When you don't have much data supporting a specific category, then chances are it will have a lot of variance - in other words, further samples may be very different from the ones in the original dataset. You can test this with cross validation (use some of the data to fit the model, use the rest to see how well it performs). If things are no good, you can remove the variable, or try to lump categories together.
All of your videos are great and fun to learn from! Could you please upload a tutorial on mediation analysis using STATA and R (using the mediation package)?
Great video, thanks so much Josh! After the 4th minute you mention how to address the NA samples. Can you teach us the RANDOM FOREST method, if we don't want to get rid of our NA samples (e.g. in multivariate cases, where the rows include other useful info)? Thanks!
Great salute! If you can, please post a video on all the machine learning models, with a large-dataset example implementation in R, with clear intuition and the mathematics/statistics behind it. Thanks.
At the end, where you make the graph, you could have used the broom package's augment() function to create the data frame with the fitted and actual values.
Excellent video, very clear and easy to follow! Do you have any videos that show how to do best subsets and cross validation with logistic regression on R? I know you have a video that explains the concept of cross validation but I am looking for a video like this that goes through it step-by-step for logistic regression on R. Same thing for how to run all possible models (best subsets) using logistic regression on R. I have found one by another youtuber for linear regression but not for logistic.
Great videos, Josh! You make things so easy! I just had a question though - Is it mandatory to convert all variables (which can be converted into factors) into factors? For example, what would have happened if we have kept the sex variable as numeric? Does it make my logistic regression model incorrect?
@@statquest Well yes, doing this for the sex variable makes sense. However, for my data, I have a religiousness column with discrete values 1-5 and a rating column again with a discrete rating of 1-5. So should I make these two variables factors as well? Or is it fair to keep them as numeric? Also, thanks for such a prompt reply. Really appreciate it!
Hi, I love the way you explain all these things! I have a couple of questions. I see that it's necessary to establish a coding for the predictors; if they are dichotomous, for example, they are assigned 1 and 0 (in the example, male/female). So: - How should we proceed with polytomous predictors? - What results of the model should be reported in a scientific article? Thank you in advance, and keep making great content!
1) For all categorical data (with 2 or more classes), just make sure you are storing it in a factor. 2) That depends on the journal. I would look at other articles in that journal to figure it out.
Hi Josh, I find your videos very informative and they help me a lot with my bachelor's thesis. Because you put some variables into "factors" and others stay "numeric", I think I can ask my question, which I can't find an answer to anywhere on the internet (or I don't know how to search for it!). I am running a logistic regression on NBA regular season games to find out whether the fact that a team has been eliminated from the playoffs has an effect on its winning probability (to find out if they "tank" = intentionally lose). For the current strength of the team I use its current winning percentage (games won over games played), and this variable is refreshed after every game. I was wondering if I can put this variable in as "numeric"? Or how would you define the type of this winning percentage? The opponent's winning percentage, whether the game is on the home court or not, whether the team is statistically eliminated or in the playoffs, and whether the opponent is statistically eliminated or in the playoffs are also in the regression. It is the same regression some researchers ran back in 2002 to test the same thing, but no one has done it recently. I hope you understand my question and very much hope that you can and are willing to help me. Thank you very much and have a great day!
For logistic regression, it will be easier to understand what the estimated coefficients mean if you multiply the percentage of games won by 100. When you do this, you can use these values as "numeric" and the coefficient will tell you how much the log(odds) of the outcome changes for every 1-percentage-point change in that variable. For more details on interpreting the coefficients, check out ua-cam.com/video/vN5cNN2-HWE/v-deo.html
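To make that concrete, here is a hedged sketch (Python; beta and the intercept are made-up numbers, not values fitted to any NBA data) showing that exp(coefficient) is the odds ratio per 1-percentage-point step, while the change in probability depends on where you start:

```python
import math

# Hypothetical coefficients: each 1-percentage-point increase in winning
# percentage adds beta to the log(odds) of the outcome.
intercept = -2.0
beta = 0.04

def prob(win_pct):
    """Predicted probability for a given winning percentage (0-100)."""
    log_odds = intercept + beta * win_pct
    return 1 / (1 + math.exp(-log_odds))

# The odds are multiplied by exp(beta) for every 1-point increase...
odds_ratio = math.exp(beta)   # about 1.041

# ...but the change in PROBABILITY is not constant:
print(prob(50) - prob(49))    # near the middle of the curve
print(prob(90) - prob(89))    # out on the flatter part of the curve
```

The two printed differences are not equal, which is why the coefficient is best read on the log(odds) scale.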
@@statquest I'm trying to use a logistic regression model on a set of binary events, each with a different probability of happening... and I have no idea what I'm doing haha.. so I'm loading up on coffee and I'm going to start your videos soon
Here's the link to the code: github.com/StatQuest/logistic_regression_demo/blob/master/logistic_regression_demo.R
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
Hi Josh,
Love your content. It has helped me learn a lot and grow. You are doing awesome work. Please continue to do so.
Wanted to support you, but unfortunately your PayPal link seems to be broken. Please update it.
Your videos never disappoint, Sir. I have gone through many of them and think you've earned the right to brand the phrase "clearly explained", because your explanations are indeed very clear. I am building a better understanding of statistics thanks to you. I appreciate you and hope you continue to pass on the knowledge.
Wow, thanks!
This 89-year-old guy says BAM!! So clearly explained, indeed. DOUBLE-BAM!!!!
BAM!!! And thank you for your support!!!!
Thank you so much Josh for all these videos! I got an A-plus in most of my stat courses quite a few years ago when I was doing my MSc in Biostat, but it took me quite some time to come up with a better understanding of a few concepts. You just summarized and presented these ideas and more in a few minutes! You are a genius and, on top of that, you are so kind to share all this work with everyone for free! With my limited vocabulary, all I can say is THANK YOU! It makes me feel the world is a beautiful place with beautiful minds and souls. I love your song “hello”, it reminds me of the day I met my daughter and brought happy tears to my eyes :)
Thank you so much!!! I'm really glad you like my videos and my music. :)
Where have you been my whole thesis! Thank you!!
Hooray! I'm glad to help! :)
I feel the same!! hah
Josh, it’s Saturday morning here and I’m enjoying a cup of Bam! learning R from the best teacher on the planet. I’m so grateful and appreciative of your efforts to share your considerable talents with us!
Thank you very much! :)
I just wish one day all this information actually stays and sticks in my mind... thank you though! Your videos are amazing!
Thanks for watching!
What is great about your videos is that even when I forget my headphones I am able to follow along in a computer room full of other students! Thank you so so so much !!!! From the University of Bordeaux
Solal Sténou Merci!! :)
You will surely be in my thesis acknowledgments. Thank you for making our lives relatively easier and statistics truly more intelligible. BAAAAAM!!
Thanks so much! :)
You are an absolute life saver. My data science paper is due in two days and now I have my pretty log graph and I understand this better. DOUBLE BAM!!!!!
Hooray!
so, how did it go today?
Your simple English explanation of the meaning of "Intercept" in the output from 8:30 to 8:38 of this video was something I could not find after searching for 2 hours. Thank you!
Awesome!!! Now that you have that concept down, a lot of other stuff in statistics should make more sense. (At least I hope!) :)
Your videos are great! It's also so nice of you that you take the time reply to so many of the comments here !
Thank you!
Nice channel to land on! Happiest discovery of my 2020! Great job!
Thank you! :)
Your videos cover everything in my course and I wish I found you sooner! So much detail and clear explaining in such little time
I was just here for the logistic regression but bam!! I will be watching all of your videos. As a DS learner using R, double bam!!!, your videos will surely help big time! Bambambam! 👌😅
Thank you. 🙂
Awesome! Thank you!
I really enjoyed the clear way you explained this topic to us. Many thanks for the teaching!!!
Thank you very much!!!
"one last shameless self promotion" got me 😂😂😂.....that's why I love your videos, u make learning stats fun
Hooray! Thank you! :)
Both my husband and I learned so much from your video. (Inspired by the top comment:) whenever you come to Toronto, let us know for a few free accommodations in our Asian restaurant/bubble-tea-surrounded neighborhood (North York Centre)!
Thx again!
Xin
Hooray!!! That would be awesome. I will dream of the day I can visit you in Toronto. :)
Thank you very much for all this material. You have saved me from failing my exams. Amazing quality channel; the low number of likes is unbelievable. A very appreciated channel, at least by me. Thanks again.
Wow, thanks!
Thank you so much! I've a stat project to do in R with logistic Regression and this simplified the coding portion so much!
Hooray!
You are an amazing teacher. God bless you!
Thank you! 😃
You sir deserve a promotion 👏 thanks for this incredibly helpful video
Thank you! :)
I recommend all the videos by stat quest with Josh Starmer. Thank you for your good explanations.
Thank you very much! :)
Thank you so much for this video! I've been suffering with the coding for my project but this really helped. You're a star!
Thanks!
good looking white background...
graphs are beautiful...
whatever you say, you write it on screen....
your sound and sound system, very good..
the way you explain things, CLEARLY EXPLAINS everything..
and loved that music part and BAM!!!
and here, i have something to say about your work..
and that is VERY BIG BAM !!!... good luck.. keep growing..
Thank you very much! :)
It's 1:11 AM and what I am doing is DOUBLE BAM. Thank you for this awesome video. You are a hero.
Thanks! :)
Thanks for an excellent video. As usual.
Thanks again!
Thank you for saving my study. Not gonna lie, this video made me cry. I was about to drop out because of statistics, but this saved my project.
Hooray!
It must be so much fun working with you! Thank you for this tutorial. =)
Thank you! :)
So helpful, thanks!
Whenever you come to Cyprus, let me know for a few free accommodations in our mountainous region, Marathasa!
Thx again!
Γ
Wow! That sounds awesome!!!
@@statquest
oh yes!
I owe you a lot - you saved me so many hours!
Γ
Thanks for also showing how to wrangle data and explore missing data in a simple helpful way ❤
My pleasure 😊
I dont understand why you used both categorical for logistic regression??
7:00
The outcome is dichotomous.
Doing a master's program in analytics, and this video made more sense than all the lectures on logistic regression combined. Thank you!
Thanks!
Hi Josh, thanks for your videos they are very easy to understand. Really appreciate your efforts. I believe I speak for many,
Because of you many people are able to understand with the utmost clarity, and you cover all the small details with super ease. Keep up the noble work. Cheers 👍
Would it be possible for you to put up a video on model evaluation, i.e. determining the cutoff and model performance?
Thanks
Thank you! :)
Great job bro.
Gratitude for your help. You also have a place to stay if you come to Uganda (Africa).
Thank you very much!!! :)
I am impressed; you are talented. Thanks for sharing your knowledge.
Thank you! :)
I won't forget you in the acknowledgments sir haha!!! Great job!
Thank you very much! :)
Hi, Josh. I cannot thank you enough for these videos... It would also be good to have a similar video in Python.
Great suggestion!
@@statquest Where's the Python video, sir?
Thanks Josh - you are our saviour!
BAM! :)
@@statquest Triple Booyah BAM from my side!
THANK YOU! somehow I couldn't find any websites explaining this
Glad you found it.
Hoooray! We made it to the end of an exciting journey through logistic regression! Hope you have a nice day, and thank you for understanding the output for logistic regression in R, which really can't be understood thoroughly without watching all the logistic + odds videos!
Yep, that is correct. That's why I made all those other videos first - the output is jam packed with stuff.
Josh, joining all the folks here in thanking you! I have a question: around minute 9:05 you talk about the odds of being unhealthy for a female. How do we know that these are the odds of being unhealthy vs being healthy? I feel I am floating when it comes to intercepts, reference categories, and baseline categories. Thanks a lot!
R orders factors ("healthy" vs "unhealthy") in alphabetical order. So that means "healthy" is first, and the default, and "unhealthy" is the difference from that. Likewise, "sexF" and "sexM" are ordered alphabetically, so "sexF" is the default value and "sexM" is the difference from that.
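To add a sketch of what that default "treatment" coding amounts to (Python, purely illustrative; R does this internally when you pass a factor to glm()):

```python
def treatment_code(values):
    """Mimic R's default factor coding: sort the levels alphabetically,
    absorb the first level into the baseline (the intercept), and give
    each remaining level its own 0/1 indicator column."""
    levels = sorted(set(values))
    baseline, rest = levels[0], levels[1:]
    rows = [[1 if v == lvl else 0 for lvl in rest] for v in values]
    return baseline, rest, rows

baseline, columns, rows = treatment_code(["M", "F", "F", "M"])
print(baseline)   # 'F'  -> the default level, represented by the intercept
print(columns)    # ['M'] -> the "sexM" indicator column
print(rows)       # [[1], [0], [0], [1]]
```

So the intercept gives the log(odds) for the baseline level ("healthy" females here), and each indicator coefficient is the difference from that baseline.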
BAM! Spot on, thanks for such a video. My journey with logistic regression and R has started.
Awesome!!! :)
Mine too
amazing. thank you man!
Thanks!
You may like this video too:
Another great video about logistic regression in JMP
ua-cam.com/video/9yN_yjGAJZE/v-deo.htmlsi=jUwEZUDobBudE8AE
incredibly brilliant tutorial!
Thanks! :)
me binge watching Josh's videos before midterm... anyone else? lmao
Good luck! :)
You are a genius... and your teaching style too... hurray!!!! and Bamm!!!!
Wow, thank you!
Great tutorials. I started with your PCA video and have since been hooked on the other videos. Could I request a video on the various types of probability distributions and when to use them?
Those are all in the works. I wish I could work 2 or 4 times faster than I can. I've wanted to cover the major probability distributions for over a year, but got sucked down a machine learning path and now feel spread pretty thin. However, these will happen eventually! :)
StatQuest with Josh Starmer could you make a video on how to work 2 to 4 times faster? :-)
As soon as I figure that out, I'll make a video on it! ;)
BAM!!!
A small request - you have done a lot already, a big thank you for that. Is it possible to make a video on logistic regression in Python?
I'll keep that in mind.
@@statquest thank you so much
This has been extremely helpful. Thank you!
Thank you! :)
Clear as water. Super BAM!!! Thanks for sharing
Your videos are awesome. Thank you very much.
Thank you! :)
YOU RAWK !! Awesome explanations of ML concepts.
Thank you! :)
Thank you so much for this effort, really appreciated.
We need a StatQuest on three topics:
1. The Chi-square test,
2. The Hosmer-Lemeshow goodness-of-fit test for logistic regression,
3. Iteratively reweighted least squares (IRLS) using Newton's method.
If you don't mind :) of course.
Can you tell us the title of the next video?!
The Chi-Square test is on the list. I've looked into the Hosmer-Lemeshow fit... Can you tell me what you think about the limitations? Specifically those mentioned in the Wikipedia article about it? en.wikipedia.org/wiki/Hosmer%E2%80%93Lemeshow_test#Limitations_and_alternatives
And iteratively reweighted least squares is also on the list. However, up next are some basic statistics videos and then videos on lasso, ridge, and elastic-net regression.
The Hosmer-Lemeshow statistic was introduced to avoid a problem with the Pearson chi-squared statistic: when observations are grouped by the values of the x variables, the Pearson chi-squared goodness-of-fit test cannot be readily applied if there are only one or a few observations for each possible value of an x variable, or for each possible combination of values of x variables.
(A sufficiently large sample size is assumed. If a chi-squared test is conducted on a smaller sample, it will yield an inaccurate inference.)
So in the Hosmer-Lemeshow statistic, the observations are grouped by expected probability instead. But there is very little guidance on selecting the number of subgroups. The number of subgroups, g, is usually chosen using the rule g > P + 1, where P is the number of covariates. For example, if you had 12 covariates in your model, then g > 13. How much bigger g should be is essentially left up to you. Small values of g give the test less opportunity to find mis-specifications. Larger values mean that the number of items in each subgroup may be too small to find differences between observed and expected values. Sometimes changing g by very small amounts (e.g. by 1 or 2) can result in wild changes in p-values. As such, the selection of g is often confusing and arbitrary. Also, it doesn't take overfitting into account and tends to have low power. For these reasons, the Hosmer-Lemeshow test is no longer recommended.
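The grouping step described above can be sketched roughly as follows (Python; the probabilities and outcomes are made up, and the chi-squared tail probability that turns the statistic into a p-value is omitted):

```python
def hosmer_lemeshow_stat(probs, outcomes, g):
    """Group observations by predicted probability into g subgroups and
    sum (observed - expected)^2 / (n * pbar * (1 - pbar)) over groups.
    The result is compared to a chi-squared distribution with g - 2 df."""
    paired = sorted(zip(probs, outcomes))      # sort by predicted prob
    size = len(paired) // g
    stat = 0.0
    for i in range(g):
        group = paired[i*size:] if i == g - 1 else paired[i*size:(i+1)*size]
        n = len(group)
        expected = sum(p for p, _ in group)    # E_g: sum of predicted probs
        observed = sum(y for _, y in group)    # O_g: count of positive outcomes
        pbar = expected / n
        stat += (observed - expected) ** 2 / (n * pbar * (1 - pbar))
    return stat

probs    = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0,   0,   1,   0,   1,   0,   1,   0]
print(hosmer_lemeshow_stat(probs, outcomes, g=2))  # about 1.33
```

Rerunning this with a different g can change the statistic noticeably, which is exactly the arbitrariness complained about above.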
Am I right? Are these enough reasons to no longer use the HL test?
I have another question. (Overfitting happens when your sample size is too small. If you put enough predictor variables in your regression model, you will nearly always get a model that looks significant.
While an overfitted model may fit the idiosyncrasies of your data extremely well, it won’t fit additional test samples or the overall population. The model’s p-values, R-Squared and regression coefficients can all be misleading. Basically, you’re asking too much from a small set of data.)
If I have a small sample, is there any problem with using maximum likelihood to fit the model and McFadden's pseudo-R squared? Is there any rule for choosing the number of samples for any regression?
Sorry for so many questions; it is my first year in biostatistics. :)
These are all great questions. You are correct about the HL test and you are correct about overfitting. There are, however, lots of tricks you can use to compensate for overfitting (lasso regression, ridge regression, elastic net regression etc.)
One way to test to see if you have a model that is "overfit" is to use cross validation.
As for a minimum number of samples for logistic regression - people often say "10 samples per level of each discrete variable". It's a general rule of thumb and it doesn't always apply. However, again you can use cross validation to verify if you have enough samples or not. Cross validation is a very practical tool!
Thank you, Mr Josh, for answering me, I need to study more about Cross-validation.
Sorry I have more than one account 🙈🙊
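To make the cross-validation suggestion concrete, here is a minimal base-R sketch of 5-fold cross validation for a logistic regression model; the data frame and its column names are made up purely for illustration.

```r
## Minimal 5-fold cross validation for logistic regression in base R.
## All data and column names here are simulated for illustration.
set.seed(1)
n  <- 100
df <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b"), n, replace = TRUE))
)
df$y <- rbinom(n, 1, plogis(0.8 * df$x1))

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment
acc <- numeric(k)
for (i in 1:k) {
  train <- df[folds != i, ]
  test  <- df[folds == i, ]
  fit <- glm(y ~ x1 + x2, data = train, family = "binomial")
  p   <- predict(fit, newdata = test, type = "response")
  acc[i] <- mean((p > 0.5) == test$y)  # out-of-fold accuracy
}
mean(acc)
```

If the average out-of-fold performance is much worse than the in-sample performance, that is a sign the sample may be too small for the number of predictors.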
you are just the best! Thanks for doing this!
Thank you! :)
Excellent!!!! Thank you very much.
bam!
Sir, you are a savior
Thanks! :)
Josh you are amazing, thank you!
You're the man! thanks for everything!
Thank you very much! :)
so incredibly helpful and well done. Thank you so much!!
Thank you! :)
You really are wonderful for explaining this in a way morons like me can understand, this is so incredibly helpful. Thank you so much!
This channel has helped me a lot understanding statistics! Could you please make a video explaining the linear mixed model too?
Yes! However, it might be a while before I get to it.
Great content and incredible value. Thank you so much
Thanks! :)
The more I watch your videos the more I wish I had a teacher like you in my school days..
Do we have a video on chi square test?
Not yet. :( But one day we will.
Great video!!! Thank you so much!
Thanks!
This man is a legend
:)
Thanks for the video. Your video made it look so simple. I request you to upload a video on how to get risk ratios in a multiple logistic regression model.
I'll keep that in mind.
This is great stuff as I am just learning R; so pardon a very basic question: Why does "sex" need to be a factor vs number here?
Since the values are 0 and 1, it probably doesn't matter. However, to be safe, it's probably a good idea to make all categorical variables, regardless of their values, factors.
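A minimal sketch of the factor conversion being discussed; the toy data frame and its column names are hypothetical, not from the video's dataset.

```r
## Hypothetical toy data; the column names are made up for illustration.
df <- data.frame(
  hd  = c(0, 1, 0, 1, 1, 0, 1, 0),        # outcome: heart disease yes/no
  sex = c(0, 1, 1, 0, 1, 0, 0, 1),        # 0/1 coded, but really categorical
  age = c(45, 60, 52, 39, 66, 48, 58, 50) # genuinely numeric
)

## Convert the 0/1 codes to factors so glm() treats them as categories
## rather than as points on a continuous scale.
df$hd  <- as.factor(df$hd)
df$sex <- as.factor(df$sex)

## glm() now creates a dummy variable for sex automatically.
fit <- glm(hd ~ sex + age, data = df, family = "binomial")
summary(fit)
```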
Hi Josh
Firstly many thanks for your videos on this topic. I have noticed very odd and conflicting results between R and SPSS with regards to entering factors (with more than 1 level) into a logistic regression model. SPSS produces a simplified output containing an odds ratio with 95% CI and p-value, for each individual variable entered into a logistic regression model (rather than the factor levels, as displayed in R).
In R - I have not found a good way to do this. I have used the logistic.display command as well as exp() to get odds ratios, but they do not provide an overall value like in SPSS (instead, listing these for each individual level within the factor).
Do you have any idea why SPSS and R handle logistic regression differently like this? All I would like is a similar output to SPSS - where I get a single odds ratio, 95% CI and p-value for each individual factor variable entered.
Unfortunately I've never used SPSS so I'm not really familiar with the problem you are having. That said, perhaps this will help: stats.stackexchange.com/questions/543540/different-output-for-logistic-regression-between-r-and-spss-how-to-get-correct
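For anyone landing on this thread: a hedged sketch of how R can report an odds ratio (with a Wald 95% CI) per factor level, plus a single p-value for the whole factor via a likelihood-ratio test, which is closer to the SPSS-style summary described above. The data are simulated and the variable names are illustrative.

```r
## Simulated data: a 3-level factor predictor and a binary outcome.
set.seed(5)
n   <- 150
grp <- factor(sample(c("low", "mid", "high"), n, replace = TRUE))
y   <- rbinom(n, 1, 0.4)
fit <- glm(y ~ grp, family = "binomial")

## One odds ratio (with Wald 95% CI) per non-reference level:
or_tab <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(or_tab)

## One p-value for the factor as a whole (likelihood-ratio test),
## closer to what SPSS reports for the entire variable:
anova(fit, test = "Chisq")
```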
Hi!! You may like this video too:
Another great video about logistic regression in JMP
ua-cam.com/video/9yN_yjGAJZE/v-deo.htmlsi=jUwEZUDobBudE8AE
super dangg! Good explanation, bro!
Thank you! :)
These videos are so amazing!
Do you have a suggestion for a book that explains Logistic Regression to newbies? The videos are super awesome, but extra references may help too. Hopefully you will write your own book soon!
Thanks!
I know this is probably 10 months too late, but the book “Introduction to Categorical Data Analysis” by Alan Agresti is a great book. Does a really good job explaining logistic regression and is pretty light on the math.
Awesome! Thank you so much! Please could you do a video about conditional logistic regression like clogit in R with result interpretation and how it works when using adjusted parameters.
I'll keep that in mind.
Hi, I really like your videos, every topic is as clear as water after watching it. I've watched this one and also the three videos about logistic regression's details. If you want to go further in this topic, you could do a video explaining emmeans package for R. Many people, including me, would understand post hoc tests for glm using emmeans, if someone like you explained it. Thank you!
Thanks! :)
Is it necessary to turn all the variables into factors before the regression analysis?
All of the categorical variables need to be converted to factors.
@@statquest Thank you very much. What do you classify as categorical?
@@afiapriscilla8276 Variables that represent discrete categories. Like "favorite color=Blue" or "Red"
And...BAM, thanks for sharing, your video is really useful :D
Thanks! :)
At 11:30, the video states, "Since we are not estimating the variance from the data (and instead deriving it from the mean) it is possible that the variance is UNDERESTIMATED." Q. How can we say that we are UNDER-estimating the value of the variance? BTW, awesome vids, music man! ;)
That's a great question. Here's a (hopefully) useful discussion on the topic: newonlinecourses.science.psu.edu/stat504/node/162/
@@statquest
It's 1am but let me see if I got this straight...
Due to the nature of discrete functions (like logistic functions) they do not always vary smoothly. With discrete functions it is possible to see variances (and their corresponding probabilities) differ from (in our case) the proposed logistic model. In other words, it is conceivable to have a leptokurtic or platykurtic distribution.
It is possible to see probabilities which differ from the expected probabilities due to the fact that the "real" model may be different and/or the samples may not be i.i.d. As it happens, the Bernoulli distribution tends toward the platykurtic.
...It's just the wrong dang model sometimes...
It's actually a little simpler than that. With binomial data (like logistic regression) we estimate the mean value = number of positive responses / total number of responses. Once we have the mean value estimated, we use that, and that alone, to calculate the variance. In other words, once we have calculated the mean, we do not need the data anymore to calculate the variance. This is in contrast to linear regression (or a lot of other things) where we estimate the mean with the data and then use the data again to calculate how it varies around the estimated mean. Thus, there is a possibility that with logistic regression (and other "generalized linear models") we did not correctly estimate the variance since the data were not involved in that calculation. If we overestimate the variance, that just makes the calculations more conservative and, generally speaking, that's not a problem. However, if we underestimate the variance, then that means we're more likely to say things are significantly different even if they are not, and that's no good. So the dispersion parameter takes care of that.
@@statquest Cheers,
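The dispersion point above can be illustrated by refitting a model with `family = "quasibinomial"`, which estimates the dispersion parameter from the data instead of fixing it at 1. This sketch uses simulated grouped binomial data (20 trials per group); note that for ungrouped 0/1 responses the dispersion estimate is much less informative.

```r
## Grouped binomial data (20 trials per group), simulated for illustration.
set.seed(42)
groups <- 30
x    <- rnorm(groups)
size <- 20
y    <- rbinom(groups, size = size, prob = plogis(0.4 * x))

## Standard binomial fit: the dispersion parameter is fixed at 1.
fit <- glm(cbind(y, size - y) ~ x, family = "binomial")

## Quasibinomial refit: same coefficient estimates, but the dispersion
## is now estimated from the data; values well above 1 would suggest
## overdispersion (an underestimated variance in the standard fit).
qfit <- glm(cbind(y, size - y) ~ x, family = "quasibinomial")
disp <- summary(qfit)$dispersion
disp
```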
Hi Josh, thanks for this amazing tutorial. Would you be able to add something about interactions between predictors and random effects? I am trying to run a mixed-model logistic regression and have three-way interactions but am not entirely sure how to deal with them. Thanks so much :)
same!
This is so helpful thank you!!
Hooray! :)
You are so good!! Thank you!
Thanks! :)
I like the way you presented the information in such a manner that is easily understood
I have 2 questions
1. While doing the xtabs, what do we need to do if we find that, say, the count for healthy and/or unhealthy under cp3 is 0 or very minimal (video clip at 6:04)?
2. At 15:55 in the video, you mentioned using cross validation to get a better idea of how well the model might perform with new data. Do you have a separate video specifically for that topic?
Many thanks
Williams
1) Unfortunately I don't understand what you're asking in this question. However, I think you are asking what do we do when one level from a categorical variable does not have strong preference for healthy or unhealthy or doesn't have much data to begin with. It really depends. You can just try it and see what happens, but you might also try removing the variable and see if that improves predictions.
2) I have a video on cross validation here: ua-cam.com/video/fSytzGwwBVw/v-deo.html
@@statquest
Thanks for your prompt reply
Let me clarify my 1st question: at 6:24 in the video, you mentioned that there are 4 patients representing level 1 under the restecg category.
My first question is: why can only 4 cause a problem? Is it because it is too minimal compared with the others (level 0 and level 2)? How do I know exactly that it is causing a problem when I do the analysis? And if it does cause a problem, how do I go about fixing it? Does just removing level 1 solve the problem?
Thanks for your help
@@williamstan1780 When you don't have much data supporting a specific category, then chances are it will have a lot of variance - in other words, further samples may be very different from the ones in the original dataset. You can test this with cross validation (use some of the data to fit the model, use the rest to see how well it performs). If things are no good, you can remove the variable, or try to lump categories together.
@@statquest thanks Josh ..: appreciated
gazillion bam THANKS to you!
Thanks!
great work done here
Thank you!
All of your videos are great and fun to learn from! Could you please upload a tutorial on mediation analysis using STATA and R (using the mediation package)?
I'll keep that in mind.
Thank You. SOOOOOOOOOooOOOoo Helpful
bam!
SO clear!! Thanks!!
Awesome!
Great video, thanks so much Josh! After the 4th minute you mention how to address the NA samples. Can you teach us the RANDOM FOREST method, if we don't want to get rid of our NA samples (e.g. in multivariate cases, where the rows include other useful info)? Thanks!
I cover the random forest method in this video: ua-cam.com/video/6EXPYzbfLCE/v-deo.html (the theory is here: ua-cam.com/video/sQ870aTKqiM/v-deo.html )
This video is very helpful. Do you also have a video about multinomial logistic regression in R? It would be very helpful if you could post one.
I'm glad you like the video. I don't have one on multinomial logistic regression, so I'll put it on the to-do list.
@@statquest hello! did you ever make a video for this one? would love to check it out if you did, thanks so much for what you do!
@@joshuabudi4787 Not yet. :(
Hi Josh, Thank you for the very informative tutorial. Do you have any videos on multilevel modelling?
Not yet.
Great video, salute! If you can, please post a video on all the machine learning models, with a large-dataset example implemented in R and clear intuition about the mathematics and statistics behind them. Thanks.
Love your videos. Could you do one on mixed logistic regression?
I'll keep that in mind.
This video is amazing! Thanks!!!
Thank you! :)
Thaaaanks! very useful and clear!
Hooray! I'm glad you like it! :)
lets goooo StatQuest
bam!
At the end where you make the graph, you could have used the broom package and its augment function to create the data frame containing the fitted and actual values.
Cool.
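A sketch of the broom approach suggested above, with the model and data simulated for illustration; it assumes the broom package is installed and falls back to base R otherwise.

```r
## Simulated model; 'fit' and the variable names are illustrative.
set.seed(2)
x <- rnorm(60)
y <- rbinom(60, 1, plogis(x))
fit <- glm(y ~ x, family = "binomial")

if (requireNamespace("broom", quietly = TRUE)) {
  ## augment() returns the original data plus a .fitted column;
  ## type.predict = "response" puts .fitted on the probability scale.
  plot_df <- broom::augment(fit, type.predict = "response")
} else {
  ## Base-R fallback: build the same data frame by hand.
  plot_df <- data.frame(y = y, x = x,
                        .fitted = predict(fit, type = "response"))
}
head(plot_df[, c("y", "x", ".fitted")])
```

Either way, plot_df now holds the actual outcomes alongside the fitted probabilities, ready for plotting.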
Excellent video, very clear and easy to follow! Do you have any videos that show how to do best subsets and cross validation with logistic regression on R? I know you have a video that explains the concept of cross validation but I am looking for a video like this that goes through it step-by-step for logistic regression on R. Same thing for how to run all possible models (best subsets) using logistic regression on R. I have found one by another youtuber for linear regression but not for logistic.
Not yet. :(
@@statquest Wow thank you for the quick reply! That's alright, if you do make any videos like that, I'll be among the first to watch them! :)
Great videos, Josh! You make things so easy!
I just had a question though - Is it mandatory to convert all variables (which can be converted into factors) into factors? For example, what would have happened if we have kept the sex variable as numeric? Does it make my logistic regression model incorrect?
Unless you were expecting a continuous range of values between the two sexes, your model would be incorrect.
@@statquest Well yes, doing this for the sex variable makes sense. However, for my data, I have a religiousness column with discrete values 1-5 and a rating column again with a discrete rating of 1-5. So should I make these two variables factors as well? Or is it fair to keep them as numeric?
Also, thanks for such a prompt reply. Really appreciate it!
Hi, I love the way you explain all these things! I have a couple of questions. I observe that it's necessary to establish a coding for the predictors; if these are dichotomous, for example, they are assigned 1 and 0 (in the example, male/female), so:
- How should we proceed with polytomous predictors?
- What results of the model should be reported in a scientific article?
Thank you in advance, and keep making great content!
1) For all categorical data (with 2 or more classes), just make sure you are storing it in a factor.
2) That depends on the journal. I would look at other articles in that journal to figure it out.
The last graph deserves a quadruple BAM!!!!🤣🤣🤣🤣
Yes!
Hi Josh
I find your videos very informative and they help me a lot with my bachelor's thesis. Because you put some variables into "factors" and others stay "numeric", I think I can ask my question, which I can't find an answer to anywhere on the internet! I am doing a logistic regression with NBA regular season games to find out if the fact that the teams are eliminated from the playoffs has an effect on their winning probability (to find out if they "tank" = intentionally lose). For the variable of the current strength of the team I use the current winning percentage of the team (how many games won over how many games played), and this variable is refreshed after every game. I was wondering if I can put this variable in as "numeric"? Or as what kind of type would you define this winning percentage? The opponent's winning percentage, whether the game is on the home court or not, whether the team is statistically eliminated or in the playoffs, and whether the opponent is statistically eliminated or in the playoffs are also in the regression. It is the same regression some researchers did back in 2002 to test the same thing, but no one has done it recently. I hope you understand my question and hope very much that you can and are willing to help me. Thank you very much and have a great day!
For logistic regression, it will be easier to understand what the estimated coefficients mean if you multiply the percentage of games won by 100. When you do this, you can use these values as "numeric" and the coefficient will tell you how much the probability of the outcome changes for every 1 percentage change in that variable. For more details on interpreting the coefficients, check out ua-cam.com/video/vN5cNN2-HWE/v-deo.html
Thank you very much for your help!!! I appreciate it a lot! I'm glad it's not a complicated solution... :D
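The scaling advice above can be sketched like this; all variables are simulated and the names (win_pct, home, win) are made up for illustration.

```r
## Simulated illustration: winning percentage on a 0-100 scale as a
## numeric predictor, plus a home/away factor.
set.seed(7)
n       <- 300
win_pct <- runif(n) * 100                           # 0 to 100 scale
home    <- factor(sample(c("home", "away"), n, replace = TRUE))
win     <- rbinom(n, 1, plogis(-2 + 0.04 * win_pct))

fit <- glm(win ~ win_pct + home, family = "binomial")

## Because win_pct is on a 0-100 scale, this coefficient is the change
## in log(odds of winning) per 1 percentage-point increase:
coef(fit)["win_pct"]
```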
OK.. I haven't even watched the video yet but it looks like exactly what I need
I hope so! :)
@@statquest I'm trying to use a logistic regression model on a set of binary events, each with a different probability of happening.. and I have no idea what I'm doing haha.. so I'm loading up on coffee and I'm going to start your videos soon
@@Gypsy_Danger_TMC Good luck! :)
Thank you!!! Do you have awesome videos on Tobit or Logit model too?
The Logit model is the same as logistic regression - the only difference is how the output is presented.
Thank you a lot!! Also, Good wishes to North Carolina.
Thank you! :)
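The point above that the logit model and logistic regression differ only in how the output is presented can be checked directly in R with simulated data: the "link" (logit) output and the "response" (probability) output come from the same fitted model.

```r
## Simulated data for illustration.
set.seed(3)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(x))
fit <- glm(y ~ x, family = "binomial")

log_odds <- predict(fit, type = "link")      # logit (log odds) scale
probs    <- predict(fit, type = "response")  # probability scale

## Applying the inverse logit to the link-scale output recovers
## the probabilities exactly:
all.equal(probs, plogis(log_odds), check.attributes = FALSE)
```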