Quantitative Social Science Data Analysis


Chapter 4 Video 8 - Pivoting Datasets in R

In this video, we briefly examine pivoting datasets/variables using the pivot_longer() and pivot_wider() functions from the tidyr package.
This is the 8th video of Chapter 4 for the book Quantitative Social Science Data with R, 2nd Edition (Sage).
April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
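A minimal sketch of the two functions named above, assuming the tidyr package is installed. The `wide` data frame and its column names are made up for illustration and are not the book's data.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2020` = c(10, 20),
  `2021` = c(11, 21),
  check.names = FALSE
)

# Wide -> long: the year columns become (year, value) pairs
long <- pivot_longer(wide, cols = c(`2020`, `2021`),
                     names_to = "year", values_to = "value")

# Long -> wide: spread the (year, value) pairs back into columns
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

Here `long` has one row per country-year, and `wide_again` has the same shape as the starting data.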

Videos

Chapter 9 Video 3 - Wilcoxon Rank Sum Tests in R

3:58

137 views · 1 year ago

In this video, we examine running Wilcoxon rank sum tests with independent samples in R. This is done with the wilcox.test() function. This is the 3rd video of Chapter 9 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
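A short sketch of `wilcox.test()` with two independent samples, using base R only. The scores and group labels are invented for illustration.

```r
# Made-up scores for two independent groups (no ties, so an exact p-value is used)
group_a <- c(1.1, 2.3, 3.7, 4.2, 5.9)
group_b <- c(6.4, 7.8, 8.1, 9.5, 10.2)

# Two-vector interface
result <- wilcox.test(group_a, group_b)
result$statistic  # W counts the (a, b) pairs where a > b
result$p.value

# Formula interface on a data frame gives the same test
scores <- data.frame(
  score = c(group_a, group_b),
  group = rep(c("a", "b"), each = 5)
)
result_formula <- wilcox.test(score ~ group, data = scores)
```

Because every value in `group_a` is below every value in `group_b`, W is 0 in this toy example.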

Chapter 5 Video 2 - Renaming Variables in R

7:34

84 views · 1 year ago

In this video, we go through renaming variables in R. This is done using the rename() and rename_with() functions from the dplyr package. This is the 2nd video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
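A minimal sketch of `rename()` and `rename_with()`, assuming dplyr is installed. The `survey` data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical survey data with unhelpful column names
survey <- data.frame(Q1 = c(1, 2), Q2 = c(3, 4), RespID = c(101, 102))

# rename(): the pattern is new_name = old_name
survey2 <- rename(survey, trust = Q1, age = Q2)

# rename_with(): apply a function to the column names (here, lower-casing all of them)
survey3 <- rename_with(survey2, tolower)
names(survey3)
```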

Chapter 5 Video 6 - Re-Arranging Variable Values in R

8:49

109 views · 1 year ago

In this video, we take a look at re-arranging variable values in R. This is done using the mutate(), factor() function with the levels option and the fct_rev() function from the dplyr and forcats packages. This is the 6th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study...
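A small sketch of re-ordering factor levels, assuming dplyr and forcats are installed. The `agree` variable and its labels are made up.

```r
library(dplyr)
library(forcats)

# Hypothetical variable whose default (alphabetical) level order is not meaningful
df <- data.frame(agree = c("low", "high", "medium", "low"))

df <- df %>%
  mutate(
    # Set the level order explicitly with the levels argument
    agree = factor(agree, levels = c("low", "medium", "high")),
    # Reverse that order
    agree_rev = fct_rev(agree)
  )

levels(df$agree)
levels(df$agree_rev)
```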

Chapter 5 Video 10 - Dealing with Missing Values in R

7:18

77 views · 1 year ago

In this video, we look at how to deal with missing values in R. This is done with the mutate(), replace_na(), and na_if() functions from the dplyr and tidyr packages. This is the 10th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
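A minimal sketch of `na_if()` and `replace_na()` inside `mutate()`, assuming dplyr and tidyr are installed. The data frame and the -99 missing-value code are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data where -99 is a sentinel code for "missing"
survey <- data.frame(
  income = c(50, -99, 70, NA),
  region = c("north", NA, "south", "north")
)

cleaned <- survey %>%
  mutate(
    income = na_if(income, -99),            # turn the -99 code into NA
    region = replace_na(region, "unknown")  # turn NA into an explicit label
  )
```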

Chapter 5 Video 3 - Changing Variable Classifications in R

7:36

127 views · 1 year ago

In this video, we briefly examine changing variable classifications in R. This is done using the as.factor() and as.numeric() functions. This is the 3rd video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
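A base R sketch of the two conversions, with one point worth checking in your own data: `as.numeric()` on a factor returns the internal level codes, not the printed labels. The `likert` vector is made up.

```r
# Hypothetical responses stored as text
likert <- c("3", "5", "1", "5")

f <- as.factor(likert)         # levels are sorted: "1", "3", "5"
codes <- as.numeric(f)         # level codes: 2 3 1 3
values <- as.numeric(as.character(f))  # original values: 3 5 1 5
```

If the labels themselves are numbers, converting through `as.character()` first recovers them.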

Chapter 4 Video 1 - Reading in Data from a Working Directory in R

12:52

210 views · 1 year ago

In this video, we examine reading in different data formats from a working directory in R. This includes reading in .csv, .xlsx, and .dta (Stata) data formats using the tidyverse, haven, and readxl packages. This is the 1st video of Chapter 4 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: stu...

Chapter 5 Video 1 - Determining Levels of Measurement in R

8:27

516 views · 1 year ago

In this video, we go through determining variables' level of measurement in R. This is done using the glimpse() and count() functions. This is the 1st video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e

Chapter 8 Video 12 - Multiple Plots - Combining Plots in R (with ggplot2)

3:40

62 views · 1 year ago

In this video, we examine combining multiple plots in R using the patchwork package. This is the 12th video of Chapter 8 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e

Chapter 5 Video 8 - Collapsing Numeric Variables in R

4:14

105 views · 1 year ago

In this video, we quickly go through collapsing numeric variables in R. This is done using the mutate() and cut_interval() functions from the dplyr and ggplot2 packages. This is the 8th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e

8:51

Chapter 2 Video 2 - RStudio Tour

102 views · 1 year ago

In this video, we briefly examine the features of RStudio. This is the 2nd video of Chapter 2 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e

Chapter 8 Video 9 - Scatterplots with Four Variables in R (with ggplot2)

9:12

169 views · 1 year ago

In this video, we create a scatterplot with four variables in R using the geom_point() function from the ggplot2 package. This is the 9th video of Chapter 8 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e

Chapter 5 Video 4 - Removing Characters in Variables in R

19:20

145 views · 1 year ago

In this video, we examine removing characters in variables' names and values in R. This is done using the mutate(), parse_number(), parse_date(), rename_with(), str_to_lower(), and str_replace_all() functions from the readr, dplyr, and stringr packages. This is the 4th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebook...

Chapter 5 Video 12 - Creating New Variables from Existing Variables in R

10:41

525 views · 1 year ago

Chapter 8 Video 1 - Bar Plots with One Variable in R (with ggplot2)

8:37

167 views · 1 year ago

Chapter 5 Video 7 - Collapsing Categorical Variables in R

5:45

496 views · 1 year ago

Chapter 8 Video 4 - Histograms with Two Variables in R (with ggplot2)

5:01

739 views · 1 year ago

Chapter 8 Video 7 - Scatterplots with Two Variables in R (with ggplot2)

9:52

29 views · 1 year ago

Chapter 7 Video 1 - Frequency Distributions in R

8:32

99 views · 1 year ago

Chapter 4 Video 2 - Reading in Data using RStudio

3:21

56 views · 1 year ago

Chapter 4 Video 4 - Examining Variables in R

9:00

65 views · 1 year ago

Chapter 4 Video 5 - Managing Missing Values in R

6:15

61 views · 1 year ago

Chapter 2 Video 3 - Working Directories in R

5:27

87 views · 1 year ago

Chapter 8 Video 8 - Scatterplots with Three Variables in R (with ggplot2)

4:50

81 views · 1 year ago

Chapter 8 Video 10 - Colour Considerations in Scatterplots in R (with ggplot2)

6:12

32 views · 1 year ago

Chapter 4 Video 7 - Merging Different Datasets in R

12:52

57 views · 1 year ago

Chapter 8 Video 5 - Smoothed Density Plots in R (with ggplot2)

4:14

97 views · 1 year ago

Chapter 8 Video 2 - Bar Plots with Two Variables in R (with ggplot2)

13:20

158 views · 1 year ago

Chapter 7 Video 4 - Z Scores and SATs in R

11:31

42 views · 1 year ago

Chapter 8 Video 13 - Saving Plots in R (with ggplot2)

6:55

40 views · 1 year ago

COMMENTS

@jeandenys7 13 days ago
You make me laugh when you laugh, saying most people ignore the Brant test 😅😅
@hoangminhtran7064 1 month ago
Could you consult on my R work personally? I will pay you. Please contact me.
@anthonymenor1152 2 months ago
Hello, thank you for the helpful video, and subscribed! I had a brief conceptual question. If my model had an interaction term between 2 predictors (let's call them A (numeric) and B (binary)), and I used this in ggpredict() with the "terms" being x = A and group = B, then would the resulting plot trends represent that interaction right. As in, the slope for level 1 of B would differ from the slope of level 2 of B based on the interaction. Conversely, if I used an additive model of just A + B (no interaction) with the same "terms," the resulting slopes for levels 1 and 2 of B would differ but this would not reflect any interaction -- only the additive model. I guess my question is whether the 2nd plot (additive model) is just stratifying the data by levels of B and then producing separate lines for each level? And if so, which is better at capturing the moderating effect of B on the relationship between A and the outcome- comparing the lines of the 1st plot or comparing the lines of the 2nd plot? Please correct me if I am misunderstanding the concepts. I may be confusing stratified log regressions with the grouping you are doing here.
@qssd 2 months ago
Hi! This is a great question! Sorry for the delayed response; I needed to look into how `ggeffects()` implements interactions. You are correct that the plot here is additive; it's not an interaction. So, instead of the predicted probabilities for `scot` being based on `trust` and `age` set at their means, the predicted probabilities are when trust=1 and age=mean(age), trust=2 and age=mean(age), etc. If you do an interaction between `scot` and `trust` in the model, the predicted probability plot will look similar but not identical. If your interest is on capturing the moderating effect of a second variable, then an interaction in the model and then plotting the interaction for interpretation purposes would be better. You might want to check out the ggeffects interactions info: strengejacke.github.io/ggeffects/articles/ggeffects.html
@anthonymenor1152 1 month ago
@qssd Thanks a lot for explaining! What I ended up doing to visualize an interaction was using practically the same code you shared, but instead of setting an additive model in ggpredict(), I set it to a model with the interaction term in it.
@qssd 1 month ago
Makes sense!
@lizongzhang 4 months ago
I am greatly inspired by your video. Thank you! I found another way to make predicted prob plot: ggpredict(model.fit, terms = "scot" ) %>% plot()
@user-po7qd1bz1g 4 months ago
how to add fixed time effects in this model?
@qssd 4 months ago
The simplest way is to add a factor version of the time variable as a predictor in the model (e.g., `as.factor(year)`). But, if you have proper time series data, you probably want to explore a time series count regression model (e.g., Poisson autoregression).
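A minimal base R sketch of the `as.factor(year)` suggestion. The simulated data, the variable names, and the choice of a Poisson model are assumptions for illustration, not the model from the video.

```r
set.seed(1)

# Simulated data: 20 observations in each of three years
panel <- data.frame(
  year = rep(2018:2020, each = 20),
  x = rnorm(60)
)
panel$count <- rpois(60, lambda = exp(0.3 + 0.5 * panel$x))

# Year enters as a factor, so each year after the first gets its own intercept shift
fit <- glm(count ~ x + as.factor(year), family = poisson, data = panel)
names(coef(fit))
```

The first year (2018) is the reference category, so the output has one coefficient for 2019 and one for 2020.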
@bbsyduam2452 5 months ago
Hi Thank you for the great video! Do you think this R code can be used when we are trying to judge whether our data has too many zeros for regression analysis other than negative binomial or Poisson regression?
@qssd 5 months ago
Hi - thanks! Do you mean like for a binary outcome variable, where you would normally use logit or probit? It doesn't appear it will work for that. I took a look at the package (`performance`) and function's code on Github, and it should throw an error if it's not checking a count distribution. I wouldn't be surprised if there are R packages/functions that allow you to test if there are too many zeros in a binary outcome variable, but I don't know them off-hand. For binary and ordered outcomes, if I'm worried there might be too many 0s or 1s (or similar for ordered case), I'll run a model that assumes an asymmetric error distribution (e.g., complementary log-log, skewed logit), compare AIC/BIC values with the logit/probit versions, and choose the model with the lowest AIC/BIC.
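The comparison described above can be sketched in base R as follows. The simulated data are an assumption, and only the complementary log-log link is shown, since skewed logit is not available in base `glm()`.

```r
set.seed(42)

# Simulated binary outcome with relatively few 1s, generated under a cloglog link
x <- rnorm(500)
y <- rbinom(500, 1, prob = 1 - exp(-exp(-2 + 0.8 * x)))

# Same predictor, symmetric vs. asymmetric link
fit_logit <- glm(y ~ x, family = binomial(link = "logit"))
fit_cloglog <- glm(y ~ x, family = binomial(link = "cloglog"))

# Lower AIC/BIC indicates the preferred model
aic_table <- AIC(fit_logit, fit_cloglog)
bic_table <- BIC(fit_logit, fit_cloglog)
aic_table
```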
@jessicawojcik8731 5 months ago
Thank you for the very understandable video! What model should I use, if my zero inflation test tells me my model is overfitting zeros?
@qssd 5 months ago
Thanks! If you are overfitting zeros using poisson regression, I would first use a negative binomial regression. If you are still overfitting zeros with NB regression, I would next use a hurdle model (probably the NB version) b/c of how it truncates the count data for only positive counts. That way you should get a better understanding of the relationship for 0 counts and positive counts.
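As a rough sketch of the first step (Poisson, then negative binomial), the code below compares the observed number of zeros with the number each model expects. It assumes the MASS package, which ships with R, and uses simulated data. It is a hand-rolled check for illustration, not the `performance` package's own test, and the hurdle step is not shown.

```r
library(MASS)

set.seed(7)

# Simulated overdispersed counts
x <- rnorm(300)
y <- rnbinom(300, mu = exp(0.5 + 0.4 * x), size = 0.5)

fit_pois <- glm(y ~ x, family = poisson)
fit_nb <- glm.nb(y ~ x)

# Observed zeros vs. zeros expected under each fitted model
observed_zeros <- sum(y == 0)
expected_zeros_pois <- sum(dpois(0, lambda = fitted(fit_pois)))
expected_zeros_nb <- sum(dnbinom(0, mu = fitted(fit_nb), size = fit_nb$theta))

c(observed = observed_zeros,
  poisson = expected_zeros_pois,
  negbin = expected_zeros_nb)
```

A model whose expected count sits well above the observed count is overfitting zeros; one well below is underfitting them.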
@hen3vz 6 months ago
Show us how to run and interpret a multivariate probit in R please kind sir!
@qssd 5 months ago
Hi - sorry, I don't typically work with data used in a multivariate probit (i.e., correlated binary outcome variables) and I don't have an example queued up.
@josephbaya6654 6 months ago
How can I access the book?
@qssd 6 months ago
Hi - at present, there isn't a free version of the book; I don't know if there will be. Thanks
@user-op3lh4ni6p 6 months ago
sir, can you share the word file please
@qssd 6 months ago
Hi - the Word file is just automatically generated by knitting the .Rmd in RStudio. So, there isn't anything special about the Word file; and also I'm not sure whether I still have it. Thanks
@guerschommugisho5569 6 months ago
Thank you, Sir, for this tutorial. I have a problem plotting predicted probabilities with ggpredict. Could you help me, please? I have already estimated the regression output using the nnet package as follows:
Ado1 <- multinom(DV ~ Age + `Place of residence` + `Completed primary school` + Region + Period + `Sexual violence`, data=ADO10km, weights = v005/1000000, family=binomial); summary(Ado1)
When I want to plot with ggpredict like this:
ggpredict(Ado1,termes="Period")
I get the following error, which I usually get when I try to plot predicted probabilities from multinomial logistic regression with ggpredict:
! Can't extract column with `terms[2]`.
x Subscript `terms[2]` must be a location, not a character `NA`.
Backtrace:
1. ggeffects::ggpredict(Ado1, termes = "Period")
2. base::lapply(...)
3. ggeffects (local) FUN(X[[i]], ...)
4. ggeffects:::ggpredict_helper(...)
5. ggeffects:::.post_processing_predictions(...)
...
9. tibble:::`[[.tbl_df`(original_model_frame, terms[2])
10. tibble:::tbl_subset2(x, j = i, j_arg = substitute(i))
11. tibble:::vectbl_as_col_location2(j, length(x), names(x), j_arg = j_arg)
14. vctrs::vec_as_location2(j, n, names)
15. vctrs:::result_get(...)
@qssd 6 months ago
Hi - I'm not entirely sure, but I have a couple of ideas. #1, I noticed in your ggpredict() specification that you have 'termes' instead of 'terms'. I assume you are using the French for terms or this is an error. Does ggpredict work with French? Have you tried "terms"? #2, have you tried adding `as_tibble()` on its own line between `ggpredict()` and `ggplot()` (using the piping operator %>% to connect the lines)? #3, it looks like the error is saying it can't use an NA value. Does the variable "Period" have any values that only have NAs as observations? For example, all the observations for Period=2 are NA. It is also possible that NAs were created in the ggpredict() function. For example, if the ggpredict() function didn't work correctly it may have produced NAs instead of numeric values. Have you checked the output when you run the ggpredict() function? #4, another thing to try is adding `filter(!is.na(predicted))` before `ggplot()`. So, adding #2 as well: ggpredict(Ado1, terms="Period") %>% as_tibble() %>% filter(!is.na(predicted)) %>% ggplot(...) (the ... is all the ggplot code). You can then see what gets plotted and see if it looks wrong. This might help you figure out the problem.
@guerschommugisho5569 6 months ago
@qssd Thank you so much. It finally worked.
@alexyankson4759 7 months ago
What package are you using for check_zeroinflation?
@qssd 7 months ago
It's from the `performance` package
@user-jp5vq9vq5u 8 months ago
Hi Brian, good talk; it helped me a lot, thanks. My data are expressed as numerators and denominators of different counts, respectively. E.g., the numerator is the number of fish with disease in a pond, and the initial population set up in each pond is different. I want to detect whether the pond or other treatments have significant impacts on disease infection in the fish. How do I handle two outcome variables (counts in numerators and denominators) in Poisson, NB, zero-inflated, or hurdle models? Thank you.
@qssd 8 months ago
Hi - thanks for the question. Unfortunately, I'm not sure I have a great answer. If I understand correctly, if for each pond the numerator is number of fish with disease and the denominator is total number of fish in the pond (i.e., the population in the pond), can you just create a ratio or % and use something like OLS? Otherwise, if you have two related counts as outcomes, you can use the 'bivariate' versions of count models (e.g., Famoye 2010). I'm not very familiar with these models b/c I haven't needed to use them but you can find info online.
@ilhamtohari306 8 months ago
How can we find the confidence interval of the odds ratio?
@qssd 8 months ago
Hi - good question. Honestly, I usually don't worry about confidence intervals of odds ratio as odds ratio is just a way to interpret the coefficients; where confidence intervals matter for the coefficients. I don't know an easy way per se. Since we use `exp(\beta)` to get the odds ratio value, my best guess is that we can use the `confint()` function to get the 95% confidence intervals of the coefficients and then use the `exp()` function. For example, `exp(confint(model.mlogit, level=.95))` will give you the odds ratio values of the 95% CIs of the coefficients.
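The `exp(confint())` idea can be sketched with a base R binary logit. The simulated data and the object name `fit` are assumptions, and the same pattern should carry over to other models that have a `confint()` method.

```r
set.seed(3)

# Simulated binary outcome and one predictor
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 0.9 * x))

fit <- glm(y ~ x, family = binomial)

# Exponentiate the coefficients and their 95% confidence limits together
or_table <- cbind(
  odds_ratio = exp(coef(fit)),
  exp(confint(fit, level = 0.95))
)
or_table
```

Note that the row for the intercept gives baseline odds, not an odds ratio.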
@user-zz4tj3cn4w 10 months ago
Hi--thanks so much for this video and for 1.6. I am trying to add confidence intervals for group predicted probabilities and add them to a ggplot. I have tried to combine the info from the two videos but ended up getting different predicted probabilities than the group probabilities generated when I used the code from this video. Any suggestions on how to get confidence intervals for group probabilities and how it might be different than the 1.6 video? Thanks!
@qssd 10 months ago
Hi - I'm not sure why the difference, but I imagine it's something small w/ how they are calculated (e.g., a control variable is held at mode in one and mean in another). But, check out these new videos that I have on this (much easier code): ua-cam.com/video/qKchFtTuaBE/v-deo.htmlsi=7V2QQJC6OT15luG4 ua-cam.com/video/0-kSeGPHMFk/v-deo.htmlsi=Qob26xkMM_7Bh25m
@prashantchoudhary138 10 months ago
Hi, while combining the graphs, I get the error "non-numeric argument to binary operator". Please help.
@qssd 10 months ago
Hmmm. Generically, the error means you are trying to do something that requires numeric vectors (etc) but you are using non-numeric vectors (etc.). I don't think that matters here. My guess is that the patchwork package is not loaded (`library(patchwork)`). I got the same error as you when I didn't first run `library(patchwork)`. So, double check that.
@yadetafufa-qd2xi 11 months ago
Thank you so much for your contributions
@qssd 11 months ago
You are very welcome
@williamdesousadias1463 1 year ago
And how about ordered probit for panel data in R? Do you know how to do it?
@qssd 1 year ago
Yeah, that's a different kettle of fish. I don't have direct experience running that in R, aside from a two-wave panel survey where the time aspect is essentially irrelevant. But, the ordered probit part is likely the easiest part. The harder part is correctly specifying the errors of the panel over time, i.e., fixed vs. random effects, etc. I'm sure you can find an R package that does this -- just google it. Sorry I can't be more helpful.
@isaacbaah6743 1 year ago
Where can someone interested get the datasets?
@qssd 1 year ago
There is a link in the video description --- the line starting 'April 2023 Update:'
@dantitoprrito 1 year ago
This series is amazing, thank you!
@qssd 1 year ago
You're very welcome!
@elsavarelaredondo6868 1 year ago
The glimpse function is not used in this chapter in the book. It would be good to let viewers know that we have to load the dplyr library to be able to use glimpse.
@qssd 1 year ago
Thanks for the comment. The `glimpse()` function is used in this chapter in the 2nd edition. The function is from the `dplyr` library, but simpler to use `library(tidyverse)` to load the core tidyverse packages at once.
@julianschmidt260 1 year ago
I don't usually post comments, but I'm doing the analysis for my bachelor thesis and I'm also using polr() models. Your videos have helped me enormously, especially in interpreting the results. Thank you very much.
@qssd 1 year ago
Awesome! Glad they were useful.
@jkarimb 1 year ago
I am very thankful for your tutorials! I could not have written my thesis without them!
@qssd 1 year ago
Terrific, thanks so much!
@CanDoSo_org 1 year ago
Hi, since the estimates (coefficients) are the log of odds, why is the exponential of the coefficients not the odds, but the odds ratio? Thanks.
@qssd 1 year ago
There is a bit of tricky nuance. I use this example - Think of the odds as an event occurring vs. not occurring, and the odds ratio as the odds an event occurs for one group vs. another group(s). In the regression context, we are comparing different groups (our predictors) for an event occurring. Extending beyond 'groups' (i.e., nominal predictors), Long (1997) shows that for ordered predictors exp(\beta_i) is the one-unit change in odds, which is the odds ratio and not odds. I prefer to reference logit coefficients as providing 'logits' and not 'log of odds' to reduce confusion.
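A small worked example of the odds vs. odds ratio distinction, in base R. The probabilities and counts are invented for illustration.

```r
# Hypothetical probabilities of an event for two groups
p_women <- 0.60
p_men <- 0.40

# Odds: event occurring vs. not occurring, within one group
odds_women <- p_women / (1 - p_women)  # 1.5
odds_men <- p_men / (1 - p_men)        # about 0.667

# Odds ratio: one group's odds relative to the other's
odds_ratio <- odds_women / odds_men    # 2.25

# The same numbers from a logit model fitted to matching counts
tab <- data.frame(woman = c(1, 0), yes = c(60, 40), no = c(40, 60))
fit <- glm(cbind(yes, no) ~ woman, family = binomial, data = tab)
exp(coef(fit))  # intercept: odds for men; woman: odds ratio of women vs. men
```

In this toy model, only the exponentiated intercept is an odds; the exponentiated slope compares two sets of odds, which is why it is a ratio.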
@CanDoSo_org 1 year ago
@qssd Thanks.
@CanDoSo_org 1 year ago
Hi, could you please compare the pros & cons between logit and probit, thanks?
@qssd 1 year ago
Mainly, if you want to use odds ratio for interpretation you have to use logit. Otherwise, it is personal preference --- only difference between logit and probit is assumption about error distribution, and almost always get same results.
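A quick base R sketch of the comparison, using simulated data (an assumption): the two links give coefficients on different scales but very similar predicted probabilities.

```r
set.seed(11)

# Simulated binary outcome and one predictor
x <- rnorm(400)
y <- rbinom(400, 1, plogis(0.2 + 1.0 * x))

fit_logit <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Largest gap between the two models' predicted probabilities
max_gap <- max(abs(fitted(fit_logit) - fitted(fit_probit)))
max_gap

# Odds ratios only have this interpretation for the logit model
exp(coef(fit_logit)["x"])
```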
@CanDoSo_org 1 year ago
@qssd Thanks.
@MahadHassan-ql9ti 1 year ago
Are any of these datasets available so we can work along with you?
@qssd 1 year ago
Hi - The datasets, as part of the digital resources, should be available on the book's website sometime this month. When they are available, I'll include the links on the channel. Sorry they're not available yet.
@MahadHassan-ql9ti 1 year ago
@qssd Thanks! I've just received my Kindle version so looking forward to tackling these exercises! :)
@qssd 1 year ago
The data and codebooks are now up: study.sagepub.com/Fogarty2e
@MahadHassan-ql9ti 1 year ago
@qssd Thank you!!!
@qssd 1 year ago
The RStudio IDE is now downloadable from posit.co (replacing rstudio.com)
@silvadidierrr 1 year ago
You just saved me 4h! For Likert scales I will use as.factor and then as.numeric instead of using search and replace in Excel!
@qssd 1 year ago
awesome!
@CanDoSo_org 1 year ago
Hi, thanks for the great tutorial, but I did not find the exact data you used here. There are a lot of datasets on the site you refer to and no clue which one is the one you used.
@qssd 1 year ago
Thanks. You're correct about the data location. It appears that the site has been overhauled. The data used here is the 'microdata teaching file' from the 2011 Scottish Census: www.scotlandscensus.gov.uk/documents/microdata-teaching-file-and-user-guide/ Some of the variables in the dataset used in the video were recoded from the original variables in the downloadable data.
@CanDoSo_org 1 year ago
@qssd Thanks for your quick reply.
@roypeijen 2 years ago
Thanks for sharing this video. I have a question about the "with()" command. In your tutorial, you only have one factor variable (gender) with two levels and that is why you put the '2' after the length.out=100 option, right? In case I had a 3-level factor variable, this would have been a '3', right? I'm running a model now with two factor variables. One factor is a 2-level factor variable (gender) and the other is a three-level factor variable (education). My question is, what number do I need to fill in at the place where you filled in the 2, given that my model has two different factor variables? Hope you can help.
@qssd 2 years ago
That's right - the '2' refers to the 2 values of gender (which are the lines to plot). The answer of what to do when you have two factor variables comes down to what you want to plot? For this type of plot, I believe the x-axis variable must be numeric and you can only have one variable as a factor that will generate the different lines in the plot. So the number to put in will be how many levels the one factor variable has. If your second factor is just in the plot as a 'control', like marital_dummy, you might need to convert it to a numeric or you could try setting to one of the levels (like the modal value); but I haven't tried it. An alternative way to plot these is using the `ggpredict()` function from the `ggeffects` package. The `ggpredict()` function essentially does all of the code prior to the ggplot() code and then you can specify all the ggplot() code and link the two using the pipe operator %>%
@roypeijen 2 years ago
@qssd Thanks! I converted the gender variable to a numeric one because that is not the one I wanted to plot but the three-level factor variable is. Works perfectly now!
@KirtiTewari 2 years ago
The best video ever!
@qssd 2 years ago
Thanks so much!
@deniseou2670 2 years ago
First time leaving a comment on YouTube! I watched all the logit regression videos you produced! You are way too good compared to my teacher! Thanks so much! I just wanna say thank you!
@qssd 2 years ago
Wow, thanks!
@xtxpxhx 2 years ago
oh this was great thank you much!
@qssd 2 years ago
You're so welcome!
@supradg 2 years ago
Why not make Education and Scottish Identity (ordered) categorical variables? I think that we cannot change these by 1 percentage point, for example? Does making them numeric make sense?
@qssd 2 years ago
You could make them explicitly ordered categorical or numeric. Here, you will get the same regression coefficients with either.
@kasberge7164 1 year ago
@qssd Strictly speaking, you would have to recode them into dummies to include them in a regression model as independent variables, or not? You are treating them as numeric, which I don't get. Is this accepted as a quantitative social science practice?
@qssd 1 year ago
@kasberge7164 You would only dummy them if they are nominal-level variables. Edu and scot id are ordinal-level variables, so the simplest option is treating them as numeric. You would only recode them as dummies if you wanted to create a new version(s) of the variable -- for example, scotID = 7 (so, '1' in a dummy) vs. scotID=1-6 (so, '0' in a dummy).
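The scotID = 7 vs. 1-6 recode mentioned above could look like this in base R. The `scot_id` values and the new variable name are made up for illustration.

```r
# Hypothetical 1-7 Scottish identity scores
scot_id <- c(1, 4, 7, 6, 7, 2)

# Dummy: 1 if the respondent gave the top score, 0 otherwise
scot_top <- ifelse(scot_id == 7, 1, 0)

# Cross-check the recode against the original values
table(scot_id, scot_top)
```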
@haintuvn 2 years ago
How can we interpret "Women's odds" so the other person who is not familiar with the concept "odds" can understand? Does that mean "possibility"?
@qssd 2 years ago
Good question. Although odds are related to probability, we often don't think about them the same as probability (or predicted probability). Generally, we can think of odds as the "chance" or "likelihood" of an event occurring. Here, the greater the odds, the more likely an event will occur.
@michalispapadopoulos5090 2 years ago
thanks a lot sir! Very helpful!
@qssd 2 years ago
Glad it helped!
@lorenzocapitani9556 2 years ago
This is probably one of the greatest videos I've found on the topic. Very nicely explained. Thank you kind sir!
@qssd 2 years ago
Wow, thanks!
@joshuasmith1526 2 years ago
At line 42 I get the error "Factor has new levels 0, 1" any idea how to resolve this?
@qssd 2 years ago
The error is likely referring to a mismatch between what the factor levels are currently and what you are referencing in the code. So, it might be they are 1 and 2, but you are asking for levels 0 and 1. Try changing the code to '1:2' or recode the factor levels prior to the plotting code. (Sorry for the late response, you might have figured this out already).
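A base R sketch that reproduces this kind of level mismatch and the '1:2' fix. The data, variable names, and model are assumptions and are not the code from the video.

```r
set.seed(5)

# Simulated data where gender is a factor with levels "1" and "2"
dat <- data.frame(
  age = runif(100, 18, 80),
  gender = factor(rep(1:2, 50))
)
dat$vote <- rbinom(100, 1, plogis(-1 + 0.02 * dat$age))

fit <- glm(vote ~ age + gender, family = binomial, data = dat)

# Asking for levels 0 and 1 fails: "0" was never seen by the model
bad_grid <- data.frame(age = 40, gender = factor(0:1))
bad <- try(predict(fit, newdata = bad_grid), silent = TRUE)

# Using the model's own levels (1 and 2) works
good_grid <- data.frame(age = 40,
                        gender = factor(1:2, levels = levels(dat$gender)))
good <- predict(fit, newdata = good_grid, type = "response")
```

Running `levels(dat$gender)` before building the prediction grid is a quick way to see which values the model expects.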
@ezechielamoussou7409 2 years ago
Hello and thank you for your video 🙂 Would it be possible to do the graph if my variables were factors?
@qssd 2 years ago
Sorry for the late reply. For this type of plot, the x-axis needs to be numeric in some way (so, a variable that is ordered with at minimum 3 values). You would have to use a different plotting technique to plot just factor variables (that is, nominal-level variables).
@ezechielamoussou7409 2 years ago
@qssd Thank you!
@joshuawelch33 2 years ago
Is it possible to move the dashed line? For example, if I want it at 1 vice 0.
@qssd 2 years ago
I don't think you can with the coefplot() function. It also would defeat the purpose of this plot as the dashed line at 0 provides info on statistical significance; most of the other coefficient plotting functions I've seen have some version of the 0 line. The coefplot() manual says you could get rid of the line with zeroType=0 as an option. Otherwise, you could try to do this plotting using ggplot2.
@KN-tx7sd 2 years ago
If your outcome is continuous instead of binary, as you have shown, how will the interpretation be done? Kindly explain.
@qssd 2 years ago
Hi - if your outcome variable is continuous then you should use linear regression and not logit regression. If you mean a predictor variable that is continuous, then the odds ratio interpretation is done in the same way as for 'general_health' in the video. The one possible difference is that if the continuous predictor's values are meaningful then you can use them instead of the generic 'unit'. For example, if the predictor's values are raw dollars, you can still say 'for a one-unit increase', but you can also say 'for a one-dollar increase'.... You just need to make sure you know the unit of measurement (e.g., raw dollars, millions of dollars, etc.), otherwise you might mess up the interpretation.
@KN-tx7sd 2 years ago
excellent, excellent
@qssd 2 years ago
Thank you! Cheers!
@gabriellamartinez7985 2 years ago
Hello, thank you for this video, it's been super helpful! I have a question regarding the dependent variables. How would you interpret the polr function output for dependent variables that are factors? For example, if RefvotDum was a factor with the same 0, 1 levels or No, Yes levels, how would you interpret the coefficients in that case?
@patric001122 2 years ago
I would also like to know how to interpret the coefficients. Can I just use the "exp(model.1$coefficients[-1])" command to get the odds ratios? Is there a way to get the marginal effects? And is it also valid to use the "scale()" command to standardize the coefficients? Thank you very much for your videos, they helped a lot!
@qssd 2 years ago
Yeah, sorry, I haven't had time to create new videos on the interpretations. To get polr to run with a factor outcome variable, you need to classify it as ordered (e.g., as.ordered(as.factor()) ). The interpretations, though, should be the same for numeric and ordered-factor outcome variables.
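A minimal sketch of the as.ordered(as.factor()) step, assuming the housing example data that ships with the MASS package as a stand-in for the video's data. The column names sat_num and sat_ord are hypothetical.

```r
library(MASS)  # provides polr() and the housing example data

# Illustrative only: MASS's housing data stands in for the video's data.
dat <- housing

# Pretend the outcome arrived as numeric codes 1, 2, 3 (Low, Medium, High)
dat$sat_num <- as.integer(dat$Sat)

# Classify the outcome as an ordered factor, as the reply suggests
dat$sat_ord <- as.ordered(as.factor(dat$sat_num))

# Ordered logit; Freq holds the cell counts, Hess = TRUE keeps the Hessian for summary()
fit <- polr(sat_ord ~ Infl + Type + Cont, weights = Freq, data = dat, Hess = TRUE)

print(coef(fit))  # slopes
print(fit$zeta)   # cutpoints between adjacent outcome categories
```

If the outcome is stored as text labels rather than numeric codes, as.factor() sorts the levels alphabetically, so it is safer to set the order explicitly with factor(x, levels = ..., ordered = TRUE).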
@qssd 2 years ago
Yes, you can use the same code for the odds ratio. The difference for interpretation from the binary logit is that the odds need to be discussed as 'more', 'increased', 'greater', etc. instead of a specific outcome. This is b/c ordered logit uses cumulative odds ratios, so the odds value is the cumulative odds of a lower to higher outcome. For example, 'for a one-unit increase in education, the odds of *greater/increased/more* trust increase by a factor of...'. I don't usually work with marginal effects, but there is a lot you can do with predicted probabilities. You should be able to use the scale() function, but I haven't thought through the impact on the interpretations... Thanks.
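The odds ratios and predicted probabilities mentioned above can be sketched as follows, again assuming MASS's housing data as a stand-in. One thing to check: for a model fitted with MASS::polr(), coef() holds only the slopes (the cutpoints are stored separately in zeta), so the [-1] used to drop the intercept in a glm() fit is not needed there.

```r
library(MASS)  # provides polr() and the housing example data

# Illustrative only: Sat (Low < Medium < High) is the ordered outcome.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# Cumulative odds ratios: exponentiate the slopes.
# No [-1] here, because coef() on a polr fit excludes the cutpoints.
cum_odds_ratios <- exp(coef(fit))
print(cum_odds_ratios)

# Predicted probability of each outcome category for a few covariate profiles
pred_probs <- predict(fit, newdata = head(housing), type = "probs")
print(round(pred_probs, 3))
```

Each row of pred_probs sums to 1 across the outcome categories, which gives a way to describe effects on the probability scale without computing marginal effects.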
@patric001122 2 years ago
@@qssd Thank you so much for the fast answer, it helped a lot!
@qssd 2 years ago
Welcome!
@Habalabaloooo 2 years ago
How do you rename the predictors from a glm output? Especially for factor vectors with levels the naming becomes burdensome.
@qssd 2 years ago
Agreed, it is burdensome. You can use the tidyverse functions mutate() and recode() to label the factor levels, then pipe (%>%) the result into the ggplot2 code, but it still requires you to write out the labels at least once.
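A small sketch of the "write the labels once" idea. It uses a base-R named lookup vector rather than the mutate()/recode() approach mentioned in the reply, and the mtcars data and label text are assumptions for illustration.

```r
# Illustrative only: mtcars stands in for the actual data.
model <- glm(am ~ factor(cyl), data = mtcars, family = binomial)

# Collect the coefficients in a data frame, ready for plotting
coefs <- data.frame(term = names(coef(model)),
                    estimate = unname(coef(model)))

# Write the readable labels once, keyed by the original coefficient names
labels <- c("(Intercept)"  = "Intercept",
            "factor(cyl)6" = "6 cylinders (vs. 4)",
            "factor(cyl)8" = "8 cylinders (vs. 4)")

# Swap in the readable labels
coefs$term <- unname(labels[coefs$term])
print(coefs)
```

The relabelled data frame can then be passed to whatever plotting code is in use.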
@loopyloup 2 years ago
Great vid thank you. Very helpful for my term project.
@qssd 2 years ago
Glad it was helpful!
@mgmmac 2 years ago
good vid
@qssd 2 years ago
Thanks!
@Mattee77 2 years ago
Hi, several econometrics books state that you cannot interpret pseudo R2 the same way as OLS R2, since the two have different forms of error terms. The pseudo R2 presented should be taken with a grain of salt. Aside from that, good tutorial on R! Thanks!
@qssd 2 years ago
Yes, you are right. Pseudo R2 is based on log-likelihood values.
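To make the log-likelihood point concrete, here is a sketch of McFadden's pseudo R2, which is one of several pseudo R2 measures and not necessarily the one shown in the video. The mtcars data is an assumption for illustration.

```r
# Illustrative only: mtcars stands in for the video's data.
model      <- glm(am ~ wt, data = mtcars, family = binomial)
null_model <- glm(am ~ 1,  data = mtcars, family = binomial)  # intercept only

# McFadden's pseudo R2: 1 - logLik(fitted model) / logLik(null model)
mcfadden_r2 <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
print(mcfadden_r2)
```

The value compares log-likelihoods rather than explained variance, so it should not be read as the share of variance explained, as with OLS R2.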
@saadalhumaid3959 3 years ago
Good tute thanks bro
@qssd 3 years ago
You're welcome
@yanganontu4059 3 years ago
Hi Brian, I really appreciate your work. In my research I would like to measure the contribution of the household food insecurity access scale, which has discrete dependent variables, to beekeeping participation. Can I use an ordered probit or an ordered logit model? Which one is better?
@qssd 3 years ago
Thanks! Sorry for the delay. If the dependent variable is ordinal, then ordered logit and ordered probit work equally well. There is no real difference. The choice usually comes down to preference, like the choice between binary logit and probit. The one caveat is that, obviously, if you use ordered probit then you can't use odds ratios for interpretations.
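A quick way to see how close the two models are is to fit both with polr() and compare them. This sketch assumes MASS's housing data as a stand-in for the commenter's data.

```r
library(MASS)  # provides polr() and the housing example data

# Illustrative only: same ordered outcome, two link functions
logit_fit  <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                   data = housing, method = "logistic")
probit_fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                   data = housing, method = "probit")

# Coefficients are on different scales, but their signs can be compared
print(cbind(logit = coef(logit_fit), probit = coef(probit_fit)))

# Largest gap between the two models' predicted probabilities
logit_probs  <- predict(logit_fit,  newdata = housing, type = "probs")
probit_probs <- predict(probit_fit, newdata = housing, type = "probs")
print(max(abs(logit_probs - probit_probs)))
```

Only the logit coefficients can be exponentiated into odds ratios; the probit fit is usually interpreted through predicted probabilities instead.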
@crahul1987 3 years ago
Very nice... Can you please upload the second part?
@qssd 3 years ago
Thanks! Sorry for a delayed response. I haven't had the time to create more of the videos, unfortunately.
