I loved that you added those simulation results. That was very interesting and helped my understanding
You are welcome!
Very helpful and simplified explanation. Thanks for the video!
You are welcome!
Great job, but what about missingness that exists in a single column at a rate above 50%? Would deep models like GANs be useful for imputation (in time-series prediction)? Many thanks🙏
I assume GAN refers to some kind of neural network. Imputation works regardless of the amount of missing data, under these three conditions:
1) You are doing multiple imputation and not single imputation so that you can quantify the uncertainty introduced by the imputation process.
2) The imputation model contains all features of your data that are relevant for the analysis.
3) The missingness does not depend on the missing value itself (i.e., data are MAR or MCAR).
I do not really see what neural nets would add over a thoughtfully developed imputation model, but they are likely to increase sample size requirements.
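Condition 1 above, quantifying the uncertainty added by imputation, is typically handled by pooling results across the m imputed datasets with Rubin's rules. A minimal sketch, using made-up per-imputation coefficient estimates and squared standard errors in place of real analysis output:

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Pool point estimates and variances from m imputed datasets
    using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()             # pooled point estimate
    w_bar = variances.mean()             # within-imputation variance
    b = estimates.var(ddof=1)            # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b  # total variance
    return q_bar, np.sqrt(total_var)

# Toy numbers: one coefficient's estimate and squared SE from m = 5 imputations
est = [0.52, 0.48, 0.55, 0.50, 0.47]
se2 = [0.010, 0.012, 0.011, 0.010, 0.013]
beta, se = pool_estimates(est, se2)
```

The pooled standard error exceeds the average within-imputation standard error because the between-imputation variance is added on top; that extra term is exactly the imputation uncertainty that single imputation ignores.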
@@mronkko "Hi again Mikko, I'm tackling a unique challenge with my dataset and believe your insights could greatly help. Could you share any contact info for a brief discussion? Thanks!"
@@mohamadmatinhavaei9859 I take consulting orders through instats.org/expert/mikko--rönkkö-829.
Strong Finnish accent :).. Thank you for the awesome content
I take the comment about my accent as a compliment ;) Funny thing: I used to live in the US, and part of the accent was lost during that time. Even though that was about 20 years ago, I still notice my accent diminishing when I spend a couple of days there. But now that we cannot travel, the accent is as strong as ever!
I found your channel recently and have started to like your teaching approach. I want to ask whether pairwise deletion is possible in regression y = X*beta + e, beta = inv(X'X)X'y. It is possible to calculate a pairwise version of X'X. Would love to hear your thoughts. Thx
In a pairwise X'X you would need to adjust for the sample size of each cell. But in principle you can estimate pairwise covariances of all the variables and then estimate the regression from that covariance matrix. The resulting estimator should be consistent under MCAR, but getting the standard errors right would require adjustments to the complete-data standard error formulas. I have not seen any paper discussing how to do this, and therefore I would not be comfortable using this approach. That said, the fact that I have not read something does not mean it does not exist. I have simply come to the conclusion that because FIML and multiple imputation already exist and I know how to do both, there is little reason for me to learn other approaches to adjusting for missing data in estimation.
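The pairwise-covariance idea described above can be sketched in a few lines. pandas computes each covariance from the pairwise-complete cases by default, and the slopes then come from solving the normal equations on that covariance matrix. Toy MCAR data; note this yields slopes only (no intercept), and, as said above, the usual standard error formulas would not apply:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

# Knock out ~20% of the values completely at random (MCAR)
df = df.mask(rng.random(df.shape) < 0.2)

# Each entry of cov is computed from the pairwise-complete observations,
# so every cell may rest on a different effective sample size.
cov = df.cov()
cov_xx = cov.loc[["x1", "x2"], ["x1", "x2"]].to_numpy()
cov_xy = cov.loc[["x1", "x2"], "y"].to_numpy()

beta = np.linalg.solve(cov_xx, cov_xy)  # slopes from pairwise covariances
```

Under MCAR the slopes land near the true values (1.0 and 0.5 here); the unresolved problem is attaching correct standard errors to them.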
Hi
It was mentioned that "the imputed data can only be used within the pooling testing and cannot be used for the model testing".
Does this mean the data is only imputed/simulated for the purpose of analysing its reliability? If it cannot be used for model testing, does that mean we still need to use the actual data and delete the missing observations?
Correct me if I'm wrong.
Thank you
I need more context. Can you give me a timestamp from the video?
Good Job 👍👍👍
Thanks!
Hi, thank you for the content. I would like to know how to choose the reference variable; for example, in your case IQ is used as a reference when imputing job performance. I have a lot of variables in my data set, and some of them have a lot of missing values. How can I identify which variable to refer to when I want to impute another one?
Your imputation model needs to use all variables and model all relationships that you have in your main model. In addition, you can use auxiliary variables (I have a video about that). The rule with auxiliary variables is that you should be liberal in including them. However, if your sample size is small you can start to get bias and computational difficulties if you include too many.
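To illustrate why the imputation model must include the analysis variables (here, the outcome) and why the m imputations must differ from each other, here is a simplified stochastic-regression sketch with toy IQ/job-performance data. This is a teaching sketch, not a full multiple-imputation implementation; real applications would use MI software that also draws the imputation-model parameters from their posterior:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
iq = rng.normal(100, 15, n)
perf = 0.05 * iq + rng.normal(0, 1, n)  # job performance depends on IQ
miss = rng.random(n) < 0.3              # 30% of IQ values missing, MCAR

# Imputation model: regress IQ on performance among complete cases.
obs = ~miss
A = np.column_stack([np.ones(obs.sum()), perf[obs]])
coef, *_ = np.linalg.lstsq(A, iq[obs], rcond=None)
resid_sd = np.std(iq[obs] - A @ coef, ddof=2)

# m imputations: predicted value plus random noise, so each differs.
m = 5
slopes = []
for _ in range(m):
    iq_imp = iq.copy()
    pred = coef[0] + coef[1] * perf[miss]
    iq_imp[miss] = pred + rng.normal(0, resid_sd, miss.sum())
    # Analysis model: performance ~ IQ, fit on the completed data
    B = np.column_stack([np.ones(n), iq_imp])
    b, *_ = np.linalg.lstsq(B, perf, rcond=None)
    slopes.append(b[1])

pooled_slope = float(np.mean(slopes))  # point-estimate part of Rubin's rules
```

Because the imputation model includes the outcome and adds residual noise, the pooled slope stays close to the true value of 0.05; imputing with the bare predictions (no noise) or omitting the outcome from the imputation model would bias it.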
Thanks a lot sir
Most welcome
Our teacher is focusing on having us use KNN to impute data. This seems like a biased method, like the traditional methods, but I'm not 100% sure.
What does KNN stand for?
Hi! In what types of research can I use pairwise/listwise deletion?
Deleting observations is never ideal if you consider it only from a statistical perspective. However, simplicity is also a virtue in applied research (for example, you are less likely to make mistakes if you keep things simple), and simple techniques should be used over complex ones when the difference in outcomes is small. Deleting observations is OK if a) your sample size is sufficient after deletion and b) your missing data are MCAR. I would not use pairwise deletion, because using a different sample size for different analyses complicates things, but this depends on how the data are missing.
got this, thank you!
Sir... which book are you using?
Enders 2010. It is cited in the video.
thank you
You are welcome