More on Causal Effects 👇
👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosVVTz9HEzpI4d6xpWsc8rOa.html
📰 Read More: medium.com/towards-data-science/causal-effects-via-regression-28cb58a2fffc
Some resources I found helpful:
-Book chapter by Gelman & Hill: www.stat.columbia.edu/~gelman/...
-DML paper: arxiv.org/abs/1608.00060
-DML talk: ua-cam.com/video/eHOjmyoPCFU/v-deo.html
-Metalearner paper: arxiv.org/abs/1706.03461
How do I choose which model is the best? Is there a metric such as RMSE?
Good question. I don't know of any standard approaches for doing that. One could do uncertainty quantification based on the specific type of model used, but it's hard to define what "best" would mean in this context.
Hi, many thanks for this video. I hope you can do a video about double ML (DML) sometime.
Happy to help! I talk about double ML here: ua-cam.com/video/O72uByJlnMw/v-deo.html?si=lVH23-O_uFAEZ0Du&t=326
Love all this info. Great resources and superb explanations. :)
Thanks Jaime!
Hi Shawhin, thank you for this amazing playlist! I have a question:
1. Before we get to estimation, we evaluate whether the causal effect is identifiable (through identify_effect()). This step gives us a sufficient set of variables we need to observe in order to compute our causal effect. I believe this is captured in the estimand.
2. So, are these regression models built on only the variables from this sufficient set, or do they use all the variables provided? We pass the estimand to model.estimate_effect(), so I'm wondering how the estimand connects to the estimation step.
It would be super helpful if you could shed some light on what happens behind the scenes here, and point out any mistakes in my understanding above. Thanks!
Hi Sateesh, great question.
Here, identify_effect() generates an estimand based on the causal graph, which implicitly requires a sufficient set, if the effect is identifiable.
My understanding is that only the variables in the estimand will be used to estimate the effect.
Hope that helps!
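For anyone curious what this looks like in code, here's a minimal DoWhy sketch. The three-variable DAG and the toy data are made up for illustration, and the DOT-format graph string assumes pydot/pygraphviz is installed:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data (hypothetical): W confounds both treatment X and outcome Y
rng = np.random.default_rng(0)
n = 1000
W = rng.normal(size=n)
X = (W + rng.normal(size=n) > 0).astype(int)
Y = 2 * X + W + rng.normal(size=n)
df = pd.DataFrame({"W": W, "X": X, "Y": Y})

# The DAG makes {W} the sufficient adjustment set for the X -> Y effect
model = CausalModel(
    data=df,
    treatment="X",
    outcome="Y",
    graph="digraph {W -> X; W -> Y; X -> Y;}",
)

# Step 1: identification -- the estimand records which variables to adjust for
estimand = model.identify_effect()
print(estimand)  # names W as the backdoor adjustment set

# Step 2: estimation -- only the variables named in the estimand are used
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should land near the true effect of 2
```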
@ShawhinTalebi Awesome! Thank you so much for the response!
Thanks for the superb explanations. I have two questions, assuming a different DAG; can you comment?
1: In your notebook, the linear regression, double machine learning, and X-learner estimators all use method_name="backdoor.****". Can I assume they won't capture another variable in the DAG, say W, whose interaction with X affects the outcome? That is, the outcome Y is a collider of the treatment X and this independent variable W, and there is no arrow between X and W in the DAG because there is no causal relation between the two. In other words, this analysis would never capture the interaction effect of W with X on Y. If that is the case, how do we trust or interpret the resulting ATE, since it only reflects a partial picture?
2: For linear regression and double machine learning, since both are specified with either method_name="backdoor.linear_regression" or 'model_y': LinearRegression(), can I assume they won't capture a quadratic term in X, so the underlying models are not the best fit for X and Y? If that is the case, how do we trust or interpret the resulting ATE, given that the backend models may not be the best fit?
Thanks for the questions.
1) I'm not sure I understand this, but here's my understanding: if you have X -> Y
@ShawhinTalebi Thanks Shaw, talk to you soon.
When we use regression for inference, how can we evaluate the model's performance? Can even a model with a poor R² provide valid conclusions for causal inference?
That's a good question. While I can imagine cases where a poor-performing model can still give valid conclusions, I'd say it's generally preferable to have a model that performs well.
The reason is that a model with good performance explains the data well (by definition) and thus gives me greater confidence that the (causal) assumptions we are making have some connection to reality.
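One pragmatic sanity check (my own sketch, not something from the video): fit the same outcome model used for the effect estimate and inspect its R² on held-out data. All names and data here are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: confounder W, binary treatment X, outcome Y
rng = np.random.default_rng(0)
n = 1000
W = rng.normal(size=n)
X = (W + rng.normal(size=n) > 0).astype(int)
Y = 2 * X + W + rng.normal(size=n)
df = pd.DataFrame({"W": W, "X": X, "Y": Y})

train, test = train_test_split(df, test_size=0.3, random_state=0)
outcome_model = LinearRegression().fit(train[["X", "W"]], train["Y"])
print(r2_score(test["Y"], outcome_model.predict(test[["X", "W"]])))
# A very low held-out R^2 means the outcome model explains little of the data,
# which should lower our confidence in any causal estimate built on top of it.
```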
Another question:
In the S-learner you comment that we can use a multilevel variable.
But how would the ITE and ATE be calculated in that case?
Do we need to calculate ITE and ATE for each pair of values in the treatment variable?
Yes, in the multilevel case one would need to compute many ITE/ATE values. I would do it for adjacent pairs, i.e. (0,1), (1,2), etc.
This would estimate an ITE function based on treatment level. Put another way, you wouldn't have just a single ITE value for a particular unit, but rather a set of values.
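To make that concrete, here's a rough S-learner sketch with a three-level treatment (not from the video; the data, model choice, and effect size are all arbitrary):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical toy data: covariate W, treatment T in {0, 1, 2}, outcome Y
rng = np.random.default_rng(1)
n = 2000
W = rng.normal(size=n)
T = rng.integers(0, 3, size=n)
Y = 1.5 * T + W + rng.normal(size=n)

# S-learner: a single model with the treatment level included as a feature
model = GradientBoostingRegressor().fit(np.column_stack([T, W]), Y)

# ITEs for each adjacent pair of treatment levels, per unit
for t in (0, 1):
    y_hi = model.predict(np.column_stack([np.full(n, t + 1), W]))
    y_lo = model.predict(np.column_stack([np.full(n, t), W]))
    print(f"ATE for {t} -> {t + 1}: {(y_hi - y_lo).mean():.2f}")  # each near 1.5
```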
@ShawhinTalebi sounds reasonable.
Thank you again :) you're a great man
So basically, for the T-learner we use one model to fit the untreated data against the target variable and another to fit the treated data against the target variable (with both models excluding the treatment variable itself). Then we use the treated model to predict the treatment outcome and the untreated model to predict the control outcome for all rows in the dataset? But that means we use both models to predict values on (at least some of) the data they were trained on, right? Is this really good practice?
Yes exactly right!
That's a great point. I think what you are getting at is a concern that our models may overfit, if we do not have training and testing datasets. I haven't come across that point from my read of the literature, but as a practitioner I would definitely try out different train-test splits and evaluate their performances.
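For reference, here's a bare-bones T-learner sketch matching the description above (hypothetical data and an arbitrary model choice; the true effect is 2 by construction):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: confounder W, binary treatment X, outcome Y
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=n)
X = (W + rng.normal(size=n) > 0).astype(int)
Y = 2 * X + W + rng.normal(size=n)
df = pd.DataFrame({"W": W, "X": X, "Y": Y})

# One model per treatment arm, fit on covariates only (treatment column excluded)
treated, control = df[df["X"] == 1], df[df["X"] == 0]
model_1 = RandomForestRegressor(random_state=0).fit(treated[["W"]], treated["Y"])
model_0 = RandomForestRegressor(random_state=0).fit(control[["W"]], control["Y"])

# Predict both potential outcomes for every row, then take the difference
ite = model_1.predict(df[["W"]]) - model_0.predict(df[["W"]])
print(ite.mean())  # ATE estimate; should land near the true effect of 2
```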
One question:
Before estimating an effect, you have to construct the DAG, right?
Which variables/nodes from the DAG are the covariates (Z)? Or do you just include every other variable available?
I guess that you don't have to include, for example, a collider of the target and the treatment, i.e., a variable that is affected by both the target and the treatment.
That’s a good question. I would recommend constructing a DAG as a first step to any causal analysis but it is not strictly required for these regression-based techniques.
I would be cautious about (blindly) including all other available variables in the analysis. My intuition is that this comes down to Berkson's paradox.
There is a nice write-up by Judea Pearl on this issue: ftp.cs.ucla.edu/pub/stat_ser/r348.pdf
This is one reason why a DAG is helpful.
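For anyone who wants to see the collider issue concretely, here's a tiny simulation (my own sketch): X and Y are independent by construction, but selecting on their common effect C makes them look correlated.

```python
import numpy as np

# Collider-bias sketch: X and Y are independent, but both cause C
rng = np.random.default_rng(2)
n = 100_000
X = rng.normal(size=n)
Y = rng.normal(size=n)
C = X + Y + rng.normal(size=n)

print(np.corrcoef(X, Y)[0, 1])  # ~0: X and Y really are independent
mask = C > 1  # "controlling" for (selecting on) the collider
print(np.corrcoef(X[mask], Y[mask])[0, 1])  # noticeably negative: spurious association
```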
@ShawhinTalebi thank you so much, I'll take a look at that paper. You are helping me a lot to understand this causality world! :) Lots of love
Happy to help! 😁
Double Machine Learning sounds like something you'd say as a joke to someone when they're using machine learning
😂😂 we need more machine learning!!