2 - Potential Outcomes (Week 2)

  • Published Nov 10, 2024

COMMENTS • 80

  • @sethjchandler · 3 years ago · +20

    A masterpiece of clarity.

  • @woods.9549 · 4 years ago · +8

    Nice Death Star easter egg!

  • @hafsabenzzi3609 · 1 year ago · +1

    Wonderful lecture, bravo!

  • @Theviswanath57 · 4 years ago · +1

    In Slide #40, with regard to estimation: I feel it should be Σ_i rather than Σ_x.
    Currently it's (1/n) * Σ_x ( E[Y | T=1, x] - E[Y | T=0, x] ).
    I feel it should be (1/n) * Σ_i ( E[Y | T_i=1, X_i] - E[Y | T_i=0, X_i] ), which can be rewritten as Σ_x P(X=x) * ( E[Y | T=1, X=x] - E[Y | T=0, X=x] ).

    • @BradyNealCausalInference · 4 years ago · +2

      You are absolutely right. Unfortunately, some typos might stay in the videos, even if they have been fixed in the book.

    • @Theviswanath57 · 4 years ago

      Reason:
      Say there are four subgroups with the following conditional average treatment effects: 1, 0.5, 1.5, 2.5.
      Say P(X=x) = [0.5, 0.2, 0.2, 0.1].
      Say there are 100 subjects in total.
      With the first equation, the ATE would be (1/100) * (1 + 0.5 + 1.5 + 2.5) = (1/100) * 5.5 = 0.055.
      With the second equation, the ATE would be 0.5*1 + 0.2*0.5 + 0.2*1.5 + 0.1*2.5 = 1.15.
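To make the comparison in this thread concrete, here is a quick sketch in Python (the subgroup CATEs, P(X=x) values, and subject count are the hypothetical numbers from the comment above, not course data):

```python
# Hypothetical subgroups: conditional average treatment effects and P(X=x)
cates = [1.0, 0.5, 1.5, 2.5]   # E[Y | T=1, X=x] - E[Y | T=0, X=x] per subgroup
probs = [0.5, 0.2, 0.2, 0.1]   # P(X=x) for each subgroup
n = 100                        # total number of subjects

# Reading 1: sum over the 4 distinct x values, then divide by n
ate_over_x = sum(cates) / n                              # 0.055

# Reading 2: weight each subgroup's CATE by P(X=x)
ate_weighted = sum(p * c for p, c in zip(probs, cates))  # 1.15

print(ate_over_x, ate_weighted)
```

The two readings clearly disagree, which is why the Σ_i form (equivalently, the P(X=x)-weighted form) is the right one.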

  • @charismaticaazim · 3 years ago

    Brady, does the causal inference literature say anything about "knowing that confounding variables are present, but not being able to know or measure what they are"? This would hint to the domain expert that there's something else influencing the decision.
    Also, in terms of the shoe example, since we know being drunk is contributing to the outcome, it wouldn't really be a confounder if we know it, right?

  • @Theviswanath57 · 4 years ago

    On the final estimation example:
    Question 1: By controlling for age, our estimated ATE matches the actual ATE; whereas by controlling for both age & 'protein excreted in urine', our estimated ATE is just 0.85.
    Question 2: What's the causal graph with both age & protein excreted in the urine?
    { age → blood_pressure }, where age is the confounding variable.
    Actual ATE: 1.05 & estimated ATE: 1.05 (both from the "mean of differences" & from the regression coefficient).

    • @BradyNealCausalInference · 4 years ago

      I'm not sure I see a question in there haha. It sounds like you are describing the code. Note: some of that code is for Chapter 4, where we actually write down the causal graph, so it might not all make sense without Chapters 3 and 4.

    • @Theviswanath57 · 4 years ago

      @@BradyNealCausalInference Cool, will wait for Chapters 3 & 4 to be covered

  • 2 years ago

    On unconfoundedness: does conditioning on X mean that, instead of filling both groups with random people selected by a coin flip, we fill the "went to sleep with shoes" group with ALL DRUNK PEOPLE and fill the "went to sleep without shoes" group ALSO WITH DRUNK PEOPLE? Is the downside of this that some data are lost, because we only care about a subset of the dataset (e.g. DRUNK=1, ignoring all data with DRUNK=0)?

  • @Fhoneysuckle · 1 year ago

    Hi Brady, thanks for your awesome lecture. But I have a question about ignorability and exchangeability. Causal Inference: What If refers to the joint independence of the potential outcomes as full exchangeability: randomization makes the potential outcomes jointly independent of the treatment T, which implies, but is not implied by, exchangeability. So why does randomization/ignorability mean joint independence rather than marginal independence?

  • @Theviswanath57 · 4 years ago

    @Brady: In Jason A. Roy's Coursera course, both the no-interference assumption and the "only one way of getting treatment" assumption are clubbed under SUTVA.

    • @Theviswanath57 · 4 years ago

      Whereas in your example of "Golden Retriever or other dog", which I guess violates the "only one way of getting treatment" assumption, you're putting it under the consistency assumption.

    • @BradyNealCausalInference · 4 years ago · +1

      @@Theviswanath57 Not entirely sure I understand your comment, but are you saying this:
      "SUTVA is satisfied if unit (individual) i's outcome is simply a function of unit i's treatment. Therefore, SUTVA is a combination of consistency and no interference (and also deterministic potential outcomes)."
      If so, that sounds right to me. That's taken from Section 2.3.5 of the course book (not everything makes it into the lecture).

    • @Theviswanath57 · 4 years ago

      @@BradyNealCausalInference Makes sense, thanks

  • @kangchenghou5027 · 4 years ago · +2

    Thanks for the great lecture again! I learned a lot, and I have a few questions:
    1. The fundamental problem of causal inference refers to the fact that, for each individual, we only get to observe one potential outcome. The way to get around this is to make assumptions, thereby converting a causal estimand to a statistical estimand. So far the course seems to deal with average treatment effects. To estimate individual treatment effects, do we need more assumptions? Will we cover that in the course?
    2. For the positivity assumption: if, for some covariates, P(T = 1 | X = x) is very close to 0 or 1, estimation will be fine if we have access to the full distribution, but estimation from finite samples will have large variance. So to get a good estimate of the treatment effect, we would want P(T = 1 | X = x) not to go to the extremes; is this correct? This also reminds me of the bias-variance tradeoff: including more covariates reduces confounding (bias), but may lead to an estimate with high variance. Does this make sense?
    3. This is more of a comment: I think the lecture mentions that including more covariates is better (correct me if I am wrong). It may be worthwhile to mention that this is not always the case, for example X -> C

    • @BradyNealCausalInference · 4 years ago

      1. Awesome question. Makes me think you already know the answer haha ;). To move from ATEs to ITEs, we do need to make stronger assumptions. The stronger assumptions we need to make have to do with the specific functional form and noise distribution (in addition to the causal graph). This corresponds to moving from Level 2 to Level 3 of Pearl's ladder. We will see this later in the course when we get to counterfactuals.

    • @BradyNealCausalInference · 4 years ago

      2. You are exactly right on both counts. When we get to estimation in week 5, we will actually see that people sometimes just drop specific examples where P(T = 1 | X = x) is too close to 0 or 1. Your bit about the bias-variance tradeoff is also right (usually).
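The "drop samples where P(T = 1 | X = x) is too close to 0 or 1" idea Brady mentions is often called trimming. A minimal sketch, where the data, the logistic propensity, and the 0.05 cutoff are all made up for illustration (the actual week 5 code may differ):

```python
import math
import random

random.seed(0)

# Hypothetical data: the true propensity P(T=1 | X=x) becomes extreme for large |x|
n = 2000
xs = [random.gauss(0, 1) for _ in range(n)]
propensities = [1 / (1 + math.exp(-3 * x)) for x in xs]

# Trimming: drop units whose propensity is too close to 0 or 1, since those
# units blow up the variance of positivity-based estimators (e.g. inverse weighting)
eps = 0.05
kept = [(x, p) for x, p in zip(xs, propensities) if eps < p < 1 - eps]

print(f"kept {len(kept)}/{n} units after trimming at eps={eps}")
```

Trimming trades a little bias (the estimand now excludes the extreme strata) for a large reduction in variance.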

    • @BradyNealCausalInference · 4 years ago

      3. Right again. I mention this in sidenote 8 of Chapter 2 in the book (www.bradyneal.com/Introduction_to_Causal_Inference-Sep1_2020-Neal.pdf). I think I meant to use weak language in the lecture (e.g. "there is a general perception that this is the case"). If I used strong language (e.g. "this is the case"), would you mind linking me to it, as I should probably correct that with an annotation.

    • @BradyNealCausalInference · 4 years ago

      4. I do everything with PowerPoint and TikZ (since I use TikZ for the book, might as well just reuse those figures in the slides). I sometimes use Inkscape when I need more flexibility than both of those can easily provide.

    • @kangchenghou5027 · 4 years ago · +1

      @@BradyNealCausalInference Thanks for the detailed explanation! For 3, it could be just my perceptual bias :) You did mention this is not the general case. But just for reference, 34:32: "for unconfoundedness, the general idea (which is not always true) is that the more covariates you condition on, the more likely you are to have satisfied unconfoundedness." For 4, may I know how you integrate the LaTeX with PowerPoint?

  • @YashSharma-yw9er · 3 years ago

    How is the two groups (shoe sleepers and non-shoe sleepers) not being comparable a separate reason for association not being causation? Isn't it indirectly a confounder as well?

  • @sourajmishra1450 · 3 years ago · +2

    Hey Brady, thanks for the great course!! In slide 17: why does E[Y(1)|T=1] become E[Y|T=1]? And same for E[Y(0)|T=0] = E[Y|T=0]?

    • @shipan5940 · 2 years ago · +1

      My understanding: because the condition is T=1, Y(T) = Y(1) = Y. That's my own way of explaining it. If T could be 1 or 0, it couldn't be simplified like this.

    • @rajeevbhatt7415 · 2 months ago

      It's after applying the consistency assumption: because we are guaranteed that for T=t we observe Y(t), Y | T=t is sufficient.
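Spelled out, the step both replies describe is just the consistency assumption applied inside the conditional expectation:

```latex
% Consistency: T = t \;\Rightarrow\; Y = Y(t). Conditioning on T = 1 (resp. T = 0):
\mathbb{E}[Y(1) \mid T = 1] = \mathbb{E}[Y \mid T = 1],
\qquad
\mathbb{E}[Y(0) \mid T = 0] = \mathbb{E}[Y \mid T = 0].
```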

  • @sahilverma1635 · 4 years ago · +3

    Hello Brady. I have a silly doubt, what is the difference between Y(0) and Y | T= 0 ?

    • @BradyNealCausalInference · 4 years ago · +8

      Y(0) corresponds to "take a random person in the whole population and force them to take treatment 0." Y | T = 0 corresponds to "take a random person from the subpopulation that happened to take treatment 0." Some of the comments in the threads on this video might also be helpful: ua-cam.com/video/eg-bFhNKbnY/v-deo.html

    • @michelspeiser5789 · 1 year ago · +1

      @@BradyNealCausalInference This is a very helpful formulation, and I recommend it be included in the course (unless it's already there and I missed it).

  • @Ptilu2 · 2 years ago

    Hi Brady! Thank you so much for those lovely pedagogical videos! There is something I am struggling to wrap my head around, though, and I was wondering if somebody (you or some other kind soul) could help me here. You presented ignorability as resulting from an assumption of independence between the potential outcomes Y(1), Y(0) and the treatment, leading to E[Y(1)|T=0] = E[Y(1)|T=1]. Doesn't this independence mean that the treatment has no causal effect on Y? Instead of removing the arrow from X to T, aren't we removing all arrows leading to T?
    To put my confusion another way: if the expectation of the outcome Y(1) does not change whether we give T or not, doesn't that mean T is not causal for Y? I am obviously having a logic flaw somewhere, so I would be glad if someone could help me see it :)

    • @Ptilu2 · 2 years ago

      I think I am confusing Y(1) with Y=1 here, while in fact it is Y|do(T=1). Takes some getting used to...

  • @gwillis3323 · 3 years ago

    Hey, you say that the approach at the end, where you train a regression of the form y=at + bx only works because the treatment effect is the same for all individuals (ATE=CATE). I don't think this is correct. In fact, the paper which introduced the Double Machine Learning approach starts off by showing that for the case of y = at + g(x), standard approaches which predict y well will give biased estimators for a (although granted, the Double Machine Learning approach really starts to shine when y=f(x)t + g(x)). Do you have any intuition on why the linear regression approach works so well here? Is it because the outcome variable depends linearly on both the treatment and the feature? Will it always work well in such cases? My intuition says no, that confoundedness can still mess you up. Maybe it's just a quirk of this exact dataset?

  • @tyflehd · 2 years ago

    Hello Brady, thank you for the awesome video :) I came here to get an intuitive understanding of causality. I have a question about lecture slide 14. If the T=1 and T=0 groups are comparable, shouldn't it be drunk on the right if it is sober on the left? Based on my understanding, let's say I am the topmost guy in both groups (T=1, T=0). How can I be included in the group "go to sleep with shoes on" and in the group "without shoes on" under the same condition "drunk"? Please correct me if I am wrong. Thanks!

    • @rajeevbhatt7415 · 2 months ago

      The same person cannot be included in both groups; just the number of people in both groups is almost the same, due to randomization.

  • @TheProblembaer2 · 3 months ago

    I SAW THE DEATH STAR!

  • @edisonge9311 · 4 years ago · +1

    Hi Brady, on page 18, I understand your point here, but I have a question about the definition of E[Y(1)|T=0]. If we observe T=0, then what is the meaning of Y(1) here?

    • @BradyNealCausalInference · 4 years ago · +1

      Y(1) given that you observe T = 0 is the outcome you would have observed if you had taken T = 1. It isn't something that we can observe (usually)! I think I give the intuition for this on the potential outcomes intuition slide.

    • @edisonge9311 · 4 years ago

      @@BradyNealCausalInference So the observation T=0 is independent of Y(1); then can we also get E[Y(1)] - E[Y(0)] = E[Y(1)|T=0] - E[Y(0)|T=1]? But we cannot use the consistency law there; therefore, in ICI, Eq. (2.3), it's E[Y(1)] - E[Y(0)] = E[Y(1)|T=1] - E[Y(0)|T=0]. Is my understanding correct?

  • @galaxystat · 4 years ago

    Hi Brady, thanks for the great lectures! I read The Book of Why by Judea Pearl. Is there any difference between the potential outcomes framework and the counterfactual calculation in Pearl's book? I saw some comments in the book where Judea thought the missing-value interpretation was wrong. What methodology do you recommend in practical applications? Or are they just the same?

    • @BradyNealCausalInference · 4 years ago · +1

      I think the two languages share a lot more than a lot of people seem to think. To me, they are simply different notations and different ways to formulate the assumptions. You should be able to understand both, so I include them both in the first month of the course. I use both, depending on the setting or who I'm talking to.

  • @jitingjiang7401 · 4 years ago

    Hi Brady, thanks for this lecture. It is super great. I have one question about the fourth assumption for identification, i.e. consistency. To illustrate the concept, you mentioned an example with two different types of dogs as multiple versions of the treatment. I am wondering: is it really a problem? I guess one can always define a specific version of the treatment as T, right? Thank you!

    • @BradyNealCausalInference · 4 years ago

      Yes, that just means being sufficiently specific about how you define the treatment.

  • @charismaticaazim · 3 years ago

    Independently & identically distributed = ignorability / exchangeability.
    Agree?

  • @RobertKwapich · 3 years ago

    Great course!
    Any particular books or review papers that you could recommend to read in more detail?

  • @adrianoyoshino · 3 years ago

    In the consistency example, I got the point that we can't have multiple treatments (like different types of drugs as a treatment). But does it have to have the same outcome always? I mean, is it possible to have a case where I take a pill one day and get better, but I take a pill another day and the headache does not get better?

    • @rajeevbhatt7415 · 2 months ago

      Not following consistency is like adding more nodes to the causal graph. For example, the dog type in the given example, along with whether the person got a dog. Similarly, if the pill's effect is different each day, a day node needs to be added to the causal graph.

  • @tOo_matcha · 2 years ago · +2

    31:13 that split second when you see the Death Star 😂

  • @Theviswanath57 · 4 years ago

    Slide #40: the naive estimate might have been obtained through the following regression equation: Y_i = alpha + beta * T_i;
    alpha_hat is 5.33?

    • @BradyNealCausalInference · 4 years ago · +1

      Not quite. That simple regression and taking the coefficient from the regression is actually what I describe for slide *41*. And in your comment, *beta* hat is actually the ATE estimate (5.33), not alpha hat. In the notation I use in slide 41 (different from yours), it is alpha hat that is 5.33.
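To make Brady's correction concrete: with a single binary regressor, OLS on Y_i = alpha + beta * T_i reduces to group means, so beta_hat (not alpha_hat) is the naive ATE estimate, while alpha_hat is the mean outcome of the untreated. A tiny sketch with made-up outcomes (not the course data):

```python
# Made-up outcomes and a binary treatment indicator
y = [3.0, 2.0, 1.0, 9.0, 8.0, 7.0]
t = [0, 0, 0, 1, 1, 1]

y0 = [yi for yi, ti in zip(y, t) if ti == 0]
y1 = [yi for yi, ti in zip(y, t) if ti == 1]

# For Y = alpha + beta*T with binary T, the OLS solution is:
alpha_hat = sum(y0) / len(y0)              # mean(Y | T=0) -> 2.0
beta_hat = sum(y1) / len(y1) - alpha_hat   # mean(Y|T=1) - mean(Y|T=0) -> 6.0 (naive ATE)

print(alpha_hat, beta_hat)
```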

    • @Theviswanath57 · 4 years ago

      @@BradyNealCausalInference Yeah, that's right; I was a little confused. Thanks!

    • @Theviswanath57 · 4 years ago

      Where can I get the data?

    • @BradyNealCausalInference · 4 years ago · +1

      @@Theviswanath57 See the GitHub link in Section 2.5 of the book for the data generation and estimation code.

  • @chadpark9248 · 4 years ago

    Thanks for the great lecture again. I have a few questions about the textbook.
    On page 8: "A natural quantity that comes to mind is the associational difference: [...] Then, maybe E[Y(1)]-E[Y(0)] equals E[Y|T=1]-E[Y|T=0]."
    From these sentences, I got a little confused about what "maybe ... equals" means.

    • @chadpark9248 · 4 years ago

      In addition, I have a question about the description section of "Consistency" on page 14. I understand Y(t) intuitively, but I don't understand "whereas Y(T) is the potential outcome for the actual value of treatment that we observe" intuitively. Do you have an example?

    • @BradyNealCausalInference · 4 years ago · +1

      Basically, it's just like a train of thought that is common to go down. "maybe E[Y(1)]-E[Y(0)] equals E[Y|T=1]-E[Y|T=0]" is the more formal way of writing "maybe causation equals association (correlation equals causation)." Of course, this thinking is often incorrect :)

    • @BradyNealCausalInference · 4 years ago · +1

      @@chadpark9248 For a given individual, they will observe a specific value, say t', of the random variable T. That means they will observe the potential outcome Y(t'). So the realized value t' of T gets connected to the observed outcome Y in that way (assuming consistency). Similarly, Y(T) corresponds to the potential outcome that we observe once we know the realized value of the treatment random variable T. It is distinct from Y(1), Y(0), or Y(t), which are meant to denote specific potential outcomes that aren't related to the random variable T at all (even though, for Y(t), we use the same letter in lower case).

    • @chadpark9248 · 4 years ago

      @@BradyNealCausalInference Thank you for your detailed explanation.

  • @scotth.hawley1560 · 4 years ago

    Great lecture, but starting at 20:02 I become lost: how is E[Y(1) | T=0] not a contradiction? If you do(T=1), doesn't that force T=1?

    • @BradyNealCausalInference · 4 years ago · +1

      Yes, but T=0 is *conditioning* on T=0, not doing T=0. Conditioning on T=0 means "look at the people who happened to not take the treatment." Then, for those people, Y(1) means "what would have happened had they taken the treatment?"

    • @scotth.hawley1560 · 3 years ago

      @@BradyNealCausalInference Thanks so much for taking the time to respond! This clarification helped me be able to move forward.

    • @BradyNealCausalInference · 3 years ago

      @@scotth.hawley1560 Glad to hear it! Thanks for bearing with me on the slow response time haha.

  • @souradipchakraborty7071 · 3 years ago

    Can we have a non-linear cause-and-effect relationship? In that case, how do we estimate the exact effect?

    • @BradyNealCausalInference · 3 years ago

      Yes! You'd use the same estimator that is used in slide 40, but with a nonlinear model instead of linear regression. You can also use any of the other estimators that we discuss in week 6 of the course.

    • @souradipchakraborty7071 · 3 years ago

      @@BradyNealCausalInference Thanks, will definitely check the week 6 material. I asked because if there is non-linearity with respect to T, then Y_hat = alpha * T + alpha' * T^2 + alpha'' * T^3 + ... + beta * X. Then which coefficient would give us the causal effect of T on Y?

  • @Theviswanath57 · 4 years ago

    @Brady: In Slide #41, I am wondering whether the estimate should be Σ_x P(X=x) * ( E[Y | T=1, X=x] - E[Y | T=0, X=x] ).

    • @Theviswanath57 · 4 years ago

      In your variant, essentially we are saying that P(X=x) is the same for all x; please correct me if I am wrong.

    • @BradyNealCausalInference · 4 years ago

      @@Theviswanath57 In slide 40, it is that equation that you wrote, assuming you meant "E[Y | T=1, X=x] - E[Y | T=0, X=x]" when you wrote "E[Y/T=1, X=x] - P(Y/T=0, X=x)." However, in slide 41, we use a completely different way to estimate the ATE: linear regression, and then taking the coefficient from the regression. In general it is not equal to the correct equation from slide 40. It is only equal when E[Y | T=1, X=x] - E[Y | T=0, X=x] is the same for all x (i.e. the treatment effect is the same for all individuals). I don't actually include the specific equation for the estimate in slide 41, but you can get it using the closed-form solution to linear regression. You can see the exact code that I used for this in Section 2.5 of the course book.

    • @Theviswanath57 · 4 years ago

      @@BradyNealCausalInference Regarding "P(Y/T=0, X=x)": yes, I meant E[Y | T=0, X=x].

    • @Theviswanath57 · 4 years ago

      Understood on "It is only equal when E[Y | T=1, X=x] - E[Y | T=0, X=x] is the same for all x (i.e. the treatment effect is the same for all individuals)."

    • @Theviswanath57 · 4 years ago

      @Brady: if we have P(X=x) as part of the equation, is the ATE an unbiased estimate even if E[Y | T=1, X=x] - E[Y | T=0, X=x] is not the same for all x?
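To see why the two estimates discussed in this thread can disagree, here is a small sketch on made-up grouped data with heterogeneous stratum effects (1 for x=0, 3 for x=1). The regression coefficient on T is computed via the Frisch-Waugh-Lovell residualization rather than the course's actual code, but gives the same number as fitting y ~ 1 + t + x by OLS:

```python
# Hypothetical cells: (t, x) -> (count, E[Y | T=t, X=x]).
# Stratum effects: 2-1 = 1 for x=0, and 6-3 = 3 for x=1 (heterogeneous).
cells = {(0, 0): (45, 1.0), (1, 0): (5, 2.0),
         (0, 1): (10, 3.0), (1, 1): (40, 6.0)}

# Expand to unit-level rows (outcomes set to the cell means; no noise, for clarity)
rows = [(t, x, y) for (t, x), (count, y) in cells.items() for _ in range(count)]
n_total = len(rows)

# Slide-40 adjustment: ATE = sum_x P(X=x) * ( E[Y|T=1,X=x] - E[Y|T=0,X=x] )
ate = 0.0
for x in (0, 1):
    p_x = sum(1 for _, xx, _ in rows if xx == x) / n_total
    ate += p_x * (cells[(1, x)][1] - cells[(0, x)][1])

# Coefficient on T in the regression y ~ 1 + t + x, via Frisch-Waugh-Lovell:
# residualize t on x (the within-x mean of t is the exact fit for binary x),
# then project y onto that residual.
t_mean = {x: sum(t for t, xx, _ in rows if xx == x) /
             sum(1 for _, xx, _ in rows if xx == x)
          for x in (0, 1)}
resid = [t - t_mean[x] for t, x, _ in rows]
beta_t = sum(r * y for r, (_, _, y) in zip(resid, rows)) / sum(r * r for r in resid)

print(ate, beta_t)  # 2.0 vs 2.28: they differ because the stratum effects differ
```

If you make the two stratum effects equal, the two numbers coincide, which is exactly the homogeneous-effect condition Brady states above.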

  • @charismaticaazim · 3 years ago

    Reporting a mistake: around 5:03, Brady says T=0 for taking the pill. It should be T=1.

  • @viralgupta5630 · 4 years ago

    Is there a textbook or course website?

    • @BradyNealCausalInference · 4 years ago

      Website: causalcourse.com
      Book: www.bradyneal.com/causal-inference-course#course-textbook

  • @amins6695 · 2 years ago

    Amazing video. One question: the example at the end of the lecture looks like a simple linear regression. Does that mean that when we run linear regression, we are doing causal inference? What is the difference between regression and causal inference here?

  • @mingmingchen7154 · 3 years ago

    Thanks for the lecture! I have a question around ua-cam.com/video/5x_pPemAVxs/v-deo.html: Is E[Y(1) - Y(0)] (here the individual subscript i is implicit) properly defined since some data are missing?

    • @DailySFY · 10 months ago

      @mingmingchen7154 As you have pointed out, it is a biased estimate. And Brady explains this clearly afterwards.