Very instructive and well-made video!
I have one question: where can I find your video on this calibration topic? Very curious about that!
I also have one slight remark: at 12:08 in the video there is a mistake in the formula for ITE. Both terms of the ITE formula use W=1, but these should be W=1 (treated) and W=0 (not treated) respectively, I think. Do you agree?
It is just a minor remark; the rest is outstanding 👍
Wow, very good eye! I didn't notice this. And you are correct: Wi in the second term of that equation should be 0, not 1, since it represents "not treated". Thanks for catching that!
And the video on calibration: go to the Videos tab on my channel and look for a video called "Model Calibration - EXPLAINED". Sorry I can't paste the link here; UA-cam isn't good with that.
@@CodeEmporium I find the content of your videos to be of VERY good quality! So when a new one arrives, I watch it very focused, which is why spotting a mistake is not that hard for me. I hope you will continue making these videos!
Can you point me to that video about CALIBRATION?
@@CodeEmporium OK, I will search for that
@@CodeEmporium I think it is this video:
MODEL CALIBRATION - EXPLAINED ! - Why Logistic Regression DOESN'T return probabilities?!
ua-cam.com/video/5zbV24vyO44/v-deo.html
Great video! Thanks!
FYI: at 12:20 the ITE formula is wrong. Both terms are the same (probability the customer purchased given emails), but the latter should be the probability the customer purchased given NO email.
Thank you! And yep. Thanks for pointing that lil mistake out. Some others did too and i hearted a comment for more visibility. Will pin too.
Cool topic! One confusion I had with the class transformation approach:
If W=0 and Y=0, how can we be sure this is a "persuadable" and not a "lost cause"? If I understand correctly, both groups can take on these values. Similar question when W=1 and Y=1: can we be sure this is a "persuadable" and not a "sure thing"?
Amazing question Ritvik. Yeah, I think you are correct: Z=1 doesn't target just the persuadables, but a superset of that (with some lost causes and some sure things). The main objective of this class transformation approach, I believe, is to separate the sleeping dogs (we lose money advertising to these people) from the persuadables. I could have been clearer about this, but thanks for pointing it out!
Ahh thanks for the clarification. Makes total sense as a sleeping dog vs. not sleeping dog classifier!
Sleeping dogs are in general a very small percentage of the total population, so with this Z=1 class transformation we're not really gaining much, tbh. You should pin this clarification/correction or update your video description. I had the exact same thought and rewatched that part of the video multiple times before I saw this correction comment.
The two-model approach assumes your models have no error, which is optimistic at best.
Subtracting the two predictions from the treated model and the control model completely disregards that these point estimates are not accurate (unless you have perfect, and therefore overfit, models, which is bad anyway).
The two-model approach should only be used as an example of why modeling and measuring uplift is not trivial, and therefore requires dedicated tools like uplift modeling techniques.
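For readers who haven't seen it spelled out, here is a minimal sketch of the two-model approach being critiqued (all data and names synthetic; per-segment conversion rates stand in for fitted classifiers, but any pair of probabilistic models would play the same role):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic randomized experiment: one binary feature, 50/50 treatment split.
n = 20000
x = rng.integers(0, 2, size=n)   # customer segment (hypothetical feature)
w = rng.integers(0, 2, size=n)   # treatment indicator: 1 = got the email
# True conversion probabilities: segment 1 is "persuadable" (+0.2 uplift).
p_control = np.where(x == 1, 0.3, 0.5)
p_treated = np.where(x == 1, 0.5, 0.5)
y = rng.random(n) < np.where(w == 1, p_treated, p_control)

def fit_rate(xs, ys):
    """'Model' = per-segment conversion rate (stand-in for any classifier)."""
    return {s: ys[xs == s].mean() for s in (0, 1)}

model_treated = fit_rate(x[w == 1], y[w == 1])   # Model 1: treated group only
model_control = fit_rate(x[w == 0], y[w == 0])   # Model 2: control group only

# Two-model uplift estimate: difference of the two predicted probabilities.
uplift = {s: model_treated[s] - model_control[s] for s in (0, 1)}
print(uplift)   # segment 1 should come out near 0.2, segment 0 near 0.0
```

The comment's point is visible here: each rate carries its own estimation error, and subtracting two noisy estimates compounds rather than cancels that error.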
Intuitively, since Z can only take the value 1 or 0, the treatment effect is the persuadable-group probability p(Z=1) minus the other groups' probability p(Z=0): p - (1 - p) = 2p - 1. Great work, thanks!
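Written out, that identity follows directly (a sketch, assuming the 50/50 treatment split used in the video):

```latex
P(Z_i = 1 \mid X_i)
  = \tfrac{1}{2}\,P(Y_i = 1 \mid X_i, W_i = 1)
  + \tfrac{1}{2}\,P(Y_i = 0 \mid X_i, W_i = 0)

\Rightarrow\;
2\,P(Z_i = 1 \mid X_i) - 1
  = P(Y_i = 1 \mid X_i, W_i = 1)
  - \bigl(1 - P(Y_i = 0 \mid X_i, W_i = 0)\bigr)
  = P(Y_i = 1 \mid X_i, W_i = 1) - P(Y_i = 1 \mid X_i, W_i = 0)
  = \mathrm{ITE}(X_i).
```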
Hi, I am in drug discovery and the things you talk about are directly translatable to my work (potential customer vs. patient who will respond best to my drug). Thanks so much!!
Happy this is useful!
Keep up your great work! You are such a good teacher (+ entertainer sometimes ;)).
To you, i bow. Thank you :)
Great explanation! Please continue making videos on these "causal inference" topics, as there are very few informative videos on them.
Here is a video on how causal inference can go a long way with machine learning. It's a fun video, starting from the foundation of the concept and including some important math. Hope I laid this out right and it's easy to understand. Any thoughts? Let me know in the comments or on Discord (link in description). Cheers!
What are you using to make this video? The text and animations. It's really simple and effective.
Camtasia Studio:)
@@CodeEmporium Thanks. Keep up the videos!
Can you share the document for this video?
Looking forward to it. Causal inference is one of the coolest ML ideas I've been able to use
Thank you! And super True
Hi, I am starting my master thesis on the same topic, could you please help me find best resources to get going with the topic. It will be of great help.
Thank you
@@idkwiadfr I'll be looking into it soon if you're interested
@@ChocolateMilkCultLeader Sure
12:34 Can anyone explain here how the product rule made all those changes to the ITE equation? I've been staring at it too long and it's not clicking.
Great vid, thanks.
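One way to see the product-rule step (a sketch, assuming this is the factorization used in the video, with randomized 50/50 assignment so that W_i is independent of X_i):

```latex
P(Y_i = 1,\, W_i = 1 \mid X_i)
  = P(Y_i = 1 \mid X_i, W_i = 1)\; P(W_i = 1 \mid X_i)

% Randomization makes W_i independent of X_i, so
P(W_i = 1 \mid X_i) = P(W_i = 1) = \tfrac{1}{2}

% and each conditional term can therefore be rewritten as
P(Y_i = 1 \mid X_i, W_i = 1) = 2\, P(Y_i = 1,\, W_i = 1 \mid X_i).
```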
12:08 shouldn't it be:
P(Zi = 1 | Xi) = P(Yi = 1 | Xi, Wi = 1) + P(Yi = 0 | Xi, Wi = 0)
Instead of Wi = 1 and Wi = 0 being on the left, they should be on the right, or what am I missing?
You said ITE is in the range [0,1]. Can it not be negative for the "sleeping dogs" category you defined?
Please, can you make a video explaining dual-aspect collaborative attention?
Good course, but I don't get how the data is collected for Zi. Zi = 1 when the sample is (in the treatment group AND converted) OR (in the control group AND not converted). But a persuadable should satisfy the AND of those two conditions, shouldn't it?
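For concreteness, here is the label construction being asked about (a tiny sketch with made-up data). The OR across arms is deliberate: each customer is observed in only one arm, so the counterfactual AND that defines a true persuadable is never observable for a single customer, and Z=1 is only a proxy for it:

```python
import numpy as np

# Class-transformation label:
# Z = 1 for (treated AND converted) OR (control AND not converted).
w = np.array([1, 1, 0, 0])   # treatment indicator
y = np.array([1, 0, 1, 0])   # conversion
z = w * y + (1 - w) * (1 - y)
print(z.tolist())   # [1, 0, 0, 1]
```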
Great video...keep it up...all the best
Great video. Thank you for sharing!
Very good video. I have a question: what if the probability of treatment is not 0.5 but some other value, such as 1/4? Then you cannot fully merge the probability that person Xi got the email at 13:52. In this scenario, ITE = -1 + (4/3)P(Zi=1|Xi) + (8/3)P(Yi=1, Wi=1|Xi). What should we do with the unmerged (8/3)P(Yi=1, Wi=1|Xi) term?
How does one determine whether a lead passed into the Z function results in 1? Where does the training data for that model come from?
Awesome!! I have a question: what if I don't have randomized data? How can I then estimate the ITE and do uplift modeling?
Amazing work! Congrats!
While we can understand the effect of a new policy by modeling it through uplift models, how can we check whether a certain factor is a cause in environments that we can only model, not change?
Nice explanation. Have you created any videos about dynamic causal inference with machine learning?
10:17 How do we get the values for Z?
What is the definition of
1. Model 1
2. Model 2
3. Z
?
Great video. Where can I find your video on calibration?
This is very cool. Is it possible to use the two-model approach to measure incremental orders in an A/B test?
Loved it!! Thank you for the derivation; simple when you explain it. I would've just accepted it as something handed down from the gods otherwise
I was wondering why we need this uplift modeling in the first place. Can't statistics answer the goal?
It seems like ITE measures how much more probable the outcome is due to the treatment than without it. So is the way to prove causality simply to show a higher probability? Doesn't correlation also show a high probability?
Thanks, but what if the probability of treatment is not 0.5? Let's say we only assign 20% to the treatment group. Does the class transformation still hold?
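The 2p-1 identity in the video does rely on a 50/50 split. One standard generalization (the "transformed outcome" trick, which the video does not cover; a sketch assuming randomized assignment with a known treatment probability e) reweights each arm by its assignment probability:

```python
import numpy as np

rng = np.random.default_rng(1)
e = 0.2                            # P(W=1): only 20% treated
n = 200_000
w = rng.random(n) < e              # treatment indicator
p = np.where(w, 0.6, 0.4)          # true uplift = 0.2 for everyone
y = (rng.random(n) < p).astype(float)

# Transformed outcome: Y* = Y*W/e - Y*(1-W)/(1-e).
# Under randomization, E[Y* | X] equals the ITE for any e, not just 0.5;
# regressing Y* on X then estimates uplift directly.
y_star = y * w / e - y * (1 - w) / (1 - e)
print(y_star.mean())   # should be close to the true uplift of 0.2
```

With e = 0.5 this reduces to the 2p-1 form discussed in the video.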
What is the video where you show how to calibrate the Uplifting Model? I am not able to find it
I have made 3 videos in the series for causal inference. I think this is the second one. You can check out my "Causal Inference" playlist for all these videos (though I am not sure if I got to exactly the calibration of uplift modeling - if not, maybe a future video)
@@CodeEmporium thank you, I just found it, great video by the way!
very good video! Could you please share your slides?
Thanks for watching! I created these animations as a video. It isn't a slide deck. Sorry about that
Good video. I'm wondering how the causal ML approach could mitigate biases/clashes in the experiment. For example, if the treatment group individuals were sent an email, but during the experiment run another experiment was conducted on the content of the email, or on the UX of the landing experience from the email,
how can we analyze whether the email is "good" or "bad" while accounting for all those external effects?
Very good. Looking forward to your next video
Interesting
Many thanks
Hi @CodeEmporium, first of all thank you for the video, very well explained.
What tool do you use for the animation?
Glad you like it! I use Camtasia Studio
Xi ∈ ℝ^D
"Xi" represents a specific customer, where "i" is an index referring to a particular customer.
"∈" denotes membership, meaning "Xi" belongs to or is an element of.
"ℝ^D" denotes the set of D-dimensional vectors of real numbers, where "D" is the dimensionality of the feature space. This indicates that each customer is represented as a vector of real numbers with "D" dimensions. Each dimension might correspond to a specific feature or attribute of the customer, such as age, income, spending habits, etc.
So, the notation "Xi ∈ ℝ^D" means that each customer "Xi" is represented as a vector of real numbers with "D" dimensions.