Mastering Hypothesis Testing for Data Science Interviews: Binomial, Z-test, and T-test

  • Published 29 Jun 2024
  • This video is part 1 of hypothesis testing problems in data science interviews.
    Part 2 of hypothesis testing problems in data science interviews:
    • A/B Testing Analysis M...
    🟢Get all my free data science interview resources
    www.emmading.com/resources
    🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
    🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
    🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
    🔵 Data Science Resume Checklist www.emmading.com/data-science...
    ✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
    // Comment
    Got any questions? Something to add?
    Write a comment below to chat.
    // Let's connect on LinkedIn:
    / emmading001
    ====================
    Contents of this video:
    ====================
    00:00 Intro
    00:34 Three types of questions
    02:18 When to use binomial test, z-test and t-test
    05:09 t-distribution vs z-distribution
    06:50 Testing proportions
    09:26 What's in the next video

COMMENTS • 76

  • @stella123www
    @stella123www 3 years ago +17

    best hypothesis testing video I've ever seen on youtube, thank you for producing great content!

  • @liuauto
    @liuauto 3 years ago +1

    This video will save tons of effort before taking a stats course and diving into any details. Have not seen such a helpful diagram ever before.

  • @deepadas4585
    @deepadas4585 3 years ago +2

    Love your videos, Emma! Very insightful and to-the-point explanation.
    I would love to see some domain-specific analytics interview case studies like- supply chain analytics, e-commerce analytics.

  • @cming6108
    @cming6108 3 years ago +1

    so much appreciation for every content you upload!!!

  • @CodeEmporium
    @CodeEmporium 3 years ago +1

    This is good detail. Love it

  • @j33vn
    @j33vn 2 years ago +7

    Great content as always, Emma. An intuitive way to think about why we don't use a t-test for a population proportion is that Bernoulli data have only one unknown: the population proportion. Once we know it, the variance is simply p(1-p). When estimating a population mean, however, there are two unknowns: the population mean and the population standard deviation. The heavier tails of the t-distribution capture the extra uncertainty caused by this additional unknown. Khan Academy explains this in more detail for anyone interested. Thanks!

    • @emma_ding
      @emma_ding 2 years ago

      Great observation Jeevan! Thank you for sharing!

    • @user-bn6tc4vv6l
      @user-bn6tc4vv6l 10 months ago +1

      Hi, which video/modules from Khan Academy explain this? thanks
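To make the point in this thread concrete (for Bernoulli data the variance is fixed at p(1-p) once the proportion is fixed), here is a minimal sketch of a one-sample z-test for a proportion. The function name and the numbers (540 successes out of 1000, null value 0.5) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_ztest(successes, n, p0):
    """Two-sided z-test of H0: p == p0 for Bernoulli data."""
    p_hat = successes / n
    # Under H0 the variance is fully determined by p0, so no separate
    # standard deviation has to be estimated from the sample.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 540 conversions out of 1000 users, H0: p = 0.5
z, p_value = one_sample_proportion_ztest(successes=540, n=1000, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Because the standard error comes from the null value p0 rather than from an estimated standard deviation, the statistic is compared against the normal distribution, not the t-distribution.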

  • @jeoffleonora4612
    @jeoffleonora4612 3 years ago

    Great video as always! Thanks Emma!

  • @vincenttan6303
    @vincenttan6303 3 years ago

    good stuff! clearer than textbooks and even lecturers.

  • @user-nz5oi8pd5m
    @user-nz5oi8pd5m 2 years ago +1

    always spoiled by Emma's concise and clear explanation.

  • @oliviazhang2922
    @oliviazhang2922 2 years ago

    You are absolutely the best Emma!! Thank you!!!

  • @brothermalcolm
    @brothermalcolm 3 years ago

    Perfect, just what I need, subscribed!

  • @anamikadas9445
    @anamikadas9445 3 years ago +4

    Love your videos Emma! For Bernoulli variables, would a chi-squared test also work? Is one method preferred over another in practice?

  • @hameddadgour
    @hameddadgour 1 year ago

    Great explanation and very informative! Thank you!

  • @norilouis
    @norilouis 1 year ago

    This is SO helpful and I really appreciate your content Emma!

    • @emma_ding
      @emma_ding 1 year ago

      I'm so glad to hear you found it helpful, Louis! Thanks so much for watching. 😊

  • @starbuststream3219
    @starbuststream3219 1 year ago

    Very informative video for job interview preparers!

  • @lydiamai6861
    @lydiamai6861 3 years ago

    Hi Emma, although I have not learnt this far, I enjoyed the video thanks to your clear and structured explanation. Thanks.

    • @emma_ding
      @emma_ding 3 years ago

      Happy to hear that! Thanks, Lydia!

  • @sirvachjumani7215
    @sirvachjumani7215 3 years ago

    Really useful content for interviewers.

  • @cliffrunner
    @cliffrunner 1 year ago

    this is a great video! thanks a lot!

  • @dallalstreet1775
    @dallalstreet1775 3 years ago

    thanks Emma! wonderful video

  • @sinhamohit
    @sinhamohit 2 years ago

    Timestamps
    Top quality content
    No funky intro music
    No repetitive sentences
    No begging for likes and subscribe
    Actually gets started when she says "Let's get started"
    Earning subscribers the right way!

    • @emma_ding
      @emma_ding 2 years ago

      Thanks Mohit for the summary! :)

  • @csousa3608
    @csousa3608 1 year ago

    Great video! I would love to see a video about hypothesis testing applied to a use case where you have to run A/B/n testing.

    • @emma_ding
      @emma_ding 1 year ago

      Great suggestion! In fact, I have a video on the topic you suggested ua-cam.com/video/6uw0A3aKwMc/v-deo.html, hope it helps! :)

  • @chuchuzhu333
    @chuchuzhu333 3 years ago

    Thank you so much!

  • @datasciencepreparationhub9933
    @datasciencepreparationhub9933 2 years ago

    Good explanation!

  • @Leon71
    @Leon71 3 years ago

    Thank you very much!

  • @tekingunasar4189
    @tekingunasar4189 2 years ago

    Hi! Great video. I am a little confused by the flow chart, though, because it relies on knowing some information about the population distribution, in particular where we check whether the population distribution is normal. If we already knew that the population distribution is normal, wouldn't that make hypothesis testing redundant? I know that this is not actually the case and that I am misunderstanding something, but I'm not sure what exactly.

  • @hiapple6060
    @hiapple6060 1 year ago

    Hi Emma, what test should I use if the metric follows a Bernoulli distribution, and with very different sample sizes in each group, say, 10000 observations in control and 1000 in treatment? In this case, should I use z-test with the pooled standard error or Welch's t-test?
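For reference, the pooled-standard-error z-test mentioned in this question can be sketched as follows. This is only an illustration of that one option, not a recommendation between it and Welch's t-test; the function name and the conversion counts are hypothetical, with group sizes matching the 10000 versus 1000 split in the question.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated from all data.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 1000/10000 conversions in control, 120/1000 in treatment
z, p_value = two_proportion_ztest(x1=1000, n1=10000, x2=120, n2=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The unequal group sizes enter only through the 1/n1 + 1/n2 term, so the smaller group dominates the standard error.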

  • @racoonYY109
    @racoonYY109 3 years ago +1

    Hi Emma, could you explain the difference between the z-test and the binomial test when comparing the CTR of two groups?

  • @kangxinwang3886
    @kangxinwang3886 3 years ago +2

    this is just good period

  • @zhihaoxu756
    @zhihaoxu756 2 years ago +3

    Hi Emma, thank you very much for making these videos. They are indeed very helpful! However, I have a question regarding the difference between Z-test and Binomial test. For small sample, i.e when np

    • @xiaofeichen5530
      @xiaofeichen5530 1 year ago

      I think she means calculating directly the probability of k successes in n trials using the binomial pmf Pr(X=k)=(n choose k)p^k(1-p)^(n-k)
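The binomial pmf in this reply can be turned into an exact test directly. The sketch below is one common way to define a two-sided exact p-value (summing every outcome that is no more likely than the observed one); the function names and the example of 9 successes in 10 trials are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_test_two_sided(k, n, p0):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count k under H0."""
    pmf = [binom_pmf(i, n, p0) for i in range(n + 1)]
    # Small relative tolerance so floating-point ties are counted.
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))

# Hypothetical small sample: 9 successes in 10 trials, H0: p = 0.5
p_value = binom_test_two_sided(k=9, n=10, p0=0.5)
print(f"exact p-value = {p_value:.4f}")
```

No normal approximation is involved, which is why this approach suits samples that are too small for a z-test.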

  • @nattapatjuthaprachakul9859
    @nattapatjuthaprachakul9859 3 years ago

    Thank you so much

  • @navishagarwal1736
    @navishagarwal1736 3 years ago

    Hey Emma! Thanks for another great video.
    I have watched the video a few times now but the part on "testing proportions" seems to be going over my head. Possibly because I do not have some basics necessary here.
    Any suggestions on recommended reads?

    • @emma_ding
      @emma_ding 3 years ago

      For resources about stats, you can find some in my blog post: towardsdatascience.com/how-i-got-4-data-science-offers-and-doubled-my-income-2-months-after-being-laid-off-b3b6d2de6938. For A/B testing specifically, this book is a good read: www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical/dp/1108724264

  • @shrutigupta5104
    @shrutigupta5104 2 years ago +1

    Hi Emma, thanks for making informative videos. My question is: how did you choose a sample size of 30 as the marker for differentiating between small and large sample sizes?

    • @Fawk3s1
      @Fawk3s1 2 years ago

      It is a convention in statistics. Basically, n > 30 is a rule of thumb for when the central limit theorem approximation is good enough: the sampling distribution of the sample mean is approximately normal, even if the data themselves are not normally distributed.
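The n > 30 convention can be checked with a quick simulation. The sketch below assumes exponential data (a deliberately skewed choice made for illustration, not something from the video) and measures how skewed the distribution of sample means still is at different sample sizes; all function names are hypothetical.

```python
import random
from statistics import fmean

def skewness(xs):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2**1.5

def sample_mean_skewness(n, reps=5000, seed=0):
    """Skewness of the sampling distribution of the mean of n
    exponential draws, estimated from `reps` simulated samples."""
    rng = random.Random(seed)
    means = [fmean([rng.expovariate(1.0) for _ in range(n)])
             for _ in range(reps)]
    return skewness(means)

for n in (2, 30, 50):
    print(f"n = {n:>2}: skewness of sample means = {sample_mean_skewness(n):.2f}")
```

The skewness of the sample means shrinks toward 0 as n grows, which is the sense in which the normal approximation improves; the threshold of 30 itself is a convention, and heavily skewed data may need more.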

  • @akshat175
    @akshat175 3 years ago +1

    Hey Emma, your videos are super useful and simple to follow. Is there a place I can access your slides as well for quick review of the key concepts? This comment would hold for all your videos and not just this one..

    • @emma_ding
      @emma_ding 3 years ago +2

      Sorry, there are no slides; it's all part of the video editing. But I'll definitely consider providing them in the future if it helps!

  • @rioache1081
    @rioache1081 3 years ago

    4:11 There is a lot of arguing on the stats forums about the assumption of normality for t-test. And many of the comments state that for t-statistic to have a t-distribution the population has to follow the normal distribution (so t-test does actually require normality of population). What's your opinion on that topic?

  • @cql8878
    @cql8878 2 years ago

    I love your videos Emma! But by far this one is the hardest one to follow among yours :(

    • @emma_ding
      @emma_ding 2 years ago

      Thanks for the feedback! Could you be specific about which part is hard to follow? Thanks!

  • @appledotted
    @appledotted 3 years ago +1

    I had a tech screen with a fin-tech company today. They asked me to walk through the math behind testing normality with skewness. (Quite odd)
    I got a bit stuck on how to convert the skewness into a p-value. I mentioned that normally we have CLT that we can do normal approximation like for Binomial and Poisson Distribution, but I am not sure about skewness. Then I said maybe we can try bootstrapping to simulate the sampling distribution to get the variance of skewness if the distribution is unknown. (Not sure if this is a correct approach)
    I tried to find online resources about this after the interview, but somehow none of them go in-depth to talk about this part. Do you happen to have some insight?
    P.S. Really like your videos, very concise and instructive. :)

    • @appledotted
      @appledotted 3 years ago

      Just thought about this again. I think we can simulate a normal distribution over and over with the same n, see in what proportion of the simulations the skewness is more extreme than in our observed data, and use that proportion as the p-value.
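The simulation idea in this reply can be sketched as a Monte Carlo test. This is one possible reading of the approach, not a standard library routine: the function names are hypothetical, the example data are deterministic exponential quantiles chosen only to be visibly right-skewed, and the +1 in the p-value is a common convention that keeps the estimate away from exactly zero.

```python
import random
from math import log
from statistics import fmean

def skewness(xs):
    """Moment-based sample skewness."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2**1.5

def skewness_mc_pvalue(data, n_sims=2000, seed=0):
    """Monte Carlo p-value for H0: the data are normal, using |skewness|
    as the test statistic. Skewness does not depend on location or
    scale, so standard normal samples of the same size suffice."""
    rng = random.Random(seed)
    n = len(data)
    observed = abs(skewness(data))
    extreme = 0
    for _ in range(n_sims):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(skewness(sim)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_sims + 1)

# Hypothetical right-skewed sample: 100 evenly spaced exponential quantiles
n = 100
skewed = [-log(1 - (i + 0.5) / n) for i in range(n)]
p_value = skewness_mc_pvalue(skewed)
print(f"Monte Carlo p-value = {p_value:.4f}")
```

A small p-value here means that normal samples of the same size rarely show skewness as large as the observed one.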

  • @ramanadeepsingh
    @ramanadeepsingh 11 days ago

    Great video... What happens when the sample size is less than 30 and the population distribution is not normal? What kinds of tests are used in practice?

  • @bluestacheandego
    @bluestacheandego 3 years ago

    Hi! Thanks for the videos! I see you have O'Reilly textbooks behind you. Do you recommend them? If so, how do you study from them? Thanks

    • @emma_ding
      @emma_ding 3 years ago

      Haha, interesting question! Depends on what you are interested in, two books I highly recommend - Practical Statistics for Data Scientists (if you are interested in learning statistics in practice) and Designing Data-Intensive Applications (if you are interested in software engineering).

  • @plttji2615
    @plttji2615 2 years ago

    I'm quite confused about whether I should use a z-test when testing the conversion rate, because some websites mention the t-test. Could you please explain this?

  • @racoonYY109
    @racoonYY109 3 years ago

    Also, why do we have pooled and unpooled variance scenarios for the t-test, while for the z-test for two proportions we always use the pooled variance?

  • @bcws
    @bcws 7 months ago

    Does the Slutsky theorem apply here? Slutsky theorem only applies when one number converges in distribution to a random element and the other converges in probability to a constant.

  • @ishpandey7886
    @ishpandey7886 3 years ago

    Thanks a ton.... I never found such videos... You are really helping the community...
    I just have a question if the size

    • @emma_ding
      @emma_ding 3 years ago +1

      Yes, it just won't be a t-test or Z-test. You can Google "hypothesis test non normal distribution" to find more details.

    • @ishpandey7886
      @ishpandey7886 3 years ago

      @@emma_ding Thanks... Would love to get one end-to- end hypothesis problem with code... That would be really helpful...
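As a small end-to-end illustration for this thread, here is a sketch of a permutation test, one commonly used option when a t-test or z-test is not appropriate because normality cannot be assumed. It is not a method named in the video or in the replies above; the function name and both data sets are hypothetical.

```python
import random
from statistics import fmean

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.
    H0: group labels are exchangeable (both groups share one distribution)."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            extreme += 1
    # +1 correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical small samples, e.g. a metric measured per user in two groups
group_a = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3]
group_b = [2.1, 1.9, 2.4, 2.2, 2.0, 2.3]
p_value = permutation_test(group_a, group_b)
print(f"permutation p-value = {p_value:.4f}")
```

The steps are the usual ones: state H0, pick a test statistic, build its distribution under H0 (here by shuffling labels), and compare the observed value against it.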

  • @shirleygui6533
    @shirleygui6533 2 years ago

    Great video! But there is a small point I am confused about: if the sample size is large enough then, according to the CLT, it follows the normal distribution (and the variance can be calculated from the sample data), so should we use a z-test instead of a t-test because we "know" the variance? Is my logic correct? Thank you

    • @irisyao8691
      @irisyao8691 2 years ago

      I have the same question: if the sample size is > 30, can we use a z-test with the sample variance even though we don't know the population variance?
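One way to explore this question is to simulate how often a z-based 95% interval, built with the sample standard deviation in place of the unknown population value, actually covers the true mean. The sketch below assumes normally distributed data with true mean 0; the function name and the sample sizes are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def z_interval_coverage(n, reps=2000, seed=0):
    """Share of simulated samples whose z-based 95% interval
    (using the sample standard deviation) contains the true mean 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = fmean(xs)
        s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        if abs(m) <= z_crit * s / sqrt(n):
            hits += 1
    return hits / reps

for n in (5, 100):
    print(f"n = {n:>3}: coverage of nominal 95% z-interval = {z_interval_coverage(n):.3f}")
```

With a small sample the z critical value is too narrow and coverage falls short of 95%, which is the gap the t-distribution corrects; with a large sample the two approaches give nearly the same answer.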

  • @thegreatlazydazz
    @thegreatlazydazz 3 years ago +1

    Can you give some material which discusses why, theoretically, we cannot use t-tests for binomial proportions?

    • @emma_ding
      @emma_ding 3 years ago +1

      Here you go: stats.stackexchange.com/questions/90893/why-use-a-z-test-rather-than-a-t-test-with-proportional-data

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    If a company has defined more than 2 stages in conversion, so not just no-click/click, but like 1. Open product page 2. Add to checkout 3. Open payment confirmation page ... it won't follow a Bernoulli distribution anymore since there are more than 2 outcomes. Are there tests for this, or do we still have to use Bernoulli and treat outcomes as "reached stage x vs not reached stage x"? How does the latter case affect analysis?

    • @emma_ding
      @emma_ding 3 years ago +1

      In those cases, you can simplify the problem with "conditions": given users passed all previous stages, the behavior of entering or not entering to the next stage follows Bernoulli distribution. This will make testing a lot easier.
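The conditioning described in this reply can be sketched as follows: each step of the funnel becomes its own Bernoulli outcome, measured only among users who reached the previous stage. The stage names follow the question above; the counts and the function name are hypothetical.

```python
def stage_conversion_rates(stage_counts):
    """Conditional conversion rate between consecutive funnel stages:
    P(reach next stage | reached current stage)."""
    rates = []
    for (name_a, n_a), (name_b, n_b) in zip(stage_counts, stage_counts[1:]):
        rates.append((f"{name_a} -> {name_b}", n_b / n_a))
    return rates

# Hypothetical user counts at each stage of the funnel
funnel = [
    ("open product page", 10000),
    ("add to checkout", 2000),
    ("open payment confirmation page", 500),
]
rates = stage_conversion_rates(funnel)
for step, rate in rates:
    print(f"{step}: {rate:.2%}")
```

Each conditional rate is a proportion, so control and treatment can be compared step by step with a two-proportion test, using the number of users who reached the earlier stage as the sample size.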

  • @YK-mh3mp
    @YK-mh3mp 1 year ago

    For general distribution other than normal distribution, I think it is theoretically wrong to use t-test. It is not only for proportions.

  • @sssam844
    @sssam844 1 year ago

    could you please attach the subtitles as well? I find your videos fantastic and helpful but I have difficulty understanding the pronunciation of some words

    • @emma_ding
      @emma_ding 1 year ago

      Sure thing! Thanks for the suggestions. I've added subtitles to my most recent videos, and will add more!

  • @diazjubairy1729
    @diazjubairy1729 3 years ago

    What's the difference between hypothesis test and a/b test ?

    • @jimbocho660
      @jimbocho660 2 years ago

      An A/B test is one type of hypothesis test.

  • @maryamomar4106
    @maryamomar4106 2 years ago

    I love you.

    • @emma_ding
      @emma_ding 2 years ago

      I'm glad you find the content so loveable! Thank you Maryam.

  • @nagrajkaranth123
    @nagrajkaranth123 2 years ago

    Sis please cover all the interview questions of data science

  • @vivekambastha2273
    @vivekambastha2273 3 years ago

    Maybe a good topic, but the presentation of the topics is not good; there are also some pauses when switching topics

    • @emma_ding
      @emma_ding 3 years ago

      Thanks a lot for the feedback! I'll pay more attention to pauses in the future!