Data Scientist answers 30 Data Science Interview questions

  • Published Oct 15, 2024
  • Let's look at some data science interview questions!
    RESOURCES
    [1] Simplilearn's 50 interview questions: www.simplilear...
    [2] Approximate Nearest Neighbor (ANNOY) from Spotify: github.com/spo...
    [3] What is a p-value? (@kozyrkov): • What is a p-value?
    [4] Eigenvectors and Eigenvalues (@3blue1brown): • Eigenvectors and eigen...
    [5] Model Calibration - Why logistic regression doesn't return probabilities: • Why Logistic Regressio...
    JOIN US ON DISCORD: / discord
    SPONSOR
    Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it!
    Learn more: www.kite.com/g...

COMMENTS • 45

  • @kachrooabhishek · 2 years ago · +3

    Blessed to like this video. Dude, these are some serious scenarios that aren't covered by the major channels. Bless you :)

  • @yk4993 · 3 years ago · +7

    The "feedback mechanism" simply means that the true labels are known, so during training the model gets feedback about its error and corrects it (e.g. via gradient descent). It sounds like a tautology, because it just restates the fact that the data is labeled.

    • @CodeEmporium · 3 years ago · +1

      Oh interesting. Never would have really thought to mention that. But that's good to know. Thank you :)

    • @sirgodricenwardsaier9074 · 3 years ago · +1

      Aren't there unsupervised models that use gradient descent without the need for labeled data though? t-SNE and node2vec come to mind as examples of cases where SGD doesn't require labels. That said, this is niche enough that it probably doesn't matter for typical interviews.
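To make the "feedback" idea in this thread concrete, here is a minimal sketch, assuming plain Python and a toy linear model. The function name `fit_line_gd` and the learning-rate and epoch values are illustrative choices, not something from the video. The labels `ys` enter only through the error term, and that error is exactly the feedback gradient descent uses to correct the weights.

```python
def fit_line_gd(xs, ys, lr=0.05, epochs=1000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error.

    The labels `ys` are the "feedback": every step measures the error of the
    current predictions against them and moves w and b against its gradient.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error on each labeled example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error**2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_line_gd([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 3), round(b, 3))
```

On the toy data above (y = 2x + 1) this recovers w ≈ 2 and b ≈ 1. The learning rate is tuned to this tiny dataset; on differently scaled data it would need adjusting.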

  • @nay_codes · 3 years ago · +9

    I'm not one to write comments on YouTube, but I have to say I really love your content. An interview questions series would be awesome.

    • @CodeEmporium · 3 years ago

      Thanks a lot! Gonna be making more of these, and I hope you like the future ones too.

  • @sourajitsaha3845 · 3 years ago · +1

    In this context, the feedback mechanism basically means that in the supervised setting you get to compare your model's output on the data with the provided labels (think loss functions) in order to update the model's weights (a.k.a. learning). In the unsupervised setting you can't do that, because you don't have labels to compare to, so you update the model's weights without explicitly comparing its output with labels.

    • @CodeEmporium · 3 years ago

      Yep! I guess to me, that sounds like a restatement of "has labels" and "doesn't have labels", just in a fancier tone.
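For contrast with the supervised setting described in this thread, here is a sketch of an unsupervised update: Lloyd's k-means on one-dimensional data, chosen purely as an illustration (it is not discussed in the video). The function name `kmeans_1d` and the fixed iteration count are assumptions. Centers move toward the means of the points assigned to them, and no label is ever consulted.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's k-means on 1-D data; no labels are used anywhere.

    Each iteration assigns every point to its nearest center, then moves
    each center to the mean of the points assigned to it.
    """
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep a center where it is if no point was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

print(kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5]))
```

Starting from centers 0 and 5, the two centers settle at 2.0 and 11.0, the means of the two natural groups, purely from the structure of the inputs.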

  • @harshparikh5871 · 2 years ago · +2

    Yooo, imma use this to study for some upcoming interviews. This video really dumbed down a lot of this stuff for me.

  • @zyladd6176 · 10 months ago

    Very well-made video that adds detail to the standard answers for DS interviews. Good analysis.

  • @newbie8051 · 3 months ago

    Ah, quick refresher.
    Thanks

  • @هشامأبوسارة-ن7و · 7 months ago

    The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one you actually got. Let's say the p-value for the regression coefficient of a predictor (e.g. age as an independent variable to predict income) is 0.03. That means that if age truly had no explanatory power (a coefficient of 0), an estimate at least this far from 0 would occur only about 3% of the time, so at the usual 5% significance level you should reject the null hypothesis that age's regression coefficient is 0.
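A p-value is the probability, under the null hypothesis, of a result at least as extreme as the one observed. A permutation test makes that concrete for the age-and-income slope in this comment. It is a swapped-in technique, not the t-test a regression package would normally report: if the true slope is 0, every re-pairing of ages with a shuffled list of incomes is equally likely, so the p-value is the share of re-pairings whose slope is at least as extreme as the observed one. This sketch enumerates every permutation exactly, so it only suits a handful of points; the function names and the toy data are hypothetical.

```python
from itertools import permutations

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def permutation_p_value(x, y, tol=1e-12):
    """Exact two-sided permutation p-value for H0: slope == 0.

    Under H0 every pairing of x with a reordering of y is equally likely,
    so the p-value is the share of reorderings whose |slope| is at least
    as large as the observed |slope|. Enumerates n! orderings: tiny n only.
    """
    observed = abs(slope(x, y))
    extreme = total = 0
    for shuffled in permutations(y):
        total += 1
        if abs(slope(x, shuffled)) >= observed - tol:
            extreme += 1
    return extreme / total

# Hypothetical toy data: age (years) and income (thousands).
ages = [25, 30, 35, 40, 45, 50]
incomes = [31, 36, 33, 45, 41, 52]
print(permutation_p_value(ages, incomes))
```

As a sanity check, with perfectly linear data such as x = [1, 2, 3, 4, 5, 6] and y = [2, 4, 6, 8, 10, 12], only the original pairing and its exact reversal reach the observed |slope|, so the p-value is 2/720 ≈ 0.0028.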

  • @NaManCoo · 6 months ago

    Very good quality video!

  • @mapa5000 · 8 months ago

    Great video!

  • @هشامأبوسارة-ن7و · 7 months ago

    A mistake that the majority of data scientists make is assuming that, because the outcome variable is a probability in [0, 1], you should automatically use logistic regression. That's completely incorrect. Being a probability in [0, 1] is a necessary condition for modeling with logistic regression, but not a sufficient one. Another factor needs to be checked: the relationship between the predictor and the outcome should exhibit a "threshold effect", which is the reason for the sigmoid shape of the response as the predictor values change.
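The sigmoid shape this comment refers to comes from the logistic model P(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)). A minimal sketch with hypothetical coefficients shows where the "threshold" sits: the probability crosses 0.5 at x = -b0 / b1, and that is also where it changes fastest, since its slope with respect to x is b1·p·(1 - p). The function names and the age example are illustrative assumptions.

```python
import math

def logistic_probability(x, b0, b1):
    """P(y = 1 | x) under a logistic model: the sigmoid of a linear predictor."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def threshold(b0, b1):
    """Predictor value where P(y = 1 | x) = 0.5, i.e. where b0 + b1 * x = 0."""
    return -b0 / b1

def probability_slope(x, b0, b1):
    """dP/dx = b1 * p * (1 - p), which peaks at the threshold where p = 0.5."""
    p = logistic_probability(x, b0, b1)
    return b1 * p * (1.0 - p)

# Hypothetical coefficients, e.g. for P(outcome | age).
b0, b1 = -10.0, 0.25
for age in (20, 30, 40, 50, 60):
    print(age, round(logistic_probability(age, b0, b1), 3))
```

With these coefficients the probability rises from about 0.007 at age 20 to exactly 0.5 at age 40 and about 0.993 at age 60. If the real data shows no such transition region, the sigmoid form is a questionable fit even when the outcome lies in [0, 1].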

  • @tejareddy199 · 1 year ago

    Excellent work!

  • @RPiao · 2 years ago

    You rock, dude! Thank you, YouTube RECOMMENDATION system. Are you using ANNOY, YouTube?
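ANNOY (resource [2] in the description) answers approximate nearest-neighbor queries over item vectors, the kind of lookup a recommender can use to find similar items. As a hedged illustration of what it approximates, here is the exact brute-force version using cosine similarity. The item vectors and function names are hypothetical, and this is not ANNOY's algorithm: ANNOY builds a forest of trees from random hyperplane splits so it can avoid scanning every item.

```python
import heapq
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def exact_top_k(query, items, k):
    """Ids of the k items most similar to `query`, found by scanning every item.

    This O(n * d) scan per query is the cost that approximate indexes
    such as ANNOY trade a little accuracy to avoid.
    """
    return heapq.nlargest(k, items, key=lambda item_id: cosine_similarity(query, items[item_id]))

# Hypothetical 2-D embeddings for three videos.
videos = {"stats_intro": [1.0, 0.0], "cooking": [0.0, 1.0], "ml_interview": [1.0, 1.0]}
print(exact_top_k([1.0, 0.1], videos, k=2))
```

Here the query is closest in angle to `stats_intro`, then `ml_interview`, so those two ids are returned in that order.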

  • @paragjain2762 · 3 years ago · +4

    Oct 11 is too far, buddy! I have an interview on Friday! Anyway, better late than never! Thanks for doing this.

    • @CodeEmporium · 3 years ago · +1

      It's here now :)

    • @pearlmarysamuel4809 · 3 years ago · +1

      How was the interview?

    • @paragjain2762 · 3 years ago · +2

      @pearlmarysamuel4809 It went well; I moved to the next round. Thank you, Ajay, for the commentary in this video. It provided really useful insights.

    • @pearlmarysamuel4809 · 3 years ago

      Congratulations. God bless.

  • @fahnub · 2 years ago

    Please do more, and also include case-based problems if possible.

  • @2mitable · 2 years ago

    Make this kind of series.

  • @leotrisport · 3 years ago

    Maybe the most important thing to keep in mind to improve generalisation (avoid overfitting) is to first check whether the validation and training sets come from the same probability distribution. I mean, no amount of regularisation would sort out that issue.

    • @CodeEmporium · 3 years ago

      Yep. Very true

    • @clapdrix72 · 2 years ago · +1

      If you have a sufficiently large sample, then random assignment (in non-time-series problems) will basically ensure they come from the same distribution. I would want to make sure my data sample was generated by one process (or at least by sufficiently similar processes that conditioning on the features reconciles the two).
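One quick way to act on this thread's advice is to compare each feature's distribution in the training and validation sets. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python; it is one possible check among several, not something the video prescribes, and the function name and sample values are hypothetical. To turn the statistic into a p-value you would need the KS null distribution, which `scipy.stats.ks_2samp` provides.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(t) - F_b(t)|.

    0 means identical empirical distributions; 1 means no overlap at all.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for t in a + b:
        cdf_a = bisect_right(a, t) / len(a)
        cdf_b = bisect_right(b, t) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical values of one feature (e.g. user age) in each split.
train_ages = [22, 25, 31, 35, 41, 47, 52, 58]
valid_ages = [23, 27, 30, 36, 40, 49, 51, 60]
print(ks_statistic(train_ages, valid_ages))
```

A value near 0 suggests the split is well mixed for that feature; a value near 1 means the two sets barely overlap, which is the situation no amount of regularisation can fix.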

  • @clapdrix72 · 2 years ago · +2

    This is the second video in which the creator has emphasized model interpretability as a universal virtue, so I have to call this out. While I agree it's nice to have, and in causal inference it's all that really matters, in 60-70% of the modeling done in DS we don't care about interpretability AT ALL, provided a black-box algorithm is statistically significantly better than the interpretable one at predicting or forecasting. Where is this coming from?

    • @CodeEmporium · 2 years ago

      Sorry I'm late. You make a good point. My views on this have changed a little over time, so I agree with you more and more :)

  • @psiddartha7115 · 4 months ago

    I am a non-engineer. How should I prepare?

  • @davidcho8877 · 3 years ago

    This is the best interview question review video.

    • @CodeEmporium · 3 years ago

      Thank you for the kind words! More to come :)

  • @hardikvegad3508 · 3 years ago

    Hey, I have a question: when is an outlier considered important? If we can't drop it, then what techniques should we use to deal with that outlier? I hope I'll receive an answer, because I was asked this in an interview.

    • @CodeEmporium · 3 years ago · +2

      Here is an application: outliers can skew averages. One thing you could do is compare only the lower 99% of each group you are comparing (but be sure to also report the outlier cases). Typically, you aren't just dealing with numbers; each number may represent a user. If so, you want to understand why that top 1% behaves the way it does. In many situations, the reason these outliers exist is explainable.
      Note: this answer is purely from a data science standpoint, not a hardcore stats standpoint. But I hope this kind of helps.
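The "lower 99%" approach in this reply can be sketched as follows: split the values at the 99th percentile, compare the averages with and without the top 1%, and keep the outliers aside so they can still be reported and investigated. The nearest-rank percentile rule, the function name, and the example data are assumptions made for this illustration.

```python
import math
import statistics

def split_top_percent(values, pct=99):
    """Split values at the pct-th percentile (nearest-rank rule).

    Returns (kept, outliers): kept are the values at or below the cutoff,
    outliers are the values above it, which should still be reported.
    """
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    cutoff = ordered[rank - 1]
    kept = [v for v in values if v <= cutoff]
    outliers = [v for v in values if v > cutoff]
    return kept, outliers

# Hypothetical minutes watched per user: 99 typical users and one extreme one.
minutes = list(range(1, 100)) + [1_000_000]
kept, outliers = split_top_percent(minutes)
print(statistics.mean(minutes), statistics.mean(kept), outliers)
```

Here the single extreme user pulls the overall mean to 10049.5, while the lower 99% average 50. The outlier list is what you would then dig into to explain why that user behaves so differently.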

  • @rohitchan007 · 3 years ago

    This was really helpful. Can you please make videos on reinforcement learning (MDPs, model-free learning, Monte Carlo tree search)?

    • @CodeEmporium · 3 years ago · +1

      Reinforcement learning, huh. I haven't used it much as a data scientist, but I'll think about the kind of content I can create that's useful for everyone. Thanks for the suggestion!

    • @rohitchan007 · 3 years ago

      @CodeEmporium Thank you for that. It'll also help with my master's in AI course 😅

  • @dr.mikeybee · 2 years ago

    Feedback may refer to loss.

  • @mallikarjunshettar7976 · 1 year ago · +2

    Bro, why are you stressing yourself by simply reading out the solutions? Just share the link and we will go through the answers. Simply a waste of time and a rubbish video.

  • @hellstenlight9454 · 1 year ago · +6

    Zero creativity, 100% copy paste.

  • @owaisfarooqui6485 · 2 years ago

    for the algo ❤

  • @aravindr7422 · 1 year ago

    Man, you only answered 15 questions.

  • @Srhrsh · 11 months ago

    Please overact less while speaking and trying to sound cool 🙏

  • @metalacousticfusion · 10 months ago

    Please don't talk like that.