Data Scientist answers 30 Data Science Interview questions
- Published 15 Oct 2024
- Let's look at some data science interview questions!
RESOURCES
[1] Simplilearn's 50 interview questions: www.simplilear...
[2] Approximate Nearest Neighbor (ANNOY) from Spotify: github.com/spo...
[3] What is a p-value? (@kozyrkov ) • What is a p-value?
[4] Eigen Vectors and Eigen Values (@3blue1brown ): • Eigenvectors and eigen...
[5] Model Calibration - Why logistic regression doesn't return probabilities: • Why Logistic Regressio...
JOIN US ON DISCORD: / discord
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it!
Learn more: www.kite.com/g...
Blessed to like this video. Dude, these are some serious scenarios that aren't covered by the major channels. Bless you :)
Feedback mechanism simply refers to the fact that true labels are known and during training the model gets feedback about the error, hence correcting it via gradient descent. Sounds like a tautology as it is just related to the fact that the data is labeled.
Oh interesting. Never would have really thought to mention that. But that's good to know. Thank you :)
Aren't there unsupervised models that use gradient descent without the need for labeled data though? t-SNE and node2vec come to mind as examples of cases where SGD doesn't require labels. That said, this is niche enough that it probably doesn't matter for typical interviews.
I'm not one to write comments on YouTube, but I have to say I really love your content. And an Interview Questions series would be awesome.
Thanks a lot! Gonna be making more of these and hope you like the future ones too
Feedback mechanism in this context basically means that you get to compare (think of loss functions) your model's output on data with the provided labels in order to update the weights of your model (a.k.a. learning) in the supervised setting. In the unsupervised setting you can't do that, since you don't have labels to compare against, so you update the weights of your model without explicitly comparing its output to labels.
Yep! I guess to me, that sounds like a restatement of "has labels" and "doesn't have labels", just in a fancier tone.
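The "feedback" loop described in the comments above can be made concrete with a minimal sketch (made-up toy data, plain NumPy, mean squared error): the labels provide the error signal, and gradient descent uses that signal to correct the weights.

```python
import numpy as np

# Toy supervised "feedback": compare predictions to labels via a loss,
# then nudge the weights in the direction that reduces the error.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # features
y = np.array([5.0, 11.0, 17.0])                     # true labels (the feedback signal)
w = np.zeros(2)                                     # model weights

lr = 0.01
for _ in range(10_000):
    pred = X @ w                     # model output
    error = pred - y                 # feedback: how far off are we?
    grad = 2 * X.T @ error / len(y)  # gradient of the mean squared error
    w -= lr * grad                   # correct the weights using the feedback

print(np.round(w, 2))  # roughly recovers y = 1*x1 + 2*x2
```

Without the labels `y` there is no `error` term to descend on, which is exactly the distinction the thread is circling around.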
Yooo, imma use this to study for some upcoming interviews. This video really dumbed down some of this stuff for me a lot.
Very well-made video that adds detail onto the standard answers for DS interviews. Good analysis.
Ah quick refresher
Thanks
The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in your data. Let's say the p-value for the regression coefficient of a predictor (e.g. age as an independent variable to predict income) is 0.03. That means that if age truly had no effect, you'd see a coefficient this extreme only 3% of the time, so you can confidently reject the null hypothesis that age's regression coefficient is 0, i.e. that it has no explanatory power.
Thanks for this explanation
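To make the definition above concrete, here is a minimal simulation sketch (all numbers made up): draw a sampling distribution under the null and count how often a result at least as extreme as the observed one appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: under the null, a standardized coefficient estimate
# is centered at 0. The p-value is the chance, assuming the null is true,
# of seeing an estimate at least as extreme as the one we observed.
observed = 2.1                           # made-up standardized estimate
null_draws = rng.normal(0, 1, 100_000)   # sampling distribution under the null

# Two-sided p-value: fraction of null draws at least as extreme as observed.
p_value = np.mean(np.abs(null_draws) >= abs(observed))
print(round(p_value, 3))
```

Note what this is not: it is not "the probability the null hypothesis is true", only the probability of data this extreme given the null.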
very good quality video!
Great video !
A mistake that the majority of data scientists commit is stating that because the outcome variable is a probability in [0, 1], you should automatically use logistic regression. That's incorrect. Being bounded in [0, 1] is a necessary condition, but not a sufficient one, for modelling with logistic regression. There is another factor that needs to be observed: the response should exhibit a "threshold effect" with respect to the predictor, which is the reason for the sigmoid shape in response to changes in the predictor values.
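The "threshold effect" mentioned above is easy to see numerically: the logistic (sigmoid) function changes fastest near a linear score of 0 and flattens out at the extremes. A tiny sketch:

```python
import math

def sigmoid(z):
    """Logistic function: squashes a linear score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities change fastest near z = 0 and saturate at the extremes.
# A bounded outcome alone doesn't imply this S-shaped response.
for z in [-4, -1, 0, 1, 4]:
    print(z, round(sigmoid(z), 3))
```

If the true response to the predictor is, say, linear in [0, 1] rather than S-shaped, logistic regression imposes the wrong shape even though the output range matches.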
Excellent work!
You rock! Dude. Thank you youtube RECOMMENDATION system. Are you using ANNOY, youtube?
11Oct is too far buddy! I have an interview on Friday! Anyways, better late than never! Thanks for doing this.
It's here now :)
How was the interview?
@pearlmarysamuel4809 It went well, moved to the next round. Thank you Ajay for the commentary in this video, it provided really useful insights.
Congratulations. God bless.
please do more, and also include case based problems if possible
make this kind of series
Maybe the most important thing to keep in mind to improve generalisation (avoid overfitting) is to first check whether the validation and train sets come from the same probability distribution. No amount of regularisation will fix that issue.
Yep. Very true
If you have a sufficiently large sample, then random assignment (in non-time-series problems) will basically ensure they come from the same distribution. I would want to make sure my data sample was generated from one process (or at least sufficiently similar processes, so that conditioning on features will reconcile the two).
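One quick way to sanity-check the train/validation distribution concern raised above is a permutation test on a single feature (a sketch with synthetic data; real use would loop over features or use a two-sample test like Kolmogorov-Smirnov):

```python
import numpy as np

rng = np.random.default_rng(42)

train = rng.normal(0.0, 1.0, 2000)  # a feature in the training split
valid = rng.normal(0.5, 1.0, 2000)  # same feature, shifted in the validation split

# Permutation test on the mean gap: if both splits came from the same
# distribution, reshuffling the combined data should produce gaps this
# large fairly often. A tiny p-value flags a distribution mismatch.
observed_gap = abs(train.mean() - valid.mean())
combined = np.concatenate([train, valid])
gaps = []
for _ in range(2000):
    rng.shuffle(combined)
    gaps.append(abs(combined[:2000].mean() - combined[2000:].mean()))

p_value = np.mean(np.array(gaps) >= observed_gap)
print(f"mean gap={observed_gap:.3f}, p-value={p_value:.4f}")
```

Here the validation split was deliberately shifted, so the test flags it; on a healthy random split the p-value would typically be large.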
This is the second video in which the creator has emphasized model interpretability as a universal virtue so I have to call this out. While I agree it's nice to have and in cases of causal inference it's all that really matters, in 60-70% of the modeling done in DS we don't care about interpretability AT ALL provided a black box algorithm is statistically significantly better than the interpretable one in predicting or forecasting. Where is this coming from?
Sorry. I am late. And you make a good point. My views on this have changed a little over time; so I agree with you more and more. :)
I am a non-engineer, how should I prepare?
This is the best interview question review video.
Thank you for the kind words! More to come :)
Hey, I have a question... When is an outlier considered important? If we can't drop it, then what techniques should we use to deal with it? I hope I'll receive an answer because I was asked this in an interview.
Here is an application: Outliers can skew averages. One thing you could do is take the lower 99% of the groups you are comparing (but also be sure to report the outlier case). Typically, you aren't just dealing with numbers. Each number may represent a user. If so, you want to understand why the 1% behaves the way it does. In many situations, the reason these outliers exist is explainable.
Note: this answer is purely from a data science standpoint. Not a hardcore stats standpoint. But hope this kind of helps
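The "lower 99%" idea from the answer above can be sketched in a few lines (hypothetical usage numbers; `purchases` and the single "whale" user are made up for illustration):

```python
import numpy as np

# Hypothetical usage data: most users make a handful of purchases,
# one "whale" makes thousands -- and skews the plain average.
purchases = np.array([3, 5, 2, 4, 6, 3, 5, 4, 2, 5000])

plain_mean = purchases.mean()
cutoff = np.percentile(purchases, 99)             # keep the lower 99%
trimmed_mean = purchases[purchases <= cutoff].mean()

print(f"plain mean={plain_mean:.1f}, trimmed mean={trimmed_mean:.1f}")
# Report the excluded outlier separately and investigate why it exists.
```

The trimmed mean describes the typical user far better, while the outlier still gets reported and explained rather than silently discarded.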
This was really helpful. Can you please make videos on reinforcement learning(MDPs, Model Free Learning, Monte Carlo tree) ?
Reinforcement learning huh. I haven't used it too much as a data scientist, but I'll think about the kind of content I can create that's useful for everyone. Thanks for the suggestion!
@CodeEmporium Thank you for that. It'll also help with my master's in AI course too😅
Feedback may refer to loss.
Bro, why are you stressing yourself by simply reading out solutions? Just share the link and we will go through the answers ourselves. Simply a waste of time and a nonsense video.
Zero creativity, 100% copy paste.
for the algo ❤
man you answered only 15 questions
Please do less overaction while speaking and trying to sound cool🙏
Do not talk like that please