My approach is slightly different:
Basic data collection:
1. You capture basic details when someone creates a YT account: name, email, DOB, image, etc.
2. Assume that every time you log into YouTube your activity is recorded as follows: comment made (if any), time, post, ip_address, email, likes, dislikes, reports, type of report, followers, activity time spent (scrolling, browsing, etc.).
Selection queries:
1. Filter on accounts where comments are made and activity time in one session is extremely long (say more than 12 hrs) or activity is very frequent in a time interval (e.g. logging in/out 5 times in 1 minute).
On the feature engineering side:
Extract features -> how many comments are made in 1 day, number of links posted per comment, number of reports per time interval, number of words, number of words which are spam, difference between account followers and accounts followed, activity time in minutes/seconds (scrolling 100 videos/minute indicates bot action).
Target variable:
1. Set thresholds based on visible existing patterns, e.g. more than 50 comments a day, more than 100 reports in 1 hr, clicking 100 videos in 1 min, spam words > 10 (satisfying any of these conditions) -> set to 1, else set to 0.
Classification model side: sklearn (fast and quick to test your ideas and features). Model: XGB, Logistic/SVM (baseline).
Deploy, observe and rework.
This is not perfect, but this is what I came up with while watching this video.
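The thresholding rule in the "Target variables" step of the comment above can be sketched in a few lines of Python. This is only an illustration: the class and field names (`AccountActivity`, `comments_per_day`, and so on) are made up, and the cut-offs are simply the example numbers from the comment.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    # Hypothetical per-account aggregates; the field names are illustrative only.
    comments_per_day: int
    reports_per_hour: int
    videos_clicked_per_minute: int
    spam_word_count: int


def weak_label(a: AccountActivity) -> int:
    """Return 1 (likely bot) if ANY example threshold is crossed, else 0."""
    return int(
        a.comments_per_day > 50               # "more than 50 comments a day"
        or a.reports_per_hour > 100           # "more than 100 reports in 1 hr"
        or a.videos_clicked_per_minute >= 100  # "clicking 100 videos in 1 min"
        or a.spam_word_count > 10             # "spam words > 10"
    )
```

Labels produced this way are heuristic, so a model trained only on the same features would mostly learn the rules back; they are probably most useful as a starting point that gets corrected with manually reviewed accounts.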
Question for professional data scientists: at 49:35, is it a feasible solution to run the feature vectors through a clustering model, then label the clusters as spam/not spam?
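One way to picture the idea in this question is a tiny k-means in plain Python, followed by labelling each cluster from a handful of accounts already known to be spam. Everything here is an assumption for illustration and not something from the video: the two features, the toy numbers, the fixed starting centroids and the majority-vote rule.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment


def label_clusters(clusters, known_spam):
    """Call a cluster 'spam' if most of its members are known spam accounts."""
    labels = {}
    for c in set(clusters):
        members = [i for i, a in enumerate(clusters) if a == c]
        spam_share = sum(i in known_spam for i in members) / len(members)
        labels[c] = "spam" if spam_share > 0.5 else "not spam"
    return labels


# Toy feature vectors: (posts per day, fraction of posts containing links).
accounts = [(2, 0.1), (3, 0.0), (1, 0.2), (80, 0.9), (95, 1.0), (70, 0.8)]
clusters = kmeans(accounts, centroids=[(0.0, 0.0), (100.0, 1.0)])

# Indices of the few accounts already known to be spam (e.g. previously banned).
known_spam = {3, 4}
cluster_labels = label_clusters(clusters, known_spam)
predicted = [cluster_labels[c] for c in clusters]
```

In this toy data the unlabelled account at index 5 inherits the label of its cluster. With real features you would likely need to scale them first, since here posts per day dominates the distance, and a cluster can easily mix bots and heavy human users.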
@@nb7070 Personal experience from my machine learning interview: 90% of the interview is a conversation between you and the hiring person about your projects, your ideas and your approach.
Actually, I think so. I had a fresher DS interview a year ago, and they asked me to do whiteboard coding, IQ problem solving, mathematics, machine learning, etc. Even though I did not do too badly except for the whiteboard part, I did not land that job. Lmao
Great video, I watched it from beginning to end. I liked how she answered the questions. I was surprised at how conversational the interview was. If possible, could you post a Data Analytics interview as well please. Thank you.
This was a really useful video. If interviews are like this, it's love 🎉 One thing we can add to these features is links. Certain spam posts consist of similar links to the same post/profile. Not only bots do it, but even people constantly spam their account in the comments. So Link Frequency, Link Context and so on. That feature, used with a Bayes classifier, can be useful to make the model more robust. Bag of Words from NLP can also be used to make these link tasks easier. Just an opinion of mine. 😊
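As a rough sketch of the suggestion above, here is a bag-of-words Naive Bayes classifier in plain Python where each link is kept as its own token, so a link that keeps reappearing in spam pushes the score towards spam. The tokenizer, the `LINK:` prefix, the "spam"/"ham" class names and the toy comments are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

LINK_RE = re.compile(r"https?://\S+")


def tokenize(comment):
    """Bag of words in which every link becomes its own token ('LINK:<url>')."""
    text = comment.lower()
    links = LINK_RE.findall(text)
    words = re.findall(r"[a-z']+", LINK_RE.sub(" ", text))
    return words + ["LINK:" + url for url in links]


def train(labelled_comments):
    """Count tokens per class for a multinomial Naive Bayes model."""
    token_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in labelled_comments:
        token_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return token_counts, doc_counts


def predict(text, token_counts, doc_counts):
    """Pick the class with the higher log-probability (add-one smoothing).

    Assumes both classes appeared in training; unseen tokens are ignored.
    """
    vocab = set(token_counts["spam"]) | set(token_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in token_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:
                score += math.log((counts[token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data; real labels would come from moderation decisions.
training = [
    ("check out my channel http://spam.example/win", "spam"),
    ("free gift cards here http://spam.example/win", "spam"),
    ("great interview, thanks for sharing", "ham"),
    ("the clustering question was interesting", "ham"),
]
token_counts, doc_counts = train(training)
```

Treating the whole URL as one token is a simplification; grouping by domain, or adding "Link Context" features such as the words around the link, would be natural next steps.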
Can you pls provide Data Engineer interview Process as well? It will be greatly appreciated. Thank you for changing lives for better around the world, including mine! 😁✌️
This is a great interview, thanks for doing this! If only all data science interviews would be like this in reality (w/ some exceptions), the world would be a better place.
That was pretty intense at the beginning when he started asking what an email filter model would look like or what a data set to feed the model would look like... She went into implementation immediately, which is a normal resort, because how else can you define the data set if you're not going to think about the function that might be doing the filtering? Overall it was handled nicely from both sides. Thanks for sharing.
Interview seems very easy, as a product owner I speak to our Data scientists and many of them are building business projection models etc. Is the technical implementation the key skill here? Because working in tech now, the first hour of this interview I think most of our stakeholders would even be able to answer during requirements gathering due to subject matter expertise. Like what to train the model on is usually subject matter expertise. I think its so well understood among tech stakeholders that something happens in the system, can you show me a 1 if it, or a combination of things happened, or a 0 if not and push that to some visualization report.
Thanks for this demo. I have learnt how we can tackle open-ended problems with our familiar data science tools. We can apply this to many different scenarios. I really appreciate the hard work you put into creating this video. I am not sure if it is possible, but could you create a video that helps us learn the usual skills required for an entry-level machine learning engineer position, and a similar mock interview for such a position? Thanks
I have noticed that there wasn't much conversation in regards to what ML model to use or what hyperparameters or architectures. Is this normal for an ML/Data Science interview?
I don’t think this approach is quite right. You’re diving into models before any data analysis. Can we clearly define a bot? We need to look at sample bot accounts. How many different types of bots can we identify? What are the similarities between bot groups? Can all the different bot group feature values ranges fit inside a single data envelope or should we concentrate on identifying a single bot group at a time? The amount of time the account has been active is prob a strong feature. Where are bots coming from historically? Are account names and profile pic very similar to an existing account? There could also be an anti-dataset, where accounts that were classified as spam complained and got reinstated? This could help mitigate misclassification. Ethically, could some demographics’ accounts be more susceptible to being wrongly classified as spam?
True, but to effectively tackle the problem, you need solid domain knowledge as well as conducting a deep analysis that arises from interacting with the environment, which is inherently absent in an interview. Not to mention, this is an NLP problem based on some features she mentioned, such as the 'content of account's posts'.
Yes, and also, if we are given a semi-supervised environment (with some accounts blocked due to spam, and others not labeled at all), maybe clustering could be a good strategy to group similar types of accounts based on their features. I would bet the model would help identify accounts created recently, with few followers, lots of posts/comments, lots of tags. I think creating a class based only on the number of spam reports restricts the information available. Maybe weight that feature when clustering to give it more importance, but I would not use it as the label.
OMG!! I literally said the same thing.... I got interested in AI almost when the ironman first movie released and I kinda moved slowly from Electrical Engineering to DS. I wanted to have my own JARVIS and get into robotics later in the future and since then have been in AI. Have said the same thing in all my interviews! cant believe she had the same motivation :) assuming the motivation is real regardless of it being a mock interview :)
Great video! But are these interviews really that easy? Is it the case that if you have confidence and can have a smooth conversation about what you are thinking about the topic, you get the job? Or is this for an intern-level role, hence the easier questions?
This is just one type of interview that you'll encounter in a job search process. The open-ended nature of it should make it less stressful than technical coding interviews. That being said, there is a lot of opportunity here to really demonstrate your abilities. A senior data scientist candidate would be expected to go into a lot more complexity & implementation details than an intern-level candidate. A senior candidate should also be able to clearly communicate trade-offs of any decisions that they make. This type of interview is really designed to see how well someone understands the data science process and to measure how well they can communicate what they know. In my opinion, part of the reason this interview seemed pretty easy is that Kylie is very confident in her approach and could get to key details without needing much/any prodding from me on the interviewer side. To get the job, you'll probably need to succeed in this interview as well as a technical coding interview or two and a behavioral interview.
@@KeithGalli So specifically for a Data Science role, do the interviews go deep into implementation details? Right now I'm looking for similar roles and was easily able to answer these questions myself. The only thing I'm not that confident about is if they ask how decision trees work mathematically or ask me to implement a neural network from scratch (I mean I could, but in the heat of the moment I can't). Do they ask such questions for Data Science roles or just for ML roles?
Every interview is different so it's definitely possible that a company could ask you to go into implementation details, but from my perspective knowing when to use decision trees or neural networks is more important than being able to implement them from scratch. In the real world, we have libraries that make decision trees & neural networks very easy to use. You almost never need to implement something from scratch. As a result, it's more important to understand how they work at a high level, when to use them, and what the relevant Python libraries are. Hope this makes sense!
Usually there are at least 2 rounds: A round like this which is personality/ high level problem solving/culture fit, and there will usually be a technical screening as well. Technical screenings are usually first and weed out people who don't understand the tech stack/ Data Science principles at all or very well. If you know the tech stack or most of it, it is usually no stress.
@@ramg4699 Well for one, she gives vague, non-technical answers to questions she most likely had before hand. I would expect this level of thoughtfulness from a high school student interviewing to get into a low level college program but not from a Computer Science/Chemical Engineering Grad from MIT.. what’s crazier is that they released this as an example of what a good performance looks like..
@@anon.cashpoorloser5285 But he didn't ask technical questions, that was more business case scenario and how she would deal with it. She is not "applying" for Chemical Engineer either, how did you expect her to answer?
Being reported as spam shouldn't be the only way to mark your mail as spam; many bots/trolls report just because they can. World of Warcraft has this useless report spam/offensive feature that automatically kicks and mutes you from the game for a month if enough reports are submitted, and it doesn't need that many reports in the first place. We need another check to be sure it isn't just fake reports. An extra check could look at the last 10 posts and see whether any contain offensive/repetitive language, combined with how often they were written.
43:20 There are people who strongly dislike their ideological opponents and take any means necessary to deplatform them. As such, they are bad actors who shall routinely misreport all kinds of breaches of community guidelines or whatever else there could be. For example, some tribal people from hostile neighboring countries could be misreporting one another immediately once the other posts anything. This happens to honest gamers who report cheaters--the hackers have a community and websites and mailing lists, and so can target a good guy so that the good guy gets a fabricated "bad" reputation score. Many corporations stupidly believe that there are no bad actor groups, but this is a huge and foolish miscalculation.
Calm down, what she said wasn't earth shattering, she's just using the lingo. Once you learn the lingo you too can speak the way she does plus learn data.
One thing that will help in labeling a given account is how many of the account's posts are reported as spam. If an account shares a post that is, for some reason, reported as spam 1000s of times but its other posts have 0 spam reports, then how confident should we be in labeling it as a bot? Feature idea: along with the number of followers, each follower should get a weight, i.e. if an account is followed by genuine people (celebrities) then the weight of that incoming link should be high. Note that the problem with the idea of checking whether an account is followed by other bots was that it is circular in nature.
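Both ideas in the comment above reduce to small feature functions. The sketch below is hypothetical throughout: the function names, the `min_reports` cut-off, the default weight of 0.1 and the trust table are invented for illustration, and the trust weights are assumed to come from somewhere non-circular, such as manually verified accounts.

```python
def reported_post_share(report_counts, min_reports=5):
    """Fraction of an account's posts that drew at least `min_reports` spam reports."""
    if not report_counts:
        return 0.0
    flagged = sum(1 for n in report_counts if n >= min_reports)
    return flagged / len(report_counts)


def weighted_follower_score(follower_ids, trust, default_weight=0.1):
    """Sum of follower weights; followers with no known weight get `default_weight`."""
    return sum(trust.get(f, default_weight) for f in follower_ids)


# One post reported 1000 times, three posts never reported.
one_bad_post = reported_post_share([1000, 0, 0, 0])
# Every post reported many times.
all_bad_posts = reported_post_share([40, 55, 61, 48])

# Hypothetical trust table built from manually verified accounts.
trust = {"verified_celebrity": 1.0, "verified_user": 0.8}
score = weighted_follower_score(["verified_celebrity", "unknown_1", "unknown_2"], trust)
```

The share stays low for the single heavily reported post and reaches 1.0 when every post is reported, which is the distinction the comment is asking about. Seeding the weights from verified accounts is one way around the circularity problem, though not the only one.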
If time is considered as an attribute... A model can be made that predicts how long a human would take to type a tweet based on its length. After getting the predicted time, we can build a Bayesian network over a particular account and the time actually taken, to predict whether it's a spam/bot attack.
I think a good approach for labeling things as spam is writing a program called checkspam that references the posting frequency or whether the posts are the same within a certain time frame. It could label an account as spam if it falls within such parameters. You could make it so the program checkspam would only run if the account was reported by another account, in order to combat the potential for false flagging from people with negative intentions.
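A minimal sketch of what such a `checkspam` program might look like in Python, assuming posts arrive as `(timestamp, text)` pairs. The 10-minute window and the limit of 5 posts are made-up parameters, not values from the comment.

```python
from datetime import datetime, timedelta


def checkspam(posts, was_reported, window=timedelta(minutes=10), max_posts=5):
    """Flag an account only if it was reported AND its recent posts look automated.

    `posts` is a list of (timestamp, text) tuples in any order.
    """
    if not was_reported or not posts:
        return False  # the check only runs after a report
    posts = sorted(posts)
    latest = posts[-1][0]
    # Keep only the posts made within `window` of the most recent post.
    recent = [text for ts, text in posts if latest - ts <= window]
    too_frequent = len(recent) > max_posts
    has_duplicates = len(set(recent)) < len(recent)
    return too_frequent or has_duplicates


t0 = datetime(2024, 1, 1, 12, 0)
repeated = [(t0, "buy now"), (t0 + timedelta(minutes=1), "buy now")]
normal = [(t0, "nice video"), (t0 + timedelta(hours=2), "thanks!")]
```

Requiring both the report and the behavioural check means a wave of false reports alone cannot flag an account, while an unreported account is never scanned at all. That second property is a trade-off of this design and not necessarily desirable.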
This interview is interesting but is so unlike any of the 4 in-person interviews I've had in the last 2 months as to be comical. Observations: 1. I don't believe the person being interviewed is answering these case problems cold... some guidance has been given to direct their thinking in preparation. Ex: How many people on the fly can give 5 reasons for x decision-making in classification? In this case the person has serious experience solving the problems and is clearly reading her other screen to develop her answers!!! 2. In my interviews I am not convinced the person actually read my resume. 3. In my interviews I am not convinced I was being seriously considered... in one case I was told the interview was 30 minutes only, and the interviewer kept cutting me off.
There are flaws with some of those solutions. Like flagging an account as spam if it is followed by spam accounts. That could lead to valid accounts being flagged as spam or even attacked by bots intentionally following users to get them flagged as spam. The thing about using emails from random domains is also problematic in many ways, and so is using emails made of random characters (some of us use random emails for different accounts precisely to keep spam away and improve privacy). You could also skip guessing bot characteristics altogether and feed data to models to find common characteristics and trends.
That's what I thought of too. For example, checking if a name is common is not possible. But of course, we're watching from the sideline so it's easy to form criticism.
So, I think I want to go into Data Science besides Web Development, and this was pretty handy. Even though I missed some points, I answered some questions pretty well. Thank you!!
Thanks for the video. Both of you did a great job in making it feel realistic. My only question is: is this really the sort of difficulty of entry-level data science jobs? It feels suspiciously easy or shallow to me. Can someone back this up? Thank you again!
I have been interviewing for entry level data science jobs and this is fairly accurate! Although I have often been asked about what advanced concepts I know and how I've used it.
That would have been a fairly easy job interview. No on the fly algorithmic problems to be solved, no mathematical questions, no deep understanding of ML (distributions, statistics, metrics, solvers, backprop, stochastic gradient, ....)...
Impressive! Well detailed! @KeithGalli, that siren was pretty loud. Perhaps you have trained a model that picks and feeds only @KylieYYing's voice to you and removes any other unwanted voices or sounds 😁😜
So I've never had any technical interviews before, that's why I'd like to ask: do they usually ask you to code some stuff on the go, or is it just some broad questions about the technicalities of the problem you're trying to solve?
I didn't care that much for it. The problem was trivial. Reminded me of something from a university lab lecture with breakout groups. I find that she really knows what she is talking about and he used a lot of bluffing, bluster, and talking too much without saying anything to compensate for his lack of knowledge. It's a common tactic I see among white males. I do like that she schooled him at the end.
Key takeaways:
1. Talk a lot and explain why you decided to go with the specific solution
2. Don't be afraid to take a pause and rethink the problem
3. Don't get fixated on one aspect of the problem too much, always try to approach it from a bird's-eye view
4. Focus first on the easier parts of the problem and then approach the harder ones
5. Remember it's a conversation with the interviewer, not the solo showoff
6. Ask them questions about the work they've done so far, what they learnt in the company, look for the signs of being valued as a person in a team, what the first 6 months on the job would look like
Talk with a purpose. I hate people that talk to fill space lol
No, you don't want it to be a conversation with the interviewer. Point 5 is wrong: if you are the 'solo showoff' and you talk through a detailed correct solution to a problem for 30 minutes and you hit many of the facets, then the interviewer will be impressed. You want as few tips or hints for the answer as possible; questions to clarify the problem are great if they aren't dumb questions.
I learned that you should probably do some research on the company you want to work at so that you spend more time practising relevant topics
😊
Thanks for featuring!!
You did an amazing job, keep up the good work
Very Good Job . superb way to learn from you
I admire you.
You are really confident while communicating with the interviewer. It's almost like 2 colleagues discussing a problem. Great work
You deserve that
I appreciate that this interview is more like a conversation, with a focus on problem-solving. In the past, I've had interviews where the tone was passive-aggressive instead of constructive, so it's refreshing to have a more productive experience. It's great when interviews can be a valuable use of time, rather than a frustrating waste.
As someone who just finished their masters and is looking for a job in data science, this interview really boosted my confidence cuz I was able to respond to every question
Exactly my thoughts…. Glad to hear someone else with similar thought
enjoy your underpaid intern in the capitalistic world. Everyone is replaceble, you will make zero difference.
@@yugiohfanatic1964 that's a very sad way to view the world. Hope you feel better soon
@@duckcluck123 hello simp
@@yugiohfanatic1964 are you 12
Even not knowing much about data science, this interview was very helpful in learning how it might be applied to real, known problems, and the mutual feedback at the end was very helpful to learn about the dynamics of interviews, too! Thank you guys for doing this.
I love the part where they pull open a Google doc to clarify their points and have it noted down.
Excellent problem-solving approach.
Focusing on the problem, its scope, its limitations, what's expected behaviour, what isn't, etc. ...was always something that I was rushing past.
Jumping to the solution right away is something that I had to unlearn.
Also great on Kylie for being super composed. Another super power and not easy to do in live situations.
Great video!
As someone who is currently in college and is actively preparing for interviews, this video helped me because i answered every question almost easily.
Grrr i couldn’t answer them all , but I’m gonna catch up to you soon GIDEON
Thank you for this interview. For the spam issue, you can tag on the traditional features that Kylie Ying mentioned, and then use a multi-lingual embedding model to create vectors out of the post content. Use these features + embeddings to train your model.
⭐ Contents ⭐
⌨ (0:00:00) Video overview & format
⌨ (0:02:13) Introductory Behavioral questions
⌨ (0:07:46) Social media platform bot issue task overview
⌨ (0:15:26) What are some features we should investigate regarding the bot issue?
⌨ (0:25:02) Classification model implementation details (using feature vectors)
⌨ (0:41:38) What would a dataset to train models to detect bots look like? How would you approach collecting this data?
⌨ (0:51:38) Technical implementation details (python libraries, cloud services, etc)
⌨ (0:56:01) Any questions for me?
⌨ (1:03:42) Post-interview breakdown & analysis
Can we have Data Analyst mock interviews too?
Yes please
Yeah, that'd be great!
No, unfortunately not 😔
Please 🙏
Yes please
I watched this video like a year ago knowing very little. Now, I feel like I can completely answer each question in detail and follow all of the concepts being mentioned.
Thank you for sharing this video. As someone who is transitioning into the Data Science field (Machine Learning/AI), I was very surprised that I was able to keep up during the interview. I wasn't lost at all. I don’t have a technical background, but I’ve been studying Azure, Python, GitHub, R, SQL, etc. pretty hard for the past few months and doing some labs. I’m feeling pretty confident I can make the move.
My knowledge increased a lot by watching this. Please upload more mock interviews like this. I'd also like some technical details on model implementation.
Thanks to Nick and Kylie, it was very informative. Love the flow of the interview. I wanted to add something about the real issue around Spam Bots, as this is internal to the platform or system, you can always add a small CAPTCHA routine around the message sender side. Example, the click of "post" or enter button may pre-calculate the CAPTCHA before message is posted in a response, it may look heavy but can easily be done in a lean way.
Keith is one fantastic teacher. He took my analytics skills from 0 to 5 very quickly. Great content.
Ghanta
@@yuti65 why bro?
I really liked that Keith decided to use a Google doc, because in what I would call first-round 'team fit' and knowledge-gauging interviews the assumption is you'll just talk over Zoom and not do whiteboarding or use another tool. This was a good reminder to expect the unexpected - you could be asked to do anything - maybe even code :)
Hello, I don't usually comment on YouTube but this time I just wanna say thank you to the people involved in this video. This really helped me get through my technical interview and I ended up getting the job I wanted. The introduction and close of the interview were almost a copy of what happened in my interview. Obviously, the technical part was different (in my case they just asked about the technical challenge I had to solve prior to the interview) but the way I approached the questions was very similar. Just THANKS!!
more mock interviews please
Nice video. I think in the beginning when asked if she had any first thoughts on the issue of spam bots, one thing that could've been added was 'what are the positives of bots' Too much was said about the negative aspects of bots, and the first impression I had was, if all bots are so negative, just ban bots. But bots do have a role, and many bots are used to automate functionality. So the real important point is, identify bots that are being malicious in some way. Then dive into how to develop metrics to identify the concept of 'malicious'.
Not only interviewees should see these mock interviews, but also interviewers, so they can learn how to run them, because there are many companies out there trying to do this type of process but doing it in a bad and confusing way. Thanks for this great video!
In my opinion this data scientist interview is pretty weak and useless. The interviewer didn't probe deeply enough on important details to get a real answer of whether the candidate is capable of building a successful spambot model, applying it to the production system and maintaining/improving it continuously. A large part of the interview is spent on listing a bunch of potential features that could be used by the model to classify spam, which does not require deep thinking, understanding or knowledge of the problem. Very little implementation details were actually asked or answered. This candidate only talks about the approach in a very high-level and vague way, like "I'd use Tensorflow and try a few things to get a sense of a good-enough model." This kind of general answer is not helpful at all. We've seen so many candidates who can do the high-level talking points, but fail to have a good understanding of the entire ML production workflow and lack basic implementation skills.
I agree. I feel like she’s more of a computer scientist than a data scientist whose speciality is building models and be able to discuss which models to pursue and how to assess these models. A computer scientist can talk about data science at a very high level like you said but it takes a data scientist to actually go deeper into technical details that make the models and fix any flaws. If she does not have the MIT title, I don’t think I would’ve been very impressed with her although I’m sure she’s very intelligent. She’s just not a data scientist per se but a very smart computer scientist who knows data science concepts like one hot encoding and used tools like tensorflow
The guy was very friendly and knowledgeable!
I kinda surprisingly enjoyed this. I didn't even know when the interview started. Feels like two people having a convo about data science
My approach is slightly different:
Basic data collection:
1. You have the basic details captured when you create a YT account: name, email, DOB, image, etc.
2. Assume that every time you log into YouTube your activity is recorded as follows: comment made (if any), time, post, ip_address, email, likes, dislikes, reports, type of report, followers, activity time spent (scrolling, browsing, etc.).
Selection on queries:
1. Filter on accounts where comments are made and activity time in one session is extremely long (say more than 12 hrs) or activities are very frequent in a time interval (e.g. log in/out 5 times in 1 minute).
On the feature engineering side:
Extract features -> how many comments are made in 1 day, number of links posted in comments, number of reports per time interval, number of words, number of words which are spam, difference between account followers vs. those followed, activity time in minutes/seconds (scrolling 100 videos/minute indicates bot action).
Target variables:
1. Set thresholds based on existing visible patterns, e.g. more than 50 comments a day, more than 100 reports in 1 hr, clicking 100 videos in 1 min, spam words > 10 (should satisfy any of these conditions) -> set to 1, else set to 0.
Classification model side: Sklearn (fast and quick to test your ideas and features), Model - XGB, Logistic/SVM (baseline)
Deploy, see and rework
This is not perfect, but this is what I was coming up with while watching this video.
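The target-variable step in the comment above could be sketched roughly like this in Python. It is only a minimal illustration: the `AccountActivity` class and its field names are hypothetical, and the thresholds are the commenter's example numbers, not tuned values.

```python
from dataclasses import dataclass

# Thresholds copied from the comment above; illustrative guesses, not tuned values.
MAX_COMMENTS_PER_DAY = 50
MAX_REPORTS_PER_HOUR = 100
VIDEO_CLICKS_PER_MINUTE_LIMIT = 100
MAX_SPAM_WORDS = 10


@dataclass
class AccountActivity:
    """Hypothetical per-account aggregates; the field names are assumptions."""
    comments_per_day: int
    reports_per_hour: int
    video_clicks_per_minute: int
    spam_word_count: int


def heuristic_label(activity: AccountActivity) -> int:
    """Return 1 (likely spam) if ANY rule fires, else 0."""
    return int(
        activity.comments_per_day > MAX_COMMENTS_PER_DAY
        or activity.reports_per_hour > MAX_REPORTS_PER_HOUR
        # The comment says "clicking 100 videos in 1 min", so reaching 100 counts.
        or activity.video_clicks_per_minute >= VIDEO_CLICKS_PER_MINUTE_LIMIT
        or activity.spam_word_count > MAX_SPAM_WORDS
    )


if __name__ == "__main__":
    print(heuristic_label(AccountActivity(3, 0, 2, 0)))    # quiet account
    print(heuristic_label(AccountActivity(80, 0, 2, 0)))   # very chatty account
```

Labels produced this way are noisy heuristics, so a model trained on them mostly learns to reproduce the rules unless some labels are also reviewed by hand.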
this is very helpful. I wish people do more of this type of content
Question for professional Data scientist:
@49:35, is it a feasible solution to run the feature vectors through a clustering model, then label the clusters as spam/not spam?
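One way to picture the "label the clusters" step is a majority vote over a handful of manually reviewed accounts in each cluster. The sketch below assumes the cluster ids already come from whatever clustering model is used; `label_clusters`, `min_reviewed`, and the input dictionaries are hypothetical names for illustration.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, known_labels, min_reviewed=2):
    """Assign each cluster the majority label of its reviewed accounts.

    cluster_ids:  dict mapping account -> cluster id (from any clustering model)
    known_labels: dict mapping account -> 0/1 for manually reviewed accounts
    Returns a dict mapping cluster id -> 0, 1, or None when there is too
    little evidence (too few reviewed accounts, or a tied vote).
    """
    votes = defaultdict(Counter)
    for account, label in known_labels.items():
        if account in cluster_ids:
            votes[cluster_ids[account]][label] += 1

    result = {}
    for cluster in set(cluster_ids.values()):
        counts = votes.get(cluster, Counter())
        ranked = counts.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        if sum(counts.values()) < min_reviewed or tied:
            result[cluster] = None  # not enough evidence to label this cluster
        else:
            result[cluster] = ranked[0][0]
    return result
```

Clusters that come back as `None` are the ones that would need more manual review before being trusted as spam/not spam.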
This actually got me hyped for a job interview!
are data science interviews even remotely similar to this mock interview? If so i would be hyped too lol
@@nb7070 personal experience from my machine learning interview - 90% interview is a conversation between you and the hiring person about ur projects and your ideas and approach.
Actually, I think so. I joined a fresher DS interview a year ago, and they asked me to do whiteboard coding, IQ problem solving, mathematics, machine learning, etc. Even though I did not do too badly except for the whiteboard part, I did not land that job. Lmao
This was so helpful, thank you! It was a delight to watch Kylie's problem solving approach :)
Clarification at 48:37 can’t we just consider the accounts that have already been reported as spam and create a dataset just based on these accounts?
Great video, I watched it from beginning to end. I liked how she answered the questions. I was surprised at how conversational the interview was. If possible, could you post a Data Analytics interview as well please. Thank you.
Wow, I didn't expect such a video. Very interesting. Thank you for sharing!
Please upload all mock interviews for web development, app development etc.
This was really a useful video. If interviews are like this. It's love 🎉
One thing we can add to the features is the links. Certain spam posts consist of similar links to the same post/profile.
Not only bots do it; even people constantly spam their account in the comments. So Link Frequency, Link Context and so on.
That feature used with a Bayes classifier can be useful to make the model more robust. Bag of Words from NLP can also be used to make these link tasks easier.
Just an opinion of mine. 😊
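The link-frequency idea above could look something like this as a feature extractor. This is a rough sketch covering only the frequency part: the regex is a simplification, and `link_features` and its output keys are made-up names. The resulting numbers could then be fed to a Bayes classifier or any other model.

```python
import re
from collections import Counter

# Simplified URL matcher; real-world link detection needs more care.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(text):
    """Return the URLs found in a comment, without trailing punctuation."""
    return [url.rstrip(".,!?)") for url in URL_PATTERN.findall(text)]


def link_features(comments):
    """Compute simple link statistics for one account's list of comments."""
    links = [link for comment in comments for link in extract_links(comment)]
    counts = Counter(links)
    n_comments = len(comments)
    return {
        "links_per_comment": len(links) / n_comments if n_comments else 0.0,
        "max_link_repeats": max(counts.values(), default=0),
        "distinct_links": len(counts),
    }
```

An account that posts the same link over and over gets a high `max_link_repeats` with a low `distinct_links`, which is the pattern the comment describes.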
Can you pls provide Data Engineer interview Process as well? It will be greatly appreciated.
Thank you for changing lives for better around the world, including mine! 😁✌️
Very useful seeing Kylie's thought process in coming up with the answers
Great Interview, thank you guys! CEO of quitter Elon Tusk, got me😂
This is a great interview, thanks for doing this! If only all data science interviews would be like this in reality (w/ some exceptions), the world would be a better place.
tell us more about the interviews you had plz
@@alirezouali3119 can you tag me , if he replies
Real life interviews are similar. Usually there would be a separate interview to assess your coding skills.
Proven. Reliable. Keith Galli!! I love you man!! 😂😂
Cool interview,
Wanted to add, the account's age would make a very good feature I guess
The initial tip I got from this was to not shave before my interview.
That was pretty intense at the beginning when he started asking what an email filter model would look like, or what the data set to feed the model would look like... She went into implementation immediately, which is a natural move, because how else can you define the data set if you're not going to think about the function that might be doing the filtering? Overall it was handled nicely from both sides. Thanks for sharing.
Why was her first idea to go for a neural network for the solution? ~ 28:00 can anyone explain?
because she's familiar with tensorflow
Thanks for boosting my confidence, that I might get a job if i stay focused on learning. Good luck
Interview seems very easy, as a product owner I speak to our Data scientists and many of them are building business projection models etc. Is the technical implementation the key skill here? Because working in tech now, the first hour of this interview I think most of our stakeholders would even be able to answer during requirements gathering due to subject matter expertise. Like what to train the model on is usually subject matter expertise. I think its so well understood among tech stakeholders that something happens in the system, can you show me a 1 if it, or a combination of things happened, or a 0 if not and push that to some visualization report.
Thanks for such a demo. I have learnt how we can tackle open-ended problems with our familiar data science tools. We can apply it to many different scenarios. I really appreciate the hard work you put into creating this video. I am not sure if it is possible, but could you create a video that helps us learn the usual skills required for an entry-level machine learning engineer position, and a similar mock interview for such a position? Thanks
excellent interview it inspired me to go ahead and update myself into Data Scientist again.
wow is that how it works? I'm gonna update myself to billionaire.
Interesting to see a data science interview, it's the same format as other tech interviews
I have noticed that there wasn't much conversation in regards to what ML model to use or what hyperparameters or architectures. Is this normal for an ML/Data Science interview?
I don’t think this approach is quite right. You’re diving into models before any data analysis. Can we clearly define a bot? We need to look at sample bot accounts. How many different types of bots can we identify? What are the similarities between bot groups? Can the feature value ranges of all the different bot groups fit inside a single data envelope, or should we concentrate on identifying a single bot group at a time?
The amount of time the account has been active is prob a strong feature.
Where are bots coming from historically?
Are account names and profile pic very similar to an existing account?
There could also be an anti-dataset, where accounts that were classified as spam complained and got reinstated? This could help mitigate misclassification.
Ethically, could some demographics’ accounts be more susceptible to being wrongly classified as spam?
True, but to effectively tackle the problem, you need solid domain knowledge as well as conducting a deep analysis that arises from interacting with the environment, which is inherently absent in an interview.
Not to mention, this is an NLP problem based on some features she mentioned, such as the 'content of account's posts'.
Yes, and also, if we are given a semi-supervised environment (with some accounts blocked due to spam, and others not labeled at all), maybe clustering could be a good strategy to group similar types of accounts based on their features. I would bet the model would help in identifying accounts created recently, with few followers, lots of posts/comments, lots of tags.
I think creating a class based only on the number of spam reports restricts the information available. Maybe weight that feature when clustering to give it more importance, but I would not use it as the label.
28:35 I'm a bit surprised about the NN. Seems like a perfect example of tabular data that can be fed to a Boosted Tree such as CatBoost
Great video on interview processes! ❤
great video! more data science mock interviews, pls
Meaningful insights, thanks for sharing it!
She came in saying she wanted to build the Iron Man suit 😂
"You had me at 'Bachelors and Masters from MIT' 😁"
She literally does have that tho 😅
This mock interview really impressed me.
At 28:00, I think it would have been better to mention whether the training dataset format would be tabular or non-tabular, including which features to treat as discrete variables.
Very Nice , Post more content on Data Science!
OMG!! I literally said the same thing.... I got interested in AI almost when the ironman first movie released and I kinda moved slowly from Electrical Engineering to DS. I wanted to have my own JARVIS and get into robotics later in the future and since then have been in AI. Have said the same thing in all my interviews! cant believe she had the same motivation :) assuming the motivation is real regardless of it being a mock interview :)
I fell asleep and woke up to this playing 💀
Incredible video! Nice job.
Great video. Create a Data Analyst mock, please
Great video, but next time can you please post what a successful job interview would look like?
Great Video! But are these interview really that easy? So is it like if you have confidence and you're able to have a smooth conversation about what you are thinking about the topic, you get the job? Or is it like for an intern level role, hence easier questions?
This is just one type of interview that you'll encounter in a job search process. The open-ended nature of it should make it less stressful than technical coding interviews. That being said, there is a lot of opportunity here to really demonstrate your abilities. A senior data scientist candidate would be expected to go into a lot more complexity & implementation details than an intern-level candidate. A senior candidate should also be able to clearly communicate trade-offs of any decisions that they make. This type of interview is really designed to see how well someone understands the data science process and to measure how well they can communicate what they know.
In my opinion, part of the reason this interview seemed pretty easy is that Kylie is very confident in her approach and could get to key details without needing much/any prodding from me on the interviewer side.
To get the job, you'll probably need to succeed in this interview as well as a technical coding interview or two and a behavioral interview.
@@KeithGalli so specifically for a Data Science role, do the interviews go deep into implementation details? Like right now I'm looking for similar roles and was easily able to answer these questions myself. The only thing I'm not that confident about is if they ask how decision trees mathematically work or to implement a neural network from scratch (I mean I could, but in the heat of the moment I can't). Do they ask such questions in Data Science roles or just in ML roles?
Every interview is different so it's definitely possible that a company could ask you to go into implementation details, but from my perspective knowing when to use decision trees or neural networks is more important than being able to implement them from scratch. In the real world, we have libraries that make decision trees & neural networks very easy to use. You almost never need to implement something from scratch. As a result, it's more important to understand how they work at a high-level and when to use them and what the relevant Python libraries are. Hope this makes sense!
@@KeithGalli yup, that makes sense. Thanks for replying!
Usually there are at least 2 rounds: A round like this which is personality/ high level problem solving/culture fit, and there will usually be a technical screening as well.
Technical screenings are usually first and weed out people who don't understand the tech stack/ Data Science principles at all or very well. If you know the tech stack or most of it, it is usually no stress.
Sir, I have completed a BSc in physics, but now I'm very interested in programming. Am I eligible for an MSc in IT or data science?
For all those who think this level of interview performance will land you a job in this market.. it won't. Good luck out there!
Explain why
@@ramg4699 Well for one, she gives vague, non-technical answers to questions she most likely had before hand. I would expect this level of thoughtfulness from a high school student interviewing to get into a low level college program but not from a Computer Science/Chemical Engineering Grad from MIT.. what’s crazier is that they released this as an example of what a good performance looks like..
@@anon.cashpoorloser5285 But he didn't ask technical questions, that was more business case scenario and how she would deal with it. She is not "applying" for Chemical Engineer either, how did you expect her to answer?
Thank you for this. Wow!
It would be great and crazy if after this video she does the implementation of the model like step by step
Being reported as spam shouldn't be the only way to mark your mail as spam; many bots/trolls do it just because they can. World of Warcraft has this useless report spam/offensive feature that automatically kicks and mutes you from the game for a month if enough reports are submitted, and it doesn't need that many reports in the first place. We need to have another check to be sure it isn't just fake reports. An extra check can be made on the last 10 posts: whether any have offensive/repetitive language, combined with how often it was written.
Thanks for this great insight!
Love the interview...
This channel is amazing
43:20 There are people who strongly dislike their ideological opponents and take any means necessary to deplatform them. As such, they are bad actors who shall routinely misreport all kinds of breaches of community guidelines or whatever else there could be.
For example, some tribal people from hostile neighboring countries could be misreporting one another immediately once the other posts anything.
This happens to honest gamers who report cheaters--the hackers have a community and websites and mailing lists, and so can target a good guy so that the good guy gets a fabricated "bad" reputation score. Many corporations stupidly believe that there are no bad actor groups, but this is a huge and foolish miscalculation.
One for Full stack development please.
Thank you so much for sharing.
cried meself to sleep when she said she has a masters
edit: the way she speaks about things omg, wow, I wanna be like her one day
Calm down, what she said wasn't earth shattering, she's just using the lingo. Once you learn the lingo you too can speak the way she does plus learn data.
One thing that will help in labeling a given account is how many of the account's posts are reported as spam.
If an account shares a post that, for some reason, is reported as spam 1000s of times but the other posts have 0 spam reports, then how confident should we be in labeling it as a bot?
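To make the point above concrete, here is a small sketch that separates "many posts reported" from "one post reported a lot". The function name `report_spread` and its output keys are invented for illustration; the input is assumed to be a list of report counts, one per post.

```python
def report_spread(reports_per_post, min_reports=1):
    """Summarize how spam reports are spread across an account's posts.

    reported_fraction: share of posts with at least `min_reports` reports
    top_post_share:    share of all reports that landed on the single
                       most-reported post (1.0 means one post got them all)
    """
    n_posts = len(reports_per_post)
    if n_posts == 0:
        return {"reported_fraction": 0.0, "top_post_share": 0.0}
    total_reports = sum(reports_per_post)
    reported_posts = sum(1 for r in reports_per_post if r >= min_reports)
    return {
        "reported_fraction": reported_posts / n_posts,
        "top_post_share": max(reports_per_post) / total_reports if total_reports else 0.0,
    }
```

An account with one heavily reported post out of many gets a low `reported_fraction` and a `top_post_share` near 1.0, which hints at one controversial or brigaded post instead of bot-like behavior across the board.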
Feature idea: along with the number of followers, each follower should get a weight, i.e., if an account is followed by genuine people (celebrities) then the weight of that incoming link should be high. Note that the idea of checking whether an account is followed by other bots or not had the problem of being circular in nature.
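A one-step version of this weighted-follower feature might look like the sketch below. It assumes a trust score per follower is already available from somewhere (for example, verified accounts get 1.0 and known bots get 0.0); `weighted_follower_score`, `trust`, and `default_trust` are hypothetical names, and the circularity mentioned above is not solved here.

```python
def weighted_follower_score(followers, trust, default_trust=0.1):
    """Sum follower weights instead of just counting followers.

    followers:     iterable of follower account ids
    trust:         dict mapping account id -> weight in [0, 1] (assumed given)
    default_trust: weight used for followers we know nothing about
    """
    return sum(trust.get(follower, default_trust) for follower in followers)


if __name__ == "__main__":
    trust = {"celebrity": 1.0, "known_bot": 0.0}
    print(weighted_follower_score(["celebrity", "known_bot", "stranger"], trust))
```

Computing the trust scores themselves from the follow graph would need an iterative scheme in the spirit of PageRank, which is a different technique from this simple weighted sum.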
If time is considered as an attribute, a model can be made that predicts how long a human would take to type a tweet, based on tweet length. After getting the predicted time, we can have a Bayesian network use the particular account and the time actually taken to predict whether it's a spam/bot attack.
this is the most reasonable interview I've ever seen, which ironically makes it somewhat unrealistic and useless.
Would love to see the technical programming interview as well! Thank you for sharing this.
she can't code LOL!
I think a good approach for labeling things as spam is writing a program called checkspam that looks at the posting frequency, or at whether the posts are the same within a certain time frame. It could label an account as spam if it falls within such parameters. You could make it so the program checkspam would only run if the account was reported by another account, in order to combat the potential for false flagging from people with negative intentions.
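Here is a rough sketch of what such a `checkspam` function could look like. The parameter names and default limits (`window_seconds`, `max_posts_in_window`, `max_duplicates`) are assumptions picked for illustration, and posts are assumed to be `(timestamp_seconds, text)` tuples.

```python
from collections import Counter


def checkspam(posts, was_reported, window_seconds=60,
              max_posts_in_window=5, max_duplicates=3):
    """Flag an account only if it was reported AND its own posts look spammy.

    posts: list of (timestamp_seconds, text) tuples for one account
    Returns True when, inside any sliding time window, the account posted
    more than `max_posts_in_window` times or repeated the same text at
    least `max_duplicates` times.
    """
    if not was_reported:
        return False  # reports only trigger the check, they do not decide it

    posts = sorted(posts)
    start = 0
    for end in range(len(posts)):
        # Shrink the window until it spans at most `window_seconds`.
        while posts[end][0] - posts[start][0] > window_seconds:
            start += 1
        window = posts[start:end + 1]
        if len(window) > max_posts_in_window:
            return True
        texts = Counter(text.strip().lower() for _, text in window)
        if max(texts.values()) >= max_duplicates:
            return True
    return False
```

Because the decision depends on the account's own posting pattern, a pile of false reports alone cannot get an account labeled, which is the abuse case raised in this thread.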
I simply think any implementation that checks the number of reports is easily abusable
Right? There could have been more out-of-the-box thinking @@MrTheanimekiller
30:00 - omg, and im literally hearing the sirens with my studio pairs headphones.
This interview is interesting but is so unlike any of the 4 in-person interviews I've had in the last 2 months as to be comical. Observations:
1. I don't believe the person being interviewed is answering these case problems cold... some guidance has been given to direct their thinking in preparation. Ex: How many people on the fly can give 5 reasons for x decision-making in classification? In this case the person has serious experience solving the problems and is clearly reading her other screen to develop her answers!!!
2. In my interviews I am not convinced the person actually read my resume.
3. In my interviews I am not convinced I was being seriously considered... in one case I was told the interview was 30 minutes only, and the interviewer kept cutting me off.
There are flaws with some of those solutions. Like flagging an account as spam if it is followed by spam accounts. That could lead to valid accounts being flagged as spam or even attacked by bots intentionally adding users to flag them as spam. The thing about using email from random domains is also problematic in many ways, and also using emails that use random characters (some of us use random emails for different accounts precisely to keep spam away and improve privacy). You could also not even guess bot characteristics and feed data for models to try to find common characteristics and trends.
That's what I thought of too. For example, checking if a name is common is not possible. But of course, we're watching from the sideline so it's easy to form criticism.
So, I think I want to go into Data Science besides Web Development, and this was pretty handy. Even though I missed some points, I answered some questions in a pretty good way. Thank you!!
Thanks for the video. Both of you did a great job in making it feel realistic. My only question is. Is this really the sort of difficulty of the entry data science jobs? It feels suspiciously easy or shallow to me. Can someone back this up? Thank you again!
That's what I'm saying
I have been interviewing for entry level data science jobs and this is fairly accurate! Although I have often been asked about what advanced concepts I know and how I've used it.
Great❤ work
Would love one for Data engineering too
That would have been a fairly easy job interview. No on the fly algorithmic problems to be solved, no mathematical questions, no deep understanding of ML (distributions, statistics, metrics, solvers, backprop, stochastic gradient, ....)...
Impressive! Well detailed! @KeithGalli, that siren was pretty loud. Perhaps you have trained a model that picks and feeds only @KylieYYing's voice to you and removes any other unwanted voices or sounds 😁😜
The interview was awesome
Thanks for sharing this.
So mr tusk is hiring again, so soon? Seriously though, A software developer interview would be great.
So I've never had any technical interviews before, that's why I'd like to ask: do they usually ask you to code some stuff on the go, or is it just some broad questions about the technicalities of the problem you're trying to solve?
Amazing interview
Can I get a certificate from this course
I didn't care that much for it.
The problem was trivial. Reminded me of something from a university lab lecture with breakout groups. I find that she really knows what she is talking about and he used a lot of bluffing, bluster, and talking too much without saying anything to compensate for his lack of knowledge. It's a common tactic I see among white males. I do like that she schooled him at the end.
When I give responses like this interviewee I get feedback that I need to be more direct in my responses
Another feature to be included could be the IP address from which the request is coming
"our CEO has been complaining...". I would fire this guy