Instagram ML Question - Design a Ranking Model (Full Mock Interview with Senior Meta ML Engineer)

  • Published Jan 2, 2025

COMMENTS • 35

  • @MeghavVerma22
    @MeghavVerma22 10 months ago +51

    The Meta engineering guy is on point. Every stage of the pipeline/process has so many nuances that this could have been a 2hr+ video - maybe consider doing a podcast version of this and making a guest list of viewers who can submit questions? @Exponent

  • @soustitrejawad
    @soustitrejawad 10 months ago +14

    As a current master's student in data science actively job hunting, I must say this mock interview is incredible. Thank you so much, Vikram! Where can I find more of your content?

    • @tryexponent
      @tryexponent  6 months ago +1

      Thanks for your feedback! You can check out the full interview course at bit.ly/4bUEPbF to see more of Vikram's mock interviews.

  • @mulangonando2942
    @mulangonando2942 5 months ago +7

    This is perfect content even for people who are just looking to exercise their mental faculties in full-stack ML design. The whole arc of the thought process, from concept to concrete algorithms, is super transparent.

  • @pranav7471
    @pranav7471 7 months ago +8

    The best ML design interview I have seen so far!

  • @shrutis.tiwari8335
    @shrutis.tiwari8335 2 months ago +3

    The best machine learning mock interview on youtube!!!

  • @abhipatel9048
    @abhipatel9048 10 months ago +4

    Very educational! Loved it, keep on bringing more ML interviews! :)

  • @nickgreenquist2582
    @nickgreenquist2582 18 days ago

    Excellent video, wow! And that's coming from someone who works on these systems in industry. So cool that this level of knowledge and expertise is just floating around on YT.

  • @basedanarki
    @basedanarki 8 months ago +4

    he's cracked. great job to you both

  • @chrisogonas
    @chrisogonas 4 months ago +1

    That was both remarkable and educational! Excellent session, folks 👍🏾👍🏾 Thanks for sharing.

  • @maryamaghili1148
    @maryamaghili1148 8 months ago +12

    For the candidate generation, I would propose a funnel model in which I first use some simple algorithms, like logistic regression, a decision tree, or the ANN approach he used, to quickly narrow down the search space to 1/1000, and then apply more advanced techniques to refine it. I would use a two-tower network for ranking my candidates.

    • @mullachv
      @mullachv 8 months ago +25

      The two tower is for learning the embeddings. During serving we use the learnt embeddings from the two tower to locate approximate nearest neighbors to the viewer's embedding. In reality we will have several parallel paths to generate candidates - here we show just one in the interest of time. Some of the candidate generation sources include: collaborative filtering (either two-tower or matrix factorization), content filtering (keyword/interest matching), popular feeds, viral feeds, connected content feeds (content from socially connected creators), etc.
      The deeper model for ranking would typically use thousands of features (windowed aggregates, embedding aggregates, embeddings from pre-trained text/image/video processing models, etc.). These models are compute- and memory-intensive to run, and we want to run them only on a select thousand (or so) items for a specific viewer. This keeps the serving latency low. This deep ranking model typically has multiple heads (multi-task) with several predictors (like, comment, share, etc.). These individual predictors are weighted to generate a score. Reverse-sorted scores can then be used to create the list of posts shown to the viewer.
      Does that help clarify?
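
A minimal sketch of the setup described in this reply, assuming a PyTorch two-tower retrieval model and a hand-weighted blend of multi-task head outputs; the feature dimensions, layer sizes, and weights below are illustrative, not taken from the video:

```python
import torch
import torch.nn as nn


class Tower(nn.Module):
    """One side of the two-tower model: maps raw features to an embedding."""

    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product behaves like cosine similarity
        return nn.functional.normalize(self.net(x), dim=-1)


class TwoTower(nn.Module):
    def __init__(self, user_dim: int, item_dim: int, emb_dim: int = 64):
        super().__init__()
        self.user_tower = Tower(user_dim, emb_dim)
        self.item_tower = Tower(item_dim, emb_dim)

    def forward(self, user_feats: torch.Tensor, item_feats: torch.Tensor) -> torch.Tensor:
        u = self.user_tower(user_feats)   # (batch, emb_dim)
        v = self.item_tower(item_feats)   # (batch, emb_dim)
        return (u * v).sum(dim=-1)        # one similarity logit per (user, item) pair


# Training would use engagement labels (engaged vs. not) with a BCE loss.
model = TwoTower(user_dim=32, item_dim=48)
loss_fn = nn.BCEWithLogitsLoss()


def blended_score(p_like, p_comment, p_share, weights=(1.0, 2.0, 3.0)):
    """Blend the heavier multi-task ranker's per-head probabilities into a
    single score; posts are then reverse-sorted by this score."""
    return weights[0] * p_like + weights[1] * p_comment + weights[2] * p_share
```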

  • @xgu185
    @xgu185 9 months ago +10

    One comment could be that the two-tower network should also be categorized as collaborative filtering.

  • @SPRajagopal
    @SPRajagopal 28 days ago +1

    At 26:25, those seem like two very specific choices. During an interview, do they expect you to know these or arrive at them from scratch?

  • @dantesbytes
    @dantesbytes a month ago +2

    hire this guy because Meta's recommendation engine is completely broken

  • @RezaE
    @RezaE 6 months ago +1

    This was a great mock interview. Thanks for sharing it.

  • @ChuanChihChou
    @ChuanChihChou 8 days ago

    41:17 The interviewee seems to mean that the same two-tower model is used for both ANN retrieval of candidates and (pointwise) ranking of the retrieved candidates?
    I don't know if this has become the dominant approach, but it's not always the case. Once we get down to hundreds or a thousand candidates, we can potentially run more sophisticated models for better accuracy. There are also alternatives to score-based pointwise ranking, like pairwise/listwise ranking.
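
On the pairwise alternative mentioned in this comment, a minimal BPR-style loss sketch in PyTorch; the score tensors below are placeholders and would really come from whichever ranking model is in use:

```python
import torch
import torch.nn.functional as F


def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: push the score of a post the user engaged with above
    the score of a post they did not engage with."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()


# Placeholder scores for four (user, engaged post, non-engaged post) triples.
pos = torch.tensor([2.1, 0.3, 1.5, 0.9])
neg = torch.tensor([1.0, 0.8, 0.2, 1.1])
print(bpr_loss(pos, neg))  # gradients would flow back into the scoring model
```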

  • @haoyuwang3243
    @haoyuwang3243 7 months ago +8

    just out of curiosity, do you think the performance is good enough to pass a senior level MLSD interview?

  • @teapot-d8x
    @teapot-d8x 26 days ago

    Thanks, this is a nice example of an interview!
    A few comments:
    1) I felt the trade-off claim against MF was weak and possibly incorrect. MF can scale to larger inputs, and there are more efficient, stochastic-gradient-descent-based alternatives to ALS that allow this; there are even online algorithms for streaming data. Maybe a better argument against it is that representing items by their features is not trivial, especially if you want to capture non-linear dependencies between the features? Would love to hear some feedback on this aspect.
    2) In the proposed solution the ranking step was essentially redundant. It wasn't clear why the two towers were needed for ranking. Assuming we already created an embedding for the user to query the vector DB of posts, the score is just the sigmoid over the similarity, and the sigmoid is monotonic. So the ranking as proposed is redundant; it's basically sorting by the same similarity score used in retrieval. Or have I missed something?
    3) As for measuring ranking offline, I would also expect something that measures ranking directly, like NDCG or top-k metrics, given we have logged data that can be used to measure performance over the lists that were actually displayed. Also, if you use full lists, negatives and positives might not be balanced, so AUC-PR would be a better choice than ROC-AUC (a small metrics sketch follows below this thread).

    • @tryexponent
      @tryexponent  18 days ago

      Hey teapot-d8x, thanks for your feedback!
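
For point 3 in the comment above, a small scikit-learn sketch of the suggested offline metrics (NDCG@k and average precision, which summarizes the precision-recall curve); the labels and scores below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import average_precision_score, ndcg_score

# One logged impression list for a single user: 1 = engaged with, 0 = not.
y_true = np.array([[1, 0, 0, 1, 0, 0, 0, 1]])
# Scores the ranker assigned to those same posts.
y_score = np.array([[0.9, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]])

print("NDCG@5:", ndcg_score(y_true, y_score, k=5))
# Average precision is usually more informative than ROC-AUC when
# positives (engagements) are much rarer than negatives.
print("AP:", average_precision_score(y_true[0], y_score[0]))
```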

  • @dafliwalefromiim3454
    @dafliwalefromiim3454 a month ago +1

    Hi, thanks for the video. At 32:44 you are saying that for converting the interaction matrix A (m×n) into U (m×d) and V (n×d) we use regularization? Regularization is used for reducing overfitting, through techniques like adding a regularization penalty to the cost function. I have never heard of regularization being used in matrix decomposition. Please point me to any literature on this: how can regularization be used in matrix approximation? It's also not clear how the two-tower network would give us the embeddings.
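
On the regularization question: in matrix factorization the L2 penalty is added directly to the reconstruction objective, roughly minimizing ||A - UV^T||^2 + lambda * (||U||^2 + ||V||^2) over U and V; this is covered in Koren, Bell and Volinsky, "Matrix Factorization Techniques for Recommender Systems" (IEEE Computer, 2009). A toy NumPy/SGD sketch with made-up data, just to show where the penalty enters the updates:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, lam, lr = 100, 200, 16, 0.1, 0.01   # toy sizes and hyperparameters
U = rng.normal(scale=0.1, size=(m, d))
V = rng.normal(scale=0.1, size=(n, d))

# Observed (user, item, value) interactions; random here just so it runs.
obs = [(rng.integers(m), rng.integers(n), 1.0) for _ in range(1000)]

for epoch in range(5):
    for i, j, a in obs:
        u_i, v_j = U[i].copy(), V[j].copy()
        err = a - u_i @ v_j
        # The "- lam * ..." terms are the regularization: they shrink the
        # factors so the sparse matrix is not overfit.
        U[i] += lr * (err * v_j - lam * u_i)
        V[j] += lr * (err * u_i - lam * v_j)
```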

  • @EranM
    @EranM 5 months ago +2

    Where is the cycle of learning? How about monitoring? When do we train? Do we automate it? how?

  • @Sam-nn3en
    @Sam-nn3en 3 months ago +1

    The person has a lot of knowledge but is racing to complete his explanations (understandably so, given the time limit of the mock interview). A lot of knowledge is being shared, but at this fast pace it might not be possible for the interviewer to keep up unless they too are at the same frequency. I would ask him to slow down a bit, or check with the mock interviewer to see if she allows him a little more time to explain, given that this is being recorded for a YouTube audience.
    Great explanation overall though, and it helped tremendously. Thank you

    • @Sam-nn3en
      @Sam-nn3en 3 months ago

      Also, things like the extra post-processing considerations as well as the production-level deployment discussion are the mark of a senior-level engineer (E6), and it can be seen very clearly that this person has that. This discussion alone can make the difference between a mid-level position and a senior position.

  • @PrudhviRaj12
    @PrudhviRaj12 9 months ago +1

    Thank you so much for this video. I have a question. Once the two-tower model is trained, for candidate generation the embeddings for items are computed offline, the user embedding is computed on the fly from the user features, and that user embedding is matched against the item embedding vectors with kNN. Is that correct? If so, since the output of the two-tower model is binary, where would I be getting the embeddings from? From a layer before the sigmoid?

    • @chun5919
      @chun5919 9 months ago +2

      yes, just apply user tower or item tower.
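
To make this concrete: yes, the embedding is each tower's output before the dot product and sigmoid. A NumPy sketch of serving, with random vectors standing in for the trained towers' outputs and a brute-force top-k standing in for a real ANN index (FAISS, ScaNN, etc.):

```python
import numpy as np

emb_dim, num_items = 64, 100_000

# Offline: run every post through the trained item tower and store the
# embeddings (in production these would sit in an ANN index).
item_embs = np.random.randn(num_items, emb_dim).astype(np.float32)
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)

# Online: run the requesting user's features through the user tower.
user_emb = np.random.randn(emb_dim).astype(np.float32)
user_emb /= np.linalg.norm(user_emb)

scores = item_embs @ user_emb                  # cosine similarities
top_k = np.argpartition(-scores, 1000)[:1000]  # ~1000 candidates for the ranker
```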

  • @art4eigen93
    @art4eigen93 a month ago +1

    Could you guys please make these interviews easier to understand? It seems to confuse more than it clarifies.

  • @jianglonghe5188
    @jianglonghe5188 28 days ago

    You probably want to use mAP instead of AUC. It is better to treat the problem as a ranking problem instead of a binary classification problem.

  • @sharulathan8028
    @sharulathan8028 9 months ago

    Can someone explain the label part in the two-tower model?

  • @alexilaiho6441
    @alexilaiho6441 7 months ago

    Great interview. I have always been confused: in an ML system design interview, should we focus more on the ML model data/training/eval pipeline or on the inference pipeline (which is more of a traditional system design)?

    • @tryexponent
      @tryexponent  7 months ago +1

      In an ML system design interview, you typically need to cover the entire process, including the problem statement, data engineering, modeling, and deployment. It's important to address both the data/training/evaluation pipeline and the inference pipeline. To decide where to focus, take cues from your interviewer or ask them directly for guidance.

  • @92abhinavkashyap
    @92abhinavkashyap 6 months ago +1

    What tool is he using to write??

    • @tryexponent
      @tryexponent  6 months ago +2

      The tool is "Whimsical"!

  • @MrAnujchopra
    @MrAnujchopra 6 months ago +2

    As I understand it, there is just one model for both candidate selection and ranking. Then why go to that same model twice? We generate post embeddings asynchronously. Is approximate nearest neighbour search faster than taking the dot product with all items?
    Also, 0.5 ROC-AUC is not random prediction but rather a constant prediction for all values.

  • @siberiasummer
    @siberiasummer 3 months ago +1

    What level would the candidate pass with this answer? Senior? Staff?