How Uber uses Graph Neural Networks to recommend you food (live stream)

  • Published 15 Jun 2024
  • Ankit Jain presents his team's work at Uber on operationalizing GraphSAGE to power Uber Eats recommendations. Blog post: eng.uber.com/uber-eats-graph-.... Join my FREE course Basics of Graph Neural Networks (www.graphneuralnets.com/p/bas...)!
    Ankit's TensorFlow Machine Learning Projects book: read.amazon.com/kp/embed?asin...
    Mailing List: blog.zakjost.com/subscribe
    Discord Server: / discord
    Blog: blog.zakjost.com
    Patreon: / welcomeaioverlords

COMMENTS • 26

  • @YangJackWu
    @YangJackWu 3 years ago +8

    Thanks for setting this up. Looking forward to more such industry talks.

  • @yadneshsalvi4364
    @yadneshsalvi4364 3 years ago +1

    Thanks, looking forward to more such videos.

  • @stanleymeent8562
    @stanleymeent8562 1 year ago +1

    Great content, would love to see more of this!

  • @sasankv9919
    @sasankv9919 3 years ago +2

    This is great. Thanks for sharing.

  • @freemind.d2714
    @freemind.d2714 3 years ago +1

    Thanks for sharing, more please : )

  • @fatemejbr5787
    @fatemejbr5787 3 years ago +1

    Great video. Thanks a lot :)

  • @krishnavarnam
    @krishnavarnam 1 year ago +1

    We want more of such talks ☺️

  • @somalkant6452
    @somalkant6452 1 year ago

    Thanks for the video, loved it.

  • @sircosm
    @sircosm 9 months ago

    Superb content! Thank You!

  • @johng5295
    @johng5295 1 year ago

    Thanks a million. Great content.

  • @md.kamrulhasantuhin6866
    @md.kamrulhasantuhin6866 3 years ago +1

    Very helpful.

  • @yashmahendra6669
    @yashmahendra6669 2 years ago

    How do you deal with concept drift in the case of recommendations?

  • @SwamySriharsha
    @SwamySriharsha 3 years ago

    Great session. Can we get to know more about the downstream task? I understand that you were using a classification model to get probability scores, which are then used to rank the pairs. But in order to train a classification model we need both positives and negatives, right? User-item interactions from the past act as positive samples, but how are the negative samples created?
    Also, I am interested to know how they split the train and validation data from the graph. If it is a time-based split, how do they create negative samples for the train and validation sets?

    • @welcomeaioverlords
      @welcomeaioverlords  3 years ago +2

      Hi Swamy. I don't have any additional knowledge about the Uber systems specifically, but I can say that generally negative samples are created randomly. For a given user you could replace actual things they ordered with random things they did not order and then train a model to discriminate the real from the fake data. This is essentially "noise contrastive" learning.
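
      As a rough sketch of that random negative-sampling idea (the names and structure here are illustrative assumptions, not Uber's actual pipeline):

```python
import random

def build_training_examples(orders, all_dishes, negatives_per_positive=1, seed=0):
    """Turn observed (user, dish) orders into labeled examples for a binary ranker.

    Real interactions get label 1; for each one we also draw dishes the user
    never ordered and label them 0 ("noise contrastive"-style negatives).
    """
    rng = random.Random(seed)
    ordered_by_user = {}
    for user, dish in orders:
        ordered_by_user.setdefault(user, set()).add(dish)

    examples = []
    for user, dish in orders:
        examples.append((user, dish, 1))             # observed positive
        for _ in range(negatives_per_positive):
            neg = rng.choice(all_dishes)
            while neg in ordered_by_user[user]:      # re-draw if it is a true positive
                neg = rng.choice(all_dishes)
            examples.append((user, neg, 0))          # random negative
    return examples
```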

  • @PD-vt9fe
    @PD-vt9fe 3 years ago +2

    Thanks for the great video.
    How can I find out when live streams like this are coming up so I can join?
    I also have a question: at 34:12, graph learning cosine similarity is the most important feature, but what is it exactly? As far as I understand, after training with GraphSAGE we get an embedding vector for each node. Where does the "graph learning cosine similarity" come from? Do we calculate the cosine similarity of every single user to all other users and dishes? If so, that would be a lot of numbers to feed into the model.
    Thanks again.

    • @welcomeaioverlords
      @welcomeaioverlords  3 years ago +1

      Thanks! I send out announcements of talks via the mailing list, which you can find at blog.zakjost.com/subscribe. I think the cosine similarity is the dot product between the user embedding and the dish embedding (assuming their lengths are normalized). That’s a way of reducing all these embedding vectors to a single feature for the downstream XGBoost model.
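
      To make that concrete, here is one way that single cosine-similarity feature could be computed from a user and a dish embedding (a sketch using plain NumPy; with L2-normalized embeddings it reduces to a dot product):

```python
import numpy as np

def graph_cosine_feature(user_emb, dish_emb):
    """Collapse a user embedding and a dish embedding into one scalar
    (the cosine of the angle between them) for a downstream model like XGBoost."""
    user_emb = np.asarray(user_emb, dtype=float)
    dish_emb = np.asarray(dish_emb, dtype=float)
    return float(user_emb @ dish_emb /
                 (np.linalg.norm(user_emb) * np.linalg.norm(dish_emb)))
```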

    • @PD-vt9fe
      @PD-vt9fe 3 years ago +1

      @@welcomeaioverlords Thank you.
      Just to make sure I understand correctly. Please correct me if I'm wrong. Let's say we have 1,000 users and 100 dishes. When it comes to the downstream model, we add 100 features, each being the dot product of the user embedding and a dish embedding, right?

    • @welcomeaioverlords
      @welcomeaioverlords  3 years ago +1

      We didn’t discuss this so I’m not exactly sure, but my guess is that for a single user and 100 dishes, you should score them as 100 separate records and then rank by the final model score to get your product rankings. So any single prediction would have a single user/dish pair.
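
      In other words, the ranking step might look roughly like this (a sketch; score_pair stands in for whatever trained model returns the probability for a single user/dish record):

```python
def rank_dishes_for_user(user_id, candidate_dish_ids, score_pair):
    """Score each (user, dish) pair as its own record, then sort by score.

    score_pair(user_id, dish_id) is assumed to return the model's probability
    for that single pair, e.g. an XGBoost classifier fed pair-level features.
    """
    scored = [(dish_id, score_pair(user_id, dish_id)) for dish_id in candidate_dish_ids]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```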

    • @PD-vt9fe
      @PD-vt9fe 3 years ago +1

      @@welcomeaioverlords Thank you!

    • @welcomeaioverlords
      @welcomeaioverlords  3 years ago

      @@PD-vt9fe Happy to have such a thoughtful viewer!

  • @clrs8995
    @clrs8995 2 years ago

    Just wanted a quick validation at 19:20: as I understand it, the node we are considering is passed through a neural network to generate a self embedding (denoted by the red {NN} rectangles), and for its neighbors we aggregate their values with some function, say an average, and that average is passed through a neural net to generate the aggregated neighbor embedding (denoted by the blue {NN} rectangles). On the next layer, the two are combined. Correct me if my understanding is wrong.

    • @welcomeaioverlords
      @welcomeaioverlords  2 years ago +1

      Each layer uses the red rectangle to generate a self embedding. The yellow rectangle aggregates the neighbors, and the blue rectangle gets an embedding for the neighborhood. Then, the self and neighborhood embeddings are combined to create a neighborhood-aware embedding (the h's).
      In this diagram, there are two layers, so this happens two times. This allows information to flow to a 2-hop neighborhood. For example, this diagram shows a 2-layer calculation for the red node, which is not directly connected to the purple node. However, the purple node is connected to the green node, so it's used to update the embedding of the green node in layer 1, which is then used to update the red node's embedding in layer 2. I.e., the red node's neighbor's neighbor.
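
      As a rough illustration of that description (a sketch with mean aggregation and linear weights; the combine step here is a sum, whereas the published GraphSAGE concatenates):

```python
import numpy as np

def graphsage_layer(h, neighbors, W_self, W_neigh):
    """One layer: transform the node itself ("red"), mean-aggregate its
    neighbors ("yellow"), transform that aggregate ("blue"), then combine.

    h:         (num_nodes, d_in) current node embeddings
    neighbors: dict mapping node id -> list of neighbor ids
    """
    out = np.zeros((h.shape[0], W_self.shape[1]))
    for node in range(h.shape[0]):
        self_emb = h[node] @ W_self                                    # red NN
        nbrs = neighbors.get(node, [])
        agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])   # yellow: aggregate
        neigh_emb = agg @ W_neigh                                      # blue NN
        out[node] = np.maximum(self_emb + neigh_emb, 0.0)              # combine + ReLU -> new h
    return out

# Stacking two layers lets information flow across a 2-hop neighborhood
# (e.g. purple -> green in layer 1, then green -> red in layer 2):
# h1 = graphsage_layer(h0, neighbors, W1_self, W1_neigh)
# h2 = graphsage_layer(h1, neighbors, W2_self, W2_neigh)
```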

    • @clrs8995
      @clrs8995 2 years ago

      @@welcomeaioverlords Thank you so much.

  • @kk008
    @kk008 11 months ago

    What does it mean when he mentions heterogeneous stuff?

  • @abhishekmajumdar561
    @abhishekmajumdar561 1 year ago

    Just came across this video. It was super helpful in understanding how GNNs are being used for recommendation.
    I am a bit unclear on how they combine Dishes with Users in the same graph, as they can potentially have different feature vectors. In their blog post, they mention using a projection layer in the GNN; however, there isn't much info on it. Could someone point me to methods/techniques that can be used to achieve this? Thanks!

    • @welcomeaioverlords
      @welcomeaioverlords  1 year ago

      This sort of thing is typically handled by using a linear layer to project the raw features into a common embedding space. Each unique node type (e.g., Users, Dishes) would have its own projection/linear layer, and the job of those layers is to transform the raw features into a common, fixed dimensionality so that the transformed User and Dish features can be jointly modeled.
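
      A minimal sketch of that idea in PyTorch (feature sizes and type names are made up for illustration; the talk doesn't detail Uber's actual implementation):

```python
import torch
import torch.nn as nn

class PerTypeProjection(nn.Module):
    """One linear layer per node type, mapping raw features of different
    sizes into a shared embedding space the GNN layers can consume."""

    def __init__(self, raw_dims, hidden_dim=64):
        super().__init__()
        self.project = nn.ModuleDict(
            {node_type: nn.Linear(dim, hidden_dim) for node_type, dim in raw_dims.items()}
        )

    def forward(self, node_type, raw_features):
        # After projection, Users and Dishes both become hidden_dim-sized
        # vectors, so downstream GraphSAGE layers can treat them uniformly.
        return self.project[node_type](raw_features)

# Example with made-up raw feature sizes for the two node types:
proj = PerTypeProjection({"user": 37, "dish": 12})
user_emb = proj("user", torch.randn(5, 37))   # -> shape (5, 64)
dish_emb = proj("dish", torch.randn(3, 12))   # -> shape (3, 64)
```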