How Uber uses Graph Neural Networks to recommend you food (live stream)
- Published 15 Jun 2024
- Ankit Jain presents his team's work at Uber on operationalizing GraphSAGE to power Uber Eats recommendations. Blog post: eng.uber.com/uber-eats-graph-.... Join my FREE course Basics of Graph Neural Networks (www.graphneuralnets.com/p/bas...)!
Ankit's TensorFlow Machine Learning Projects book: read.amazon.com/kp/embed?asin...
Mailing List: blog.zakjost.com/subscribe
Discord Server: / discord
Blog: blog.zakjost.com
Patreon: / welcomeaioverlords
Thanks for setting this up. Looking forward to more industry talks like this.
Thanks, looking forward to more videos like this.
Great content, would love to see more of this!
This is great. Thanks for sharing.
Thanks for sharing, more please : )
great video. Thanks a lot :)
We want more talks like this ☺️
Thanks for the video, loved it
Superb content! Thank You!
Thanks a million. Great content.
very helpful
How do you deal with concept drift in recommendations?
Great session. Can we learn more about the downstream task? I understand that you use a classification model to get probability scores, which are then used to rank the pairs. But in order to train a classification model we need both positives and negatives, right? Past user-item interactions act as positive samples, but how are the negative samples created?
Also, I'm interested in how they split the train and validation data from the graph. If it's a time-based split, how do they create negative samples for the train and validation sets?
Hi Swamy. I don't have any additional knowledge about the Uber systems specifically, but I can say that generally negative samples are created randomly. For a given user you could replace actual things they ordered with random things they did not order and then train a model to discriminate the real from the fake data. This is essentially "noise contrastive" learning.
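To make the random-negative idea concrete, here is a minimal sketch. The function and data names are hypothetical, not from Uber's actual pipeline; it just pairs each observed order with randomly sampled items the user never ordered:

```python
import random

def build_training_pairs(interactions, all_items, negatives_per_positive=1, seed=0):
    """Create labeled (user, item, label) triples: observed orders are positives,
    and randomly sampled unordered items act as 'noise' negatives."""
    rng = random.Random(seed)
    ordered = {}  # user -> set of items they actually ordered
    for user, item in interactions:
        ordered.setdefault(user, set()).add(item)

    pairs = []
    for user, items in ordered.items():
        candidates = [i for i in all_items if i not in items]  # never ordered
        for item in items:
            pairs.append((user, item, 1))  # real interaction
            for neg in rng.sample(candidates, min(negatives_per_positive, len(candidates))):
                pairs.append((user, neg, 0))  # random negative
    return pairs
```

A downstream classifier trained on these triples then learns to discriminate real interactions from random ones, which is the noise-contrastive idea mentioned above.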
Thanks for the great video.
How can I find out when live streams like this are coming up so I can join?
I also have a question: at 34:12, the graph-learning cosine similarity is the most important feature, but what is it exactly? As far as I understand, after training with GraphSAGE we get an embedding vector for each node. Where does the "graph learning cosine similarity" come from? Is it that we calculate the cosine similarity of every single user to all the other users and dishes? If so, that would be a lot of numbers to feed into the model.
Thanks again.
Thanks! I send out announcements of talks via the mailing list, which you can find at blog.zakjost.com/subscribe. I think the cosine similarity is the dot product between the user embedding and the dish embedding (assuming their lengths are normalized). That’s a way of reducing all these embedding vectors to a single feature for the downstream XGBoost model.
@@welcomeaioverlords Thank you.
Just to make sure I understand correctly. Please correct me if I'm wrong. Let's say we have 1,000 users and 100 dishes. When it comes to the downstream model, we add 100 features, each is the dot product of the user embedding and a dish embedding, right?
We didn’t discuss this so I’m not exactly sure, but my guess is that for a single user and 100 dishes, you should score them as 100 separate records and then rank by the final model score to get your product rankings. So any single prediction would have a single user/dish pair.
@@welcomeaioverlords Thank you!
@@PD-vt9fe Happy to have such a thoughtful viewer!
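The cosine-similarity feature discussed in this thread can be sketched as follows. This is a toy example with made-up embeddings, not Uber's actual feature pipeline; each (user, dish) record gets a single scalar feature:

```python
import numpy as np

def cosine_feature(user_emb, dish_emb):
    """Cosine similarity between one user and one dish embedding.
    With unit-normalized vectors this is just the dot product."""
    u = user_emb / np.linalg.norm(user_emb)
    d = dish_emb / np.linalg.norm(dish_emb)
    return float(u @ d)

# One scalar feature per (user, dish) record for the downstream ranking model;
# the records are scored separately and then ranked by model score.
user = np.array([0.1, 0.3, 0.5])
dishes = {"pizza": np.array([0.2, 0.1, 0.4]), "salad": np.array([-0.3, 0.2, 0.1])}
scores = {name: cosine_feature(user, emb) for name, emb in dishes.items()}
```

This is how all the embedding dimensions collapse into one feature per pair, rather than feeding every pairwise similarity into the model at once.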
Just wanted a quick validation at 19:20: what I understood is that for the node we're considering, we pass it through a neural network to generate a self embedding (denoted by the red {NN} rectangles). For its neighbors, we aggregate their values with some function, say an average, and pass that through a neural net to generate the aggregated embedding (denoted by the blue {NN} rectangles). Then on the next layer the two are aggregated together. Correct me if my understanding is wrong.
Each layer uses the red rectangle to generate a self embedding. The yellow rectangle aggregates the neighbors, and the blue rectangle gets an embedding for the neighborhood. Then, the self and neighborhood embeddings are combined to create a neighborhood-aware embedding (the h's).
In this diagram, there are two layers, so this happens two times. This allows information to flow to a 2-hop neighborhood. For example, this diagram shows a 2-layer calculation for the red node, which is not directly connected to the purple node. However, the purple node is connected to the green node, so it's used to update the embedding of the green node in layer 1, which is then used to update the red node's embedding in layer 2. I.e., the red node's neighbor's neighbor.
@@welcomeaioverlords Thank you so much.
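A minimal NumPy sketch of one such GraphSAGE-style layer with a mean aggregator. Weights and dimensions here are illustrative, and the self and neighborhood transforms are combined by addition for simplicity (the original GraphSAGE paper concatenates them instead):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 8

# One layer: a "self" transform (red rectangle) and a "neighbor" transform
# (blue rectangle), combined after mean-aggregating the neighborhood (yellow).
W_self = rng.normal(size=(d_in, d_out))
W_neigh = rng.normal(size=(d_in, d_out))

def sage_layer(h_self, h_neighbors):
    """h_self: (d_in,), h_neighbors: (k, d_in) -> (d_out,) neighborhood-aware embedding."""
    agg = h_neighbors.mean(axis=0)              # aggregate the neighbors
    combined = h_self @ W_self + agg @ W_neigh  # combine self + neighborhood
    return np.maximum(combined, 0)              # ReLU nonlinearity

h_node = rng.normal(size=d_in)
h_neigh = rng.normal(size=(3, d_in))
out = sage_layer(h_node, h_neigh)
```

Stacking two of these layers is what lets information flow from the 2-hop neighborhood, as in the purple-to-green-to-red example above.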
What does it mean when he mentions heterogeneous stuff?
Just came across this video. It was super helpful in understanding how GNNs are being used for recommendation.
I am a bit unclear on how they combine Dishes with Users in the same graph, since the two can have feature vectors of different sizes. In their blog post they mention using a projection layer in the GNN, but there isn't much info on it. Could someone point me to the methods/techniques that can be used to achieve this? Thanks!
This sort of thing is typically handled by using a linear layer to project the raw features into a common embedding space. Each node type (e.g., Users, Dishes) gets its own projection/linear layer, and the job of those layers is to transform the raw features into a common fixed-size dimensionality so that the transformed User and Dish features can be jointly modeled.
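A minimal sketch of per-node-type projection layers (the raw feature sizes here are made up; in a real GNN these would be trained linear layers, not fixed random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
d_common = 16

# Hypothetical raw feature sizes: Users and Dishes differ in dimensionality.
W_user = rng.normal(size=(10, d_common)) * 0.1  # projection layer for User nodes
W_dish = rng.normal(size=(7, d_common)) * 0.1   # projection layer for Dish nodes

def project(features, W):
    """Linear projection of raw node features into the shared embedding space."""
    return features @ W

user_raw = rng.normal(size=(5, 10))  # 5 users, 10 raw features each
dish_raw = rng.normal(size=(8, 7))   # 8 dishes, 7 raw features each

user_h = project(user_raw, W_user)
dish_h = project(dish_raw, W_dish)
# Both node types now live in the same 16-dim space and can be
# message-passed over jointly in a heterogeneous graph.
```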