Graph Representation Learning (Stanford University)

COMMENTS • 48

  • @deterministicalgorithmslab1744
    @deterministicalgorithmslab1744 3 years ago +71

    NOTES/ Important Points :-
    1.) Graphlets are undirected motifs.
    2.) The purpose of graph representation learning is to allow automatic feature engineering by embedding each node into a d-dimensional space such that similarities in the network (*) correspond to closeness (cosine/Euclidean distance) in the embedding space.
    3.) Images == grid graphs; text, audio == chain graphs. The adjacency-matrix representation of a graph is not invariant to permutations of its rows/columns, but the graph itself is.
    4.) *We define this similarity in the network on the basis of random walks. Similarity(u,v) = probability of visiting node v on a random walk starting at node u. Similarity depends on the random walk strategy (R) that you choose to use.
    5.) You take fixed-size random walks from each node to build its neighborhood set.
    6.) You try to maximize the probability of each node's neighborhood set given that node, where the probability of node v occurring in the neighborhood set of node u (which depends on the similarity between the embeddings of u and v) is defined as a softmax over the similarities between the embedding of u and the embeddings of all other nodes. [ 26:20 ]
    7.) We use negative sampling to approximate this softmax, whose denominator is expensive to compute (see the sketch after this list).
    8.) In negative sampling, nodes are sampled randomly in proportion to their degree.
    9.) But random walks from a node only encode similarities that correspond to "nearness" of two nodes in the network, so how do we encode other types of similarity?
    10.) Node2Vec :-
    11.) We can have 2 different types of exploration:- DFS(macro view) like walk. Or BFS(micro view) like walk.
    12.) More precisely, 2nd-order random walks (i.e., the walk has one unit of memory, which tells you which node you came from to reach the current node). You also need the distances, measured from that previous node, of all nodes adjacent to the current node. Then you choose to move back towards, away from, or stay at the same distance from the previous node, with unnormalized probabilities in the ratio 1/p : 1/q : 1 respectively (see the walk sketch after this list). [ 45:00 ]
    13.) Depending on the values of p and q we can get embeddings where communities of nodes have nearby embeddings in embedding space.. or where nodes with similar structural roles are closer together in embedding space. [ 48:02 ]
    14.) These embeddings are quite robust: removing or adding ~50% of the edges leads to only a small drop in accuracy on the node classification problem.
    15.) The node embeddings can be aggregated over various nodes to classify a set of nodes.
    16.) It has been observed that Node2Vec performs better on the node classification task than on link prediction.
    17.) You must choose a definition of node similarity (e.g., random-walk neighborhood (possibly with p, q), plain neighborhood, etc.) that matches your application.
    18.) EMBEDDING ENTIRE GRAPHS :-
    19.) For molecules.. or where lots of small networks are available.
    20.) Just calculate normal node embeddings, and sum all of them.
    21.) Make a super-node that connects to a sub-graph; the embedding of the super-node is the embedding of the sub-graph.
    22.) ANONYMOUS WALKS :- These are transformations of random walks that are independent of which nodes actually occur on the path, but capture the local structure around a node via the repetitions that occur along the walk. So A->B->C is transformed to 1->2->3, A->B->A is transformed to 1->2->1, and so on. The number of possible anonymous walks grows exponentially with the number of steps in the walk.
    23.) Approach 1 :- Graph feature vector = vector of frequencies of the different possible anonymous walks (see the anonymous-walk sketch after this list).
    24.) How many walks are needed to get a good estimate? [ 1:08:10 ]
    25.) Approach 2 :- Learn embeddings of all possible anonymous walks (of fixed length) and concatenate/average them to get the embedding of the graph. To learn these anonymous-walk embeddings, try to predict the probability distribution over the next random walk from a node, using the anonymous-walk embeddings of the previous [delta] random walks. As in [ 1:14:02 ].
    26.) Anonymous walks are based on the assumption that all important properties/structures of the graph can be recovered from the statistics of the random anonymous walks that can be undertaken from various nodes.
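
    A minimal sketch of the skip-gram objective with negative sampling from points 6-8, written in PyTorch. This is an illustration, not the lecture's code: `num_nodes`, `emb_dim`, `neg_sampling_loss`, and the toy degree vector are assumed names/values, and plain dot products stand in for the node similarity.

    ```python
    # Sketch of points 6-8: maximize the probability of a node's neighborhood set,
    # approximating the softmax denominator with negative samples drawn in
    # proportion to node degree. All names here are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_nodes, emb_dim = 100, 16
    emb = nn.Embedding(num_nodes, emb_dim)        # one d-dimensional vector per node

    def neg_sampling_loss(u, v, num_neg, node_probs):
        """u, v: LongTensors of (center, context) node pairs taken from random walks."""
        pos_score = (emb(u) * emb(v)).sum(dim=-1)              # similarity of true pairs
        # point 8: negatives are sampled in proportion to node degree
        neg = torch.multinomial(node_probs, u.numel() * num_neg, replacement=True)
        neg = neg.view(u.numel(), num_neg)
        neg_score = torch.bmm(emb(neg), emb(u).unsqueeze(-1)).squeeze(-1)
        # maximizing log sigma(pos) + sum log sigma(-neg) == minimizing its negative
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=-1))
        return loss.mean()

    # toy usage: uniform degrees and a hand-made batch of (center, context) pairs
    degrees = torch.ones(num_nodes)
    u = torch.tensor([0, 1, 2])
    v = torch.tensor([3, 4, 5])
    print(neg_sampling_loss(u, v, num_neg=5, node_probs=degrees))
    ```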
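
    A minimal sketch of the 2nd-order (p, q) biased walk from point 12, assuming the graph is a `networkx.Graph`; `node2vec_walk` is an illustrative name, not the lecture's implementation.

    ```python
    # Sketch of point 12: one unit of memory (the previous node); the next step is
    # chosen with unnormalized weights 1/p (go back), 1 (stay at the same distance
    # from the previous node), and 1/q (move further away).
    import random
    import networkx as nx

    def node2vec_walk(graph, start, length, p=1.0, q=1.0):
        walk = [start]
        while len(walk) < length:
            cur = walk[-1]
            nbrs = list(graph.neighbors(cur))
            if not nbrs:
                break
            if len(walk) == 1:
                walk.append(random.choice(nbrs))       # first step: uniform
                continue
            prev = walk[-2]
            weights = []
            for x in nbrs:
                if x == prev:                          # distance 0 from the previous node
                    weights.append(1.0 / p)
                elif graph.has_edge(prev, x):          # distance 1 from the previous node
                    weights.append(1.0)
                else:                                  # distance 2 from the previous node
                    weights.append(1.0 / q)
            walk.append(random.choices(nbrs, weights=weights, k=1)[0])
        return walk

    # toy usage on a small built-in graph
    G = nx.karate_club_graph()
    print(node2vec_walk(G, start=0, length=10, p=0.5, q=2.0))
    ```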
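
    A minimal sketch of points 22-23: turning walks into anonymous walks and building the frequency feature vector of Approach 1. Function names are illustrative assumptions.

    ```python
    # Sketch of points 22-23: relabel each walk by order of first appearance
    # (A->B->A becomes 1->2->1), then describe a graph by the frequencies of the
    # anonymous walk types observed in a sample of walks.
    from collections import Counter

    def anonymize(walk):
        """Map a walk like ['A', 'B', 'A'] to its anonymous form (1, 2, 1)."""
        first_seen = {}
        out = []
        for node in walk:
            if node not in first_seen:
                first_seen[node] = len(first_seen) + 1
            out.append(first_seen[node])
        return tuple(out)

    def anonymous_walk_features(walks):
        """Frequency vector over the anonymous walk types observed in `walks`."""
        counts = Counter(anonymize(w) for w in walks)
        total = sum(counts.values())
        types = sorted(counts)                         # fixed ordering of observed types
        return types, [counts[t] / total for t in types]

    print(anonymize(['A', 'B', 'C']))   # (1, 2, 3)
    print(anonymize(['A', 'B', 'A']))   # (1, 2, 1)
    print(anonymous_walk_features([['A', 'B', 'C'], ['A', 'B', 'A'], ['X', 'Y', 'X']]))
    ```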

  • @dharshanak4118
    @dharshanak4118 3 years ago +39

    I never knew Slavoj Zizek used to teach machine learning when he was young

  • @user-rt6rr8pp9x
    @user-rt6rr8pp9x 4 years ago +31

    Graphs are so general, explainable, convenient, and elegant. Also pretty difficult. :)

  • @gunarto90
    @gunarto90 4 years ago +4

    Great lecture and Q&A sessions

  • @MahmoudOuf
    @MahmoudOuf 4 years ago +6

    Great lecture, thanks a lot. The article is also very interesting and informative.

  • @dogaarmangil
    @dogaarmangil 2 years ago +2

    Great lecture. 37:50 It is a misnomer to call this process "the finding of similarity between nodes" when in fact what you are doing is finding node groups. In a group, nodes are rarely similar to each other: you won't find 2 CEOs in a company, atom nuclei are surrounded by electrons rather than other nuclei, and so on.

  • @vikankshnath8068
    @vikankshnath8068 4 years ago +7

    He is such a great explainer

  • @linfengyang8834
    @linfengyang8834 4 years ago +2

    Great lecture, very informative and clear.

  • @feishi4932
    @feishi4932 4 years ago +3

    Great lecture. It helps me a lot.

  • @jimmorrisshen
    @jimmorrisshen 4 years ago +7

    This is a great talk.

  • @josephzhu5129
    @josephzhu5129 1 year ago

    Great lecture, he knows how to explain complicated ideas, thanks a lot!

  • @tomlarey
    @tomlarey 4 years ago +2

    Great lecture, props to Jure Leskovec :)

  • @lloydacquayethompson6789
    @lloydacquayethompson6789 3 years ago +2

    This guy is awesome

  • @yandajiang1744
    @yandajiang1744 3 months ago

    Awesome explanation

  • @arefy100
    @arefy100 2 years ago

    Great lecture, thanks a lot

  • @sm_xiii
    @sm_xiii 4 years ago +12

    @Machine Learning TV Can you please give some numbering to the lecture videos, so that we can figure out in which order we should watch them?

    • @Alkis05
      @Alkis05 3 years ago

      That is assignment number 3: create a program that goes through the lectures and sorts them based on the downloaded transcripts.

  • @christophchristoph9909
    @christophchristoph9909 2 years ago

    awesome! thanks for sharing!

  • @MrSajjadathar
    @MrSajjadathar 4 years ago

    Which video did you upload yesterday?

  • @hossein.amirkhani
    @hossein.amirkhani 4 years ago +10

    Great lecture. But just one point: in the DeepWalk part, he frequently says that he wants to maximize the cost function, but really he wants to minimize it. In addition, his explanation of negative sampling is not completely correct; he should explain it by mentioning that the multi-class problem is converted into a two-class problem.

    • @LeenaBora
      @LeenaBora 4 years ago

      Hi Hossein,
      Yes, it was indeed a great lecture.
      I also didn't understand the negative sampling part. Can you please elaborate on it?

    • @ghensao4027
      @ghensao4027 1 year ago

      Exactly. log(probability) ranges from negative infinity to 0; taking the negative of it gives a range of 0 to infinity. You want to minimize that instead.

  • @muhammadibrahimnejadian9265
    @muhammadibrahimnejadian9265 2 years ago

    Thanks a lot for sharing

  • @ClosiusBeg
    @ClosiusBeg 2 years ago

    What if we make the number of random walks also random? And simulate n random walks with a constant number of steps over m random constant numbers of walks?

  • @soumyadbanik
    @soumyadbanik 1 year ago

    What kind of data can have graph representations and what types of data can not be represented as a graph?

  • @ShikhaMallick
    @ShikhaMallick 4 years ago +1

    Hi, I don't get the intuition behind softmax. Can someone please help me understand?

    • @muratcan__22
      @muratcan__22 3 years ago

      You can read the Wikipedia article; I think it is well explained there.

    • @vik24oct1991
      @vik24oct1991 3 years ago

      Just as sigmoid is a smooth approximation of the sign/step function, softmax is a smooth approximation of the argmax function. With argmax, the output is 1 at the index of the input with the maximum value and 0 everywhere else (a one-hot output). With softmax, the output at the index with the maximum value is closest to one, but the second- and third-largest indices also get non-zero values based on their inputs; the result is a smooth distribution where every index has some value greater than zero. Because exponentials are used, if the maximum value is far greater than the others, softmax essentially behaves like argmax. It is used mainly so that the loss function is differentiable, which argmax is not.
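
      A tiny numpy illustration of this reply (an added sketch, not from the lecture): softmax gives a smooth distribution, and scaling up the scores pushes it towards the one-hot argmax. The `scores` values are made up.

      ```python
      # Softmax as a smooth stand-in for argmax.
      import numpy as np

      def softmax(x):
          z = np.exp(x - x.max())        # subtract the max for numerical stability
          return z / z.sum()

      scores = np.array([2.0, 1.0, 0.1])           # made-up scores
      print(softmax(scores))        # ~[0.66, 0.24, 0.10]: smooth, every entry > 0
      print(softmax(10 * scores))   # ~[1.00, 0.00, 0.00]: approaches one-hot argmax
      ```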

  • @zhenyusha7123
    @zhenyusha7123 4 years ago +2

    Where are the remaining class videos?

    • @MachineLearningTV
      @MachineLearningTV  4 years ago

      Watch the new video we uploaded yesterday.

    • @georgeigwegbe7258
      @georgeigwegbe7258 4 years ago +1

      @@MachineLearningTV Hello @MLTV, can we get all of the cs224w videos?

    • @0xkellan
      @0xkellan 4 years ago +9

      @@georgeigwegbe7258 snap.stanford.edu/class/cs224w-2018/ : find the "resource" part on this page, where you can find lecture notes and videos.

  • @ufuoma833
    @ufuoma833 2 years ago

    Surprisingly, I can follow his discussion. Or maybe I'm just tripping.

  • @fatcat4791
    @fatcat4791 2 years ago

    Is this guy from Switzerland?

  • @MCRuCr
    @MCRuCr 2 years ago +1

    Random walking is somehow like the random firing of biological neurons

  • @dark_all_day9311
    @dark_all_day9311 3 years ago

    Ah nu Cheeki Breeki iv Damke

  • @o_lakisgavalas
    @o_lakisgavalas 2 years ago +1

    Misleading title, as no (generative) representation is discussed. E.g., "Supervised learning on graphs" or "Learning features of graphs" would have been more suitable, imho.

  • @thomasfranzstockhammer7846
    @thomasfranzstockhammer7846 2 years ago +1

    Lg

  • @distrologic2925
    @distrologic2925 1 year ago +1

    This is about learning *from* graphs but how do we learn the actual graph structures just from raw data? This is the question AI should be answering.

  • @pr749
    @pr749 3 years ago

    I think I am only here to listen to his accent.

    • @jensharbers5620
      @jensharbers5620 2 years ago

      I do not think that the accent is disturbing anyone in any way.

    • @pr749
      @pr749 2 years ago

      @@jensharbers5620 Interesting that you interpreted my comment negatively.

    • @constantlistener807
      @constantlistener807 2 years ago

      He sounds like a young Slavoj Zizek

  • @charlottjakob2948
    @charlottjakob2948 3 years ago +1

    This is a great talk.
