Incredible clarity here despite the challenging material. There's no other way to say it: Petar is goated.
This is one of the cleanest, most sophisticated and organized scientific speeches I have ever heard...
Petar's talks are great as always! (I remember attending his talk while at Google lol).
Timestamps for those looking to rewatch specific sections :)
0:00 - Introduction by Pietro Lio
1:10 - Overview
1:56 - 1. Fantastic GNNs in the Wild
6:52 - 2. Talk Roadmap
9:00 - 3. Towards GNNs from first principles
10:34 - 4. Permutation invariance and equivariance
15:42 - 5. Learning on Graphs
20:22 - 6. Message passing on graphs
24:34 - 7. Perspectives on GNNs
25:42 - 7.1 Node Embedding Techniques
29:39 - 7.2 Natural Language Processing
31:23 - 7.3 Spectral GNNs
41:17 - 7.4 Probabilistic Graphical Models
45:09 - 7.5 Graph Isomorphism Testing
48:53 - 7.6 Geometric Deep Learning
50:23 - 7.7 Historical Concepts
51:15 - 7.8 Computational Chemistry
52:22 - Acknowledgements and Q&A
Most of these concepts were already at least somewhat familiar to me, but the way he connects them holistically makes me feel like I'm learning them for the first time.
Beautiful presentation. Dr Velickovic is one of the best lecturers I've heard in my life. Everything he says is so clear and concise. Add his charisma on top of all that and you can understand why he attracts more and more people to study GNNs. We are so proud to have him
Excellent talk Petar, so useful to have these different perspectives brought together in one consistent framing.
Great talk! The first 20 minutes are simply brilliant! The kind of first-principles explanation that I dream of when starting any new topic :)
Great presentation. Thank you for sharing, Dr Velickovic.
It was very informative, and the slides are self-explanatory for people with a basic understanding of the math equations :) Thank you!
Thank you very much! I just completed my undergrad, and I am in the process of discovering new ideas and topics to work on and learn more. These kinds of videos really help me (especially as a young graduate who doesn't know much about many topics but wants to discover more).
Your presentation skills only became better since Cambridge times. And they were stellar then already.
Petar! This is solid work. Clear thinking and speaking.
We need more lectures like this! Nice Lecture!
Amazing talk! I like the way you connect concepts with applied and historical context. It motivates me to make this talk make sense to a 7-year-old or a 107-year-old (:
Very nice lecture. Good GNN resources, tools and exposure.
Rewatching some of this talk - it is that good!
Great talk! It definitely improved my understanding of GNNs. Thank you!
Hi, Petar, thanks a lot for the talk recording. Could you also release the slides of your talk?
They are provided in the video description now :)
@petarvelickovic6033 Thanks a lot again :)
Hi Petar, this was a nice reframing of GNNs, thanks!
Noting that GAT can treat non-homophilic graphs suggests this analogy to me: if propagation is error smoothing, then attention gives edge-aware smoothing (as in image processing).
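To make the analogy concrete, here is a minimal numpy sketch. It is not the actual GAT mechanism (GAT learns its attention parameters); the dot-product similarity scores and the function names below are purely illustrative assumptions, meant only to show how attention-style weights act like an edge-aware filter compared to uniform propagation.

```python
import numpy as np

def uniform_smoothing(X, A):
    """Average each node's neighbours with equal weights (plain propagation)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    return (A @ X) / deg

def attention_smoothing(X, A):
    """Weight neighbours by softmaxed feature similarity: dissimilar neighbours
    on non-homophilic edges get down-weighted, like an edge-aware filter."""
    scores = X @ X.T                                    # similarity logits (illustrative)
    scores = np.where(A > 0, scores, -np.inf)           # only attend over existing edges
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ X

# Toy graph: node 0 is linked to a similar node (1) and a dissimilar node (2).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])

print(uniform_smoothing(X, A)[0])    # pulled halfway toward the dissimilar node
print(attention_smoothing(X, A)[0])  # stays closer to the similar neighbour
```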
Thank you for the great talk, Petar!
Great presentation. very entertaining and informative. Thanks Petar
Thank you Petar for this talk!
This gentleman is very good at this.
Excellent talk, thank you so much.
I'd be more than happy if you could share the best resources for diving into GNNs applied to combinatorial optimisation problems. 🙏
Thank you for the kind words!
As a matter of fact, we've very recently put out a survey on GNNs for combinatorial tasks:
arxiv.org/abs/2102.09544
@petarvelickovic6033 This is amazing. Thanks
This was a great talk. Thank you!
Thank you very much for the presentation; it was very insightful. I have a minor question. As you pointed out, the situation is a bit different when it comes to tasks with continuous node features. I wonder whether some care needs to be taken when fitting a graph representation into a convolutional neural network? Thank you for your consideration.
This is an interesting area of study. It would be great if you could drop links to PDF books in the comments.
Thanks for a great talk! Very interesting and inspiring! I wonder, is there any research on graph networks in the limit, amenable to analytical treatment, like the NTK regime? Are there any special properties of the loss landscapes not present in more common fully-connected NNs or usual CNNs?
Thank you for the kind words!
While it may not fully align with what you're after, I think you might find the recent paper from Xu et al. on (G)NN extrapolation very interesting: arxiv.org/abs/2009.11848
Herein, the authors make several (nicely visualised) geometrical arguments about the properties of GNNs in the extrapolation regime. The main tool and setting for their analysis is, indeed, the NTK regime.
Excellent presentation! A question came up, though: which flavour of GNN layer could we say GraphSAGE uses for its embedding algorithm? Could the learned weight matrices W be considered fixed-weight inputs of the convolutional GNN layer?
An excellent question -- thanks for asking!
This would depend on which type of GraphSAGE we're looking at :)
GraphSAGE-mean, GraphSAGE-GCN and GraphSAGE-pool are all conv-GNNs. They transform every node in isolation, then use a permutation-invariant aggregator, and do not take the receiver node into account at all. The matrix W is just part of one of the two functions (psi or phi), depending on whether it's applied to individual neighbours or to aggregated vectors.
On the other hand, GraphSAGE-LSTM is not permutation equivariant, and hence does not fit any of the three flavours. It is possible to 'massage' the LSTM aggregator to make it fit, however; see Janossy Pooling (Murphy et al.).
Lastly, I'd note that GraphSAGE's main contribution is its scalability to inductive learning on very large graphs (as per its title ;) ) through neighbourhood sampling. Many of the embedding algorithms it proposes are not unlike models already previously proposed in the literature (e.g. the GCN of Kipf & Welling).
@petarvelickovic6033 Thank you very much for the answer :D
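For readers who want to see the psi/phi decomposition from the reply above in code, here is a minimal numpy sketch of a GraphSAGE-mean-style conv-GNN layer. It is a simplified illustration under assumed conventions, not the paper's reference implementation; the names sage_mean_layer, W_psi and W_phi, and the exact combination step, are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_mean_layer(X, A, W_psi, W_phi):
    """X: (n, d) node features; A: (n, n) 0/1 adjacency without self-loops."""
    msgs = X @ W_psi.T                                 # psi: per-neighbour transform, in isolation
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    agg = (A @ msgs) / deg                             # permutation-invariant mean aggregator
    out = np.concatenate([X, agg], axis=1) @ W_phi.T   # phi: combine with the receiver's features
    return np.maximum(out, 0)                          # ReLU

n, d, h = 4, 3, 5
X = rng.normal(size=(n, d))
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
W_psi = rng.normal(size=(h, d))
W_phi = rng.normal(size=(h, d + h))

H = sage_mean_layer(X, A, W_psi, W_phi)
print(H.shape)  # (4, 5)

# The defining property: permuting the nodes permutes the outputs in the
# same way (permutation equivariance).
P = np.eye(n)[[2, 0, 3, 1]]
assert np.allclose(P @ H, sage_mean_layer(P @ X, P @ A @ P.T, W_psi, W_phi))
```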
Found this on Reddit. Great talk.
Would it be possible for someone without much ML experience (but with a CS degree) to learn the theoretical part of GNNs inside out within a month?
It should be possible, in my opinion.
Especially since GNNs are fundamentally discrete structures, they align very well with the kind of computation typically studied in a theoretical CS degree.
Thank you, Petar :)
Thank you, Petar!! :)
Thank you so much - very good and useful!
The relation between PGMs and graph NNs is not clear.
Can you clarify the concepts?
Really great summary! :)
Why can't you use XGBoost or decision trees for node-level classification instead of a GCN?
Of course you can! Sergey Ivanov et al. recently showed it's a very strong baseline: arxiv.org/abs/2101.08543
@petarvelickovic6033 So when should one use a GCN over XGBoost or decision trees? Not combined.
As far as I know, boosting and decision trees are great for dealing with data that is assumed tabular, i.e. where you don't assume your nodes are linked together. GCNs (and/or more expressive GNNs) should be used whenever you assume that the links between your data points are actually meaningful and should be exploited.
Pretty fast. Rewatching 3-4 times is helpful.
Matrices that commute are jointly diagonalizable. I understand this as: if AB = BA, then A and B have the same eigenvectors?
However, this cannot be true, as I commutes with any matrix, and any vector is an eigenvector of I.
Good catch! I didn't have time in the talk to get into the nuances of this, but essentially, you'll have exactly the same eigenbasis if the matrices commute _and_ have no repeated eigenvalues. If you have repeated eigenvalues (as is the case for I), this theorem becomes more ambiguous to apply.
en.m.wikipedia.org/wiki/Commuting_matrices has a list of properties w.r.t. commutativity and diagonalisation.
I understand the assertion with no repeated eigenvalues. I thought about it a bit more; for repeated eigenvalues, it's like there exists a change of basis that diagonalizes all of them? Any unitary matrix diagonalizes I. Thanks for the reference. The shift matrix was an excellent hint for the next property.
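A small numpy check of the two cases discussed in this thread: commuting symmetric matrices constructed to share an eigenbasis with distinct eigenvalues (so the shared eigenvectors are essentially unique), versus the identity, which commutes with everything but has fully repeated eigenvalues, so any orthonormal basis diagonalises it. The particular matrices are, of course, just an illustrative construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Commuting symmetric matrices built to share an eigenbasis Q, with distinct eigenvalues.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthonormal basis
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T    # distinct eigenvalues
B = Q @ np.diag([5.0, -1.0, 0.5, 2.5]) @ Q.T   # same eigenvectors, different spectrum
assert np.allclose(A @ B, B @ A)               # they commute

# Because A's eigenvalues are distinct, its eigenvectors are essentially unique,
# so B must be diagonal in A's eigenbasis:
_, V = np.linalg.eigh(A)
print(np.round(V.T @ B @ V, 6))                # numerically diagonal

# The identity commutes with everything, but all its eigenvalues are equal, so
# "shared eigenvectors" only means *some* common eigenbasis exists:
C = rng.normal(size=(4, 4)); C = C + C.T       # arbitrary symmetric matrix
I = np.eye(4)
assert np.allclose(C @ I, I @ C)
_, U = np.linalg.eigh(C)
print(np.round(U.T @ I @ U, 6))                # I is diagonal in C's eigenbasis too
```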
Thank you so much!
Good talk. Thanks.
If you are using Transformers, you are using GNNs!
Brain.....melted
The theory is not clearly explained, and the practical side is not explained at all. Worst university.
37:17 Why can't an adjacency matrix be eigendecomposed? AFAIK, any real symmetric matrix is diagonalizable. en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
I believe you were trying to substitute the adjacency matrix with a positive semi-definite matrix that can also express all adjacency properties. That way, the eigendecomposed diagonal matrix Λ only has non-negative values.
Thanks for your note! You are certainly correct, and this is one aspect I hadn't qualified well enough in the talk.
Yes, any undirected adjacency matrix can be eigendecomposed. However, the Laplacian's eigendecomposition has nicer properties. It guarantees all eigenvalues are nonnegative (while the multiplicity of the zero eigenvalue can be used to track connected components), and the resulting eigenvectors can be directly used to approximate many interesting problems on the graph (e.g. optimal cuts).
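As a quick illustration of the properties mentioned in this reply, here is a small numpy example with the combinatorial Laplacian L = D - A of a graph that has two connected components. The toy graph is an arbitrary choice for the example: all eigenvalues come out nonnegative, and the zero eigenvalue appears once per connected component.

```python
import numpy as np

# Two components: a triangle {0, 1, 2} and a single edge {3, 4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                              # combinatorial graph Laplacian

eigvals = np.linalg.eigvalsh(L)        # L is real symmetric, so eigvalsh applies
print(np.round(eigvals, 6))            # approximately [0, 0, 2, 3, 3]: all nonnegative

# Multiplicity of the zero eigenvalue = number of connected components.
print(int(np.sum(np.isclose(eigvals, 0.0))))   # 2
```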