*DeepMind x UCL | Deep Learning Lectures | 2/12 | Neural Networks Foundations*
*My takeaways:*
*1. Plan for this lecture **0:40*
*2. What is not covered in this lecture **1:59*
*3. Overview*
3.1 Neural network applications 3:59
3.2 What started the deep learning revolution 5:10
*4. Neural networks **9:17*
*5. Single-layer neural networks **17:14*
5.1 Activation function: sigmoid 17:53
5.2 Loss function: cross-entropy for binary classification 19:50
5.3 Final activation function for multi-class classification: softmax 23:15
5.4 Uses 26:01
5.5 Limitations 27:28
*6. Two-layer neural networks **28:15*
*7. Tensorflow playground **32:34*
*8. Universal Approximation Theorem **33:55*
*9. Deep neural networks **40:29*
9.1 Activation function: ReLU 41:05
9.2 Intuition behind network depth 44:20
9.3 Computational graphs 49:00
*10. Learning/training **52:27*
10.1 Optimizer: Gradient descent 53:24
10.2 Optimizers that are built on gradient descent: Adam, RMSProp 54:39
10.3 Computational graphs for training 55:45
10.4 Backpropagation, chain rule 57:15
10.5 Linear layers as computational graph 1:00:48
10.6 ReLU layers as computational graph 1:02:30
10.7 Softmax as computational graph 1:03:09
10.8 Cross-entropy as computational graph 1:04:04
10.9 "Cross-entropy Jungles" 1:06:00
10.10 Computational graph example: 3-layer MLP with ReLU 1:06:58
*11. Pieces of the puzzle: max, conditional execution **1:08:20*
*12. Practical issues **1:10:44*
12.1 Overfitting and regularization 1:10:51
12.2 Lp regularization 1:12:44
12.3 Dropout 1:13:14
12.4 As models grow, their learning dynamics changes: double descent 1:13:32
12.5 Diagnosing and debugging 1:16:15
*13. Bonus: Multiplicative interactions **1:19:48*
Lei Xun Thanks for sharing
Fool
Thank you.
Fantastic~!! I held my breath rewinding and rewatching it several times.. such a great lecture.. I'm a humanities graduate, a mom of two with a 10-year career break, and
I've fallen into the fun of getting to know AI.... the more you learn maths, the more stories it turns out to hold....
math is not about numbers but logic and... storytelling as well.
Thank you so much from South Korea.
Great lecture and big thanks to DeepMind for sharing this great content. - Wojciech, keep it up. Super!
oh, these are just projections!.. It's such a great intuition, it finally clicked now.. thanks!
Wow! Great lecture covering the fundamentals. I liked the focus on computational graphs and on reasoning WHY certain components work or don't work.
same, usually the engineering side of things is missing in deep learning lectures
my auto play was on and i just woke up to this
🤣
lol
Very recommendable! However, the tiny inserts providing further reading recommendations are hard to read. I suggest these recommended readings should be included in the text section underneath the video link.
The slides are in the video description; you can copy the titles and authors of the readings from there. :)
@@synaesthesis thank you! Pardon my oversight! :-)
Intuitive, fun, weaved with valuable experience and new research results. Excellent!
You can turn artificial neural networks inside-out by using fixed dot products (weighted sums) and adjustable (parametric) activation functions. The fixed dot products can be computed very quickly using fast transforms like the FFT. Also, the number of overall parameters required is vastly reduced. The dot products of the transform act as statistical summary measures, ensuring good behaviour. See Fast Transform (fixed filter bank) neural networks.
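A toy sketch of that idea (illustrative only; it assumes numpy's FFT as the fixed transform and a per-unit two-slope activation as the adjustable part - not any particular published architecture):

```python
import numpy as np

def parametric_act(x, a, b):
    # the adjustable (learnable) part: per-unit slope a for negative
    # inputs, slope b for positive inputs
    return np.where(x >= 0, b * x, a * x)

def fixed_transform_layer(x, a, b):
    # the "weights" are a fixed fast transform: the FFT computes n dot
    # products against fixed basis vectors in O(n log n) instead of O(n^2)
    mixed = np.fft.fft(x).real
    return parametric_act(mixed, a, b)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
a = rng.standard_normal(8)  # the only trainable parameters: 2n slopes
b = rng.standard_normal(8)
y = fixed_transform_layer(x, a, b)
print(y.shape)  # (8,)
```

So the layer has 2n parameters instead of the n² a dense weight matrix would need.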
53:17 I think there is a mistake in the Jacobian's formula, in the lower-left corner.
yes, it should have been df_k/dx_1
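For anyone following along, the corrected matrix (assuming the slide's convention of rows indexed by outputs f_1..f_k and columns by inputs x_1..x_n) would be:

```latex
J = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_k}{\partial x_1} & \cdots & \frac{\partial f_k}{\partial x_n}
\end{bmatrix}
```

i.e. the lower-left entry is ∂f_k/∂x_1.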
It will be even better if you can release course assignments to public for practice
If a NN can't compute distances without multiplicative dot products how did Alpha Fold calculate the evolution of protein folding states entirely based on distances?
Intuitive and explains in detail.
ua-cam.com/video/r_Q12UIfMlE/v-deo.html
20:50
May I have a brief explanation about what "logarithm of probability of correct/entirely correct classification" in these two slides means?
What is the significance of it and why is it helpful to negate it?
There are lots of reasons and statistical considerations, but as an intuitive argument (not a proof): negating it lets you interpret small values as good and big values as bad, which is exactly how a loss function in ML is interpreted.
You might also look at Shannon's information entropy and associated measures: en.wikipedia.org/wiki/Quantities_of_information . In short, -log(p) is a measure of the amount of information an event e with prob. p carries. Intuitively, it tells you how often you'd have to divide your space of possibilities in half in order to locate/find the event e with absolute certainty among all other events. (see also ua-cam.com/video/v68zYyaEmEA/v-deo.html&ab_channel=3Blue1Brown for a great explanation)
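A quick way to see the "halvings" intuition and the loss interpretation in code (illustrative, not the lecture's notation):

```python
import math

def surprisal_bits(p):
    # -log2(p): how many yes/no halvings it takes to pin down
    # an event of probability p
    return -math.log2(p)

print(surprisal_bits(0.5))   # 1.0  -> one coin flip of information
print(surprisal_bits(0.25))  # 2.0  -> two halvings

# as a loss: -log(p of the correct class) is small when the model is
# confidently right and large when it is confidently wrong
print(-math.log(0.99))  # ~0.01
print(-math.log(0.01))  # ~4.6
```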
Wow it's awesome. Please can I get the slide for this lecture. Thanks 😊
its in video description
A very nice lecture.
Me: being happy about a deep learning lecture.
Also me: seeing complicated formulas and closing the tab.
Don't let it discourage you!
just focus on the concept. There are major APIs that can do the math for you.
It’s not that complicated, don’t give up.
33:29
"Play is the highest form of research" - Albert Einstein
Thank you.
Thank you for the lecture! In 3e-5, what does e stand for?
That is scientific notation: 3e-5 is the same as 3 * 10 ^ -5 which is 0.00003 😊
@@AndreyButenko Thank you so much. so this means that setting the learning rate at 0.00003 is really helpful?
e stands for Euler's constant and it refers in this case to the exponential function see e.g. en.wikipedia.org/wiki/E_(mathematical_constant)
@@bingeltube Hi, I was wondering whether e is a constant before, but it does not make sense to me either. If e were the constant approximately 2.7, then 3e - 5 = 3.1. However, I think an effective learning rate is somewhere between 0.001 and 0.1.
@@bingeltube I think Andrey's answer makes more sense.
Are the slides for this series available?
this
@@vmikeyboi323 where?
@@jingtao1181 When people just comment "this", it usually means something like "I agree". Don't ask me why the word "this", I don't know either.
@ thanks for the reminder
The explanations are a little too handwavy in this lecture. Wojciech seems to assume some intuitions are obvious when they aren't for someone without a lot of experience in the field.
For example, when showing how sigmoids can emulate an arbitrary function, he said we can just "average" the sigmoid and the reversed sigmoid to form a bump, without mentioning that this averaging comes from the softmax (or am I wrong and it is something else?).
Still, I appreciate the effort and think the lecture is great overall. Just need to complement it with other sources.
Speaking of which, I would recommend Chris Olah's post on understanding neural networks through topology: colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Thank you guys at deepmind for this course!
Also at 1:22:06, it's hard to interpret the graphs - it's not immediately obvious what blue and green represent. Why would one assume the labels are obvious? Graphs should always be labelled 🙂
I just think that the averaging with the reverse sigmoid comes from a second neuron. In fact he says there are 6 neurons, and each pair of neurons is a sigmoid + reverse sigmoid; we then have 3 resulting bells (as in the graph), which are then weighted-averaged in the next layer.
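A quick numerical check of that reading - the "bump" falls out of a weighted sum (here a plain difference) of a rising and a reversed sigmoid; the shift and sharpness constants are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bump(x, left=-2.0, right=2.0, sharpness=5.0):
    # neuron 1: rising sigmoid centred at `left`
    # neuron 2: reversed (falling) sigmoid centred at `right`
    # their difference is ~1 between the centres and ~0 elsewhere
    return sigmoid(sharpness * (x - left)) - sigmoid(sharpness * (x - right))

print(round(bump(0.0), 3))   # ~1.0: inside the bump
print(round(bump(10.0), 3))  # ~0.0: outside the bump
```

No softmax needed - just two neurons per bump and a weighted sum in the next layer.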
@@marcospereira6034 And if you know nothing, like me, there's soo soooo much more that he's hand-wavy about and expects you to grasp incredibly quickly... lol
Idk. I felt the entire explanation was quite superficial. I’m not sure whether this depends on the audience though
It is super clear
Can we please get the slides for these series?
They are in the video description Taimur
What software was used to make these slides?
Thanks for this Interesting series. Any place I can get related math knowledge ?
same, would love to have some links to related math content
Go for Engineering Mathematics (rather than pure; note you will find the same topics in pure mathematics too, but learn them from an engineering perspective, so pick courses/books named 'engineering mathematics')
Topics -> Linear Algebra, Multivariate Calculus, Optimisation, graph theory, discrete mathematics & most important in this case, Numerical Methods for _________ (fill in whatever you like)
I forgot to mention - finite precision (floating point) arithmetic
:o there is a writing mistake in "neural networks as computational graphs": "sotfmax" instead of "softmax"
haha
I really like this explanation, but it's still not the best. There are a lot of blind spots in this work. This course is not for beginners.
I have to try hard to hear and understand the lecturer, I wish he spoke more clearly.
Use captions
Can someone explain in a simpler way the term '"Numerically Stable"
@Prasad Seemakurthi Thank you Prasad. It was really helpful
numerical stability refers to the accumulation of errors. an algorithm that is numerically unstable is sensitive to errors and allows them to accumulate, causing the final result to diverge from the correct value.
One more thing to clarify -- the 'error' here really comes from converting things from continuous to discrete. You basically can't implement continuous computations on computers (well, you can with analytical/symbolic maths, but leaving that aside). So what we do --> we discretise (convert to finite-precision numbers), and the resulting small 'errors' accumulate, so the computation doesn't converge (it usually blows up to infinity or collapses to 0). Thus even a function that converges theoretically (in the continuous setting) may not do so in actual computation on computers --> numerically unstable.
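A tiny demo of that error accumulation (standard floating point behaviour, nothing specific to the lecture):

```python
import math

# 0.1 has no exact binary representation, so every addition carries a
# tiny rounding error, and repeated addition lets those errors accumulate
total = 0.0
for _ in range(1000):
    total += 0.1
print(total)           # slightly off from 100.0
print(total == 100.0)  # False

# a numerically stabler summation tracks the lost low-order bits
print(math.fsum([0.1] * 1000) == 100.0)  # True
```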
1:04:30 Hold on… Are you telling me that, given a neural network and a set of weights, I could generate a picture of the most doggish dog ever? :D
Edit: The answer is "yes"! ua-cam.com/video/shVKhOmT0HE/v-deo.html
35:57
9:18
*Neural Net self destructs*