Why do Convolutional Neural Networks work so well?

  • Published 24 Dec 2024

COMMENTS • 104

  • @user-wv1po8hf8k
    @user-wv1po8hf8k 24 days ago +2

    Best video series ever, finally answering the real questions I had about HOW they do what they do, not the steps they follow

  • @ozachar
    @ozachar 1 year ago +18

    As a physicist, I recognize this process as a "real space renormalization group" procedure in statistical mechanics. So each layer is equivalent to a renormalization step (a coarse graining). The renormalization flows are then the gradual flow towards a resolution decision of the neural net. It makes the whole "magic" very clear conceptually, and also automatically points the way to less trivial renormalization procedures known in theoretical physics (not just simple real space coarse graining). The clarity of videos like yours is so stimulating! Thanks

  • @dradic9452
    @dradic9452 1 year ago +56

    Please make more videos. I've been watching countless neural networks videos and until I saw your two videos I was still lost. You explained it so clearly and concisely. I hope you make more videos.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +12

      Thanks for the comment, it's great to hear you found the videos useful. I was unexpectedly busy with my job the past few months, but rest assured I am still working on the transformer video.

  • @algorithmicsimplicity
    @algorithmicsimplicity  2 years ago +57

    Transformer video coming next! I'm still getting the hang of animating, but the transformer video probably won't take as long to make as this one. I haven't decided what I will do after that, so if you have any suggestions/requests for computer science, mathematics or physics topics let me know.

    • @bassemmansour3163
      @bassemmansour3163 2 years ago +1

      what program are you using for animation? thanks!

    • @algorithmicsimplicity
      @algorithmicsimplicity  2 years ago +4

      I'm using the python package manim: github.com/ManimCommunity/manim

    • @davidmurphy563
      @davidmurphy563 2 years ago +6

      I'd say probably RNNs would flow nicely from this [excellent] video. GANs too I guess. Autoencoders for sure. Oh, LSTMs, the memory problem is a fascinating one. Oh and Deep Q-Networks.
      Meh, the field is so broad you can't help but hit. I'd say RNNs first as going from images to text seems a natural progression.

    • @wissemrouin4814
      @wissemrouin4814 2 years ago +2

      @@davidmurphy563 yess please, I guess RNNs need to be presented even before transformers

    • @davidmurphy563
      @davidmurphy563 2 years ago +2

      @@wissemrouin4814 Yeah, I would agree with you there. RNNs serve as a good introduction to a lot of the approaches you'll see for sequence-vector problems, and their drawbacks explain the development of transformers.
      I'd suggest RNNs, then LSTMs, then transformers.
      That said, this channel has done sterling work explaining everything so far so I'm sure he'll do a great job even if he dives straight into the deep end.

  • @IllIl
    @IllIl 1 year ago +43

    Dude, your teaching style is absolutely superb! Thank you so much for these. This surpasses any of the explanations I've come across in online courses. Please make more! The way you demystify these concepts is just in a league of its own!

  • @joshlevine4221
    @joshlevine4221 4 months ago +5

    3:02 _Strictly_ speaking, there are only a finite number of images for any given image size and pixel depth, so each one can be uniquely described by a single number (and it is even an integer!). These "image numbers" cover a very, very, very wide and sparsely-filled range, but the "image number" still only has a single dimension. Thank you for the great video!

    • @dillxn554
      @dillxn554 1 month ago

      Great point. At 4:47, my thought is that 8 bits are needed to cover values 0-255, so it takes `3,072 color vals x 8 bits ea. = 24,576 bits` to represent an image, and therefore there are `2^24576` potential images? I don't know where the 9 came from.
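
      As a back-of-the-envelope check of this counting (a hypothetical Python sketch, not from the video; arbitrary-precision integers make the numbers easy to verify):

      ```python
      import numpy as np

      # Count the possible 32x32 RGB images (CIFAR-10 sized), as discussed above.
      height, width, channels = 32, 32, 3
      num_values = height * width * channels             # 3,072 colour values per image
      num_bits = num_values * 8                          # 24,576 bits per image
      num_images = 256 ** num_values                     # = 2**24576 possible images
      print(num_values, num_bits, len(str(num_images)))  # 3072, 24576, ~7,400 digits

      # Any one image can also be read off as a single (enormous) integer,
      # e.g. by treating its bytes as one base-256 number.
      img = np.random.randint(0, 256, size=(height, width, channels), dtype=np.uint8)
      image_number = int.from_bytes(img.tobytes(), "big")
      ```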

  • @rohithpokala
    @rohithpokala 1 year ago +2

    Bro, you are a real superman. This video gave so many deep insights in just 15 minutes, providing such a strong foundation. I can confidently say this video single-handedly blew away thousands of neural network videos on the internet. You raised the bar so high for others to compete. Thanks.

  • @Number_Cruncher
    @Number_Cruncher 2 years ago +10

    This was a very cool twist in the end with the rearranged pixels. Thx, for this nice experiment.

  • @nananou1687
    @nananou1687 7 months ago +2

    This is genuinely one of the best videos I have ever seen, no matter the type of content. You have somehow taken one of the most complicated topics and simply distilled it to this. Brilliant!

  • @thomassynths
    @thomassynths 1 year ago +1

    This is by far the best explanation of CNNs I have ever come across. The motivational examples and the presentation are superb.

  • @ZetaReticulli
    @ZetaReticulli 1 year ago +5

    @4:17 Why are 9^N points required to densely fill N dimensions? Where is the 9 derived from? Is it for the purpose of the example given, or a more general constraint?

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +5

      It is a completely arbitrary number, just for demonstration purposes. In general, in order to fill a 1d interval of length 1 to a desired density d you need d evenly spaced points. To maintain that density in an n-d volume you need d^n points. I just chose d=9 for the example. And the more densely filled the input space is with training examples, the lower the test error of a model will be.
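
      As a concrete illustration of that d^n scaling (a small sketch, with d = 9 chosen only to match the example in the video):

      ```python
      # Points needed to fill an n-dimensional unit volume at a per-axis density d.
      def points_needed(d: int, n: int) -> int:
          return d ** n   # d evenly spaced points per axis => d**n points in total

      d = 9
      for n in (1, 2, 3, 10):
          print(n, points_needed(d, n))          # 9, 81, 729, 3486784401

      # For a 32x32x3 image, n = 3072:
      print(len(str(points_needed(d, 3072))))    # about 2,930 digits -- hopelessly many points
      ```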

    • @senurahansaja3287
      @senurahansaja3287 1 year ago

      @@algorithmicsimplicity thank you for your explanation, but here ua-cam.com/video/8iIdWHjleIs/v-deo.html the dimensional points mean the input dimension, right?

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +1

      @@senurahansaja3287 Yes that's correct.

  • @benjamindilorenzo
    @benjamindilorenzo 9 months ago +3

    The best video on CNNs. Please make a video about V-Jepa, the proposed SSL architecture from Yann LeCun.
    Also it would be nice to have a deeper look at Diffusion Transformers or Diffusion in general.
    Really really good work man!

  • @bassemmansour3163
    @bassemmansour3163 2 years ago +6

    best illustrations in the subject. thank you for your work!

  • @j.j.maverick9252
    @j.j.maverick9252 2 years ago +7

    another superb summary and visualisation, thank you!

  • @connorgoosen2468
    @connorgoosen2468 1 year ago +2

    How has the YouTube algorithm not suggested you sooner? This is such a great video, just subscribed and keen to see how the channel explodes!

  • @jcorey333
    @jcorey333 10 months ago +1

    This is one of the best explanations I've seen! Thanks for making videos

  • @manthanpatki146
    @manthanpatki146 11 months ago +1

    Man, keep making more videos, this is a brilliant video

  • @illeto
    @illeto 7 months ago +3

    Fantastic videos.
    Here before you inevitably hit 100k subscribers.

  • @yoavtamir7707
    @yoavtamir7707 3 months ago +1

    You are explaining so so so well. Thanks and keep going!!!!!!!!!!!

  • @jollyrogererVF84
    @jollyrogererVF84 1 year ago +1

    A brilliant introduction to the subject. Very clear and informative. A good base for further investigation.👍

  • @PotatoMan1491
    @PotatoMan1491 6 months ago +1

    Best video I found for explaining this topic

  • @djenning90
    @djenning90 1 year ago +2

    Both this and the transformers video are outstanding. I find your teaching style very interesting to learn from. And the visuals and animations you include are very descriptive and illustrative! I’m your newest fan. Thank you!

  • @escesc1
    @escesc1 7 months ago +1

    This channel is top notch quality. Congratulations!

  • @khoakirokun217
    @khoakirokun217 7 months ago +3

    I love that you point out that we have "super human capability" because we are pre-trained with assumptions about the spatial information :D TLDR: "we are sucked" :D

  • @sergiysergiy8875
    @sergiysergiy8875 1 year ago +1

    This was great. Please, continue your content

  • @nadaelnokaly4950
    @nadaelnokaly4950 8 months ago +2

    wow!! ur channel is a treasure

  • @BenjaminDorra
    @BenjaminDorra 4 months ago +1

    Thank you for this fascinating video!
    It is a very original angle on the effectiveness of CNNs. I have never seen this approach; most articles and videos focus on the reduction in parameters and computation compared with a base MLP, or on image compression.
    Interestingly, you don't talk about pooling, a staple of CNN architectures. Arguably it is mostly for computational efficiency, but I have seen a bit of debate on the subject (max pooling being especially polarizing).

    • @algorithmicsimplicity
      @algorithmicsimplicity  4 months ago +1

      My goal in this video is to explain why CNNs generalize better than other architectures. It is true that CNNs are more computationally efficient than MLPs, but there are other ways to improve the efficiency of MLPs. In particular, in this video the "Deep neural network" that I am comparing to is not an MLP, but an MLP-mixer. This MLP-mixer is just as parameter- and compute-efficient as the CNN (using an almost identical architecture); the only difference between them is that in the CNN each neuron sees a 3x3 patch, and in the MLP-mixer each neuron sees information from the entire image. This difference, and this difference alone, results in the ~20 percentage point accuracy increase.
      Max-pooling has generally been used to improve efficiency. Sometimes max-pooling can improve accuracy, but only by about 1-2%. In other cases, max-pooling can actually reduce accuracy. The main reason to use it is just to reduce computation. Because of this I don't consider max-pooling to be fundamental to the success of CNNs; you can build CNNs without max-pooling and they work fine.
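
      For illustration, a minimal PyTorch-style sketch of that single difference (hypothetical layer sizes, not the exact models trained for the video):

      ```python
      import torch
      import torch.nn as nn

      channels, height, width = 64, 32, 32
      x = torch.randn(1, channels, height, width)

      # CNN layer: each output neuron only sees a local 3x3 patch of its input.
      conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

      # Mixer-style layer: each output neuron sees information from every spatial
      # location (one linear map shared across channels, applied to the flattened image).
      spatial_mix = nn.Linear(height * width, height * width)

      y_local = conv(x)                                # (1, 64, 32, 32)
      y_global = spatial_mix(x.flatten(2)).view_as(x)  # (1, 64, 32, 32)
      ```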

  • @terjeoseberg990
    @terjeoseberg990 1 year ago +2

    I believe that the main advantage of convolutional neural networks over fully connected neural networks is the computational savings and the increased training data.
    A convolutional neural network is basically a tiny fully connected network that's being trained on every NxN square of every image imaginable. This means that a 256x256 image is effectively turned into 254x254, or 64,516, tiny images. If you start with 1 million images in your training data, you now have 64.5 billion 3x3 images that you're going to train the tiny neural network on.
    You can then create 100 of these tiny neural networks for the first layer, another 100 for the second layer, and another 100 for the third layer, and so on for 10 to 20 layers.
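
    For the patch counting above, a quick sketch (assuming a 3x3 window, stride 1 and no padding):

    ```python
    # Number of 3x3 patches a convolution slides over in a 256x256 image.
    image_size, kernel_size, stride = 256, 3, 1
    patches_per_side = (image_size - kernel_size) // stride + 1   # 254
    patches_per_image = patches_per_side ** 2                     # 64,516

    num_training_images = 1_000_000
    total_patches = patches_per_image * num_training_images       # 64,516,000,000 (~64.5 billion)
    print(patches_per_side, patches_per_image, total_patches)
    ```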

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago

      I think these 2 reasons are the most commonly cited reasons for the success of CNNs (along with translation invariance, which is absolutely incorrect), but I don't think these 2 things are sufficient to explain the success of the CNN.
      It is true that a CNN uses much less computation than a fully connected neural network, but there are other ways to make deep neural networks which are just as computationally efficient as CNNs. For example, using an MLP-Mixer style architecture in which a linear transform is first applied independently across channels to all spatial locations, and then a linear transform is applied independently across spatial locations to all channels. In fact, this is exactly what I used when making this video! The "Deep Neural Network" I used was precisely this; it would have taken too long to train a deep fully connected neural network. This MLP-Mixer variant uses the same computation as a CNN but allows each layer to see the entire input, which is why it achieves lower accuracy than a CNN.
      As for the increased training data size, it is possible this helps, but even if you multiply your dataset size by 100,000, it is still nowhere near the amount of data you would expect to need to learn in 256*256 dimensional space. Also, if it were merely the increased training data, then I would expect CNNs to perform better than DNNs even on shuffled data (after all, having more data should still help in this case). But in fact we observe the opposite: CNNs perform worse than DNNs when the spatial structure is destroyed.
      For these reasons I believe that the fact that each layer sees a low effective dimensional input is necessary and sufficient to explain the success of CNNs.
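
      For reference, a rough sketch of such a mixer block (assumed shapes and layer sizes, not the exact code used for the video):

      ```python
      import torch
      import torch.nn as nn

      class MixerBlock(nn.Module):
          """Mix across channels at every location, then across locations for every channel."""
          def __init__(self, num_channels: int, num_locations: int):
              super().__init__()
              self.channel_mix = nn.Linear(num_channels, num_channels)
              self.spatial_mix = nn.Linear(num_locations, num_locations)

          def forward(self, x):                  # x: (batch, locations, channels)
              x = self.channel_mix(x)            # applied independently at each spatial location
              x = self.spatial_mix(x.transpose(1, 2)).transpose(1, 2)  # applied per channel
              return x

      # Example: a 32x32 image flattened to 1024 spatial locations with 3 channels.
      block = MixerBlock(num_channels=3, num_locations=32 * 32)
      out = block(torch.randn(8, 32 * 32, 3))    # (8, 1024, 3)
      ```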

    • @terjeoseberg990
      @terjeoseberg990 1 year ago

      @@algorithmicsimplicity, It’s a combination of multiplying the dataset size by 64,500 and reducing the network size from 256x256 to 3x3. In fact it’s the reduction of the network size to 3x3 that’s allowing the effective 64,500 times increase in dataset size. It’s not one or the other, but both. Each weight gets a whole lot more training/gradient following.
      You should do a video on the MLP-Mixer, and how it compares to CNN.

  • @panizzutti
    @panizzutti 2 months ago +1

    Bro your videos make me understand so well wtf

  • @montanacaleb
    @montanacaleb 4 months ago +3

    You are the 3blue1brown of ml

  • @thetntsheep4075
    @thetntsheep4075 4 months ago +1

    At 14:00 with the rearranged pixels, do you mean every image in the dataset has the pixels rearranged in the same way? If they were rearranged in a different random way for each image, I don't see how you could learn classification well at all.

    • @algorithmicsimplicity
      @algorithmicsimplicity  4 months ago

      Yes, I do mean in the same way. The same permutation is applied to every image. This is equivalent to shuffling the columns of a tabular dataset. It has no effect on fully connected neural networks, but severely impacts CNNs.
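
      For illustration, a minimal sketch of that fixed shuffle (hypothetical NumPy stand-in data; the key point is that one permutation is reused for every image):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)  # stand-in dataset

      num_pixels = 32 * 32
      perm = rng.permutation(num_pixels)   # ONE permutation, fixed for the whole dataset

      flat = images.reshape(len(images), num_pixels, 3)
      shuffled = flat[:, perm, :].reshape(images.shape)  # same pixel re-ordering for every image
      ```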

  • @joshmouch
    @joshmouch 1 year ago

    Yeah. Jaw dropped. This is an amazing explanation. More please.

  • @anangelsdiaries
    @anangelsdiaries 10 months ago +2

    Fam, your videos are absolutely amazing. I finally understand what the heck a CNN is. Thanks a lot!

  • @justchary
    @justchary 1 year ago

    I do not know who you are, but please continue! You definitely have a vast knowledge of the subject, because you can explain complex things simply.

  • @VictorWinter-n2i
    @VictorWinter-n2i 1 year ago

    Really nice! What tool did you use to do those awesome animations?

  • @pedromartins9889
    @pedromartins9889 11 months ago

    Great video. You explain things really well. My only complaint is that you don't cite references. Citing references (which can be done simply as a list in the description) makes your less obvious statements more sound (like the claim that the number of significant outputs of a layer is more or less constant and small; I understand it would be very hard to explain this while maintaining the flow of the video, but if there were a link in the description to that explanation, or at least to a practical demonstration, the viewer could, if they wanted, understand it better or at least be more sure that it is really true). Citing references also helps the viewer a lot if they want to study the topic further (and this is fair, since you already did the research for the video, so it costs you much less to show your sources than it costs the viewer to rediscover them). In summary: citing references gives you more credibility (in a digital world filled with so much bullshit) and gives interested viewers a great deal of help in going deeper on the topic. Don't be mistaken, I really like your channel.

  • @KarlyVelez-u2k
    @KarlyVelez-u2k 1 year ago +1

    Your videos are extremely good, especially for such a small channel. Great video! Can you do one on Recurrent Neural Networks please?

  • @5_inchc594
    @5_inchc594 1 year ago +2

    amazing content thanks for sharing!

  • @GaryBernstein
    @GaryBernstein 1 year ago

    Can you explain how the NN produces the important-word-pair information-scores method described after 12:15 from the sentence problem raised at 10:17? Can you recommend any tg groups for this Q & topic?

  • @neithanm
    @neithanm 1 year ago +1

    I feel like I missed a step. The layers on top of the horse looked like a homogeneous color. Where's the information? I was expecting to see features building up from small parts to recognizing the horse, but ...

  • @reubenkuhnert6870
    @reubenkuhnert6870 1 year ago

    Excellent content!

  • @JorgeSolorio620
    @JorgeSolorio620 2 years ago +3

    Great video! Can you do one on Recurrent Neural Networks please 🙏🏽

  • @Emma2-cg5jh
    @Emma2-cg5jh 7 months ago

    Where do the performance values for the rearranged images come from? Did you make them yourself, or is there a paper for that?

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago

      All of the accuracy scores in this video are from models I trained myself on CIFAR10.

  • @HD-Grand-Scheme-Unfolds
    @HD-Grand-Scheme-Unfolds 1 year ago

    @AlgorithmicSimplicity greetings, may I ask: in your video presentation could you please specify in what sense you mean "Randomly re-order the pixels" (13:55)? Let me explain my question. I know you mean reshuffling the permutation order of the set of input pixels; by "in what sense" I mean: is it (A) a unique random re-order seed for each training example (as in, for every picture), OR (B) the same random re-order seed for each training example?
    If you meant it in the sense of "A", I would be amazed the convolution-net can get that 62.9% accuracy you mentioned earlier. That 62.9% would be more believable for me if you meant it in the sense of "B".

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +3

      I meant in the B sense, same shuffle applied to every image in the dataset (training and test). If it was a different random shuffle for each input then no machine learning model (or human) would ever get above 10% accuracy. If you have some experience with machine learning, this operation is equivalent to shuffling the columns of a tabular dataset which of course all standard machine learning algorithms are invariant to.

    • @HD-Grand-Scheme-Unfolds
      @HD-Grand-Scheme-Unfolds 1 year ago

      @@algorithmicsimplicity lol, speaking in hindsight, your point is now taken. Dwl lmao😄🤣. Which human or person... but let me play devil's advocate for entertainment and curiosity purposes a bit: if it were somehow in sense "A", then I'd imagine that would imply a phenomenon we might all call pure memorization at its finest.
      But to get back on the main track, I love that you went out of your way to make that clear in your presentation; yours is the second video that mentioned it, but you were the first to settle the big question (that I already asked you, thanks again).
      By the way, while I have the opportunity I would like to ask: do you know where a non-programmer may find an intuitive, interactive, GUI-based executable program that simulates and implements recurrent neural networks (ideally the simple RNN; I'd prefer not, but will accept, LSTMs or GRUs)? GitHub, for example, mostly accommodates those who meet a coding-knowledge prerequisite. "MemBrain" fits the concept, but its RNN is still puzzling for me to figure out, train, test, etc. (it is the most promising one to try to work with so far); "Neuroph Studio" fits the concept but has no RNN support; and "Knime Analytics Platform" still leans on coding skills, disguised as a GUI with clicks and parameter controls, and its rules for arrangements are too complex and counter-intuitive. IBM Watson Studio seems similar, and MATLAB is a puzzle box too.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +1

      I'm afraid I don't know of any GUI programs that simulate RNNs explicitly, but I do know that RNNs are a subset of feedforward NNs. That is, it should be possible to implement an RNN in any of those programs you suggested. All you would need to do is have a bunch of neurons in each layer that copy the input directly (i.e. the i'th copy neuron should have a weight of 1 connected to the i'th input and 0 for all other connections), and then force all neuron weights to be the same in every layer. That will be equivalent to an RNN.
      I would also recommend you just try and program such an app yourself. Even if you have no experience programming, you can just ask ChatGPT to write the code for you 😄.
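
      As a rough illustration of the weight-tying idea (a hypothetical minimal RNN written as one shared dense layer re-applied at every step):

      ```python
      import numpy as np

      def rnn_as_tied_feedforward(inputs, W_in, W_hid, b):
          """Run a simple RNN by re-applying one shared layer at every time step."""
          hidden = np.zeros(W_hid.shape[0])
          for x_t in inputs:                    # every "layer" reuses the same weights
              hidden = np.tanh(W_in @ x_t + W_hid @ hidden + b)
          return hidden

      rng = np.random.default_rng(0)
      W_in, W_hid, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
      sequence = [rng.normal(size=4) for _ in range(5)]
      print(rnn_as_tied_feedforward(sequence, W_in, W_hid, b).shape)  # (8,)
      ```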

  • @mrfurious60
    @mrfurious60 4 months ago +1

    How do we go from a 3 by 3 feature map to a 5 by 5 image?

    • @algorithmicsimplicity
      @algorithmicsimplicity  4 months ago

      In the second layer, the input is a 3x3 grid of outputs from the first layer. In the first layer, each output is computed from a different 3 by 3 grid of pixels. Therefore, the input to the second layer contains information from 9 different overlapping 3x3 patches, which means it sees information from a 5x5 patch of pixels.
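
      As a quick check of that receptive-field arithmetic (a sketch assuming 3x3 windows with stride 1 at every layer):

      ```python
      # Each extra 3x3, stride-1 layer widens the patch of input pixels a neuron sees by 2.
      def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
          return 1 + num_layers * (kernel_size - 1)

      for n in range(1, 6):
          print(n, receptive_field(n))   # 1 -> 3, 2 -> 5, 3 -> 7, 4 -> 9, 5 -> 11
      ```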

    • @mrfurious60
      @mrfurious60 4 months ago

      @@algorithmicsimplicity Thanks man. I guess I'm a little too slow because while I get how the first layer gives us what it does, the second layer is still a problem 😅.

  • @bobuilder4444
    @bobuilder4444 7 months ago

    13:09 How would you know which numbers to remove?

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago

      You can simply order the weights by absolute value and remove the smallest weights (the ones closest to 0). This probably isn't the best way to prune weights, but it already allows you to prune about 90% of them without any loss in accuracy: arxiv.org/abs/1803.03635
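
      As a rough sketch of that magnitude-based pruning (a hypothetical NumPy version; the linked paper additionally re-trains after pruning):

      ```python
      import numpy as np

      def magnitude_prune(weights: np.ndarray, fraction: float = 0.9) -> np.ndarray:
          """Zero out the given fraction of weights with the smallest absolute values."""
          threshold = np.quantile(np.abs(weights), fraction)
          pruned = weights.copy()
          pruned[np.abs(pruned) < threshold] = 0.0
          return pruned

      w = np.random.default_rng(0).normal(size=(256, 256))
      w_pruned = magnitude_prune(w)
      print((w_pruned == 0).mean())   # roughly 0.9 of the weights are now zero
      ```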

    • @bobuilder4444
      @bobuilder4444 7 months ago

      @@algorithmicsimplicity Thank you

  • @jamespogg
    @jamespogg 1 year ago

    amazing vid good job man

  • @Isaacmellojr
    @Isaacmellojr 11 months ago

    More videos please! You have a gift!!

  • @uplink-on-yt
    @uplink-on-yt 1 year ago

    12:58 Wait a minute... Did you just describe neural pruning, which has been observed in young human brains?

  • @ThankYouESM
    @ThankYouESM 1 year ago

    Seems like the bag-of-words algorithm can do a faster job at image recognition since it doesn't need to read a pixel more than once.

  • @aydink7739
    @aydink7739 1 year ago +1

    This is wow, finally understand the „magic“ behind CNNs. Bravo, please continue 👍🏽

  • @Walczyk
    @Walczyk 11 months ago

    7:04 this is just like the boost library from microsoft

  • @pypypy4228
    @pypypy4228 1 year ago

    It's brilliant!

  • @solaokusanya955
    @solaokusanya955 1 year ago +1

    So technically, what the computer sees or not is highly dependent on "whatever" we as humans dictate it to be...

  • @nageswarkv
    @nageswarkv 1 year ago

    definitely a good video, not a fluff video

  • @lilep666
    @lilep666 2 months ago

    2:36 you just put images of dogs and cats arbitrarily on the x axis?? what? what does x even mean in this example and how do you decide what "x" value every image gets? x has no meaning here so what does it matter what curve you fit??

  • @scarletsence
    @scarletsence 1 year ago

    These are god-like visualizations, thanks.

  • @peki_ooooooo
    @peki_ooooooo 1 year ago

    Hi, how's the next video?

  • @yourfutureself4327
    @yourfutureself4327 1 year ago

    💚

  • @blonkasnootch7850
    @blonkasnootch7850 1 year ago

    Thank you for the video. I am not sure if it is right to say that humans have knowledge about how the world works built into the brain from birth... accepting vision input for data processing, detecting objects, or separating regions of interest is something every baby clearly has to learn. I have seen that with my children; it is remarkable, but not there from the beginning.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +1

      Of course children still need to learn how to do visual processing, but the fact that children can learn to do visual processing implies that the brain already has some structure about the physical world built into it. It is quite literally impossible to learn from visual inputs alone, without any prior knowledge.

  • @jameswustaken3862
    @jameswustaken3862 4 months ago +1

  • @lolikobob
    @lolikobob 1 year ago

    Make more good videos!

  • @thechoosen4240
    @thechoosen4240 1 year ago

    Good job bro, JESUS IS COMING BACK VERY SOON; WATCH AND PREPARE