Convolution Neural Networks - EXPLAINED

CodeEmporium

Published 4 Dec 2024

COMMENTS • 153

@taihatranduc8613 4 years ago ⁺⁴
You made me realize there are indeed other YouTubers who "don't really know much about" what they're saying (0:17). You explain it the best way on YouTube, especially the structure of the CNN.
@HafeezUllah 3 years ago ⁺¹
I had no idea about CNNs at all. This was great and has given me immense confidence in learning about CNNs. Great video, explained beautifully from scratch to end.
@prodbreeze 5 months ago ⁺¹
YOU HAVE MADE ME ACTUALLY LIKE ML DL for the first time
@darasingh8937 3 years ago ⁺¹
Thanks a lot for not having a superficial touch of the topic. Keep it up!
@sneha_more 2 years ago
The way you explained it made me feel like I didn't know much about CNNs. I wonder when you read so many papers. Thanks for sharing your knowledge. Helps a lot.
@neillunavat 4 years ago ⁺¹
You explain better than well-established organizations boi!! Keep it up.
@insidiousmaximus 3 years ago ⁺⁴
mate I have been working as a junior AI engineer for over a year now and I have successfully deployed custom-built CNNs on nvidia hardware but I am still learning from your videos! Just discovered and watched them all back to back. Best videos I have found and I watch a hell of a lot of videos on this topic! I have also read some hardcore books on it. Your videos are par excellence, please keep making them! Would love to see some practical examples. There are many tutorials on how things like segmentation and superpixels WORK but nobody wants to show us how to actually implement them into a custom network and display the results, i.e. detect flame or smoke. When it comes to practical solutions nobody really goes beneath the provided API examples! Very frustrating.
@ozancanacar8237 6 years ago ⁺³¹
Thank you so much! Everyone is just explaining like: "So this is convolution and that generates these numbers and this is our feature cubes and you apply pooling and get that... let's jump in to the python code i wrote in 5 weeks but imma explain in 15 seconds". You've explained all these concepts clearly and one by one. Can you make a video about training the CNN? It would be awesome.
@SuperMaDBrothers 2 years ago
5 weeks? Nah bro they're not as dumb as you are lol. But seriously code is a shit way of explaining something. You should check out lectures from universities though, this video was pretty shit too
@abhilasht6471 5 years ago ⁺¹⁴
Thank you so much for an amazing video. Even after going through several videos I did not get the concept clear; after this video all of my doubts are cleared.
Please make hands-on tutorials, it's a humble request. Hope to see you soon.
Small correction @16:40: the calculation gives 12.5, not 13.5: (26-2+1)/2 = 12.5
@Bilangumus 1 year ago
Still relevant today, thanks.
@xxdxma6700 2 years ago
Such an amazing video man. The best educational video I have watched in a while
@JohnUsp 4 years ago ⁺⁹
17:00 - From 13x13x32 to conv3x3,64. How is the volume/depth of 32 handled? I understand the result of 11x11x64 (filters), but are those 32 layers summed/packed and sent to conv3x3x64?
@thomasmarsden1870 3 years ago ⁺³
lmao I have the same question. Pretty sure there are 64 filters of size 3*3*32.
@anemoiacApache 5 years ago ⁺¹¹
Should've found this a month ago before I proceeded to try and learn this on the fly and just embarrassed myself in front of my department
@sharpshootoyaj 4 years ago
This is genuinely a brilliant explanation. Many thanks
@TheRealJackfrog 4 years ago
Well done! Your voice and method left me wanting a more detailed explanation from you.
@TheRealJackfrog 4 years ago
Maybe you could give that explanation over a cup of hot chocolate by the fire as we cuddle up, listening to the latest episode of the Lex Fridman Podcast together. We laugh as Lex goes off on some profound tangent about how the human mind is hard to understand. "That's not the only thing that's hard" I think to myself, as you spoon me ever so gently. It's a perfect night. Just you and me, by the fire, as the sky darkens outside the cabin windows. I know that you could never leave me wanting more...
Sorry, I got a paper due in 9 days that I don't want to write.
@smealzzon 5 years ago ⁺¹
Great video, filled in a lot of gaps of understanding.
@mehdisoleymani6012 2 years ago ⁺¹
Be careful! Thank you. At 17:28 in the clip there is a mistake in the equation: 13-3+1=11 is correct, but you have typed 13-2+1=11.
@abdulcustom 3 years ago ⁺¹²
This is a great video. I have one small doubt. @17:11 How do you apply 64 kernels on 32 response maps and get 64 response maps in the next layer?
@gentix8564 2 years ago ⁺²
Remember the depth of each filter is 32. So actually, you apply 64 filters of size 3*3*32, which is why the output depth is 64.
@npip99 2 years ago
Thank you for this question! Wondering the same thing!
@npip99 2 years ago
Ah thank you, so each takes the 3x3 over all of the previous filters.
@ttb1513 1 year ago
17:27 Out.width = 13 - 2 + 1 = 11. Something is wrong here, as 13-2+1 is 12.
@sciWithSaj 3 years ago
Thank you very much.
Cleared lots of doubts.
@sokiprialajonah4932 4 years ago ⁺¹
This video really helped me a lot
@raghavamorusupalli7557 2 years ago
Location independence is an important feature
@TawhidShahrior 2 years ago
man you are a genius.
@shrutiprasad3354 4 years ago
greatest of all the other videos
@CodeEmporium 4 years ago
Thanks for the compliments :)
@Geoters 6 years ago ⁺¹⁰
Sorry, one point is not clear. After the first convolution (and maxpool) we end up with 13x13x32. Then conv3x3,64 is applied. How does that work? We had 32 layers (feature maps). If we apply conv3x3,64 to each layer we would end up with 32x64 layers. But we end up with only 64 layers. Thanks
@CodeEmporium 6 years ago ⁺¹
When we have a 13×13×32 volume and apply one filter of 3×3×32, we get an 11×11 feature map. So if we apply 64 such filters to the 13×13×32 volume, we end up with 64 such 11×11 feature maps. In other words, an output of 11×11×64.
@Geoters 6 years ago
Sorry, allow me to rephrase the question. At 4:50 you apply the conv filter 3x3x1 to a 5x5x1 image. Basically just weighting and adding the pixels that fit into the 3x3 square. How would you apply a 3x3x1 filter to a 5x5x2 image (2 layers of 5x5x1)? By weighting and adding pixels from both layers?
@CodeEmporium 6 years ago
Depth of the filter and the input should be the SAME. A 3 x 3 x 1 filter convolves with a 5 x 5 x 1 image as they have the same depth (1). But in the case of 5 x 5 x 2, we NEED to apply a filter of shape 3 x 3 x 2. A 3 x 3 x 1 filter will only convolve with one of the 5 x 5 x 1 layers. We don't take the average of both layers as they represent different data. Hope that makes sense.
@Geoters 6 years ago
15:35. After the first convolution and pooling we end up with 13x13x32. So how do we apply convolution 3x3x64 to it? We have 32 layers of 13x13 grids. So now we apply a 3x3 convolution filter 64 times and end up with 64 layers. How do we do it since we have 32 layers in the source?
@CodeEmporium 6 years ago ⁺³
We don't apply convolution with a 3 x 3 x 64 filter. We apply 64 filters of shape 3 x 3 x 32, each convolved with the 13 x 13 x 32 input. The result of each convolution is an 11 x 11 output. Since we have 64 such convolution operations, we end up with 11 x 11 x 64. Just note the OUTPUT depth is equal to the number of filters chosen for convolution. And the depth of each filter is equal to the depth of the INPUT.
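The shape rule in the reply above can be checked with a small NumPy sketch. This is only an illustration, not the code from the video: the function name conv2d_valid is hypothetical, the values are random, and like most deep learning libraries it slides the filter without flipping it (stride 1, no padding).

```python
import numpy as np

def conv2d_valid(volume, filters):
    """Naive stride-1, no-padding convolution (hypothetical helper).

    volume:  (H, W, C_in)
    filters: (num_filters, k, k, C_in), each filter spans the full input depth
    returns: (H - k + 1, W - k + 1, num_filters)
    """
    h, w, c_in = volume.shape
    n, k, _, f_depth = filters.shape
    assert f_depth == c_in, "filter depth must equal input depth"
    out = np.zeros((h - k + 1, w - k + 1, n))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = volume[i:i + k, j:j + k, :]  # shape (k, k, C_in)
            # Each filter yields ONE number per position: multiply element-wise
            # and sum over height, width AND depth.
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
volume = rng.standard_normal((13, 13, 32))     # output of the first conv + pool
filters = rng.standard_normal((64, 3, 3, 32))  # 64 filters of shape 3 x 3 x 32
out = conv2d_valid(volume, filters)
print(out.shape)  # (11, 11, 64)
```

The 32 input maps never multiply the number of outputs: each filter collapses all 32 of them into a single 11 x 11 map, so the output depth is simply the number of filters.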
@swedenontwowheels 1 year ago
very well explained! good job! thank you so much for putting the effort in this video!
@CodeEmporium 1 year ago
Thanks so much!
@fahnub 2 years ago
this is just so good. thank you for this.
@ahmedsabbir5862 4 years ago ⁺²
@17:25, Output (width) = (13-3)/1 + 1. So the result will be 11.
@CodeEmporium 4 years ago ⁺²
You are right. Will like this so others can see it. Nice catch!
@ahmedsabbir5862 4 years ago
@CodeEmporium You're welcome. You should do some tutorials on Kaggle problem solving, it would be helpful.
@manishsharma2211 4 years ago
Bang on. Explained very well
@SuryadiputraLiawatimena 6 years ago ⁺⁸
Please explain again why we have 32 and 64 layers (feature maps). Where do these numbers come from? Are they calculated or just picked? Thanks.
@manishsharma2211 4 years ago
Sir, it depends on how many feature vectors you need. These numbers are the most commonly used.
@ravikumarhaligode2949 3 years ago
I also have the same query: how to decide how many filters are required?
@IndiaNirvana 10 months ago
Great videos. One small question: at 5:07, how did you select the weights of the 3 by 3 filter?
@swarajshinde3950 4 years ago
Yann Lecun is great
@ishaquenizamani9800 2 years ago
Your videos are great. Please make a video on U-Net.
@artinbogdanov7229 4 years ago
Great explanation. Thank you!
@MrStudent1978 6 years ago
Excellent explanation
@Hassan.Wahba.97 3 years ago ⁺¹
I just noticed that we round up when pooling, we don't floor, because (26 - 2 + 1)/2 is 12.5 not 13.5
@psychotropicalfunk 2 years ago
7 months later but I noticed the same. Either that or by mistake calculated using the first output and took 28 instead of 26: (28-2+1)/2 = 13.5
@manoharrengasamy4174 2 years ago
Thanks, good explanation of filters. Can you refer links on how filters/kernels are prepared? For an object, how many filters are required at minimum? Also the development and updating of filters up to the latest YOLO model.
@mohammedbenaissa1278 24 days ago
I have never understood CNNs like I do after this video.
@konstantin7596 1 year ago
I think at 16:32 the +1 should be outside the fraction in the end again?
4 years ago
You provide references, thank you very much. Your videos are great.
@himanshusrihsk4302 5 years ago
Please make a video on visual question answering
@robertcohn8858 4 years ago
I think the value of this video is not so much that you will be able to sit down and use CNN from the get-go. Rather, it demonstrates some of the key concepts quite well (convolving layers for example). Looking at the final example is helpful and should probably be viewed several times to get the full meaning. But in all, the video is - when used with other information sources - a good start to learning CNN.
@인나-h2f 4 years ago
thank u, teacher
@reasoning9273 1 year ago
Actually, CNNs were introduced a bit earlier. I recall it was LeCun's 1989 paper.
@sujithtumma6754 2 years ago
Awesome explanation. Loved it. Just a little correction: at 17:24 I think "hwidth" is 3, not 2.
@CodeEmporium 2 years ago
Thanks for the catch! Yeah there are definitely a few typos here that you and some others called out. (Also thanks for the compliments) :)
@GKS225 3 years ago
Awesome video! Keep it up!
@scientistgeospatial 5 years ago
Well done! Thanks buddy.
@honeyrulesintheworld 2 years ago
Hi, can you tell me how to find the confusion matrix for image retrieval using a CNN?
@miladmfarid 3 years ago ⁺¹
16:47 you explained the pooling width output and in the equation used (26-2+1)/2, which will be 12.5, but you said it will be 13.5! And I don't know how you get to 13. Can you please explain?
@anjanichowdaryoleti5425 3 years ago
The formula is {[input length - pooling window length] ÷ stride} + 1
@anjanichowdaryoleti5425 3 years ago
Then {[26 - 2] ÷ 2} + 1 = 13
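The formula in the two replies above covers both the pooling step and the stride-1 convolution that several comments flag as a typo on the slide. A minimal sketch, assuming no padding and that a window which does not fit entirely is dropped (floor division); the function name output_size is made up for this example:

```python
def output_size(in_size, window, stride=1):
    """Output length along one axis for a conv or pooling layer without padding.

    Uses floor((in_size - window) / stride) + 1, i.e. the '+ 1' sits outside
    the division.
    """
    return (in_size - window) // stride + 1

print(output_size(26, 2, stride=2))  # 13  (2x2 max-pool, stride 2)
print(output_size(13, 3, stride=1))  # 11  (3x3 convolution, stride 1)
```

With the + 1 inside the fraction, as on the slide, the pooling step would give 12.5, which is where the confusion in the comments comes from.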
@ocnarfchan4857 4 years ago
How does backpropagation work for a Convolutional Neural Network?
@adam_sporka 4 years ago
Thank you very much!
@natjimoEU 4 years ago
great video mate.
@SkullcandygirlSuchi 3 years ago
Hey, can I get the whole content with diagrams?
@rangaeeee 3 years ago
The "About CNNs" URL is broken... Please update it to the latest one
@alexfourie6491 4 years ago
Nice video, quick question though. How do you determine the weights in each filter? I would assume they are randomly assigned like the weights in a normal neural network on the first feed-forward pass.
Follow up question:
How would one then go about updating the weights in each filter?
Thank you
@Nuns341 2 years ago
How does h-height change from 3 to 2?
@yashpandit832 4 years ago ⁺¹
One doubt: in the last image shown, what will the width of each filter be in the second conv layer? My understanding is that it will be 32, as the input width is 32, i.e. a filter of 3x3x32. Am I right, or is there something I have misunderstood? Please help.
@prateeksasan9759 4 years ago ⁺¹
I have the same question. Have you figured it out?
@anandachetanelikapati6388 4 years ago
May I know how to calculate the input, output and learnable parameters in the following case?
Assumptions:
- Input size is (32, 32, 3)
- No padding for all convolutions
-----------------------------------------------------------------------------------------------
Layer  Type   Kernel  Stride  Neurons/feature maps  Input size   Output size  No. of parameters
-----------------------------------------------------------------------------------------------
1      Conv   (3, 3)  (1, 1)  16                    (32, 32, 3)
2      Pool   (2, 2)  (2, 2)  16
3      Conv   (5, 5)  (1, 1)  32
4      Pool   (2, 2)  (2, 2)  32
5      Conv   (3, 3)  (1, 1)  64
6      Dense  --      --      128
7      Dense  --      --      2
-----------------------------------------------------------------------------------------------
thank you
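One way to fill in the empty columns of the table above is to walk through the layers in order. The sketch below is not an answer from the video; it assumes every conv and dense unit has one bias term, that pooling drops any window that does not fit (floor division), and that the last conv output is flattened before the first dense layer. The names axis_out and summarize are hypothetical.

```python
def axis_out(size, window, stride):
    # Output length along one axis, no padding.
    return (size - window) // stride + 1

# (type, kernel/window size, stride, filters or units), taken from the table
layers = [
    ("conv", 3, 1, 16),
    ("pool", 2, 2, None),
    ("conv", 5, 1, 32),
    ("pool", 2, 2, None),
    ("conv", 3, 1, 64),
    ("dense", None, None, 128),
    ("dense", None, None, 2),
]

def summarize(input_shape, layers):
    h, w, c = input_shape
    flat = None  # set once the volume is flattened for the dense layers
    rows = []
    for kind, k, s, units in layers:
        if kind == "conv":
            in_shape = (h, w, c)
            params = (k * k * c + 1) * units  # weights + one bias per filter
            h, w, c = axis_out(h, k, s), axis_out(w, k, s), units
            out_shape = (h, w, c)
        elif kind == "pool":
            in_shape = (h, w, c)
            params = 0  # pooling has nothing to learn
            h, w = axis_out(h, k, s), axis_out(w, k, s)
            out_shape = (h, w, c)
        else:  # dense
            if flat is None:
                flat = h * w * c
            in_shape = (flat,)
            params = (flat + 1) * units  # weights + one bias per unit
            flat = units
            out_shape = (flat,)
        rows.append((kind, in_shape, out_shape, params))
    return rows

rows = summarize((32, 32, 3), layers)
for row in rows:
    print(row)
print("total parameters:", sum(r[3] for r in rows))
```

Under these assumptions the spatial size goes 32 -> 30 -> 15 -> 11 -> 5 -> 3, and the dense layer with 128 units holds most of the parameters.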
@GamingGleeSquad 5 years ago ⁺¹
Why is the filter size 3x3 @8:06? Can we take a different size for the filter?
@nikolasdrn 5 years ago
Yes, you can
@giahuytrinh7195 2 years ago
ty
@abhijitmahapatra8024 4 years ago
Hello AJ, today I discovered your channel (subscribed long back but never explored this much) and guess what, you provide a really simple intuition for topics that are hard to grasp within minutes. Can you do the same for some machine learning topics like ARIMA and other predictive models? Anyhow, great content. Really appreciate your effort and knowledge.
@CodeEmporium 4 years ago ⁺¹
I've been playing around with time series models recently too. Not sure if there is enough drive for a video at this time. But will definitely keep this in mind
@abhijitmahapatra8024 4 years ago
@CodeEmporium That would be a great help. Thanks for the reply AJ, can't thank you enough for your efforts.
@videoinfluencers3415 4 years ago ⁺¹
Whoaa!!!!
@bankawat1 4 years ago
good one
@malihafarahmand75 4 years ago
How to calculate 512 and 512 dense?
@louerleseigneur4532 4 years ago
thanks
@mohammedhassan7770 5 years ago
Good job, thanks.
@sambarajuchiluveru8444 6 years ago ⁺¹
Hello dear, thank you for the video. I have a question: how to deal with pooling in the one-dimensional input case?
@elrosspangue7443 6 years ago
Question, why is there an increase of kernels for every convolution layer and where are those kernels coming from? What is the basis of those kernels?
@CodeEmporium 6 years ago ⁺²
The network tries to understand features of the input (image). The shallower layers extract low-level features (edges, strokes, shadowing, texture, etc.). The deeper we go, the higher-level the extracted features become (could be anything, most likely not human-interpretable). Such higher-level features are more complex. Hence we need more parameters to learn them. So the deeper we go, the more kernels we use.
@elrosspangue7443 6 years ago
@CodeEmporium Follow-up question: where can I get the parameters? What is the basis of these parameters? Are parameters and features the same?
Just also wanna give appreciation and thanks for your videos and answer! The backstory of these questions is that my thesis mates and I are creating a CNN model for genre classification with some new techniques and methodologies. This video was actually our basis for learning how CNNs work and their specifics in terms of layers - from nothing to almost intuitively knowing the basics.
@krishnamishra8598 4 years ago
Why do we use convolution? Why not just a simple ANN in the case of images? The main question is: what is the need for convolution in a CNN? Please answer...
@amithm3 3 years ago
An ANN takes 1D input and thus loses the spatial details of the image, but in a CNN those are extracted and presented to the ANN in a more meaningful and trainable manner
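A second, practical side of the answer above is the number of weights. The comparison below is only a back-of-the-envelope sketch with assumed sizes (a 28x28 grayscale image, a dense layer of 512 units, a conv layer of 32 filters of 3x3); none of these are claimed to be the exact settings from the video, and biases are ignored.

```python
# Assumed sizes, for illustration only.
height, width, channels = 28, 28, 1
dense_units = 512
num_filters, k = 32, 3

# Dense layer on the flattened image: every pixel connects to every unit.
dense_weights = height * width * channels * dense_units

# Conv layer: each filter has k*k*channels weights, reused at every position.
conv_weights = k * k * channels * num_filters

print(dense_weights)  # 401408
print(conv_weights)   # 288
```

Because the same small filter is reused at every position, the conv layer needs far fewer weights and reacts to a pattern wherever it appears in the image.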
@mpcr9799 4 years ago
I know how a filter in a Convolutional Neural Network "scans" the input image and multiplies the values of the kernel with the corresponding receptive field in the input image and adds it all up to get a new pixel in the output activation map. But I'm unsure how the numbers in a filter are decided.
Is the kernel a patch from the image that is chosen? Like a 5x5 patch of the image that the network must decide to be good to be used as a filter? Or are they random numbers that backpropagation will soon change to fit best with the data? And would these numbers in the filter be considered as the weights of the network?
Thanks for any help.
@barnabyroberts7950 3 years ago
The values in the kernel are randomly initialised and altered via backpropagation. If you know about simple densely connected networks, then you can consider a single weight in this type of network to be analogous to a 2D kernel that convolves a single channel in the input image. If you consider a 3-channel image as the input to a layer, and a single channel as the layer output, then the output (a 2D image) is taken by convolving each input channel with its own K*K kernel and summing (superimposing) the resulting 3 images. This is analogous to a simple densely connected network except each weight in the layer is a K*K kernel rather than a scalar. However it makes more sense to consider a K*K*3 kernel rather than summing 3 K*K kernels for the 3 input channels. If N is the number of input channels, M the number of output channels and K the width of a kernel, then you have K*K*N*M parameters for a single layer.
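The equivalence described in the reply above (summing three per-channel K*K convolutions versus applying one K*K*3 kernel) can be checked numerically. A small NumPy sketch with random values; the helper name conv2d_single and the 8x8 image size are arbitrary choices for this example, and the kernel is slid without flipping, as deep learning libraries do.

```python
import numpy as np

def conv2d_single(channel, kernel):
    """Stride-1, no-padding sliding window over a single 2D channel."""
    k = kernel.shape[0]
    h, w = channel.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))   # N = 3 input channels
kernel = rng.standard_normal((3, 3, 3))  # one K*K*N kernel with K = 3

# View 1: convolve each channel with its own K*K slice, then sum the 3 images.
summed = sum(conv2d_single(image[:, :, c], kernel[:, :, c]) for c in range(3))

# View 2: slide the full K*K*N kernel over the volume in one go.
direct = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        direct[i, j] = np.sum(image[i:i + 3, j:j + 3, :] * kernel)

print(np.allclose(summed, direct))  # True

K, N, M = 3, 3, 1
print(K * K * N * M)  # 27 weights for this single-output layer (biases not counted)
```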
@changqunzhang1277 2 years ago
Thank you very much! This is a great video containing a lot of helpful information. Really appreciate the time and effort you spent on making this video. Here is a question: when conv 3*3, 64 is applied on 13*13*32 images, isn't the result 11*11*(64*32)? For each of the 32 layers, the 64 filters were applied. One more thing, I believe 13-2+1 = 11 is not correct (should be 12) @17:29
@chriswalsh5925 2 years ago
Yes! I thought the same... it is confusing enough as it is! :D ... maybe a mistake or something not mentioned about how the convolution works?
@nurfaizahmusa496 5 years ago ⁺¹
Great video, this is really helpful and detailed. Loved it!!!
@sathishp6257 6 years ago ⁺¹
17:25 how come h(width) is 2 and after doing the arithmetic Out(width) is 11? As per my observation, while doing conv3x3, 64 the kernel size (h(width)) should be 3, right?
@CodeEmporium 6 years ago ⁺¹
When we have a 13×13×32 volume and apply convolution with one filter of 3×3×32, this will give us an 11×11 feature map (as the stride is 1). Apply 64 such kernels and we get 64 such 11×11 feature maps, i.e. an 11×11×64 volume.
@wlxxiii 5 years ago
Mistake in the slide: should be 13 - 3 + 1 = 11
@deepakkumarshukla 1 year ago
@CodeEmporium Where does this 3*3*32 filter come from? Did I miss something or is something missing in the images shown?
@MustafaHoda 6 years ago
The 32 filters that are demonstrated at 8:46, are those filters in the other layers behind the first the same or different?
@danishnawaz7869 5 years ago
Thank you!
@LovedbyGod4ever 3 years ago
Thank u bro
@clearwavepro100 5 years ago
gonna need to subscribe bc multiple videos about audio and cnns ! :) yes!
@MrRameeez 5 years ago
What is a dense layer, and why is it 512??
@samarpitasnani7996 3 years ago
Can I get the slides of this?
@RKYT0 3 years ago
hey man,
is it somehow possible to ask you some questions in terms of my master thesis? ;)
@amithm3 3 years ago
Finally the video I wanted: how to convert the deep volume matrix into ANN input. I have one doubt. Suppose we have an image of 28x28 pixels and the first CNN layer with 3 kernels; we will get 3 feature maps. Now in the next layer, if we have "64" kernels, how many feature maps do we get? Is it 64 * 3 or just 64 feature maps? If it is only 64 maps, then how do we convolve the 3 feature maps into 64 feature maps using only 64 kernels? Should we sum the 64 * 3 maps we get into 64 maps??
@mdyzma 5 years ago ⁺¹
17:21 your filter in round 2 convolution is (3, 3). So it should be 13-3+1=11. Not 13-2+1, which is 12.
@baskorobaskoro7972 6 years ago
How do you set the values in a filter (kernel)? Are they set randomly?
@CodeEmporium 6 years ago ⁺¹
Initially, yes. They take on random values, which are later "learned".
@SuryadiputraLiawatimena 6 years ago
How are they "learned"? Do you have this CNN code in Keras?
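As a rough illustration of what "learned" means in the thread above, here is a toy sketch in plain NumPy (not Keras, and not code from the video). It starts from a random 3x3 kernel and uses gradient descent on a mean squared error to pull it towards a fixed edge-detection kernel. The target kernel, learning rate, image size and helper name correlate_valid are all assumptions made for this example; a real CNN gets its gradients from backpropagation through the whole network and a task loss.

```python
import numpy as np

def correlate_valid(x, w):
    """Stride-1, no-padding sliding window of kernel w over a square 2D input x."""
    k = w.shape[0]
    n = x.shape[0] - k + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))            # toy "image"
target_kernel = np.array([[1.0, 0.0, -1.0],  # kernel we pretend is the ideal one
                          [2.0, 0.0, -2.0],
                          [1.0, 0.0, -1.0]])
y = correlate_valid(x, target_kernel)        # desired feature map

w = rng.standard_normal((3, 3)) * 0.1        # random initialisation
lr = 0.05                                    # assumed learning rate
losses = []
for step in range(500):
    err = correlate_valid(x, w) - y
    losses.append(0.5 * np.mean(err ** 2))
    n = err.shape[0]
    grad = np.zeros_like(w)
    for a in range(3):
        for b in range(3):
            # d(loss)/d(w[a, b]): error times the input pixel that w[a, b] touched
            grad[a, b] = np.mean(err * x[a:a + n, b:b + n])
    w -= lr * grad                           # gradient descent update

print(losses[0], losses[-1])  # the loss should shrink over the iterations
print(np.round(w, 2))         # should move towards target_kernel
```

The same idea scales up: every filter weight in every layer is nudged in the direction that reduces the loss.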
@amirulsadikin8716 5 years ago
Thank you so much... you saved me a lot of reading time...
@CodeEmporium 5 years ago
Perfect! Glad it helped
@reggaebin 4 years ago
@17:25 13-2+1=11 is not correct.
@mnsnliu9317 5 years ago
good
@santhoshkolloju 6 years ago
Hey, can you do an intuitive explanation of CNNs on text data?
@CodeEmporium 6 years ago ⁺¹
Sure. Maybe a future video.
@StevenSmith68828 5 years ago
Where does 32 come from?
@RedShipsofSpainAgain 6 years ago
16:34 shouldn't that be 12.5, not 13.5? (26-2+1)/2 = 12.5
@gh0oo 5 years ago
Yes
@zhenzhen8766 3 years ago
memo 13:30
@anwarulislam6823 2 years ago
Someone sending me conversation like AI Chatbot through all of actions in neural networks by inner voice using brain!!! Is it possible or not, if it is then how can I control this thing??
Thanks in advance.
@elinaakhmedova9407 6 years ago
Thanks for this video! You are cool, keep going 🤗
@CodeEmporium 6 years ago
Yay! Thanks! Imma keep it up ;)
@sunidhinayak6413 5 years ago
Can you please make a video on Keras - container
@XX-vu5jo 3 years ago
Dude study on your own lol
@XX-vu5jo 3 years ago
And my fake PhD supervisor doesn't even know or understand a single thing about this!!!! Damn those quacks! My country sucks!
@shreyjain6447 3 years ago ⁺¹
Which country?
@macsenwyn7223 4 years ago
13-2+1 is not 11, it's 12
@Leon-pn6rb 4 years ago ⁺⁷
Poorly explained the layers. The same surface-level explanation with no intuition behind it for the core concepts.
The easier concepts were explained well but that isn't why people watch these vids
@bishwasapkota9621 4 years ago
Poorly explained!! Anyway a good try
