Unbelievably clear and succinct explanations
Thanks for the appreciation, Sunny, that's what we strive for! 🙂
Well said
The intro just rocked, explaining why we need CNNs: "Humans can do object detection quickly and machines can't", and that's where it begins. Amazing... Thanks...
Explained in a very simple way that's easy to understand! Great video!
Bro this dude just wrote mirrored wth. Also thanks for the video! The concept of CNN is a lot more clear to me now. :))
Glad this was useful to you! 👍 As for writing mirrored, here is how we do it 👉 ibm.co/3jnq1st 😉
Man, I like how clearly you explain things in your videos
Have been watching several videos to get a high-level understanding of CNNs, but no luck. However, this is a very good explanation! Cleared up a lot of doubts in a few minutes. Thank you
Such a likeable person explaining so well, much appreciated! :)
Mans just wrote in perfect handwriting BACKWARDS on the glass and no one is talking about it what the heck
Um, actually the video is mirrored
The magic of video editing, he’s a wizard
If you look around, you'll find a video they made to address just this question, everyone who watches IBM videos asks exactly that, I know I did :)
It's an easy solution, actually. The video is recorded from the other side of the glass board and then flipped horizontally. You can see that the watch appears to be on his right hand, but it's actually on his left.
In my view, the goal of convolution is to make the signal invariant to scaling and translation. It acts as a pre-processor of the raw input signal. You could also pre-process your training set first and store it in a file, then feed that file directly to the deep neural network. You don't need the convolution anymore during training.
Another way of making your signal (picture) invariant is to first take the magnitude of its Fourier transform, which makes it translation invariant. Next you resample that spectrum from Cartesian to log-polar coordinates, which turns rotation and scaling into shifts. Finally you take the Fourier transform magnitude of that signal again and end up with a signal that is invariant to translation, rotation and scaling, which you can store as a pre-processed training set.
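Here is a minimal sketch of the translation part of this idea, assuming NumPy and a *circular* shift (`np.roll`), which is the idealized case. A shift only changes the phase of the 2-D FFT, so the magnitude stays the same. This step alone does not give scale or rotation invariance. The log-polar resampling and second transform (the Fourier-Mellin approach) handle those and are not shown here. The image is random, hypothetical data.

```python
import numpy as np

def fourier_magnitude(image: np.ndarray) -> np.ndarray:
    """Magnitude of the 2-D discrete Fourier transform; discards phase."""
    return np.abs(np.fft.fft2(image))

rng = np.random.default_rng(0)
image = rng.random((32, 32))  # hypothetical 32x32 grayscale image

# Circularly shift the image 5 pixels down and 7 pixels right.
shifted = np.roll(image, shift=(5, 7), axis=(0, 1))

# The raw pixels differ, but the Fourier magnitudes agree up to rounding,
# because a circular shift only multiplies each frequency by a unit phase.
print(np.allclose(image, shifted))                                        # False
print(np.allclose(fourier_magnitude(image), fourier_magnitude(shifted)))  # True
```

For real photos, objects moving inside a frame is not a circular shift, so border effects make the invariance only approximate.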
Any citations elaborating on what you said?
But a CNN makes it possible to sequentially apply increasingly abstract filters that fit the specific objects in the image. I'm not sure the transformations you named can do that, that is, take very complex and abstract patterns into account.
This channel has some of the best CompSci explanations! Never been disappointed!
I was smiling to myself the whole time. So simple and succinct! Thank you
I was looking to understand how to represent a CNN in a way that clearly shows the difference from plain dense neural networks. This really helped! Thanks!
0:42 I cannot get over the fact that this dude just wrote the term CNN backwards so easily and so fast :O
Or maybe he just flipped the video horizontally in post-production
try looking at the video using a mirror ...
He inverted the video. That's why he appears to be writing with his left hand and wearing his watch on his right arm.
@@badbud804 Yeah, I also mentioned that, but it would be very impressive if he could actually do that
the knob/button on the watch (which is typically to the right of the dial/screen) is the most unambiguous clue establishing the video is mirrored.
This is probably the best-explained video I've ever watched, you're a great tutor!!!!!😍😍
Fantastic explanation! Very pedagogical and easy to follow. Thank you!
Dear lord this is perfectly chunked information.
Martin, you are a superb teacher. You make learning easy and fun.
This was easy to understand and very concise...Thank you
Amazing explanation!
Two quick questions:
1. If each layer of a neural network can recognize more complex / abstract objects, does that mean that deeper neural networks (neural networks with more layers) will always be more powerful, or at least have the potential to be more powerful?
2. Could one say the same about the width of neural networks? Would a neural network with more nodes per layer be able to recognize a larger variety of images?
Both those assumptions are valid, with some caveats.
If you have too many nodes in a layer, you're looking for too many features in the data, and you'd virtually memorise the training data after some point, because you're not reducing the dimensionality anymore.
If you use too many layers, you're risking vanishing/exploding gradients, and you're making features needlessly complex, which may also lead to overfitting.
Besides, there need to be non-linear activation functions between layers to leverage the feature-extracting prowess of each node. If the activation functions are too non-linear, the individual weights become less meaningful and harder to train. If the activation function is not sufficiently non-linear, you're essentially getting the result of a single matrix multiplication with the computational overhead of several.
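The last point can be checked directly. Without an activation between them, two layers collapse into one matrix product; a ReLU in between breaks that. This is a minimal NumPy sketch with hypothetical toy weights chosen by hand so the result is easy to verify.

```python
import numpy as np

# Hypothetical toy weights: 2 inputs -> 2 hidden units -> 1 output.
W1 = np.array([[1.0, 2.0],
               [3.0, -4.0]])  # hidden-layer weights
W2 = np.array([[1.0, 1.0]])   # output-layer weights
x = np.array([1.0, 1.0])      # input

def relu(z):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(z, 0.0)

# No activation between layers: two matrix multiplications...
stacked = W2 @ (W1 @ x)
# ...give exactly what the single merged matrix W2 @ W1 gives.
merged = (W2 @ W1) @ x
# With a ReLU in between, the negative hidden value (-1) is zeroed out,
# so the network is no longer equivalent to one matrix.
nonlinear = W2 @ relu(W1 @ x)

print(stacked, merged, nonlinear)  # [2.] [2.] [3.]
```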
You are much better than my college faculty. Thank you for a great explanation 😍
This explanation was so good. Currently using CNNs for remote sensing applications.
Nice series Marvin 😁
Very excellent explanation ❤
I understood it very well. In case someone didn't, watch this video after watching the 3b1b video on neural networks
best teacher!! 👏
there should be a full course on this neural network taught by Martin
You made it easy to understand. Very helpful. Thanks a lot :)
Very good explanation. Thank you.
I have a question: how are the levels of filters defined?
amazing as usual.
Thanks. Great learning Video.
Fantastic video. Is Martin always writing mirrored? I am fascinated by how your video recording works!
Very clear and right-to-the-point explanation! Thank you!
Utterly well done, our IBM ML specialist!
Hi! Have I assumed correctly that in case of using CNNs for image recognition, the deeper the filters go, the more they zoom out on the image?
Next logical question is - what type of software is used to analyze test cases (e.g. real houses) and create those filters?
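One way to make the "zoom out" intuition concrete is the *receptive field*: the patch of input pixels that a single unit in a deeper layer depends on. This is a minimal sketch that assumes square kernels and no dilation. Each layer is described as a hypothetical `(kernel_size, stride)` pair, and the function uses the standard recurrence for receptive-field size.

```python
def receptive_field(layers):
    """Receptive-field width (in input pixels) of one unit after `layers`.

    Each layer is a (kernel_size, stride) pair. Recurrence:
        r_out = r_in + (kernel_size - 1) * j_in
        j_out = j_in * stride
    where j is the spacing, in input pixels, between neighbouring units.
    """
    r, j = 1, 1  # start from a single input pixel; neighbours 1 pixel apart
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * j
        j *= stride
    return r

# Three stacked 3x3 convolutions (stride 1) each "see" a 7x7 input patch.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
# A 2x2, stride-2 pooling layer in the middle widens the view faster.
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

So deeper units do cover larger regions of the image, and downsampling (pooling or strided convolutions) makes that growth faster.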
The filter is just a matrix. A discrete convolution with it is performed in each layer (this is where the name CNN comes from). The filter is refined using training data: just as you would train a perceptron, you train the matrix to behave as desired.
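To show what "performing a discrete convolution with a matrix" means, here is a minimal NumPy sketch: stride 1, no padding, one channel. As in most deep-learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The image and the 2x2 edge filter are hypothetical hand-picked values, not learned ones.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1), summing products.

    The kernel is not flipped (cross-correlation), which is what CNN
    layers typically compute; a learned filter simply absorbs the flip.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x4 image: dark left half, bright right half (a vertical edge).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# A hypothetical hand-written vertical-edge filter.
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

# 3x3 result; every row is [0, 2, 0]: the filter fires only on the edge.
print(conv2d_valid(image, kernel))
```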
Hello, thank you for the explanation but I still don't understand how the filters are made.
So I take it the key to building a CNN is how to build the filters? Also, given that the first layer only picks up small fragments, does that mean the first layer could be of general use, while the later layers are more application-specific?
love this explanation ...
Well if the beer videos ever stop Martin you have a career in IT Vlogging 😁
Can we implement this CNN to determine micro-level profiles, i.e., micrometer level?
What would be the difference between the standard convolutional networks and something newer like CLIP?
Martin, how are the filters for a CNN created? Random? Stored in some database? Might there be an advantage to specifying filters yourself, particularly if you have expertise in the domain the images are from?
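In common practice, filter weights start out random and training then adjusts them. One widely used scheme is He (Kaiming) normal initialization, which draws weights with standard deviation sqrt(2 / fan_in). This is a minimal NumPy sketch. The function name `he_normal_filters` and the shapes are illustrative assumptions, not any library's API. It also shows that nothing stops you from writing a filter by hand (here a Sobel vertical-edge kernel) into the bank, which is one way domain knowledge could be injected. Whether that helps is an empirical question.

```python
import numpy as np

def he_normal_filters(num_filters, in_channels, size, rng):
    """Hypothetical helper: random conv filters with He/Kaiming normal init.

    fan_in is the number of input values each output value depends on.
    """
    fan_in = in_channels * size * size
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(num_filters, in_channels, size, size))

rng = np.random.default_rng(42)
filters = he_normal_filters(num_filters=16, in_channels=3, size=3, rng=rng)
print(filters.shape)  # (16, 3, 3, 3)

# A hand-specified filter: the Sobel vertical-edge detector.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
# Overwrite the first filter; (3, 3) broadcasts across all 3 input channels.
filters[0] = sobel_x
```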
Damn that was crystal clear.
such an easy, clear and to the point explanation! thanks a lot
This guy gives crystal clear explanations. Supremely Clear!
this video hits different if you are currently taking digital image processing course. I feel smart lol
Is this what the Vision Pro uses?
Hi, I'm a maths student and I need to do a project. The theme is games and sport. I saw your video and thought: why not apply this technique to the world of sports, to detect from an analysis of the players' movements whether one of them is sick? Can you help me apply CNNs and use them well, please?
Don't ask him. His explanation is sloppy and incomplete. The convolution operations with the filters produce matrix channels that make up the tensor. For example, after convolving with four filters, you should have four matrix channels. The next operation would be a max pooling operation on each matrix channel in the tensor. Please let me know if you have a question.
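The shape bookkeeping in this comment can be sketched in NumPy. Four filters produce four channels stacked into a tensor, and 2x2 max pooling is then applied to each channel. The image and filter values are random, hypothetical data. Real CNN layers usually also add a bias and an activation (e.g. ReLU) before pooling, and with multi-channel inputs each filter spans all input channels. This sketch omits those details.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1, no-padding 2-D cross-correlation of one channel with one filter."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(w)] for i in range(h)])

def max_pool_2x2(channel):
    """2x2 max pooling with stride 2 (drops an odd last row/column)."""
    h, w = channel.shape[0] // 2 * 2, channel.shape[1] // 2 * 2
    blocks = channel[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))  # max over each 2x2 block

rng = np.random.default_rng(0)
image = rng.random((10, 10))              # hypothetical 10x10 grayscale image
filters = rng.standard_normal((4, 3, 3))  # four hypothetical 3x3 filters

# One convolution per filter -> one channel each, stacked into a tensor.
feature_maps = np.stack([conv2d_valid(image, f) for f in filters])
print(feature_maps.shape)  # (4, 8, 8)

# Max pooling is applied to each channel independently.
pooled = np.stack([max_pool_2x2(ch) for ch in feature_maps])
print(pooled.shape)  # (4, 4, 4)
```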
Thanks, Martin, for the clear explanations
you are amazing
Identifying, organizing and reaping to thought.
Your tv CAN communicate with you via your neurons producing electromagnetic waves
Explained this video very well - highly recommend! Thank you
So, combining this with your other video: at the end of the CNN there will be a discriminator that has been trained to know what a house looks like, what an apartment looks like and what a skyscraper looks like, and it therefore tells you that it is a house?
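In a typical CNN image classifier (not necessarily what the other video describes), the last layer outputs one raw score per class, and a softmax turns those scores into probabilities. The predicted label is the most probable class. This is a minimal NumPy sketch in which the class names and scores are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

classes = ["house", "apartment", "skyscraper"]
# Hypothetical scores from the network's last layer for one image.
logits = np.array([3.0, 1.0, 0.5])
probs = softmax(logits)

print(classes[int(np.argmax(probs))])  # house
```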
Right, out of curiosity: when it comes to identical twins, or say identical triplets, how would a CNN tell them apart? Another detail about the filters: suppose we have objects lying on lines, for example; how would they be identified in this process, given the vast number of possible images to store?
clearly understandable 🙏🙏🙏
Highly insightful
Doesn't it require a lot of manual work to make all those filters? Isn't it better to just run everything through a regular neural network?
That's the neat part: you don't make those filters manually. The network learns them during training from annotated (labeled) training images, for example class labels or, for object detection, bounding boxes.
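Here is a toy illustration of "learned, not hand-made": gradient descent recovers a 3x3 filter from examples alone. To keep it checkable, the training targets are generated with a known Sobel filter (`true_kernel`, a hypothetical stand-in for whatever the labels imply). The loop starts from a random filter and minimizes the mean squared error. Real CNNs instead backpropagate a classification or detection loss through many layers, so treat this as a sketch of the principle only.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1, no-padding 2-D cross-correlation."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(w)] for i in range(h)])

rng = np.random.default_rng(0)
# Hypothetical "true" filter, used only to generate toy training targets.
true_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])
images = rng.standard_normal((20, 8, 8))  # synthetic training images
targets = [conv2d_valid(img, true_kernel) for img in images]

kernel = 0.1 * rng.standard_normal((3, 3))  # random starting filter
learning_rate = 0.1
for step in range(200):
    grad = np.zeros_like(kernel)
    for img, target in zip(images, targets):
        error = conv2d_valid(img, kernel) - target  # 6x6 residual
        # d(mean squared error)/d(kernel): correlate the input with the
        # residual (an 8x8 image with a 6x6 "kernel" gives a 3x3 result).
        grad += 2.0 * conv2d_valid(img, error) / error.size
    kernel -= learning_rate * grad / len(images)

print(np.round(kernel, 3))  # close to true_kernel
```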
The volume is a bit quiet here.
Great explanation! Great job; thanks!
Machine learning is truly amazing yet it pales into insignificance when compared to the ability of this chap to write backwards.
I can't tell whether you're joking, but I think the video is flipped horizontally
This explanation is good. Thanks. 😊
Great video! Thanks 👍🏼
Thanks really helpful
At last a video that is useful!
What kind of board do you use to write on?
See ibm.biz/write-backwards
Superb explanation
great work explaining!
Thank you
The application of successive convolutional filters is well presented, but only at a high level
Finally! Bravo. Clear and concise
The best explanation ever.
Awesome explanations ! ... thank you for sharing your knowledge ;))
Will the Activation Functions video come?
Great video 🔥
Is he writing backwards...! impressive
No, obviously.
Very good explanation!
can you help me regarding my project "human pose estimation"
Hi Rasel! What sort of help would you need? 🙂
@@IBMTechnology I have to do human pose estimation using skeletal data extracted from it
Wow such a comprehensive content on CNN!
Oh my god, thank you for the explanation. Easy to understand
Great content
This is too low level and vague for people who need it and too high level and complicated for children, I believe that you should go more in depth to provide more information such as how the convolution works, different activation methods and different types of layers
It is just an introduction. If one wants to learn the details, they can search for textbooks, I believe there are countless available.
Then actually go and study CNNs. This is a brief overview of how they work.
These videos are for two demographics: young adults/teenagers who find AI technology fascinating and want to understand how it works, and children, to spark the scientist inside them toward AI development when they grow up. The second reason is the most important.
I genuinely needed a 2 minute explanation of this term and a few others. I guess I'm the target audience.
amazing work. thank u!
Perfect explanation. I hate it when people throw difficult terms around. Why can't it always be this precise and clear, such as using a house as an analogy? Well done!
This was so great thank you
More please ☺️☺️
Definitely what we're planning! 😀 In the meantime, feel free to subscribe to get notified of when we post more videos.
Thank you :)
It's just like how our brain recognises objects. Can we create consciousness using this technique? Probably yes, in the future.
Waiting to learn more from you
thank you :)
thanks sir
clear and concise bigger picture of CNN
Funny guy. Love him
AWESOME! Thanks :)
thanks
Thanks a lot!
All I can think of is... that how good he is in writing everything mirrored....
This man rocks 🤘
Thank you..!!
Wait, that's a house? I thought it was the head of a tin robot.
that was a simple wow,,,,
amazing