Filter Count - Convolutional Neural Networks

  • Published Sep 2, 2022
  • Patreon: / animated_ai
    Learn about filter count and the realistic methods of finding the best values
    My Udemy course on High-resolution GANs: www.udemy.com/course/high-res...

COMMENTS • 44

  • @bycloudAI
    @bycloudAI 1 year ago +26

    This channel is UA-cam's undiscovered gold mine, please keep up the amazing content!!

    • @ChrisRamsland94
      @ChrisRamsland94 1 year ago

      for real, it's wild how much effort he puts into these.

  • @kinvert
    @kinvert 1 year ago

    Thanks for another great video!

  • @arjunpalusa9421
    @arjunpalusa9421 1 year ago

    Eagerly waiting for your next videos..

  • @carlluis2045
    @carlluis2045 1 year ago

    awesome videos!!!

  • @marceloamado6223
    @marceloamado6223 9 months ago

    Thank you for the video. I used to really stress out about this matter; now I'm calmer knowing it's a common problem and that solving it with heuristics is the way.

  • @KJletMEhandelit
    @KJletMEhandelit 5 months ago

    This is wow!

  • @user-tg6dd7qh5l
    @user-tg6dd7qh5l 3 months ago

    Best video on CNNs.

  • @d_b_
    @d_b_ 1 year ago +8

    3:24 "only 2 categories...so 512 features is enough" - this statement sounds like it comes from familiarity with the problem. Is there something more to it? Did you see that number of features used in past papers? Was it from your own experimentation against 256 or 1024 features? Is there some math that arrives at this? I'd like to understand this better, so any additional color you have on this would be helpful!

    • @animatedai
      @animatedai  1 year ago +7

      You typically want more features than categories. So for something like ImageNet with 1000 categories, 512 wouldn't be enough. You'd want 2048 or higher. But this case only has 2 categories, so 512 easily meets that requirement.
      And the exact value of 512 came from NVIDIA's StyleGAN papers, which is what I based that architecture on. I don't remember them giving a reason for that value, but it gave them good results and a higher value wouldn't fit into memory during training on the Google Colab hardware.
      It's more of an art than a science so let me know if that doesn't completely answer your question. I'm happy to answer follow-ups.
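To make the "more features than categories" point concrete, here is a minimal Keras sketch of a classifier head where a 512-feature map feeds a 2-category output. The pooling layer, the 4x4 dummy feature map, and the layer sizes are illustrative assumptions, not the exact StyleGAN-based architecture mentioned in the reply.

```python
import tensorflow as tf

# Minimal sketch of a classifier head: 512 global features -> 2 categories.
head = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),        # (H, W, 512) -> (512,)
    tf.keras.layers.Dense(2, activation="softmax"),  # 2 output categories
])

features = tf.random.normal((1, 4, 4, 512))  # dummy final feature map
print(head(features).shape)                  # (1, 2)
```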

    • @d_b_
      @d_b_ 1 year ago +1

      @@animatedai Thank you, that helps

  • @buddhadevbhattacharjee1363
    @buddhadevbhattacharjee1363 6 months ago

    Please create a tutorial on conv3d as well, and on which would be better for video processing (conv2d or conv3d).

  • @umamusume1024
    @umamusume1024 2 months ago

    When it comes to one-dimensional signals, such as time-domain signals collected from speed sensors, what is the difference between the visualization of a 1D CNN and a 2D one? Does it just change the height of the cuboid to 1?
    And, what algorithms do you recommend for deep learning of one-dimensional time domain signals?
    I would really appreciate your reply, because as a Chinese student doing an undergraduate graduation project, I can't find any visualization of 1D CNN on the Chinese Internet.

  • @igorg4129
    @igorg4129 1 year ago

    wow!

  • @whizziq
    @whizziq 1 year ago

    I have one question: what does the number of features mean? For example, the initial image is 512x512x3 (the 3 in this case being the red-green-blue colors). But what happens in the next layers? What are these 64, 128, and higher numbers of features? Why do we need so many instead of just 3? Thanks. Appreciate your videos!

    • @Anodder1
      @Anodder1 1 year ago +5

      You have 3 dimensions: 2 spatial and 1 feature dimension. The 2 spatial dimensions encode where the information is, and the feature dimension encodes the different aspects under which the information can be interpreted. At the beginning you have the 3 color channels, but the next layer has a much larger feature dimension, in which each index represents one particular aspect, like "how much red-green contrast there is between left and right at this position". These aspects become more high-level, like "is this a circle", so the feature dimension needs to grow to cover all the useful interpretations that could apply at that point. This fits well with the shrinking spatial dimensions, because each pixel in a later layer represents a larger area of the original image, for which that many higher-level interpretations are needed.
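As a quick illustration of the pattern described in the reply above (spatial dimensions shrinking while the feature dimension grows), here is a small Keras sketch. The filter counts and strides are illustrative, not taken from the video's architecture.

```python
import tensorflow as tf

# Illustrative shapes only: spatial dims shrink while the feature dim grows.
x = tf.random.normal((1, 512, 512, 3))                             # RGB input
x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)               # (1, 512, 512, 64)
x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same")(x)   # (1, 256, 256, 128)
x = tf.keras.layers.Conv2D(256, 3, strides=2, padding="same")(x)   # (1, 128, 128, 256)
print(x.shape)
```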

  • @lionbnjmn
    @lionbnjmn 1 year ago

    hey! Do you have any sources for your statements at 2:21 (about doubling channels when downsampling) and 2:50 (downsampling units should be followed by dimension-steady units)? I'm currently writing a paper and trying to argue the same point, but I can't find any real research on it :)

    • @animatedai
      @animatedai  1 year ago +1

      It's just a common pattern that I've seen. There's no shortage of examples if you want to cite them, from the original ResNet all the way up to modern diffusion architectures.

  • @omridrori3286
    @omridrori3286 1 year ago

    Are these the animations you also use in the course?

    • @animatedai
      @animatedai  1 year ago +1

      I don't use these in my current course, but I'm planning to incorporate them into a new course that I'm working on now.

  • @SpicyMelonYT
    @SpicyMelonYT 1 year ago

    How do you halve the resolution? The only way I can imagine is to have a kernel that is half the size of the input data plus one. Is that the correct way to do it, or is something else happening?

    • @animatedai
      @animatedai  1 year ago +1

      This will be covered in an upcoming video, but to give away the answer: you can pad with "SAME" (in TensorFlow or manually pad the equivalent of it in PyTorch) and use a stride of 2.
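Here is a minimal sketch of the downsampling step described in the reply above, assuming TensorFlow/Keras and PyTorch; the 512x512 input and the 64 filters are just illustrative values.

```python
import tensorflow as tf
import torch

# TensorFlow: "same" padding with stride 2 halves the spatial resolution.
x_tf = tf.random.normal((1, 512, 512, 64))
y_tf = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same")(x_tf)
print(y_tf.shape)  # (1, 256, 256, 64)

# PyTorch equivalent: padding=1 with a 3x3 kernel plays the role of "same" here.
x_pt = torch.randn(1, 64, 512, 512)
y_pt = torch.nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)(x_pt)
print(y_pt.shape)  # torch.Size([1, 64, 256, 256])
```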

    • @SpicyMelonYT
      @SpicyMelonYT 1 year ago

      @@animatedai Oh I see, and that's still backpropagation-compatible? I guess it would be, but I have no clue how to do that little step backwards. I assume you just act like the data was always smaller for that step?

    • @animatedai
      @animatedai  1 year ago +1

      Yes, that still works fine with backpropagation. Are you working with a library like TensorFlow or PyTorch? If so, they'll handle the backpropagation for you with their automatic differentiation.
      If you're using a kernel size of 1x1, it would work to act like the data was always smaller (specifically you would treat it like you dropped every other column/row of the data and then did a 1x1 convolution with a stride of 1). But for a larger kernel size like 3x3, all of the input data will be used so that won't work.

    • @SpicyMelonYT
      @SpicyMelonYT 1 year ago

      @@animatedai Ah I see, that makes sense. I'm actually building it from scratch in JavaScript. It's pretty slow, but I'm doing it to get a better understanding of it, and I also find it fun.
      Also, thank you for the responses; that's really cool. I think what you're doing with these videos is really sleek and useful. I personally would like it if you went into more depth about the actual math and numbers, but I completely understand that that's not your goal here, and that you want to give a more intuitive explanation for people. Keep it up!

    • @animatedai
      @animatedai  1 year ago +2

      Good luck on your JavaScript CNN project! And thank you; I appreciate your support. The math is something I plan to cover; I even have the rough draft of the script written for a future video that goes over the math in detail. I just wanted to focus on teaching the intuition separately first so that it doesn't get lost in the calculation details.

  • @pi5549
    @pi5549 8 months ago

    2:08 Don't think we can go 512x512x3 to 512x512xN if filterSize>1. If filterSize=3 we'd be going to 510x510xN, right? Thought experiment: 5 items, slidingWindowLen 3. 3 slide-positions (123 234 345).

    • @pi5549
      @pi5549 8 months ago

      hmm, I suppose a feature can extend beyond the image by a pixel, and might even collect useful information that tells it it's dealing with an edge. Solving a jigsaw puzzle, you usually collect the edge pieces and try to work with them first.

    • @animatedai
      @animatedai  8 months ago

      This question is actually the perfect lead-in to my video on padding: ua-cam.com/video/ph4LrdntONo/v-deo.htmlfeature=shared
      That's actually the video that directly follows this one in my playlist on convolution. You can see the full playlist here: ua-cam.com/play/PLZDCDMGmelH-pHt-Ij0nImVrOmj8DYKbB.html&feature=shared
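The point raised above (a 3x3 kernel without padding only fits 510x510 positions on a 512x512 image) and the padding that restores the full size can both be checked in a couple of lines, assuming TensorFlow/Keras; the 64 filters are an arbitrary choice.

```python
import tensorflow as tf

x = tf.random.normal((1, 512, 512, 3))

# Without padding ("valid"), a 3x3 kernel can only sit at 510x510 positions.
print(tf.keras.layers.Conv2D(64, 3, padding="valid")(x).shape)  # (1, 510, 510, 64)

# With zero padding ("same"), the output keeps the 512x512 spatial size.
print(tf.keras.layers.Conv2D(64, 3, padding="same")(x).shape)   # (1, 512, 512, 64)
```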

  • @bagussajiwo321
    @bagussajiwo321 1 year ago

    So, if the filter count is 64, does that mean you stack the 512x512 photo 64 times? Like you stack that face 64 times? Or are there different pixel values for every filter?

    • @bagussajiwo321
      @bagussajiwo321 1 year ago

      Take an example like a 3x3 matrix with 5 filters:
      1 0 1
      0 1 0
      1 0 1
      so is this value stacked 5 times because the filter count is 5?

    • @animatedai
      @animatedai  1 year ago +1

      Have you seen my video on the fundamental algorithm? ua-cam.com/video/eMXuk97NeSI/v-deo.html
      You can think of each filter as a different pattern that the algorithm is searching for in the image (or input feature map). Each output value (each cube) represents how closely that area of the input matched the pattern. So you get a 2D output for each filter. And those outputs are stacked depth-wise to form the 3D output feature map. If you have 64 filters, you'll stack 64 of these 2D outputs together.
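The reply above can be made concrete with a tiny NumPy sketch: each filter yields one 2D map, and the maps are stacked depth-wise. The 8x8 input and the 5 random filters below are toy stand-ins for the 512x512 photo and the 64 filters from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))        # toy input instead of a full 512x512 photo
filters = rng.normal(size=(5, 3, 3, 3))   # 5 different 3x3 filters, each as deep as the input (3)

# Each filter produces one 2D map of "how well this patch matches the filter's pattern".
maps = []
for f in filters:
    out = np.zeros((6, 6))                # 8 - 3 + 1 = 6 valid positions per axis
    for i in range(6):
        for j in range(6):
            out[i, j] = np.sum(image[i:i+3, j:j+3, :] * f)
    maps.append(out)

# The 2D maps are stacked depth-wise: 5 filters -> depth 5 (64 filters would give depth 64).
feature_map = np.stack(maps, axis=-1)
print(feature_map.shape)                  # (6, 6, 5)
```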

    • @bagussajiwo321
      @bagussajiwo321 1 year ago

      @@animatedai I see!! Thanks for giving me the previous video's link.
      Sorry for my silly question 😅

  • @superzolosolo
    @superzolosolo 1 year ago

    How did you go from 512x512x3 to 512x512x64 while still using a kernel size of 3x3? Wouldn't you have to use a kernel size of 1x1 and have 64 kernels? That's the only way I understand it so far; I'm hoping you explain it later on. Other than that, this is super helpful, thank you so much 🙏

    • @animatedai
      @animatedai  1 year ago

      You're correct that you need 64 kernels, but the size of the kernels doesn't matter. It's fine to have a kernel size of 3x3 and 64 kernels.

    • @superzolosolo
      @superzolosolo 1 year ago

      @@animatedai I see now, thanks for clarifying

    • @adityamurtiadi911plus
      @adityamurtiadi911plus 3 months ago

      @@animatedai Did you use padding of 1 to keep the output dimensions the same?

  • @tazanteflight8670
    @tazanteflight8670 1 year ago

    Why do your filter examples have a depth of 8?

    • @jntb3000
      @jntb3000 5 days ago

      Because the input has a depth of eight.

    • @tazanteflight8670
      @tazanteflight8670 5 days ago

      @@jntb3000 Eight... what? And why was 7 insufficient, and why is 9 too much?

    • @jntb3000
      @jntb3000 5 days ago

      I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight.

    • @jntb3000
      @jntb3000 5 days ago

      @@tazanteflight8670 I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight. If the input only has a depth of 3 (like RGB colors), then the kernel should have a depth of 3 as well. I guess we could also use a kernel depth of just one for all input sizes.
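A quick way to see the kernel depth matching the input depth, assuming TensorFlow/Keras: the 32x32 spatial size and the 64 filters below are arbitrary, while the depth of 8 mirrors the example discussed above.

```python
import tensorflow as tf

# Build a conv layer on an input with depth 8; Keras infers the kernel depth from the input.
layer = tf.keras.layers.Conv2D(filters=64, kernel_size=3)
layer.build(input_shape=(None, 32, 32, 8))

# Kernel shape is (kernel_h, kernel_w, input_depth, filter_count):
# the 8 here comes straight from the input's depth of 8.
print(layer.kernel.shape)   # (3, 3, 8, 64)
```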

  • @Looki2000
    @Looki2000 1 year ago

    Imagine the ability to build your own TensorFlow neural network using such 3D visualization.

    • @conficturaincarnatus1034
      @conficturaincarnatus1034 1 year ago

      The 3rd-dimension part seems a bit tedious; I believe 2D visualizations are more helpful in practice. Just write the feature count as text at the bottom and move on to the next layer.
      And for building neural networks with 2D visualization, I've recently found KNIME to be amazing, although you are abstracting entire layers to a single box lmao.