Filter Count - Convolutional Neural Networks

  • Published Sep 2, 2022
  • Patreon: / animated_ai
    Learn about filter count and the realistic methods of finding the best values
    My Udemy course on High-resolution GANs: www.udemy.com/course/high-res...

COMMENTS • 44

  • @bycloudAI
    @bycloudAI 1 year ago +26

    This channel is UA-cam's undiscovered gold mine, please keep up the amazing content!!

    • @ChrisRamsland94
      @ChrisRamsland94 1 year ago

      for real, it's wild how much effort he puts into these.

  • @kinvert
    @kinvert 1 year ago

    Thanks for another great video!

  • @arjunpalusa9421
    @arjunpalusa9421 1 year ago

    Eagerly waiting for your next videos..

  • @carlluis2045
    @carlluis2045 1 year ago

    awesome videos!!!

  • @marceloamado6223
    @marceloamado6223 9 months ago

    Thank you for the video. I used to really stress out about this matter; now I'm calmer knowing it's a common problem and that solving it with heuristics is the way.

  • @KJletMEhandelit
    @KJletMEhandelit 5 months ago

    This is wow!

  • @user-tg6dd7qh5l
    @user-tg6dd7qh5l 3 months ago

    Best video on CNNs.

  • @d_b_
    @d_b_ 1 year ago +8

    3:24 "only 2 categories...so 512 features is enough" - this statement sounds like it comes from familiarity with the problem. Is there something more to it? Did you see that number of features used in past papers? Was it from your own experimentation against 256 or 1024 features? Is there some math that arrives at this? I'd like to understand this better, so any additional color you have on this would be helpful!

    • @animatedai
      @animatedai  1 year ago +7

      You typically want more features than categories. So for something like ImageNet with 1000 categories, 512 wouldn't be enough. You'd want 2048 or higher. But this case only has 2 categories, so 512 easily meets that requirement.
      And the exact value of 512 came from NVIDIA's StyleGAN papers, which is what I based that architecture on. I don't remember them giving a reason for that value, but it gave them good results and a higher value wouldn't fit into memory during training on the Google Colab hardware.
      It's more of an art than a science so let me know if that doesn't completely answer your question. I'm happy to answer follow-ups.
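To make the "more features than categories" point concrete, here is a minimal Keras sketch of a classifier head where a 512-feature map feeds a 2-category output. The pooling layer, the 4x4 dummy feature map, and the layer sizes are illustrative assumptions, not the exact StyleGAN-based architecture mentioned in the reply.

```python
import tensorflow as tf

# Minimal sketch of a classifier head: 512 global features -> 2 categories.
head = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),        # (H, W, 512) -> (512,)
    tf.keras.layers.Dense(2, activation="softmax"),  # 2 output categories
])

features = tf.random.normal((1, 4, 4, 512))  # dummy final feature map
print(head(features).shape)                  # (1, 2)
```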

    • @d_b_
      @d_b_ 1 year ago +1

      @@animatedai Thank you, that helps

  • @buddhadevbhattacharjee1363
    @buddhadevbhattacharjee1363 6 months ago

    Please create a tutorial on conv3d as well, and on which would be better for video processing (conv2d or conv3d).

  • @umamusume1024
    @umamusume1024 2 months ago

    When it comes to one-dimensional signals, such as time-domain signals collected from speed sensors, what is the difference between the visualization of a 1D CNN and a 2D one? Does it just change the height of the cuboid to 1?
    And, what algorithms do you recommend for deep learning of one-dimensional time domain signals?
    I would really appreciate your reply, because as a Chinese student doing an undergraduate graduation project, I can't find any visualization of 1D CNN on the Chinese Internet.

  • @igorg4129
    @igorg4129 1 year ago

    wow!

  • @whizziq
    @whizziq 1 year ago

    I have one question: what does the number of features mean? For example, the initial image is 512x512x3 (the 3 in this case being the red-green-blue colors). But what happens in the next layers? What are these 64, 128, and higher numbers of features? Why do we need so many instead of just 3? Thanks. Appreciate your videos!

    • @Anodder1
      @Anodder1 1 year ago +5

      You have 3 dimensions: 2 spatial and 1 feature dimension. The 2 spatial dimensions encode where the information is, and the feature dimension encodes the different aspects under which the information can be interpreted. At the beginning you have the 3 color channels, but the next layer has a much larger feature dimension, in which each index represents one particular aspect, like "how much red-green contrast there is between left and right at this position". These aspects become more high-level, like "is this a circle", so the feature dimension needs to grow to cover all the useful interpretations that could apply at that point. This fits well with the shrinking spatial dimensions, because each pixel in a later layer represents a larger area of the original image, for which that many higher-level interpretations are needed.
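As a quick illustration of the pattern described in the reply above (spatial dimensions shrinking while the feature dimension grows), here is a small Keras sketch. The filter counts and strides are illustrative, not taken from the video's architecture.

```python
import tensorflow as tf

# Illustrative shapes only: spatial dims shrink while the feature dim grows.
x = tf.random.normal((1, 512, 512, 3))                             # RGB input
x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)               # (1, 512, 512, 64)
x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same")(x)   # (1, 256, 256, 128)
x = tf.keras.layers.Conv2D(256, 3, strides=2, padding="same")(x)   # (1, 128, 128, 256)
print(x.shape)
```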

  • @lionbnjmn
    @lionbnjmn 1 year ago

    hey! Do you have any sources for your statements at 2:21 (about doubling channels when downsampling) and 2:50 (downsampling units should be followed by dimension-steady units)? I'm currently writing a paper and trying to argue the same point, but I can't find any real research on it :)

    • @animatedai
      @animatedai  1 year ago +1

      It's just a common pattern that I've seen. There's no shortage of examples if you want to cite them, from the original ResNet all the way up to modern diffusion architectures.

  • @omridrori3286
    @omridrori3286 1 year ago

    Are these the animations you also use in the course?

    • @animatedai
      @animatedai  1 year ago +1

      I don't use these in my current course, but I'm planning to incorporate them into a new course that I'm working on now.

  • @SpicyMelonYT
    @SpicyMelonYT 1 year ago

    How do you halve the resolution? The only way I can imagine is to have a kernel that is half the size of the input data plus one. Is that the correct way to do it, or is something else happening?

    • @animatedai
      @animatedai  1 year ago +1

      This will be covered in an upcoming video, but to give away the answer: you can pad with "SAME" (in TensorFlow or manually pad the equivalent of it in PyTorch) and use a stride of 2.
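Here is a minimal sketch of the downsampling step described in the reply above, assuming TensorFlow/Keras and PyTorch; the 512x512 input and the 64 filters are just illustrative values.

```python
import tensorflow as tf
import torch

# TensorFlow: "same" padding with stride 2 halves the spatial resolution.
x_tf = tf.random.normal((1, 512, 512, 64))
y_tf = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same")(x_tf)
print(y_tf.shape)  # (1, 256, 256, 64)

# PyTorch equivalent: padding=1 with a 3x3 kernel plays the role of "same" here.
x_pt = torch.randn(1, 64, 512, 512)
y_pt = torch.nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)(x_pt)
print(y_pt.shape)  # torch.Size([1, 64, 256, 256])
```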

    • @SpicyMelonYT
      @SpicyMelonYT 1 year ago

      @@animatedai Oh I see, and that's still backpropagation-compatible? I guess it would be, but I have no clue how to do that little step backwards. I assume you just act like the data was always smaller for that step?

    • @animatedai
      @animatedai  1 year ago +1

      Yes, that still works fine with backpropagation. Are you working with a library like TensorFlow or PyTorch? If so, they'll handle the backpropagation for you with their automatic differentiation.
      If you're using a kernel size of 1x1, it would work to act like the data was always smaller (specifically you would treat it like you dropped every other column/row of the data and then did a 1x1 convolution with a stride of 1). But for a larger kernel size like 3x3, all of the input data will be used so that won't work.

    • @SpicyMelonYT
      @SpicyMelonYT 1 year ago

      @@animatedai Ah I see, that makes sense. I'm actually building it from scratch in JavaScript. It's pretty slow, but I'm doing it to get a better understanding of it, and I also find it fun.
      Also, thank you for the responses; that's really cool. I think what you're doing with these videos is really sleek and useful. I personally would like it if you went into more depth about the actual math and numbers, but I completely understand that that's not your goal here, and that you want to give a more intuitive explanation for people. Keep it up!

    • @animatedai
      @animatedai  1 year ago +2

      Good luck on your JavaScript CNN project! And thank you; I appreciate your support. The math is something I plan to cover; I even have the rough draft of the script written for a future video that goes over the math in detail. I just wanted to focus on teaching the intuition separately first so that it doesn't get lost in the calculation details.

  • @pi5549
    @pi5549 8 months ago

    2:08 Don't think we can go 512x512x3 to 512x512xN if filterSize>1. If filterSize=3 we'd be going to 510x510xN, right? Thought experiment: 5 items, slidingWindowLen 3. 3 slide-positions (123 234 345).

    • @pi5549
      @pi5549 8 months ago

      hmm, I suppose a feature can extend beyond the image by a pixel, and might even collect useful information that tells it it's dealing with an edge. Solving a jigsaw puzzle, you usually collect the edge pieces and try to work with them first.

    • @animatedai
      @animatedai  8 months ago

      This question is actually the perfect lead-in to my video on padding: ua-cam.com/video/ph4LrdntONo/v-deo.htmlfeature=shared
      That's actually the video that directly follows this one in my playlist on convolution. You can see the full playlist here: ua-cam.com/play/PLZDCDMGmelH-pHt-Ij0nImVrOmj8DYKbB.html&feature=shared
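The point raised above (a 3x3 kernel without padding only fits 510x510 positions on a 512x512 image) and the padding that restores the full size can both be checked in a couple of lines, assuming TensorFlow/Keras; the 64 filters are an arbitrary choice.

```python
import tensorflow as tf

x = tf.random.normal((1, 512, 512, 3))

# Without padding ("valid"), a 3x3 kernel can only sit at 510x510 positions.
print(tf.keras.layers.Conv2D(64, 3, padding="valid")(x).shape)  # (1, 510, 510, 64)

# With zero padding ("same"), the output keeps the 512x512 spatial size.
print(tf.keras.layers.Conv2D(64, 3, padding="same")(x).shape)   # (1, 512, 512, 64)
```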

  • @bagussajiwo321
    @bagussajiwo321 1 year ago

    So, if the filter count is 64, does that mean you stack the 512x512 photo 64 times? Like you stack that face 64 times? Or are there different pixel values for every filter?

    • @bagussajiwo321
      @bagussajiwo321 1 year ago

      Take an example like a 3x3 matrix with 5 filters:
      1 0 1
      0 1 0
      1 0 1
      so is this value stacked 5 times because the filter count is 5?

    • @animatedai
      @animatedai  1 year ago +1

      Have you seen my video on the fundamental algorithm? ua-cam.com/video/eMXuk97NeSI/v-deo.html
      You can think of each filter as a different pattern that the algorithm is searching for in the image (or input feature map). Each output value (each cube) represents how closely that area of the input matched the pattern. So you get a 2D output for each filter. And those outputs are stacked depth-wise to form the 3D output feature map. If you have 64 filters, you'll stack 64 of these 2D outputs together.
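The reply above can be made concrete with a tiny NumPy sketch: each filter yields one 2D map, and the maps are stacked depth-wise. The 8x8 input and the 5 random filters below are toy stand-ins for the 512x512 photo and the 64 filters from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))        # toy input instead of a full 512x512 photo
filters = rng.normal(size=(5, 3, 3, 3))   # 5 different 3x3 filters, each as deep as the input (3)

# Each filter produces one 2D map of "how well this patch matches the filter's pattern".
maps = []
for f in filters:
    out = np.zeros((6, 6))                # 8 - 3 + 1 = 6 valid positions per axis
    for i in range(6):
        for j in range(6):
            out[i, j] = np.sum(image[i:i+3, j:j+3, :] * f)
    maps.append(out)

# The 2D maps are stacked depth-wise: 5 filters -> depth 5 (64 filters would give depth 64).
feature_map = np.stack(maps, axis=-1)
print(feature_map.shape)                  # (6, 6, 5)
```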

    • @bagussajiwo321
      @bagussajiwo321 1 year ago

      @@animatedai I see!! Thanks for giving me the previous video's link.
      Sorry for my silly question 😅

  • @superzolosolo
    @superzolosolo 1 year ago

    How did you go from 512x512x3 to 512x512x64 while still using a kernel size of 3x3? Wouldn't you have to use a kernel size of 1x1 and have 64 kernels? That's the only way I understand it so far; I'm hoping you explain it later on. Other than that, this is super helpful, thank you so much 🙏

    • @animatedai
      @animatedai  1 year ago

      You're correct that you need 64 kernels, but the size of the kernels doesn't matter. It's fine to have a kernel size of 3x3 and 64 kernels.

    • @superzolosolo
      @superzolosolo 1 year ago

      @@animatedai I see now, thanks for clarifying

    • @adityamurtiadi911plus
      @adityamurtiadi911plus 3 months ago

      @@animatedai Did you use padding of 1 to keep the output dimensions the same?

  • @tazanteflight8670
    @tazanteflight8670 1 year ago

    Why do your filter examples have a depth of 8?

    • @jntb3000
      @jntb3000 5 days ago

      Because the input has a depth of eight.

    • @tazanteflight8670
      @tazanteflight8670 5 days ago

      @@jntb3000 Eight... what? And why was 7 insufficient, and why is 9 too much?

    • @jntb3000
      @jntb3000 5 days ago

      I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight.

    • @jntb3000
      @jntb3000 5 days ago

      @@tazanteflight8670 I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight. If the input only has a depth of 3 (like RGB colors), then the kernel should have a depth of 3 as well. I guess we could also use a kernel depth of just one for all input sizes.
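A quick way to see the kernel depth matching the input depth, assuming TensorFlow/Keras: the 32x32 spatial size and the 64 filters below are arbitrary, while the depth of 8 mirrors the example discussed above.

```python
import tensorflow as tf

# Build a conv layer on an input with depth 8; Keras infers the kernel depth from the input.
layer = tf.keras.layers.Conv2D(filters=64, kernel_size=3)
layer.build(input_shape=(None, 32, 32, 8))

# Kernel shape is (kernel_h, kernel_w, input_depth, filter_count):
# the 8 here comes straight from the input's depth of 8.
print(layer.kernel.shape)   # (3, 3, 8, 64)
```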

  • @Looki2000
    @Looki2000 1 year ago

    Imagine the ability to build your own TensorFlow neural network using such 3D visualization.

    • @conficturaincarnatus1034
      @conficturaincarnatus1034 1 year ago

      The 3rd-dimension part seems a bit tedious; I believe 2D visualizations are more helpful in practice. Just write the feature count as text at the bottom and move on to the next layer.
      And for building neural networks with 2D visualization, I've recently found KNIME to be amazing, although you are abstracting entire layers to a single box lmao.