C 8.7 | Faster RCNN Demo - Pixel Norm, Conv Feature Maps and RPN BBoxes | CNN | Machine Learning

  • Published 23 Dec 2024

COMMENTS • 39

  • @muneshraghurajvarma9937 · 3 years ago

    Thanks a lot for this fantastic explanation

  • @anuj.zest-AnujKhandelwal · 3 years ago

    At 00:34, for pixel norm, could you please share the link? It is not there in the description.

  • @priyaljain5587 · 4 years ago · +1

    This is a very good visualization of Faster-RCNN!
    Thanks for such a good explanation.

  • @furkankaragoz8196 · 3 years ago

    I have a question about the Faster R-CNN network that I am struggling with.
    My question is: during prediction, what happens if my image size is 3000x3000 instead of 600x1000 when fed into the network as input? Does the Faster R-CNN network resize it to 600x1000 itself?
    This confuses me.
    I'm working on TensorFlow 1.15.
    Sorry if I made a mistake. Thanks in advance.

  • @abhishekjatram · 4 years ago

    The video series was great! Thanks for explaining it very clearly and neatly.
    However, I had a doubt: at 4:34 the black box (the sliding window at the center) is generated by a 3x3 conv on the 38x50 feature map. Since the receptive field at the feature map from VGG-16 is 16, when a 3x3 conv is applied on it, the receptive field of the conv becomes 16x3, right? Please correct me if I am wrong.

    • @abhishekjatram · 4 years ago

      From a comment below, I understood that we are using a 196x196 output from VGG, so that could give us 228, but here we are using 38x50, right?

    • @Cogneethi · 4 years ago

      @@abhishekjatram zike.io/posts/calculate-receptive-field-for-vgg-16/
      If we use a 224x224 image as input, the feature map at the last conv layer is 14x14.
      If we use a 600x800 image, the FM is 38x50.
      Irrespective of the FM dimensions, the receptive field at any given layer won't change. Here it is 196x196 at the last conv layer.
      That is, if you consider any pixel in the FM of the last conv layer of this network, we are effectively looking at a 196x196 patch of the image.
      Since the effective stride is 16, a 3x3 patch of the FM covers a 228x228 patch of the image, as you pointed out. (See the sketch below for how the FM dimensions follow the input size.)
      The receptive field only changes if we change the network configuration (stride, kernel size, number of layers, etc.).
      Please let me know if I need to elaborate further.
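
      To make the feature-map arithmetic concrete, here is a minimal Python sketch (mine, not the video's code) that reproduces the 14x14 and 38x50 dimensions. It assumes VGG-16's conv5_3 output, which sits after four 2x2/stride-2 max-pools; the 3x3 convs use padding 1 and preserve spatial dims, so only the pools shrink the map.
      ---------------------------------------------------------------------------------
      import math

      def vgg16_conv5_dims(h, w, num_pools=4):
          """Spatial dims of the conv5_3 feature map for an h x w input."""
          for _ in range(num_pools):
              # each max-pool halves the spatial dims (rounding up)
              h, w = math.ceil(h / 2), math.ceil(w / 2)
          return h, w

      print(vgg16_conv5_dims(224, 224))  # -> (14, 14)
      print(vgg16_conv5_dims(600, 800))  # -> (38, 50)
      ---------------------------------------------------------------------------------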

    • @abhishekjatram · 4 years ago

      @@Cogneethi Ok, got it. The receptive field of VGG-16 at the FM is 196x196, and when we move by 1 pixel in the FM, we move by 16 pixels in the image (width(image) / width(FM) = 16). So the receptive field of a 3x3 conv on the FM would be 196 (0,0) + 16 (0,1) + 16 (0,2) = 228x228.
      Thank you once again :)

    • @Cogneethi · 4 years ago

      @@abhishekjatram Yes, your calculation is right.
      You are welcome.

    • @anandshrivastava3684 · 3 years ago

      @@Cogneethi I might be missing something very fundamental. How can an image of 600x800 be fed into VGG16, when only an image size of 224x224 is allowed?

  • @eduardin5214 · 4 years ago

    Finally a source that explains it so well... Thanks a lot

  • @zyrn1564 · 5 years ago

    How did you get the value of 228x228 at 4:46?

    • @Cogneethi · 5 years ago

      For this, we need to understand receptive fields, but the explanation is long.
      To get an intuition about receptive fields, see this: ua-cam.com/video/QyM8c8XK01g/v-deo.html
      The detailed calculation of receptive fields for VGG16 can be seen here: zike.io/posts/calculate-receptive-field-for-vgg-16/
      Here, we take the 196x196 output. (We are not using the FC layers of this network.)
      The effective stride of the VGG16 network is 16.
      Since we are using a 3x3 conv, the receptive field of this filter is 196 (output receptive field) + 16 (stride to the 2nd pixel) + 16 (stride to the 3rd pixel) = 228.
      So, in total, we have a 228x228 receptive field for the 3x3 conv filter.
      For more details see:
      Paper and Github: arxiv.org/pdf/1603.07285.pdf & github.com/vdumoulin/conv_arithmetic
      Please let me know if I need to elaborate further.
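
      The same numbers drop out of the standard receptive-field recurrence, r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where r is the receptive field, j the jump (effective stride), k the kernel size and s the layer stride. A minimal sketch (mine, not from the video) for VGG-16 up to conv5_3:
      ---------------------------------------------------------------------------------
      # (kernel, stride) for each layer up to conv5_3
      VGG16_TO_CONV5_3 = (
          [(3, 1)] * 2 + [(2, 2)] +  # conv1_1..2, pool1
          [(3, 1)] * 2 + [(2, 2)] +  # conv2_1..2, pool2
          [(3, 1)] * 3 + [(2, 2)] +  # conv3_1..3, pool3
          [(3, 1)] * 3 + [(2, 2)] +  # conv4_1..3, pool4
          [(3, 1)] * 3               # conv5_1..3
      )

      def receptive_field(layers):
          r, j = 1, 1  # receptive field and jump at the input
          for k, s in layers:
              r += (k - 1) * j
              j *= s
          return r, j

      r, j = receptive_field(VGG16_TO_CONV5_3)
      print(r, j)       # 196 16 -> RF 196x196, effective stride 16
      print(r + 2 * j)  # 228    -> the RPN's 3x3 conv adds two more jumps
      ---------------------------------------------------------------------------------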

  • @luvverma6867 · 2 years ago

    Amazing lecture!!
    Can you please let me know: since VGG16 reduces the feature map to 38x50, how come the sliding-window box is 228x228? Is there a video where you have explained that?

  • @aviseklahiri3864 · 4 years ago

    Thanks for the video. Can you give a slight intuition as to why we slide the anchor boxes at a stride of 16 pixels?

    • @Cogneethi · 4 years ago

      ua-cam.com/video/50-PhoCJEOk/v-deo.html

  • @praveenpl9263 · 4 years ago

    This is really fantastic, thanks for saving me so much time!

  • @adityavaishampayan1975 · 4 years ago

    this is the real stuff man

  • @AISHORTS9797 · 4 years ago

    Hi, where do you explain the loss function with multiple objects in an image?

    • @Cogneethi · 4 years ago

      Hey, the loss function is the same: you just add up the classification and bbox losses for each ROI proposal. It doesn't matter if there are 2 objects or 3 in your image.
      See: github.com/endernewton/tf-faster-rcnn/blob/0e61bacf863f9f466b54770f35a130514e85cac6/lib/nets/network.py
      def _smooth_l1_loss() & def _add_losses()
      Let me know if it is not clear.
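
      For intuition, here is a minimal NumPy sketch (mine, not the repo's exact code) of the smooth L1 box-regression term that _smooth_l1_loss() implements; the total loss just sums one value per ROI, so the object count only changes how many ROIs carry foreground targets.
      ---------------------------------------------------------------------------------
      import numpy as np

      def smooth_l1(pred, target, sigma=1.0):
          # 0.5*(sigma*x)^2 if |x| < 1/sigma^2, else |x| - 0.5/sigma^2
          x = pred - target
          s2 = sigma ** 2
          inside = np.abs(x) < 1.0 / s2
          return np.where(inside, 0.5 * s2 * x ** 2, np.abs(x) - 0.5 / s2)

      # two ROIs, four bbox deltas each -> one loss value per ROI
      pred = np.array([[0.1, 0.2, -0.3, 0.5], [1.2, -0.4, 0.0, 0.3]])
      target = np.zeros_like(pred)
      print(smooth_l1(pred, target).sum(axis=1))
      ---------------------------------------------------------------------------------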

  • @alisherberdimuratov1247 · 4 years ago · +1

    Thank you!

  • @vikramreddy5631 · 5 years ago

    How do you find whether there is an object or not? Out of 17000 proposals, how are 6000 selected, given that there is no selective search and only convolution is doing the work?

    • @Cogneethi · 5 years ago

      Since we use Softmax for foreground/background classification in the RPN, we get a score. Using this score, we can sort the region proposals. From this sorted list, we pick the top 6000.
      Is it clear now?
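
      In code, the ranking is just a sort by the foreground score. A minimal sketch (mine, not the repo's exact code):
      ---------------------------------------------------------------------------------
      import numpy as np

      def top_proposals(boxes, fg_scores, k=6000):
          # boxes: (N, 4) proposal coords; fg_scores: (N,) softmax foreground probs
          order = np.argsort(fg_scores)[::-1][:k]  # highest score first
          return boxes[order], fg_scores[order]

      rng = np.random.default_rng(0)
      boxes = rng.uniform(0, 600, size=(17000, 4))  # ~17k anchors, as in the question
      scores = rng.uniform(0, 1, size=17000)
      kept, _ = top_proposals(boxes, scores)
      print(kept.shape)  # (6000, 4)
      ---------------------------------------------------------------------------------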

  • @Deep_learn · 4 years ago

    Hi, thanks for the beautiful explanation. I have a small doubt.
    Are you visualizing the filter output at the last conv layer? Are you using Class Activation Maps? How do you visualize different filters' heatmaps overlaid on the original image? When I try to visualize, it shows the heatmap of a particular layer (i.e., Block5_Conv1), not an individual filter's output. If you could kindly enlighten me. Thank you in advance.

    • @Cogneethi · 4 years ago

      I took the code from github.com/endernewton/tf-faster-rcnn/
      way back in 2018. The repo has since been updated, so I am not sure if my code mods will still work.
      But this is what I did, in file test.py:
      ---------------------------------------------------------------------------------
      import matplotlib.pyplot as plt  # imports needed by the additions below
      from pathlib import Path

      # In function def im_detect(sess, net, im, im_name), I added these 2 lines:
      feat_map = net.extract_head(sess, blobs['data'])
      plot_feat_map(im, feat_map, False, im_name)

      # This is the new fn to dump the feature maps overlapped with the image.
      def plot_feat_map(im, feat_map, savefig, im_name):
          print("im shape: {}".format(im.shape))
          print("fm shape: {}".format(feat_map[0].shape))
          fig, ax = plt.subplots(figsize=(12, 12))
          for i in range(0, 512, 50):  # plot every 50th of the 512 filters
              print(i)
              plt.title('Filter ' + str(i))
              # stretch the feature map over the image extent, then overlay the image
              ax.imshow(feat_map[0, :, :, i], aspect='auto', cmap="gray",
                        extent=(0, im.shape[1], 0, im.shape[0]))
              ax.imshow(im, alpha=0.3, aspect='equal')
              plt.tight_layout()
              if savefig:
                  pth = Path("out/featmap")
                  pth.mkdir(parents=True, exist_ok=True)
                  plt.draw()
                  plt.savefig(pth / (im_name + '_' + str(i) + ".jpg"))
      ---------------------------------------------------------------------------------
      As I might have said in the video, I am not sure if my approach to visualization is correct. This is just my hack.
      Nowadays there are other visualization libs/tools which might be better and more accurate.
      Please check them too, and let me know what you find.
      Hope this helps.

  • @arfanwicaksono8590 · 5 years ago

    Where is the link that explains pixel normalization further?

    • @Cogneethi · 5 years ago

      I seem to have missed the original link, but here are similar references.
      In the repo, you can see the pixel-norm related code here:
      github.com/endernewton/tf-faster-rcnn/blob/0e61bacf863f9f466b54770f35a130514e85cac6/lib/model/config.py
      __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
      &
      github.com/endernewton/tf-faster-rcnn/blob/0e61bacf863f9f466b54770f35a130514e85cac6/lib/utils/blob.py
      def prep_im_for_blob(im, pixel_means, target_size, max_size):
      Some links explaining it:
      forums.fast.ai/t/images-normalization/4058/2
      Here, in Faster RCNN, they are basically doing just (x - x.mean()).
      stats.stackexchange.com/a/220970
      arthurdouillard.com/post/normalization/
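
      Concretely, the normalization is just a per-channel mean subtraction. A minimal sketch (mine, not the repo's exact code) of what prep_im_for_blob() does with PIXEL_MEANS:
      ---------------------------------------------------------------------------------
      import numpy as np

      # fixed per-channel (BGR) means from config.py, computed over the training set
      PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])

      def normalize_pixels(im_bgr):
          # im_bgr: (H, W, 3) uint8 image in BGR order, as OpenCV loads it
          return im_bgr.astype(np.float32) - PIXEL_MEANS

      im = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)  # stand-in image
      blob = normalize_pixels(im)
      print(blob.mean(axis=(0, 1)))  # roughly centered per channel
      ---------------------------------------------------------------------------------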

  • @abhinavnagpal9959 · 5 years ago

    Nice demo

  • @RnFChannelJr · 4 years ago

    Excuse me sir, may I have the code for these explanations? Thanks a lot in advance.

    • @Cogneethi · 4 years ago

      github.com/endernewton/tf-faster-rcnn
      The code has been updated since I made these videos, so there might be some differences. Please check accordingly.