Deformable DETR

  • Published 26 Oct 2024

COMMENTS • 28

  • @huangshijie3038 7 months ago +4

    Great video!!!
    thank you for making this video
    I was praying to see this 2 years ago lol

    • @makgaiduk 7 months ago +1

      I am glad you enjoyed it!

  • @Jacob011 1 month ago +1

    This is an actual explanation, unlike most of the other channels that purport to "explain" these architectures.

  • @jeffreyheo7120 6 months ago +2

    thank you so much for this amazing video! looking forward to more of your content :D

    • @makgaiduk 6 months ago

      Glad you enjoyed it! More to come!

  • @leanderheine4135 2 months ago

    Literally the only in-depth source other than the "Deformable Convolutional Networks" paper. Helped me a lot with my bachelor's thesis!

  • @makgaiduk 7 months ago

    Check out my next video: reading Deformable DETR source code ua-cam.com/video/3M9mS_3eiaw/v-deo.html

  • @ajeyamandikal2010 8 months ago +2

    Great explanation!!
    Could I request videos covering the object tracking problem, and more specifically models like MOTR?

    • @makgaiduk 8 months ago +1

      Certainly! I was hoping to climb up to the current state of the art in object detection, and then expand towards more advanced problems like object tracking.

    • @ajeyamandikal2010 8 months ago

      @@makgaiduk Great!! Looking forward to it

  • @Taehyoung_Kim 6 months ago +2

    Was really helpful :) keep it up

    • @makgaiduk 6 months ago +1

      Glad it helped!

    • @Taehyoung_Kim 6 months ago +1

      @@makgaiduk By any chance, do you plan to go over SOTA segmentation models as well?

    • @makgaiduk 6 months ago +1

      @@Taehyoung_Kim I have that in my plan. I plan to make videos about BERT, Co-DETR, Grounding DINO, and probably Mamba for Vision, and then start digging into segmentation models. It will also probably take some time to get up to speed on all the concepts before doing SOTA. I am making 1 video per week, so we are looking at something like 2-3 months at least.

    • @Taehyoung_Kim 6 months ago

      @@makgaiduk nice! I’ll stay tuned

  • @davidro00 1 month ago +1

    In the deformable convolution, I still don't get how the "offset branch" calculates the offset map via a convolution kernel of the same size as the original one. How is its output rearranged to match the specific pixel offsets?
    EDIT:
    I think it is the following:
    The offset map has 2N channels, where N is the number of kernel elements (e.g. 9 for a 3x3 kernel) and the factor 2 covers the x and y offsets. So channels 1 and 2 hold the x and y offsets for the top-left position of the kernel.
    Then, the spatial dimensions of the offset map correspond to the current position of the kernel as it slides. Thus, the first 2 channels of the top-left value in the offset map determine the x and y offsets of the top-left kernel element when the kernel is in its first sliding position.
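
The layout guessed at in the comment above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's CUDA implementation: the names `bilinear` and `deform_conv2d_single` are made up for this sketch, it handles one input channel and one output channel with stride 1 and no padding, and it assumes each kernel element owns two consecutive offset channels stored as (dy, dx). Real implementations may order that pair differently.

```python
import math

def bilinear(img, y, x):
    """Sample img at a fractional (y, x); points outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total

def deform_conv2d_single(img, weight, offsets):
    """Hypothetical single-channel deformable convolution (stride 1, no padding).

    offsets has 2 * kh * kw channels, each of size out_h x out_w.
    Assumed layout: channel 2*k is dy and channel 2*k + 1 is dx for
    kernel element k, with k counted row by row from the top-left.
    """
    kh, kw = len(weight), len(weight[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):          # (i, j) is the current sliding position
        for j in range(out_w):
            acc = 0.0
            for k in range(kh * kw):
                ky, kx = divmod(k, kw)
                dy = offsets[2 * k][i][j]      # offset for element k at (i, j)
                dx = offsets[2 * k + 1][i][j]
                acc += weight[ky][kx] * bilinear(img, i + ky + dy, j + kx + dx)
            out[i][j] = acc
    return out

# A 4x4 ramp image and an all-ones 3x3 kernel give a 2x2 output.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
weight = [[1.0] * 3 for _ in range(3)]

def zero_offsets():
    # 2 * 9 = 18 offset channels, each the size of the 2x2 output.
    return [[[0.0] * 2 for _ in range(2)] for _ in range(18)]

# With all offsets at zero this is an ordinary convolution.
plain = deform_conv2d_single(img, weight, zero_offsets())

# Shift every kernel element one pixel to the right, but only at output (0, 0).
shift = zero_offsets()
for k in range(9):
    shift[2 * k + 1][0][0] = 1.0
shifted = deform_conv2d_single(img, weight, shift)

print(plain[0][0], plain[0][1], shifted[0][0])  # 45.0 54.0 54.0
```

With zero offsets the result matches a regular convolution. Moving all nine sampling points one pixel to the right at output position (0, 0) makes that output equal to what the regular convolution produces one step to the right, which is the "offset per kernel element, per sliding position" reading of the offset map.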

    • @makgaiduk 1 month ago

      Good question. I guess I should do a "deformable convolution" code read

    • @davidro00 1 month ago

      @@makgaiduk Would be cool, but maybe not that relevant anymore... I also think it's written in CUDA, as they did for deformable attention, because of the bilinear interpolation thingy.

  • @guillaumehai 2 months ago

    This was amazing, nice work - I really appreciate it. Please continue with the vids :)

  • @shaodongwang3029 6 months ago

    Thank you for this insightful video! The explanations are clear and easy to follow. Love it!
    Regarding the object detection task, especially for detecting stacked or cluttered items, would a DETR-based model be more suitable than YOLO?

    • @makgaiduk 6 months ago +1

      By design and by reported metrics, a more advanced DETR-based model like DINO or Co-DETR should be better.
      Depending on what sort of data you have, you might also take a look at multi-modal models like OpenAI's "CLIP" or Grounding DINO. They might get better accuracy without finetuning.

    • @shaodongwang3029 6 months ago +1

      @@makgaiduk Got it. Thank you for sharing❤

  • @AhmedEssamFakharany 4 months ago

    this is awesome !!

  • @rajampetasharath21 9 months ago

    Great video!!

    • @makgaiduk 9 months ago

      Glad it was useful! And thanks for commenting!