Swin Transformer - Paper Explained

  • Published Dec 17, 2024

COMMENTS • 30

  • @VedantJoshi-mr2us · 6 months ago +4

    By far one of the best + most complete Swin Transformer explanations on the entire Internet.

    • @soroushmehraban · 6 months ago

      Thanks!

    • @FinalProject-rw1yf · 6 months ago

      @@soroushmehraban Hi sir, could you also explain the FasterViT and GCViT paper...

  • @kerenc91 · 8 days ago

    Great explanation, thanks!

  • @omarabubakr6408 · 1 year ago

    That's the most illustrative video of Swin Transformers on the internet!

    • @soroushmehraban · 1 year ago

      Glad you enjoyed it 😃

    • @omarabubakr6408 · 1 year ago

      @@soroushmehraban Yes, absolutely, thanks so much. Although I have a quick question more related to PyTorch: at 12:49, in line 239 of the code, first, what does the -1 mean and what exactly does it do with the tensor? Second, where does the [4, 16] come from? The 4 isn't mentioned in the reshaping. Thanks in advance.
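
A minimal sketch of what -1 does in a PyTorch reshape (general PyTorch behavior; the tensor shape below is hypothetical, not the exact tensor from the video):

import torch

# Hypothetical tensor: 64 tokens, each of dimension 96.
x = torch.randn(64, 96)

# -1 asks PyTorch to infer that dimension so the total number of elements
# is preserved: 64 * 96 = 4 * 16 * 96, so the -1 resolves to 96 here.
y = x.view(4, 16, -1)
print(y.shape)  # torch.Size([4, 16, 96])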

  • @SizzleSan · 1 year ago +1

    Thorough! Very comprehensible, thank you.

  • @yehanwasura · 1 year ago +2

    Really informative, helped me a lot to understand many concepts here. Keep up the good work.

  • @rohollahhosseyni8564 · 1 year ago

    Very well explained, thank you Soroush.

  • @kundankumarmandal6804 · 11 months ago

    You deserve more likes and subscribers

  • @antonioperezvelasco3297 · 1 year ago

    Thanks for the good explanation!

  • @symao-ir9vw · 18 days ago

    17:15, may I ask why the number at the bottom right of the 3rd Swin block is 6?

    • @soroushmehraban · 17 days ago

      That's a hyperparameter, I believe. It's hard to use lots of layers in the first and second stages because of the memory constraints we have with 4x4 and 8x8 patches, and the 32x32 patches at the last stage have the largest patch size (least attention to detail). So they used the most layers at the 16x16 patch size instead.
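
For reference, a sketch of how the per-stage block counts appear as a hyperparameter; the values follow the Swin-T configuration, and the argument names follow the official microsoft/Swin-Transformer repository (shown here only for illustration):

# Assumes you are running from the root of the microsoft/Swin-Transformer repo.
from models.swin_transformer import SwinTransformer

model = SwinTransformer(
    embed_dim=96,
    depths=[2, 2, 6, 2],       # blocks per stage: the 3rd stage uses 6 blocks
    num_heads=[3, 6, 12, 24],  # attention heads per stage
    window_size=7,
)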

  • @proteus333 · 1 year ago

    Amazing video !

  • @symao-ir9vw · 18 days ago

    The discussion about patch size at around 16:40 is confusing

    • @soroushmehraban · 17 days ago

      I was comparing a 4x4 Swin Transformer vs a 4x4 ViT. In a 4x4 ViT, all the layers have patches of 4x4 pixels, so every layer has good attention to detail. But in the Swin Transformer, as we go forward we merge these tokens, so we have less attention to detail in the deeper layers (that's why the last layer's output alone is not enough for segmentation).
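
A simplified sketch of the patch-merging step being described (not the exact code from the repo): each 2x2 neighborhood of tokens is concatenated along channels and projected, so the token grid halves in resolution at every stage and deeper layers see coarser detail.

import torch
import torch.nn as nn

class PatchMergingSketch(nn.Module):
    """Simplified version of Swin's patch merging, for illustration only."""
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):          # x: (B, H, W, C)
        # Group each 2x2 neighborhood of tokens and concatenate along channels.
        x0 = x[:, 0::2, 0::2, :]   # (B, H/2, W/2, C)
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(x)                 # (B, H/2, W/2, 2C)

# A 224x224 image with 4x4 patches gives a 56x56 token grid; after merging it is
# 28x28, so deeper stages work on coarser (less detailed) token grids.
tokens = torch.randn(1, 56, 56, 96)
merged = PatchMergingSketch(96)(tokens)
print(merged.shape)  # torch.Size([1, 28, 28, 192])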

  • @SaniaEskandari · 1 year ago

    perfect description.

  • @siarez · 1 year ago

    Great video! Thanks

  • @pradyumagarwal3978 · 3 months ago

    Where is the code that you were referring to?

    • @soroushmehraban · 3 months ago

      github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L222

  • @akbarmehraban5007 · 1 year ago

    I enjoyed it very much.

  • @Karthik-kt24 · 5 months ago

    Very nicely explained, thank you! Likes are at 314 so I didn't hit like 😁 subbed instead.

  • @dslkgjsdlkfjd · 5 months ago

    2:43 C would be equal to the number of filters, not the number of kernels. In the torch.nn.Conv2d operation being performed, we have 3 kernels per filter (one for each input channel) and C filters. Each filter has 3 kernels, not C kernels.
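
A small sketch illustrating the filters-vs-kernels point with a patch-embedding-style convolution (C = 96 is an assumed value for illustration):

import torch
import torch.nn as nn

C = 96  # embedding dim, i.e. the number of filters (assumed value)

# Patch-embedding-style convolution: 3 input channels, C output channels,
# 4x4 kernel with stride 4 (each 4x4 patch maps to one C-dim token).
patch_embed = nn.Conv2d(in_channels=3, out_channels=C, kernel_size=4, stride=4)

# Weight shape is (C, 3, 4, 4): C filters, each containing 3 kernels
# (one 4x4 kernel per input channel) -- not C kernels per filter.
print(patch_embed.weight.shape)  # torch.Size([96, 3, 4, 4])

x = torch.randn(1, 3, 224, 224)
print(patch_embed(x).shape)      # torch.Size([1, 96, 56, 56])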