Vision Transformer and its Applications

  • Published 27 May 2024
  • Vision transformer is a recent breakthrough in computer vision. While transformer-based models have dominated natural language processing since 2017, CNN-based models were still delivering state-of-the-art performance on vision problems. Last year, a group of researchers from Google figured out how to make a transformer work on image recognition. They called it the "vision transformer". Follow-up work by the community demonstrated the superior performance of vision transformers not only in recognition but also in other downstream tasks such as detection, segmentation, multi-modal learning, and scene text recognition, to name a few.
    In this talk, Rowel Atienza will build a deeper understanding of the model architecture of vision transformers. Most importantly, he will focus on the concept of self-attention and its role in vision. He will then present different model implementations that use the vision transformer as the main backbone.
    Since self-attention can be applied beyond transformers, Rowel Atienza will also discuss a promising direction for building general-purpose model architectures: networks that can process a variety of data formats such as text, audio, image, and video.
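As background for the patch-embedding step the talk walks through, here is a minimal sketch of how an image becomes the token sequence a Transformer encoder consumes. This is illustrative code, not material from the talk; the class name is hypothetical and the hyperparameters are the common ViT-Base defaults (224×224 input, 16×16 patches, 768-dim embeddings).

```python
import torch
import torch.nn as nn

# Split the image into fixed-size patches, flatten each patch, and
# project it linearly into the model dimension. A 224x224 image with
# 16x16 patches yields (224/16)^2 = 196 tokens.
class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution applied once per patch is exactly
        # "flatten the patch + linear projection", just vectorized.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (B, 196, 768)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

In the full model, a learnable class token and position embeddings are added to this sequence before it enters the encoder stack.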
    → To watch more videos like this, visit aiplus.training ←
    Do You Like This Video? Share Your Thoughts in Comments Below
    Also, you can visit our website and choose the nearest ODSC Event to attend and experience all our Trainings and Workshops:
    odsc.com/california/
    odsc.com/apac/
    Sign up for the newsletter to stay up to date with the latest trends in data science: opendatascience.com/newsletter/
    Follow Us Online!
    • Facebook: / opendatasci
    • Instagram: / odsc
    • Blog: opendatascience.com/
    • LinkedIn: / open-data-science
    • Twitter: / odsc
  • Science & Technology

COMMENTS • 24

  • @DrAIScience
    @DrAIScience 26 days ago

    Very very very nice explanation!!! I like learning the foundation/origin of the concepts from which models are derived.

  • @jhjbm1959
    @jhjbm1959 6 months ago +3

    This video provides a clear step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else.
    Thank you.

  • @crapadopalese
    @crapadopalese 1 year ago +7

    10:46 - this is a mistake; the convolution is not equivariant to scaling - if the bird is scaled, the output of the convolution will not simply be a scaling of the original output. That would only be true if you also rescaled the filters.
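The commenter's point can be checked numerically. Below is a small illustrative sketch (not from the video) comparing "downscale, then convolve" against "convolve, then downscale" with a fixed filter; for a scale-equivariant operation the two would agree up to interpolation error, but here they diverge markedly.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 64, 64)   # toy single-channel "image"
w = torch.randn(1, 1, 3, 3)     # fixed 3x3 filter

def half(t):
    # Bilinear downscaling by a factor of 2.
    return F.interpolate(t, scale_factor=0.5, mode="bilinear",
                         align_corners=False)

a = F.conv2d(half(x), w, padding=1)   # convolve the scaled image
b = half(F.conv2d(x, w, padding=1))   # scale the convolved image

# The relative difference is large: convolution with a fixed filter is
# translation-equivariant, but not scale-equivariant.
print(((a - b).abs().mean() / b.abs().mean()).item())
```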

  • @PrestonRahim
    @PrestonRahim 1 year ago +5

    Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.

  • @xXMaDGaMeR
    @xXMaDGaMeR 1 year ago +3

    amazing lecture, thank you sir!

  • @rikki146
    @rikki146 1 year ago +1

    20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?
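For what it's worth, in the original ViT (as in the standard Transformer encoder) the blocks are chained sequentially: block i+1 consumes the output of block i. The parallelism lives inside each block, across attention heads. A minimal sketch using stock PyTorch layers (illustrative; hyperparameters are the ViT-Base defaults):

```python
import torch
import torch.nn as nn

# Twelve encoder blocks applied one after another, not side by side.
encoder = nn.Sequential(*[
    nn.TransformerEncoderLayer(d_model=768, nhead=12,
                               dim_feedforward=3072,
                               batch_first=True,
                               norm_first=True)  # pre-norm, as in ViT
    for _ in range(12)
])

out = encoder(torch.randn(1, 196, 768))  # same shape in and out
print(out.shape)                         # torch.Size([1, 196, 768])
```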

  • @DrAIScience
    @DrAIScience 26 days ago

    Do you have a video about BEiT or DINO?

  • @sahil-vz8or
    @sahil-vz8or 11 months ago +1

    You said 196 patches for ImageNet data. The number of patches depends on the input image size and the patch size. For example: if the input image is 400×400 and the patch size is 8×8, then the number of patches will be (400×400)/(8×8) = 50×50 = 2500.
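A quick check of that arithmetic (illustrative helper, not from the video):

```python
# Patches = (H / P) * (W / P) for an HxW image and PxP patches.
def num_patches(h: int, w: int, p: int) -> int:
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    return (h // p) * (w // p)

print(num_patches(224, 224, 16))  # 196  (the ViT default quoted in the talk)
print(num_patches(400, 400, 8))   # 2500 (the commenter's example)
```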

  • @ailinhasanpour
    @ailinhasanpour 1 year ago +4

    thanks for sharing, it was extremely helpful 💯

  • @scottkorman4953
    @scottkorman4953 1 year ago +4

    What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simple way?
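A rough answer in code form: self-attention mixes information across the patch tokens (every token attends to every other token), while the MLP transforms each token independently. The sketch below is illustrative and simplified (pre-norm layout as in the ViT paper, dropout omitted; the class name is hypothetical):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(                 # applied per token
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):                         # x: (B, N, dim)
        h = self.norm1(x)
        # Self-attention: tokens exchange information with each other.
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # MLP: each token is transformed independently.
        return x + self.mlp(self.norm2(x))

y = EncoderBlock()(torch.randn(1, 196, 768))
print(y.shape)  # torch.Size([1, 196, 768])
```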

  • @muhammadshahzaibiqbal7658
    @muhammadshahzaibiqbal7658 1 year ago

    Thanks for sharing.

  • @PRASHANTKUMAR-ze6mj
    @PRASHANTKUMAR-ze6mj 1 year ago +1

    thanks for sharing

  • @anirudhgangadhar6158
    @anirudhgangadhar6158 1 year ago

    Great resource!

  • @user-co6pu8zv3v
    @user-co6pu8zv3v 10 months ago

    Thank you, sir

  • @DrAIScience
    @DrAIScience 26 days ago

    Are you the channel owner??

  • @hoangtrung.aiengineer
    @hoangtrung.aiengineer 1 year ago

    Thank you for making such a great video

  • @capocianni1043
    @capocianni1043 1 year ago

    Thank you for this genuine knowledge.

  • @mohammedrakib3736
    @mohammedrakib3736 2 months ago

    Fantastic Video! Really loved the detailed explanation step-by-step.

  • @liangcheng9856
    @liangcheng9856 9 months ago

    awesome

  • @saimasideeq7254
    @saimasideeq7254 6 months ago

    thank you, much clearer

  • @improvement_developer8995
    @improvement_developer8995 9 months ago +1

    Tax evader 🤮

  • @improvement_developer8995
    @improvement_developer8995 9 months ago +1

    🤮