Multimodal Machine Learning | Introduction | Part 1 | CVPR 2022 Tutorial

  • Published Sep 10, 2024
  • Email at khawar512@gmail.com
    0:00 Introduction
    0:32 Multimodal AI Technologies
    1:49 Multimodal Behaviors and Signals
    3:08 What is a Modality?
    4:24 What is Multimodal?
    4:53 Heterogeneous Modalities
    6:09 Dimensions of Heterogeneity - Examples
    9:03 Interconnected Modalities
    10:07 Cross-modal Interactions - A Behavioral Science View
    13:13 Dimensions of Cross-modal Interactions
    17:09 Behavioral Study of Multimodal
    19:38 Multimodal Research Tasks
    23:15 What is Multimodal Machine Learning?
    25:59 Multimodal Technical Challenges - Surveys, Tutorials and Courses
    29:50 Alignment
    32:14 Reasoning
    35:00 Generation
    36:03 Challenge 5: Transference
    37:12 Quantification
    37:57 Core Multimodal Challenges: Representation
    Multimodal Learning at CVPR 2022
    ================================
    Balanced Multimodal Learning via On-the-Fly Gradient Modulation | CVPR 2022
    STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes | CVPR 2022
    Dual Key Multimodal Backdoors for Visual Question Answering | CVPR 2022
    Egocentric Scene Understanding via Multimodal Spatial Rectifier | CVPR 2022
    Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection | CVPR 2022
    End-to-End Referring Video Object Segmentation With Multimodal Transformers | CVPR 2022
    Multimodal Material Segmentation | CVPR 2022
    Are Multimodal Transformers Robust to Missing Modality? | CVPR 2022
    Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification | CVPR 2022
    Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality | CVPR 2022
    MNSRNet: Multimodal Transformer Network for 3D Surface Super Resolution | CVPR 2022
    Multimodal Token Fusion for Vision Transformers | CVPR 2022
    XYLayoutLM: Layout Aware Multimodal Networks for Visually Rich Document Understanding | CVPR 2022
    Transformer for Vision | Multimodal Transformers for Video | Session 7 | CVPR 2022
    Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | NeurIPS 2021
    Multimodal Few-Shot Learning with Frozen Language Models | NeurIPS 2021
    #machinelearning #computervision #airesearch

COMMENTS • 10

  • @Epistemophilos
    @Epistemophilos 10 months ago +3

    Beautiful overview of a complex subject. Well done sir!

  • @gunasuu
    @gunasuu 1 year ago +2

    Superb presentation, sir, with a detailed explanation of multimodality.

  • @matthewjohnsinocruz9468
    @matthewjohnsinocruz9468 1 year ago +2

    Can we get a copy of your presentation?

  • @sidharthbatchu6128
    @sidharthbatchu6128 1 year ago +1

    I still don't get it: what is a modality?

    • @kevindegidon4268
      @kevindegidon4268 1 year ago +3

      I think the best description of a modality is a form or channel of information. When you think of the five main senses, different informational formats can speak to a given sense, e.g., a picture or the written word to sight, and the spoken word or other sounds to hearing. What I am looking to do is design software that can program and echo aspects of synesthesia (the blending of senses) as a teaching and learning tool.

    • @jbm5195
      @jbm5195 9 months ago

      Modality: think of the word "mode" to make it easier. It is the type of information representation, that is, how the information is conveyed. The mode of conveying information could be text, pictures, video, audio, etc. The modality of this response is text. If I added a meme, it might be a picture or a GIF.

  • @user-he2xz8sz4s
    @user-he2xz8sz4s 1 year ago +1

    Is it possible to download the slides of this course? Thanks!

    • @sy422326
      @sy422326 1 year ago +4

      Haven't found the slides of this lecture, but there is a similar one on their page: drive.google.com/file/d/1qIYBuYrSW2-e95DL7LndfLFqGkIWFG21/view.

    • @dragonsaige
      @dragonsaige 11 months ago

      @sy422326 That’s very helpful, thanks!

    • @achronicstudent
      @achronicstudent 10 months ago

      @sy422326 You are the best! Thank you!