Twelve Labs
A Multilingual Multimodal Model and Diffusion-Inspired Text-Video Retrieval | Multimodal Weekly 70
In the 70th session of Multimodal Weekly, we had two exciting presentations on a multilingual multimodal model and diffusion-inspired text-video retrieval.
✅ Nahid Alam, Karthik Reddy Kanjula, and Surya Guthikonda presented their work Maya, a new multilingual multimodal vision-language model. It is completely open source, with open weights and an open dataset, and is designed to handle eight languages, cultural diversity, and nuanced real-world contexts.
- Connect with Nahid: www.linkedin.com/in/nahidalam/
- Connect with Karthik: www.linkedin.com/in/karthik-reddy-kanjula/
- Connect with Surya: www.linkedin.com/in/surya-guthikonda/
- Maya: github.com/nahidalam/maya
✅ Jiamian Wang presented his work "Diffusion-Inspired Truncated Sampler for Text-Video Retrieval" (DITS), in which he devises a diffusion-inspired iterative alignment process that bridges the text-video modality gap and achieves encouraging retrieval performance (see the sketch below the links).
- Connect with Jiamian: jiamian-wang.github.io/
- DITS: openreview.net/forum?id=SrQua0ATRZ
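For readers curious what such an iterative alignment could look like, here is a minimal, hedged sketch of the general idea: refine the text embedding over a few denoising-style steps so it moves toward the video embedding space, then rank videos by cosine similarity. The network, step conditioning, and update rule are illustrative assumptions, not the exact DITS formulation (see the OpenReview link above for that).

```python
# Sketch of a diffusion-inspired, truncated (few-step) refinement of text
# embeddings for text-video retrieval. Shapes, the MLP, and the step
# conditioning are illustrative assumptions, not the DITS paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TruncatedAligner(nn.Module):
    def __init__(self, dim: int = 512, steps: int = 4):
        super().__init__()
        self.steps = steps
        # Small network predicting a residual update, conditioned on the
        # current embedding and a normalized step index.
        self.net = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        z = text_emb
        for t in range(self.steps):  # truncated: only a few steps
            step = torch.full_like(z[:, :1], t / self.steps)
            z = z + self.net(torch.cat([z, step], dim=-1))  # refine
        return F.normalize(z, dim=-1)

# Toy retrieval: rank video embeddings by cosine similarity to the
# refined text embeddings (all vectors L2-normalized).
aligner = TruncatedAligner()
text = F.normalize(torch.randn(2, 512), dim=-1)    # dummy queries
video = F.normalize(torch.randn(10, 512), dim=-1)  # dummy gallery
scores = aligner(text) @ video.T                   # [2, 10] similarities
ranks = scores.argsort(dim=-1, descending=True)
```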
Timestamps:
00:15 Introduction
03:20 Nahid starts
04:35 The challenge
06:05 Maya's solution
07:00 Karthik talks about dataset preparation
09:40 Multilingual pre-train preamble preparation
11:00 Preamble evaluation
11:35 Multilingual pre-train dataset translation
13:25 Surya talks about dataset toxicity analysis
14:05 Text toxicity analysis
15:07 Image toxicity analysis framework
16:05 Toxicity in LLaVA pretrain dataset
17:00 Quality assurance - token repetitions
17:42 Quality assurance - back translate
17:55 Translation verification analysis
18:35 Model training
18:50 Model architecture
20:35 Pre-training LLaVA-1.5 on 8 languages
21:23 Instruction tuning
23:00 Maya demo
24:40 Team Maya
26:00 Q&A with Maya team
35:15 Jiamian starts
35:30 Background - text-video data
36:25 Background - loss function
36:55 Background - evaluation metrics
37:33 Motivation - bridging the modality gap
39:43 Motivation - diffusion model for retrieval task
41:33 Method - DITS (Diffusion-Inspired Truncated Sampler)
43:10 Experiments - Performance
44:00 Experiments - Model Discussion
47:47 Q&A with Jiamian
Join the Multimodal Minds community to receive an invite for future webinars: discord.gg/CzeUNYr5Bt
Views: 44

Videos

Premiere Plugin for Video Editors and A Framework for AI Video Development | Multimodal Weekly 69
113 views • 19 hours ago
In the 69th session of Multimodal Weekly, we had two exciting presentations on video editing and agentic video workflows, both built on top of Twelve Labs APIs. ✅ Daniel Jacobs and Tobias Finkelstein presented SeekSequence, a Premiere plugin for video editors. It has "search" and "discover" features so you can go through your uploaded video, browse it easily, get inspired, a...
Video Understanding, Video-to-Sound Generation, and Dual-Model Distillation | Multimodal Weekly 68
129 views • 14 days ago
In the 68th session of Multimodal Weekly, we had three exciting presentations on video understanding, sound generation, and action classification. ✅ Ashmal Vayani from the University of Central Florida presented VURF, a general-purpose reasoning and self-refinement framework for video understanding. - Follow Ashmal: ashmalvayani.github.io/ - VURF: arxiv.org/pdf/2403.14743 ...
Multimodal Benchmarks, Video Prediction, and Multimodal Video Models | Multimodal Weekly 67
173 views • 21 days ago
In the 67th session of Multimodal Weekly, we had three exciting presentations on multimodal benchmarks, video prediction, and multimodal video models. ✅ Jieyu Zhang from the University of Washington discussed Task-Me-Anything, a benchmark generation engine that produces a benchmark tailored to a user's needs. - Follow Jieyu: jieyuz2.github.io/ - Task-Me-Anything: www.task-me-anything....
A Deep Dive into Twelve Labs Embed API for Multimodal Embeddings | Multimodal Weekly 66
211 views • a month ago
In the 66th session of Multimodal Weekly, we had Manish Maheshwari and Hrishikesh Yadav from Twelve Labs give a masterclass on the newly released Embed API product. Connect with our team: - Manish: www.linkedin.com/in/manishma/ - Hrishikesh: www.linkedin.com/in/hrishikesh-yadav-aa748121a/ Check out the following resources about Embed API: - Blog Post: www.twelvelabs.io/blog/introducing-...
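As a rough illustration of the kind of workflow the session covers, here is a hedged sketch of requesting an embedding over HTTP. The endpoint path, header, payload fields, and response shape below are assumptions for illustration only, not the documented Embed API interface; consult the resources linked above for the real one.

```python
# Hedged sketch of requesting a text embedding from a REST endpoint.
# The URL, header, and payload/response fields are illustrative
# assumptions, NOT the documented Twelve Labs Embed API.
import requests

API_KEY = "tlk_..."  # placeholder API key
BASE_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL

def embed_text(text: str) -> list[float]:
    """Send text to the (assumed) embed endpoint and return the vector."""
    resp = requests.post(
        f"{BASE_URL}/embed",
        headers={"x-api-key": API_KEY},
        json={"engine_name": "Marengo-retrieval-2.6", "text": text},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"text_embedding": {"float": [...]}}
    return resp.json()["text_embedding"]["float"]

if __name__ == "__main__":
    vec = embed_text("a drone shot of a coastline at sunset")
    print(len(vec))
```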
Analysis and Insights from Holistic Evaluation on Video Foundation Models | Multimodal Weekly 65
135 views • a month ago
In the 65th session of Multimodal Weekly, we had Lucas Lee from the Twelve Labs Science team present our recent work on evaluating video foundation models. Connect with Lucas: hyeongminlee.github.io/ Check out the following resources about TWLV-I: - Blog Post: www.twelvelabs.io/blog/twlv-i - arXiv: arxiv.org/abs/2408.11318 - HuggingFace: huggingface.co/papers/2408.11318 - GitHub: git...
Intro to Playground and Prompting | Multimodal Weekly 64
150 views • a month ago
In the 64th session of Multimodal Weekly, we had members of our Growth team, Maninder Saini and Sue Kim, present an introductory session on the Twelve Labs Playground. Connect with our team: - Maninder: www.linkedin.com/in/manindersaini/ - Sue: www.linkedin.com/in/suekim7/ Check out the following resources about the Twelve Labs product: - Playground: playground.twelvelabs.io/ - Search: ww...
Video summarization, Compositional video understanding, & Tracking everything | Multimodal Weekly 63
172 views • a month ago
In the 63rd session of Multimodal Weekly, we had three exciting presentations on video summarization, compositional video understanding, and tracking everything. ✅ Siyuan Li from ETH Zurich discussed his work MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich obje...
Marengo 2.7: Video Search at Your Fingertips
650 views • a month ago
Traditional video search is like finding a needle in a haystack. Marengo 2.7 makes multimodal video search easy and intuitive. Learn more about Twelve Labs at twelvelabs.io
Temporal Action Localization, Hallucination Benchmark, and Attention for ViTs | Multimodal Weekly 62
345 views • 2 months ago
In the 62nd session of Multimodal Weekly, we had three exciting presentations on temporal action localization, a hallucination benchmark for vision-language models, and attention mechanisms for Vision Transformers. ✅ Benedetta Liberatori from the University of Trento discussed T3AL, which stands for Test-Time adaptation for Temporal Action Localization. In a nutshell, T3...
LMMs for Segmentation and Stochastic Text Modeling for Text-Video Retrieval | Multimodal Weekly 61
87 views • 2 months ago
In the 61st session of Multimodal Weekly, we had two exciting presentations on large multimodal models for segmentation and text-to-video retrieval. ✅ Tsung-Han (Patrick) Wu from UC Berkeley discussed his work SESAME, a novel problem setting that requires large multimodal models that can See, Say, and Segment. - Follow Patrick: tsunghan-wu.github.io/ - SESAME: see-say-...
Audio-Visual Speech Recognition, Video QA, and Video Motion Customization | Multimodal Weekly 60
132 views • 3 months ago
In the 60th session of Multimodal Weekly, we had three exciting presentations on audio-visual speech recognition, video question answering, and video motion customization. ✅ Andrew Rouditchenko from MIT CSAIL (the Spoken Language Systems Group) discussed his work Whisper-Flamingo, a multimodal model that integrates visual features into the Whisper speech recognition and...
Unified Video Segmentation and Video Object Segmentation | Multimodal Weekly 59
134 views • 3 months ago
In the 59th session of Multimodal Weekly, we had two exciting presentations on video segmentation. ✅ Dr. Li Minghan from Harvard Medical School discussed her work UniVS, a novel unified video segmentation architecture that uses prompts as queries. - Follow Minghan: sites.google.com/view/minghanli-homepage/academic - UniVS: sites.google.com/view/unified-video-seg-univs ...
Enhancing Video Super-Resolution and Benchmarking Multimodal LLMs | Multimodal Weekly 58
96 views • 3 months ago
In the 58th session of Multimodal Weekly, we had two presentations on video understanding and multimodal LLMs. ✅ Kai Xu from the National University of Singapore (specifically the Computer Vision and Machine Learning Group) discussed his work IA-RT, which enhances video super-resolution via implicit resampling-based alignment. - Follow Kai: kai422.github.io/ - IA-RT: git...
Composed Video Retrieval, Consent In Crisis, and Video Annotations at Scale | Multimodal Weekly 57
170 views • 4 months ago
Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56
150 views • 4 months ago
Long-Form Video Reasoning and Question-Answering | Multimodal Weekly 55
220 views • 4 months ago
Visual Insights from Social Data with Phyllo and Twelve Labs | Multimodal Weekly 54
116 views • 5 months ago
Multimodal Reasoning, Video Instruction-Tuning & Explaining Vision Backbones | Multimodal Weekly 53
268 views • 5 months ago
How-to Videos, Feeling Multimodal Intelligence, & Visually-Grounded Video QA | Multimodal Weekly 52
171 views • 5 months ago
Multimodal Data Lake, Video Repetition Counting, and Low-Resource Vision | Multimodal Weekly 51
165 views • 5 months ago
Generalized Contrastive Learning and Transforming Video Production | Multimodal Weekly 50
262 views • 5 months ago
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
224 views • 6 months ago
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
172 views • 6 months ago
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
411 views • 6 months ago
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
150 views • 7 months ago
Open-Source LLM Evaluation & Multimodal Models for Audio Processing/Creation | Multimodal Weekly 45
321 views • 7 months ago
Next-Generation Surgical Insights with SDSC and Twelve Labs | Multimodal Weekly 44
169 views • 8 months ago
Bring Enterprise Data to Video Foundation Models with MindsDB and Twelve Labs | Multimodal Weekly 43
153 views • 8 months ago
Generative Representational Instruction Tuning and Agents for Video Creation | Multimodal Weekly 42
278 views • 8 months ago

COMMENTS

  • @real_chaorders • 6 days ago

    Good model for batch video processing.

  • @maxlund7704 • 8 days ago

    Reminds me of our app called Jumper! It's also available as a Premiere panel

  • @road2nohand • 25 days ago

    Great video! Is it possible to run the retrieval model locally, or say in one's own cloud, to save some additional costs after the embeddings have been created? Or is this a dumb question, as one could use any algorithm to search for similar vectors with cosine similarity locally?
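The commenter's intuition is sound in principle: once embeddings exist, nearest-neighbor search is plain vector math and can run locally. Here is a minimal sketch of local cosine-similarity search over pre-computed vectors, independent of any particular provider's API (whether embeddings may be exported and reused this way depends on the product's terms):

```python
# Minimal local cosine-similarity search over pre-computed embeddings
# (illustrative sketch; not tied to any specific provider).
import numpy as np

def cosine_search(query: np.ndarray, index: np.ndarray, k: int = 5):
    """Return (row indices, similarities) of the k nearest rows of `index`."""
    q = query / np.linalg.norm(query)
    rows = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = rows @ q                    # cosine similarity per stored vector
    top = np.argsort(-sims)[:k]        # indices of the k highest scores
    return top, sims[top]

index = np.random.randn(1000, 512)     # stored clip embeddings (dummy)
query = np.random.randn(512)           # query embedding (dummy)
ids, sims = cosine_search(query, index)
print(ids, sims)
```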

  • @Meta-Chef • 29 days ago

    Still learning, but I have promised 64TB Pegasus, 32

  • @musikteca • 2 months ago

    Is there a demo I can use?

  • @abhishekak9619 • 3 months ago

    Does it yet understand sound in different languages? Can it tell sound-related stuff? A loud banging sound on the door and the like?

  • @qonitasyahira7042 • 3 months ago

    Is it not available yet?

    • @Twelve_Labs • 3 months ago

      You can try out our product for free at playground.twelvelabs.io!

  • @ChatGPT-ef6sr • 4 months ago

    Well, what's the point of showing YouTube videos being indexed when the link fetch is no longer working in Twelve Labs?

  • @cheneywong660 • 6 months ago

    An excellent expert, but I had to guess at what he was saying.

  • @YuetianWang • 8 months ago

    Wonderful sharing! Is it possible to share the slides, please? 😍

  • @WildlifeProtectionSolutions • 10 months ago

    Fantastic. This will help us on many levels.

  • @etozhezhenek9987 • 11 months ago

    Why did you ban it in Russia?

  • @tonykipkemboi • 11 months ago

    Cool demo. Have to check it out.

  • @dusandragovic09srb • a year ago

    Great topic! "Hey Siri, make me 1 Friends movie special for each season, 4k, 8k w/e"... I don't think people realize what's here. #Ai P.S. Holy Wood ua-cam.com/video/eFljUYPeigM/v-deo.html

    • @dusandragovic09srb • a year ago

      "No, scratch that and make it with at least 3 special guest stars per movie, movie at least 2.5h long". Dealer : "Drugs?", Anyone : "No thanks, I have AI, no time".

  • @writeonsaga • a year ago

    Awesome talk on AI and Hollywood!

  • @meherbaba-re5xp • a year ago

    🥰🥰😘😘😍😍

  • @CharlesMacKay88 • a year ago

    "making a feather fly with magic" seems very difficult to tag. this is an impressive demo

  • @dianechoi2074 • a year ago

    😍

  • @ВэГэ-я5н • a year ago

    I can't find this "play with parameters" thing! Where is it?!

  • @CREATIVE3892 • a year ago

    I heard ImageBind is only available for non-commercial use? Is there any update on it being allowed for commercial usage yet?

    • @Twelve_Labs • a year ago

      As far as we are aware, it is not available for commercial usage yet.

  • @hussainshaik4390 • a year ago

    Where do we get the slides from the video?

  • @vallab19 • a year ago

    Humanity needs more and more AI resources invested in the healthcare sector, like this.

  • @saas-space • a year ago

    Such an informative video, thanks

  • @teuida • a year ago

    😯😲

  • @larryabecid2819 • a year ago

    Amazing tech!

  • @ch6t6 • a year ago

    Kissing a frog

  • @beaver-man • a year ago

    The somewhat washed-out speech from your speaker is very hard to understand, especially if English is not your mother tongue. Please improve the talk's audio so it's easier to follow! Thanks.

  • @ChatGPT-ef6sr • a year ago

    We need a direct link fetch. That's how you're going to become the next revolution, which you already are.

  • @gf7594 • a year ago

    The mystery of the black screen. Watched it.

  • @ErdemYldrmer • 2 years ago

    I'm sorry, but the ultimate question is: "What's the answer to life, the universe, and everything?" 😆

  • @michaeljoo4699 • 2 years ago

    Wow so cool