Twelve Labs
A Multilingual Multimodal Model and Diffusion-Inspired Text-Video Retrieval | Multimodal Weekly 70
In the 70th session of Multimodal Weekly, we had two exciting presentations on a multilingual multimodal model and diffusion-inspired text-video retrieval.
✅ Nahid Alam, Karthik Reddy Kanjula, and Surya Guthikonda presented their work Maya, a new multilingual multimodal vision-language model. It is completely open source, with open weights and an open dataset, and is designed to handle eight languages, cultural diversity, and nuanced real-world contexts.
- Connect with Nahid: www.linkedin.com/in/nahidalam/
- Connect with Karthik: www.linkedin.com/in/karthik-reddy-kanjula/
- Connect with Surya: www.linkedin.com/in/surya-guthikonda/
- Maya: github.com/nahidalam/maya
✅ Jiamian Wang presented his work "Diffusion-Inspired Truncated Sampler for Text-Video Retrieval" (DITS), in which he devises a diffusion-inspired iterative alignment process that bridges the text-video modality gap and achieves encouraging retrieval performance (see the sketch below the links).
- Connect with Jiamian: jiamian-wang.github.io/
- DITS: openreview.net/forum?id=SrQua0ATRZ
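For readers curious what such an iterative alignment could look like, here is a minimal, hedged sketch of the general idea: refine the text embedding over a few denoising-style steps so it moves toward the video embedding space, then rank videos by cosine similarity. The network, step conditioning, and update rule are illustrative assumptions, not the exact DITS formulation (see the OpenReview link above for that).

```python
# Sketch of a diffusion-inspired, truncated (few-step) refinement of text
# embeddings for text-video retrieval. Shapes, the MLP, and the step
# conditioning are illustrative assumptions, not the DITS paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TruncatedAligner(nn.Module):
    def __init__(self, dim: int = 512, steps: int = 4):
        super().__init__()
        self.steps = steps
        # Small network predicting a residual update, conditioned on the
        # current embedding and a normalized step index.
        self.net = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        z = text_emb
        for t in range(self.steps):  # truncated: only a few steps
            step = torch.full_like(z[:, :1], t / self.steps)
            z = z + self.net(torch.cat([z, step], dim=-1))  # refine
        return F.normalize(z, dim=-1)

# Toy retrieval: rank video embeddings by cosine similarity to the
# refined text embeddings (all vectors L2-normalized).
aligner = TruncatedAligner()
text = F.normalize(torch.randn(2, 512), dim=-1)    # dummy queries
video = F.normalize(torch.randn(10, 512), dim=-1)  # dummy gallery
scores = aligner(text) @ video.T                   # [2, 10] similarities
ranks = scores.argsort(dim=-1, descending=True)
```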
Timestamps:
00:15 Introduction
03:20 Nahid starts
04:35 The challenge
06:05 Maya's solution
07:00 Karthik talks about dataset preparation
09:40 Multilingual pre-train preamble preparation
11:00 Preamble evaluation
11:35 Multilingual pre-train dataset translation
13:25 Surya talks about dataset toxicity analysis
14:05 Text toxicity analysis
15:07 Image toxicity analysis framework
16:05 Toxicity in LLaVA pretrain dataset
17:00 Quality assurance - token repetitions
17:42 Quality assurance - back translate
17:55 Translation verification analysis
18:35 Model training
18:50 Model architecture
20:35 Pre-training LLaVA-1.5 on 8 languages
21:23 Instruction tuning
23:00 Maya demo
24:40 Team Maya
26:00 Q&A with Maya team
35:15 Jiamian starts
35:30 Background - text-video data
36:25 Background - loss function
36:55 Background - evaluation metrics
37:33 Motivation - bridging the modality gap
39:43 Motivation - diffusion model for retrieval task
41:33 Method - DITS (Diffusion-Inspired Truncated Sampler)
43:10 Experiments - Performance
44:00 Experiments - Model Discussion
47:47 Q&A with Jiamian
Join the Multimodal Minds community to receive an invite for future webinars: discord.gg/CzeUNYr5Bt
Views: 44

Videos

Premiere Plugin for Video Editors and A Framework for AI Video Development | Multimodal Weekly 69
113 views • 19 hours ago
In the 69th session of Multimodal Weekly, we had two exciting presentations on video editing and agentic video workflows, both built on top of Twelve Labs APIs. ✅ Daniel Jacobs and Tobias Finkelstein presented SeekSequence, a Premiere plugin for video editors. It has "search" and "discover" features so you can go through your uploaded video, browse it easily, get inspired, a...
Video Understanding, Video-to-Sound Generation, and Dual-Model Distillation | Multimodal Weekly 68
129 views • 14 days ago
In the 68th session of Multimodal Weekly, we had three exciting presentations on video understanding, sound generation, and action classification. ✅ Ashmal Vayani from the University of Central Florida presented VURF, a general-purpose reasoning and self-refinement framework for video understanding. - Follow Ashmal: ashmalvayani.github.io/ - VURF: arxiv.org/pdf/2403.14743 ...
Multimodal Benchmarks, Video Prediction, and Multimodal Video Models | Multimodal Weekly 67
173 views • 21 days ago
In the 67th session of Multimodal Weekly, we had three exciting presentations on multimodal benchmarks, video prediction, and multimodal video models. ✅ Jieyu Zhang from the University of Washington discussed Task-Me-Anything, a benchmark generation engine that produces a benchmark tailored to a user's needs. - Follow Jieyu: jieyuz2.github.io/ - Task-Me-Anything: www.task-me-anything....
A Deep Dive into Twelve Labs Embed API for Multimodal Embeddings | Multimodal Weekly 66
211 views • a month ago
In the 66th session of Multimodal Weekly, we had Manish Maheshwari and Hrishikesh Yadav from Twelve Labs give a masterclass on the newly released Embed API product. Connect with our team: - Manish: www.linkedin.com/in/manishma/ - Hrishikesh: www.linkedin.com/in/hrishikesh-yadav-aa748121a/ Check out the following resources about Embed API: - Blog Post: www.twelvelabs.io/blog/introducing-...
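As a rough illustration of the kind of workflow the session covers, here is a hedged sketch of requesting an embedding over HTTP. The endpoint path, header, payload fields, and response shape below are assumptions for illustration only, not the documented Embed API interface; consult the resources linked above for the real one.

```python
# Hedged sketch of requesting a text embedding from a REST endpoint.
# The URL, header, and payload/response fields are illustrative
# assumptions, NOT the documented Twelve Labs Embed API.
import requests

API_KEY = "tlk_..."  # placeholder API key
BASE_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL

def embed_text(text: str) -> list[float]:
    """Send text to the (assumed) embed endpoint and return the vector."""
    resp = requests.post(
        f"{BASE_URL}/embed",
        headers={"x-api-key": API_KEY},
        json={"engine_name": "Marengo-retrieval-2.6", "text": text},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"text_embedding": {"float": [...]}}
    return resp.json()["text_embedding"]["float"]

if __name__ == "__main__":
    vec = embed_text("a drone shot of a coastline at sunset")
    print(len(vec))
```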
Analysis and Insights from Holistic Evaluation on Video Foundation Models | Multimodal Weekly 65
135 views • a month ago
In the 65th session of Multimodal Weekly, we had Lucas Lee from the Twelve Labs Science team present our recent work on evaluating video foundation models. Connect with Lucas: hyeongminlee.github.io/ Check out the following resources about TWLV-I: - Blog Post: www.twelvelabs.io/blog/twlv-i - arXiv: arxiv.org/abs/2408.11318 - HuggingFace: huggingface.co/papers/2408.11318 - GitHub: git...
Intro to Playground and Prompting | Multimodal Weekly 64
150 views • a month ago
In the 64th session of Multimodal Weekly, we had members of our Growth team, Maninder Saini and Sue Kim, present an introductory session on the Twelve Labs Playground. Connect with our team: - Maninder: www.linkedin.com/in/manindersaini/ - Sue: www.linkedin.com/in/suekim7/ Check out the following resources about the Twelve Labs product: - Playground: playground.twelvelabs.io/ - Search: ww...
Video summarization, Compositional video understanding, & Tracking everything | Multimodal Weekly 63
172 views • a month ago
In the 63rd session of Multimodal Weekly, we had three exciting presentations on video summarization, compositional video understanding, and tracking everything. ✅ Siyuan Li from ETH Zurich discussed his work MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich obje...
Marengo 2.7: Video Search at Your Fingertips
650 views • a month ago
Traditional video search is like finding a needle in a haystack. Marengo 2.7 makes multimodal video search easy and intuitive. Learn more about Twelve Labs at twelvelabs.io
Temporal Action Localization, Hallucination Benchmark, and Attention for ViTs | Multimodal Weekly 62
345 views • 2 months ago
In the 62nd session of Multimodal Weekly, we had three exciting presentations on temporal action localization, a hallucination benchmark for vision-language models, and attention mechanisms for Vision Transformers. ✅ Benedetta Liberatori from the University of Trento discussed T3AL, which stands for Test-Time adaptation for Temporal Action Localization. In a nutshell, T3...
LMMs for Segmentation and Stochastic Text Modeling for Text-Video Retrieval | Multimodal Weekly 61
87 views • 2 months ago
In the 61st session of Multimodal Weekly, we had two exciting presentations on large multimodal models for segmentation and text-to-video retrieval. ✅ Tsung-Han (Patrick) Wu from UC Berkeley discussed his work SESAME, a novel problem setting that requires large multimodal models that can See, Say, and Segment. - Follow Patrick: tsunghan-wu.github.io/ - SESAME: see-say-...
Audio-Visual Speech Recognition, Video QA, and Video Motion Customization | Multimodal Weekly 60
132 views • 3 months ago
In the 60th session of Multimodal Weekly, we had three exciting presentations on audio-visual speech recognition, video question answering, and video motion customization. ✅ Andrew Rouditchenko from MIT CSAIL (the Spoken Language Systems Group) discussed his work Whisper-Flamingo, a multimodal model that integrates visual features into the Whisper speech recognition and...
Unified Video Segmentation and Video Object Segmentation | Multimodal Weekly 59
134 views • 3 months ago
In the 59th session of Multimodal Weekly, we had two exciting presentations on video segmentation. ✅ Dr. Li Minghan from Harvard Medical School discussed her work UniVS, a novel unified video segmentation architecture that uses prompts as queries. - Follow Minghan: sites.google.com/view/minghanli-homepage/academic - UniVS: sites.google.com/view/unified-video-seg-univs ...
Enhancing Video Super-Resolution and Benchmarking Multimodal LLMs | Multimodal Weekly 58
96 views • 3 months ago
In the 58th session of Multimodal Weekly, we had two presentations on video understanding and multimodal LLMs. ✅ Kai Xu from the National University of Singapore (specifically the Computer Vision and Machine Learning Group) discussed his work IA-RT, which enhances video super-resolution via implicit resampling-based alignment. - Follow Kai: kai422.github.io/ - IA-RT: git...
Composed Video Retrieval, Consent In Crisis, and Video Annotations at Scale | Multimodal Weekly 57
170 views • 4 months ago
Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56
150 views • 4 months ago
Long-Form Video Reasoning and Question-Answering | Multimodal Weekly 55
220 views • 4 months ago
Visual Insights from Social Data with Phyllo and Twelve Labs | Multimodal Weekly 54
116 views • 5 months ago
Multimodal Reasoning, Video Instruction-Tuning & Explaining Vision Backbones | Multimodal Weekly 53
268 views • 5 months ago
How-to Videos, Feeling Multimodal Intelligence, & Visually-Grounded Video QA | Multimodal Weekly 52
171 views • 5 months ago
Multimodal Data Lake, Video Repetition Counting, and Low-Resource Vision | Multimodal Weekly 51
165 views • 5 months ago
Generalized Contrastive Learning and Transforming Video Production | Multimodal Weekly 50
262 views • 5 months ago
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
224 views • 6 months ago
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
172 views • 6 months ago
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
411 views • 6 months ago
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
150 views • 7 months ago
Open-Source LLM Evaluation & Multimodal Models for Audio Processing/Creation | Multimodal Weekly 45
321 views • 7 months ago
Next-Generation Surgical Insights with SDSC and Twelve Labs | Multimodal Weekly 44
169 views • 8 months ago
Bring Enterprise Data to Video Foundation Models with MindsDB and Twelve Labs | Multimodal Weekly 43
153 views • 8 months ago
Generative Representational Instruction Tuning and Agents for Video Creation | Multimodal Weekly 42
278 views • 8 months ago

COMMENTS

  • @real_chaorders • 6 days ago

    Good model for batch video processing.

  • @maxlund7704 • 8 days ago

    Reminds me of our app called Jumper! It's also available as a Premiere panel

  • @road2nohand • 25 days ago

    Great video! Is it possible to run the retrieval model locally, or say in one's own cloud, to save some additional costs after the embeddings have been created? Or is this a dumb question, as one could use any algorithm to search for similar vectors with cosine similarity locally?
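The commenter's intuition is sound in principle: once embeddings exist, nearest-neighbor search is plain vector math and can run locally. Here is a minimal sketch of local cosine-similarity search over pre-computed vectors, independent of any particular provider's API (whether embeddings may be exported and reused this way depends on the product's terms):

```python
# Minimal local cosine-similarity search over pre-computed embeddings
# (illustrative sketch; not tied to any specific provider).
import numpy as np

def cosine_search(query: np.ndarray, index: np.ndarray, k: int = 5):
    """Return (row indices, similarities) of the k nearest rows of `index`."""
    q = query / np.linalg.norm(query)
    rows = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = rows @ q                    # cosine similarity per stored vector
    top = np.argsort(-sims)[:k]        # indices of the k highest scores
    return top, sims[top]

index = np.random.randn(1000, 512)     # stored clip embeddings (dummy)
query = np.random.randn(512)           # query embedding (dummy)
ids, sims = cosine_search(query, index)
print(ids, sims)
```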

  • @Meta-Chef • 29 days ago

    Still learning, but I have promised 64TB Pegasus, 32

  • @musikteca • 2 months ago

    Is there a demo I can use?

  • @abhishekak9619 • 3 months ago

    Does it yet understand sound in different languages? Can it tell sound-related stuff? A loud banging sound on the door and the like?

  • @qonitasyahira7042 • 3 months ago

    Is it not available yet?

    • @Twelve_Labs • 3 months ago

      You can try out our product for free at playground.twelvelabs.io!

  • @ChatGPT-ef6sr • 4 months ago

    Well, what's the point of showing YouTube videos being indexed when the link fetch is no longer working in Twelve Labs?

  • @cheneywong660 • 6 months ago

    An excellent expert, but I had to guess at what he was saying.

  • @YuetianWang • 8 months ago

    Wonderful sharing! Is it possible to share the slides, please? 😍

  • @WildlifeProtectionSolutions • 10 months ago

    Fantastic. This will help us on many levels.

  • @etozhezhenek9987 • 11 months ago

    Why did you ban it in Russia?

  • @tonykipkemboi • 11 months ago

    Cool demo. Have to check it out.

  • @dusandragovic09srb • a year ago

    Great topic! "Hey Siri, make me 1 Friends movie special for each season, 4k, 8k w/e"... I don't think people realize what's here. #Ai P.S. Holy Wood ua-cam.com/video/eFljUYPeigM/v-deo.html

    • @dusandragovic09srb • a year ago

      "No, scratch that and make it with at least 3 special guest stars per movie, movie at least 2.5h long". Dealer : "Drugs?", Anyone : "No thanks, I have AI, no time".

  • @writeonsaga • a year ago

    Awesome talk on AI and Hollywood!

  • @meherbaba-re5xp • a year ago

    🥰🥰😘😘😍😍

  • @CharlesMacKay88 • a year ago

    "making a feather fly with magic" seems very difficult to tag. this is an impressive demo

  • @dianechoi2074 • a year ago

    😍

  • @ВэГэ-я5н • a year ago

    I can't find this "play with parameters" thing! Where is it?!

  • @CREATIVE3892 • a year ago

    I heard ImageBind is only available for non-commercial use? Is there any update on it being allowed for commercial usage yet?

    • @Twelve_Labs • a year ago

      As far as we are aware, it is not available for commercial usage yet.

  • @hussainshaik4390 • a year ago

    Where do we get the slides from the video?

  • @vallab19 • a year ago

    Humanity needs more and more AI resources invested in the healthcare sector, like this.

  • @saas-space • a year ago

    Such an informative video, thanks

  • @teuida • a year ago

    😯😲

  • @larryabecid2819 • a year ago

    Amazing tech!

  • @ch6t6 • a year ago

    Kissing a frog

  • @beaver-man • a year ago

    The somewhat washed-out speech from your speaker is very hard to understand, especially if English is not your mother tongue. Please improve the talk's audio so it's easier to follow! Thanks.

  • @ChatGPT-ef6sr • a year ago

    We need a direct link fetch. That's how you're going to become the next revolution, which you already are.

  • @gf7594 • a year ago

    The mystery of the black screen. Watched it.

  • @ErdemYldrmer • 2 years ago

    I'm sorry, but the ultimate question is: "What's the answer to life, the universe, and everything?" 😆

  • @michaeljoo4699 • 2 years ago

    Wow so cool