[CVPR 2024] TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

  • Published June 3, 2024
  • Former MERL intern Haomiao Ni presents our paper, "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models," at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held in Seattle, Washington, June 17-21, 2024. The paper was co-authored with MERL researchers Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, and Tim K. Marks, as well as MERL consultant Prof. Bernhard Egger from FAU and Prof. Sharon X. Huang from Penn State University.
    Paper: www.merl.com/publications/TR2...
    Abstract: Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "Repeat-and-Slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a very recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation. Code will be made available upon publication. (A rough code sketch of the Repeat-and-Slide loop follows below.)
  • Science & Technology
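For intuition, here is a minimal, unofficial sketch of how a "Repeat-and-Slide" loop over a frozen T2V diffusion model might look, based only on the description in the abstract. Every name (t2v_model, add_noise, denoise_step) is a hypothetical placeholder, not the authors' released API, and the paper's DDPM-inversion noise initialization and resampling technique are simplified away or noted in comments.

```python
import torch

def repeat_and_slide(t2v_model, image_latent, text_emb,
                     num_frames=16, window=8, num_steps=50):
    """Autoregressive TI2V generation with a frozen T2V diffusion model.

    image_latent: (C, H, W) latent of the conditioning image.
    Returns a list of (C, H, W) frame latents, beginning with the input image.
    """
    # "Repeat": bootstrap the conditioning window by repeating the input image.
    frames = [image_latent.clone() for _ in range(window)]
    generated = [image_latent.clone()]

    while len(generated) < num_frames:
        known = torch.stack(frames)  # (window, C, H, W) frames treated as known

        # Fresh Gaussian noise for the whole clip; the paper instead initializes
        # the new frame's noise via DDPM inversion for temporal continuity
        # (simplified here).
        x = torch.randn(window + 1, *image_latent.shape)

        # Reverse denoising: at every step, clamp the first `window` frames to
        # forward-noised versions of the known frames, so the frozen model only
        # has freedom over the last (new) frame.
        for t in reversed(range(num_steps)):
            x[:window] = t2v_model.add_noise(known, t)   # hypothetical q(x_t | x_0)
            x = t2v_model.denoise_step(x, t, text_emb)   # hypothetical p(x_{t-1} | x_t)

        new_frame = x[-1]
        generated.append(new_frame)

        # "Slide": drop the oldest frame and append the newly synthesized one,
        # keeping the window size fixed for the next autoregressive step.
        frames = frames[1:] + [new_frame]

    return generated
```

Because each new frame depends only on the current window, this loop can in principle run indefinitely, which matches the abstract's point that the autoregressive design supports long video generation.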
