[CVPR 2024] MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
- Published 17 Dec 2024
Abstract
In this paper, we explore Task-Agnostic Pruning of Vision-Language Models, where the goal is to prune ONCE and find a "universal" lottery ticket that successfully transfers to unknown downstream tasks.
Method
We introduce Multimodal Flow Pruning (MultiFlow), a method specifically designed for this setup. MultiFlow relies on two components: the Information Flow Score, which extends weight saliency with neuron saliency, and Multimodality-Aware Compression, which ensures a proper balance among modalities when pruning.
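For intuition, below is a minimal PyTorch sketch of the two components, under our own simplifying assumptions rather than the paper's exact formulation: neuron saliency is approximated by mean absolute activations (inputs) and total outgoing weight magnitude (outputs), and modality balance is enforced by applying the same target sparsity within each modality instead of one global threshold. The function names and aggregation choices are illustrative.

```python
import torch

def information_flow_scores(weight: torch.Tensor, in_act: torch.Tensor) -> torch.Tensor:
    """Information-flow-style saliency: weight magnitude scaled by the
    saliency of the two neurons each weight connects (illustrative proxy)."""
    w_mag = weight.abs()                     # [out, in] per-weight saliency
    in_saliency = in_act.abs().mean(dim=0)   # [in]  input-neuron saliency (assumed: mean |activation|)
    out_saliency = w_mag.sum(dim=1)          # [out] output-neuron saliency (assumed: outgoing magnitude)
    return w_mag * in_saliency.unsqueeze(0) * out_saliency.unsqueeze(1)

def modality_aware_masks(scores_by_modality: dict, sparsity: float) -> dict:
    """One simple way to balance modalities: prune to the SAME target
    sparsity inside each modality, rather than with one global threshold."""
    masks = {}
    for name, scores in scores_by_modality.items():
        flat = scores.flatten()
        k = int(sparsity * flat.numel())     # weights to remove in this modality
        if k == 0:
            masks[name] = torch.ones_like(scores, dtype=torch.bool)
            continue
        thresh = flat.kthvalue(k).values     # k-th smallest score in this modality
        masks[name] = scores > thresh
    return masks
```

Calling `modality_aware_masks({"vision": s_v, "text": s_t}, sparsity=0.75)` would keep the top 25% of weights within each modality, so neither modality can be drained by the other.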
Experiments
We experiment with two VLMs (BLIP and XVLM, both in their base versions), three vision-language tasks (Image-Text Retrieval, Image Captioning, and Visual Question Answering), and three sparsity levels (63%, 75%, and 90%). Results show that MultiFlow often matches or surpasses competing pruning strategies while using no task objective and being computationally much lighter.
Analysis
We carry out an in-depth analysis and show that, beyond the classical layer-collapse problem of pruning, Vision-Language Models can also suffer from full modality collapse: pruning globally can wipe out an entire modality!
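To see why a global criterion can do this, here is a hypothetical toy example (assuming only that one modality's weights happen to have larger magnitudes, e.g. under plain magnitude pruning):

```python
import torch

torch.manual_seed(0)
# Hypothetical toy setup: vision weights happen to have ~10x larger magnitudes.
weights = {
    "vision": torch.randn(1000),
    "text":   torch.randn(1000) * 0.1,
}
# One GLOBAL threshold at 75% sparsity, blind to modality.
all_scores = torch.cat([w.abs() for w in weights.values()])
thresh = all_scores.kthvalue(int(0.75 * all_scores.numel())).values
for name, w in weights.items():
    kept = (w.abs() > thresh).float().mean().item()
    print(f"{name}: {kept:.1%} of weights survive")
# vision keeps roughly half its weights while text keeps almost none:
# the text modality (nearly) collapses despite a moderate global budget.
```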
Thanks for checking out this video! If you want to know more, please check the following resources.
Code: github.com/Far...
Paper: arxiv.org/abs/...