[CVPR 2024] MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

  • Published 17 Dec 2024
  • Abstract
    In this paper, we explore Task-Agnostic Pruning of Vision-Language Models, where the goal is to prune ONCE and find a "universal" lottery ticket that transfers successfully to unknown downstream tasks.
    Method
    We introduce Multimodal Flow Pruning, a method specifically designed for the above setup. MultiFlow relies on two components: the Information Flow Score, which extends weight saliency with neuron saliency, and Multimodality-Aware Compression, which ensures a proper balance among modalities when pruning.
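To make the two components concrete, here is a minimal sketch of what they could look like. The exact saliency formula is an assumption for illustration (weight magnitude scaled by the aggregate strength of its input and output neurons), not the paper's definition; see the paper for the real Information Flow Score.

```python
def information_flow_score(weight):
    """Hypothetical saliency for a 2-D weight matrix (list of rows):
    |w_ij| scaled by the total magnitude flowing through its input
    neuron j and its output neuron i (an illustrative assumption)."""
    rows, cols = len(weight), len(weight[0])
    out_strength = [sum(abs(w) for w in row) for row in weight]
    in_strength = [sum(abs(weight[i][j]) for i in range(rows)) for j in range(cols)]
    return [[abs(weight[i][j]) * out_strength[i] * in_strength[j]
             for j in range(cols)] for i in range(rows)]

def multimodality_aware_prune(weights_by_modality, sparsity):
    """Prune each modality against its OWN score distribution, so a single
    shared threshold cannot zero out one modality entirely."""
    masks = {}
    for name, w in weights_by_modality.items():
        scores = information_flow_score(w)
        flat = sorted(s for row in scores for s in row)
        k = int(len(flat) * sparsity)          # number of weights to drop
        thresh = flat[k - 1] if k > 0 else -1.0
        masks[name] = [[1 if s > thresh else 0 for s in row] for row in scores]
    return masks
```

With the per-modality budget, pruning at 50% sparsity keeps half the weights of each modality even when one modality has much smaller magnitudes overall.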
    Experiments
    We experiment with two VLMs (BLIP and XVLM, both in their base version), 3 Vision-Language Tasks (i.e., Image-Text Retrieval, Image Captioning and Visual Question Answering) and 3 sparsities (63%, 75% and 90%). Results show that MultiFlow often matches or surpasses other pruning strategies while using no task objective and being much lighter computationally.
    Analysis
    We carry out an in-depth analysis and show that, beyond the classical layer-collapse problem of pruning, Vision-Language Models can also suffer from full modality collapse: pruning globally can wipe out an entire modality!
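Modality collapse is easy to reproduce on toy numbers (an assumed illustration, not a result from the paper): applying one global magnitude threshold across modalities whose weights live on different scales can remove every weight of the smaller-scale modality.

```python
def global_magnitude_prune(weights_by_modality, sparsity):
    """One threshold over ALL modalities' weight magnitudes at once."""
    flat = sorted(abs(w) for ws in weights_by_modality.values()
                  for row in ws for w in row)
    k = int(len(flat) * sparsity)              # number of weights to drop
    thresh = flat[k - 1] if k > 0 else -1.0
    return {name: [[1 if abs(w) > thresh else 0 for w in row] for row in ws]
            for name, ws in weights_by_modality.items()}

# Vision weights ~1.0, text weights ~0.01: at 50% sparsity the global
# threshold falls above every text weight, so the text mask is all zeros.
masks = global_magnitude_prune(
    {"vision": [[1.0, 0.1], [0.2, 2.0]], "text": [[0.01, 0.02], [0.03, 0.04]]},
    0.5)
```

Here `masks["text"]` is entirely zero: the whole text modality is pruned away, which is exactly the failure mode a per-modality budget guards against.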
    Thanks for checking out this video! If you wanna know more, please check the following resources.
    Code: github.com/Far...
    Paper: arxiv.org/abs/...
