Efficient Text-to-Image Training (16x cheaper than Stable Diffusion) | Paper Explained

  • Published 16 Dec 2024
  • Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images. Why is this important? Compressing the data can reduce computational costs for both training and inference by orders of magnitude. Training on 1024x1024 images is far more expensive than training on 32x32. Other works usually rely on a relatively small compression factor, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme: through its novel design, we achieve a 42x spatial compression.
    If you want to dive in even more into Würstchen here is the link to the paper & code:
    Arxiv: arxiv.org/abs/...
    Huggingface: huggingface.co...
    Github: huggingface.co...
    We also created a community Discord for people interested in Generative AI:
    / discord
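
    To get a feel for why the compression factor matters so much, here is a small back-of-the-envelope sketch (plain Python, not from the paper; the image size and the 4x/8x/42x factors are the ones mentioned in the description above):

    ```python
    # Back-of-the-envelope: how spatial compression shrinks the grid a
    # diffusion model has to operate on. Cost of attention/convolution
    # scales with the number of spatial positions, so fewer positions
    # means cheaper training and inference.

    image_side = 1024  # pixels per side of the training images

    for factor in (4, 8, 42):
        latent_side = image_side // factor
        ratio = image_side**2 / latent_side**2
        print(f"{factor:>2}x compression -> {latent_side}x{latent_side} latent grid, "
              f"~{ratio:.0f}x fewer spatial positions")
    ```

    At 42x compression a 1024x1024 image maps to roughly a 24x24 latent grid, i.e. over a thousand times fewer spatial positions than the raw pixels, which is where the training-cost savings come from.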
