Nvidia Sana, a New AI Model: The GPU Maker Created Their Own Diffusion Model!

  • Published Oct 27, 2024

COMMENTS • 19

  • @TheFutureThinker
    @TheFutureThinker  4 hours ago +3

    Resources:
    nvlabs.github.io/Sana/
    sana-gen.mit.edu/
    arxiv.org/abs/2410.10629
    hanlab.mit.edu/projects/sana

  • @crazyleafdesignweb
    @crazyleafdesignweb 4 hours ago +3

    Here are a few key takeaways about NVIDIA’s Sana:
    High-Resolution Image Generation: Sana can generate images up to 4096 × 4096 resolution, making it capable of producing ultra-high-quality visuals.
    Efficiency: It uses a deep compression autoencoder that downsamples images by a factor of 32 (rather than the more common 8), significantly reducing the number of latent tokens and improving efficiency.
    Linear Diffusion Transformer (DiT): Sana replaces traditional attention mechanisms with linear attention, which is more efficient at high resolutions without sacrificing quality.
    Text-Image Alignment: The model employs a small decoder-only LLM as the text encoder, enhancing the understanding and alignment of text prompts with generated images.
    Fast and Accessible: Sana can generate a 1024 × 1024 resolution image in less than a second on a 16GB laptop GPU, making high-quality image generation accessible even on consumer-grade hardware.
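
To make the efficiency point above concrete, here is a rough back-of-the-envelope sketch of how the autoencoder's compression factor changes the number of latent tokens. It is not taken from the Sana code: the function names are hypothetical, it assumes the "32 times" figure is a per-side downsampling factor, and the comparison baseline (8x autoencoder with a patch size of 2) is an assumption about a typical latent diffusion setup.

```python
def latent_tokens(image_size: int, ae_factor: int, patch_size: int = 1) -> int:
    """Tokens the transformer sees for a square image.

    image_size: side length in pixels
    ae_factor:  per-side downsampling of the autoencoder (assumed 8 or 32)
    patch_size: per-side patchify factor applied to the latent (assumption)
    """
    side = image_size // (ae_factor * patch_size)
    return side * side


def relative_attention_cost(tokens: int, linear: bool) -> int:
    """Very rough cost proxy: quadratic in tokens for softmax attention,
    proportional to tokens for linear attention (constants ignored)."""
    return tokens if linear else tokens * tokens


if __name__ == "__main__":
    for size in (1024, 4096):
        baseline = latent_tokens(size, ae_factor=8, patch_size=2)   # assumed baseline
        compressed = latent_tokens(size, ae_factor=32, patch_size=1)
        print(f"{size}px: {baseline} tokens vs {compressed} tokens")
```

Under these assumptions, a 1024 × 1024 image drops from 4096 tokens to 1024, and a 4096 × 4096 image from 65536 to 16384. With quadratic attention that token count would dominate the cost at high resolution, which is why pairing fewer tokens with linear attention matters.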

    • @testales
      @testales 3 hours ago

      I'll believe it when I see it. That's just too good: an open model that generates higher-quality images than Flux at a higher resolution and with better prompt understanding, while using a lot fewer resources and hence running a lot faster too. That sounds absurd to me. Also, it's one thing to generate a big image, but another to create an image that actually uses its resolution to the fullest and doesn't just look upscaled.

  • @MrDebranjandutta
    @MrDebranjandutta 4 hours ago +1

    About bloody time. No prizes for guessing this won't run on Apple and AMD hardware.

  • @electronicmusicartcollective
    @electronicmusicartcollective 1 minute ago

    WOW, I can't finish the projects. You always come up with such great news, thank you 👌

  • @Mranshumansinghr
    @Mranshumansinghr 1 hour ago

    It looks more like an upscaler and enhancer than a diffusion model.

  • @hmmrm
    @hmmrm 1 hour ago +2

    STILL, Flux is better

  • @TheRoninteam
    @TheRoninteam 1 hour ago

    That is pretty amazing!

  • @insurancecasino5790
    @insurancecasino5790 3 hours ago

    Wow. That model is mind-blowing, and with low VRAM? Thanks for the update.

    • @TheFutureThinker
      @TheFutureThinker  3 hours ago +1

      Low parameters, low VRAM, able to do something similar to Flux, I take it.

    • @insurancecasino5790
      @insurancecasino5790 2 hours ago

      @TheFutureThinker Word

    • @TheFutureThinker
      @TheFutureThinker  2 hours ago

      @insurancecasino5790 Yes, and words on images. This is awesome; it allows media agencies to create banners.

  • @rolarocka
    @rolarocka 4 hours ago

    Nice 😍

  • @CharlesLijt
    @CharlesLijt 2 hours ago

    10:28 "skin looks natural without shiny plastic style"... Man, please check that image, you can be more wrong

    • @TheFutureThinker
      @TheFutureThinker  1 hour ago

      I said compared with Flux; listen to the whole thing, not just one part.
      By the way, run that in Flux. "please check that image, you can be more wrong"

  • @beginnenmetinux
    @beginnenmetinux 2 hours ago

    Text in Dutch is much worse than Flux.

  • @aegisgfx
    @aegisgfx 2 hours ago

    No one cares about 4k (or 8k or jibbitebillion k) and no one ever will.

  • @Deadlious
    @Deadlious 2 hours ago

    Not impressed at all.