JEPA Architectures - How neural networks learn abstract concepts about images (IJEPA)

  • Published 26 Oct 2024

COMMENTS • 6

  • @ethansmith7608
    @ethansmith7608 1 year ago +4

    There's something quite interesting about predicting embeddings from embeddings: it feels like you give the model an extra degree of freedom in designing its representation space, rather than training it on reconstruction and hoping that a nice representation space emerges indirectly. Both CLIP and the strategy used for the DALLE prior also sort of learn to base their predictions on the positions of other embeddings, and their continued success makes me think this is a promising area of research (see the sketch after this thread).

  • @gyahoo
    @gyahoo 2 months ago +1

    Great explanation ❤

  • @josephsueke
    @josephsueke 7 months ago +2

    nicely explained!

  • @ControllerQuickSwaps
    @ControllerQuickSwaps 1 year ago +2

    If "like human's do" just means 'using latent representation' that's definitely just attention grabbing imo. Neverthless, taking prediction to latent space is definitely the right direction.

    • @avb_fj
      @avb_fj 1 year ago +2

      Yeah, I agree! I do think there's a bit of a marketing slogan involved here... but as a concept it makes a ton of sense as a research initiative.

    • @ControllerQuickSwaps
      @ControllerQuickSwaps 1 year ago

      @avb_fj I'm still trying to figure out what part of the idea is actually novel. I brought it up in a lab meeting today and people said that self-supervised loss is already often done in latent space?
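
A minimal sketch of the "predicting embeddings from embeddings" idea discussed in this thread (PyTorch; the module names, shapes, and the crude pooling are illustrative assumptions, not Meta's I-JEPA code): the loss is a regression in latent space against an EMA target encoder, rather than a pixel-space reconstruction.

import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256  # embedding width (illustrative)

# Context encoder: embeds the visible (context) patches; trained by backprop.
context_encoder = nn.Sequential(nn.Linear(768, dim), nn.GELU(), nn.Linear(dim, dim))

# Target encoder: an EMA copy of the context encoder; produces the regression targets.
target_encoder = nn.Sequential(nn.Linear(768, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

# Predictor: maps context embeddings to predicted target embeddings.
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

def jepa_loss(context_patches, target_patches):
    """Regress the embeddings of masked target patches from the context embeddings."""
    ctx = context_encoder(context_patches)            # (B, N_ctx, dim)
    with torch.no_grad():
        tgt = target_encoder(target_patches)          # (B, N_tgt, dim), no gradient
    pred = predictor(ctx.mean(dim=1, keepdim=True))   # (B, 1, dim); crude pooling for brevity
    return F.mse_loss(pred.expand_as(tgt), tgt)       # loss lives in latent space, not pixels

@torch.no_grad()
def ema_update(momentum: float = 0.996):
    """Keep the target encoder an exponential moving average of the context encoder."""
    for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
        p_t.mul_(momentum).add_(p_c, alpha=1.0 - momentum)

# Toy usage: 16x16x3 patches flattened to 768-dim vectors.
context_patches = torch.randn(4, 60, 768)  # visible patches
target_patches = torch.randn(4, 4, 768)    # masked patches whose embeddings are predicted
loss = jepa_loss(context_patches, target_patches)
loss.backward()
ema_update()

The actual I-JEPA uses Vision Transformers, block masking, and per-patch prediction targets; the sketch only illustrates that the regression target lives in representation space, which is also what "self-supervised loss in latent space" refers to in the last reply.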