#55 Dr. ISHAN MISRA - Self-Supervised Vision Models

  • Published 30 Jan 2025

COMMENTS • 55

  • @ChaiTimeDataScience · 3 years ago · +45

    I can never get enough of the Epic Tim intros! :D

  • @rogerfreitasramirezjordan7188 · 3 years ago · +18

    This is what YouTube is for. Clear explanations and a beautiful intro! Tim's intro is fundamental for understanding what comes later.

  • @AICoffeeBreak · 3 years ago · +18

    Thanks, this episode is 🔥! You ask many questions I've had in mind lately.

  • @aurelius2515 · 3 years ago · +5

    This was definitely one of the better episodes - covered a lot of ground in some good detail with excellent content and good guiding questions and follow-up questions.

  • @tinyentropy · 2 years ago · +1

    You guys are so incredible. Thank you so much. We appreciate this every single second. ☺️☺️☺️

  • @beliefpropagation6877 · 3 years ago · +2

    Thank you for acknowledging the serious problems of calling images from Instagram "random", as is claimed in the SEER paper!

  • @Self-Duality · 2 years ago · +1

    Diving deep into this topic myself! So complex yet elegant… 🤔🤩

  • @maltejensen7392 · 3 years ago

    Such high quality content, so happy I found this channel!

  • @yuviiiiiiiiiiiiiiiii · 3 years ago · +1

    Here from Lex Fridman's shout out in his latest interview with Ishan Misra.

  • @minma02262 · 3 years ago · +1

    My gawd. I love this episode!!!

  • @strategy_gal · 3 years ago

    What a very interesting topic! It's amazing to know why these vision algorithms actually work!

  • @drpchankh · 3 years ago

    Great episode and discussion! I think this discussion should also cover GAN latent discovery. Unsupervised learning is every data scientist's nirvana in production. On a side note, modern GANs can potentially span multiple domains, though current work is mainly centered on single-domain datasets like Face, Bedroom, etc. The latent variables or feature spaces are discovered in an unsupervised fashion by the networks, though much work remains on better encoder and generator/discriminator architectures. The current best models can reconstruct a scene with different view angles, different lighting, different colours, etc., BUT they still CANNOT conjure up a structurally meaningful texture/structure for the scene, e.g. a bed, table, or curtain gets contorted beyond being a bed or table. ... It will be interesting to see if latent features discovered by GANs can help in unsupervised learning too.

    • @drpchankh · 3 years ago

      GANs are unsupervised learning algorithms that use a supervised loss as part of the training :)

  • @valkomilev9238 · 3 years ago

    I was wondering if quantum computing will help with the latent variables mentioned at 1:24:54

  • @ayushthakur736 · 3 years ago · +1

    Loved the episode. :)

  • @mfpears · 3 years ago

    23:00 The tendency of mass to clump together and increase spatial and temporal continuity...

  • @sugamtyagi101 · 3 years ago · +3

    An agent always has a goal. No matter how broad or big, the data samples it collects from the real world will be skewed towards that broader goal. So data samples collected by such an agent will also have an inductive bias. Therefore the collection of data is never completely disentangled from the task. So even if you put a camera on a monkey or a snail, there will be a pattern to the data (i.e. a bias) that is collected.
    By contrast, completely random samples of images, say generated by a camera whose position (in the world) and view direction are drawn from a random number generator, would have a very uniform distribution. But in that sense, is that even intelligence?
    I think any form of intelligence ultimately imbues some sort of intrinsic bias. Human beings, the most general intelligence machines, also collect visual data in a converging fashion with age, guided by goals that are themselves learnt over time. Though still very general, humans too have a direction.
    PS. Excellent Video. Thanks for picking this up.

  • @akshayshrivastava97 · 3 years ago

    Great discussion!
    A follow-up question about one thing I didn't quite understand (perhaps I'm missing something obvious)...
    With ref. to 6:36, from what I heard/read in the video/paper, these attention masks were gathered from the last self-attention layer of a ViT. The DINO paper showed that one of the heads in the last self-attention layer pays attention to areas that correspond to actual objects in the original image. That seems weird; I'd think that by the time you reach the last few layers, the image representation would have been altered in ways that make the original image irrecoverable. Would it be accurate to say this implies the original image representation either makes it through to the last layer(s) or is somehow recovered?

    • @dmitryplatonov · 3 years ago

      It is recovered: you trace back which inputs trigger the most attention (a concrete sketch follows this thread).

    • @akshayshrivastava97 · 3 years ago

      @@dmitryplatonov thanks.
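
Concretely, DINO's object-like masks come from reading out the [CLS] token's attention over the patch tokens in the final block and reshaping it into the patch grid. A minimal sketch, assuming PyTorch with network access and the public facebookresearch/dino torch.hub entry point (dino_vits16 and its get_last_selfattention helper); the random tensor is only a stand-in for a real, normalised image:

```python
import torch

# Load DINO ViT-S/16 from the official repo's torch.hub entry point
# (assumes internet access and the facebookresearch/dino hubconf).
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
model.eval()

img = torch.randn(1, 3, 224, 224)                 # placeholder for a normalised RGB image
with torch.no_grad():
    attn = model.get_last_selfattention(img)      # (1, num_heads, 197, 197) for ViT-S/16

num_heads = attn.shape[1]
cls_attn = attn[0, :, 0, 1:]                      # [CLS] -> patch-token attention, (heads, 196)
masks = cls_attn.reshape(num_heads, 14, 14)       # one coarse 14x14 "object" mask per head
print(masks.shape)                                # torch.Size([6, 14, 14])
```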

  • @tfaktas · 2 years ago

    What software are you using for annotating/presenting the papers?

  • @abby5493 · 3 years ago

    Amazing video 😍

  • @angelvictorjuancomuller809 · 3 years ago

    Hi, awesome episode! Can I ask which paper the figure at 1:15:51 is from? It's supposed to be DINO but I can't find it in the DINO paper. Thanks in advance!

    • @MachineLearningStreetTalk · 3 years ago · +1

      Page 2 of the DINO paper. Note that the full title of the "DINO" paper is "Emerging Properties in Self-Supervised Vision Transformers", arXiv:2104.14294v2.

    • @angelvictorjuancomuller809 · 3 years ago · +1

      @@MachineLearningStreetTalk Thanks! I was looking at another DINO paper (arXiv:2102.09281).

  • @himanipku22 · 3 years ago

    44:23 Is there a paper somewhere that I can read on this?

    • @MachineLearningStreetTalk · 3 years ago

      You mean the statement from Ishan that you could randomly initialise a CNN and it would already know cats are more similar to each other than they are to dogs? Hmm. The first paper that comes to mind is arxiv.org/abs/2003.00152, but I think there must be something more fundamental. Can anyone think of a paper? (A quick way to check this yourself is sketched below.)
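
A rough way to sanity-check that claim is to extract features from an untrained backbone and compare within-class to between-class cosine similarity on real cat and dog images. A minimal sketch, assuming torchvision; resnet18 is an arbitrary backbone choice and the random tensors below are only placeholders for real, normalised images:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Randomly initialised ResNet-18 (weights=None => no pretraining), used as a frozen feature extractor.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()        # drop the classifier head, keep 512-d features
backbone.eval()

def embed(images):
    """(N, 3, 224, 224) normalised images -> (N, 512) L2-normalised features."""
    with torch.no_grad():
        return F.normalize(backbone(images), dim=1)

# Placeholders: with real data these would be batches of cat and dog images.
cats, dogs = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
zc, zd = embed(cats), embed(dogs)

sim_cc = zc @ zc.T
within = (sim_cc.sum() - sim_cc.diagonal().sum()) / (sim_cc.numel() - len(sim_cc))  # exclude self-similarity
between = (zc @ zd.T).mean()
print(f"within-class {within:.3f} vs between-class {between:.3f}")
```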

  • @LidoList · 2 years ago

    Correction: at 13:29 you read BYOL as "Bring Your Own Latent". It actually stands for Bootstrap Your Own Latent (BYOL).

  • @nathanaelmercaldo2198 · 3 years ago

    Splendid video!
    Really like the intro music. Would anyone happen to know where to find the music used?

  • @zahidhasan6990 · 3 years ago

    It doesn't matter when I am not around, i.e. what happens in 100 years. - Modified from Misra.

  • @sabawalid · 3 years ago · +1

    Is a "cartoon banana" and a "real banana" subtypes of the same category, namely a "banana"? There's obviously some relation between the two, but Ishan Misra is absolutely right, a "cartoon banana" is a different category and is not a subtype of a "banana" (it cannot be eaten, it does not smell or taste like a banana, etc...) Interesting episode, as usual, Tim Scarfe

  • @rubyabdullah9690 · 3 years ago

    What if you create a simulation of an early world (before technology, etc.), then create an agent that learns about that environment, making the agent and the world's rules as close as possible to the real world, and then try to learn the way Tesla's monster architecture does, but unlabelled? It's super duper hard to make, but I think that's the best approach to creating an Artificial General Intelligence :v

  • @_ARCATEC_ · 3 years ago

    It's interesting how useful simple edits like crop, rotation, contrast, edge and curve adjustments, plus the appearance of dirty pixels in intentionally low-resolution images, are while self-supervised learning is being applied.
    🍌🍌🍌😂 So true 💓 the map is not the territory.

  • @MadlipzMarathi · 3 years ago

    Here from Lex.

  • @shivarajnidavani5930 · 3 years ago · +1

    Fake blur is very irritating. Hurts to see

  • @massive_d · 3 years ago · +3

    Lex gang

  • @fast_harmonic_psychedelic · 3 years ago

    There's a lot of emphasis on these "us vs. them", "humans vs. the machine" themes in your introduction, which I think is excessive and biased. It's not man and machine. It's just us. They are us. We're them.

  • @SimonJackson13 · 3 years ago

    Radix sort O(n)

    • @SimonJackson13 · 3 years ago

      When k < log(n) it's fantastic.

    • @SimonJackson13 · 3 years ago

      For a cube root of bits in range a 6n FILO stack list sort time is indicated.

    • @MachineLearningStreetTalk · 3 years ago

      We meant that O(N log N) is the proven lower bound for comparison-based sorts, but great call-out on radix 😀 (a small radix sort sketch follows this thread).
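
For completeness, the concrete version of the trade-off in this thread: radix sort runs in O(k * n) for k digit passes, while the O(n log n) lower bound only applies to comparison-based sorts. A minimal sketch of an LSD (least-significant-digit) radix sort for non-negative integers, base 256:

```python
def radix_sort(nums):
    """LSD radix sort on non-negative ints, one byte (base 256) per pass.
    Runs in O(k * n) for k digit passes; the n log n lower bound only
    applies to comparison-based sorting."""
    if not nums:
        return []
    out, shift, max_val = list(nums), 0, max(nums)
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for x in out:
            buckets[(x >> shift) & 0xFF].append(x)   # stable split on the current byte
        out = [x for bucket in buckets for x in bucket]
        shift += 8                                   # move to the next byte
    return out

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```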

  • @fast_harmonic_psychedelic · 3 years ago

    Machines are just an extension of nature, just like a tree, a beehive, or a baby.

  • @MachineLearningStreetTalk · 3 years ago

    For those who want to learn more from Ishan and more academic detail on the topics covered in the show today, Alfredo Canziani just released another show twitter.com/alfcnz/status/1409481710618693632 😎