05L - Joint embedding method and latent variable energy based models (LV-EBMs)

  • Published 8 Jun 2024
  • Course website: bit.ly/DLSP21-web
    Playlist: bit.ly/DLSP21-YouTube
    Speaker: Yann LeCun
    Chapters
    00:00:00 - Welcome to class
    00:00:39 - Predictive models
    00:02:25 - Multi-output system
    00:06:36 - Notation (factor graph)
    00:07:41 - The energy function F(x, y)
    00:08:53 - Inference
    00:11:59 - Implicit function
    00:15:53 - Conditional EBM
    00:16:24 - Unconditional EBM
    00:19:18 - EBM vs. probabilistic models
    00:21:33 - Do we need a y at inference?
    00:23:29 - When inference is hard
    00:25:02 - Joint embeddings
    00:28:29 - Latent variables
    00:33:54 - Inference with latent variables
    00:37:58 - Energies E and F
    00:42:35 - Preview on the EBM practicum
    00:44:30 - From energy to probabilities
    00:50:37 - Examples: K-means and sparse coding
    00:53:56 - Limiting the information capacity of the latent variable
    00:57:24 - Training EBMs
    01:04:02 - Maximum likelihood
    01:13:58 - How to pick β?
    01:17:28 - Problems with maximum likelihood
    01:20:20 - Other types of loss functions
    01:26:32 - Generalised margin loss
    01:27:22 - General group loss
    01:28:26 - Contrastive joint embeddings
    01:34:51 - Denoising or mask autoencoder
    01:46:14 - Summary and final remarks

COMMENTS • 67

  • @COOLZZist
    @COOLZZist 2 years ago +3

    Love the energy from Prof. Yann LeCun: his excitement about the topic, and the small smiles he has when talking about how fresh this content is, are amazing. Thanks a lot, Prof. Alfredo!

    • @alfcnz
      @alfcnz 2 years ago

      😄😄😄

  • @oguzhanercan4701
    @oguzhanercan4701 3 months ago +2

    The most important video on the whole internet for computer vision researchers. I rewatch it several times a year.

    • @alfcnz
      @alfcnz 3 months ago

      😀😀😀

  • @damnit258
    @damnit258 2 years ago +3

    Gold. I've watched last year's lectures and I'm filling the gaps with this year's ones.

    • @alfcnz
      @alfcnz 2 years ago +1

      💛🧡💛

  • @lucamatteobarbieri2493
    @lucamatteobarbieri2493 1 year ago +1

    A cool thing about prediction systems is that they can also be used to predict the past, not only the future. For example, if you see something falling, you intuitively predict both where it is going and where it came from.

  • @gonzalopolo2612
    @gonzalopolo2612 2 years ago +2

    Again @Alfredo Canziani, thank you very much for making this public; this is amazing content.
    I have several questions (I refer to the instant(s) in the video):
    16:34 and 50:43 => An unconditional model is one where the input is partially observed, but you don't know exactly which part.
    - What does test/inference look like in these unconditional EBMs? Is there a proper split between training and inference/test in the unconditional case?
    - How do models like PCA or K-means fit here? What are the partially observed inputs Y? For example, in K-means you receive all the components of Y; I don't see how they are partially observed.
    25:10 and 1:01:50 => With the joint embedding architecture:
    - What would inference be with this architecture: inferring a Y from a given X by minimizing the cost C(h, h')? I know you could run gradient descent on Y backwards through the Pred(y) network, but the purpose of inferring Y given X in this architecture is not clear to me.
    - What does the green "Advantage: no pixel-level reconstruction" mean? (I suspect this may be related to my question just above.)
    - Can this architecture also be trained as a latent-variable EBM, or is it always trained in a contrastive way?
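
One way to read the K-means part of the question above (a sketch only; the centroid matrix W and the assignment z below are illustrative notation, not a quote from the lecture): K-means can be cast as an unconditional latent-variable EBM with

    E(y, z) = ‖y − W z‖²,  with z constrained to be a one-hot vector (it selects one column of W, i.e. one centroid)
    F(y) = min_z E(y, z)

Here the full vector y is observed; the "unobserved part" is the latent assignment z. Inference is finding the minimizing z (the nearest centroid), and training shapes W so that F(y) is low on the training data.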

  • @cambridgebreaths3581
    @cambridgebreaths3581 2 years ago +1

    Perfect. Thank you so much :)

    • @alfcnz
      @alfcnz 2 years ago +4

      😇😇😇

  • @user-co6pu8zv3v
    @user-co6pu8zv3v 2 years ago +1

    Thank you, Alfredo! :)

    • @alfcnz
      @alfcnz 2 years ago +1

      You're welcome 🥰🥰🥰

  • @buoyrina9669
    @buoyrina9669 2 years ago

    I guess I need to watch this many times to get what Yann was trying to explain :)

    • @alfcnz
      @alfcnz 2 years ago +2

      It's alright. It took me ‘only’ 5 repetitions 😅😅😅

  • @kalokng3572
    @kalokng3572 1 year ago +3

    Hi Alfredo, thank you for making the course public. It is super useful, especially to those who are self-learning cutting-edge AI concepts, and I've found EBMs fascinating.
    I have a question regarding EBMs: how should I describe "overfitting" in the context of an EBM? Does it mean the energy landscape has very small volume surrounding the training data points?

    • @alfcnz
      @alfcnz 1 year ago +2

      You're welcome. And yes, precisely. Underfitting would be having a flat energy landscape.

  • @RJRyan
    @RJRyan 2 years ago

    Thank you so much for making these lectures public!
    The slides are very difficult to read because they are overlaid on Yann's face and the background image. I imagine this could also be an accessibility issue for anyone with vision impairments.

    • @alfcnz
      @alfcnz 2 years ago

      That's why we provide the slides 🙂🙂🙂

  • @hamedgholami261
    @hamedgholami261 1 year ago

    So that is what contrastive learning is all about!

    • @alfcnz
      @alfcnz 1 year ago

      It seems so 😀😀😀

  • @arcman9436
    @arcman9436 2 years ago

    Very interesting.

    • @alfcnz
      @alfcnz 2 years ago

      🧐🧐🧐

  • @my_master55
    @my_master55 2 years ago

    Hi, Alfredo 👋
    Am I missing something, or does this lecture not cover the "non-contrastive joint embeddings" methods Yann was talking about at 1:34:40? I also briefly checked the next lectures but didn't find anything related to this. Could you please point me to it? 😇
    Thank you for the video, btw, brilliant as always :)

    • @anondoggo
      @anondoggo 1 year ago

      If you open the slides for lecture 6 you can find a whole page on non-contrastive embeddings.

  • @SnoSixtyTwo
    @SnoSixtyTwo 1 year ago

    Thanks a whole bunch for this lecture; after two viewings I think I'm starting to grasp it :) One thing that confuses me, though: in the very beginning it is mentioned that x may or may not be adapted when going for the optimum location. I cannot quickly come up with an example where I would want that. Wouldn't that mean I am just discarding the info in x, and, in the case of modeling with latent variables, my inference now becomes a function of z exclusively?

    • @alfcnz
      @alfcnz 1 year ago

      You need to write down the timestamp in minutes:seconds if you want me to be able to address any particular aspect of the video.

    • @SnoSixtyTwo
      @SnoSixtyTwo 1 year ago

      @@alfcnz Thanks for taking the time to respond! Here we go, 15:20

  • @arashjavanmard5911
    @arashjavanmard5911 2 years ago

    Great lecture, thanks a lot. It would also be great if you could point us to a reference book or publications for this lecture. Thanks a lot in advance.

    • @alfcnz
      @alfcnz 2 years ago +5

      I'm writing the book right now. A bit of patience, please 😅😅😅

    • @Vikram-wx4hg
      @Vikram-wx4hg 2 years ago

      @@alfcnz Looking forward to the book, Alfredo. Can you give a ballpark estimate of the 'patience' here? :-)

    • @alfcnz
      @alfcnz 2 years ago +2

      The first draft will see the light by the end of summer '22.

    • @anondoggo
      @anondoggo 1 year ago

      @@alfcnz omg, I'm so excited

  • @anondoggo
    @anondoggo 1 year ago

    Dr. Yann only mentioned this in passing at 20:00, but I just wanted to clarify: why do EBMs offer more flexibility in the choice of scores and objective functions? It's on page 9 of the slides. Thank you!

    • @anondoggo
      @anondoggo 1 year ago

      Never mind, I should have just watched on: at 1:04:27 Yann explains how probabilistic models are EBMs where the objective function is the NLL.

    • @anondoggo
      @anondoggo 1 year ago

      Then, by extension, the scoring function of a probabilistic model is restricted to being a probability.

    • @anondoggo
      @anondoggo 1 year ago

      the info at 18:17 is underrated
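
A compact way to state the 1:04:27 point from this thread, written with the F_w and β notation used elsewhere in these comments (the per-sample loss ℓ below is just shorthand): a probabilistic model is an EBM trained with the specific loss

    ℓ(x, y, w) = F_w(x, y) + (1/β) log ∫y′ exp(−β F_w(x, y′))

which is the negative log-likelihood up to a factor of 1/β. The first term pushes the energy of the observed y down; the log-partition term pushes the energies of all other y′ up. An EBM is free to swap this particular loss for other contrastive or margin losses, which is where the extra flexibility in scores and objectives comes from.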

  • @aljjxw
    @aljjxw 2 years ago

    What are the research papers from Facebook mentioned around 1:30?

    • @alfcnz
      @alfcnz 2 years ago

      All references are written on the slides.
      At that timestamp I don't hear Yann mentioning any paper.

  • @hamidrezaheidarian8207
    @hamidrezaheidarian8207 4 months ago

    Hi Alfredo, which book on DL do you recommend that has the same sort of structure as the content of this course?

    • @alfcnz
      @alfcnz 4 months ago

      The one I’m writing 😇

    • @hamidrezaheidarian8207
      @hamidrezaheidarian8207 4 months ago

      @@alfcnz Great, I think it would be a great companion to these lectures, looking forward to it.

  • @benjaminahlbrand9827
    @benjaminahlbrand9827 2 years ago

    How do you use autograd in PyTorch for "non-stochastic" gradient descent?

    • @shiftedabsurdity
      @shiftedabsurdity 2 years ago

      probably conjugate gradient

    • @alfcnz
      @alfcnz 2 years ago +2

      If the function you optimize is not an approximation (unlike the per-batch approximation of the dataset loss), then you're performing non-stochastic GD. The stochasticity comes from the approximation of the objective function.
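
For what it's worth, here is a minimal PyTorch sketch of "non-stochastic" gradient descent with autograd in the sense of the reply above (the toy data, model, and hyperparameters are made up for illustration):

    import torch

    torch.manual_seed(0)
    X = torch.randn(100, 3)                             # the entire toy dataset, kept in memory
    y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

    w = torch.zeros(3, requires_grad=True)              # parameters of a toy linear model
    opt = torch.optim.SGD([w], lr=0.1)

    for step in range(200):
        opt.zero_grad()
        loss = ((X @ w - y) ** 2).mean()                # exact loss over the whole dataset, no mini-batch
        loss.backward()                                 # autograd gives the exact, deterministic gradient
        opt.step()                                      # so each step is plain (non-stochastic) gradient descent

The optimizer is still called SGD, but because the loss is computed on the full dataset at every step there is no sampling noise; the stochasticity only appears once the loss is approximated on mini-batches.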

  • @iamyouu
    @iamyouu 2 years ago +1

    Is there any book I can read to learn more about these methods? Thank you.

    • @alfcnz
      @alfcnz 2 years ago +1

      I'm writing the book. It'll take some time.

    • @iamyouu
      @iamyouu 2 years ago

      @@alfcnz thank you so much!

    • @alfcnz
      @alfcnz 2 years ago +1

      ❤️❤️❤️

  • @MehranZiadloo
    @MehranZiadloo 4 months ago

    Not to nitpick, but I believe there's a minus sign missing at 49:22 in the denominator of P(y|x), at the far right of the screen, behind the beta.

    • @alfcnz
      @alfcnz 4 months ago

      Oh, yes indeed! Yann is a little heedless when crafting slides 😅

    • @MehranZiadloo
      @MehranZiadloo 4 months ago

      @@alfcnz These things happen. I just wanted to make sure that I'm following the calculations correctly. Thanks for the confirmation.

    • @alfcnz
      @alfcnz 4 months ago

      Sure sure 😊
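
For anyone cross-checking the slide at 49:22 discussed above: with the missing sign restored, the expression takes the usual Gibbs form (in the notation of these comments),

    P(y|x) = exp(−β F(x, y)) / ∫y′ exp(−β F(x, y′))

with a minus in front of β in both the numerator and the denominator.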

  • @vageta008
    @vageta008 2 years ago

    Interesting: energy-based models do something very similar to metric learning. (Or am I missing something?)

    • @alfcnz
      @alfcnz 2 years ago +1

      Indeed, metric learning can be formulated as an energy-based model. I'd say energy-based models are a large umbrella under which many conventional models can be recast.

  • @mpalaourg8597
    @mpalaourg8597 2 years ago

    I tried to calculate the derivative Yann mentioned (1:07:45), but I'm probably missing something, because my final result doesn't have the integral (only −P_w(·) ...). Is there any supplementary material with these calculations?
    Thanks again for your amazing and hard work!

    • @alfcnz
      @alfcnz 2 years ago

      Uh… can you share your calculations? I can have a look. Maybe post them in the Discord server, maths room, so that others may be able to help as well.

    • @mpalaourg8597
      @mpalaourg8597 2 years ago

      @@alfcnz It was my bad. I... misread the formula of P_w(y|x) and thought there was an integral in the numerator (over all y's), but that didn't make any sense to me, so I checked your notes again and... voilà, I got the right answer.
      Is the Discord open to us too? I thought it was only for NYU students. I'll definitely join then (learning alone isn't fun :P).

    • @alfcnz
      @alfcnz 2 years ago +1

      Discord is for *non* NYU students. I have another communication system set up for them.

  • @pratik245
    @pratik245 2 years ago

    The French language seems more suited for music... it has a sweet tonality.

    • @alfcnz
      @alfcnz 2 years ago

      🇫🇷🥖🗼

  • @pratik245
    @pratik245 2 years ago

    Yannic Kilcher is asking questions, it seems.

    • @alfcnz
      @alfcnz 2 years ago

      Where, when? 😮😮😮

  • @pratik245
    @pratik245 2 years ago

    Meta helicopter

  • @ShihgianLee
    @ShihgianLee 2 years ago +1

    I spent some time deriving the step mentioned at 1:07:44. I made my best effort to get the final result, but I'm not sure all my steps are correct; I hope my fellow students can help point out any mistakes. Due to the lack of LaTeX support in YouTube comments, I've tried to make the steps as clear as possible. I take the partial derivative of the log to get the second step, then use Leibniz's integral rule to move the partial derivative inside the integral in the third step. The rest is pretty straightforward, hopefully. Thank you!
    ∂/∂w (1/β) log[ ∫y′ exp(−βFw(x, y′)) ]
      = (1/β) · [1 / ∫y′ exp(−βFw(x, y′))] · ∂/∂w ∫y′ exp(−βFw(x, y′))
      = (1/β) · [1 / ∫y′ exp(−βFw(x, y′))] · ∫y′ ∂/∂w exp(−βFw(x, y′))
      = (1/β) · [1 / ∫y′ exp(−βFw(x, y′))] · ∫y′ exp(−βFw(x, y′)) · ∂/∂w (−βFw(x, y′))
      = −[1 / ∫y′ exp(−βFw(x, y′))] · ∫y′ exp(−βFw(x, y′)) · ∂/∂w Fw(x, y′)
      = −∫y′ [exp(−βFw(x, y′)) / ∫y″ exp(−βFw(x, y″))] · ∂/∂w Fw(x, y′)
      = −∫y′ Pw(y′|x) · ∂/∂w Fw(x, y′)

    • @hamedgholami261
      @hamedgholami261 1 year ago

      Can you put a link to a LaTeX file? I did the derivative and may be able to help.
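
Since the thread above asks about a LaTeX write-up, here is instead a small numerical check of the same identity in PyTorch on a made-up quadratic energy (the grid, the energy, and β below are purely illustrative): the autograd gradient of (1/β) log ∫ exp(−β F_w) should match −∫ P_w(y′|x) ∂F_w/∂w.

    import torch

    beta = 2.0
    x = torch.tensor(1.5)
    w = torch.tensor(0.7, requires_grad=True)
    y = torch.linspace(-5.0, 5.0, 2001)                 # grid standing in for the integral over y'
    dy = y[1] - y[0]

    def F(w, x, y):
        return (y - w * x) ** 2                         # toy energy F_w(x, y)

    # Left-hand side: d/dw of (1/beta) log ∫ exp(-beta F_w(x, y')) dy', via autograd
    log_Z = torch.logsumexp(-beta * F(w, x, y), dim=0) + torch.log(dy)
    lhs = torch.autograd.grad(log_Z / beta, w)[0]

    # Right-hand side: -∫ P_w(y'|x) ∂F_w(x, y')/∂w dy', computed on the same grid
    with torch.no_grad():
        p = torch.softmax(-beta * F(w, x, y), dim=0)    # ≈ P_w(y'|x) · dy'
        dF_dw = -2 * x * (y - w * x)                    # ∂F/∂w for this particular energy
        rhs = -(p * dF_dw).sum()

    print(lhs.item(), rhs.item())                       # the two numbers should agree closely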