OpenAI CLIP - Connecting Text and Images | Paper Explained

  • Published 28 May 2024
  • ❤️ Become The AI Epiphany Patreon ❤️ ► / theaiepiphany
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    In this video, I cover the CLIP paper - Learning Transferable Visual Models from Natural Language Supervision.
    You'll learn about:
    ✔️ How the contrastive learning behind CLIP works
    ✔️ All the nitty-gritty details behind the paper
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    ✅ Bitter lessons by Sutton: www.incompleteideas.net/IncIde...
    ✅ CLIP paper: cdn.openai.com/papers/Learnin...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    ⌚️ Timetable:
    00:00 OpenAI's CLIP
    02:10 Detailed explanation of the method
    06:00 Comparison with SimCLR
    12:55 How does the zero-shot part work
    20:45 WIT dataset
    21:30 Why this method? Hint: efficiency
    28:35 Zero-shot - generalizing to new tasks
    31:30 Prompt programming and ensembling
    34:00 Zero-shot performance
    36:20 Few-shot comparison with best baselines
    38:20 How good is the zero-shot classifier?
    40:45 Compute error correlation
    41:20 Quality of CLIP's embedding space
    43:05 Robustness to distribution shift
    49:10 Limitations (MNIST failure)
    50:30 A short recap
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    💰 BECOME A PATREON OF THE AI EPIPHANY ❤️
    If these videos, GitHub projects, and blogs help you,
    consider helping me out by supporting me on Patreon!
    The AI Epiphany ► / theaiepiphany
    One-time donation:
    www.paypal.com/paypalme/theai...
    Much love! ❤️
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    💡 The AI Epiphany is a channel dedicated to simplifying the field of AI using creative visualizations and in general, a stronger focus on geometrical and visual intuition, rather than the algebraic and numerical "intuition".
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    👋 CONNECT WITH ME ON SOCIAL
    LinkedIn ► / aleksagordic
    Twitter ► / gordic_aleksa
    Instagram ► / aiepiphany
    Facebook ► / aiepiphany
    👨‍👩‍👧‍👦 JOIN OUR DISCORD COMMUNITY:
    Discord ► / discord
    📢 SUBSCRIBE TO MY MONTHLY AI NEWSLETTER:
    Substack ► aiepiphany.substack.com/
    💻 FOLLOW ME ON GITHUB FOR COOL PROJECTS:
    GitHub ► github.com/gordicaleksa
    📚 FOLLOW ME ON MEDIUM:
    Medium ► / gordicaleksa
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    #clip #openai #nlpsupervision

COMMENTS • 24

  • @TheAIEpiphany
    @TheAIEpiphany  3 years ago +3

    I love that OpenAI is pushing towards these more general methods in computer vision as well.
    Unsupervised learning is about to become super mainstream in CV.

    • @heejuneAhn
      @heejuneAhn 1 month ago

      In fact, they call it natural language "supervised" learning (section 2.1).

  • @heejuneAhn
    @heejuneAhn 1 month ago

    Yes, it is a kind of parameterized classification: the embedding vectors produced from the text input act as the final fully connected layer of the image classification network.
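    The idea in this comment can be sketched in a few lines: the text embeddings of the class prompts play the role of the classifier's last weight matrix, and the most similar prompt wins. A minimal sketch, assuming random stand-in encoders and hypothetical class names (real CLIP would use its trained image and text transformers):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical class names, each turned into a prompt like "a photo of a {label}".
labels = ["dog", "cat", "car"]

# Stand-in text embeddings, one row per prompt (real CLIP: text_encoder(prompt)).
text_emb = normalize(rng.normal(size=(len(labels), 8)))

# Stand-in image embedding (real CLIP: image_encoder(image)).
image_emb = normalize(rng.normal(size=(8,)))

# Cosine similarity of the image against every class prompt; the text
# embeddings act exactly like the weights of a final linear classifier.
logits = text_emb @ image_emb
prediction = labels[int(np.argmax(logits))]
print(prediction)
```

    Swapping in a different label set requires no retraining, only re-encoding the new prompts, which is what makes the classifier "zero-shot".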

  • @DistortedV12
    @DistortedV12 3 years ago +3

    One of my favorite projects I've seen in a long time; thanks for covering it.

  • @christophermarais5253
    @christophermarais5253 3 years ago +3

    Geez man do you ever take a holiday lol. I appreciate all your videos.

    • @TheAIEpiphany
      @TheAIEpiphany  3 years ago

      I do! As a matter of fact, I'm on one right now! 😂 But since it's this fantastic pandemic+winter combo, I'm making the most out of it.
      Thanks!!

  • @idoroth
    @idoroth 6 months ago

    Very well presented; it greatly helped me improve my understanding. Thank you very much.

  • @samirelzein1095
    @samirelzein1095 11 months ago

    Great job! Thanks!

  • @leorabetesh3800
    @leorabetesh3800 2 years ago

    Thank you for also giving the background on ConVIRT

  • @maulberto3
    @maulberto3 1 year ago

    Hi. Much of the value of a NN depends on the signal we give it. In this example, as you said (and very cleverly done by the authors), if we treat it as a classification task, we give the network the signal that a given text belongs to a given image and not to the other images; do that across millions of examples and it learns intelligence.
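    The pairing signal this comment describes is the symmetric contrastive (InfoNCE-style) objective: in a batch of N matching image/text pairs, the N x N similarity matrix should be high on the diagonal (true pairs) and low elsewhere. A minimal NumPy sketch, assuming random stand-in embeddings in place of the trained encoders:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 8  # batch of 4 image/text pairs, 8-dim embeddings

def normalize(x):
    """L2-normalize rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = normalize(rng.normal(size=(N, d)))  # stand-in image embeddings
txt = normalize(rng.normal(size=(N, d)))  # stand-in text embeddings

temperature = 0.07
logits = img @ txt.T / temperature  # N x N pairwise similarities

def cross_entropy(logits, targets):
    """Row-wise softmax cross-entropy against integer targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

targets = np.arange(N)  # pair i on the diagonal is the true match
# Symmetric loss: pick the right text for each image, and vice versa.
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
print(loss)
```

    Minimizing this loss pulls matching image/text embeddings together and pushes the mismatched combinations apart, which is the signal the comment refers to.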

  • @sasucarefree4694
    @sasucarefree4694 1 month ago

    How does the fine-tuning actually work here? They don't talk about it in the paper, right?

  • @vivekpadman5248
    @vivekpadman5248 1 year ago

    What is the app you use to annotate these papers, btw?

  • @er-wl9sy
    @er-wl9sy 3 years ago +1

    Thanks for the video. I was wondering if you would also cover visual SLAM / visual odometry as well. Thanks

    • @TheAIEpiphany
      @TheAIEpiphany  3 years ago

      Hey! You mean like the more classical CV algos or you have something more concrete in mind?

    • @er-wl9sy
      @er-wl9sy 3 years ago +1

      @@TheAIEpiphany I meant deep SLAM, such as SuperPoint or SuperGlue

    • @abudawood_phd
      @abudawood_phd 3 years ago +1

      Yes, I agree; I hope you have some time to review some vSLAM that uses DL, especially feature extraction and depth estimation with monocular cams

    • @dbzkidkev2
      @dbzkidkev2 3 years ago +1

      @@abudawood_phd Yes, that would be really interesting! SLAM is a whole other beast

  • @chandlertimm8243
    @chandlertimm8243 1 year ago

    What is actually the main contribution of the CLIP paper? Data augmentation, zero-shot, and using large datasets?

  • @oscarllerena2980
    @oscarllerena2980 5 months ago

    Thanks for this nice explanation. Let me please ask the following question: @ 05:40 you mentioned that the CLIP paper was heavily inspired by the ConVIRT paper, but the ConVIRT approach is only mentioned 2 times and it does not appear in the references. Did they intentionally not reference it?

    • @oscarllerena2980
      @oscarllerena2980 5 months ago

      Apologies. I found the reference in the Introduction. My CTRL+F could not find it because "Con-VIRT" was split across a line break in the paragraph.

  • @razvanrotaru2285
    @razvanrotaru2285 2 years ago +1

    i love you

  • @wzyjoseph7317
    @wzyjoseph7317 11 months ago

    Aleksa really looks like the khan in GOT series XD

    • @wzyjoseph7317
      @wzyjoseph7317 11 months ago

      And this Khan is reaaaaaly smart!!