rupert ai
The U-Net (actually) explained in 10 minutes
Want to understand the AI model actually behind Harry Potter by Balenciaga or the infamous image of the Pope in the puffer jacket? Well… diffusion frameworks such as DALL-E 2, Midjourney, Imagen and Stable Diffusion get a lot of the credit, whereas the true unsung hero of the story is the underlying U-Net architecture that they all use under the hood. Don't get me wrong, diffusion models are awesome, but the U-Net is an absolute STAPLE of computer vision, and this video aims to break it down in an easy way. Originally used for image segmentation, the U-Net has developed into so much more. Happy watching!
U-Net paper: arxiv.org/abs/1505.04597
Many thanks to numerous online resources that helped me create this video.
Views: 107,964
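As a rough sketch of the architecture the video describes — an encoder that halves spatial resolution while doubling channels, a decoder that reverses this, and skip connections concatenating encoder features into the decoder — here is a minimal, illustrative PyTorch U-Net. All module names and sizes below are my own, not taken from the paper or the video:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # two 3x3 convs with padding (most modern U-Net variants pad; the
    # original paper used unpadded convolutions)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=1, base=16):
        super().__init__()
        self.enc1 = block(in_ch, base)
        self.enc2 = block(base, base * 2)              # channels double...
        self.pool = nn.MaxPool2d(2)                    # ...as H and W halve
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = block(base * 2, base)              # base*2 after concat
        self.head = nn.Conv2d(base, out_ch, 1)         # per-pixel logits

    def forward(self, x):
        s1 = self.enc1(x)                              # feature map kept for the skip
        bottom = self.enc2(self.pool(s1))
        up = self.up(bottom)
        cat = torch.cat([up, s1], dim=1)               # the skip connection
        return self.head(self.dec1(cat))

out = TinyUNet()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```

The output has the same spatial size as the input, one logit per pixel — which is what makes the shape suitable for segmentation masks.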

Videos

ResNet (actually) explained in under 10 minutes
Views: 100K · 1 year ago
Want an intuitive and detailed explanation of Residual Networks? Look no further! This video is an animated guide of the paper 'Deep Residual Learning for Image Recognition' created using Manim. Sources / credits Resnet Paper: arxiv.org/abs/1512.03385 Manim animation library: www.manim.community/ Pytorch ResNet implementation: github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
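The core idea of the paper — a block learns a residual F(x) that is added back onto the identity x — fits in a few lines. A minimal sketch (a simplified version of the torchvision BasicBlock linked above, keeping channel count and spatial size fixed):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # residual block: output = ReLU(F(x) + x), so the convs only need to
    # learn the *difference* from the identity mapping
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity skip connection

x = torch.randn(1, 8, 16, 16)
y = BasicBlock(8)(x)
print(y.shape)  # torch.Size([1, 8, 16, 16]) — same shape as x
```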
London ML Researcher @ CVPR 2022, New Orleans. Highlights! (ft. Tesla Cybertruck & Dream Fields)
Views: 493 · 2 years ago
Implementing a multi-class CNN Image Classifier in Pytorch! Computer Vision Basics Ep. 3 CIFAR10 CNN
Views: 3.2K · 2 years ago
Want to learn how to create a basic convolutional neural network (CNN) to classify images? This short video runs you through the whole process of setting up a CNN to do image classification on the CIFAR 10 dataset. Intro: 00:00 Data loading (recap): 00:51 Model architecture: 2:34 Training loop: 13:30 Testing our trained model on validation set: 21:25 Outro: 25:29 If you guys like these practica...
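The shape of such a model, in miniature: a couple of conv + pool stages followed by a linear head producing one logit per CIFAR-10 class. This is an illustrative sketch, not the exact architecture built in the video:

```python
import torch
import torch.nn as nn

# A small CNN for 32x32 RGB inputs and 10 classes; layer sizes are arbitrary.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 32x16x16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64x8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),   # logits for the 10 CIFAR-10 classes
)

logits = model(torch.randn(4, 3, 32, 32))   # a batch of 4 fake images
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 3, 9, 1]))  # fake targets
print(logits.shape)  # torch.Size([4, 10])
```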
Image Classification with CIFAR 10! Computer Vision Basics Ep. 1 Loading Data (coding follow-along)
Views: 2.3K · 2 years ago
This is the first episode in my brand-new series on Computer Vision! To kick off the series we will be looking at the CIFAR-10 image classification challenge, building more complex models to get better results as we go. The idea is that the series should be easy to follow and as practical as possible. In the next episode we will be breaking down Convolution Neura...
Masked Language Modelling Part 2 - Retraining BERT w/ Hugging Face Trainer - MRSCC - Coding Tutorial
Views: 2.3K · 2 years ago
A practical Python Coding Guide - In this guide I use a Hugging Face language model on the Microsoft Research Sentence Completion Challenge! In this episode (part 2) I show you how to quickly retrain a masked language model to increase performance on a given dataset. TUTORIAL NOTEBOOK colab.research.google.com/drive/14HXLn1hwDcSBn7RWNWiRUKkit_m8wBXD?usp=sharing remember to copy the notebook...
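The training objective behind masked-LM retraining, in miniature: cross-entropy is computed only at the masked positions. The -100 label at every other position is the convention Hugging Face models and data collators use to mean "ignore this token". Sizes below are arbitrary:

```python
import torch
import torch.nn.functional as F

vocab, seq = 100, 6
logits = torch.randn(1, seq, vocab)      # stand-in for the model's output
labels = torch.full((1, seq), -100)      # ignore every position...
labels[0, 2] = 17                        # ...except one masked token

# loss is averaged over the non-ignored positions only
loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1),
                       ignore_index=-100)
print(loss.item())
```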
Masked Language Modelling with Hugging Face - Microsoft Sentence Completion - Coding Tutorial
Views: 3.8K · 2 years ago
A practical Python Coding Guide - In this guide I use a Hugging Face language model on the Microsoft Research Sentence Completion Challenge! This is a two-part video; in this part I explain what masked language modelling is, then show you how to apply a pre-trained language model to the challenge and evaluate the model's performance. TUTORIAL NOTEBOOK co...
Multi-Label Classification on Unhealthy Comments - Finetuning RoBERTa with PyTorch - Coding Tutorial
Views: 38K · 2 years ago
A practical Python Coding Guide - In this guide I train RoBERTa using PyTorch Lightning on a multi-label classification task, in particular the Unhealthy Comment Corpus. This creates a language model that can classify whether an online comment contains attributes such as sarcasm, hostility or dismissiveness. TUTORIAL NOTEBOOK colab.research.google.com/drive/1ejBYmu0P5urzghoTTDB-GBUxpbUFX0Gz?us...
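What makes this *multi-label* rather than multi-class: a comment can trigger several attributes at once, so each label gets an independent sigmoid instead of one softmax over all labels. A tiny sketch of the loss and prediction step (the numbers are illustrative stand-ins for the classification head's output):

```python
import torch
import torch.nn as nn

n_labels = 3                               # e.g. sarcasm, hostility, dismissiveness
logits = torch.tensor([[2.0, -1.0, 0.5]])  # stand-in for the model's head output
targets = torch.tensor([[1.0, 0.0, 1.0]])  # several labels can be 1 at once

loss = nn.BCEWithLogitsLoss()(logits, targets)  # one binary loss per label
probs = torch.sigmoid(logits)              # per-label probabilities; need not sum to 1
preds = (probs > 0.5).int()
print(preds.tolist())  # [[1, 0, 1]]
```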
Hugging Face Transformers: the basics. Practical coding guides SE1E1. NLP Models (BERT/RoBERTa)
Views: 54K · 3 years ago
Practical Python Coding Guide - BERT in PyTorch In this first episode of the practical coding guide series, I discuss the basics of the Hugging Face Transformers Library. What is it? how does it work? what can you do with it? This episode focuses on high-level concepts, navigating their website and implementing some out-of-the-box functionality. Intro: 00:00 What is Hugging Face's Transformer L...

COMMENTS

  • @samruddhisaoji7195
    @samruddhisaoji7195 43 minutes ago

    9:02 I have a doubt: how do the numbers of features on the LHS and RHS match? LHS = w*h*c, RHS = (w/2)*(h/2)*(2*c). Thus RHS = 2*LHS
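Working through the arithmetic in the question above with a concrete (arbitrary) size: halving height and width while doubling channels actually *halves* the element count, since the two spatial halvings outweigh the one channel doubling.

```python
w, h, c = 64, 64, 16
lhs = w * h * c                          # elements before downsampling
rhs = (w // 2) * (h // 2) * (2 * c)      # elements after halving H, W and doubling C
print(lhs, rhs, rhs / lhs)  # 65536 32768 0.5
```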

  • @caleharrison5387
    @caleharrison5387 12 days ago

    Thanks, this is really good. One thing that would be helpful is if the example itself were convolved like the algorithm, to make it easier to visualise the algo.

  • @frommarkham424
    @frommarkham424 28 days ago

    U NETS RULEEEEEEEEEEEEE

  • @mehdiaraghi4457
    @mehdiaraghi4457 28 days ago

    The best explanation I've ever seen. You answered all the questions I had. Kudos to you.

  • @ABCEE1000
    @ABCEE1000 1 month ago

    Would you please make a presentation on 3D U-Net? That would be really appreciated.

  • @ABCEE1000
    @ABCEE1000 1 month ago

    Man, I like you! You are the best! I love how you simplify things and how careful you are to deliver the idea perfectly. Please keep these great presentations up!

  • @hemalathat8773
    @hemalathat8773 1 month ago

    I LIKED THE ANIMATIONS AND YOUR PRESENTING STYLE IN THE VIDEO. THANKS.

  • @Atreyuwu
    @Atreyuwu 1 month ago

    I found this while looking up UNet ELI5... 😭😭

  • @shinobidattebayo7650
    @shinobidattebayo7650 1 month ago

    nice effort, but the sound of music is distracting.

  • @ciciy-wm5ik
    @ciciy-wm5ik 1 month ago

    At 2:09, image1 - image2 = image3 does not imply image1 + image3 = image2.

    • @gunasekhar8440
      @gunasekhar8440 18 days ago

      We need to assume it works that way. In the paper they let h(x) be the desired mapping, x the input, and f(x) the transformation the block learns, so f(x) = h(x) - x.

  • @liliznotatnikiem6755
    @liliznotatnikiem6755 1 month ago

    I'm interested in multiclass problems (recognising bike, human AND house). Also, what would you choose instead of a confusion matrix?

  • @prammar1951
    @prammar1951 1 month ago

    Everyone is praising the video; maybe it's just me, but I really didn't understand what the residual connection hopes to achieve, and how it does that. The video didn't make it clear.

  • @louisdante8457
    @louisdante8457 1 month ago

    7:53 Why is there a need to preserve the time complexity per layer?

    • @samruddhisaoji7195
      @samruddhisaoji7195 47 minutes ago

      The number of elements in the input and output of a convolution layer should remain the same, as later we will be performing an element-wise operation.

  • @boughouyasser7471
    @boughouyasser7471 1 month ago

    Make a video on I-JEPA

  • @dhanushs4833
    @dhanushs4833 1 month ago

    Great video mate, would love to see more brilliant stuff like this ❤❤

  • @MuhammadHamza-o3r
    @MuhammadHamza-o3r 2 months ago

    Very well explained

  • @pranavgandhiprojects
    @pranavgandhiprojects 2 months ago

    Hey, just saw this first video from your channel and immediately subscribed :) Great explanation with visuals.

  • @HelloIamLauraa
    @HelloIamLauraa 2 months ago

    best explainer!! great video, I had an "aaaaááaaa" moment at 8:05

  • @faaz12356
    @faaz12356 2 months ago

    Very useful and great explanation.

  • @HarshChinchakar
    @HarshChinchakar 3 months ago

    This is one of the best videos I've ever come across on YouTube, ngl. GG

  • @wege8409
    @wege8409 3 months ago

    6:38 this is the part that really made me understand, thank you

  • @FORCP-bq5fo
    @FORCP-bq5fo 3 months ago

    Love it bro!!

  • @terjeoseberg990
    @terjeoseberg990 3 months ago

    You didn't explain how the skip connections are connected across. What is the data that's transferred and how is it incorporated into the output half of the U-Net?

  • @AaronNicholsonAI
    @AaronNicholsonAI 3 months ago

    Thanks a whole big ton!

  • @sathvikmalgikar2842
    @sathvikmalgikar2842 3 months ago

    so simple and straightforward

  • @SakshamGupta-em2zw
    @SakshamGupta-em2zw 3 months ago

    Love the Music

  • @VikashSingh-vd9cp
    @VikashSingh-vd9cp 3 months ago

    Best video for understanding the U-Net model.

  • @paruldhariwal
    @paruldhariwal 4 months ago

    It was really the most simplified and to the point video I watched on this topic. Great work!!

  • @luisluiscunha
    @luisluiscunha 4 months ago

    You are very funny!

  • @mincasurong
    @mincasurong 4 months ago

    Great summary, Great thanks

  • @atifadib
    @atifadib 4 months ago

    If you want to just use the Decoder how would you do it?

  • @ozzafar1982
    @ozzafar1982 4 months ago

    great explanation thanks!

  • @jaybrodnax
    @jaybrodnax 4 months ago

    I feel like this is more a description for experts than an actual explanation of how and why it works. Questions I'm left with: What is the purpose of downsampling/upsampling (I'm guessing performance)? How is segmentation actually done by the U-Net? How is feature extraction actually done? What are max pooling layers? What does "channel doubling" mean, and what does it achieve? How does the encoder know "these are the pixels where the bike is"? Why is it beneficial to connect the encoder features to the decoder features at each step, versus only in the last step? How does U-Net achieve anything other than downscaling/upscaling performance efficiency? Where are the actual operations that derive features? How is U-Net specifically applied to various use cases like diffusion? What does diffusion add or change, for example?

    • @abansalah4677
      @abansalah4677 4 months ago

      (Disclaimer: I am a beginner, and this is not intended to be a complete answer.) You should read about convolutional layers and pooling layers to better understand this video. At any rate: a colored image has three channels: R, G, and B. A convolutional layer is specified by some spatial parameters (stride, kernel size, padding) and how many filters it has; the number of filters is the number of channels of the output. You can think of each filter as trying to capture different information. Doubling the channels therefore means using double the number of filters, while the stride of 2 halves the spatial size. Segmentation is trained just like any ML task: the training data consists of pairs of images and their annotated versions. I think it's often hard to decipher the inner workings of a particular neural network, and your question can/should be asked in a more general way: how do neural networks learn?
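To make the "doubling the channels" point in the reply above concrete: a conv layer with twice as many filters as input channels, applied with stride 2, doubles the channel count while halving the spatial size. A tiny illustrative example (sizes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)                    # 16 input channels, 32x32
down = nn.Conv2d(16, 32, 3, stride=2, padding=1)  # 32 filters -> 32 output channels
y = down(x)
print(y.shape)  # torch.Size([1, 32, 16, 16]) — channels doubled, H and W halved
```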

  • @TechHuntBD
    @TechHuntBD 4 months ago

    Nice explanation

  • @LucaBovelli
    @LucaBovelli 4 months ago

    bro why did u stop making videos i need you lmao (its a painful lmao.)

  • @LucaBovelli
    @LucaBovelli 4 months ago

    dude thankssssss i thought this was another one of these things thatll take me 2 hours of youtube to *not* understand, but u saved me

  • @s4lome792
    @s4lome792 4 months ago

    Clearly explained. What caused my confusion in the first place is, in the figure in the original paper, why does the segmentation mask not have the same dimensionality as the input image?

  • @mridulsehgal7773
    @mridulsehgal7773 4 months ago

    The best video you can get on the U-Net explanation.

    • @Atreyuwu
      @Atreyuwu 1 month ago

      Not even close lol

  • @usaid3569
    @usaid3569 4 months ago

    Great video champ

  • @rezadadbin4684
    @rezadadbin4684 4 months ago

    Fucking fabulous

  • @notrito
    @notrito 5 months ago

    If anyone wonders how to concatenate the features if they don't match the size... they crop it.
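What the comment above describes, in code: the original U-Net uses unpadded convolutions, so encoder feature maps are larger than the upsampled decoder maps and must be center-cropped before concatenation. An illustrative sketch (the sizes are made up; `center_crop` is my own helper, not from the paper's code):

```python
import torch

def center_crop(feat, target_hw):
    # crop an (N, C, H, W) feature map to the target (H, W), centred
    _, _, h, w = feat.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[:, :, top:top + th, left:left + tw]

skip = torch.randn(1, 64, 136, 136)   # encoder feature map (larger)
up = torch.randn(1, 64, 104, 104)     # upsampled decoder feature map
merged = torch.cat([center_crop(skip, up.shape[-2:]), up], dim=1)
print(merged.shape)  # torch.Size([1, 128, 104, 104])
```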

  • @ingenuity8886
    @ingenuity8886 5 months ago

    Thank you very much bro...

  • @SarraAissaoui-sp3sm
    @SarraAissaoui-sp3sm 5 months ago

    I clicked on thumb down for wasting one minute of my precious time in the intro. Get to the F point !!

  • @InoceramusGigas
    @InoceramusGigas 5 months ago

    TIGHT TIGHT TIGHT

  • @Lautaro04000
    @Lautaro04000 5 months ago

    nice video, very helpful

  • @shubhamarle96
    @shubhamarle96 5 months ago

    thanks for the video, I am trying to use U-net for anomaly detection in time series and your video gave me the idea.

  • @runjhunsingh2348
    @runjhunsingh2348 5 months ago

    Tried just about everything, but I'm getting a 38% Hamming-score accuracy on my multilabel classification of a 24,000-sample dataset into 26 labels. Please suggest something.

  • @nikhilchouhan1802
    @nikhilchouhan1802 5 months ago

    You might not find my comment since the video is too old, but man, I just want to thank you for this video. I am a student who has always been interested in computer graphics and related fields like game engines, physical rendering, ray tracing, etc., and just didn't get the ML/AI hype everyone was on for the past 2 years. I only ever managed to study ML basics for 2 weeks before I left it for good. But recently I joined a team where my friends were working on CNN-based projects, and that made me learn many of the basics of NNs and DL. This explanation of U-Net seals the deal for me, and I will strive to integrate my two interests into one and hopefully create something I love.

  • @user-mn2bj1hw1vdtfhgh
    @user-mn2bj1hw1vdtfhgh 5 months ago

    Me seeing the video at 1.5x 😂😅

  • @gusromul3356
    @gusromul3356 5 months ago

    cool info, thanks rupert ai