Active (Machine) Learning - Computerphile

  • Published 17 Dec 2024

COMMENTS • 150

  • @gasdive
    @gasdive 5 years ago +21

    This is how you train people. Train them on the basics. Then get them to work closely supervised, then with someone they can ask if they get stuck, and then unsupervised.

  • @plasma06
    @plasma06 5 years ago +241

    Wait, so I'm doing work for Google to improve the AI when I select cars and street signs?

    • @OrangeC7
      @OrangeC7 5 years ago +84

      *You're part of the system*

    • @yoyoyoyoyo1959
      @yoyoyoyoyo1959 5 years ago +27

      If it's free, you're the product...
      wait, that makes no sense.
      I have a dream that one day I'll use accurate quoting.

    • @yorickdewid
      @yorickdewid 5 years ago +33

      How could you not know? They've been doing that for ages

    • @seasong7655
      @seasong7655 5 years ago +27

      This is why I select the wrong images on captchas, to screw up their machine learning 😈

    • @tschumacher
      @tschumacher 5 years ago +5

      That's the idea behind Google's reCaptcha, yes. I mean look at the logo.

  • @clem494949
    @clem494949 5 years ago +71

    I hate it when the captcha asks me about edge cases; I never know if I should include the pole of the traffic lights.

    • @Feuermagier1337
      @Feuermagier1337 5 years ago +1

      @clem494949 No, you shouldn't.

    • @windar2390
      @windar2390 4 years ago

      If you think the AI should recognize the thing only by the edge, then include it. Otherwise it's just a meaningless edge and shouldn't be included.

    • @i-never-look-at-replies-lol
      @i-never-look-at-replies-lol 3 years ago

      I click everything allowable to throw off the algorithms

    • @cmm90871
      @cmm90871 3 years ago

      @@i-never-look-at-replies-lol It may force you to do them more often.

  • @brenesrob
    @brenesrob 5 years ago +61

    Is it possible to get a list of sources, such as academic papers, with each video for further reading? I feel like it would be pretty easy for the professors to just suggest a few papers or resources for introductory purposes.

    • @danielmichelin4832
      @danielmichelin4832 5 years ago +2

      @Daniel G A professor would definitely have an idea of useful papers for a beginner.

  • @fluidice1656
    @fluidice1656 5 years ago +3

    That's exactly what I did. Cooperative Learning is a kind of self-supervised learning but there are potential issues with it when confidence is high in falsely labeled data. There is also a problem with over-fitting that arises from selecting the high confidence training data/labels. Great topic!!

  • @spogs
    @spogs 5 years ago +197

    I can bet 5 frikandelbroodjes this guy is Dutch

    • @GoatzAreEpic
      @GoatzAreEpic 5 years ago +8

      lol, I literally just ate 2 from the Appie

    • @jurgenschuler8389
      @jurgenschuler8389 5 years ago +20

      Can you specify your confidence in this hypothesis?

    • @Sponge1310
      @Sponge1310 5 years ago +2

      Flemish, as far as I could find out :)

    • @-_-.-.._.
      @-_-.-.._. 5 years ago +3

      G E K O L O N I S E E R D ("colonized"), but I thought so too

    • @fritsrits7591
      @fritsrits7591 5 years ago

      @@Sponge1310 Dutch

  • @lord_nn
    @lord_nn 5 years ago +2

    I used to take Michel's Security lectures back in 2013. A very nice guy.

  • @sphereron
    @sphereron 5 years ago +21

    Wish we could get sources for these videos.

  • @kebakent
    @kebakent 5 years ago +7

    Never learned this at the university. Thanks!

  • @valuedhumanoid6574
    @valuedhumanoid6574 5 years ago +2

    I have this voice recognition software that is supposed to learn as you speak and become more efficient as you use it. It's called Dragon NaturallySpeaking. At first I could not tell any improvement, but now that it's been almost a year, it has really fine-tuned itself to my voice. When someone else uses it, it goes berserk until it learns the new voice. Very cool.

  • @brecoldyls
    @brecoldyls 3 years ago +1

    That’s such an intuitive and amazing idea

  • @Vivekagrawal5800
    @Vivekagrawal5800 4 years ago +1

    3:10 When you say the low-confidence data is labelled and goes back to retrain the model so that it gives better accuracy than before:
    do we label all such low-confidence data and feed it back to retrain the model?
    Because if we do, we will not have a low-confidence testing set left to really estimate the improvements.
    Why not take 50% of the low-confidence data as a training set, so that we can measure the actual gains on the remaining low-confidence data?

    • @Vivekagrawal5800
      @Vivekagrawal5800 4 years ago +1

      In fact, how do you even define the confidence of predictions, since the testing data is unlabeled?

  • @UpcycleElectronics
    @UpcycleElectronics 5 years ago +5

    Hey Sean,
    Please consider doing a video about the intersection of control theory and its practical implementations in real-world computing.
    Yesterday I started to research why PID controllers are not used in power supplies. I came across a verbose explanation of this, in which "Type 2" and "Type 3" controllers were mentioned in passing. This sent me down the rabbit hole of the control theory wiki, which was a dead end with too much maths for me to gain an abstract, overview-type understanding.
    I just went through all 5 years of your videos looking for content on the subject (I stacked my watch-later list in the process) but didn't find anything. I've seen a lot of info about PID controllers, but I'd really like to understand what other types of controllers are out there in practice in the computing world.
    ...anyways...
    Thanks for the upload.
    -Jake

  • @muhammadsaqib7355
    @muhammadsaqib7355 2 years ago

    The best explanation ever of AL (active learning).

  • @dewangsingh9324
    @dewangsingh9324 3 years ago

    Terrific video. Loved the explanation. Cleared up a lot of my doubts regarding active learning.

  • @ASLUHLUHC3
    @ASLUHLUHC3 5 years ago +6

    This process basically uses the AI to pick out cases that are least like their annotated training data thus far, which is what the AI would learn the most from having next.
    This provides humans with the best bang for the buck, achieving their desired accuracy with the least annotation required.
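The loop this comment describes can be sketched in a few lines. This is an illustrative sketch only (not the video's code): pool-based active learning with uncertainty sampling, where scikit-learn, the synthetic data, the batch size of 20, and the 10% starting split are all my assumptions.

```python
# Illustrative sketch of pool-based active learning with uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(100))            # start with 10% human-annotated
pool = list(range(100, 1000))         # unlabeled pool (labels hidden in practice)

model = LogisticRegression(max_iter=1000)
for round_ in range(5):               # a few annotation rounds
    model.fit(X[labeled], y[labeled])
    conf = model.predict_proba(X[pool]).max(axis=1)  # top-class probability
    query = np.argsort(conf)[:20]     # indices of the 20 least-confident samples
    chosen = {pool[i] for i in query}
    labeled.extend(chosen)            # the "human" supplies labels for these
    pool = [i for i in pool if i not in chosen]
```

Each round, the annotator only sees the samples the model is least sure about, which is the "best bang for the buck" the comment mentions.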

  • @Zauhd
    @Zauhd 5 years ago +9

    But what happens when the machine is confident in a wrong answer? Is that just a trade-off that will happen occasionally, accepted in exchange for reducing human labeling?

    • @cortster12
      @cortster12 5 years ago

      Then you start all over again until it comes up with the answer you want. The implications, though... politics are going to get weird in the next few decades.

    • @wassollderscheiss33
      @wassollderscheiss33 5 years ago +1

      @@cortster12 Why politics?

    • @cortster12
      @cortster12 5 years ago

      @@wassollderscheiss33 Geopolitics, I should specify. Relations between governments fully run by AI and those not will be... interesting.

    • @lajya01
      @lajya01 5 years ago

      I worked with character recognition software 10 years ago. That was the main issue with it: the engine was 100% confident but wrong. We ended up having employees validate entries and basically "press enter" 90% of the time. Tedious work that caused more errors than traditional typing.

    • @Oshyrath
      @Oshyrath 5 years ago

      Imagine if there was a giant explosion and everyone 100 ft from the center of the blast vaporized. It's safe to assume everyone 5 ft from the center died as well. That's the essence of active learning: making assumptions that seem intuitive.

  • @AYabdall
    @AYabdall 5 years ago +6

    My problem with active learning is:
    what if the data the machine is confident about is wrong?
    For example: you're trying to train one to predict where faces are in a photo. You train it on 10 percent of the data you have. Then it starts confidently predicting 5 out of 10 faces. But out of those 10 faces, 2 are not actually faces, yet the machine is pretty sure they are. With the method suggested you don't check this probability; you just check whether it's confident or not on new data. But what about the accuracy? What if it is confident about something that isn't correct?

    • @nelsyeung
      @nelsyeung 5 years ago +8

      That problem will be resolved after some number of iterations. Think of it this way: the first iteration might generate wrong confidence for some percentage of images, but the human annotates some new data that the machine isn't so confident about. These new data will help the machine generate better confidence estimates in the next iterations, such that it might no longer be so confident about the previous labels. The human then annotates more images for the next iterations. Overall, we should still annotate fewer images than annotating 100% of the data. There might still be noise, but that's human as well, as we might label some awkward images wrong.

    • @AYabdall
      @AYabdall 5 years ago +2

      @@nelsyeung Hmm, I see. Thank you for the reply :)

  • @guangruli4486
    @guangruli4486 3 years ago

    A pleasure to watch this video.

  • @user-zz6fk8bc8u
    @user-zz6fk8bc8u 5 years ago +29

    This would miss all the cases that are wrong but the machine is pretty confident about.

    • @ROFLARILO
      @ROFLARILO 5 years ago +11

      In practice you would probably sample a little bit from the confident results as well.

    • @JimGiant
      @JimGiant 5 years ago +6

      It would, but it would still be a more efficient use of human time to focus most of their attention on the areas of least confidence. You do the same thing when teaching people: if you've got a class of kids working on a project, you'd ask them to come to you when they're unsure, because you cannot supervise everyone at every step.

    • @ROFLARILO
      @ROFLARILO 5 years ago +4

      Jim Giant The problem is that confidence doesn't always mean it's correct, so you need to get rid of the high-confidence wrong guesses, especially before you start accepting its predictions as truth.

    • @JimGiant
      @JimGiant 5 years ago

      Which is why I said focus *most* of their attention on areas of least confidence.

    • @ROFLARILO
      @ROFLARILO 5 years ago

      Jim Giant Ah, I guess you were replying to him; I thought you were disagreeing with my comment.

  • @prithviprakash1110
    @prithviprakash1110 3 years ago +1

    I have to say, this is a very under-researched area of ML/AI. The problem is hard enough for classification tasks, and gets even harder with semantic segmentation, where every pixel has an associated probability. Hopefully we see some improvement here over the years.

  • @robmckennie4203
    @robmckennie4203 5 years ago +5

    It seems unnecessarily confusing that when he says "labels", the video shows labels on the kinds of data ("audio", "images"); those aren't the kinds of labels he's talking about at all.

    • @joebender9052
      @joebender9052 5 years ago +2

      There are so many words that mean so many things in machine learning, like "sample". It can get very confusing. I sometimes come up with my own names for things so I don't confuse myself.

    • @robmckennie4203
      @robmckennie4203 5 years ago

      @@joebender9052 I thought "annotations" was a much better term.

    • @yugen3968
      @yugen3968 3 years ago

      Then what kind of labels is he talking about? Can you elucidate a bit?

  • @jaromeleslie2747
    @jaromeleslie2747 3 years ago

    Can you discuss how this approach may be adapted for regression problems where the goal is to predict a continuous target variable?

  • @amanjain6680
    @amanjain6680 3 years ago

    How does the captcha know I labelled it correctly? Couldn't I mislabel it and get away with it?

  • @michaelcharlesthearchangel
    @michaelcharlesthearchangel 5 years ago

    Now apply active machine learning to generative neural networks to improve conversational neural networks.

  • @karlkastor
    @karlkastor 5 years ago +1

    A similar approach: have a clustering algorithm find clusters, then have humans label the clusters. Then train a machine learner on those labels.
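A minimal sketch of this cluster-then-label idea (my illustration, assuming scikit-learn; the cluster names and the synthetic blob data are hypothetical): cluster the unlabeled data, have a human name each cluster, and propagate the name to every point in it.

```python
# Sketch: cluster unlabeled data, then have a human name each cluster.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Hypothetical human-supplied names, one per cluster id
cluster_names = {0: "cats", 1: "dogs", 2: "birds"}
labels = [cluster_names[c] for c in km.labels_]   # one label per sample
```

The human does 3 annotations instead of 300; the trade-off is that every point inherits its cluster's name, including points the clustering got wrong.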

  • @bahaz.4562
    @bahaz.4562 3 years ago

    Great and simple explanation! Thank you :D

  • @mydocsoft
    @mydocsoft 4 years ago

    Very well explained. Thanks a lot!

  • @dominicmutzhas6002
    @dominicmutzhas6002 4 years ago

    Wait, but the captcha data has to be labelled already, doesn't it? How else would it know if you made a mistake?

  • @lorenzoleongutierrez7927
    @lorenzoleongutierrez7927 8 months ago

    Great video!

  • @hrnekbezucha
    @hrnekbezucha 5 years ago +6

    Remember when Google reCAPTCHA was supposed to work with one click? Yeah..

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 5 years ago +1

    Wouldn't the #1 priority have to be ditching supervised learning for unsupervised algorithms?

    • @KilgoreTroutAsf
      @KilgoreTroutAsf 5 years ago

      @@tommolldev
      Don't many classification algorithms already "discover" the classes when compressing/clustering the data (autoencoders, etc.)? Why is it so important to annotate a million images before feeding them to an algorithm, rather than annotating the (relatively few) classes it discovers during or after training?

  • @adamweishaupt3733
    @adamweishaupt3733 5 years ago

    I don't understand how they use reCAPTCHA to evaluate humans while also training machines. How do they know whether the user is labelling things correctly if they weren't labelled in the first place? What do they compare the user's answers against to see if they're right?

    • @keith_cancel
      @keith_cancel 5 years ago

      reCAPTCHA also includes data that has already been annotated, alongside un-annotated data. They also look at what most people said, e.g. that this was a fire hydrant or a crosswalk. If 80% of people say it's a crosswalk and you say it's not, they can probably assume you're wrong.
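The crowd-agreement idea in this reply can be sketched as a toy vote aggregator. This is my illustration, not Google's actual algorithm; the `aggregate` helper, the 80% threshold, and the vote strings are all hypothetical.

```python
# Toy sketch of crowd agreement: accept a label once enough voters agree.
from collections import Counter

def aggregate(votes, threshold=0.8):
    """Return the majority label if it reaches the threshold share, else None."""
    label, n = Counter(votes).most_common(1)[0]
    return label if n / len(votes) >= threshold else None

accepted = aggregate(["crosswalk"] * 8 + ["not crosswalk"] * 2)  # 80% agree
undecided = aggregate(["crosswalk", "not crosswalk"])            # split vote
```

With enough voters, an individual who deliberately answers wrong (as some commenters above claim to do) is simply outvoted.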

  • @hbloops
    @hbloops 4 years ago +1

    Why would I use machine-labelled data to train the AI? If it already has such high confidence on those images, it seems counterintuitive to drown the database in uninteresting examples.

  • @sabelch
    @sabelch 2 years ago

    How is confidence computed?
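One common convention answers this question, though it is an assumption here since the video doesn't pin it down: pass the model's raw scores through a softmax and use the top class probability, or the margin between the top two classes, as the confidence score.

```python
# Confidence from raw model scores: softmax, then top probability or margin.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw outputs for 3 classes
p = softmax(logits)
confidence = p.max()                  # top-class probability
top2 = np.sort(p)[-2:]
margin = top2[1] - top2[0]            # margin-based alternative
```

Low `confidence` (or a small `margin`) marks the samples that get routed to a human annotator.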

  • @AyoubFrihaoui
    @AyoubFrihaoui 5 years ago +2

    Thank you

  • @davidvc4560
    @davidvc4560 10 months ago +1

    What if the machine learner has high confidence but the prediction is wrong?

  • @sandeepvk
    @sandeepvk 5 years ago +2

    How do you manage "labelling bias"?

  • @TheAstronomyDude
    @TheAstronomyDude 5 years ago +2

    Tesla has 20,000 employees in Bangladesh drawing little squares.

  • @herq6
    @herq6 5 years ago +2

    Nice old-school phone

  • @ivandrofly
    @ivandrofly 5 years ago +1

    thank you!

  • @evanbelcher
    @evanbelcher 5 years ago +1

    I like this guy

  • @random4573
    @random4573 5 years ago +1

    Intel just released software to ease that annotation process

  • @egedemir4753
    @egedemir4753 5 years ago

    I could not understand what sets this apart from regular machine learning, where you have a training set (20%, labeled) and another set you want to work on (80%, unlabeled). Now you label 10% of the 20%, which is like working with a 2% training set and doing supervised learning on the remaining 18%. What am I missing?

    • @SillyMakesVids
      @SillyMakesVids 5 years ago +2

      You're describing cross-validation. Both the 20% and the 80% are labelled, except the machine isn't privy to the 80%'s labels. Active learning is about labelling your data more efficiently.

  • @domminney
    @domminney 5 years ago

    Nice example photo of faces :)

  • @techwithmohitkr
    @techwithmohitkr 4 years ago

    This is True True!!

  • @ericsbuds
    @ericsbuds 5 years ago

    I wonder if creating AI will give us any insight into how our brains 'compute' information.

  • @asdawece
    @asdawece 5 years ago

    I intentionally misguide Google's AI. Whatever images you click, after a while they let you in.

  • @FuZZbaLLbee
    @FuZZbaLLbee 5 years ago

    Isn't this what Picasa did for face recognition?

  • @sergewiltshire8743
    @sergewiltshire8743 5 years ago

    Select all the fire hydrants

  • @Gooberslot
    @Gooberslot 5 years ago +3

    But what if the algorithm has a high confidence but is wrong?

  • @ml-simplified
    @ml-simplified 4 years ago

    Was it necessary to draw those (comparatively) complex flowcharts to explain a very simple topic? :/

  • @aaryan3461
    @aaryan3461 5 years ago

    Why not do it like this?
    Take the data (A) and divide it into two sets (B and C). Then do the 10% annotation run on set B; it will take much less time, since its data is half of A. When B is complete you have 50% of the data annotated, and the ML learner would take no time to label C as well!
    We could keep dividing into 4, 8, 16... chunks and see the results! What do you guys think?

  • @solhsa
    @solhsa 5 years ago

    Oh, that's what recaptcha was about.

  • @jeremy3046
    @jeremy3046 5 years ago +3

    If the machine is labeling data with high confidence, it seems there are only two possibilities:
    (1) It already knows how to interpret that kind of data, so training on it further is useless.
    (2) (Far less likely.) It made a horrible mistake, and giving it that training data again will reinforce that mistake even more!
    Seems to me like machine-labeled data is pretty useless. (Except that of course data labeled by an advanced machine could be given to a less advanced one. This could be handy for testing new learning methods, but not for any state-of-the-art AI.)

  • @Wook1333
    @Wook1333 5 years ago +2

    Fifth?

  • @lukethegiant5193
    @lukethegiant5193 5 years ago +2

    He sounds Dutch

  • @DrewTNaylor
    @DrewTNaylor 5 years ago +2

    It's cool that this is becoming an actual thing! Hopefully the machine doesn't try to deceive or lie to the administrator, though.

    • @sabelch
      @sabelch 2 years ago

      What does "deceive or lie" mean when "the machine" is a million numbers multiplied and added together?

    • @DrewTNaylor
      @DrewTNaylor 2 years ago

      @@sabelch It would have to become an artificial intelligence for that.

  • @dimitriosmallios5941
    @dimitriosmallios5941 5 years ago

    I guess the 10% of the data should be balanced, right?

  • @ABNaseer1122
    @ABNaseer1122 5 years ago +2

    So Cooperative Learning is 100% automatic?

    • @user-xn4yu5rn9q
      @user-xn4yu5rn9q 5 years ago +1

      What do you mean automatic?

    • @ruroruro
      @ruroruro 5 years ago +3

      Did you watch the video?
      The first 10% still needs to be annotated.
      Also, in reality, all data that is added back into the system still needs to be checked by a human.
      The point of this approach is that checking labels is much faster than actually labeling the data.

    • @ABNaseer1122
      @ABNaseer1122 5 years ago +2

      @ruroruro Did you LISTEN to the video? That 10% is for active learning; look at the flowchart, son: if the machine is confident, it automatically labels and updates the database rather than asking a human.
      Yeah, I did watch the video, haha

    • @zoranhacker
      @zoranhacker 5 years ago

      @@ABNaseer1122 Ok sir, thank you for the explanation

    • @AlthenaLuna
      @AlthenaLuna 5 years ago +1

      That's how it seems... which just raises my concern about high-confidence false positives being fed back into the process.

  • @odorlessflavorless
    @odorlessflavorless 5 years ago

    Cool sweater :)

  • @nikanj
    @nikanj 5 years ago

    Let's face it, researchers aren't labelling their own data; they're using cheap sources of labour such as Mechanical Turk, click farms and grad students.

  • @sameerkhadka159
    @sameerkhadka159 5 years ago

    Hey, can you give me guidance on how I can become a hacker? Because I am from a small country.

  • @omkarCHALKE1992
    @omkarCHALKE1992 5 years ago +1

    nice

  • @RameenFallschirmjager
    @RameenFallschirmjager 5 years ago +1

    Seriously, what's wrong with this channel?! None of its videos have auto-captions!

  • @seifeddine4809
    @seifeddine4809 5 years ago

    Can you translate your videos into Arabic?

  • @yashind243
    @yashind243 5 years ago

    Nosql

  • @Nick08352
    @Nick08352 5 years ago +5

    Guess I'm 700th after 5 minutes :D

  • @sandeepvk
    @sandeepvk 5 years ago

    I think machine learning is overrated. I say this because, as you mentioned, ML is useless without a huge data set. Backpropagation is underpinned by data, and that data is not available to us the way Google, Apple, FB or MS have it. I don't think this is the world-changing technology it is made out to be.

    • @triton62674
      @triton62674 5 years ago

      It will be once the data becomes available, similar to how the internet became a freely available technology, which was certainly revolutionary.

    • @nocodenoblunder6672
      @nocodenoblunder6672 3 months ago

      I wonder if you still stand by that point.

  • @liuby33
    @liuby33 5 years ago +1

    First time I've been so early lol

  • @thefrontendfiend
    @thefrontendfiend 5 years ago +1

    First

  • @enian82
    @enian82 5 years ago +4

    second :P

  • @tusharpandey6584
    @tusharpandey6584 5 years ago

    lol

  • @DantalionNl
    @DantalionNl 5 years ago

    This accent! Our Louis van Gaal is nothing compared to it.

    • @DantalionNl
      @DantalionNl 5 years ago +1

      At least his grammar and sentence constructions are adequate.

  • @Bit-while_going
    @Bit-while_going 5 years ago

    Even so, humans have neural plasticity, so that if a small amount of damage to neurons occurs, it can trigger creativity as the brain is repaired. Bashing any computer with a hammer doesn't solve anything, so they don't learn the same way.

  • @hirakmondal6174
    @hirakmondal6174 5 years ago

    Next time, please mute that pen-on-paper hissing sound!

  • @user-xn4yu5rn9q
    @user-xn4yu5rn9q 5 years ago +1

    Thank you

  • @beargryllsfan007
    @beargryllsfan007 5 years ago +1

    First