Learning To See [Part 9: Bias Variance Throwdown]

  • Published 12 Jun 2024
  • In this series, we'll explore the complex landscape of machine learning and artificial intelligence through one example from the field of computer vision: using a decision tree to count the number of fingers in an image. It's gonna be crazy.
    Supporting Code: github.com/stephencwelch/Lear...
    welchlabs.com
    @welchlabs
  • Science & Technology

COMMENTS • 131

  • @MrGeertjuhh
    @MrGeertjuhh 7 years ago +75

    And now to wait 2 weeks for the next one... Worth it!

    • @4620971
      @4620971 6 years ago

      Geertiebear I

    • @cem_kaya
      @cem_kaya 2 years ago +1

      it was definitely worth it.

  • @Plaswin
    @Plaswin 7 years ago +1

    Man, you are really gifted. This is probably one of the best things I've seen on YouTube. I wish I had had teachers like you at the university.

  • @allmhuran
    @allmhuran 7 years ago +12

    This video demonstrates how we use some ideas about human learning to come up with better machine learning... in turn, maybe we can take some of our rigorous analysis of machine learning and apply it back to human learning!
    I was considering the bias/variance scale, and wondering whether it applies to people. That is to say, as people have more and more experiences (in other words, get older), do they tend to move in one or the other direction on this scale?
    For humans, there isn't a strong distinction between "training" and "testing" data... all data is in a sense both, as concepts can be continuously refined. It seems apparent that the younger someone is, the more the process of "refining" actually happens. But as we get older we become confident in our predictions, and the process of refining _seems_ to become unnecessary - our classifications are working "well enough".
    In everyday language, we would probably call this "bias". But it's not clear whether that's the same bias as described in the context of machine learning. After all, does the "refining" of our human "rules" imply more assumptions and simplifications, or does it imply the incorporation of greater complexity from the integration of a wider variety of examples accumulated over time? As we get older, do we see the world as simpler, or more complex?

  • @TheRowanFerrabee
    @TheRowanFerrabee 7 years ago +16

    Just want to say that I absolutely love and look forward to your videos. Your series is totally motivating, informative, and thought-inspiring. I think you'll start moving along the bias/variance spectrum and the models will begin to overfit before they start to perform well enough on the test data. This is because the rule that we're searching for isn't well defined using raw data, and you'll have a good segue into introducing feature extraction. Either that or some rule like "¬left & middle & ¬right" will perform well enough for our application.

  • @MisterLoukass
    @MisterLoukass 7 years ago +1

    Great job! I like seeing the work you put into these videos. They are really didactic and well made!

  • @teacul
    @teacul 6 years ago

    This is the coolest episode so far. The spectrum between the baseline and memorization is really mindblowing and cool

  • @chrn22
    @chrn22 7 years ago

    the person who made the music and times it is a genius.... gives a very adventurous vibe

  • @mikip3242
    @mikip3242 5 years ago

    One might notice that the 1-pixel rule, which steps away from the absolute generalization of the "baseline algorithm" toward complexity, is really teaching the machine to focus its attention on certain parts of the image. The vast majority of fingers shown in the training examples (if not all) are located in the center of the image (or at least not at the borders and corners). We are learning to pay more attention to things in the middle of the camera's field of view because that is where the fingers usually are. The algorithm is still so simple that it has not learned how to identify fingers, only how to identify anything in the region one wants the machine to pay attention to. If you put an apple in front of the camera, it will also think it is a finger, since it is in the area where the machine knows it has to watch for fingers.

  • @sovata
    @sovata 7 years ago +2

    I love these videos! I look forward to each episode more than I look forward to an episode of Game Of Thrones! Thanks :)

  • @abdelkaderguellati5164
    @abdelkaderguellati5164 7 years ago

    my thirsty mind can't get enough of these videos!!!
    I hope you make longer ones or more of them;

  • @Ben-ds3cm
    @Ben-ds3cm 7 years ago +1

    Absolutely amazing series!!!!

  • @whatthefunction9140
    @whatthefunction9140 7 years ago +6

    [Part 300: FINALLY GETS TO THE POINT]

  • @realGBx64
    @realGBx64 7 years ago +1

    Great series! Thank you!

  • @mustafazakiassagaf1757
    @mustafazakiassagaf1757 7 years ago +20

    Cool video, makes me want to learn ML

    • @duckymomo7935
      @duckymomo7935 7 years ago +1

      mustafa zaki
      I had a horrible professor, ugh

    • @justinward3679
      @justinward3679 7 years ago +1

      mustafa zaki Coursera had a course that I took. It covered regression analysis, classifiers, overfitting etc.

    • @me5ng3
      @me5ng3 7 years ago +1

      +Justin Ward is the course worth the time? I mean, I'm still in high school and would love to study ML in the future. I would also like to start earlier and try it for myself. Do you recommend the course for a newbie in the world of Machine Learning?

    • @justinward3679
      @justinward3679 7 years ago +2

      Brian You can take it at your own pace. Just be aware that calculus/statistics/python programming knowledge is assumed. Topics like gradients and bell curves are brought up casually.

    • @TtttTt-ub5xb
      @TtttTt-ub5xb 6 years ago +1

      Hi

  • @ki-ka
    @ki-ka 6 years ago

    Excellent points!

  • @arpyzero
    @arpyzero 7 years ago +5

    Let's see here...
    So, in the context of this video, a finger will be one of two things: either a strip of black, then white, then black (type 1), or a peninsula of white surrounded by black (type 2), except on the bottom. There's some fringe cases there, but these types form the majority.
    Now, you only really need three pixels to check the validity of type 1, and an additional one to check the validity of type 2. Assuming that these two types form up at least 3/4ths of the data, and our rules are good enough to recognize 9/10ths of our "expected fingers", we should be good.
    So, I'm expecting C to be the answer, but I'm giving solid odds to B as well. Will be quite surprised if it is A or D.
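The "type 1" check described above (black, then white, then black) can be sketched as a three-pixel rule. This is only an illustration under stated assumptions: the function name `type1_rule`, the chosen row and column offsets, and the toy 9x9 binary patches are all hypothetical (9x9 is assumed because other commenters mention 81 pixels), and none of it comes from the video's supporting code.

```python
def type1_rule(patch, row=4, left=2, mid=4, right=6):
    """Hypothetical three-pixel rule: True when the middle pixel is
    white (1) and the pixels to its left and right on the same row
    are black (0)."""
    return (patch[row][mid] == 1
            and patch[row][left] == 0
            and patch[row][right] == 0)

# Toy 9x9 patch: a vertical white strip three pixels wide in the center.
finger = [[1 if 3 <= col <= 5 else 0 for col in range(9)] for _ in range(9)]

# Toy 9x9 patch that is entirely white (for example, part of the palm).
palm = [[1] * 9 for _ in range(9)]

print(type1_rule(finger))  # True
print(type1_rule(palm))    # False
```

A "type 2" check would need one more pixel above the strip to confirm the white region ends, which is where the fourth pixel in the comment comes from.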

  • @TheTruthSentMe
    @TheTruthSentMe 7 years ago +16

    2:10 P(...) > 1. Ouch.

    • @TheDGomezzi
      @TheDGomezzi 6 years ago +4

      It's cuz he did the math wrong on the previous video, unfortunately :(

  • @SamirMishra6174
    @SamirMishra6174 7 years ago +30

    You rock

  • @simonero2000
    @simonero2000 7 years ago

    your videos are awesome. thanks

  • @kandurirajkumar
    @kandurirajkumar 7 years ago +1

    boss, you're just awesome

  • @fejfo6559
    @fejfo6559 7 years ago +3

    4 pixels should do it: fairly straight and not too much to the left or right

  • @paedrufernando2351
    @paedrufernando2351 1 year ago

    Bias in this video's context means "assumptions about our data", i.e. we can consider only one pixel for our rule and ignore the other pixels (like ignoring whether a dog has short or long fur and concentrating only on features like four legs to understand that it is a dog; chapter 8 explains this). The other end of the spectrum is the memorization rule.

  • @alexsere3061
    @alexsere3061 7 years ago +6

    I have a strategy: reduce hands to lines and blobs. Lines are fingers, and you count the number of separate lines.

  • @richarda1630
    @richarda1630 3 years ago

    Is this the beginning of the concept of gradient descent? you guys are amazing, i

  • @michaelleue7594
    @michaelleue7594 7 years ago

    You just rigorously proved Occam's Razor!

  • @tsunamio7750
    @tsunamio7750 7 years ago +2

    6:40 Wait.. you use trainY but you only declare trainX? You can use an undeclared var?!
    I just started Python today but this really looks suspicious to me... Do I lack training data?

  • @mohamednabil9146
    @mohamednabil9146 7 years ago +1

    These videos are so good... up until you say "Next time" ... I want the next video now :"D

  • @Mrjarnould
    @Mrjarnould 7 years ago

    I want to see graphs! Plot the number of pixels in our rule on the x-axis, ranging from 1 to 81, versus on the y-axis the different performances. I wonder what that would look like? I'm thinking matplotlib

  • @velocityra
    @velocityra 7 years ago +2

    My guess is *3 pixels*.
    Two negative on the left and right, one positive in the center. 65% recall and accuracy is pretty low anyway.

  • @afzalsayed96
    @afzalsayed96 7 years ago

    please tell me which music did you use. I love it

  • @NamelessNr1
    @NamelessNr1 7 years ago

    Does this variance happen to have anything to do with statistical variance? Feels like it should given how much the subjects are connected but at the same time I can't imagine how it would work.

  • @TristanBomber
    @TristanBomber 6 years ago +1

    I'm going to make a complete guess here as to your last question - 9 pixels, the square root of 81, which is logarithmically halfway between 1 pixel (the simplest nontrivial rule) and 81 (the most complex possible rule).
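The "logarithmically halfway" step in the guess above can be written out. This is only the arithmetic behind the guess (a geometric mean of the two endpoints), not a claim about the correct answer:

```latex
\log n = \tfrac{1}{2}\left(\log 1 + \log 81\right) = \tfrac{1}{2}\log 81
\quad\Longrightarrow\quad
n = \sqrt{1 \cdot 81} = 9
```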

  • @ankenowottne1723
    @ankenowottne1723 7 years ago

    Thanks a lot for this inspiring series of videos ..
    I have one question, I can't really grasp - why (starting from 2.50 -..) it is said, that the base line approach does generalize well (compared to the memorization approach) - Accuracy is the same in both (0.93) and Recall is even higher in the latter (though still terribly low of course) ---- and precision does not exist at all in the baseline approach (as it misses every single ex.)

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago

      Good question - we can't really measure precision for the baseline approach, but its accuracy and recall are similar for our training and testing sets, meaning that it generalized beyond the training set.

    • @ankenowottne1723
      @ankenowottne1723 7 years ago

      Thanks a lot for your answer!
      So, that's how you would »measure« the quality of generalization - that accuracy and recall are about the same (not dropping) in training and testing data? - and you would even say it generalizes better in this case, although in the memorization approach it's slightly better, but worse than in the memorization training set?
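A small sketch may help show why precision cannot be measured for the baseline approach discussed in this thread. Precision is TP / (TP + FP), and a rule that never predicts "finger" has TP = FP = 0, so the ratio is undefined. The labels below are made up for illustration and are not the series' data; the helper name `metrics` is hypothetical.

```python
def metrics(y_true, y_pred):
    """Return (accuracy, recall, precision); a metric is None when its
    denominator is zero and the ratio is therefore undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return accuracy, recall, precision

# Toy labels (assumed, not real data): 1 = finger, 0 = not finger.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Baseline rule: always predict "not finger".
baseline = [0] * len(y_true)

print(metrics(y_true, baseline))  # (0.8, 0.0, None)
```

Under this reading, one way to judge generalization is to compute these numbers separately on the training and testing sets and compare them: a small gap suggests the rule generalizes, even when the numbers themselves are poor.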

  • @DaysNightsTeam
    @DaysNightsTeam 7 years ago

    Increase entropy, since the universe is driven by it:
    Pseudorandomly distort training data
    Probabilistically compare with testing data
    Increase chance of detecting as finger (while training) by checking how similar two sets of training and testing data are (how many pixels match).
    I just wrote down some nonsense that came to my mind

  • @ryanmurray5973
    @ryanmurray5973 7 years ago

    5 seems about right.

  • @ThePiro246
    @ThePiro246 7 years ago

    In the very first video of this series while testing the decisiontree, 3 pixels were indicated blue on the grid. So my guess is 3.

  • @frisosmit8920
    @frisosmit8920 7 years ago +1

    I think 3 pixels. 1 in the middle to verify there is a finger and 1 on each side to verify it isn't a hand.

  • @vedant6633
    @vedant6633 4 years ago

    How do you define simple?

  • @theseekerwhoseeks8947
    @theseekerwhoseeks8947 7 years ago

    I have a question, unrelated to this video: What is a p-adic number?

  • @karltraunmuller7048
    @karltraunmuller7048 7 years ago +6

    All of this is based on the core assumption that operating on pixels is a good strategy to begin with. I think feature detection/recognition should be based on a more abstract representation of the images.

    • @Tracy_AC
      @Tracy_AC 7 years ago +3

      Computers can only "see" in pixels. Whatever abstraction you try to enforce will ultimately be reduced to pixels.

    • @karltraunmuller7048
      @karltraunmuller7048 7 years ago

      The input vector to a classifier can be about anything, be it image or time series samples, Fourier or wavelet coefficients, coefficients of polynomials found in edge detection and fitting, statistical properties, you name it. The question is which representation is most effective for the task at hand. For the finger counting problem, the absolute position or orientation of the hand in the image is irrelevant, for example, so this information should not be reflected in the classifier input. This is what I mean by abstraction: choosing an appropriate representation of the raw data.

    • @comelypepper
      @comelypepper 7 years ago

      What abstract representation would work here?

    • @karltraunmuller7048
      @karltraunmuller7048 7 years ago +1

      But you don't have to directly operate on pixels for the finger-counting problem. If you represent the information contained in the image in some other form, you could operate on that. Take JPEG as an example: a JPEG file does not contain raw pixel data, it contains Fourier coefficients of 8x8 blocks sampled from the image. You could use this (lossy) representation as the input to a finger-counting classifier.

    • @karltraunmuller7048
      @karltraunmuller7048 7 years ago

      Moses Won Maybe something that makes use of knowledge about the structure of the feature we're trying to recognize. Like, fingers are attached to the hand (there are no gaps between fingers and the hand they belong to), they have a certain spacing between them, and a certain length relative to the hand, etc. And they can only take on certain forms, somewhere between a straight line and a circle. That's why I was mentioning polynomials (e.g. as used in TrueType fonts for describing curved segments).

  • @flaviomartinelli1803
    @flaviomartinelli1803 7 years ago +6

    Lààààst time...

  • @dannykusuma2431
    @dannykusuma2431 5 years ago

    does this mean that it is impossible to learn perfectly the universe?

  • @MrCk0212
    @MrCk0212 7 years ago

    You mentioned g4 belongs to an enormous class of rules. How do we define the class?

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago

      I was thinking of it as all rules we can form from ands, ors, and nots.

    • @MrCk0212
      @MrCk0212 7 years ago

      Something is still not very clear to me. In Part 8, you define g1(x)=x1, g4(x)=(x1 and not x2 and x3 and not x4) or ... . And you mentioned there are 8 functions like g1(x) and 2^16 functions like g4(x). Obviously, you have defined two sets of functions, but I don't quite understand what the definition of these sets is.
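One possible reading of the two counts in this thread, offered as an assumption rather than the video's stated definition: with 4 binary pixels, a rule "like g1" is a single pixel or its negation (2 x 4 = 8 choices), while a rule "like g4" is any function buildable from ands, ors, and nots, which is the same as any truth table over the 2^4 = 16 possible inputs (2^16 choices). The function names below are hypothetical.

```python
from itertools import product

def count_single_literal_rules(n_inputs):
    """Rules of the form x_i or (not x_i): two per input."""
    return 2 * n_inputs

def count_boolean_functions(n_inputs):
    """Every rule built from and/or/not corresponds to one truth table.
    The table has one row per input combination, and each row can
    independently output 0 or 1."""
    rows = list(product([0, 1], repeat=n_inputs))  # 2**n_inputs rows
    return 2 ** len(rows)

print(count_single_literal_rules(4))  # 8
print(count_boolean_functions(4))     # 65536, i.e. 2**16
```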

  • @atti1120
    @atti1120 2 years ago

    name of music piece?

  • @otesunki
    @otesunki 6 years ago

    A 4x4 grid is good

  • @_trupples
    @_trupples 7 years ago

    the question sounds an awful lot like the birthday paradox. I'm guessing 4 rules but don't have a clue as to what a pixel count to precision formula would look like

    • @ZonkoKongo
      @ZonkoKongo 7 years ago

      had the same thought with the bd paradox.

  • @Tumbolisu
    @Tumbolisu 7 years ago

    Maybe about 9 pixels should be enough? Possibly less.

  • @chriskarampa
    @chriskarampa 7 years ago

    What IDE is that?

  • @theartistflores
    @theartistflores 7 years ago

    What was your college major?

    • @theartistflores
      @theartistflores 7 years ago +1

      I'd say computer science with some math classes?

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago +1

      I wish I had done more CS and Math. I was an EE major.

  • @LycanoxYT
    @LycanoxYT 7 years ago

    i think 6 or 7 pixels :)

  • @evyatarbaranga5624
    @evyatarbaranga5624 7 years ago +1

    I guess c)4 pix

  • @jacobkantor3886
    @jacobkantor3886 7 years ago

    >5 seems obvious which makes me doubt it

  • @non-pe8xn
    @non-pe8xn 7 years ago

    me: mmmmmmmmmmmmMORE

  • @hemapunyamoorty6020
    @hemapunyamoorty6020 7 years ago

    Why don't you make videos on c++

    • @ThePiro246
      @ThePiro246 7 years ago +2

      Python is better suited to machine learning

  • @erikziak1249
    @erikziak1249 7 years ago

    My guess is 6 or 7.

  • @haakonpad
    @haakonpad 7 years ago

    I'm guessing 12 pixels

  • @xunglam9530
    @xunglam9530 7 years ago

    My guess is 12 pixels.

  • @mnkntmn6
    @mnkntmn6 7 years ago +1

    4 pixels because you need 4 pixels to draw two lines

  • @SuitcaseGradient
    @SuitcaseGradient 7 years ago +1

    2 pixels

  • @ultrio325
    @ultrio325 3 years ago

    Ah yes, based and bluepilled, the fundamentals of computer science

  • @Squidward1314
    @Squidward1314 7 years ago

    @ 1:36 calling it 'astronomical' is giving way too much credit to astronomy! ;-)

  • @Kratax
    @Kratax 7 years ago +1

    Last time... I didn't learn anything. You have wrong rules. A finger is a digit sticking out of a hand. Not a bunch of pixels.

    • @tisajokt7676
      @tisajokt7676 7 years ago

      What would be your right rules, then?

    • @VulpeculaJoy
      @VulpeculaJoy 7 years ago

      Yeah, and you are an idiot with a keyboard and not a person that uses its intellect to comprehend the complexity of computer vision and machine learning.

    • @tisajokt7676
      @tisajokt7676 7 years ago +4

      MrBaronmoll Jeez, hostility not required.

    • @Kratax
      @Kratax 7 years ago

      I have explained it before. Real intelligence requires understanding of what you are looking at. That means you get the best results by figuring out what there really is. For example, if there is an image of a motorcycle, you don't compare it to other pictures of motorcycles. You let artificial intelligence analyze the picture. For example, it might notice that there are two wheels, so it probably isn't a llama... A motorcycle is already on the list of probable candidates for the right choice, and a llama isn't. If there is a woman in front of the motorcycle, the AI doesn't get baffled. It can determine that the metallic object is probably one piece, even though it is behind this other skinny thing. The visual system needs to be good enough to see different materials, colors, and patterns, though. But why not. So, the AI continues like this, figures out parts of the object in front (looks like a lot of naked skin, has two leg-looking things, two arm-looking things, long hair-looking things, curvy, etc...) and comes up with the correct answer. A motorcycle and a woman in front of it. The AI also sees that the motorcycle is on a road, that there is a gas station far behind in the back, and that the road continues to a mountainous terrain, behind which the sun seems to be setting. The AI hops on to the motorcycle, grabs the woman along, and rides to new adventures. How is this possible? Because the AI also developed itself a physical body and went to the real world to look for a similar setup.

    • @VulpeculaJoy
      @VulpeculaJoy 7 years ago +2

      Yep, you just provided me with the best proof I could hope for that you, sir are full of bullshit.