Learning to See [Part 8: More Assumptions...Fewer Problems?]

  • Published 2 Jun 2024
  • In this series, we'll explore the complex landscape of machine learning and artificial intelligence through one example from the field of computer vision: using a decision tree to count the number of fingers in an image. It's gonna be crazy.
    Supporting Code: github.com/stephencwelch/Lear...
    welchlabs.com
    @welchlabs
  • Science & Technology

COMMENTS • 134

  • @WelchLabsVideo
    @WelchLabsVideo  5 years ago +59

    Correction at 10:30. Thank you to Jendrik Weise, fejfo's games, and Andrew Kay for pointing this out. I mistakenly multiplied the probability of randomly selecting 5 correctly labeled examples from a set of 16 total examples, with the remaining 11 examples incorrectly labeled, p = 2/10,000, by the number of trials, 65,536, to compute the probability of this event happening one or more times. That is, I computed P("getting lucky" one or more times) = p*n. I based this on the common formula for adding the probabilities of mutually exclusive events to compute the overall probability of one or the other event occurring. As pointed out by fejfo's games, this approximation is reasonable for the n=16 case, but does not work for n=65,536.
    Instead, we need to use the binomial distribution. In thinking through this again, it was helpful for me to think of this as the probability of tossing one or more heads in n=65,536 trials of a "bent coin" where P(heads) = 2/10,000 = p. As correctly pointed out by Jendrik Weise, fejfo's games, and Andrew Kay, this probability is: 1 - (1-p)^n = 0.9999979721. The conclusion is the same: "getting lucky is highly probable", but my means of getting there were incorrect. Took me way too long to correct this - sorry for the delay!
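
    A quick numerical check of the corrected figure, as a Python sketch (Python to match the supporting notebook; p is taken from the correction above, and the printed values are approximate):

      p = 2 / 10_000                        # chance one random rule fits the 5 training
                                            # examples and mislabels the other 11
      for n in (8, 65_536):                 # the two rule counts discussed in the comments
          naive = n * p                     # the original (incorrect) p*n estimate
          exact = 1 - (1 - p) ** n          # P("getting lucky" one or more times)
          print(n, round(naive, 4), round(exact, 10))

      # Prints roughly:
      #     8   0.0016   0.0015988805   (here p*n is a fine approximation)
      # 65536  13.1072   0.9999979721   (here p*n breaks down; the true probability stays below 1)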

    • @foudesfilms
      @foudesfilms 5 years ago +3

      I've been watching this series again recently (it's great), and the pinned note here as well as the comments below are incredibly helpful in thinking about the probability discussed in the video. But I'm pretty sure the corrected math in the n=65,536 case is still technically wrong. It WOULD be correct if you were actually taking 65,536 samples with replacement, but that's not *exactly* what you're doing. You're looking at 65,536 rules in relation to a single sample, and the important point is we KNOW one of those rules does correctly label all squares in-sample and mislabels all squares out-of-sample. So if we run through all 65,536 rules, the probability that we will find one that exactly labels all in-sample squares and mislabels all out-of-sample squares is 1.
      This may seem an excessively pedantic point on a video two years old, but I just got really interested in the questions the video and discussion raised, so I hope you won't mind. Also, I might be wrong, and I hope someone points it out to me if so.
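
      An enumeration sketch of that point (Python; the training indices and the ground-truth labeling below are made up purely for illustration):

        from itertools import product

        truth = [i & 1 for i in range(16)]               # hypothetical true label of each grid
        train_idx = [0, 3, 7, 9, 14]                     # 5 hypothetical in-sample grids
        other_idx = [i for i in range(16) if i not in train_idx]

        count = 0
        for rule in product([0, 1], repeat=16):          # scan all 2**16 = 65,536 rules
            fits_sample   = all(rule[i] == truth[i] for i in train_idx)
            wrong_outside = all(rule[i] != truth[i] for i in other_idx)
            count += fits_sample and wrong_outside

        print(count)   # 1 -- exactly one such rule exists, so scanning every rule is certain to find it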

  • @AndrewKay
    @AndrewKay 7 years ago +118

    "A probability greater than 1"
    Nope. The approximation 1 - (1 - x)^n ~= nx is only valid when x and n are sufficiently small, and 65536 is not sufficiently small. The result should be 1 - (9998/10000)^65536 = 0.999997972. Getting an answer bigger than 1 doesn't mean the outcome is certain, it means you did something wrong.

    • @AndrewKay
      @AndrewKay 7 years ago +13

      It is further complicated by this formula not being correct in context, because it assumes each outcome is independent. That's almost the case in the first example of 8 rules, when 24 of the 28 pairs are independent and the other 4 pairs are mutually exclusive. It breaks down in the example of 65536 rules, because many more pairs of rules are dependent.
      You could say you are calculating the expected number of rules that match exactly, rather than the probability that at least one does. In that case your calculation is correct, but it's not a probability, and a number greater than 1 still doesn't mean it's certain at least one will match exactly. (In fact, it is certain, for reasons you gave in the video before this; the argument in this video doesn't show it's certain, though.)

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago +22

      Thanks for watching - yes, I should have been more clear that this is an upper bound on the probability. For a much more elegant presentation, see my source: work.caltech.edu/slides/slides02.pdf. There's also a great video lecture that goes along: work.caltech.edu/lectures.html

    • @AndrewKay
      @AndrewKay 7 years ago +18

      Yes, calculating an upper bound is a valid way of simplifying. Still, an upper bound of 13 isn't much use when you're talking about a probability; 1 is a better upper bound which takes no work to establish.
      I think the non-probabilistic argument is simpler: it's certain that at least one formula exactly matches the training set, because there's an algorithm (hinted at in the previous video) to construct such a formula for each possible training set.
      My comment was more aimed at other viewers - I know there are students who will "round down" a probability to 1 rather than realise they've made a mistake. I don't agree that the slides you linked to are more elegant - your videos are an excellent presentation of this topic.
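
      A minimal sketch of that construction idea (not the video's code; the 4-tile grids and labels below are made up, and out-of-sample grids just get a default guess):

        # A memorizing rule: store the training set and look answers up.
        training_data = {
            (1, 0, 0, 0): 1,
            (1, 1, 0, 0): 1,
            (0, 0, 1, 0): 0,
            (0, 1, 0, 1): 0,
            (1, 0, 1, 1): 1,
        }

        def memorizing_rule(x, default=0):
            """Return the memorized label in-sample, an arbitrary guess out-of-sample."""
            return training_data.get(x, default)

        # By construction it fits the training set perfectly, which is why at least
        # one perfectly fitting formula always exists -- no probability argument needed.
        assert all(memorizing_rule(x) == y for x, y in training_data.items())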

  • @mpete0273
    @mpete0273 7 years ago +2

    This is the future of education. Animations are wonderful - colors are wonderful - annotations appear instantly - music tracks the mood of logical progression. With either a blackboard or book, this material would take twice as long for half the clarity. I sincerely appreciate the time you put into making these videos.

  • @markthesecond3380
    @markthesecond3380 7 years ago +68

    I am so lost... I'm gonna have to watch this a few times.

    • @nicat6153
      @nicat6153 7 years ago +7

      Me too, plus, I've already forgotten most of the stuff in previous episodes. I'll have to re-watch all of these after the series is finished :D

    • @markthesecond3380
      @markthesecond3380 7 years ago +2

      Saaaame

    • @velocityra
      @velocityra 7 years ago

      Just keep in mind the math towards the end is very wrong. ;)
      Another comment here by +fejfo's games has the correct version.

  • @jacobkantor3886
    @jacobkantor3886 7 years ago +35

    This and 3Blue1Brown are the best channels on youtube.

  • @jacobkantor3886
    @jacobkantor3886 7 years ago +54

    This is the first video in the series to genuinely confuse me.

    • @Friek555
      @Friek555 5 years ago +6

      Because it's the first one that's just bad. The calculations should be prefaced with an explanation of what exactly is being calculated.

    • @motbus3
      @motbus3 3 years ago

      I don't think the video is bad.
      It's just hard to explain this in a simple manner in a 5-10 minute video, and stretching the same subject over many more videos probably isn't practical and might get too boring anyway.

    • @motbus3
      @motbus3 3 years ago +1

      A simpler summary of this video: choosing simple rules and combining them, with some probability and margin for error, works much better than trying a highly specific logical approach; this is the basis for the current state of the art in ML.

  • @blakeaustin5986
    @blakeaustin5986 7 years ago +27

    I've been binge-watching your channel for the last week or so. You need more subs!!!!

  • @ryanmurray5973
    @ryanmurray5973 7 years ago +3

    1:00 measuring puppy cuteness actually requires machine learning; that next series needs this one to continue.

  • @vuvffufg
    @vuvffufg 7 years ago +6

    You deserve so much more than you receive.
    Thank you.

  • @qwertymanzzz
    @qwertymanzzz 7 years ago +12

    Your videos motivated me to learn Python image processing

  • @fca003
    @fca003 7 years ago +33

    Learning is easy, they said.
    Even a baby can do it, they said.

    • @electromorphous9567
      @electromorphous9567 7 years ago

      Karlos F but a baby can indeed do it. It's just that a computer can't, and that's where our intelligence fails at this date. We can never make anything as conscious, thinking, learning, and deciding as we are. For us, it's easy as hell

    • @fca003
      @fca003 7 years ago

      Electromorphous
      "they" are the humans. ;)

    • @electromorphous9567
      @electromorphous9567 7 years ago

      Karlos F obviously. I know. But they meant it for other humans. Not for computers.

  • @anmolsethi5094
    @anmolsethi5094 7 years ago +4

    This is the only video from you that I have ever been confused about. I had problems understanding what you did at 4:11 and onward. I have absolutely no idea how you calculated the probability of the worst case happening. Isn't that just the probability of 5 correct samples being pulled out of a bag with 15 samples in a row? And I didn't understand why we had to think about choosing our one-pixel rule from the set of 8 rules for that single pixel. Likewise for when you repeated the above with the more complex rule.
    I sort of understand how a simple rule would generalize better, though, because it puts fewer constraints on the input.
    It definitely didn't keep me from understanding your other videos; I continued on, and this series definitely helped me understand this topic much better. Thank you :)

  • @EmadGohari
    @EmadGohari 6 years ago

    Let's take a moment to also appreciate how visually appealing the style of your videos is. Right up there with 3B1B, but with a different approach, using physical objects to make the concepts more tangible and intuitive. Thanks for the awesome videos.

  • @njrom
    @njrom 7 years ago +1

    This is my favorite series on all of YouTube right now. Keep up the great work!!

  • @MegaRainnyday
    @MegaRainnyday 6 years ago +2

    I really enjoy the little texts at the bottom right of this video XD

  • @masdemorf
    @masdemorf 7 years ago +4

    I love this series, I'm learning a lot. Thanks.

  • @YogeshPersonalChannel
    @YogeshPersonalChannel 5 years ago

    I am watching your videos again.
    Didn't notice the funny one-liners at the bottom at the end until this time. 😂
    Your videos and work are awesome. Highly appreciated!

  • @WateryIce54321
    @WateryIce54321 7 years ago +11

    Things have been clear to me up until 6:25 in this video.
    1) We chose g1(X)=x_1 out of 2,048 potential functions, so why is your probability based on 8 functions? Why those 8 in particular?
    2) Where did 2^16 come from for the number of potential rules for g4 at 9:45? I thought there were, again, 2,048 potential functions to choose from. Is it (16 inputs)*(2 outputs)*(2,048 potential functions)? I'm grasping at straws, and re-watching parts 7 and 8 has not helped.
    3) Where are you getting this n*p = 8*P(5 matches) = (16/10,000) approximation from at 8:25? Isn't this a geometric distribution, where p = 'chance that a population permutation equals our population sample' and n = 'the number of rules this is being tested on', and we're interested in the probability of all n functions matching only the training set (i.e. overfitting)?
    Probability, why must you always complicate everything!

    • @Rahulmayhem
      @Rahulmayhem 6 years ago +3

      I've watched this video 10 times already and I still have the exact same questions

    • @shahtamzid
      @shahtamzid 6 years ago +7

      1) We made a new assumption: the rule that defines the data is dependent on only ONE of x1, x2, x3, or x4 and so the 3 remaining variables are irrelevant (recall his example of how a child recognizes a dog). Hence, we can either bet that it depends on x1 only, or on x2 only, or on x3 only, or on x4 only. And in each of the 4 cases, there are 2 possible classifications: either 0 implies '+' class and 1 implies '-' class or the other way around. Hence we narrowed down our possible set of rules to 8.
      2) Here we're trying to determine the sample size (i.e. total number of possible rules) out of which g4 would have been selected randomly. We see that g4 takes into account all four values (x1 through x4) and the state of each of the grids in the training set (i.e., it is highly specific to the training set in contrast with the more general rule g1). Since there are 16 possible grids and each can be either + or -, the most non-generalizable (i.e. highly specific) rule would simply treat each grid as a standalone and assign it one of 2 possible classes: + or -. These rules belong to a sample size of 2^16 possible rules (2 possibilities (+ or -) for grid 1 * 2 possibilities for grid 2 * .... * 2 possibilities for grid 16 = 2^16).
      Here I think he has assumed that we select the rule randomly INDEPENDENT of the training data (i.e. before looking at what data is classified as + and what data is -), so the 2 possibilities remain open for all 16 grids. The 2048 figure was for when we chose the rule AFTER looking at the training data, i.e., we could already fix the classification for 5 of the grids, so it was 2^(16-5) = 2048 possible rules.
      3) As he and others have pointed out, that calculation is not completely accurate, just an approximation. The correct way would be to do 8 * (2/10000 * (9998/10000)^7) because there are 8 possible scenarios, and in each scenario, one of the rules gets lucky (manages to avoid all mislabels) with 2/10000 probability and the remaining 7 do not get lucky with 9998/10000 probability. Since 9998/10000 ≈ 1, he simplified the calculation that way. I think this better fits a binomial distribution than a geometric distribution.
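
      A quick check of the arithmetic in point 3, as a Python sketch (the 8 rules are treated as independent, which is the same simplification mentioned above):

        from math import comb

        p, n = 2 / 10_000, 8

        exactly_one  = comb(n, 1) * p * (1 - p) ** (n - 1)   # 8 * (2/10000) * (9998/10000)^7
        at_least_one = 1 - (1 - p) ** n                      # complement of "no rule gets lucky"
        naive        = n * p                                 # the simplified figure from the video

        print(exactly_one, at_least_one, naive)
        # ~0.0015978, ~0.0015989, 0.0016 -- all close, since (9998/10000)^7 is nearly 1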

    • @xelaxander
      @xelaxander 6 years ago +1

      Shah Rahman Thanks for the great explanation. I get that he is preferring simple functions because that's how nature seems to work. But "simple" seems more like a trait of the representation of a function. (You could just list all preimage-image pairs; possible counterargument: information density metrics?)
      What actually seems to happen is that he takes a random sample A from the function space and then minimizes over that sample. Call the set of overfitting functions O.
      We are then looking for P(A ∩ O ≠ ∅).
      How do we decide whether f ∈ O?
      Edit: B.S. question

    • @strawberrycow8141
      @strawberrycow8141 5 years ago +2

      I managed to get as far as @7:27 before I got lost. However, when I watched the video again, what helped me was pausing @4:50! It is helpful not to lose sight of what Stephen is saying here: we want to avoid the situation where our rule performs really well on sample data but terribly on data it has never seen. The most extreme case of that would be when our rule predicts ALL of the unseen scenarios wrongly. That is, what we don't want is to decide upon a rule BASED on our sample data which works PERFECTLY for the scenarios it sees BUT will fail for every situation it has not seen! The image @4:50 illustrates the most extreme example of this. You will note that Stephen has labelled each scenario in the bottom right hand corner with TP=True Positive, TN=True Negative, FP=False Positive and FN=False Negative. What Stephen is trying to say is we DON'T want a rule that produces FPs and FNs for unseen data. The way to minimise the chances of this happening is to ensure the sample chosen for training is truly random!

    • @smolytchannel5062
      @smolytchannel5062 4 years ago

      That n*p was probably an approximation of 2/10000 + 2/9999 + 2/9998 etc

  • @sidsarasvati
    @sidsarasvati 6 years ago

    Incredible! This is the best intro course on machine learning for both beginner and advanced levels. Keep it up

  • @owendeheer5893
    @owendeheer5893 7 years ago +1

    I really love this series!!! So I would like more frequent videos :D Very interesting stuff

  • @lolzist
    @lolzist 7 years ago

    I am a junior comp sci major in college and want to go into machine learning. I can't thank you enough for making these videos. You have a great style of web video and all your information is clear and concise.
    Thank you

  • @cloudgalaxy9231
    @cloudgalaxy9231 7 years ago

    Keep it up man. These videos are beautiful. Everyone will come around soon.

  • @BrendanxP
    @BrendanxP 7 years ago +2

    Just stumbled upon your channel thinking all your videos are like a year old when I suddenly realised this video was posted yesterday and I have to wait to finish this series 😂

  • @YandiBanyu
    @YandiBanyu 7 years ago +1

    Finally, it's worth the wait!

  • @pebre79
    @pebre79 7 years ago +6

    Your voice and background music remind me of the Arrested Development recap

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago +1

      I will take that as a huge compliment. Thanks for watching!

  • @andreyyaskulsky5029
    @andreyyaskulsky5029 7 years ago

    You are awesome!!! Thank you very much!!! I find your explanation the best on the whole internet!!!

  • @vitorpinheiro1856
    @vitorpinheiro1856 6 years ago

    I need to say thank you, I'm learning a lot. Your series is really interesting and you're a great explainer.

  • @jojojorisjhjosef
    @jojojorisjhjosef 7 years ago +3

    I hope you're as passionate about your work as I am

  • @jewe37
    @jewe37 7 years ago +68

    please fix your probability math D:

    • @fejfo6559
      @fejfo6559 7 years ago +30

      Glad I'm not the only one that was annoyed by it. The formula isn't simply p*n but 1-(1-p)^n.
      So for the first case you get 1-(1-(2/10000))^8 = 0.00159888045, so 0.0016 was actually a decent approximation,
      but for the second you get 1-(1-(2/10000))^65536 = 0.999997972, so 13.1 wasn't a good approximation.

    • @Kroppeb
      @Kroppeb 7 years ago +5

      Jendrik Weise yes, fix it.
      I was just having a mental breakdown while that was on screen (and after)!!!

    • @jewe37
      @jewe37 7 years ago +1

      Robbe Pincket ehh yeah no that might be a bit much....

    • @Kroppeb
      @Kroppeb 7 years ago +4

      Jendrik Weise sorry, there is just one thing in math I can't stand, and that is this. The whole field of probability started with some mathematicians discovering you can't just do p*n. Please, let's have respect for this.

    • @jewe37
      @jewe37 7 years ago +1

      Robbe Pincket there are worse things than messing up probabilities, come on

  • @paedrufernando2351
    @paedrufernando2351 1 year ago +1

    @8:05 Welch says that the training data happened to be data the rule classified perfectly, so we consider which rule has the smallest chance of working perfectly only on the data we used while creating it. Compared to the bigger rule, g1 has the smallest such probability, while the bigger rule's is close to 1, so g1 wins, and that is the crux of this lecture. Also, notice that the larger the rule, the more tailor-fit it becomes to the sample data and the less flexible it is, i.e. tightly coupled; so intuitively g1 also wins the race to be more generic, aka generalized. The same goes for the rest of the 2,048 rules: compare and eliminate.

  • @ptyamin6976
    @ptyamin6976 7 years ago +8

    Yea, this video seems like nonsense.
    And others are saying the probability calculations were wrong. This is the only video in your series I didn't get the gist of. I still don't know what random sampling means.

  • @DOYbuffPORrec
    @DOYbuffPORrec 7 years ago +6

    Just in my free time!

  • @guspus3050
    @guspus3050 7 years ago

    I'm slowly getting the hang of this!

  • @paedrufernando2351
    @paedrufernando2351 1 year ago +2

    @5:34 is the situation we are evaluating, i.e., getting the sampled data right and missing all the others (i.e. the test data) under rule g1 (we are evaluating the worst possible performance of g1, and how can we be sure this is the right setup for evaluating worst-case performance? The answer: even if we take a random sample of the superset, the correctly identified pieces don't come up often; see @5:17 about random sampling, where he says we didn't randomly sample our data, then choose g1 and get the news that it fit perfectly). The picture at @5:34 is the superset of all possibilities of 4 squares, and the probability of g1 succeeding only on the training sample and not on the other samples is just 0.0002 (this is the crux). Compared to this, the larger rule, g4, eventually shows that the bigger the rule (probability of almost 1 at @10:26), the more chance it has of working great only on the sampled data and being wrong (i.e. misclassifying) on the test data, aka not generalizing as a rule. (I wrote this explanation for the people who say this lecture was confusing, which included me too, until I rewatched it many times over many days and eventually got my head around it.)

  • @Ironypencil
    @Ironypencil 7 years ago +26

    My name isn't Greg :(

    • @gregbernstein7524
      @gregbernstein7524 7 years ago +29

      Mine is and I got a little freaked out.

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago +12

      Haha, that's awesome!

    • @offchan
      @offchan 7 years ago +3

      I guess that he randomly chose one name to freak that person out.

    • @want-diversecontent3887
      @want-diversecontent3887 6 years ago +1

      Welch Labs
      My best friend is Greg.
      He and I were watching this video.
      He was surprised.

  • @abdullahawisimulaha
    @abdullahawisimulaha 6 years ago

    I found it much more enjoyable than watching TV shows. Who are you?

  • @jonathanlimm7221
    @jonathanlimm7221 7 years ago

    That confession at the end, though...

  • @vitorpinheiro1856
    @vitorpinheiro1856 6 years ago

    Let me ask you, what's the music you used in this video? It's great.

  • @ekaterinaburakova8629
    @ekaterinaburakova8629 5 years ago

    Hey there,
    From 6:30 on: why do you consider rules which only involve one variable (tile)?

  • @Rrrr45569
    @Rrrr45569 7 years ago

    What software is being used to develop these approaches? I am a high school freshman and am interested in learning more about computer science. I love watching these. Keep 'em coming.

    • @WelchLabsVideo
      @WelchLabsVideo  7 years ago

      Jupyter notebook!

    • @Rrrr45569
      @Rrrr45569 7 years ago

      Thank you. I will take a look. Perhaps in the future you could do a project using micro-controllers and create physical devices that incorporate these machine learning approaches. Do you recommend any courses or websites to get a more in depth introduction to machine learning and how it works? Thanks again.

  • @AngelValis
    @AngelValis 7 years ago

    Wait... were we taking the color into account as well as the placement of it? In that case, wouldn't the full set of possibilities be 81?

  • @jiaxinpeng6054
    @jiaxinpeng6054 5 years ago

    Very interesting, and the video quality is great.

  • @piranha031091
    @piranha031091 5 years ago

    Is that essentially a mathematical demonstration of Occam's razor?

  • @cooltv2776
    @cooltv2776 7 years ago

    Is this more or less you going back over what got people to where we are now in machine learning, but in a more interesting way? And in a way that actually teaches people what got us to this point?

  • @PiercingSight
    @PiercingSight 7 years ago

    So... is this the last video? Or are you going to give us some closure on the whole counting fingers thing? I'd love to see these concepts actually applied, and it seems that doing the application will help a lot of people in the comments ;P
    Awesome video by the way. This one got rather abstract, and I think it took too long to get a simple point across (we can and should make assumptions because the simpler the rule is, the more it can generalize), hence why I think many people got lost. It was almost as if you were making the video to teach a computer more than making the video to teach a person. When teaching people about teaching computers, that can happen, but still, simplifying for human brains to understand would make it easier for people to later translate it into teaching computers the same thing.
    Keep it up, man! I really want to see a finale/conclusion to this series. :)

    • @jonathanlimm7221
      @jonathanlimm7221 7 years ago +3

      He mentioned a next episode, so this probably was not the last one.

    • @ZonkoKongo
      @ZonkoKongo 7 years ago +1

      DaneGraphics there will be a couple more, you can see the result in the introduction.

  • @alejandrofabiangarcia5917
    @alejandrofabiangarcia5917 4 years ago +1

    0:58 good joke (in the small text at the bottom of the video)

  • @wael.m2030
    @wael.m2030 7 years ago +1

    goooooooooooooooood job! Keep it up :)

  • @foudesfilms
    @foudesfilms 5 years ago

    I'm enjoying watching this series a year and a half after it was published. I have two questions that I'd like to ask here:
    1. Is g1(x) = x1 functionally equivalent to g1(x) = (x1∧¬x2∧¬x3∧¬x4) ∨ (x1∧x2∧¬x3∧¬x4) ∨ (x1∧¬x2∧x3∧¬x4) ∨ (x1∧¬x2∧¬x3∧x4) ∨ (x1∧x2∧x3∧¬x4) ∨ (x1∧x2∧¬x3∧x4) ∨ (x1∧¬x2∧x3∧x4) ∨ (x1∧x2∧x3∧x4)? And if so, how do we differentiate it from a rule like g4, or any of the other 65,534 rules expressed as distinct Boolean functions? The answer in Part 7 to "How many distinct rules (or Boolean functions) perfectly fit our data?" seemed almost like sleight of hand, because it substituted for my lay understanding of "rule" a Boolean description for every possible arrangement of 16 four-tile blocks (minus the 5 in the example). But this video seems to turn around and say there *is* a qualitative difference between a rule like g1(x) = x1 and g1(x) = the much longer Boolean expression, even if they are functionally equivalent, and that difference makes g1(x) = x1 much more generalizable.
    2. From about 6:30 onwards, when you talk about how testing more rules on a single sample makes it more and more likely that one of those rules will get "lucky" by chance, is this basically the same problem as p-hacking?
    Thanks for the great content!
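
    A brute-force check of the equivalence asked about in question 1, as a Python sketch (tile values treated as booleans):

      from itertools import product

      def g1_simple(x1, x2, x3, x4):
          return x1

      def g1_dnf(x1, x2, x3, x4):
          # The eight-term expression from question 1, spelled out with and/or/not.
          return ((x1 and not x2 and not x3 and not x4) or (x1 and x2 and not x3 and not x4)
               or (x1 and not x2 and x3 and not x4)     or (x1 and not x2 and not x3 and x4)
               or (x1 and x2 and x3 and not x4)         or (x1 and x2 and not x3 and x4)
               or (x1 and not x2 and x3 and x4)         or (x1 and x2 and x3 and x4))

      # They agree on all 2**4 = 16 inputs, i.e. as functions they are identical;
      # only the descriptions differ.
      assert all(g1_simple(*x) == g1_dnf(*x) for x in product([False, True], repeat=4))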

  • @atti1120
    @atti1120 2 years ago

    I'm confused: isn't the point to get "lucky"? Like, there is no "true" function with magical understanding; there is only a function which produces the outcome we want, right? So why does chance matter, if it performs well on unseen data?

  • @TheKirbymaster3
    @TheKirbymaster3 7 years ago +7

    I've watched this entire series like 4 times now... and I just feel stupid... I just don't get it. Like, any of it. From choosing the rules, to how they actually do the "looking at samples"... I'm just so confused...

    • @watchingvids6789
      @watchingvids6789 7 years ago

      Then it's time to think outside the box. Or outside your sample space is another way to put it.

    • @umnikos
      @umnikos 7 years ago

      TheKirbymaster3 The whole episode is about why simpler rules are better and that they have less statistical chance of being wrong.

    • @umnikos
      @umnikos 7 years ago +2

      TheKirbymaster3 I feel like he did some hocus pocus too, but let's stick with him and see if it all just clicks in the end.

    • @watchingvids6789
      @watchingvids6789 7 years ago

      I know. Sometimes, however, I tend to lead with a dead hammer, in other words, force an answer.
      Helpful and fast, though sometimes it doesn't get the result you want. However, I tend to find that experience
      always adds a little more weight to the argument.

  • @Carlobergh
    @Carlobergh 7 years ago

    What is the song around the 8:10 mark, and where can I get it?

  • @Stefkostov
    @Stefkostov 7 years ago

    awesome

  • @LapisGarter
    @LapisGarter 7 years ago +1

    Boy I can see how letting AI get it wrong and ignore stuff would lead to a world of death.

  • @tassioleno808
    @tassioleno808 7 years ago

    WOOOW

  • @fossil98
    @fossil98 7 years ago +2

    Your probability is wrong. It's still high, but a probability can't be over one.

  • @strawberrycow8141
    @strawberrycow8141 5 years ago

    @7:27. What is an example? What is a misclassified example? Can anyone please help?

  • @datsnek
    @datsnek 7 years ago +1

    What a fox! You got my real name!

  • @iustinianconstantinescu5498
    @iustinianconstantinescu5498 7 years ago +1

    1:07 LOL

  • @yogeissler1666
    @yogeissler1666 6 years ago

    The assumption that a correct rule is relatively simple normally follows quite trivially from Occam's razor.
    Because given the n rules that are correct as of now (after the training data), we have learned that the simple case-matching nature of the complicated rules is not that great (expert algorithms); the rule needs to grasp the concepts.

    • @WelchLabsVideo
      @WelchLabsVideo  6 years ago +1

      I would have completely agreed with you until I read Tom Mitchell's Machine Learning while researching this series - specifically page 65.

    • @yogeissler1666
      @yogeissler1666 6 years ago

      Welch Labs
      Yes, but I think that using Occam's razor only as a principle stating:
      "If you can find an arbitrary set of criteria that narrows the set S of possible solutions of a problem down to a minimum-size subset s fulfilling these criteria, then the chance of having a working hypothesis included in s is smaller than in S and therefore (why?) the s-hypothesis is more valid."
      , which Mr. Mitchell does in his counterarguments, is a bit far from the idea of "simple".
      Because in my opinion, [2, 12367, 15092688622113788323693563264538101449859497] (oeis.org/A082912) is not a simple sequence, while [1, 2, 3] is, even though they both have only 3 members.
      And working backwards from a (possibly) perfect hypothesis:
      there are infinitely many more complicated versions that do the same thing but are ultimately worse in conciseness than the simplest one (which is not necessarily simple itself), which I think should be the goal to find.

  • @Adowrath
    @Adowrath 7 years ago

    I really like watching these. But I think I haven't learned a thing about how I would realize any of this in a program...

  • @NamelessNr1
    @NamelessNr1 7 years ago

    I understand where you're going with the argument, just not how; I'm confused. Like, why do you make it sound like a good thing that it only managed to pick 5 right squares, when most were wrong?

  • @khanetor
    @khanetor 7 years ago

    I do not quite get the part where g1 is better than g4. Could you explain it in another way?

    • @fudgesauce
      @fudgesauce 7 years ago +2

      I'd say it this way, by means of an analogy.
      No matter how many data points you have in a simple graph, you can always come up with any number of functions which agree exactly at each data point. When there are a lot of data points, such a function will tend to be large. For instance, if you have N data points, a polynomial of order N can always match exactly at those data points. Such functions tend to be "overfitted". A lower-order polynomial will be close but not exactly right at most data points, but its value between the specified points will tend to be more reasonable -- that is, it generalizes better.
      en.wikipedia.org/wiki/Overfitting
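
      A tiny numpy illustration of that analogy (made-up noisy data, degrees chosen arbitrarily):

        import numpy as np

        rng = np.random.default_rng(0)
        x = np.linspace(0, 1, 8)                 # 8 made-up data points
        y = x + 0.1 * rng.standard_normal(8)     # roughly linear, plus noise

        wiggly = np.polynomial.Polynomial.fit(x, y, deg=7)   # passes through every point
        smooth = np.polynomial.Polynomial.fit(x, y, deg=1)   # close, but not exact

        x_fine = np.linspace(0, 1, 200)          # look between the training points
        print("training error, deg 7:", np.abs(wiggly(x) - y).max())   # ~0 (memorizes the noise)
        print("training error, deg 1:", np.abs(smooth(x) - y).max())   # small but nonzero
        print("range between points, deg 7:", np.ptp(wiggly(x_fine)))  # typically the larger of the two
        print("range between points, deg 1:", np.ptp(smooth(x_fine)))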

  • @huhneat1076
    @huhneat1076 3 years ago

    the guy named Greg watching 0_0

  • @mercurysimos1680
    @mercurysimos1680 7 years ago

    Well, sure, the probability of being totally wrong is very very low, but that doesn't make our rule successful. Also, what happens with more complex data, for example 5x5 or 9x9 tiles? Finding rules that simple is surely impossible when the complexity increases. Man, these cliffhangers are the curse and the joy of learning!

  • @sojourner_303
    @sojourner_303 5 years ago

    Why are the captions in Portuguese?

  • @dgimop
    @dgimop 7 years ago

    Probability > 1? How is that possible?

  • @ihzbc
    @ihzbc 7 years ago

    I feel like I and all of humanity are idiots after watching this video

  • @1Thor61storm8
    @1Thor61storm8 7 years ago +6

    Am I the only one who has to pause to read those little bottom-right corner sentences and gets annoyed by the video UI, which doesn't let me read them? Please put those sentences a little bit higher up :)

  • @ba_livernes
    @ba_livernes 7 years ago +1

    8:18 You wrote "becuase". Great (like really great) video(s) otherwise

  • @therealhaxwell
    @therealhaxwell 4 years ago

    Who's Greg?

  • @jameselmore1780
    @jameselmore1780 7 years ago

    Secodn

  • @otesunki
    @otesunki 5 years ago

    7:39 who else saw h5?

  • @Yotam1703
    @Yotam1703 6 years ago

    whatever, STEVEN.

  • @jsduenass
    @jsduenass 6 years ago

    I got a little confused in this one

  • @electromorphous9567
    @electromorphous9567 7 years ago

    Are you an alien? Or in contact with an alien who's helping you with all this?

  • @jasbirsingh8159
    @jasbirsingh8159 7 years ago +1

    just create a decision tree classifier, kiddo :)

  • @NourMuhammad
    @NourMuhammad 7 years ago

    Please, please remove the music; it is really distracting.
    Not all your viewers are native speakers!
    I am making extra effort to, first, understand your English (which is very nice, by the way) and, second, understand the concept.
    The music doesn't let me do both at the same time, especially in this episode; the music is very loud.
    Kindly remove the music; you are already doing a very nice job with your graphics and the simple approach you use in the illustrations.
    Thanks

  • @Kratax
    @Kratax 7 years ago

    After the start about how the human brain learns, I didn't understand anything. Maybe my brain just tries to forget all this grid nonsense. I'll skip to vectors.

  • @MahatasinAzad
    @MahatasinAzad 2 years ago

    Needlessly complicated explanation! If you don't have a very strong grasp of what is being discussed, the chances of you learning anything new are slim. Good luck. I appreciate the time and effort by the creator, though. Take it as constructive feedback and my one-line review of the video.

  • @gal766
    @gal766 20 days ago

    The music is way too loud! We care about what you have to say, not about the bloody music, which sounds nice to one guy and terrible to another.