Why Information Theory is Important - Computerphile

  • Published 27 Aug 2024
  • Zip files & error correction depend on information theory. Tim Muller takes us through how Claude Shannon's early computer science work is still essential today!
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottsco...
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

COMMENTS • 151

  • @mba4677
    @mba4677 2 years ago +331

    "a bit"
    "a bit more"
    after years living with the pack of geniuses, he had slowly become one

    • @laurenpinschannels
      @laurenpinschannels 2 years ago +7

      ah yes I recognize this sense of genius. it's the same one people use when I say that doors can be opened. "thanks genius" I am so helpful

    • @VivekYadav-ds8oz
      @VivekYadav-ds8oz 3 months ago +1

      "he had slowly become *one* "

    • @dudeman0401
      @dudeman0401 15 days ago

      ​@@laurenpinschannels it's a double entendre, one coin can convey one bit ("a bit") of information, while two coins can convey two bits of information ("a bit" + "a bit more")

  • @Ziferten
    @Ziferten 2 years ago +302

    EE chiming in: you stopped as soon as you got to the good part! Shannon channel capacity, equalization, error correction, and modulation are my jam. I'd love to see more communications theory on Computerphile!

    • @Mark-dc1su
      @Mark-dc1su 2 years ago +12

      If anyone wants an extremely accessible intro to these ideas, Ashby's Introduction to Cybernetics is the gold standard.

    • @hellowill
      @hellowill 2 years ago +4

      Yeah feels like this video was a very simple starter

    • @travelthetropics6190
      @travelthetropics6190 2 years ago +1

      Greetings EE! those are the first topics on our "communications theory" subject back at Uni.

    • @OnionKnight541
      @OnionKnight541 2 years ago +1

      Hey! What channel is that stuff on ? I'm still a bit confused by IT

    • @mokovec
      @mokovec 2 years ago +2

      Look at the older videos on this channel - Prof. Brailsford already covered a lot of the details and history.

  • @louisnemzer6801
    @louisnemzer6801 2 years ago +218

    This is the best unscripted math joke I can remember!
    How surprised are you?
    >A bit
    One bit?

    • @JavierSalcedoC
      @JavierSalcedoC 2 years ago +69

      _Flips 2 coins_ "And now, how surprised are you?"
      "A bit more"
      *exactly*

    • @068LAICEPS
      @068LAICEPS 2 years ago +1

      I noticed during the video but after reading here now I am laughing

  • @roninpawn
    @roninpawn 2 years ago +78

    Nice. This explanation ties so elegantly to the hierarchy of text-compression. While I've been told, many times, that it's mathematically provable that there is no more efficient method... This relatively simple explanation leaves me feeling like I understand HOW it is mathematically provable.

  • @LostTheGame6
    @LostTheGame6 2 years ago +97

    The way I like to reach that conclusion is to describe a population where everyone plays once.
    In the case of the coin flip, if a million people play, you need, on average, to give the names of the 500k people who got tails (or heads). Otherwise your description is incomplete.
    In the case of the lottery, you can just say "no one won", or just give the name of the winner. So you can clearly see how much more information is needed in the first case.
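
A rough numeric sketch of this comparison (assuming, purely for illustration, a 1-in-a-million win probability per player; the function names are mine, not from the video):

```python
import math

def surprisal_bits(p: float) -> float:
    """Information (in bits) of observing an event with probability p."""
    return -math.log2(p)

def entropy_bits(ps) -> float:
    """Shannon entropy of a distribution given as a list of probabilities."""
    return sum(p * surprisal_bits(p) for p in ps if p > 0)

n_players = 1_000_000

# Fair coin: 1 bit per player, so ~1,000,000 bits to record everyone's outcome.
coin_bits = n_players * entropy_bits([0.5, 0.5])

# Lottery (assumed 1-in-a-million chance per player): far less than 1 bit each,
# because "no one won" (or a single winner's name) covers almost every draw.
p_win = 1e-6
lottery_bits = n_players * entropy_bits([p_win, 1 - p_win])

print(f"coin flips: {coin_bits:,.0f} bits")
print(f"lottery:    {lottery_bits:,.1f} bits")  # roughly 21 bits in total
```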

    • @MrKohlenstoff
      @MrKohlenstoff 2 years ago +2

      That's a nice explanation!

    • @sanferrera
      @sanferrera 2 years ago

      Very nice, indeed!

    • @NathanY0ung
      @NathanY0ung 2 years ago +2

      This makes me think of the ability to guess correctly. For a coin flip, which carries more information, it's harder to guess the outcome than it is to guess the result of a lottery ticket.

  • @elimgarak3597
    @elimgarak3597 2 years ago +41

    I believe Popper made this connection between probability and information a bit earlier in his Logik der Forschung (1934; Shannon's information theory paper was published in 1948). That's why he says that we ought to search for "bold" theories, that is, theories with low probability and thus more content. Except, at first, he used a simpler formula: Content(H) = 1 - P(H), where H is a scientific hypothesis.
    Philosophers' role in the history of logic and computer science is a bit underrated and obscured imo (see, for example, Russell's type theory).
    Btw, excellent explanation. Please bring this guy on more often.

    • @yash1152
      @yash1152 2 years ago +3

      thanks a lot for bringing philosophy up in here 😇

    • @Rudxain
      @Rudxain 1 year ago

      This reminds me of quantum superposition

  • @Double-Negative
    @Double-Negative 2 years ago +56

    The reason we use the logarithm is because it turns multiplication into addition.
    The chance of 2 independent events X and Y both happening is P(X)*P(Y).
    If entropy(X) = -log(P(X)), then
    entropy(X and Y) = -log(P(X)*P(Y)) = -log(P(X)) - log(P(Y)) = entropy(X) + entropy(Y)
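
A quick numeric check of that additivity (a minimal sketch; the 1-in-8 event is a made-up example, not something from the video):

```python
import math

def surprisal(p: float) -> float:
    """Self-information in bits: -log2(p)."""
    return -math.log2(p)

p_heads = 0.5      # fair coin
p_other = 1 / 8    # some independent 1-in-8 event (made up)

# Probabilities of independent events multiply...
joint = p_heads * p_other

# ...so their surprisals add: 1 bit + 3 bits = 4 bits.
assert math.isclose(surprisal(joint), surprisal(p_heads) + surprisal(p_other))
print(surprisal(p_heads), surprisal(p_other), surprisal(joint))  # 1.0 3.0 4.0
```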

    • @PetrSojnek
      @PetrSojnek 2 years ago +23

      Isn't that more a result of using the logarithm than the reason for using the logarithm? It feels like using the logarithm for better scaling was still the primary factor.

    • @entropie-3622
      @entropie-3622 2 years ago +8

      @@PetrSojnek There are lots and lots of choices for functions that model diminishing returns, but only the log functions will turn multiplication into addition.
      Considering how often independent events show up in probability theory, it makes a lot of sense to use the log function for this specific property, and it will yield all kinds of nice results that you would not see if you were to use another diminishing-returns model.
      If we go by the heuristic of it representing information, this property is fairly integral, because you would expect that the total information for multiple independent events should come out as the sum of the information about the individual events.

    • @GustavoOliveira-gp6nr
      @GustavoOliveira-gp6nr 2 years ago +1

      Exactly, the choice of the log function is more about the addition property than about diminishing returns.
      Also, it is directly related to the number of binary digits it takes to code a sequence of fair coin flips: 1 more digit in a sequence changes the sequence's probability by a factor of 2 while adding exactly 1 more bit of information, which works out neatly with the logarithm formula.

    • @temperedwell6295
      @temperedwell6295 1 year ago +1

      The reason for using logarithms to base 2 is that there are 2^N different words of length N formed with the alphabet {H,T}; i.e., length of word = log_2(number of words). The reason for the minus sign is so that N gives a measure of the amount of information.

  • @gaptastic
    @gaptastic 2 years ago +19

    I'm not gonna lie, I didn't think this video was going to be interesting, but man, it's making me think about other applications. Thank you!

  • @agma
    @agma 2 years ago +9

    The bit puns totally got me 🤣

  • @travelthetropics6190
    @travelthetropics6190 2 years ago +10

    This and the Nyquist-Shannon sampling theorem are two of the building blocks of communication as we know it today. So we can say even this video is brought to us by those two :D

  • @Jader7777
    @Jader7777 2 years ago +8

    Coffee machine right next to computer speaks louder than any theory in this video.

  • @scitortubeyou
    @scitortubeyou 2 years ago +35

    "million-to-one chances happen nine times out of ten" - Terry Pratchett

    • @-eurosplitsofficalclanchan6057
      @-eurosplitsofficalclanchan6057 2 years ago +2

      how does that work?

    • @AntonoirJacques
      @AntonoirJacques 2 years ago +6

      @@-eurosplitsofficalclanchan6057 By being a joke?

    • @IceMetalPunk
      @IceMetalPunk 2 years ago +5

      "Thinking your one-in-a-million chance event is a miracle is underestimating the sheer number of things.... that there are...." -Tim Minchin

    • @davidsmind
      @davidsmind 2 years ago +2

      Given enough time and iterations million to one chances happen 100% of the time

    • @hhurtta
      @hhurtta 2 years ago +4

      @@-eurosplitsofficalclanchan6057 Terry Pratchett knew human behavior and reasoning really well. We tend to exaggerate a lot, we have trouble comprehending large numbers, and we are usually very bad at calculating probabilities. Hence we often say "one-in-a-million chance" when the odds are actually far better than that. On the other hand, one-in-a-million events do occur much more often than we intuitively expect when iterating enough, like brute-force guessing a 5-letter password (about 1 in 12 million).

  • @CristobalRuiz
    @CristobalRuiz 2 years ago +4

    Been seeing lots of documentary videos about Shannon lately. Thanks for sharing.

  • @drskelebone
    @drskelebone 2 years ago +6

    Either I missed a note, there's a note upcoming, or there is no note stating that these are log_2 logarithms, not natural or common logarithms.
    @5:08: "upcoming" is the winner, giving me log_2(3) ≈ 1.585 bits of information.
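
For anyone reproducing that number, a small sketch showing how the choice of log base only changes the unit (bits vs. nats vs. hartleys), assuming the probability in question is 1/3:

```python
import math

p = 1 / 3  # probability the comment is referring to around 5:08

# The same surprisal in three units, depending on the log base:
bits     = math.log2(1 / p)    # base 2  -> ~1.585 bits
nats     = math.log(1 / p)     # base e  -> ~1.099 nats
hartleys = math.log10(1 / p)   # base 10 -> ~0.477 hartleys

print(bits, nats, hartleys)
# Converting between bases is just a constant factor:
assert math.isclose(bits, nats / math.log(2))
```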

  • @YouPlague
    @YouPlague 2 years ago +1

    I already knew everything he talked about, but boy this was such a nice concise way of presenting it to laymen!

  • @gdclemo
    @gdclemo 2 years ago +5

    You really need to cover arithmetic coding, as this makes the relationship between Shannon entropy and compression limits much more obvious. I'm guessing this will be in a followup video?

  • @DeanHorak
    @DeanHorak 2 years ago +3

    Greenbar! Haven’t seen that kind of paper used in years.

  • @Juurus
    @Juurus 2 years ago +1

    I like how there's almost every source of caffeine on the same computer desk.

  • @nathanbrader7591
    @nathanbrader7591 2 years ago +12

    3:41 "So 1 in 2 is an odds of 2, 1 in 10 is an odds of 10" That's not right: If the probability is 1 in x then the odds is (1/x)/(1-(1/x)). So, 1 in 2 is an odds of 1 and 1 in 10 is an odds of 1/9.

    • @patrolin
      @patrolin 2 years ago +2

      yes, probability 1/10 = odds 1:9

    • @BergenVestHK
      @BergenVestHK 2 years ago +4

      Depends on the system, I guess. Where I am from, we would say that the odds are 10, when the probability is 1/10. I know you could also call it "one-to-nine" (1:9), but that's not in common use here. Odds of 10 would be correct here.

    • @nathanbrader7591
      @nathanbrader7591 2 years ago

      @@BergenVestHK Interesting. Where are you from?

    • @BergenVestHK
      @BergenVestHK 2 years ago

      @@nathanbrader7591 I'm from Norway. I just googled "odds systems", and found that there are supposedly three main types of odds: "fractional (British) odds, decimal (European) odds, and moneyline (American) odds".
      I must say, that seeing as Computerphile is UK based, I do agree with you. I am a little surprised that they didn't use the fractional system in this video.
      However, I see that Tim, the talker in this video, previously studied in Luxembourg and the Netherlands, so perhaps he imported the European decimal odds systems from there. :-)

    • @nathanbrader7591
      @nathanbrader7591 2 years ago +2

      @@BergenVestHK Thanks for this. That explains his usage which I take to be intentionally informal for an audience perhaps more familiar with gambling lingo. I'd expect (hope) that with a more formal discussion, the term "odds" would be reserved for the fractional form as it is used in statistics.

  • @tlrndk123
    @tlrndk123 11 months ago +1

    the comments in this video are surprisingly informative

  • @CarlJohnson-jj9ic
    @CarlJohnson-jj9ic 1 year ago

    Boolean algebra is awesome!!!: Person(Flip(2), Coin(Heads,Tails)) = Event(Choice1, Choice2) == (H+T)^2 == (H+T)(H+T) == H^2 + 2HT + T^2 (notice coefficient orderings) where the constant coefficient is the frequency of the outcome and the exponent or order is the amount of times the identity is present in the outcome. This preserves lots of the algebraic axioms which are largely present in expanding operations. If you try to separate out the object and states from agents using denomination of any one of the elements, you can start to be able to combine relationships and quantities with standard algebra words with positional notation(I like abstraction be used as the second quadrant, like exponents are in the first, to resolve differences of range in reduction operations from derivatives and such) polynomial equations to develop rich descriptions of the real world and thus we may characterize geometrically the natural paths of systems and their components. These become extraordinarily useful when you consider quantum states and number generators which basically describe the probability of events in a world space which allows one to rationally derive the required relationships elsewhere, events or agents involved by stating with a probability based on seemingly disjoint phenomena, i.e. coincident and if we employ a sophisticated field ordering, we can look at velocities of gravity to discern what the future will bring. Boolean algebra is awesome! Right up there with the placeholder-value string system using classification of identities.

  • @clearz3600
    @clearz3600 2 years ago +1

    Alice and Bob are sitting at a bar when Alice pulls out a coin, flips it, and says "heads or tails?"
    Bob calls out "heads" while looking on in anticipation.
    Alice reveals the coin to be indeed heads and asks how surprised he is.
    "A bit," proclaims Bob.

  • @TheArrogantMonk
    @TheArrogantMonk 2 years ago +2

    Extremely clever bit on such a fascinating subject!

  • @laurenpinschannels
    @laurenpinschannels 2 years ago +1

    if you don't specify what base of log you mean, it's base NaN

  • @adzmarsh
    @adzmarsh 2 years ago

    I listened to it all. I hit the like button.
    I did not understand it.
    I loved it

  • @TheNitramlxl
    @TheNitramlxl 2 years ago +1

    A coffee machine on the desk 🤯this is end level stuff

  • @sean_vikoren
    @sean_vikoren 2 years ago +1

    I find my best intuition of Shannon Entropy flows from Chaos Math.
    Plus I get to stare at clouds while pretending to work.

  • @PopescuAlexandruCristian
    @PopescuAlexandruCristian 1 month ago

    Not sure, but for arithmetic encoding we should get better results than the entropy because we have "fractional bits" there, right?

  • @oussamalaouadi8521
    @oussamalaouadi8521 2 years ago +8

    I guess information theory is - historically - a subset of communications theory which is a subset of EE.

    • @sean_vikoren
      @sean_vikoren 2 years ago +8

      Nice try. Alert! Electrical Engineer in building, get him!

    • @eastasiansarewhitesbutduet9825
      @eastasiansarewhitesbutduet9825 2 years ago +2

      Not really. Well, EE is a subset of Physics.

    • @oussamalaouadi8521
      @oussamalaouadi8521 2 years ago

      @@eastasiansarewhitesbutduet9825
      Yes, EE is a subset of Physics.
      Information theory was coined while solving EE problems (transmission of information, communication channel characterisation and capacity, minimum compression limit, theoretical models for transmission, etc.), and Shannon himself was an EE.
      Despite the extended use of information theory in many fields such as computer science, statistics, and physics, it's historically an EE thing.

    • @nHans
      @nHans 2 years ago +2

      @@oussamalaouadi8521 Dude! Engineering is nobody's subset! It's an independent and highly rewarding profession, and it predates science by several millennia.
      Engineering *_uses_* science. It also uses modern management, finance, economics, market research, law, insurance, math, computing and other fields. That doesn't make it a "subset" of any of those fields.

  • @Lokesh-ct8vt
    @Lokesh-ct8vt 2 years ago +3

    Question: is this entropy in any way related to the thermodynamic one?

    • @temperedwell6295
      @temperedwell6295 1 year ago +3

      I am no expert, so please correct me if I am wrong. As I understand it, entropy was first introduced by Carnot, Clausius, and Kelvin as a macroscopic quantity whose differential, integrated against temperature, gives heat energy. Boltzmann was the first to relate the macroscopic quantities of thermodynamics, i.e., heat and entropy, to what is happening on the molecular level. He discovered that entropy is related to the number of microstates associated with a macrostate, and as such is a measure of the disorder of the system of molecules. Nyquist, Hartley, and Shannon extended Boltzmann's work by replacing statistics on microsystems of molecules with statistics on messages formed from a finite set of symbols.

    • @danielbrockerttravel
      @danielbrockerttravel 5 months ago

      Related but not identical because the thermodynamic one still hasn't been worked out and because Shannon never defined meaning. I strongly suspect that solving those two will allow for a unification.

  • @DrewNorthup
    @DrewNorthup 2 years ago

    The DFB penny is a great touch

  • @Wyvernnnn
    @Wyvernnnn 2 years ago +15

    The formula log(1/p(n)) was explained as if it were arbitrary; it's not.

    • @OffTheWeb
      @OffTheWeb 2 years ago +2

      experiment with it yourself.

  • @danielg9275
    @danielg9275 2 years ago +2

    It is indeed

  • @Mark-dc1su
    @Mark-dc1su 2 years ago +2

    I'm reading Ashby at the moment and we recently covered Entropy. He was very heavy handed with making sure we understood that the measure of Entropy is only applicable when the states are Markovian, or that the state the system is currently in is only influenced by the state immediately preceding it. Does this still hold?

    • @ConnorMcCormick
      @ConnorMcCormick 2 years ago +2

      You can relax the markovian assumption if you know more about your environment. You can still compute the entropy of a POMDP, it just requires guesses at the underlying generative models + your confidence in those models

  • @elixpo
    @elixpo 1 year ago

    This explanation was really awesome

  • @MrVontar
    @MrVontar 2 years ago

    stanford has a page about the entropy in the english language, it is interesting as well

  • @David-id6jw
    @David-id6jw 2 years ago

    How much information/entropy is needed to encode the position of an electron in quantum theory (either before or after measurement)? What about the rest of its properties? More generally, how much information is necessary to describe any given object? And what impact does that information have on the rest of the universe?

    • @ANSIcode
      @ANSIcode 2 years ago +1

      Surely you don't expect to get an answer to that here in a YouTube comment? Maybe start with the wiki article on "Quantum Information"...

  • @TheFuktastic
    @TheFuktastic 2 years ago

    Beautiful explanation!

  • @068LAICEPS
    @068LAICEPS 2 years ago

    Information Theory and Claude Shannon 😍

  • @assepa
    @assepa 2 years ago

    Nice workplace setup, having a coffee machine next to your screen 😀

  • @sdutta8
    @sdutta8 3 months ago

    We claim Shannon as a communication theorist, rather than a computer theorist, but concede with Shakespeare: what’s in a name.

  • @user-fd9rx8dh9b
    @user-fd9rx8dh9b 1 year ago

    Hey, I wrote an article using information theory, I was hoping I could share it and receive some feedback?

  • @Veptis
    @Veptis 2 years ago

    Variance, as the deviation from the expected value, is the interesting concept of statistics; entropy, as the amount of information, is the interesting concept of information theory.
    But I feel like they kind of do the same thing.

  • @filipo4114
    @filipo4114 2 years ago

    1:54 - "A bit more." - "That's right - one bit more" ;D

  • @johnhammer8668
    @johnhammer8668 2 years ago

    how can a bit be floating point

  • @jimjackson4256
    @jimjackson4256 1 year ago

    Actually I wouldn't be surprised at any combination of heads and tails. If it was purely random, why would any combination be surprising?

  • @dixztube
    @dixztube 2 years ago

    I got the tails-tails one on a guess, and now I understand the allure of gambling and casinos. It's fun psychologically.

  • @GordonjSmith1
    @GordonjSmith1 2 years ago +2

    I am not sure that the understanding of 'information theory' has been moved forward by this vlog, which is unusual for Computerphile. In 'digital terms' it might have been better to explain Claude Shannon's paper first; from an 'information professional's perspective' this was not an easy watch.

  • @sanderbos4243
    @sanderbos4243 2 years ago

    I loved this

  • @muskduh
    @muskduh 1 year ago

    thanks

  • @pedro_8240
    @pedro_8240 2 years ago

    6:58 in absolute terms, no, not really, but when you start taking into consideration the chances of just randomly getting your hands on a winning ticket, without actively looking for a ticket, any ticket, that's a whole other story.

  • @danielbrockerttravel
    @danielbrockerttravel 5 months ago +1

    I cannot believe that philosophers, who always annoyingly go on about what stuff 'really means', never thought to try to update Shannon's theory to include meaning. Shannon very purposefully excludes meaning from his analysis of information, which means it provides an incomplete picture.
    In order for information to be surprising, it has to say something about a system that a recipient doesn't know. This provides a clue as to what meaning is: a configuration of a system. If a system configuration is already known, then no information about it will be surprising to the recipient. If the system configuration changes, then the amount of surprise the information contains will increase in proportion.
    In order for information to be informative there must be meanings to communicate, which means that meaning is ontologically prior to information.
    All of reality is composed of networks, and these networks exhibit patterns. In networks with enough variety of patterns to be codable, you create the preconditions for information.

  • @arinc9
    @arinc9 2 years ago

    I didn't understand much because of my bad math, but this was fun to watch.

  • @abiabi6733
    @abiabi6733 2 years ago

    wait, so this is based on probability?

  • @Andrewsarcus
    @Andrewsarcus 2 years ago

    Explain TLA+

  • @juliennapoli
    @juliennapoli 2 years ago +1

    Can we imagine a binary lottery where you bet on a 16-bit sequence of 0s and 1s?

  • @retropaganda8442
    @retropaganda8442 2 years ago +1

    4:02 Surprise, the paper has changed! ;p

  • @jamsenbanch
    @jamsenbanch 1 year ago

    It makes me uncomfortable when people flip coins and don’t catch them

  • @CalvinHikes
    @CalvinHikes 1 year ago

    I'm just good enough at math to not play the lottery.

  • @hypothebai4634
    @hypothebai4634 2 years ago

    So, Claude Shannon was a figure in communications electronics - not computer science. And, in fact, the main use of the Shannon Limit was in RF modulation (which is not part of computer science).

  • @user-js5tk2xz6v
    @user-js5tk2xz6v 2 years ago

    So there is one arbitrary equation, and I don't understand where it came from or what its purpose is.
    And once he said that 0.0000000X is the minimal amount of bits, but then he says he needs 1 bit for information about winning and 0 for losing, so it seems the minimal amount of bits to store information is always 1, so how can it be smaller than 1?

    • @shigotoh
      @shigotoh 2 years ago +1

      A value of 0.01 means that you can store on average 100 instances of such information in 1 bit. It is true that when storing only one piece of information it cannot use less than one bit.
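
A rough sketch of that amortization idea (the 0.001 probability is made up for illustration; an ideal coder, e.g. an arithmetic coder, approaches this average over a long batch):

```python
import math

def entropy_bits(p: float) -> float:
    """Entropy in bits of a yes/no outcome with probability p."""
    q = 1 - p
    return -(p * math.log2(p) + q * math.log2(q))

p = 0.001                    # made-up win probability
h = entropy_bits(p)          # ~0.0114 bits per outcome

print(h)                     # far below 1 bit per outcome
print(1 / h)                 # ~88 outcomes fit in one bit on average
print(math.ceil(1000 * h))   # a batch of 1000 outcomes needs only ~12 whole bits
```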

    • @hhill5489
      @hhill5489 2 years ago

      You typically take the ceiling of that function's output when thinking practically about it, or for computers. Essentially, the information contained was that minuscule number, but realistically you still need 1 bit to represent it. For an event that is guaranteed, i.e. probability 100% / 1.0, there is 0 information gained by observing it... therefore it takes zero bits to represent that sort of event.

    • @codegeek98
      @codegeek98 2 years ago

      You only have fractional bits in _practice_ with amortization (or reliably if the draws are batched).

  • @pedropeixoto5532
    @pedropeixoto5532 1 year ago

    It is really maddening when someone calls Shannon a Computer Scientist. It would be a terrible anachronism if Electrical Engineering didn't exist!
    He was really (a mathematician and) an Electrical Engineer, and not only the father of Information Theory but the father of Computer Engineering (as a subarea of Electronics Engineering), i.e., the first to systematize the analysis of logic circuits for implementing computers in his famous master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", before gifting us with Information Theory.
    CS diverges from EE in the sense that EE cares about the computing "primitives". Quoting Brian Harvey:
    "Computer Science is not about computers and it is not a science [...] a more appropriate term would be 'Software Engineering'".
    Finally, I think CS is beautiful and has a father that is below no one, Turing.

  • @desmondbrown5508
    @desmondbrown5508 2 years ago

    What is the known compression minimum size for things like RAW text or RAW image files? I'm very curious. I wish they'd have given some examples of known quantities of common file types.

    • @damicapra94
      @damicapra94 2 years ago +4

      It's not really the file type, rather the file contents, that determines its ideal minimum size.
      At the end of the day, files are simply a collection of bits, whether they represent text, images, video or more.

    • @Madsy9
      @Madsy9 2 years ago

      @@damicapra94 The content *and* the compressor and decompressor. Different file formats use different compression algorithms or different combinations of them. And lossy compression algorithms often care a great deal about the structure of the data (image, audio, ..).
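
To make the "contents, not type, set the floor" point concrete, here is a rough sketch that estimates a per-byte entropy bound for any file; real compressors also model structure across bytes, so this is only a crude zero-order estimate:

```python
import math
import sys
from collections import Counter

def byte_entropy_bits(data: bytes) -> float:
    """Zero-order entropy in bits per byte, treating bytes as independent symbols."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    data = open(sys.argv[1], "rb").read()   # any file: text, image, whatever
    if data:
        h = byte_entropy_bits(data)
        print(f"{h:.3f} bits/byte, so at least ~{h * len(data) / 8:,.0f} bytes "
              "under this (very crude) independent-bytes model")
```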

  • @AntiWanted
    @AntiWanted 2 years ago

    Nice

  • @blayral
    @blayral 2 years ago

    i said head for the first throw, tail-tail for the second. i'm 3 bits surprised...

  • @Maynard0504
    @Maynard0504 2 years ago

    I have the same coffee machine

  • @hypothebai4634
    @hypothebai4634 2 years ago

    The logs that Shannon originally used were natural logs (base e) for obvious reasons.

  • @sedrickalcantara9588
    @sedrickalcantara9588 2 years ago

    Shoutout to Thanos and Nebula in the thumbnail

  • @sundareshvenugopal6575
    @sundareshvenugopal6575 2 months ago

    Claude shannon theory is not in the least bit true. It is at best a very supercilious view of compression coding. I have come up with scores of methods where 2^n bits of information can be losslessly coded in O(n) bits(order of n bits). So c*n bits of data, where c is a very very small constant can contain at least 2^n bits of information coded losslessly. Not only is massive lossless data compression a reality, but large numbers of terabytes sizes can be represented and manipulated within a few bytes, all mathematical operations performed within those few bytes.

  • @levmarcus8198
    @levmarcus8198 2 years ago

    I want an espresso machine right on my desk.

  • @joey199412
    @joey199412 2 years ago +1

    Amazing video, title should have been something else because I was expecting something mundane, not to have my mind blown and look at computation differently forever.

  • @liambarber9050
    @liambarber9050 2 years ago

    My surprisal was very high @4:58

  • @inuwara6293
    @inuwara6293 2 years ago

    Wow 👍Very interesting

  • @eliavrad2845
    @eliavrad2845 2 years ago

    The "reasonable intuition" about this formula is that, if there are two independent things, such as a coin flip and a lottery ticket, the information about them should be a sort of sum
    H(surprise about a coinflip and a lottery result)=H(surprise about coinflip result)+H(surprise about lottery result)
    but the probabilities should be multiplication
    p(head and win lottery)=p(head)p(win)
    and the best way to get from multiplication to addition is a log
    Log(p(head)p(win))=Log(p(head)) + Log(p(win))

  • @GordonjSmith1
    @GordonjSmith1 2 years ago +3

    Let me add a 'thought experiment'. Some people spend money every week on the lottery; their chance of winning is very small. So what is the difference between a 'smart investment' strategy and an 'information-based' strategy? Answer: rational investors will consider their chances of winning and conclude that for every extra dollar they invest (say from one dollar to two dollars), their chance increases proportionally. An 'information-engaged' person will see that the chance of winning is entirely remote, and increasing the investment hardly improves the chances. In this case they know that in order to 'win' they need to be 'in', but even the smallest amount spent is nearly as likely to win as a larger number of bets. 'No!!' scream the 'numbers' people, but 'Yes!!!' scream anyone who has considered the opposite case. The chance of winning is so small that paying for more lotto numbers really does not do much to improve the payback from entering; better to be 'just in' than 'in for a lot'...

  • @ilovedatfruitybooty9546
    @ilovedatfruitybooty9546 2 years ago

    7:11

  • @KX36
    @KX36 2 years ago

    after all that you could have at least given us some lottery numbers at the end

  • @anorak9383
    @anorak9383 2 years ago +2

    Eighth

  • @rmsgrey
    @rmsgrey 2 years ago +2

    "We will talk about the lottery in one minute".
    Three minutes and 50 seconds later...

  • @karavanidet
    @karavanidet 1 year ago

    Very difficult :)

  • @atrus3823
    @atrus3823 2 years ago

    This explains why they don't announce the losers!

  • @filda2005
    @filda2005 2 years ago

    8:34 No one, really no one, has been rolling on the floor?
    LOOL, and in addition the cold-blooded face to it. It's like the Visa card ad: you can't buy that with money.

  • @mcjgenius
    @mcjgenius 2 years ago

    wow ty🦩

  • @thomassylvester9484
    @thomassylvester9484 2 years ago

    “Expected amount of surprisal” seems like quite an oxymoron.

  • @zxuiji
    @zxuiji 2 years ago +1

    Hate to be pedantic, but a coin flip has more than 2 possible outcomes; there's the edge, after all. It's the reason why getting either side is not a flat 50%.
    Likewise with dice: they have edges and corners, which can also be an outcome. It's just made rather unlikely by the air circulation and the lack of resistance versus the full drag of the landing zone. By full drag I mean the earth dragging it along while rotating, and by lack of resistance I mean that not enough air molecules slam into it through their own drag state, thereby allowing it to just roll over/under the few that do.

    • @galliman123
      @galliman123 2 years ago +1

      Except you just rule those out and skew the probability 🙃

    • @roninpawn
      @roninpawn 2 years ago +1

      There is no indication, whatsoever, that you "hate to be pedantic" about this. ;)

    • @zxuiji
      @zxuiji 2 years ago

      @@roninpawn ever heard of OCD, it's similar, I couldn't ignore the compulsion to correct the info

    • @zxuiji
      @zxuiji 2 years ago

      @@galliman123 except that gives erroneous results, the bane of experiments and utilization

    • @JansthcirlU
      @JansthcirlU 2 years ago

      doing statistics is all about confidence intervals, the reason why you're allowed to ignore those edge cases is that they only negligibly affect the odds of those events you are interested in

  • @TheCellarGuardian
    @TheCellarGuardian 2 years ago +1

    Great video! But terrible title... Of course it's important!

  • @kofiamoako3098
    @kofiamoako3098 2 years ago +1

    So no jokes in the comments??

  • @ThomasSirianniEsq
    @ThomasSirianniEsq 10 months ago

    Wow. Reminds me how stupid I am

  • @CandyGramForMongo_
    @CandyGramForMongo_ 2 years ago

    Lies! I zip my zip files to save even more space!

  • @artic0203
    @artic0203 2 years ago

    i solved AI join me now before we run out of time

  • @BAMBAMBAMBAMBAMval
    @BAMBAMBAMBAMBAMval 9 months ago

    A bit 😂

  • @atsourno
    @atsourno 2 years ago +3

    First 🤓

    • @Ellipsis115
      @Ellipsis115 2 years ago +1

      @@takotime NEEEEEEEEEEEEEEEEEERDS

    • @atsourno
      @atsourno 2 years ago

      @Rubi ❤️

  • @elijahromer6544
    @elijahromer6544 2 years ago

    IN FIRST

  • @h0w1347
    @h0w1347 2 years ago

    thanks