Markov Chain Compression (Ep 3, Compressor Head)

  • Published 28 Oct 2024

COMMENTS • 51

  • @GoogleDevelopers
    @GoogleDevelopers  10 years ago +116

    *Compressor Head Episode 3: Markov Chains*
    #compressorhead #everybitcounts #developers
    At the cutting edge of compression algorithms sits the lonely kingdom of Markov Chains. These algorithms take an Artificial Intelligence approach to compression by allowing the encoder and decoder to ‘predict’ what data is coming next. In this episode, Colt McAnlis talks about how these magical algorithms compress data, and why some think that they are the future of compression.

    • @eunicelarson697
      @eunicelarson697 10 years ago +1

      What the heck... 3... chains, magical, artificial intelligence...???!!!
      I don't think I need that.
      I have my brain, thanks.
      Lolololol

    • @arpyan
      @arpyan 10 years ago +2

      It gets fast and complex in the end, but at least the fundamentals are easily understandable.
      It was a great show; I learned a lot of interesting and useful things. Thank you, Colt McAnlis.

    • @ZelenoJabko
      @ZelenoJabko 9 years ago

      Google Developers You didn't actually move from 12 to 3 bits: in the first case you have a starting and an ending state encoded; in the second, you do not.
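
      A minimal Python sketch of the state-indexing idea under discussion, with hypothetical state names (not the video's actual tables): once the current state is known, each step needs only enough bits to pick among that state's outgoing transitions, and, as the comment above notes, the starting state still has to be agreed on or transmitted separately.

          import math

          # Hypothetical transition table: state -> list of reachable next states.
          transitions = {
              "batting": ["strike", "ball", "hit"],
              "strike":  ["batting", "out"],
              "ball":    ["batting", "on_base"],
              "hit":     ["on_base", "out"],
          }

          def bits_for(state):
              """Bits needed to index one outgoing transition of this state."""
              return max(1, math.ceil(math.log2(len(transitions[state]))))

          def encode_path(path):
              """Encode a walk through the chain as per-state transition indices."""
              out = []
              for cur, nxt in zip(path, path[1:]):
                  idx = transitions[cur].index(nxt)
                  out.append(format(idx, f"0{bits_for(cur)}b"))
              return "".join(out)

          # Far fewer bits than indexing into the full state set at every step.
          print(encode_path(["batting", "ball", "batting", "strike", "out"]))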

  • @BasantSiingh
    @BasantSiingh 10 years ago +53

    The second half of this episode went straight above my head.

    • @pbezunartea
      @pbezunartea 10 years ago +3

      Basant Singh Matharu Join the club! :P

    • @OmegaLok
      @OmegaLok 10 years ago +8

      This episode is not in line with the previous two. In the second half he's talking too fast.

    • @ajayramanathan7589
      @ajayramanathan7589 10 years ago

      Go read a paper; I think these videos are more like giving an intro to the topic. Something like a wiki.

    • @OmegaLok
      @OmegaLok 10 years ago

      Thank you for a very useful comment. Keep it constructive. Your contribution is very much appreciated.

    • @carlosmontano8277
      @carlosmontano8277 2 years ago

      I feel so good now that I know I'm not the only one, thanks for your very useful comment!

  • @419
    @419 2 years ago

    I like how you take each section into a different room or area and display the information in different ways. That really helps make each piece of information unique and easier to remember.

  • @DaveWhoa
    @DaveWhoa 9 years ago +16

    Good videos, but this one is a bit harder to learn from because you fly through everything so quickly; there's not much time to think about what you've said before you've already moved on to the next part.

  • @LeYoCosK
    @LeYoCosK 10 years ago

    I come from a heavy probability background, and with this being my first time handling any compression problem, I'm glad to meet Markov chains.
    Thanks for the insights, Colt McAnlis.

  • @CasperBang
    @CasperBang 10 years ago +12

    Love these videos; however, I had to give up on the baseball thing. Less culturally dependent examples, e.g. the states of a CD player, would cater to a non-American audience as well.

  • @denis419
    @denis419 7 years ago +1

    Looks like most of the video is a presentation of the "Adaptive Huffman Coding" technique, and most of the compression comes out of it, although the part with the Markov chain is still important.
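
    For readers who got lost there, here is a minimal Python sketch of the idea behind adaptive Huffman coding, with one big simplification: it naively rebuilds the whole tree after every symbol, whereas real adaptive Huffman coders (FGK, Vitter) update the tree incrementally and use an escape symbol for never-seen input instead of pre-seeding the alphabet as this toy does.

        import heapq
        from collections import Counter

        def huffman_code(counts):
            """Build a prefix code from symbol counts."""
            if len(counts) == 1:
                return {next(iter(counts)): "0"}
            heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
            heapq.heapify(heap)
            tick = len(heap)  # unique tie-breaker so dicts are never compared
            while len(heap) > 1:
                n1, _, c1 = heapq.heappop(heap)
                n2, _, c2 = heapq.heappop(heap)
                merged = {s: "0" + b for s, b in c1.items()}
                merged.update({s: "1" + b for s, b in c2.items()})
                heapq.heappush(heap, (n1 + n2, tick, merged))
                tick += 1
            return heap[0][2]

        def adaptive_encode(text):
            """Encoder and decoder keep identical counts, so codes stay in sync."""
            counts = Counter(sorted(set(text)))  # toy: pre-seed alphabet at 1 each
            bits = []
            for sym in text:
                bits.append(huffman_code(counts)[sym])  # code from counts so far
                counts[sym] += 1                        # update, as a decoder would
            return "".join(bits)

        print(adaptive_encode("TOBEORNOTTOBE"))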

  • @johnb5033
    @johnb5033 10 years ago

    That was nice and clear! Thanks for taking the time to make this.

  • @Businessaccountingllc
    @Businessaccountingllc 10 years ago

    Awesome presentation on all three videos... love the content and the personality!! Keep on making them!

  • @alexparker7791
    @alexparker7791 7 years ago +1

    At the end, the whiteboard shows you're encoding "AABAC...", so I would expect to see an A in a context-0 table and another A in a context-1 table (because of the A→A transition)... but there doesn't appear to be anything like that. It almost seems like the encoder just magically knows when to descend into a new context and when to reset back to context 0.
    When do you reset back to context 0? If you never do, you'll always open a new context, start with an escape code, and output a symbol to the data stream for every input symbol, leading to a compression ratio of 9/8: a net loss.
    How compressible is the data stream itself?
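
    For what it's worth, here is how the escape bookkeeping usually works in a PPM-style coder, sketched in Python as a generic order-1 simplification (not necessarily exactly what the whiteboard shows): there is no periodic reset to context 0. The coder always tries the longest context first and escapes downward only when the symbol is unseen in that context, so escapes become rare as the tables fill up and the ratio improves past that early 9/8 overhead.

        from collections import defaultdict

        def ppm_order1_events(text):
            order1 = defaultdict(set)  # context symbol -> symbols seen after it
            order0 = set()             # symbols seen anywhere
            events, prev = [], None
            for sym in text:
                if prev is not None and sym in order1[prev]:
                    events.append(("ctx1", prev, sym))         # predicted: cheap
                elif sym in order0:
                    events.append(("esc", "ctx0", sym))        # escape to order 0
                else:
                    events.append(("esc", "esc", "lit", sym))  # brand-new symbol
                if prev is not None:
                    order1[prev].add(sym)
                order0.add(sym)
                prev = sym
            return events

        # For "AABAC": the second A is coded from context 0, and an A does end
        # up in the A-context table, ready for the next A->A transition.
        for event in ppm_order1_events("AABAC"):
            print(event)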

  • @alexparker7791
    @alexparker7791 7 years ago

    If you wanted to precompute an initial set of Markov chains using a corpus in a given language, how would you combine those probabilities in the encoder and decoder with the statistical information from the input stream? Maybe some experimentation is in order.
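
    One simple way to run that experiment, sketched in Python: seed the model's transition counts from a corpus with some prior weight, then keep updating them from the live stream. The hard requirement is that encoder and decoder seed from the same corpus with the same weight, so their models stay identical.

        from collections import Counter

        def seeded_model(corpus, weight=2):
            """Bigram counts pre-trained on a corpus, scaled by a prior weight."""
            counts = Counter()
            for a, b in zip(corpus, corpus[1:]):
                counts[(a, b)] += weight
            return counts

        model = seeded_model("the quick brown fox")
        stream = "the thing"
        for a, b in zip(stream, stream[1:]):  # live updates from the input
            model[(a, b)] += 1
        print(model[("t", "h")])  # corpus prior plus stream evidence, combined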

  • @dARKf3n1Xx
    @dARKf3n1Xx 10 years ago +6

    You are a bit fast, man :(
    I am too dumb, need to catch up with you... :)

  • @wisdom538
    @wisdom538 2 months ago +1

    2024 here

  • @aoeu256
    @aoeu256 5 years ago

    Is there a way that compression could use other files or shared data? For example, it could use an English dictionary for compression. For videos of, say, Mario, it could figure out how Mario is drawn and share that on the cloud; it would then be cached on the local computer, so if you frequently download Mario videos it would only have to download Mario once and reuse it for all Mario videos. It could also, with AI, replace Mario with a stick figure and let you play the video with stick figures if a shared Mario isn't available.
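
    A small version of this idea already exists as preset dictionaries: both sides hold the same shared blob in advance (like the cached Mario data imagined above), and the compressor may reference it. Here is a sketch using Python's zlib, whose zdict parameter does exactly this (zstd offers a more elaborate trained-dictionary variant):

        import zlib

        shared = b"the quick brown fox jumps over the lazy dog"
        payload = b"the quick brown fox is quick, the dog is lazy"

        comp = zlib.compressobj(zdict=shared)
        packed = comp.compress(payload) + comp.flush()

        decomp = zlib.decompressobj(zdict=shared)
        assert decomp.decompress(packed) == payload

        # Typically smaller than compressing without the shared dictionary.
        print(len(packed), len(zlib.compress(payload)))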

  • @jaypandya1154
    @jaypandya1154 10 years ago

    Colt McAnlis In the last problem you said it's 46 bits long, but it looks like only 45 bits, and I didn't understand the part at 17:11 where you encoded ABC the first time.

  • @vishalparmar988
    @vishalparmar988 4 years ago +1

    What is going on!! Can anyone tell me?? I don't understand Ep. 3, not at all.

  • @GuillermoValleCosmos
    @GuillermoValleCosmos 8 years ago

    The video isn't in sync with the voice at around 16m, right?

  • @theyruinedyoutubeagain
    @theyruinedyoutubeagain 10 years ago +2

    Just lovely. Could've used some Silicon Valley (Mike Judge's new show) references, though. :p

    • @marcinmiszczyk
      @marcinmiszczyk 10 years ago +1

      And why do you think they are doing it in the first place? :)

    • @theyruinedyoutubeagain
      @theyruinedyoutubeagain 10 years ago

      Ah, touché.

    • @ColtMcAnlis
      @ColtMcAnlis 10 years ago +5

      I was doing compression _before_ it was cool ;)

    • @theyruinedyoutubeagain
      @theyruinedyoutubeagain 10 years ago

      Colt McAnlis I'm not a smart man

    • @marcinmiszczyk
      @marcinmiszczyk 10 years ago +1

      Colt McAnlis So do you think the series is somehow based on your life? :D I would be proud of myself.

  • @SAnDAnGE
    @SAnDAnGE 10 years ago

    Colt McAnlis At 12:43, shouldn't the output change to 100?

    • @ColtMcAnlis
      @ColtMcAnlis 10 years ago

      Nope. Once a VLC change occurs, we don't re-process the whole stream; we simply move forward with the encoding state at that point. This way, the encoder and decoder both know how to handle the problem.
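
      A Python sketch of why that works, using a toy adaptive code (hypothetical, not the video's actual VLC tables): the decoder applies the same update-after-each-symbol rule, so any code change takes effect from the next symbol onward on both sides, with nothing re-processed.

          def toy_codes(counts):
              """Toy rule: more frequent symbols get shorter prefix-free codes."""
              ranked = sorted(counts, key=lambda s: (-counts[s], s))
              return {s: "0" * i + "1" for i, s in enumerate(ranked)}

          def encode(text, alphabet):
              counts = {s: 1 for s in alphabet}
              bits = ""
              for sym in text:
                  bits += toy_codes(counts)[sym]  # use the current table...
                  counts[sym] += 1                # ...then update going forward
              return bits

          def decode(bits, length, alphabet):
              counts = {s: 1 for s in alphabet}
              out = ""
              while len(out) < length:
                  table = {v: k for k, v in toy_codes(counts).items()}
                  code = ""
                  while code not in table:
                      code, bits = code + bits[0], bits[1:]
                  out += table[code]
                  counts[table[code]] += 1        # mirror the encoder's update
              return out

          msg = "ABAACA"
          assert decode(encode(msg, "ABC"), len(msg), "ABC") == msg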

  • @NoctumusTV
    @NoctumusTV 8 years ago +2

    This video could easily be compressed to around 50% of its original size by simply removing all the places where he says the word "actually" ;)

  • @DilSeDesee
    @DilSeDesee 10 years ago +4

    Bit complex :(

  • @DerekHaasHaas
    @DerekHaasHaas 9 years ago +4

    The 9 people that disliked this were beaten up by Markov.

  • @matkallada9316
    @matkallada9316 10 years ago +1

    Baha =), this series is awesome

  • @diegonayalazo
    @diegonayalazo 3 years ago

    Thanks

  • @birajdpatel
    @birajdpatel 10 years ago +1

    Dude, you are very fast; I couldn't understand the 2nd part of Ep. 3.

  • @msclrhd
    @msclrhd 10 years ago

    For the baseball state example, can't you just use 1 bit to encode the data? You removed the transitions from the out/on-base states back to the batting states, so also remove them from the data stream (they are redundant information). Thus, you only need to know whether a batter transitioned to on base (0) or out (1), which can be represented as a single bit. Everything else is implicit.
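
    A tiny Python sketch of that observation (hypothetical state names, since the video's exact diagram isn't reproduced here): if every plate appearance implicitly starts from the batting state, only the outcome needs transmitting, and one bit covers it.

        outcomes = {"on_base": "0", "out": "1"}
        decode = {v: k for k, v in outcomes.items()}

        game = ["on_base", "out", "out", "on_base"]
        stream = "".join(outcomes[o] for o in game)  # -> "0110"
        assert [decode[b] for b in stream] == game   # everything else implicit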

  • @concavex
    @concavex 8 years ago +2

    09:22 T-BONER!!

    • @batlin
      @batlin 7 years ago +1

      The saga of T-BONER begins even earlier at 7:54 :D

  • @macromeda3746
    @macromeda3746 10 years ago +2

    The hardest to understand so far.

  • @jameswasil
    @jameswasil 10 years ago +3

    These things in this video are just a simple rehash of what I've been doing privately and working on since I was 11 years old. This is a very basic video and leaves out a lot of important information about data logic and entropy.
    Trying to make it "hip" and "trendy" is not winning anyone over, and the "finger shaking" at the viewer via the camera is really annoying, dude. Seriously. We're already paying attention, so you don't need to wave a stupid cardboard sign at us to "pay attention" and annoy us further, either. Thanks.
    P.S.: You don't run out of memory for Markov chains if you write functions in advance to transition system memory to virtual memory and page that data, by address space, to disk (and then read it back to a memory buffer when needed).
    Yes, this requires some file organization and balancing of data, but it's worth it and far more efficient than expecting people to add gigabytes of RAM to their system just to analyze a file or series of data and complete the algorithm.
    Granted, it's not as fast (unless you have an SSD to make up for the read/write time), but it works and allows much higher precision in your trees as they grow than being limited to available system RAM. Whether you're using Markov chains, PPM, or hybrids with BWT and LZ, it's better to do it this way than all in RAM, unless you're on restricted hardware (like PIC controllers or other embedded systems) where you can't expect storage space to exist beyond a finite set. On PCs and other systems, you have a lot more flexibility, and you should make use of it for efficiency's sake if you're not going just for speed in your algorithm.

    • @ColtMcAnlis
      @ColtMcAnlis 10 years ago +4

      Check out Episode 1 (g.co/compressorhead), which discusses those topics.
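
    A minimal sketch of the disk-backed bookkeeping described in the P.S. above, assuming Python's shelve module as the paging layer (the dbm backend and the OS page cache then do the memory-to-disk shuffling that the comment describes doing by hand):

        import shelve

        def count_bigrams_on_disk(stream, path="markov_counts.db"):
            """Accumulate Markov transition counts in a dict-like store on disk."""
            with shelve.open(path) as counts:
                prev = None
                for sym in stream:
                    if prev is not None:
                        key = prev + "\x00" + sym        # shelve keys must be str
                        counts[key] = counts.get(key, 0) + 1
                    prev = sym
            return path

        count_bigrams_on_disk("ABAACABB")
        with shelve.open("markov_counts.db") as counts:
            print(dict(counts))  # e.g. {"A\x00B": 2, "B\x00A": 1, ...}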

  • @heavensrevenge
    @heavensrevenge 10 years ago +2

    Ahem, this method of teaching is just terrible... he is speaking as if he's trying to inform a trucker or a pro chef about how this stuff works; the style of explanation and the intended audience (hopefully programmers) are a huge impedance mismatch. I can't get any of this because I can't stand how he tries to present it, and I can't stoop down far enough to be on the same stupidity page as him. I'll go read about this the old-fashioned way.

  • @bonbonpony
    @bonbonpony 8 years ago +1

    00:18 Yeah... you should rather work on code that would allow me to find my own UA-cam comments from a week ago, because as of now it is impossible ;P (yeah, search engine my ass :P)

  • @Jirayu.Kaewprateep
    @Jirayu.Kaewprateep 1 year ago

    📺💬 We are moving from 12-bit to 3-bit transmission, and he will show you how the Markov model helps in this scenario.
    🥺💬 In the Markov model the state of the message is important; you do not need to create multiple flags for information when the state of the messages is considered and the data flow is verified and correct.
    🧸💬 It is as simple as in AI: your jumping character cannot jump across itself in the opposite direction. The message state is the same; when you request inquiry data you have only one message, accept the info or wait until the query set is ready.
    🐑💬 Compression at that rate is made possible by encoding data into multiple states, ordering its sequences, and creating message-flow diagrams; when represented as a matrix it becomes a set of possibilities. (As explained in telecommunication computing.)
    📺💬 To make sure, Yui is talking about the output streams.
    🥺💬 The next sequence is caused by the previous result, or an update from the result.
    🧸💬 That is correct, and for the first letter you do not need to make its prediction 100% accurate, because a 3-4 syllable evaluation performs an update of the entire word with a set of probabilities matching the word or sentence; that is how the Markov model wins at speech or text sequences.
    🐑💬 You may try randomly saying something wrong at the beginning or end of a sentence, but the Markov model tries to match the most recognizable word for your ears; applied to telecommunications, the output bit stream gives error correction, as you see with ISBN-code error correction.

  • @codycanaday3836
    @codycanaday3836 10 years ago +1

    Hey Google, DO YOUR JOBS AND FIX YOUR MESSED-UP APPS LIKE PLAY MUSIC, youtube, etc.

    • @SamuelAttard
      @SamuelAttard 10 years ago +3

      It's funny, because Google or their developers will probably never read this. And even if they did, what you have said would not help them fix the supposed problem at all.
      If something truly doesn't work, submit a bug report with details of what is broken. Personally, I have never found a problem with the Play Music and UA-cam apps. They work perfectly :P