Deduplication for Dummies - What is deduplication?

  • Published 18 Apr 2011
  • Adam Sell of Nine Technology takes you through a very simple run-down of what deduplication is with regard to online backup, and why block-level deduplication is more efficient and powerful than file-level.
  • Science & Technology

COMMENTS • 53

  • @faisalfares5112
    @faisalfares5112 3 years ago +9

    I have spent a few hours watching many people just to understand how deduplication works... and this simple video is the best video I have ever seen.
    You have explained the whole thing in just a few minutes so any layman like me can understand.
    Thank you very much.

  • @luislednick1413
    @luislednick1413 4 years ago +2

    Very simple and straight explanation. Good job.

  • @mrkevinlwright
    @mrkevinlwright 12 years ago

    Awesome summation! I'm in the process of selecting a SAN solution and this simple video added a great piece of knowledge to my overall understanding!
    Thanks!

  • @mindsting
    @mindsting 9 years ago +3

    I know this video is old, but it is very well done! Thanks! I hope it is okay that I share this to help train storage/DC sales people who need help on this topic so they truly can 'get it.' Will always make sure you get credit for sure! Great work.

  • @pakodasingh
    @pakodasingh 3 years ago +1

    In the MS world, each person's original file holds a pointer to the blocks, called a reparse point. And the place on the server where all the unique blocks get stored is called the chunk store... awesome video.

  • @johnmclaughlin4218
    @johnmclaughlin4218 9 years ago

    Clear and concise explanation - thanks

  • @zeeshan-tp5hp
    @zeeshan-tp5hp 4 years ago

    Wow. You have explained this so well. 👍

  • @encikbett
    @encikbett 9 years ago

    Thank you. Now I have more understanding.

  • @nph24
    @nph24 11 years ago

    This helped me so much! Great job!!!!

  • @kannan991
    @kannan991 9 years ago

    Great explanation. Thanks

  • @AryaPrasetya
    @AryaPrasetya 7 years ago

    well explained, thank you!

  • @fordgt8847
    @fordgt8847 5 years ago

    Awesome explanation!

  • @MrMariog2681
    @MrMariog2681 12 years ago

    Thank you. Well done.

  • @ringhp
    @ringhp 11 years ago

    Well done. Thank you!

  • @PankajVerma-fw5ww
    @PankajVerma-fw5ww 10 years ago

    Thanks for sharing this info...

  • @borkisoufiane
    @borkisoufiane 1 year ago

    Perfect, so simple to understand, thank you.

  • @happysnapperman
    @happysnapperman 13 years ago

    Explained in plain English. Well done. Thanks.

  • @TPHBLIB
    @TPHBLIB 11 years ago

    Thank you very much!

  • @nemonemo6285
    @nemonemo6285 2 years ago

    Perfect. Thank you.

  • @newajay100
    @newajay100 5 years ago

    Thanks, Nicely Explained

  • @avimzrh
    @avimzrh 10 years ago

    great explanation.

  • @karthikpillai2378
    @karthikpillai2378 11 years ago

    Thanks, Well Explained.

  • @ericmiller7213
    @ericmiller7213 10 years ago

    That was Bad Ass.... great explanation and summation

  • @Therockingww
    @Therockingww 1 year ago

    thank you !!

  • @chrisengelbrecht9996
    @chrisengelbrecht9996 9 years ago

    Great! Thanks

  • @MelroyvandenBerg
    @MelroyvandenBerg 4 months ago

    Still relevant to this day. Remember that kids ;)

    • @MelroyvandenBerg
      @MelroyvandenBerg 4 months ago

      PS: you also have DB deduplication, e.g. via a memory cache, so it applies in other parts of the software or in a network too. Not only on disk.

  • @nduwana
    @nduwana 12 years ago

    Nice work. Any recommendation on which backup software is better than the others for block-level dedupe?

  • @abcd123181
    @abcd123181 10 years ago +3

    We understand file level, but in block-level deduplication, if anyone changes their data, how will it get stored in the data center? And what if no two people have the same data?

  • @mihas101
    @mihas101 11 years ago

    Well done. That is understandable even if you're non-technical like me. Thanks

  • @sambitbehera8835
    @sambitbehera8835 4 years ago

    That helped a lot

  • @ewasteonline
    @ewasteonline 11 years ago

    Nice job

  • @troller4jesus
    @troller4jesus 7 years ago +2

    Doesn't the block change if a file within the block changes?

  • @anisaa8752
    @anisaa8752 2 years ago

    good explanation

  • @reilagji4752
    @reilagji4752 3 years ago

    why does every video from the early '10s look like it was the 80s. Boy has technology changed us

  • @poloboy
    @poloboy 4 years ago

    thanks brah

  • @anamfarooqui3795
    @anamfarooqui3795 2 months ago

    Thanks

  • @mubbashirjavaid7486
    @mubbashirjavaid7486 4 years ago

    Great

  • @karatbarsinfo345
    @karatbarsinfo345 9 years ago

    nice

  • @gatewayer1
    @gatewayer1 11 years ago

    Hi! The only thing you don't tell us: how does deduplication on block data know where each block goes? I mean, 12345 is just a row of numbers for 1) different users and 2) different blocks; so it's not Me=12345M; Ted=12345 ... so, how does deduplication know where each block goes? And how much space does this information require compared to the original block information?
    Hope you can explain that, maybe either in a video or with a comment, THANKS!

  • @metalaarif
    @metalaarif 8 years ago

    Great explanation. Thanks, but I have a question regarding block-level deduplication. If 2 guys have the same song and a 3rd guy has a different song, how would block-level deduplication work there? Let's say 2 of them have Coldplay - Yellow and the other guy has Iron Maiden - Fear of The Dark. How would block-level work there?

    • @AmazinglyAwkward
      @AmazinglyAwkward 6 years ago

      Maybe the file is recognized as a music file, so one block saved across all of the files would say that it IS a music file, but then there would be separate blocks for the artist and songs?

    • @robbstark8692
      @robbstark8692 5 years ago

      I think a music file was a bad example, only because it makes it hard to visualize the bits being used to dedupe and most music files are already compressed. There are some other videos that show how it works, but basically the software recognizes repeated patterns of bits in the data being backed up. Using this video's example, let's say block 1 is 0110 in binary and block 2 is 0101. Maybe it's just metadata or file headers that tell the computer it's an MP3 file (I'm not sure, just trying to use an example). This wouldn't change for ANY MP3 file being backed up, so it would be redundant to store a copy of those for every MP3 file being backed up. Block 3 could be 1010 in binary, block 4 1100, and block 5 1001. These could contain the specific audio codec being used, different bit rates, or other components of an MP3 file that vary from file to file. Let's say block 3 says the bitrate is 128 Kbps, block 4 says the bitrate is 160 Kbps, and block 5 says the bitrate is 256 Kbps. The rest of the file is contained in 100s of other blocks, so those blocks will be largely unique and couldn't be deduped very well (compressed file formats like MP3 are terrible at deduping, and many times the file actually becomes larger). These binary patterns are stored in a dedupe engine used by the software, and every time a specific pattern is recognized the software points to the location in the file and determines what binary pattern can be inserted into that location in the block.
      All 3 files are MP3s, so we don't need to keep saving that part of the file, but we do need to know the other pieces of information to ensure the file is usable when it's restored. Over time, these redundancies can become huge amounts of data. We don't need to save block 1 and 2 for every file, we simply need to know what block 1 and block 2 look like (0110 and 0101) and what they represent. Then, when the deduplication engine sees these patterns, it knows it can skip backing them up and use a pointer to indicate where the pattern exists in the specific file. I'm far from an expert, that's just my understanding of how the deduplication process works.

    • @pakodasingh
      @pakodasingh 3 years ago +1

      Deduplication only works on duplicate files not on unique files.
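
The pointer idea discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not how any particular backup product works: the `BlockStore` class, the 4-byte block size, the file names, and the made-up `HDR1HDR2` header are all assumptions chosen for readability, and SHA-256 stands in for whatever fingerprint a real engine might use.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical store: each unique block is kept once, files keep pointers."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def backup(self, name, data):
        recipe = []
        for block in split_blocks(data):
            fp = hashlib.sha256(block).hexdigest()
            # Store the block only if this fingerprint has not been seen yet.
            self.blocks.setdefault(fp, block)
            recipe.append(fp)
        self.files[name] = recipe

    def restore(self, name):
        # Follow the pointers in order to rebuild the original bytes.
        return b"".join(self.blocks[fp] for fp in self.files[name])


store = BlockStore()
header = b"HDR1HDR2"  # made-up header shared by all three files (2 blocks)
store.backup("a.bin", header + b"AAAABBBB")
store.backup("b.bin", header + b"CCCCDDDD")
store.backup("c.bin", header + b"AAAABBBB")  # same content as a.bin

logical_blocks = sum(len(recipe) for recipe in store.files.values())
print(logical_blocks, "logical blocks,", len(store.blocks), "stored blocks")
```

In this toy run three 4-block files are backed up while each distinct block is stored only once. A file whose blocks are all unique would gain nothing, which matches the point above about compressed formats such as MP3.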

  • @AmazinglyAwkward
    @AmazinglyAwkward 6 years ago

    This is a great explanation. I just have 1 question. Why?

    • @SupremeUnicorn
      @SupremeUnicorn 5 years ago

      Storage and performance optimization.

  • @khaledsoliman7936
    @khaledsoliman7936 8 years ago

    What is deduplication?

  • @ovimt
    @ovimt 11 years ago

    You keep a table, transparent to the user, that maps the logical view (what you think is on the backup storage) to the physical one (what you actually have on the storage). Let's take the last case from the vid: you think you have 1234, 1245, 1235, but in fact you have 12345. The mapping table might contain: Block 1 represents the 1st, 5th and 9th logical blocks. Block 2 represents the 2nd, 6th and 10th logical blocks.... Block 5 represents the 8th and 12th logical blocks.
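
The mapping table described in this comment can be written out directly. The following is a minimal sketch using the video's 1234 / 1245 / 1235 example; the user names and the `rebuild` helper are hypothetical, and plain integers stand in for real data blocks.

```python
from collections import defaultdict

# What each user believes is stored (the logical view), using the video's numbers.
backups = {
    "user_a": [1, 2, 3, 4],
    "user_b": [1, 2, 4, 5],
    "user_c": [1, 2, 3, 5],
}

# Lay the logical blocks end to end: positions 1..12.
logical = [block for blocks in backups.values() for block in blocks]

# Mapping table: physical block -> logical positions it represents (1-based).
mapping = defaultdict(list)
for position, block in enumerate(logical, start=1):
    mapping[block].append(position)

physical = list(mapping)  # the blocks actually kept on storage


def rebuild(table, total):
    """Recreate the logical view from the physical blocks and the table."""
    out = [None] * total
    for block, positions in table.items():
        for position in positions:
            out[position - 1] = block
    return out


for block, positions in mapping.items():
    print("Block", block, "represents logical blocks", positions)
```

The table needs one position entry per logical block (12 here). Because a real block is typically kilobytes in size while a position entry is only a few bytes, the table is usually small next to the data it saves, though the exact overhead depends on the product.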

  • @rajatsharma01
    @rajatsharma01 11 years ago

    Grossly underestimated file deduplication: forgot Rabin fingerprints, chunking files?

  • @indawgwetrust4255
    @indawgwetrust4255 8 years ago

    i suggest a shave and a haircut.

  • @salander8729
    @salander8729 1 year ago

    Doesn't have indian accent; watchable.
