As some people have noted below, in the word cloud "bz2" appears incorrectly as "b2z". But all is correct in the BZip 2 section at 7:11. And I have corrected the thumbnail. Also note that GZip compressed TAR files have the extensions tgz or tar.gz, not tar.tgz, as noted in the third bullet point around 7:53. My apologies. This video was checked, corrected and re-rendered many times . . . but at the moment, things like this are slipping through. :(
I've actually seen tar.tgz now and then, and wondered WTF. Apparently there's some software out there that uses it as the extension for a tar.gz file, but I have no idea what it might be. Oh, another use for the software, especially WinRAR and 7zip: they can unpack a lot of installers (EXE, CAB, RPM, DEB, various others).
@@ThatMattWhiteThere is no way to correct anything in a video! :) I've fixed the thumbnail version, but cannot change what appears in the video, hence he message here. :)
It's worth mentioning that media files like pictures, audio and video formats are usually already compressed so additional file compression is not very effective.
Correct. I would only put all of them together into a container if needed. But other than that, it's not worth wasting time on compression and decompressing.
So it is but I still would like a video on how video and picture compression works and and the evolution of it. Especially the old ones. Just the other day I sat wondering of the difference between the mars rover photos and the ones sent by the Viking probes in the -70:s. Hardware vs software and so on. To me it appears that technology evolution is now a days is accellerating exponentially compared with pre WW2.
I still zip them if, for example want to copy a large amount of small files to a mobile device. On Linux, USB file transfer is somewhat flaky, especially with Android devices which use MTP. So instead of copy a hundred small jpegs, I zip them and copy one bigger file that way. Even if I use a wireless file transfer method like Warpinator or KDE Connect, it's still more sane to have only one file transferred and not getting hundreds of notifications of receiving/getting a file. :-)
Compressing a bunch of pic or vid files into one archive is useful for uploading/downloading multiple files to/from the internet, even if said files are already compressed by default as a feature of there format/type. This is why when you download multiple photos/vids all at once from Google Photos, it first puts them into a single ZIP archive file on the server side before downloading them to your computer.
I'm not gonna lie, one problem "as a computer guy myself", Is getting many of my customers to understand just how ZIP files work. Your video offers a great explanation on how they work. I have to admit, I've used some of these exact "teaching tools", and sometimes people just don't get it. I feel that 90% of the battle of getting people, including senior citizens, is if you can just teach them to understand where things "LIVE" in file explorer. This is an extremely helpful video that I will send to some of my more "semi-educated" PC customers who are just starting to understand things. As always, keep up the AWESOME work Mr. Barnatt!
Reminds me of the Verge article last year about professors noticing some of their students didn't really know what files and folders are, they saw files as one bucket you search in. It is the simplest explanation, I get it, if all they ever did was click on apps and produce 1 single document occasionally.
I've had some luck using a suitcase analogy. A ZIP archive is like a suitcase, and a compressed archive is a suitcase you've sat on to get it to close.
I am a boomer. This means I grew up with computers. I was 27 when I owned my first home computer (Atari 400). I do remember in the 1990s trying to explain computers and the internet to my senior citizen parents. It required that they understand concepts that were never before needed. It was difficult. What I don't understand is why today's senior citizens seem so STUPID when it comes to computers. Windows 3 was introduced well over 30 years ago, with Windows 95 in 1995. WHERE THE HELL WERE THESE PEOPLE??? In the mid 2000s, smart phones became ubiquitous; these are nothing but pocket computers. Have senior citizens been asleep since turning 30???
@@laurendoe168 I believe it comes down to what they are used to, thus comfortable with along with their interests. Just because the tech has been around, doesn't mean everyone uses it or wants to. They must have an interest in using the tech to learn about it. My mother is a good example. She was born in 42, farmers daughter. Never took any interest in any tech of any kind. Other than using the radio and tv. I got her to barely be able to use a computer to check email. And that only lasted about a year. But I am a computer tech. My first computer was a Commodore 128D. I had always been fascinated with tech. So yeah, I think it really comes down to what a person is used to from their younger lives and if they have any interest in new tech or not. If they don't use it, they will know nothing about it.
I recently wrote a decompressor for Phil Katz's Deflate algorithm. It was... one of the more difficult programming tasks I set for myself. To have been Phil Katz and come up with this way of using different kinds of compression (Lempel-Ziv and Huffman codes) in ways that play to their strengths... it's staggering how smart he was.
It's based on analog signal compression and multiplexing. Dude was smart but he ported a lot of this over from analog transmission systems as well as what was going on with digital multiplexing (which is eerily similar to compression in a lot of ways).
Another thing to consider is if you're going to share the compressed file. If you're compressing the file only for your own personal use, you can use whichever format works best for you. But if you need to share the files with others in a windows ecosystem, zip is probably the best choice, especially if you're sharing with non-technical people.
One shouldn't cater too much to these people you call non-technical, who are really just too lazy to google the thing and download the thing that opens the thing. This is much easier and much more intuitive than e.g. trying to understand how to use The Ribbon in your Office workflow, or understand which way to swipe to show things hidden on purpose on a phone or laptop.
@@matiaanjansenvanrensburg771 Absolutely no! They should not be using computers, or mobile phones, if it's too challenging for them. And I'm 100% against that. I'm sorry, but just getting a program and running it is the base level for using a 'device'. If they can't do that, they are simply not ready. And if they are not ready, we must fix this, because they will be disempowered. This is about humanism. There was a great movement in the late 1970s and early 1980s, and you will not have me believe humans are incapable of opening any file on any device much later. That movement was a response to the fear from above - technocracy. There was a great counter-movement, but you still had to buy magazines or read or borrow books. Today, corporations rely on this laziness (not even spending 5 seconds searching for how to open the file) to control you and enforce monopolies. There are so many helpful resources - just like this one that you found. It's extremely important that anyone can write and publish any program or file format on the Web, and that anyone can use it on any platform. Any movement away from this moves users and developers closer to this now much more threatening monopoly dictatorship! This has been ongoing for decades, and it should terrify you.
@@ScoopexUs Sounds more like laziness on the part of the person sending the file. Rather than just selecting a zip they choose their preferred format, then the receiver has to find a trusted decompression program, install it, and figure out how to use it. Meanwhile zip works on almost any platform out of the box.
The amount of time i have to spend for people forwarding me emails "i can't open this" clearly speaks to your point. It's usually media formats, but same idea.
I've had my fair share of experience with 7-Zip as a lot of custom content made by folks for Sims games tend to come compressed in the .rar format. Alas, the version of Windows I use can only natively decompress .zip files. I think it's silly that Microsoft only recently has added in the ability to access more file compression formats to Windows. With all that said, this was an absolutely excellent video! :)
@@ScoopexUs Sure can! On my end, 7-Zip is my beloved. 😍 I started using that years ago, I still use it when it's needed, and I never looked back. It's my jam. :3
Love this video, Chris! While I’ve been decompressing files from the internet forever, I’ve never really had much understanding of what was going on with the files in general. Definitely going to show this one to other folks.
There is also another point that should be considered. And that is "Recovery Record" wich is built in to the rar format, but is disabled by default and can be enabled easily. Which can become handy in times of corruption of archived data. Zip and 7z file can also be protected by a program called MultiPAR, which basically adds Parity Data to the end of archive which can be used to recover corrupt archive.
I agree parity protection should be enabled by default. Especially if you're planning to store the archive files for years on some media that might be questionable in the future.
As I note above, but forgot why, RAR is more recoverable than ZIP and its kin. WinRAR also does a better job of extracting whatever can still be recovered from a corrupted ZIP than do any of the ZIP tools.
If I am concerned about file corruption, I use QuickPAR. I have found RAR's "Recovery Record" performance to be spotty at best (and useless if the recovery record itself is corrupt). Granted, QuickPAR can create a somewhat large set of separate files (*), but I have YET to be unable to recover a corrupt file when using it (unless, of course, the recovery files themselves are corrupt, but QuickPAR can often work around even this). (*) - QuickPAR does not alter the original file. I've used QuickPAR on an entire set of 600 MP3 files that can recreate an MP3 even if deleted. Set to 10% recovery on that set of 600 MP3s, any 60 MP3s could be deleted and completely recovered or it can fix all 600 if each is less than 10% corrupt... or anything in between (any 150 MP3s are 40% corrupt, etc). I store the QuickPAR files on a different drive from the original files being backed up.
@@laurendoe168 I agree 💯, I have also found it in my tests, I was able to corrupt rar file with recovery record. And, as you mentioned PAR2 can deal with it even with single PAR2 file; if it isn't corrupt enough it can recover the files protected by it. Then PAR2 can be recreated. I have tried it too. What I am still searching for is for a compression tool that can add Parity Data so much close to the data that it can read correct data without an attempted to repair it in the first place. Like Parity in QR Codes, CDs & DVDs, and Satellite 📡 transmissions. But, unfortunately I am still searching for it.
Sunday Morning Compressed EC. Thank you for this. So often we assume everyone understands a lot of these terms and standards because us old people grew up with them.
Fantastic! I wanted to research about this but got confused with so many formats and methods. This video is a collection of all of them and I understood much better. Thanks!
Excellent video! There is one thing worth mentioning is that once a file is already compressed compressing it again will make the archive bigger. Newer compressors are smart enough to detect already compressed files so it gets stored in the archive.
Well, not necessarily. Had a friend whose amusement with a new blank hard drive was to zip up the nulls. So he'd have 1GB of nulls zipped up, then he'd rezip it a few times and get it down to a few MB. Admittedly this is a special case.
I actually think it muddies the waters significantly. Attempting to cover half a dozen different compression formats in one accessible video is an ambitious project. Name dropping a bunch of compression algorithms used by various container formats is confusing to the average user and will unfortunately immediately fly over most people's heads. All of the content in this video was basic computing 101 sort of stuff for me but for someone encountering data compression for the first time, I don't think this video is all that great. But if it helps some people including yourself learn something new, then I'm willing to admit that I'm wrong.
Fantastic video. I got interested in file compression technicalities back in BBS days and have loved teaching & discussing data/media compression topics and improvements ever since.
Back in those days, the compressed file format was .arc. You had to have a license to create or extract files Phil Katz is a hero for creating the zip file format and Licensing it to the public for free.
In those days the reduction of transfer time was the biggest driver for using them. Also packaging a bunch of files together and integrity verification, when the transfer protocols were flaky or manual. Zmodem was great, started automatically.
ARJ & LHA are the one's i've used during MS-DOS days, ARJ has multiple file archives into ideal filesize for your floppy disk, while LHA is commonly used by Japanese users for file compression.
@@ExplainingComputersdid you happen to catch yesterday's episode of Retro Recipes where they used an Amiga 3000 running a branch of Linux to decode a message bounced off the moon?
Another clear and well-laid-out video that will no doubt be extremely useful to those unfamiliar with this topic. Well done, Chris, you're making a difference in a world full of worthless short-form clickbait content and SEO spam.
I'll be liking the video before watching this as that helps the algorithm. Joking aside your Explaining series has always been solid that I watch them even if the topic are ones I have an idea of already and to my amazement every time you explain something I haven't heard or read before. Thank you Chris. Of the compression formats on the thumbnail I've only used maybe 10 of them with the common ones being zip, 7z, rar, and tar variants. Compression has always helped with slower internet connections and limited bandwidth allocations. I find it interesting that compression also has ties with software piracy from BBS sharing to direct downloads to torrenting and people have developed custom algorithms for the purposes of packing software and games.
Unfortunately UA-cam algorithm doesn't work that way. You need to at least watch the video for 1 minute before Clicking the Like. Reason? Your like doesn't count and only visible to you if you instantly click Like at the beginning of video. Which means, other people don't see your Like registered. Clicking like after 1 minute of video playing, will help algorithm better because it's registered in real-time.
@@niezzayt3809 Thanks for the explanation. Kinda watched the video twice though, half as I let it play while typing my comment and a second time watching it in full on a tv.
The 'beauty' of winzip for me is you can conserve the critical names in the file structure with many folders and file extensions into ONE file. This allows software that needs specific folder names and material files and texture files so that they can be shared and opened again. Time saving when sending 3D items online.
Ive always used WinRAR with Windows I remember using PKZIP along time ago with DOS used to think it was like magic when it was first introduced, interesting video well explained
Thank you very much, Chris. This was a very good stroll down memory lane and informative. The video is useful, it is good to have all this information in one location. Best wishes.
Anyone else remember ARJ? You could split your compressed files onto multiple 1,44 MB floppy disks. Good old times… (with lacking computer power and disk space 😅)
We did a clever trick at work for 8 bit lidar data compression. We take a chunk of data which in this case is a set of successive time series values, load them in a 2D array and save as a PNG file. This reduces the data storage size losslessly but also allows one to look at the thumbnails and visualize the data when browsing the data set.
It's gratifying to see the Zip I've been using since 1990 is still leading the pack. Back then with the limited sizes of hard drives and floppy drives, Zip was such a critical game changer.
Define "leading the pack"? Because he just demonstrated that, out of four compression formats, Zip creates the biggest files compared to bz2, xz and rar.
@@laurendoe168 And before I answer you, I presume the original poster appointed you as their official spokesperson? Because, with the greatest of respect, my comment was directed at them, not the general audience. Or do you have some belief in your own telepathic powers that allows you to read the contents of the minds of others?
@@terrydaktyllus1320 you realise how ridiculous that position is when you are posting in a public forum, surely? OP could very well use the same logic to dismiss your comment, given they were, in theory at least, talking directly to the presenter featured in the video. Certainly they did not invite your, unenecessarily combative, comment.
@@paultapping9510 "you realise how ridiculous that position is when you are posting in a public forum, surely?" "Ridiculous" is an opinion on your part and they haven't invented "Telepathy over TCP/IP" yet to the point where I have visibility of the contents of your mind and your thought patterns. So no, I can't realise how ridiculous I look in your eyes, and, given you're just a "stranger on the Internet", I could care less either. "OP could very well use the same logic to dismiss your comment," Then they are more than welcome to do so. I wasn't aware they'd appointed you their "official spokesperson" - or are you here simply trying to make "new Internet friends" as some kind of "vigilante of YT comments' sections"? "given they were, in theory at least, talking directly to the presenter featured in the video." This is a public messaging forum that invites comments from anyone with the capability of writing a response. I wasn't directing my comments at you either, yet you exercised your right to reply to me. Do I smell "burning hypocrite" now? "Certainly they did not invite your, unenecessarily combative, comment." I don't need an invitation to write a comment, as I explained above, because that is not how YT works. The onus is on you now to go do some learning and understand better how YT does work because clearly it's your lack of understanding that is the issue here. Now, was there anything else you wanted or have I made you look silly enough now and we can leave the conversation there? Run along now, mind how you go and stay away from sharp scissors. Discussion closed.
The first compression utility I used was pkzip on DOS, but even then I still learned something from this video. I didn't know that file compression was being worked on long before then. As always, a fantastic video!
Compression pre-dates computers. Even old analog broadcast "tv" signals used compression. Studying signal compression makes obvious where the digital compression algorithms came from.
Have been a 7z user since, I don't know how long. As you say, the need for compression is not not quite there in the past, on place it seems to help is in backup. If I have a directory with say 100 files in it, backups work much faster if the 100 files are replaced with a single compressed file. It isn't that the compressed file is smaller than the individual 100 files (it is) but there isn't all the FAT accesses needed to copy the file. You wouldn't want to do that with every directory but directories where the data is static, it definitely saves time.
I love these types videos where you explain the history of and differences between different aspects of computers that we interact with but never really think about
I was somewhat hoping to hear a little about ZStandard, designed by Facebook, which seems to have popped up very suddenly recently. An especially interesting detail about ZStandard is that, in addition to Huffman coding, it also uses another "entropy coder" called "Asymmetric Numeral Systems", which is actually the _3rd_ dominant entropy coder, being proceeded by not just Huffman coding, but also Arithmetic Coding. Arithmetic coding, from what I can tell, generating smaller encodings than Huffman coding, but also being more expensive to compute, and didn't recieve much attention due to patents, while Asymmetric Numeral Systems seem to fall somewhere between with similar computation efficiency as huffman coding, but similar compression ratios as Arithmetic coding.
Hello Chris, at around 8:05 the slide mentions a format with a .tgz or tar.tgz extension, but I think the latter would rather be tar.gz. I take the opportunity to thank you for all your work with Explaining Computers videos, they are always very clear, informative and comprehensive.
Thanks Chris for your new “EXCV” format (Explaining Computers Video) for helping me decompress the depression every time I have to deal with compressed computer files….😂! Have a great week!
WinRAR can add 'recovery records' to archives(compressed or not), allowing corruption to be repaired. Not sure what situations would cause corruption or how well it works, but it's a nice extra layer of protection for my backups.
I think recovery was intended for archives stored on physical media - floppies, cds, dvds. One or couple of damaged sectors can be recovered from easily. More than that and you're at the backup level. Which EC also has some videos on.
Can easily check if it works by hex editing random bytes in a rar archive that has a recovery record and then attempting to recover it. Even if I use it rarely, I like having it for peace of mind.
Great explainer, with a minor nitpick @ 7:53 that a GZ compressed TAR can be .tar.gz or .tgz but .tar.tgz would be strange (though not technically impossible), as it would imply nested TARs. It was useful emphasising the difference between compression of files and archiving collections of files. If it wasn't spelled out, a compressed TAR creates the archive first then compresses that single file, compared with .zip where each file is compressed (or just stored) individually prior to collecting into the archive, which means the latter can have an advantage if you only need to unpack a few files from a large archive. I understand why disc images were left out of a video about compression, and would be worth a separate video because they can be used for archiving and backup. Their own advantage is that they can often be mounted as a virtual drive to read or update depending on the OS and image type. This could also cover which archiving methods preserve file ownership, permissions or other attributes as another commenter mentioned.
@@kmartyCZWhat if it's an ISO9660 or FAT based storage and the long file names get corrupted, so you only have the short names during recovery or forensics attempts? The echoes of the old way still exist and are useful sometimes. When archiving one must expect and armor oneself from bitrot as much as possible otherwise why make backups at all.
I almost clicked off the video at the beginning since the intro was a little slow and boring, but I'm glad I stayed around because the rest of the video held my attention well and was very informative 😊. Thank you for making it.
I love this stuff. I use Win RAR and have for years. I started with the 21-day freebie use and was hooked on its many functions. Yes, you pay for it but it's great software. Many years ago working with Apple BASIC and storing data on 1.44 floppies I was constantly running out of space on these floppies and decided to try my hand at actually creating something. I wasn't really aware of compression options in those days. Not knowing what I was doing I accidently, created a raw compression algorithm that actually did what I wanted it to do allowing me to store all the data I wanted to save onto one 1.44 floppy. YAY. Of course, I've totally lost the code, the disks, the programs, and the old Laser 128. I became a PC user shortly after that and started using what was available in the wild. Amateurs...you have to love them.
lzma is the clear winner when a balance of high compression ratios and high availability are valued. When speed matters, deflate or even lz4 and friends dominate. Yeah, compressing a megabyte using lzma is no big deal, but what about compressing backups or network transmissions or memory contebts etc, where continuous operation matters? there's a reason http uses deflate, not lzma. When concerns tip strongly towards compression ratio, there are other formats like the ppm family or winarc that pull ahead. Though it requires large datasets fit them to really excel.
To be honest I never put in that much thought as to why many things I download from the internet are zipped or even the difference in size between identical zipped and unzipped files. I just accepted that it was one extra step to get to the files. Now that I understand that zip files are a form of compression, it does make sense why sending a zipped file over the internet is more practical.
Thanks for the video. I finished upgrading an old windows 10 into a useful windows 10 and was going to look into what might be best to use on this PC. Your video takes my search to a quick conclusion.
Excellent thank you ❤. I don’t upload or create. So it’s always been a frustration for me when I receive a compressed file -and didn’t know how to decompress it - I was under the impression that there was zip and that was it. (!) Thanks to your superbly informative video I’ve now got a much better understanding.
a great video as always - most enjoyable. One of the other benefits that i find of using RAR is the ability to break files up into multi part archives - this is really useful if you want to transfer things via email and it is bigger than the per email mailbox limits
I find it quite funny that you managed to compress so much information into a video about compression applications... This was an interesting video. Only real enthusiasts have that WinRAR license key. ;)
Compression is still used commonly on the internet. Generally web pages compress the HTML, CSS, JS, etc files on the fly to the browser (using fast compression option), as it still generally saves time loading, and is useful for your phone on mobile data, and those still on DSL. It was a huge deal in the 80s though, connecting to BBSs over slow dialup, compression was a godsend.
Another excellent explanation, especially how to bypass the limitation for several formats that can only do single files by using tar. Although iso files may not ne compressed by themselves, they are an option, too, to compile a number of files into a dingle file prior to compression, as this is a method I've used, as you can iso a folder, as well as a disk, then compress the iso.
Something interesting to me is *how* different archiving formats choose to lay out the file list, which can actually affect the compression efficiency. Allegedly, CPIO is more efficient than TAR in this regard, but ended up losing to it due to its more difficult command syntax
Such great explanation. Thank you. I’ve wondered previously why certain formats had certain extensions and to see it come down to name of authors in some instances is brilliant.
Very interesting. Back in my day, so late 90s early 00s, I do remember seeing PKZIP and .zip files but for all my pir... uhm, backups, I only used ARJ. Even knew the command lines by heart. But not anymore, sadly. Then again, I don't use DOS as often anymore as I did back then.
Thank you for sharing this with us. This information was concise and interesting, and was even somewhat informative to someone who has already been compressing files for several years.
A very nice summary of various ways to archive and/or compress files. One conceptual difference I would've liked to see highlighted is the difference of archiving a set of compressed files (like zip does), as opposed to compressing a set of archived files (like tar does, with your separate choice of compression). When processing a large number of relatively small files (like a set of files for an application source code), the compressed archives tend to be far superior to the archives of compressed files, in size efficiency. However, the drawback of compressed archives is that if you just want to extract a single file, you have to decompress the archive all the way to the point where your desired file is, just disregarding all the decompressed data before the bit you actually want. With an archive of compressed files, you can directly access any of the compressed files in the archive, and decompress just the one you wanted. Perhaps another point to highlight might be the various file types/filename extensions that for the file handling point of view are just zip files (Java jar, war, rar, ear, likely multiple others as well). These are just zip files with specific metadata files included, and thus handled in a special way by the Java platform.
Well explained and I agree with your software recommendations. 👍 I'm one of those rare people who actually paid for a WinRAR license! 😆 Worth a mention that GZ and BZ2 can only compress one single file (Ideally a tar archive or an image file).
@@donerlil3903 Yep. If you ever get your hands on an epub file, you can open it with an archiving tool. It contains, htm, xml and some other files. It's basically a static website.
@@donerlil3903 Not only that, epub is a zip file containing good ol' HTML (and XML, CSS, images, etc), basically a zipped web page in a specific folder structure
Zip is one model - file based . The Unix compressors are interesting because they compress any stream. i.e. they can compress output data before it is even written to disc or over a network link between 2 processes.
I remember back when bzip2 was new that I went and compressed nearly everything using it because it produced the smallest files of any compression software. I was still new to using Linux back then and had a lot of fun playing with that, so I look at it with fond memories even though it's no longer the best. Of course, now there's a bzip3 and I'm going to need to test that.
@@Reziac Too bad they changed the classic user interface buttons to dumb looking "modern" buttons. At least they have a classic theme available for download to restore it though.
Great and informative video as usual, I think archiving large number of small files in one bundle (even without compression) will make copying/moving them faster between 2 drives or over the network. Thanks Chris!
Chris, your comprehensive computer knowledge never ceases to amaze me but these days with massive hard drives and very high speed broadband for me compressed files seem to be receding into the past, and as you say windows will decompress the odd file I might get And if I wrote this before the video ended I could have saved myself the trouble
I remember using some of the formats back in time, you actually had to choose compression rate vs. speed. Fast compression = low compression and the other way around, and then there also were a "balanced mode" . :)
Always interesting to see that there never is any mention of the most versatile and capable file handling tool ever - Total Commander. Any and all things you might want to do with a file, or to a bunch of files, TC can do - with ease!
Just to add another data point: I tried several formats and programs for compressing large project folders containing mostly uncompressed audio data and there, rar was the best by far, especially considering the time taken to create and open/extract the archives. I set the mode to the slowest, strongest compression setting in winrar. 7zip was able to create similarly small archives but took SO MUCH longer!
FLAC and variants would probably compress the audio more, though maybe slower. Apparently RAR does do some basic handling of audio, e.g. breaking audio files into channels, to improve compression.
@@gblargg Yes, I also use FLAC quite a lot and these become even smaller than rar archives (setting both to highest compression rate). But usually that's impractical because I'm not just archiving wav files, but also DAW project files which reference these wav files. Converting everything to flac would break the projects and usually I just want to keep the original recorded files. That's why I did the comparison, where rar came out clearly on top of every other option I tried. Edit: It was several years ago though, so things might have changed. Maybe 7zip is faster/more optimized today.
Zopfli isn't a new format, it's a new, slower but stronger compressor for Deflate (used in .gz and .zip). Brotli and Zstandard are worth exploring, especially with Zstandard having some of the best quality-to-speed ratios out there, but it's so good that at low levels _it usurps LZO_ in my opinion. LZ4 is still worth talking about though, it has even lower ratios than LZO but extremely high speeds, to the point where in some cases the decompression speed can outstrip a Gen 3 NVMe drive's read speed.
I've gotten really good results with zstd. It's faster and uses less memory than 7z, yet it achieves compression that's on-par, or even better. The only problem is it doesn't have an archive format, and tar is SUPER slow for some reason (It's not CPU, hard drive, or RAM bottlenecked). Maybe I should start using .iso.zstd
I install the free 7-zip on windows almost as soon as the updates for the OS finish. It's great, supports a ton of formats, it's free and source is available. There's also version for Linux and MacOS.
In bygone days, I think it was common practice to use WinRAR to automatically split a large file, and save it in multiple diskettes. What a great tool for multivolume operations.
Ah, reminds me of my Amiga days, using LZH and LHA to cram more onto those 880k floppy disks. Once you'd worked out the command syntax, it was common to use a 'directory browser' like SID so you could create or extract archived files just by selecting them from a list.
Excellent video. I didn't know compression was built into Linux. I used PKZip back in the day when all of us were using 3.5 floppies, came in handy. Thank you Chris!
Another very enjoyable video. The mention of PKZIP brought back memories of my introduction to DOS in the mid-80s. I used it extensively both at home and work. Today I use WinRar and 7zip.
WinRAR was the first piece of PC software I ever bought that wasn't a game. I don't remember exactly when but I think I was still using one of Windows 95 betas (Chicago) at the time. ... my back hurts just thinking about it
Lzop deserves mention because it is so fast that it is faster to decompress a huge text file and pipe it into grep than to grep the uncompressed file. The efficiency isn't high but the speed makes it useful to compress files that you commonly wish to access.
Good presentation. A few other considerations: If you need to stash the file online -- crawlers can generally snoop inside ZIP and TAR.GZ files. They generally cannnot snoop inside RAR files. (Yes, you can mitigate this by using passwords, but a lost password or garbled encode means the contents are lost.) If the file becomes corrupted, RAR files are generally more recoverable than ZIP files, and WinRAR usually gets farther than do ZIP recovery tools. If the file header becomes corrupted, the contents will probably be lost. .ODT and .DOCX are just glorified ZIPs. If the header becomes corrupted, the contents will probably not be recoverable. I have an editing client who lost an entire finished novel that way (the bad copy had propagated to backups before it was discovered. WinRAR was able to peel out a background image; nothing else could see any content at all.) Always save a copy in a human-readable format, like RTF. _You Have Been Warned_ .
My method (mainly for storing virusses without letting antivirus interfere) is to put an empty file in the root directory of the zip/7z, and set its name as the password. You can see the password by opening the archive (because it lets you list contents), so any human can decode it, but antivirus/crawlers don't have the human intuition to do that.
10:46 I have to point out that 7-Zip support in Linux is not as widely applicable as in Windows, as the archive format does not support chmod attributes.
Sometimes I zip a lot of files into a single file, not necessarily with compression - apparently it's easier for machines to copy few larger files then multiple smaller files. Also makes for an excellent backup both in the actual machine and in another drive. EDIT: 7Zip has the option to increase the compression level - would've been interesting to see at which point more RAM and CPU power didn't lead to a smaller size.
@@sarkybugger5009 7Zip tells me higher compression needs more memory, both to pack and unpack. I guess if there isn't enough RAM the pagefile in Windows or equivalent on other OSs is used after maxing out the RAM.
@@CnCDuneAccording to Wikipedia, LZMA2's (7zip's main compression algorithm) memory usage is around the size of the dictionary, which can be *forced* up to 1 GiB in size, although using normal compression settings at max, it's only 24MiB. So even at the hypothetical max, it shouldn't consume much more than 1GiB of memory.
I like to approach it more generally and abstract: An archive file serves a few purposes: gathering multiple files together as a single unit, reliably keeping the full filenames and meta-information no matter what system the file moves through, verification of integrity, password encryption, and reducing size (to save on storage cost or speed up transfer over a slow link or medium). As for reducing size, of all the possible files of a given size, most of them aren't useful to us. Thus, this subset of useful files can be represented with less data. Various approaches are taken that recognize the typical patterns of redundancy in files, some lossy. As others have mentioned, media files almost all use their own compression already, and they tend to take most of the space on computers, thus the compression aspect of archives is less-important. So all the other benefits become the main reason for archive files.
WOW, I now understand compression better than I did, but that may not be saying a lot. LOL. Compression, depression, maybe I should just keep my mouth zipped! Looking forward to your next video.
As some people have noted below, in the word cloud "bz2" appears incorrectly as "b2z". But all is correct in the BZip 2 section at 7:11. And I have corrected the thumbnail. Also note that GZip compressed TAR files have the extensions tgz or tar.gz, not tar.tgz, as noted in the third bullet point around 7:53. My apologies. This video was checked, corrected and re-rendered many times . . . but at the moment, things like this are slipping through. :(
You're okay! Please don't beat yourself to a pulp over it.
Looks like you missed correcting tar.b2z in the word cloud
I've actually seen tar.tgz now and then, and wondered WTF. Apparently there's some software out there that uses it as the extension for a tar.gz file, but I have no idea what it might be.
Oh, another use for the software, especially WinRAR and 7zip: they can unpack a lot of installers (EXE, CAB, RPM, DEB, various others).
These sort of minor mistakes can occur, don't worry about them.
It is okay.
Not a big deal.
@@ThatMattWhiteThere is no way to correct anything in a video! :) I've fixed the thumbnail version, but cannot change what appears in the video, hence he message here. :)
It's worth mentioning that media files like pictures, audio and video formats are usually already compressed so additional file compression is not very effective.
Correct. I would only put all of them together into a container if needed. But other than that, it's not worth wasting time on compression and decompressing.
So it is but I still would like a video on how video and picture compression works and and the evolution of it. Especially the old ones. Just the other day I sat wondering of the difference between the mars rover photos and the ones sent by the Viking probes in the -70:s. Hardware vs software and so on. To me it appears that technology evolution is now a days is accellerating exponentially compared with pre WW2.
I still zip them if, for example want to copy a large amount of small files to a mobile device. On Linux, USB file transfer is somewhat flaky, especially with Android devices which use MTP. So instead of copy a hundred small jpegs, I zip them and copy one bigger file that way. Even if I use a wireless file transfer method like Warpinator or KDE Connect, it's still more sane to have only one file transferred and not getting hundreds of notifications of receiving/getting a file. :-)
often compressing any style file that has any form of compression will make the output file larger
Compressing a bunch of pic or vid files into one archive is useful for uploading/downloading multiple files to/from the internet, even if said files are already compressed by default as a feature of there format/type. This is why when you download multiple photos/vids all at once from Google Photos, it first puts them into a single ZIP archive file on the server side before downloading them to your computer.
I'm not gonna lie, one problem "as a computer guy myself", Is getting many of my customers to understand just how ZIP files work. Your video offers a great explanation on how they work. I have to admit, I've used some of these exact "teaching tools", and sometimes people just don't get it. I feel that 90% of the battle of getting people, including senior citizens, is if you can just teach them to understand where things "LIVE" in file explorer. This is an extremely helpful video that I will send to some of my more "semi-educated" PC customers who are just starting to understand things. As always, keep up the AWESOME work Mr. Barnatt!
Reminds me of the Verge article last year about professors noticing some of their students didn't really know what files and folders are, they saw files as one bucket you search in. It is the simplest explanation, I get it, if all they ever did was click on apps and produce 1 single document occasionally.
For the older folks, I found that once I used the reference to writing in shorthand, they then understood it quite well.
I've had some luck using a suitcase analogy. A ZIP archive is like a suitcase, and a compressed archive is a suitcase you've sat on to get it to close.
I am a boomer. This means I grew up with computers. I was 27 when I owned my first home computer (Atari 400). I do remember in the 1990s trying to explain computers and the internet to my senior citizen parents. It required that they understand concepts that were never before needed. It was difficult.
What I don't understand is why today's senior citizens seem so STUPID when it comes to computers. Windows 3 was introduced well over 30 years ago, with Windows 95 in 1995. WHERE THE HELL WERE THESE PEOPLE??? In the mid 2000s, smart phones became ubiquitous; these are nothing but pocket computers. Have senior citizens been asleep since turning 30???
@@laurendoe168 I believe it comes down to what they are used to, thus comfortable with along with their interests. Just because the tech has been around, doesn't mean everyone uses it or wants to. They must have an interest in using the tech to learn about it. My mother is a good example. She was born in 42, farmers daughter. Never took any interest in any tech of any kind. Other than using the radio and tv. I got her to barely be able to use a computer to check email. And that only lasted about a year. But I am a computer tech. My first computer was a Commodore 128D. I had always been fascinated with tech. So yeah, I think it really comes down to what a person is used to from their younger lives and if they have any interest in new tech or not. If they don't use it, they will know nothing about it.
I recently wrote a decompressor for Phil Katz's Deflate algorithm. It was... one of the more difficult programming tasks I set for myself. To have been Phil Katz and come up with this way of using different kinds of compression (Lempel-Ziv and Huffman codes) in ways that play to their strengths... it's staggering how smart he was.
It's based on analog signal compression and multiplexing. Dude was smart but he ported a lot of this over from analog transmission systems as well as what was going on with digital multiplexing (which is eerily similar to compression in a lot of ways).
which license it uses? if it is open, can you send a link to it? I'm benchmarking these algorythms rn
Another thing to consider is if you're going to share the compressed file.
If you're compressing the file only for your own personal use, you can use whichever format works best for you. But if you need to share the files with others in a windows ecosystem, zip is probably the best choice, especially if you're sharing with non-technical people.
One shouldn't cater too much to these people you call non-technical, who are really just too lazy to google the thing and download the thing that opens the thing. This is much easier and much more intuitive than e.g. trying to understand how to use The Ribbon in your Office workflow, or understand which way to swipe to show things hidden on purpose on a phone or laptop.
@@ScoopexUs It's not just laziness. Many people don't have computer skills and anything that does not work out of the box is not good enough for them
@@matiaanjansenvanrensburg771 Absolutely no! They should not be using computers, or mobile phones, if it's too challenging for them. And I'm 100% against that.
I'm sorry, but just getting a program and running it is the base level for using a 'device'. If they can't do that, they are simply not ready.
And if they are not ready, we must fix this, because they will be disempowered. This is about humanism. There was a great movement in the late 1970s and early 1980s, and you will not have me believe humans are incapable of opening any file on any device much later.
That movement was a response to the fear from above - technocracy. There was a great counter-movement, but you still had to buy magazines or read or borrow books.
Today, corporations rely on this laziness (not even spending 5 seconds searching for how to open the file) to control you and enforce monopolies.
There are so many helpful resources - just like this one that you found.
It's extremely important that anyone can write and publish any program or file format on the Web, and that anyone can use it on any platform. Any movement away from this moves users and developers closer to this now much more threatening monopoly dictatorship!
This has been ongoing for decades, and it should terrify you.
@@ScoopexUs Sounds more like laziness on the part of the person sending the file. Rather than just selecting a zip they choose their preferred format, then the receiver has to find a trusted decompression program, install it, and figure out how to use it. Meanwhile zip works on almost any platform out of the box.
The amount of time i have to spend for people forwarding me emails "i can't open this" clearly speaks to your point. It's usually media formats, but same idea.
I've had my fair share of experience with 7-Zip as a lot of custom content made by folks for Sims games tend to come compressed in the .rar format. Alas, the version of Windows I use can only natively decompress .zip files. I think it's silly that Microsoft only recently has added in the ability to access more file compression formats to Windows. With all that said, this was an absolutely excellent video! :)
Thanks for your support.
@@ExplainingComputers You're welcome!
Anyone can use any file compression format on Windows. Most also come on Linux and Mac platforms. Explore!
@@ScoopexUs Sure can! On my end, 7-Zip is my beloved. 😍
I started using that years ago, I still use it when it's needed, and I never looked back. It's my jam. :3
Love this video, Chris! While I’ve been decompressing files from the internet forever, I’ve never really had much understanding of what was going on with the files in general. Definitely going to show this one to other folks.
Thanks for your support. :)
There is also another point that should be considered. And that is "Recovery Record" wich is built in to the rar format, but is disabled by default and can be enabled easily. Which can become handy in times of corruption of archived data.
Zip and 7z file can also be protected by a program called MultiPAR, which basically adds Parity Data to the end of archive which can be used to recover corrupt archive.
I agree parity protection should be enabled by default. Especially if you're planning to store the archive files for years on some media that might be questionable in the future.
As I note above, but forgot why, RAR is more recoverable than ZIP and its kin. WinRAR also does a better job of extracting whatever can still be recovered from a corrupted ZIP than do any of the ZIP tools.
PAR and PAR2 files deserve a mention, as any usenet user will tell you.
If I am concerned about file corruption, I use QuickPAR. I have found RAR's "Recovery Record" performance to be spotty at best (and useless if the recovery record itself is corrupt). Granted, QuickPAR can create a somewhat large set of separate files (*), but I have YET to be unable to recover a corrupt file when using it (unless, of course, the recovery files themselves are corrupt, but QuickPAR can often work around even this).
(*) - QuickPAR does not alter the original file. I've used QuickPAR on an entire set of 600 MP3 files that can recreate an MP3 even if deleted. Set to 10% recovery on that set of 600 MP3s, any 60 MP3s could be deleted and completely recovered or it can fix all 600 if each is less than 10% corrupt... or anything in between (any 150 MP3s are 40% corrupt, etc). I store the QuickPAR files on a different drive from the original files being backed up.
@@laurendoe168 I agree 💯, I have also found it in my tests, I was able to corrupt rar file with recovery record.
And, as you mentioned PAR2 can deal with it even with single PAR2 file; if it isn't corrupt enough it can recover the files protected by it. Then PAR2 can be recreated.
I have tried it too.
What I am still searching for is for a compression tool that can add Parity Data so much close to the data that it can read correct data without an attempted to repair it in the first place. Like Parity in QR Codes, CDs & DVDs, and Satellite 📡 transmissions.
But, unfortunately I am still searching for it.
Sunday Morning Compressed EC.
Thank you for this. So often we assume everyone understands a lot of these terms and standards because us old people grew up with them.
So very true.
@@ExplainingComputers So, if I unzip Chris, can we have more of him? :D
Good counter-point to the "senior citizens don't understand computers" meme. rms is now a senior citizen.
Time marches on doesn't it.
Fantastic! I wanted to research about this but got confused with so many formats and methods. This video is a collection of all of them and I understood much better. Thanks!
What an interesting & informative video Chris! And you included FreeBSD - Thank you for that...
See you next time! Take care....
Thanks for your support! :)
Excellent video! There is one thing worth mentioning is that once a file is already compressed compressing it again will make the archive bigger. Newer compressors are smart enough to detect already compressed files so it gets stored in the archive.
Well, not necessarily. Had a friend whose amusement with a new blank hard drive was to zip up the nulls. So he'd have 1GB of nulls zipped up, then he'd rezip it a few times and get it down to a few MB. Admittedly this is a special case.
The first recompression you might get a tiny bit smaller file, but repeating compression will start to gradually increase it.
Bravo.
This is another gold nugget of basic yet necessary addition to the EC library. This clears up the muddy waters of compression and archiving.
I actually think it muddies the waters significantly. Attempting to cover half a dozen different compression formats in one accessible video is an ambitious project. Name dropping a bunch of compression algorithms used by various container formats is confusing to the average user and will unfortunately immediately fly over most people's heads. All of the content in this video was basic computing 101 sort of stuff for me but for someone encountering data compression for the first time, I don't think this video is all that great. But if it helps some people including yourself learn something new, then I'm willing to admit that I'm wrong.
I love this guy! And this hairstyle is not easiest one to pull off, but does it nonetheless. Thanks for sharing you knowledge with the world! ❤
Genuinely the best explanation for the wizardry of compression i’ve seen yet.
Fantastic video. I got interested in file compression technicalities back in BBS days and have loved teaching & discussing data/media compression topics and improvements ever since.
Back in those days, the compressed file format was .arc. You had to have a license to create or extract files Phil Katz is a hero for creating the zip file format and Licensing it to the public for free.
In those days the reduction of transfer time was the biggest driver for using them. Also packaging a bunch of files together and integrity verification, when the transfer protocols were flaky or manual. Zmodem was great, started automatically.
ARJ & LHA are the one's i've used during MS-DOS days, ARJ has multiple file archives into ideal filesize for your floppy disk, while LHA is commonly used by Japanese users for file compression.
not to mention LHA is the most common form of file compression on the Amiga platform
Ah, I still have a soft spot for Amigadom. :)
@@ExplainingComputersdid you happen to catch yesterday's episode of Retro Recipes where they used an Amiga 3000 running a branch of Linux to decode a message bounced off the moon?
@@stumblepuppy606 I don't know whether he did, but I did for sure! :D
Yeap! Loved ARJ back then. Very powerful and versatile.
I'm very happy to use WinRAR and 7-Zip for archiving files that I need to. Thanks for explaination Chris! 👍🏻
Another clear and well-laid-out video that will no doubt be extremely useful to those unfamiliar with this topic. Well done, Chris, you're making a difference in a world full of worthless short-form clickbait content and SEO spam.
I'll be liking the video before watching this as that helps the algorithm. Joking aside your Explaining series has always been solid that I watch them even if the topic are ones I have an idea of already and to my amazement every time you explain something I haven't heard or read before. Thank you Chris.
Of the compression formats on the thumbnail I've only used maybe 10 of them with the common ones being zip, 7z, rar, and tar variants.
Compression has always helped with slower internet connections and limited bandwidth allocations.
I find it interesting that compression also has ties with software piracy from BBS sharing to direct downloads to torrenting and people have developed custom algorithms for the purposes of packing software and games.
Unfortunately UA-cam algorithm doesn't work that way.
You need to at least watch the video for 1 minute before Clicking the Like.
Reason? Your like doesn't count and only visible to you if you instantly click Like at the beginning of video. Which means, other people don't see your Like registered.
Clicking like after 1 minute of video playing, will help algorithm better because it's registered in real-time.
@@niezzayt3809 Thanks for the explanation. Kinda watched the video twice though, half as I let it play while typing my comment and a second time watching it in full on a tv.
The 'beauty' of winzip for me is you can conserve the critical names in the file structure with many folders and file extensions into ONE file. This allows software that needs specific folder names and material files and texture files so that they can be shared and opened again. Time saving when sending 3D items online.
Great video! I ran a BBS in the 90s and used pkzip, lha, arc, and arj. There were others but those were the most common. I still use zip/gzip.
Ive always used WinRAR with Windows I remember using PKZIP along time ago with DOS used to think it was like magic when it was first introduced, interesting video well explained
I can honestly say: I learned ZIP from this video! Keep up the good work! 😄
Thank you very much, Chris. This was a very good stroll down memory lane and informative. The video is useful, it is good to have all this information in one location. Best wishes.
Anyone else remember ARJ? You could split your compressed files onto multiple 1,44 MB floppy disks. Good old times… (with lacking computer power and disk space 😅)
We did a clever trick at work for 8 bit lidar data compression. We take a chunk of data which in this case is a set of successive time series values, load them in a 2D array and save as a PNG file. This reduces the data storage size losslessly but also allows one to look at the thumbnails and visualize the data when browsing the data set.
Sounds very cool.
It's gratifying to see the Zip I've been using since 1990 is still leading the pack. Back then with the limited sizes of hard drives and floppy drives, Zip was such a critical game changer.
Define "leading the pack"? Because he just demonstrated that, out of four compression formats, Zip creates the biggest files compared to bz2, xz and rar.
@@terrydaktyllus1320 I presume "leading the pack" means "the most commonly used format."
@@laurendoe168 And before I answer you, I presume the original poster appointed you as their official spokesperson? Because, with the greatest of respect, my comment was directed at them, not the general audience.
Or do you have some belief in your own telepathic powers that allows you to read the contents of the minds of others?
@@terrydaktyllus1320 you realise how ridiculous that position is when you are posting in a public forum, surely? OP could very well use the same logic to dismiss your comment, given they were, in theory at least, talking directly to the presenter featured in the video. Certainly they did not invite your, unenecessarily combative, comment.
@@paultapping9510 "you realise how ridiculous that position is when you are posting in a public forum, surely?"
"Ridiculous" is an opinion on your part and they haven't invented "Telepathy over TCP/IP" yet to the point where I have visibility of the contents of your mind and your thought patterns. So no, I can't realise how ridiculous I look in your eyes, and, given you're just a "stranger on the Internet", I could care less either.
"OP could very well use the same logic to dismiss your comment,"
Then they are more than welcome to do so. I wasn't aware they'd appointed you their "official spokesperson" - or are you here simply trying to make "new Internet friends" as some kind of "vigilante of YT comments' sections"?
"given they were, in theory at least, talking directly to the presenter featured in the video."
This is a public messaging forum that invites comments from anyone with the capability of writing a response. I wasn't directing my comments at you either, yet you exercised your right to reply to me. Do I smell "burning hypocrite" now?
"Certainly they did not invite your, unenecessarily combative, comment."
I don't need an invitation to write a comment, as I explained above, because that is not how YT works. The onus is on you now to go do some learning and understand better how YT does work because clearly it's your lack of understanding that is the issue here.
Now, was there anything else you wanted or have I made you look silly enough now and we can leave the conversation there?
Run along now, mind how you go and stay away from sharp scissors.
Discussion closed.
The first compression utility I used was pkzip on DOS, but even then I still learned something from this video. I didn't know that file compression was being worked on long before then. As always, a fantastic video!
Compression pre-dates computers. Even old analog broadcast "tv" signals used compression. Studying signal compression makes obvious where the digital compression algorithms came from.
Have been a 7z user since, I don't know how long.
As you say, the need for compression is not not quite there in the past, on place it seems to help is in backup. If I have a directory with say 100 files in it, backups work much faster if the 100 files are replaced with a single compressed file. It isn't that the compressed file is smaller than the individual 100 files (it is) but there isn't all the FAT accesses needed to copy the file.
You wouldn't want to do that with every directory but directories where the data is static, it definitely saves time.
Best Lecture On File Compression I'll Probably Ever See. Thank You.
I love these types videos where you explain the history of and differences between different aspects of computers that we interact with but never really think about
You should also include Zstandard (zst), sizewise like bz2 and speed like gzip.
It's also now the default compression format for Arch linux packages (.pkg.tar.zst)
I was somewhat hoping to hear a little about ZStandard, designed by Facebook, which seems to have popped up very suddenly recently. An especially interesting detail about ZStandard is that, in addition to Huffman coding, it also uses another "entropy coder" called "Asymmetric Numeral Systems", which is actually the _3rd_ dominant entropy coder, being proceeded by not just Huffman coding, but also Arithmetic Coding. Arithmetic coding, from what I can tell, generating smaller encodings than Huffman coding, but also being more expensive to compute, and didn't recieve much attention due to patents, while Asymmetric Numeral Systems seem to fall somewhere between with similar computation efficiency as huffman coding, but similar compression ratios as Arithmetic coding.
this one of the craziest intro tracks iv ever heard
Hello Chris, at around 8:05 the slide mentions a format with a .tgz or tar.tgz extension, but I think the latter would rather be tar.gz.
I take the opportunity to thank you for all your work with Explaining Computers videos, they are always very clear, informative and comprehensive.
Thanks for this. I've added a corrective pinned comment. :)
Thanks Chris for your new “EXCV” format (Explaining Computers Video) for helping me decompress the depression every time I have to deal with compressed computer files….😂!
Have a great week!
WinRAR can add 'recovery records' to archives(compressed or not), allowing corruption to be repaired. Not sure what situations would cause corruption or how well it works, but it's a nice extra layer of protection for my backups.
I think recovery was intended for archives stored on physical media - floppies, cds, dvds. One or couple of damaged sectors can be recovered from easily.
More than that and you're at the backup level. Which EC also has some videos on.
Can easily check if it works by hex editing random bytes in a rar archive that has a recovery record and then attempting to recover it. Even if I use it rarely, I like having it for peace of mind.
Great explainer, with a minor nitpick @ 7:53 that a GZ compressed TAR can be .tar.gz or .tgz but .tar.tgz would be strange (though not technically impossible), as it would imply nested TARs.
It was useful emphasising the difference between compression of files and archiving collections of files.
If it wasn't spelled out, a compressed TAR creates the archive first then compresses that single file, compared with .zip where each file is compressed (or just stored) individually prior to collecting into the archive, which means the latter can have an advantage if you only need to unpack a few files from a large archive.
I understand why disc images were left out of a video about compression, and would be worth a separate video because they can be used for archiving and backup. Their own advantage is that they can often be mounted as a virtual drive to read or update depending on the OS and image type.
This could also cover which archiving methods preserve file ownership, permissions or other attributes as another commenter mentioned.
A good kit-pck -- it should indeed be tar.gz. My bad. :(
@@ExplainingComputers at 9:14 there's also a typos "tar.b2z" -> "tar.bz2" in "Linux" row.
Also, there are .tbz and .txz for whenever you don't prefer or can't use multiple extensions or longer than three chars, and not using gzip
@@Spudz76 three letter single extensions are relict from CP/M, I wouldn't take them seriously in this century 🙂
@@kmartyCZWhat if it's an ISO9660 or FAT based storage and the long file names get corrupted, so you only have the short names during recovery or forensics attempts? The echoes of the old way still exist and are useful sometimes. When archiving one must expect and armor oneself from bitrot as much as possible otherwise why make backups at all.
I almost clicked off the video at the beginning since the intro was a little slow and boring, but I'm glad I stayed around because the rest of the video held my attention well and was very informative 😊. Thank you for making it.
I love this stuff. I use Win RAR and have for years. I started with the 21-day freebie use and was hooked on its many functions. Yes, you pay for it but it's great software. Many years ago working with Apple BASIC and storing data on 1.44 floppies I was constantly running out of space on these floppies and decided to try my hand at actually creating something. I wasn't really aware of compression options in those days. Not knowing what I was doing I accidently, created a raw compression algorithm that actually did what I wanted it to do allowing me to store all the data I wanted to save onto one 1.44 floppy. YAY. Of course, I've totally lost the code, the disks, the programs, and the old Laser 128. I became a PC user shortly after that and started using what was available in the wild. Amateurs...you have to love them.
Cheers! 7zip all the way! 🎉 Super small compressed sizes and a very easy to use!
lzma is the clear winner when a balance of high compression ratios and high availability are valued.
When speed matters, deflate or even lz4 and friends dominate. Yeah, compressing a megabyte using lzma is no big deal, but what about compressing backups or network transmissions or memory contebts etc, where continuous operation matters? there's a reason http uses deflate, not lzma.
When concerns tip strongly towards compression ratio, there are other formats like the ppm family or winarc that pull ahead. Though it requires large datasets fit them to really excel.
I love your pragmatic, no-nonsense format!
To be honest I never put in that much thought as to why many things I download from the internet are zipped or even the difference in size between identical zipped and unzipped files. I just accepted that it was one extra step to get to the files.
Now that I understand that zip files are a form of compression, it does make sense why sending a zipped file over the internet is more practical.
Thanks for the video. I finished upgrading an old windows 10 into a useful windows 10 and was going to look into what might be best to use on this PC. Your video takes my search to a quick conclusion.
Excellent thank you ❤. I don’t upload or create. So it’s always been a frustration for me when I receive a compressed file -and didn’t know how to decompress it - I was under the impression that there was zip and that was it. (!) Thanks to your superbly informative video I’ve now got a much better understanding.
Sometimes file compression can achieve amazing results. With xz. I was able to compress a diskimage from 20 GiB to about 2 GiB.
a great video as always - most enjoyable.
One of the other benefits that i find of using RAR is the ability to break files up into multi part archives - this is really useful if you want to transfer things via email and it is bigger than the per email mailbox limits
I find it quite funny that you managed to compress so much information into a video about compression applications... This was an interesting video. Only real enthusiasts have that WinRAR license key. ;)
I can’t believe how you post exact subject I am interested at the moment wow
Another excellent video even if a lot I already knew...I still enjoy the information presentation and cadence.
Compression is still used commonly on the internet. Generally web pages compress the HTML, CSS, JS, etc files on the fly to the browser (using fast compression option), as it still generally saves time loading, and is useful for your phone on mobile data, and those still on DSL. It was a huge deal in the 80s though, connecting to BBSs over slow dialup, compression was a godsend.
Another excellent explanation, especially how to bypass the limitation for several formats that can only do single files by using tar. Although iso files may not ne compressed by themselves, they are an option, too, to compile a number of files into a dingle file prior to compression, as this is a method I've used, as you can iso a folder, as well as a disk, then compress the iso.
Something interesting to me is *how* different archiving formats choose to lay out the file list, which can actually affect the compression efficiency.
Allegedly, CPIO is more efficient than TAR in this regard, but ended up losing to it due to its more difficult command syntax
Chris has compressed himself into a zip package 📦
Such great explanation. Thank you. I’ve wondered previously why certain formats had certain extensions and to see it come down to name of authors in some instances is brilliant.
Very interesting. Back in my day, so late 90s early 00s, I do remember seeing PKZIP and .zip files but for all my pir... uhm, backups, I only used ARJ. Even knew the command lines by heart. But not anymore, sadly. Then again, I don't use DOS as often anymore as I did back then.
Thank you for sharing this with us.
This information was concise and interesting, and was even somewhat informative to someone who has already been compressing files for several years.
A very nice summary of various ways to archive and/or compress files.
One conceptual difference I would've liked to see highlighted is the difference of archiving a set of compressed files (like zip does), as opposed to compressing a set of archived files (like tar does, with your separate choice of compression).
When processing a large number of relatively small files (like a set of files for an application source code), the compressed archives tend to be far superior to the archives of compressed files, in size efficiency. However, the drawback of compressed archives is that if you just want to extract a single file, you have to decompress the archive all the way to the point where your desired file is, just disregarding all the decompressed data before the bit you actually want. With an archive of compressed files, you can directly access any of the compressed files in the archive, and decompress just the one you wanted.
Perhaps another point to highlight might be the various file types/filename extensions that for the file handling point of view are just zip files (Java jar, war, rar, ear, likely multiple others as well). These are just zip files with specific metadata files included, and thus handled in a special way by the Java platform.
Indeed. Rar and 7z offer a way to do that called a solid archive. With advantages and disadvantages you've described.
File compression is relevant even this day in times of fast internet & large storage capacity. Thanks for making this video.
Well explained and I agree with your software recommendations. 👍 I'm one of those rare people who actually paid for a WinRAR license! 😆
Worth a mention that GZ and BZ2 can only compress one single file (Ideally a tar archive or an image file).
Thanks for this. I did note that gz "is not an archiving format, so can only compress a single file". Forgot to do this for Bz2 though.
Fun fact: Microsoft office (365) documents are just renamed zip files. “.jar”s are the same.
Same with epub.
I would say that that is pretty smart.
Some installer exe files too
@@AndreasElf epub is just a zip file???
@@donerlil3903 Yep. If you ever get your hands on an epub file, you can open it with an archiving tool. It contains, htm, xml and some other files. It's basically a static website.
@@donerlil3903 Not only that, epub is a zip file containing good ol' HTML (and XML, CSS, images, etc), basically a zipped web page in a specific folder structure
Zip is one model - file based . The Unix compressors are interesting because they compress any stream. i.e. they can compress output data before it is even written to disc or over a network link between 2 processes.
One of the first things installed on any new os install is 7zip . Great video
I remember back when bzip2 was new that I went and compressed nearly everything using it because it produced the smallest files of any compression software. I was still new to using Linux back then and had a lot of fun playing with that, so I look at it with fond memories even though it's no longer the best. Of course, now there's a bzip3 and I'm going to need to test that.
I purchased a copy of WinRAR back in 2004 and it has definitely been my go to format.
Mine too. Gets all the features in the most everyday-usable interface.
There are dozens of us!
Someday I will buy WinRAR for the years of relentless service it did for me
@@Reziac Too bad they changed the classic user interface buttons to dumb looking "modern" buttons. At least they have a classic theme available for download to restore it though.
Thank you once more for guidance through the thicket of complex choices.
Wow. Good info. I didnt know most of the information conveyed. Thank You!
Thanks!
Thanks for your support.
Hello, fellow Christopher!.... back again....
thank you for this. thank you for explain and growing my knowledge of stuff i already knew.
The latest versions of RAR are a problem for me since I have not found a way to decompress them on Raspbian/OS. Very good video as always.
9:45 Yes, you can create password-protected zip files in macOS natively.
Christopher....You have no ideea how your tutorials impact my life! Thank you old man.
Sorry for my bad eng.
You are most welcome. :)
Great and informative video as usual,
I think archiving large number of small files in one bundle (even without compression) will make copying/moving them faster between 2 drives or over the network.
Thanks Chris!
Chris, your comprehensive computer knowledge never ceases to amaze me but these days with massive hard drives and very high speed broadband for me compressed files seem to be receding into the past, and as you say windows will decompress the odd file I might get
And if I wrote this before the video ended I could have saved myself the trouble
I remember using some of the formats back in time, you actually had to choose compression rate vs. speed. Fast compression = low compression and the other way around, and then there also were a "balanced mode" . :)
Great - thank You very much :-)
Thanks for your support.
Great again. I've been using rar for a long time what I like about it is that I can include recovery files in the rar file to fix any errors.
Always interesting to see that there never is any mention of the most versatile and capable file handling tool ever - Total Commander.
Any and all things you might want to do with a file, or to a bunch of files, TC can do - with ease!
Ah yes, good old fashioned compressed tarballs. Brings back memories. I think they’re still in use today if I recall correctly.
Very, very much still in use.
If you use linux, they are common
Just to add another data point: I tried several formats and programs for compressing large project folders containing mostly uncompressed audio data and there, rar was the best by far, especially considering the time taken to create and open/extract the archives. I set the mode to the slowest, strongest compression setting in winrar. 7zip was able to create similarly small archives but took SO MUCH longer!
FLAC and variants would probably compress the audio more, though maybe slower. Apparently RAR does do some basic handling of audio, e.g. breaking audio files into channels, to improve compression.
@@gblargg Yes, I also use FLAC quite a lot and these become even smaller than rar archives (setting both to highest compression rate). But usually that's impractical because I'm not just archiving wav files, but also DAW project files which reference these wav files. Converting everything to flac would break the projects and usually I just want to keep the original recorded files. That's why I did the comparison, where rar came out clearly on top of every other option I tried.
Edit: It was several years ago though, so things might have changed. Maybe 7zip is faster/more optimized today.
Instead of Explaining Computers, today we are Explaining Compression!
:)
You should also explore zopfli (Zöpfli), brotli (Brötli) as well as zstd (Zip Standard). These are relatively new. LZO is also interesting.
Zopfli isn't a new format, it's a new, slower but stronger compressor for Deflate (used in .gz and .zip). Brotli and Zstandard are worth exploring, especially with Zstandard having some of the best quality-to-speed ratios out there, but it's so good that at low levels _it usurps LZO_ in my opinion. LZ4 is still worth talking about though, it has even lower ratios than LZO but extremely high speeds, to the point where in some cases the decompression speed can outstrip a Gen 3 NVMe drive's read speed.
I've gotten really good results with zstd. It's faster and uses less memory than 7z, yet it achieves compression that's on-par, or even better. The only problem is it doesn't have an archive format, and tar is SUPER slow for some reason (It's not CPU, hard drive, or RAM bottlenecked). Maybe I should start using .iso.zstd
@@SupaKoopaTroopa64 There's a 7zip fork with ZSTD support. But the modified 7zip format is non-standard of course and unofficial.
@@lithiumwyvern_ Brotli performs better for losslessly filtered image data.
There's so many to learn about file compression format :)
Great informative video, thank you! I love how you include some history.
I install the free 7-zip on windows almost as soon as the updates for the OS finish. It's great, supports a ton of formats, it's free and source is available. There's also version for Linux and MacOS.
Thanks for another very clear and fascinating tutorial!
In bygone days, I think it was common practice to use WinRAR to automatically split a large file, and save it in multiple diskettes. What a great tool for multivolume operations.
Very true.
Ah, reminds me of my Amiga days, using LZH and LHA to cram more onto those 880k floppy disks. Once you'd worked out the command syntax, it was common to use a 'directory browser' like SID so you could create or extract archived files just by selecting them from a list.
Lovely and concise explanation. Compression and archiving is a difficult topic. Mainly leaving the comment so the algorithm up votes your vid.
Excellent video. I didn't know compression was built into Linux. I used PKZip back in the day when all of us were using 3.5 floppies, came in handy. Thank you Chris!
Another very enjoyable video. The mention of PKZIP brought back memories of my introduction to DOS in the mid-80s. I used it extensively both at home and work. Today I use WinRar and 7zip.
I now just use ZSTD. Fastest compression and decompression and smaller files than all other compression tools most of the time.
WinRAR was the first piece of PC software I ever bought that wasn't a game. I don't remember exactly when but I think I was still using one of Windows 95 betas (Chicago) at the time. ... my back hurts just thinking about it
Lzop deserves mention because it is so fast that it is faster to decompress a huge text file and pipe it into grep than to grep the uncompressed file. The efficiency isn't high but the speed makes it useful to compress files that you commonly wish to access.
Good presentation. A few other considerations:
If you need to stash the file online -- crawlers can generally snoop inside ZIP and TAR.GZ files. They generally cannnot snoop inside RAR files. (Yes, you can mitigate this by using passwords, but a lost password or garbled encode means the contents are lost.)
If the file becomes corrupted, RAR files are generally more recoverable than ZIP files, and WinRAR usually gets farther than do ZIP recovery tools.
If the file header becomes corrupted, the contents will probably be lost.
.ODT and .DOCX are just glorified ZIPs. If the header becomes corrupted, the contents will probably not be recoverable. I have an editing client who lost an entire finished novel that way (the bad copy had propagated to backups before it was discovered. WinRAR was able to peel out a background image; nothing else could see any content at all.) Always save a copy in a human-readable format, like RTF. _You Have Been Warned_ .
My method (mainly for storing virusses without letting antivirus interfere) is to put an empty file in the root directory of the zip/7z, and set its name as the password.
You can see the password by opening the archive (because it lets you list contents), so any human can decode it, but antivirus/crawlers don't have the human intuition to do that.
@@gunt-her That's a really good idea!
I used to collect viruses myself, tho eventually lost interest.
10:46 I have to point out that 7-Zip support in Linux is not as widely applicable as in Windows, as the archive format does not support chmod attributes.
Worth noting that comic book formats cbz et al, are really zipped directory with a different name and a special html file for panel viewing.
Sometimes I zip a lot of files into a single file, not necessarily with compression - apparently it's easier for machines to copy few larger files then multiple smaller files.
Also makes for an excellent backup both in the actual machine and in another drive.
EDIT: 7Zip has the option to increase the compression level - would've been interesting to see at which point more RAM and CPU power didn't lead to a smaller size.
There is no point at which RAM and CPU have any effect on compression levels, just how long it takes to compress it.
@@sarkybugger5009 7Zip tells me higher compression needs more memory, both to pack and unpack.
I guess if there isn't enough RAM the pagefile in Windows or equivalent on other OSs is used after maxing out the RAM.
@@CnCDuneAccording to Wikipedia, LZMA2's (7zip's main compression algorithm) memory usage is around the size of the dictionary, which can be *forced* up to 1 GiB in size, although using normal compression settings at max, it's only 24MiB.
So even at the hypothetical max, it shouldn't consume much more than 1GiB of memory.
Storage mediums have a better time trying to read one continuous big file versus having to find lots of tiny ones that are clustered all around.
I like to approach it more generally and abstract: An archive file serves a few purposes: gathering multiple files together as a single unit, reliably keeping the full filenames and meta-information no matter what system the file moves through, verification of integrity, password encryption, and reducing size (to save on storage cost or speed up transfer over a slow link or medium). As for reducing size, of all the possible files of a given size, most of them aren't useful to us. Thus, this subset of useful files can be represented with less data. Various approaches are taken that recognize the typical patterns of redundancy in files, some lossy. As others have mentioned, media files almost all use their own compression already, and they tend to take most of the space on computers, thus the compression aspect of archives is less-important. So all the other benefits become the main reason for archive files.
Thanks for this. Great video as always.
WOW, I now understand compression better than I did, but that may not be saying a lot. LOL. Compression, depression, maybe I should just keep my mouth zipped! Looking forward to your next video.