Epub files can also be renamed to .zip and then extracted. Or you can just zip a bunch of pictures and save it as .CBZ (or .CBR for RAR'ed files) and voilà, you have a lossless container/wrapper for your image files that can be viewed with any comic viewer, à la SumatraPDF.
The page ordering rules for cbz are a bit vague though, so you have to left-pad the page numbers with zeros to make sure you get a consistent ordering when viewing.
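A minimal sketch of the zero-padding idea in Python, assuming the pages are already in reading order. `padded_name` and `build_cbz` are made-up helper names for illustration, not part of any comic tool:

```python
import io
import zipfile


def padded_name(index: int, total: int, ext: str = ".jpg") -> str:
    """Return a page file name whose number is left-padded with zeros."""
    width = len(str(total))  # e.g. 120 pages -> 3 digits
    return f"{index:0{width}d}{ext}"


def build_cbz(pages: list[bytes], ext: str = ".jpg") -> bytes:
    """Pack raw image bytes into an in-memory CBZ, which is a plain ZIP."""
    buf = io.BytesIO()
    # ZIP_STORED: the images are already compressed, so just store them.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as cbz:
        for i, data in enumerate(pages, start=1):
            cbz.writestr(padded_name(i, len(pages), ext), data)
    return buf.getvalue()


# Padded names sort the same way as the page numbers do.
names = [padded_name(i, 12) for i in (1, 2, 10)]
```

Without padding, a plain string sort puts `10.jpg` before `2.jpg`; with padding, string order and page order agree, so it no longer matters which sort the viewer uses.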
@c6amp It's the original picture given to users who didn't have a profile picture. When YouTube decided that everyone needed to have a picture and that it was going to give generic letter-based pictures to anyone who didn't upload their own, I took a screenshot of the default image and uploaded it as my profile picture. Call it my own little protest against YouTube imposing its will on the users. :) Since nobody else still has this image, it stands out when I'm looking through the comments.
Jar files are zip folders that hold special code for the JRE to compile at runtime. The structure is defined by the developer of the specific application. The only standard thing about it is simply that it needs a main.class in the root. Other than that, it's up to the dev to build out the program's file structure.
I think you meant the manifest file, which holds the metadata of the jar, like the main class, if any. A main class is only needed if you want the jar to run when you double-click it. Edit: added some corrections. Also, the JRE is for running (java.exe) and the JDK is for compiling (javac.exe).
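To make the manifest point concrete, here is a rough Python sketch that looks up the `Main-Class` entry in a jar's `META-INF/MANIFEST.MF`. The function name `main_class` and the demo jar are invented for illustration, and the parser is deliberately naive (it ignores the manifest format's wrapped continuation lines):

```python
import io
import zipfile
from typing import Optional

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def main_class(jar_bytes: bytes) -> Optional[str]:
    """Return the Main-Class named in the jar's manifest, or None."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        if MANIFEST_PATH not in jar.namelist():
            return None
        manifest = jar.read(MANIFEST_PATH).decode("utf-8")
    for line in manifest.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "Main-Class":
            return value.strip()
    return None  # a library jar usually has no Main-Class


# Demo: a tiny fake jar holding only a manifest (no real classes).
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _jar:
    _jar.writestr(
        MANIFEST_PATH,
        "Manifest-Version: 1.0\r\nMain-Class: com.example.Main\r\n",
    )
demo_jar = _buf.getvalue()
```

With a real file you would pass `open("some.jar", "rb").read()` instead of the demo bytes.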
PK3 and PK4 files (used by games like Quake 3, Star Wars Jedi Knight: Jedi Academy, Doom 3, etc.) are also just zip files. Wish more games would use standard data compression instead of having to use a tool to manually unpack game archives with a script.
Beyond obfuscation like has been mentioned, I think another big factor might be performance. Zip is somewhat old and may not be exactly optimal in terms of how fast you can get the compressed data off of your drive and into memory.
For the exe files, I reckon driver installers are basically always archives that can be extracted with 7-Zip, which is really useful for getting the INF and SYS files alone and not dealing with the crappy installers of hardware vendors.
Yes, they are, and if you can figure out which folder it needs to go in on the C: drive, you can just drag and drop it from the archiver program into the folder to install it.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that the machine code for a program, and the data it uses, is in one portable file. PE files typically have the .text section (the program's actual machine code) and the .rsrc section (other data the program needs). But the .rsrc section can include other files, including valid Zip files. This is what I think self-extracting exes are extracting, and what archiver programs look for. Finding none, they'll show you the PE structures instead.
@@mfaizsyahmi It is. Basically, when you are generating resources for an executable image, you do have the option of adding files to the resource section. So at build time you can use a resource script with a user defined resource file and add an archive to the resource section. Since this isn't useful for self extractors, since it would require you building the executable, it is also possible to edit the resources section. Windows has the UpdateResource function which facilitates this. A self extracting archive or an installer which has the files as part of the executable resources can either just enumerate the resources, or assume that a fixed resource ID will always be used.
@@mfaizsyahmi You can just attach Zip data to the end of EXE, there is no need to add it to any section. In fact, adding a big Zip to a section may be unwanted, as sections are loaded in RAM. Most self-extracting archives don't add the archive to sections. Zip format is special, because it has a footer instead of header - archivers read it from the end of the file. That's why it can coexist with many header-based formats.
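A quick way to see the footer behaviour for yourself, sketched in Python. The "executable" here is just `MZ` plus padding (a stand-in, not a valid PE file), and `make_zip` is a made-up helper. Python's `zipfile` finds the end-of-central-directory record by searching from the end, so the junk in front does not bother it:

```python
import io
import zipfile


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build a small ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Stand-in for an installer stub: NOT a real executable, just bytes.
stub = b"MZ" + b"\x00" * 1024
payload = make_zip({"readme.txt": b"hello"})

# "Self-extracting archive": the zip data is simply appended to the stub.
combined = stub + payload

# The combined blob starts with MZ, yet still opens as a ZIP archive.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    content = zf.read("readme.txt")
```

The same bytes therefore pass both tests: they begin like an EXE and end like a ZIP.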
It's actually a great way to design a file format though, I've done it myself with my own coding. It means you can easily edit the file with third party software, and use existing zip libraries to pack data in and out of the file, and you get compression on top of it. It's a smart way to work.
Ok... I cannot tell you how many times I need an image from a Word document, because some genius client decided to send me a word document instead of the original PNG or JPG file. And, until now, there was no easy way for me to get the *ORIGINAL* image out of that Word document... Now I know how to do that!! This is incredible!!
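For anyone who wants to script this rather than rename files by hand, a small Python sketch. It assumes the usual .docx layout where embedded images live under `word/media/`; `extract_media` is an invented name, and the demo archive below only mimics that layout (it is not a document Word could open):

```python
import io
import zipfile


def extract_media(docx_bytes: bytes) -> dict[str, bytes]:
    """Return {archive path: original bytes} for images in a .docx."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }


# Demo: a fake .docx-shaped archive with one "image".
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("word/document.xml", "<doc/>")
    _zf.writestr("word/media/image1.png", b"\x89PNG fake image bytes")
demo_docx = _buf.getvalue()

images = extract_media(demo_docx)
```

The bytes come back exactly as they were stored in the archive, which is the point of pulling the image from the container instead of screenshotting it.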
The reason you pack it in a zip file is because every file is compressed separately. This means you can have a metadata file inside the zip and just extract this (probably quite small) text file. With other compression formats you would have to decompress the whole (possibly very large) file first to check some small file within.
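A small Python illustration of pulling one member out of a zip without touching the rest. `read_member` and the file names are hypothetical; the large member is only there to show that it is never decompressed:

```python
import io
import zipfile


def read_member(archive: bytes, name: str) -> bytes:
    """Decompress and return a single member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # Only this member's compressed data is read and inflated.
        return zf.read(name)


# Demo archive: one large payload plus a tiny metadata file.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w", zipfile.ZIP_DEFLATED) as _zf:
    _zf.writestr("payload.bin", b"\x00" * 1_000_000)
    _zf.writestr("metadata.json", b'{"title": "demo"}')
archive = _buf.getvalue()

metadata = read_member(archive, "metadata.json")
```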
Archives where you have to extract everything are called solid archives. Most formats have the option to create solid or non-solid archives, so it isn't a particular advantage for zip files. Zip is just built into Windows, so devs* don't have to worry about including an archiver with their programs (which is a bit of a problem for slow internet). The advantage of solid archives is that they can be made smaller, but how much smaller depends on a lot of factors, and the risk of corruption is far worse (if the archive is corrupted at all, then all of the files within will likely be lost). Source: I messed around with solid RAR archives back in the day trying to save space on my EHDD but ended up losing a bunch of data because the archive got corrupted, so I started using non-solid archives exclusively and found there was very little difference in size. Just to make sure I was remembering everything correctly, I double-checked and found that 7z (the format I switched to in more recent years) uses a bit of a hybrid approach where the archives can have multiple solid blocks, the size of which can be set. 7-Zip also allows for the creation of completely non-solid archives. Since I've switched to internal SSDs and a NAS with high-quality NAS HDDs and have multiple backups in general, I'm not as worried about corruption as I used to be. *There's an interesting story behind that and Microsoft's Embrace, Extend, and Extinguish policy of the time.
Yeah, that’s in no way unique to Zip. If anything, I suspect that compression formats that require you to extract all the files at once are the exception, not the rule.
@@salamiwallnut As the name suggests, a tar.gz is a set of files bundled with tar and then compressed with gzip, which is just a way to make the file smaller. It has become so standard on Linux that a single binary now does both.
@@salamiwallnut I think he's referring to Arch pacman packages. However you are correct in pointing out that that is just a generic archive format, and that Arch pacman packages are quite literally zipped tar archives with no special file extension of any kind. As such, it wouldn't really fit in a video like this.
If you've ever used Scratch (the programming language), you might know that .SB3 and .SB2 files are also ZIPs. Extracting them actually has a use: checking the size of the project.json file inside (which is limited to 5MB). edit: correct error - .sb files aren't .zips
Great niche topic to cover. Most of those do indeed have file headers that begin with PK (Phil Katz). We can see this by opening in a hex editor like HxD, or even Notepad++ (if it's not massive in size). Many Windows executables begin with MZ (Mark Zbikowski) and can also be extracted. Thankfully 7zip makes it quick and easy to check.
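The same check can be done without a hex editor. A rough Python sketch, where `sniff` is an invented name and the two signatures are the ones mentioned above (`PK\x03\x04` is the usual start of a non-empty zip, `MZ` the start of a DOS/Windows executable):

```python
def sniff(data: bytes) -> str:
    """Guess a container type from the first bytes of a file."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    if data.startswith(b"MZ"):
        return "executable (MZ header)"
    return "unknown"


# With a real file: sniff(open(path, "rb").read(4))
examples = {
    "document.docx": sniff(b"PK\x03\x04rest of file"),
    "setup.exe": sniff(b"MZ\x90\x00rest of file"),
    "notes.txt": sniff(b"hello"),
}
```

This only looks at the start of the file, so an EXE with a zip appended to it still reports as an executable here.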
Regarding opening the minecraft jar, the `assets` folder is in fact the exact same structure as a resource pack, and the `data` folder is the exact same structure as a datapack. We often recommend that people do exactly as shown in this video to find where they need to put files and what to put in them when they are making their own resource packs and datapacks! It's very handy
I think in the video you are showing an older version of the game, in newer versions a majority of the folders shown in the video are nested inside the `data` and `assets` folders that I mentioned in my above comment
Ah yes, opening files with a file extractor. I always do this with Android APK files whenever I want to extract some fonts or *assets* files used in that app. I never knew you can also open Office files the same way, this is helpful.
Many years ago (I won't say how many lest I age myself), an indie game I loved that had a level editor had the fan community asking the dev to support custom player skins in the created levels. I looked into how hard or easy that would be, and that's when I first discovered how incredibly useful it is to just package data, images, etc. into a ZIP file and change the extension to make it look fancier 😂 It's nice to know even the big companies do the same thing and it wasn't a dumb novice-dev idea of mine.
@@IceMetalPunk Not exactly. Like Joe said in the video, changing the extension helps the OS (mainly Microsoft Windows) to open the file with a specific program (because Windows depends on the extension, not the file content). Then, how will Windows know if that .zip file belongs to Adobe Illustrator or Java Virtual Machine (Minecraft)? That's the point.
I've been watching your videos for years. Most of the time, your videos are very interesting and entertaining. And most of the time I pretty much know the information that you are covering or I am aware of it. In this video, there is a first. I never knew that all of these files were actually zip or compressed files. Keep up the good work. I always enjoy your videos.
Yeah same, only with ISO files. Because archiver programs seem to associate files that don't appear to be archive files until you look for yourself. I already knew jar files were a form of archive files because it literally has archive in its name. And also from winrar associating jar files as an archive file.
As a Linux user, I've been so confused as to why .docx and other ms office xml formats have archive manager assigned as default or one of the programs to open it. I suppose it's because the format isn't native or foss to be properly incorporated that file magic can only recognize the .zip aspect of it.
Apple platforms like iOS and macOS also hide folders like that. If you've ever installed anything on a Mac, you might know that the app comes in a .dmg file (similar to .iso) and then you're supposed to drag the [AppName].app file from there to your applications folder and run it by double clicking like an .exe. But it's not actually a file, as if you extract a DMG on Windows, the .app shows up as a folder. Inside, there is a directory called "Contents" and in it are a bunch of resource files alongside a "MacOS" directory that has the main executable.
I’m a Java developer and being able to confirm that the proper files are stored in the jar is very useful. It is also useful to be able to decompile Java .class files specifically.
If you're using something like 7-Zip, you don't need to rename the file. Changing the extension simply tells the OS's built-in archiving software to decompress the file.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file. If you think about it, they can't be zip files because then how would a zip program (meaning the program that zips/unzips zip files) run? How would windows even exist before zip files were invented?
A file is a bunch of information stored in a certain format, an archive is also a bunch of information stored in a certain format, perfectly reasonable to be interchangable.
You can also open executable files (EXE) in a file archiver using a program such as 7zip or GNOME Archive Manager and the EXE file includes the relevant files necessary for the file to work such as app icons, version info and some text files will be filled with text that is used in the program. I should also add that you don't need to change the file type in Windows, instead just open the folder path in 7zip and right-click on the file and click on "open inside" (does not appear on the Linux snap version of p7zip)... I may also add that most file types can be opened in any text editor to view the same information e.g. an SVG Vector file can be opened in Notepad or Kate text editor and you view the coordinates of the file information...
An important thing about EXE/DLL is that they're not ZIP files, they're PE files, totally different specs. But any archiver program worth their salt understands the PE format as well as a ZIP format.
ThioJoe, thank you for this youtube channel. I love this youtube channel. Thank you for making this youtube channel about computers and the windows operating system. I've used the Windows operating system for years. I've essentially grown up with Windows as pretty much 99.999% of the whole world also has since Windows is pretty much the most used operating system of the world (the other two popular operating systems: macOS and Linux). Anyways, thank you for this video! I never knew that and was interesting to learn that Word documents are actually just .ZIP files in disguise. The More You Know! 😍😍
Archiving programs are agnostic to file endings. You don't need to rename the files to open any of these: just right-click the file and use the context menu to let the archiving program handle it the way you want, or open it directly from the File menu within the archiving program. Just select "all files" in the file-opening dialog so you can see them in the selection window.
Jar files are not Minecraft-specific, and neither are they just for servers. The normal game on your local PC also uses them to store the game, and they are used for mods. They aren't just zip files, but they behave like zip files. They are actually executables for Java programs/installers where the class files are stored in one big archive to 1. save disk space and 2. have fewer files lying around in an easy-to-access location. Also, you don't even have to add the .zip extension; you can just extract the data using WinZip, 7-Zip, WinRAR or any other decompression program that supports them. But I still find your videos extremely good!
A few years ago I was doing this to Firefox extensions, so I could go into one of the files and bump up the highest version number the extension was compatible with, for extensions that were no longer being supported.
Comic book files like cbz, cbr and cb7 are basically this, just an archive full of pictures, mostly jpegs or pngs, but I've seen webp in some, and the only difference is whether they use ZIP, RAR or 7Z for compression.
Most driver installation .exe files can be opened this way if you have problems installing a driver that is incompatible with your version of Windows. Extract, find the .inf files, install manually; most of the time it works.
6:07 this version of minecraft is pretty old. In newer versions all the image data is grouped in 1 top level assets folder, which has the same structure as a texture pack. It's basically the default texture pack.
Wow that’s super useful I was literally wondering how to do this especially extracting images from the word document definitely going to keep this in mind thanks
On Macs, lot of files you wouldn't expect are actually package files, but really just folders with more stuff inside. I took a MainStage concert file as an example and renamed it to .zip and it just turned into a folder, letting me see the contents. Applications are package folders as well, which is why you could just open that .app iOS app.
Honestly, packages are a brilliant concept that solves so many common problems. But since none of the other major OSes have any equivalent concept for bundling a folder into an opaque, pseudo-monolithic file, using a zip archive is a great workaround that basically achieves the same thing.
Re: 5:30 - Yes, the file called “minecraftpe” with no extension *is* the main executable in the case of the iOS app. It’s in the Mach-O format (just like on macOS) which never has an extension. Neither does ELF (the main Linux binary format) for that matter - only in Windows do executable files have any extension at all.
Jobs wasn't joking when he said the iPhone runs OS X - with the transition to Apple Silicon on the Mac, they even allow running iOS apps on macOS. I would assume macOS and iOS share as much technology and APIs as possible, which does show, given how optimized they are.
@@XeZrunner And both are direct descendants of NeXTStep. The .app format (itself, just a folder, not even zipped), came from that platform, and some of the utilities in MacOS are (or started out as) direct ports of ones included on the NeXT computers.
Source Engine map files, called Binary Space Partition or just ".bsp" files, can be opened with 7-Zip even though they are not zip files. It is very useful to see the content baked into the bsp files this way.
Suggestion: Could you also explain (as in possibly a new video) .rar, .7z, .tar.gz, tar.xz and .tar.bz2 and how are they different from .zip? I think you are one of the few people who can do it well
I can answer for the tar variants: tar once stood for "tape archive", and was a way to just pack a number of small files together to store on magnetic tape, back when Unix was king. These use no compression. The .gz, .xz, and .bz2 extensions are just different compression methods applied to a standard tar file, so unlike zip, the whole file up to and including the files you are looking to extract has to be run through the uncompressor first. Zip stores a list of the files and their positions in the zip file at the end of the zip file, and each file is compressed separately, often using different compression algorithms.
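To see that per-file list for yourself, here is a short Python sketch that writes a zip with two members using different compression methods and then reads back what the central directory says about each one. The file names and contents are invented for the demo:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Already-compressed data: store as-is.
    zf.writestr("photo.jpg", b"\xff\xd8 pretend jpeg",
                compress_type=zipfile.ZIP_STORED)
    # Repetitive text: deflate it.
    zf.writestr("notes.txt", b"text " * 200,
                compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # Each entry records where the member starts and how it was compressed.
    entries = [
        (info.filename, info.header_offset, info.compress_type)
        for info in zf.infolist()
    ]
```

Each tuple is (name, byte offset of the member inside the archive, compression method), which is what lets an extractor jump straight to one file.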
More than a decade ago I opened an ISO file in WinRAR by mistake and was surprised to find all the contents of the disc. Since then I always try this first if I need to get some asset, instead of searching for a specific unpacker.
As a programmer I knew that some of these files are archives, like .jar, .exe, .ipa, .apk and .msi. But the .docx or .odt or any of the office files were mind-blowing.
BTW, JAR files are not just for servers. They can be run on client versions of Windows, macOS, and Linux, provided the correct version of the Java Runtime Environment is installed. Also, the only folder you're guaranteed to see in a JAR file is the META-INF folder.
Why change extension? Right click and select "Open with" and select WinZip, WinRAR, 7-Zip or what application you use for un/compressing files. Make sure "Always use this app to open" is unselected because you don't want to associate it with the wrong application.
maybe someone has pointed this out already, but .ora, a layered image format like .psd, used by GIMP, Krita and some other open-source image editing software (MyPaint is the one I have on my raspberry pi, since it can't run Krita, and it uses .ora) is also just a .zip in disguise! a friend of mine took advantage of this fact to use it for sprites for a game she made. .kra is Krita's modified version of .ora and also can be opened with an archiver.
Some of the most popular file formats for digital comic books are .CBZ and .CBR. These are just Zip and Rar files with sequentially numbered JPEG images.
They aren't really standards so much as loose agreements. You even see some with .webp images in now, because it provides much better compression - though to the annoyance of some, because not all viewers support the new format.
*.PK3 is another zip file in disguise. It's mainly used for modding. particularly DOOM/Heretic/Hexen modding. It contains assets (sprites/graphics/sounds) and code inside various code lump types (ACS/DECORATE/ZScript/TEXTURE/MAPINFO/etc).
The Office Files are actually really cool. I had a scenario where a department I was working with put data in word docs. To extract it you can write a script to make it a zip file, then parse the XML for the data, & return it. Super useful!
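Roughly what such a script can look like in Python. This is a sketch under a few assumptions: the text lives in `word/document.xml`, paragraphs are `w:p` elements and text runs are `w:t` elements in the WordprocessingML namespace. `docx_paragraphs` is an invented name and the demo archive is a stripped-down stand-in, not a file Word would accept:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each paragraph in a .docx."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]


# Demo: minimal document.xml with two paragraphs.
_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    "<w:p><w:r><w:t>Name: Ada</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Dept: </w:t></w:r><w:r><w:t>R&amp;D</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("word/document.xml", _xml)

paragraphs = docx_paragraphs(_buf.getvalue())
```

Note that no renaming to .zip is needed when scripting: the zip library reads the .docx bytes directly.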
Already knew about jars, didn't know about anything else, this is interesting. Browsing Minecraft's jar for resources is super useful actually, looking at their assets is a great way to learn how to make loot tables or texture packs etc. Or if you just want one of their in game textures.
I knew this for a while, but it took me years to figure out what it implies. And what does it imply? Well, as a programmer, it's dirt easy to create those files: you can just pick an empty file as a template, modify it and generate documents on demand. If you don't know how to do something specific, you can use the software to make what you want and see what changed in those files. Looking at Minecraft, that also explains why it's so easy to mod.
It was kind of mind-blowing when I first discovered this about document files, but it makes sense when you realize the X in DOCX means XML. Modern office documents are basically similar to webpages, which run from a folder of HTML, CSS, and script files. In fact, EPUB literally runs on HTML and CSS, as seen in the video!
Phil Katz met an untimely demise, but his ZIP file spec lives on, even flourishing, as the go-to data container. Also, the setup.exe is being opened *not as a zip file, but as the Portable Executable format* itself, which is the executable format used by Windows. That's because in every PE container, there's a ".text" section, where the machine code actually is. ".reloc" must be the relocation table, which lets the loader adjust addresses when the machine code and data are placed in RAM, so the machine code can find everything it needs. Then ".rsrc" must be the resources used by the executable, which itself is generally separated into sections for icons, icon groups, GUI, text, images, etc. The appearance of a folder structure in the PE container is the archiver program you're opening it with interpreting and remapping the PE structures to be user friendly. That might be the case with the msi file too.
(1:00) You should definitely uncheck hide file extensions, for security reasons. This is because people send files such as "photo.jpg.exe" which then shows up as "photo.jpg". Most people will not care that the .jpg is visible and find it strange, they'll just register that as a photo and will try to open it by running it, and now they're running malicious software. But with visible file extensions, you'll see it being photo.jpg.exe and now you can see that it's not a photo easily, and should definitely not run it.
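The "photo.jpg.exe" trick is easy to spot mechanically. A toy Python check, where `looks_disguised` and both suffix lists are invented for illustration and are nowhere near exhaustive:

```python
from pathlib import PurePath

# Illustrative lists only; real-world lists would be much longer.
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".bat", ".cmd", ".com"}
HARMLESS_LOOKING = {".jpg", ".png", ".pdf", ".docx", ".txt"}


def looks_disguised(filename: str) -> bool:
    """True if an executable extension hides behind a harmless one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_SUFFIXES
        and suffixes[-2] in HARMLESS_LOOKING
    )


checks = {
    name: looks_disguised(name)
    for name in ("photo.jpg.exe", "photo.jpg", "setup.exe")
}
```

Only the real last extension decides what Windows runs, which is why hiding it is the problem.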
.pk3 files for GZDOOM are also very similar to .zip files. I don’t think they can be converted just by renaming them but they have a very similar structure and can both be opened in the SLADE3 editor for .wad and .pk3 files.
Really neat trick. Might use that for an application I'm working on, just so that it's easier for the OS to link the file type to the program. And I'm totally gonna try that out on other files, see what comes up.
Office and LibreOffice file formats are basically a collection of standardized XML files. XML files are sort of like HTML but with custom tags and structure. Because XML files have so many repetitive keywords (like tags), zipping them makes sense and creates a smaller file. Old document formats like Microsoft Office 95-2003 are a pure condensed binary document. Nobody outside Microsoft knows how to produce that format precisely as Microsoft Office does, so other apps have to reverse-engineer the binary format, with mixed results (such as inconsistent formatting). Even when Microsoft does actually open the specification, there are still "outside-of-specification" features that are not documented properly. With an XML-based specification, at least someone can see the clear-text tags and properties of the document even if there are "outside-of-specification" features implemented. Reverse-engineering it will be much easier. The only headache is that Microsoft will likely continue doing "out-of-spec" things with each new version of Office 365, which will make it hard for other office apps to read Office XML files properly with consistent formatting.
The difficulty was never really with the binary format as such; it’d been reverse-engineered long before the XML versions, and MS even had documentation for the binary container format itself. The problem is that to open a document correctly, you have to faithfully recreate the entire document object model, every feature, _and every bug_ in the Microsoft programs. The object model of a MS Office document is the same whether it’s in a binary or XML container. For example, Excel spreadsheets can use one of two date epochs (1900 or 1904). The 1904 epoch originated in Excel for Mac (which predates Excel on PC!!), while the 1900 epoch originated on the PC in Lotus 1-2-3. For interoperability, Excel for Mac and Windows have both long supported both epochs. But because Lotus 1-2-3 had some bugs in its date handling code, Excel has to mimic those bugs because otherwise spreadsheets that originated on Lotus 1-2-3 would break. (Lotus was the existing market leader when Excel came out, so compatibility with it was paramount.) This means that Excel spreadsheets with the 1900 epoch behave subtly differently than 1904 ones, so any competing spreadsheet has to not only recognize the different epoch, but clone the differing behavior. (To this day, in Excel document properties, you can set the 1904 epoch, which is sometimes necessary to get certain date calculations to work properly.) Similarly, if you want to open a Word document and have the formatting match up precisely, it means you have to recreate every feature that’s represented in the Word object model, and implement it identically. This is the real reason alternative office suites still struggle with round-trip compatibility, especially with complex Word documents. They don’t support the same features as Word, and/or implement them differently, so the result is inaccurate. Reading the files is not the challenge.
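A concrete taste of what "clone the bugs" means, as a Python sketch. The specific quirk shown is my own choice of example, not something spelled out above: in the 1900 date system, serial 60 stands for 29 February 1900, a day that never existed, so any converter has to step around it. `serial_to_date` is an invented name and only handles whole-day serials of 1 or more (0 or more in the 1904 system):

```python
from datetime import date, timedelta


def serial_to_date(serial: int, epoch_1904: bool = False) -> date:
    """Convert a spreadsheet day serial to a calendar date."""
    if epoch_1904:
        # 1904 system: serial 0 is 1 Jan 1904, no phantom day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        raise ValueError("serial 60 is 29 Feb 1900, which never existed")
    if serial < 60:
        # Before the phantom leap day: serial 1 is 1 Jan 1900.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day, every serial is off by one.
    return date(1899, 12, 30) + timedelta(days=serial)


before = serial_to_date(59)        # last real day of Feb 1900
after = serial_to_date(61)         # first day of Mar 1900
same_day = (serial_to_date(1462), serial_to_date(0, epoch_1904=True))
```

Two branches and an exception for one date system and a single line for the other: that asymmetry is the sort of thing a compatible spreadsheet has to reproduce.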
I can't thank you enough. I'm 72, and just when I was going to toss this out the window, I chanced onto your website. And what a difference: you explain in simple (72-year-old) language, and when I did have questions you actually took the time to answer me. Thank you again.
Setup.exe is actually an executable file and a zip file glued together. The executable is the installer program, while the zip contains the software to be installed. When reading a .exe file to load and run it, Windows starts reading the file at the beginning. A .zip file, on the other hand, has its table of contents at the end. The idea was the compression software would compress the files and write out where each one is in the zip once it’s finished. As far as Windows is concerned, Setup.exe is an executable with a very large data segment at the end. As far as any compression software is concerned, Setup.exe is a zip file with a bunch of garbage data at the beginning that isn’t referenced in any of the table of contents entries.
another benefit of using zip files by programs is compression, which makes the file size smaller and easier for transfer. some of them add password protection.
The best kept secret in the world 🤫
😂
Sure 😅
No one will ever know (besides us)! 😉😏
Shhhh. I wont tell anyone, joe, trust me
Btw wow im early
Or if you are using a decent archiving program(like 7zip) you can just extract the data of any container, so no need to change file extensions.
@@vylbird8014 Numbered files that AREN'T left-padded with zeros are an abomination! :)
@@cronchcrunchbtw The 7z format was developed by 7-Zip. It can also be saved as .cb7.
Not only interesting, but very useful. Especially for pulling pictures and charts out of Office documents quickly or en masse.
i don't think the jre does any compiling at runtime, it just acts as a platform to run the bytecode generated by javac
Yup. Minecraft's classes look like that because it was obfuscated to prevent people from just opening it up and stealing the code to make a ripoff.
Comic book reader .cbz files are also zip files (.cbr files are RAR archives).
Any java decompiler can decompile the class files and apk files can be decompiled too.
Pk3s are also used in srb2/modern doom source ports
The reason why they don't do this it's simple: they just don't want you to do it!
Sorry but you would actually not really like it
Damn, that's a life pro tip
YEs they are and if you can figure out which folder it needs to go on the C: drive you can just drag and drop it from the archiver program into the folder to install it.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file.
PE files typically has the .text section (the program's actual machine code) and the .rsrc section (other data the program needs).
But the .rsrc section can include other files, including valid Zip files. This is what I think self-extracting exes are extracting, and what Archiver programs look for. Finding none, they'll show you the PE structures instead.
@@mfaizsyahmi It is. Basically, when you are generating resources for an executable image, you do have the option of adding files to the resource section. So at build time you can use a resource script with a user defined resource file and add an archive to the resource section.
Since this isn't useful for self extractors, since it would require you building the executable, it is also possible to edit the resources section. Windows has the UpdateResource function which facilitates this. A self extracting archive or an installer which has the files as part of the executable resources can either just enumerate the resources, or assume that a fixed resource ID will always be used.
@@mfaizsyahmi You can just attach Zip data to the end of EXE, there is no need to add it to any section. In fact, adding a big Zip to a section may be unwanted, as sections are loaded in RAM. Most self-extracting archives don't add the archive to sections.
Zip format is special, because it has a footer instead of header - archivers read it from the end of the file. That's why it can coexist with many header-based formats.
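The footer point is easy to check with a short Python sketch. It uses only the standard library; `make_zip` is a hypothetical helper, and the 64-byte `stub` is a made-up stand-in for an executable, not a real PE image.

```python
import io
import zipfile


def make_zip(files):
    """Build a small ZIP archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


zip_bytes = make_zip({"readme.txt": b"hello from the archive"})

# Fake "executable": starts with MZ like a DOS/PE file, then padding.
stub = b"MZ" + b"\x00" * 62
combined = stub + zip_bytes

# The end-of-central-directory record sits at the tail of the data,
# so the reader still finds the archive despite the leading non-ZIP bytes.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    payload = zf.read("readme.txt")

print(names, payload)
```

The same bytes begin with `MZ` and still open as an archive, which is roughly how a stub plus an appended archive can coexist in one file.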
It's actually a great way to design a file format though, I've done it myself with my own coding. It means you can easily edit the file with third party software, and use existing zip libraries to pack data in and out of the file, and you get compression on top of it. It's a smart way to work.
Ok... I cannot tell you how many times I need an image from a Word document, because some genius client decided to send me a word document instead of the original PNG or JPG file. And, until now, there was no easy way for me to get the *ORIGINAL* image out of that Word document... Now I know how to do that!! This is incredible!!
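If you'd rather script it than rename and click around, here is a rough Python sketch. It assumes the usual layout where Word keeps embedded pictures under `word/media/`; `list_media` is a hypothetical helper, and the sample archive is a stand-in built for the demo, not a valid Word document.

```python
import io
import zipfile


def list_media(docx_file):
    """Return names of embedded media files inside a .docx-style ZIP."""
    with zipfile.ZipFile(docx_file) as zf:
        return [n for n in zf.namelist() if n.startswith("word/media/")]


# Stand-in archive for illustration only; not a valid Word document.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("word/document.xml", "<doc/>")
    zf.writestr("word/media/image1.png", b"\x89PNG fake bytes")

media = list_media(sample)
print(media)
```

With a real file you would pass its path instead of `sample` and then call `ZipFile.extract` on the names you want.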
The reason you pack it in a zip file is because every file is compressed separately. This means you can have a metadata file inside the zip and just extract this (probably quite small) text file. With other compression formats you would have to decompress the whole (possibly very large) file first to check some small file within.
Archives where you have to extract everything are called solid archives. Most formats have the option to create solid or non-solid archives, so it isn't a particular advantage for zip files. Zip is just built into Windows, so devs* don't have to worry about including an archiver with their programs (which is a bit of a problem for slow internet). The advantage of solid archives is that they can be made smaller, but how much smaller depends on a lot of factors, and the risk from corruption is far worse (if the archive is corrupted at all, then all of the files within will likely be lost).
Source: I messed around with solid RAR archives back in the day trying to save space on my EHDD but ended up losing a bunch of data because the archive got corrupted so I started using non-solid archives exclusively and found there was very little difference in size.
Just to make sure I was remembering everything correctly I double checked and found that 7zip (the format I switched to in more recent years) uses a bit of a hybrid approach where the archives can have multiple solid blocks the size of which can be set. 7-zip also allows for the creation of completely non-solid archives. Since I've switched to internal SSDs and a NAS with high quality NAS HDDs and have multiple backups in general I'm not as worried about corruption as I used to be.
*There's an interesting story behind that and Microsoft's Embrace, Extend, and Extinguish policy of the time.
Yeah, that’s in no way unique to Zip. If anything, I suspect that compression formats that require you to extract all the files at once are the exception, not the rule.
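For what it's worth, the "just pull out the small metadata file" case from the top of this thread looks roughly like this in Python. The member names `metadata.json` and `big_payload.bin` are made up for the demo.

```python
import io
import json
import zipfile

# Build a demo archive: one tiny metadata file next to a large payload.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("metadata.json", json.dumps({"title": "demo", "pages": 3}))
    zf.writestr("big_payload.bin", b"\x00" * 5_000_000)

with zipfile.ZipFile(archive) as zf:
    # Only this member is decompressed; big_payload.bin is never read.
    meta = json.loads(zf.read("metadata.json"))
    # Uncompressed sizes come from the archive's index, not from unpacking.
    sizes = {info.filename: info.file_size for info in zf.infolist()}

print(meta, sizes)
```

As the replies say, other non-solid formats can do the same; zip just happens to have a reader in nearly every standard library.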
I love how Joe uses AI generated images now instead of stock photos 😂
I'll use a mix of both
@@ThioJoe That's good to know.
Ok...
I mean, it's basically no work. just a minute or two to make one that works and you're done! no copyright or anything to worry about
@@ThioJoe which AI generator do you use?
Another example is Apple iPSW files, those are the firmware files for all of Apple’s firmwares. Those are secretly ZIP files as well.
I'd love to see you go over files such as .DEB (Debian packages) and AppImage/Flatpak/Snap files as well!
And .rpm
Hello Again ._.
@@salamiwallnut as the name suggests, a tar.gz is a set of files bundled together with tar and then compressed with gzip, just a way to make the file smaller.
It has become so standard on Linux that a single binary (tar) now does both.
wanted to comment that too, i ended up with one on a non-debian system recently
@@salamiwallnut I think he's referring to Arch pacman packages. However you are correct in pointing out that that is just a generic archive format, and that Arch pacman packages are quite literally zipped tar archives with no special file extension of any kind. As such, it wouldn't really fit in a video like this.
If you've ever used Scratch (the programming language), you might know that .SB3 and .SB2 files are also ZIPs. Extracting them actually has a use: checking the size of the project.json file inside (which is limited to 5MB).
edit: correct error - .sb files aren't .zips
.sb files are not zips afaik, only sb2/3s.
wait, .sb2 as well? didn't know
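The project.json size check mentioned above can be done without extracting anything. A minimal sketch: `project_json_size` is a hypothetical helper, the sample archive is a stand-in rather than a real Scratch project, and reading "5MB" as 5 MiB is my assumption.

```python
import io
import zipfile

LIMIT_BYTES = 5 * 1024 * 1024  # assumption: "5MB" taken to mean 5 MiB


def project_json_size(sb3_file):
    """Return the uncompressed size of project.json inside an .sb3-style ZIP."""
    with zipfile.ZipFile(sb3_file) as zf:
        return zf.getinfo("project.json").file_size


# Stand-in archive for illustration; a real project holds much more.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("project.json", '{"targets": []}')

size = project_json_size(sample)
print(size, size <= LIMIT_BYTES)
```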
Great niche topic to cover. Most of those do indeed have file headers that begin with PK (Phil Katz). We can see this by opening in a hex editor like HxD, or even Notepad++ (if it's not massive in size). Many Windows executables begin with MZ (Mark Zbikowski) and can also be extracted. Thankfully 7zip makes it quick and easy to check.
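The same check a hex editor gives you can be sketched in a few lines of Python. `sniff` is a hypothetical helper; it only looks at the first bytes, so an executable with an archive appended to it will still report as an executable.

```python
def sniff(header: bytes) -> str:
    """Guess a container type from the first few bytes of a file."""
    # PK\x03\x04 starts a normal ZIP; PK\x05\x06 is what an empty ZIP starts with.
    if header.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    # MZ marks a DOS/Windows executable (EXE, DLL).
    if header.startswith(b"MZ"):
        return "mz-executable"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a file on disk and classify them."""
    with open(path, "rb") as fh:
        return sniff(fh.read(4))


print(sniff(b"PK\x03\x04..."), sniff(b"MZ\x90\x00"), sniff(b"%PDF"))
```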
Regarding opening the minecraft jar, the `assets` folder is in fact the exact same structure as a resource pack, and the `data` folder is the exact same structure as a datapack.
We often recommend that people do exactly as shown in this video to find where they need to put files and what to put in them when they are making their own resource packs and datapacks!
It's very handy
I think in the video you are showing an older version of the game, in newer versions a majority of the folders shown in the video are nested inside the `data` and `assets` folders that I mentioned in my above comment
Ah yes, opening files with a file extractor. I always do this with Android APK files whenever I want to extract some fonts or *assets* files used in that app. I never knew you can also open Office files the same way, this is helpful.
Many years ago (I won't say how many lest I age myself), an indie game I loved that had a level editor had the fan community asking the dev to support custom player skins in the created levels. I looked into how hard or easy that would be, and that's when I first discovered how incredibly useful it is to just package data, images, etc. into a ZIP file and change the extension to make it look fancier 😂 It's nice to know even the big companies do the same thing and it wasn't a dumb novice-dev idea of mine.
That (---) was really unnecessary and like no one cares how many years ago.
It's not to be fancy, it's to bundle all the different things needed to get things to work to come in a single file, arranged in a certain way.
@@Gunz1234 Your entire reply was unnecessary 🤷‍♂️
@@mfaizsyahmi Oh, no, I meant changing the file extension makes it look fancier than just naming it with .zip 🙂
@@IceMetalPunk Not exactly. Like Joe said in the video, changing the extension helps the OS (mainly Microsoft Windows) to open the file with a specific program (because Windows depends on the extension, not the file content). Then, how will Windows know if that .zip file belongs to Adobe Illustrator or Java Virtual Machine (Minecraft)? That's the point.
I've been watching your videos for years. Most of the time, your videos are very interesting and entertaining. And most of the time I pretty much know the information that you are covering or I am aware of it. In this video, there is a first. I never knew that all of these files were actually zip or compressed files. Keep up the good work. I always enjoy your videos.
Now I know why I can open .jar files with 7zip or winrar. It makes sense
Yeah same, only with ISO files. Because archiver programs seem to associate files that don't appear to be archive files until you look for yourself.
I already knew jar files were a form of archive files because it literally has archive in its name. And also from winrar associating jar files as an archive file.
As a Linux user, I've been so confused as to why .docx and other ms office xml formats have archive manager assigned as default or one of the programs to open it. I suppose it's because the format isn't native or foss to be properly incorporated that file magic can only recognize the .zip aspect of it.
Apple platforms like iOS and macOS also hide folders like that. If you've ever installed anything on a Mac, you might know that the app comes in a .dmg file (similar to .iso) and then you're supposed to drag the [AppName].app file from there to your applications folder and run it by double clicking like an .exe. But it's not actually a file, as if you extract a DMG on Windows, the .app shows up as a folder. Inside, there is a directory called "Contents" and in it are a bunch of resource files alongside a "MacOS" directory that has the main executable.
I’m a Java developer and being able to confirm that the proper files are stored in the jar is very useful. It is also useful to be able to decompile Java .class files specifically.
If you're using something like 7-Zip, you don't need to rename the file. Changing the extension simply tells the OS's built-in archiving software to decompress the file.
Interesting. Before this I only knew that APKs, ISOs, JARs and some EXEs are Zips. The rest was new to me.
Same.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file.
If you think about it, they can't be zip files because then how would a zip program (meaning the program that zips/unzips zip files) run? How would windows even exist before zip files were invented?
If you are into 3D printing, the 3mf format used by many slicers to store objects, print settings, etc are also zip files.
A file is a bunch of information stored in a certain format, an archive is also a bunch of information stored in a certain format, perfectly reasonable to be interchangable.
You can also open executable files (EXE) in a file archiver using a program such as 7zip or GNOME Archive Manager and the EXE file includes the relevant files necessary for the file to work such as app icons, version info and some text files will be filled with text that is used in the program.
I should also add that you don't need to change the file type in Windows, instead just open the folder path in 7zip and right-click on the file and click on "open inside" (does not appear on the Linux snap version of p7zip)...
I may also add that most file types can be opened in any text editor to view the same information e.g. an SVG Vector file can be opened in Notepad or Kate text editor and you view the coordinates of the file information...
An important thing about EXE/DLL is that they're not ZIP files, they're PE files, totally different specs. But any archiver program worth their salt understands the PE format as well as a ZIP format.
@@mfaizsyahmi What these archivers show is the section headers for almost everything, and the resource directory for the resource section.
Awesome educational video on something I had never thought of before. More stuff like this please!
I knew about a few of these but never thought that DOCX was also zip hahaha amazing
7:03
*The beam that it sucks in
Lmao- great video, super interesting!
ThioJoe, thank you for this youtube channel. I love this youtube channel. Thank you for making this youtube channel about computers and the windows operating system. I've used the Windows operating system for years. I've essentially grown up with Windows as pretty much 99.999% of the whole world also has since Windows is pretty much the most used operating system of the world (the other two popular operating systems: macOS and Linux). Anyways, thank you for this video! I never knew that and was interesting to learn that Word documents are actually just .ZIP files in disguise. The More You Know! 😍😍
.cbz and .cbr files used for comics are also just zip (or, for .cbr, RAR) files of numbered jpegs (and sometimes a metadata xml file)!
Thank you very much.
I've been searching for something like this for a long time.
archiving programs are agnostic to file endings you don't need to rename the files to open any of these, just right-click it and use the context menu to let the archiving program handle the file in a way you want....or open directly from the files menu within the archiving program, just select "all files" in the file opening dialog, so you can see them in the selection window.
Jar files are not Minecraft specific, neither are they just for servers; the normal game on your local PC also uses them to store the game, and they are used for mods. They aren't just zip files but they behave like zip files; they are actually executables for Java programs/installers where the class files are stored in a big archive to 1. save disk space and 2. have fewer files lying around in an easy-to-access location. Also, you don't even have to add the .zip extension; you can just extract the data using WinZip, 7-Zip, WinRAR or any other decompression program that supports them.
but i still find your videos extremely good!
I have realised that if you open it up, every files date & time inside of it is 1/1/1980 12:00AM
Example: 2:22
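A likely explanation (my assumption, nothing in the video confirms it): ZIP stores timestamps in the MS-DOS date format, which counts years from 1980, so a program that never fills in a real time ends up with the earliest date the format can hold. The helper names below are made up for the sketch.

```python
import zipfile


def encode_dos_date(year, month, day):
    """Pack a date into the 16-bit MS-DOS layout used by ZIP entries."""
    return ((year - 1980) << 9) | (month << 5) | day


def decode_dos_date(value):
    """Unpack a 16-bit MS-DOS date into (year, month, day)."""
    year = 1980 + (value >> 9)
    month = (value >> 5) & 0x0F
    day = value & 0x1F
    return year, month, day


# Python's own zipfile uses the same floor when no timestamp is given.
default_stamp = zipfile.ZipInfo("placeholder.txt").date_time

print(default_stamp)
print(decode_dos_date(encode_dos_date(1980, 1, 1)))
```

Years before 1980 simply cannot be represented, which is why 1/1/1980 works as the "no date set" value.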
A few years ago I was doing this to Firefox extensions, so I could go into one of the files and bump up the highest version number the extension was compatible with, for extensions that were no longer being supported.
Hahaha! Nice trick
This is super cool, now I'm gonna look through every single file on my phone and computer and see what changes I can make to them
Several feature phone theme files are also archives. .nth for Nokia are just ZIP files, and .thm for Sony Ericsson are TAR files.
Comic book files like cbz, cbr and cb7 are basically this, just an archive full of pictures, mostly jpegs or pngs, but I've seen webp in some, and the only difference is whether they are using ZIP, RAR or 7Z for compression.
Most driver installation .exe files can be opened this way if you have a problem installing a driver that is incompatible with your version of Windows. Extract, find the .inf files, install manually; most of the time it works.
6:07 this version of minecraft is pretty old. In newer versions all the image data is grouped in 1 top level assets folder, which has the same structure as a texture pack. It's basically the default texture pack.
Wow that’s super useful I was literally wondering how to do this especially extracting images from the word document definitely going to keep this in mind thanks
Not sure if this was already mentioned, but Google Earth and several mapping & GIS programs make use of “.KMZ” files, which are also zip archives.
Joe explaining the minecraft source is like learning French on a English class
About office files, I've known and used it for a long time :D I've been extracting raw images from ppt presentations for years - super useful
Cool, didn't know office files were also structured as zip
On Macs, lot of files you wouldn't expect are actually package files, but really just folders with more stuff inside. I took a MainStage concert file as an example and renamed it to .zip and it just turned into a folder, letting me see the contents. Applications are package folders as well, which is why you could just open that .app iOS app.
Honestly, packages are a brilliant concept that solves so many common problems. But since none of the other major OSes have any equivalent concept for bundling a folder into an opaque, pseudo-monolithic file, using a zip archive is a great workaround that basically achieves the same thing.
Re: 5:30 - Yes, the file called “minecraftpe” with no extension *is* the main executable in the case of the iOS app. It’s in the Mach-O format (just like on macOS) which never has an extension. Neither does ELF (the main Linux binary format) for that matter - only in Windows do executable files have any extension at all.
Jobs wasn't joking when he said the iPhone runs OS X - with the transition to Apple Silicon on the Mac, they even allow running iOS apps on macOS.
I would assume macOS and iOS share as much technology and APIs as possible, which does show, given how optimized they are.
@@XeZrunner And both are direct descendants of NeXTStep. The .app format (itself, just a folder, not even zipped), came from that platform, and some of the utilities in MacOS are (or started out as) direct ports of ones included on the NeXT computers.
I never figured that all these files were just zip files. This sort of thing is exactly why I watch your videos
.xpi files and .crx files can be opened in 7-zip, these types of files are firefox and chrome extension files respectively
Source Engine map files called Binary Space Partition or just ".bsp" files can be open with 7zip even though they are not zip files. It is very useful to see the content baked into the bsp files this way.
Very cool that you got official video from a 1980 timetraveler for the Microsoft portion
The "award winning photograph" is so funny 😂
Lol yep
Suggestion: Could you also explain (as in possibly a new video) .rar, .7z, .tar.gz, tar.xz and .tar.bz2 and how are they different from .zip? I think you are one of the few people who can do it well
I can answer for the tar variants: tar once stood for "tape archive", and was a way to just pack a number of small files together to store on magnetic tape, back when Unix was king. These use no compression. The .gz, .xz, and .bz2 extensions are just different compression methods applied to a standard tar file, so unlike zip, the whole file up to and including the files you are looking to extract has to be run through the uncompressor first. Zip stores a list of the files and their positions in the zip file at the end of the zip file, and each file is compressed separately, often using different compression algorithms.
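To make the "tar first, then compress" layering concrete, here is a small Python sketch using only the standard library. The file name `notes.txt` is made up for the demo.

```python
import gzip
import io
import tarfile

# Step 1: tar only bundles files; nothing is compressed yet.
raw_tar = io.BytesIO()
with tarfile.open(fileobj=raw_tar, mode="w") as tf:
    data = b"some notes"
    info = tarfile.TarInfo(name="notes.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
tar_bytes = raw_tar.getvalue()

# Step 2: gzip compresses the whole tar stream as a single blob.
tgz_bytes = gzip.compress(tar_bytes)

is_gzip = tgz_bytes[:2] == b"\x1f\x8b"   # gzip magic number
inner = gzip.decompress(tgz_bytes)
is_tar = inner[257:262] == b"ustar"      # tar magic inside the first header

# Listing the members means running the stream through the decompressor.
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tf:
    names = tf.getnames()

print(is_gzip, is_tar, names)
```

Swapping gzip for xz or bzip2 changes only the outer layer; the tar inside stays the same.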
Fun fact: IPSW files (Used for iDevice restores) are also hidden ZIPs.
More than a decade ago I opened an ISO file in WinRAR by mistake and was surprised to find all the contents of the disk. Since then I always try this first if I need to get some asset, instead of searching for a specific unpacker.
As a programmer I knew that some of the files are archives, like .jar .exe .ipa .apk .msi. But the .docx or .odt or any of the office files were mind blowing
BTW, JAR files are not just for servers. They can be run on client versions of Windows, macOS, and Linux, provided the correct version of the Java Runtime Environment is installed. Also, the only folder you're guaranteed to see in a JAR file is the META-INF folder.
That illustrator folder thing blows my mind
Fascinating, thank you so much. Very useful for converting file x to file y.
Really interesting. Great job 👍
Fun fact - MediaMonkey's addon install packages (.mmip) are also just zip files
You can teach me the whole day about computers and I'll never get bored.
I miss him ever since when the triple the internet video was uploaded. Man those times 😞
@@TrojanLube69 I've got extra 64 gigs of RAM thanks to him.
Legendary UA-camr
@@just.nobody yeah, no way you can download that amount anymore 😭😩
Why change extension? Right click and select "Open with" and select WinZip, WinRAR, 7-Zip or what application you use for un/compressing files. Make sure "Always use this app to open" is unselected because you don't want to associate it with the wrong application.
maybe someone has pointed this out already, but .ora, a layered image format like .psd, used by GIMP, Krita and some other open-source image editing software (MyPaint is the one I have on my raspberry pi, since it can't run Krita, and it uses .ora) is also just a .zip in disguise! a friend of mine took advantage of this fact to use it for sprites for a game she made. .kra is Krita's modified version of .ora and also can be opened with an archiver.
Some of the most popular file formats for digital comic books are .CBZ and .CBR. These are just Zip and Rar files with sequentially numbered jpeg images.
They aren't really standards so much as loose agreements. You even see some with .webp images in now, because it provides much better compression - though to the annoyance of some, because not all viewers support the new format.
*.PK3 is another zip file in disguise. It's mainly used for modding. particularly DOOM/Heretic/Hexen modding. It contains assets (sprites/graphics/sounds) and code inside various code lump types (ACS/DECORATE/ZScript/TEXTURE/MAPINFO/etc).
quake engine derivative moment
Good topic and share, thank you.
The Office Files are actually really cool. I had a scenario where a department I was working with put data in word docs. To extract it you can write a script to make it a zip file, then parse the XML for the data, & return it. Super useful!
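A minimal Python sketch of that kind of script, assuming the body text lives in `word/document.xml` as `w:t` runs inside `w:p` paragraphs. `paragraphs` is a hypothetical helper, and the sample archive is a stripped-down stand-in, not a document Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs(docx_file):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    result = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text may be split across several runs.
        result.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return result


# Stand-in document for illustration only.
sample_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Invoice 42</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Total: </w:t></w:r><w:r><w:t>19.99</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("word/document.xml", sample_xml)

print(paragraphs(sample))
```

No renaming is needed here, since `zipfile` does not care about the extension.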
ThioJoe I love the way you explain Tech stuff on your channel!! Keep it up!!!!!!
If you have 7zip installed, you don't even need to change the extension. You can right click and open it anyways
I only knew about epub (since I work with them as a hobby) and cbz (it’s in the name). Didn’t know about the rest, neat.
Can you put malware inside those containers, like a .doc file, and have it execute automatically when the doc file is opened?
Already knew about jars, didn't know about anything else, this is interesting. Browsing Minecraft's jar for resources is super useful actually, looking at their assets is a great way to learn how to make loot tables or texture packs etc. Or if you just want one of their in game textures.
WinAmp "Classic" Skins used zip as the container. 7-zip is the first tool I get on a fresh install.
I learned this a couple years ago and I was blown away 😂 told all my developer friends that it's all just zipped XML these days
I knew this for a while, but it took me years to figure out what this implies.
And what does it do? Well, as a programmer it's dirt easy to create those files: you can just pick an empty file as a template, modify it and generate documents on demand. If you don't know how to do something specific, you can use the software to make what you want and see what changed in those files.
Looking at Minecraft, that also explains why it's so easy to mod.
As a Minecraft expert, I can assure you that you are using very old versions for the two Minecraft folders you are looking at
I did know about the APK files but not about the other files, nice vid as always
Great Video as always! 👍
It was kind of mind-blowing when I first discovered this about document files, but it makes sense when you realize the X in DOCX means XML. Modern office documents are basically similar to webpages, which run from a folder of HTML, CSS, and script files. In fact, EPUB literally runs on HTML and CSS as seen in the video!
This might be a bit obscure for most people, but I also recently found that MATLAB also uses these for their UI applications on the .mlapp format
Phil Katz met an untimely demise, but his ZIP file spec lives on, even flourishing, as the go-to data container.
Also, the setup.exe is being opened *not as a zip file, but as the Portable Executable format* itself, which is the executable format used by Windows. That's because in every PE container, there's a ".text" section, where the machine code actually is. ".reloc" is most likely the relocation table, which lets the loader fix up addresses in the machine code and data when the image cannot be loaded at its preferred address. Then ".rsrc" must be the resources used by the executable, which itself is generally separated into sections for icons, icon groups, GUI, text, images, etc. The appearance of a folder structure in the PE container is the archiver program you're opening it with interpreting and remapping the PE structures to be user friendly.
That might be the case with the msi file too.
Exactly my thoughts as soon as I saw that `.text` section
Windows installer (.msi) files are databases. So what is being shown there are the database tables.
I remember converting .jar files to .zip files on my old 2012 mac when I was modding minecraft.
sometimes I just like to open files with 7zip, doesnt normally give much info but it's fun
(1:00) You should definitely uncheck hide file extensions, for security reasons.
This is because people send files such as "photo.jpg.exe" which then shows up as "photo.jpg". Most people will not care that the .jpg is visible and find it strange, they'll just register that as a photo and will try to open it by running it, and now they're running malicious software.
But with visible file extensions, you'll see it being photo.jpg.exe and now you can see that it's not a photo easily, and should definitely not run it.
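The same check can be automated. A rough Python sketch: both extension sets are illustrative assumptions, not a complete list, and `looks_disguised` is a hypothetical helper.

```python
from pathlib import PurePath

# Illustrative lists only; real-world lists would be longer.
EXECUTABLE = {".exe", ".scr", ".bat", ".cmd", ".com"}
HARMLESS_LOOKING = {".jpg", ".jpeg", ".png", ".pdf", ".docx", ".txt"}


def looks_disguised(filename: str) -> bool:
    """Flag names like photo.jpg.exe: a harmless-looking extension followed by an executable one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE
        and suffixes[-2] in HARMLESS_LOOKING
    )


for name in ["photo.jpg.exe", "photo.jpg", "setup.exe"]:
    print(name, looks_disguised(name))
```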
2:08 wait, HOW ARE THOSE DOCUMENTS MADE IN 1980?
.pk3 files for GZDOOM are also very similar to .zip files. I don’t think they can be converted just by renaming them but they have a very similar structure and can both be opened in the SLADE3 editor for .wad and .pk3 files.
Really neat trick. Might use that for an application I'm working on, just so that it's easier for the OS to link the file type to the program.
And I'm totally gonna try that out on other files, see what comes up.
Office and LibreOffice file formats are basically collections of standardized XML files. XML files are sort of like HTML but with custom tags and structure. Because XML files have so many repetitive keywords (like tags), zipping them makes sense, since it produces a smaller file.
Old document formats, like those of Microsoft Office 95-2003, are purely condensed binary documents. Nobody outside Microsoft knew how to produce that format exactly as Microsoft Office does, so other apps had to reverse-engineer the binary format with mixed results (such as inconsistent formatting). Even after Microsoft opened the specification, there were still "outside-of-specification" features that are not documented properly.
With an XML-based specification, at least someone can see the tags and properties of the document in clear text, even if "outside-of-specification" features are implemented. Reverse-engineering it is much easier. The only headache is that Microsoft will likely continue doing "out-of-spec" things with each new version of Office 365, which will make it hard for other office apps to read Office XML files with consistent formatting.
The difficulty was never really with the binary format as such; it’d been reverse-engineered long before the XML versions, and MS even had documentation for the binary container format itself. The problem is that to open a document correctly, you have to faithfully recreate the entire document object model, every feature, _and every bug_ in the Microsoft programs. The object model of a MS Office document is the same whether it’s in a binary or XML container. For example, Excel spreadsheets can use one of two date epochs (1900 or 1904). The 1904 epoch originated in Excel for Mac (which predates Excel on PC!!), while the 1900 epoch originated on the PC in Lotus 1-2-3. For interoperability, Excel for Mac and Windows have both long supported both epochs. But because Lotus 1-2-3 had some bugs in its date handling code, Excel has to mimic those bugs because otherwise spreadsheets that originated on Lotus 1-2-3 would break. (Lotus was the existing market leader when Excel came out, so compatibility with it was paramount.) This means that Excel spreadsheets with the 1900 epoch behave subtly differently than 1904 ones, so any competing spreadsheet has to not only recognize the different epoch, but clone the differing behavior. (To this day, in Excel document properties, you can set the 1904 epoch, which is sometimes necessary to get certain date calculations to work properly.)
Similarly, if you want to open a Word document and have the formatting match up precisely, it means you have to recreate every feature that’s represented in the Word object model, and implement it identically.
This is the real reason alternative office suites still struggle with round-trip compatibility, especially with complex Word documents. They don’t support the same features as Word, and/or implement them differently, so the result is inaccurate. Reading the files is not the challenge.
I can't thank you enough. I am 72, and just when I was going to toss this out the window, I chanced on your website. And what a difference: you explain in simple (72-year-old) language, and when I did have questions you actually took the time to answer me. Thank you again.
This is the most interesting thing I heard today.
Who still remembers the good old cab files from the Windows Installation disks?✌🏻
Learned something new. Thanks! :)
Setup.exe is actually an executable file and a zip file glued together. The executable is the installer program, while the zip contains the software to be installed. When reading a .exe file to load and run it, Windows starts reading the file at the beginning. A .zip file, on the other hand, has its table of contents at the end. The idea was the compression software would compress the files and write out where each one is in the zip once it’s finished.
As far as Windows is concerned, Setup.exe is an executable with a very large data segment at the end. As far as any compression software is concerned, Setup.exe is a zip file with a bunch of garbage data at the beginning that isn’t referenced in any of the table of contents entries.
I'm a web dev and i use this technique as well for exporting/importing data across apps
AMAZING! Dope content! Keep it coming!🔥👌🏾❤️👑
another benefit of using zip files by programs is compression, which makes the file size smaller and easier for transfer. some of them add password protection.
Fun fact: the Microsoft Office files and Android (APK) files "were" last modified in the years 1980 and 1981