EPUB files can also be renamed to .zip and then extracted. Or you can just zip a bunch of pictures and save the result as .CBZ (or .CBR for RAR'ed files) and voilà, you have a lossless container/wrapper for your image files that can be viewed with any comic viewer à la SumatraPDF.
The page ordering rules for cbz are a bit vague though, so you have to left-pad the page numbers with zeros to make sure you get a consistent ordering when viewing.
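The zero-padding advice above can be sketched in Python. This is only a rough illustration, not a comic-format specification: the function name `build_cbz`, the three-digit minimum width, and the choice to store images uncompressed are all assumptions made for the example.

```python
import zipfile
from pathlib import Path


def build_cbz(image_paths, cbz_path, min_width=3):
    """Pack images into a .cbz, naming pages 001, 002, ... in the given order.

    build_cbz and min_width are illustrative names, not part of any standard.
    """
    # Pad to at least min_width digits, or more if there are enough pages.
    width = max(min_width, len(str(len(image_paths))))
    # ZIP_STORED: JPEG/PNG data is already compressed, so just store it as-is.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as cbz:
        for page, src in enumerate(image_paths, start=1):
            src = Path(src)
            cbz.write(src, arcname=f"{page:0{width}d}{src.suffix.lower()}")
    return cbz_path
```

Without the padding, a viewer that sorts names as plain strings would show `1.jpg`, `10.jpg`, `2.jpg` in that order; with it, string order and page order agree.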
@c6amp It's the original picture given to users who didn't have a profile picture. When YouTube decided that everyone needed to have a picture and that it was going to give generic letter-based pictures to anyone who didn't upload their own, I took a screenshot of the default image and uploaded it as my profile picture. Call it my own little protest against YouTube imposing its will on the users. :) Since nobody else still has this image, it stands out when I'm looking through the comments.
PK3 and PK4 files (used by games like Quake 3, Star Wars Jedi Knight: Jedi Academy, Doom 3, etc.) are also just zip files. I wish more games would use standard data compression instead of requiring a tool to manually unpack game archives with a script.
Beyond obfuscation, as has been mentioned, I think another big factor might be performance. Zip is somewhat old and may not be optimal in terms of how fast you can get the compressed data off your drive and into memory.
Jar files are zip folders that hold special code for the JRE to compile at runtime. The structure is defined by the developer of the specific application. The only standard thing about it is simply that it needs a main.class in the root. Other than that, it's up to the dev to build out the program's file structure.
I think you meant the manifest file, which holds the metadata of the JAR, such as the main class, if any. A main class is only needed if you want the JAR to run when it is double-clicked. Edit: added some corrections. Also, the JRE is for running (java.exe); the JDK is for compiling (javac.exe).
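To make the manifest point concrete, here is a minimal Python sketch that opens a JAR as a zip and reads the `Main-Class` entry from `META-INF/MANIFEST.MF`. The function name `jar_main_class` is made up for this example, and the parser only handles the main section of the manifest; it is not a full implementation of the manifest format.

```python
import zipfile


def jar_main_class(jar_source):
    """Return the Main-Class declared in a JAR's manifest, or None."""
    with zipfile.ZipFile(jar_source) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            return None  # this archive has no manifest at all
    text = raw.decode("utf-8").replace("\r\n", "\n")
    # Long manifest lines are wrapped; a continuation line starts with a
    # single space, so join those back onto the previous line first.
    text = text.replace("\n ", "")
    for line in text.split("\n"):
        if not line:
            break  # a blank line ends the main section
        name, _, value = line.partition(": ")
        if name.lower() == "main-class":
            return value.strip()
    return None
```

A JAR without a `Main-Class` entry is still a valid archive (a library, for instance); it just cannot be launched by double-clicking.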
For the exe files, I reckon driver installers are basically always archives that can be extracted with 7-Zip. That is really useful for getting at the INF and SYS files alone, without dealing with the crappy installers of hardware vendors.
Yes they are, and if you can figure out which folder it needs to go in on the C: drive, you can just drag and drop it from the archiver program into that folder to install it.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that a program's machine code, and the data it uses, live in one portable file. PE files typically have a .text section (the program's actual machine code) and a .rsrc section (other data the program needs). But the .rsrc section can include other files, including valid Zip files. This is what I think self-extracting exes are extracting, and what archiver programs look for. Finding none, they'll show you the PE structures instead.
@@mfaizsyahmi It is. Basically, when you are generating resources for an executable image, you do have the option of adding files to the resource section. So at build time you can use a resource script with a user defined resource file and add an archive to the resource section. Since this isn't useful for self extractors, since it would require you building the executable, it is also possible to edit the resources section. Windows has the UpdateResource function which facilitates this. A self extracting archive or an installer which has the files as part of the executable resources can either just enumerate the resources, or assume that a fixed resource ID will always be used.
@@mfaizsyahmi You can just attach Zip data to the end of EXE, there is no need to add it to any section. In fact, adding a big Zip to a section may be unwanted, as sections are loaded in RAM. Most self-extracting archives don't add the archive to sections. Zip format is special, because it has a footer instead of header - archivers read it from the end of the file. That's why it can coexist with many header-based formats.
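The "attach Zip data to the end" idea can be tried out with a few lines of Python. This is a sketch under stated assumptions: `stub` is just placeholder bytes beginning with `MZ`, not a real PE executable, and the member name `readme.txt` is made up.

```python
import io
import zipfile

# Stand-in for an executable stub: NOT a real PE file, just bytes that
# start with the "MZ" magic so the layout resembles one.
stub = b"MZ" + b"\x00" * 510

# Build a small zip in memory (file name and content are made up).
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", "hello from the appended archive")
payload = zip_buf.getvalue()

# Attaching the zip to the end of the "exe" is plain concatenation.
combined = stub + payload

# The end-of-central-directory record (signature PK\x05\x06) sits at the
# tail of the file, which is where a zip reader starts looking.
eocd_offset = combined.rfind(b"PK\x05\x06")

# Python's zipfile module copes with the leading non-zip bytes.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    text = zf.read("readme.txt").decode("utf-8")
```

The combined blob still begins with `MZ`, yet the archive opens normally, because nothing in the zip's directory refers to the bytes in front of it.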
It's actually a great way to design a file format though, I've done it myself with my own coding. It means you can easily edit the file with third party software, and use existing zip libraries to pack data in and out of the file, and you get compression on top of it. It's a smart way to work.
The reason you pack it in a zip file is because every file is compressed separately. This means you can have a metadata file inside the zip and just extract this (probably quite small) text file. With other compression formats you would have to decompress the whole (possibly very large) file first to check some small file within.
Archives where you have to extract everything are called solid archives. Most formats have the option to create solid or non-solid archives, so it isn't a particular advantage of zip files. Zip is just built into Windows, so devs* don't have to worry about including an archiver with their programs (which is a bit of a problem for slow internet). The advantage of solid archives is that they can be made smaller, but how much smaller depends on a lot of factors, and the risk from corruption is far worse (if the archive is corrupted at all, then all of the files within will likely be lost). Source: I messed around with solid RAR archives back in the day trying to save space on my EHDD but ended up losing a bunch of data because the archive got corrupted, so I started using non-solid archives exclusively and found there was very little difference in size. Just to make sure I was remembering everything correctly, I double-checked and found that 7z (the format I switched to in more recent years) uses a bit of a hybrid approach where the archives can have multiple solid blocks, the size of which can be set. 7-Zip also allows for the creation of completely non-solid archives. Since I've switched to internal SSDs and a NAS with high-quality NAS HDDs, and have multiple backups in general, I'm not as worried about corruption as I used to be. *There's an interesting story behind that and Microsoft's Embrace, Extend, and Extinguish policy of the time.
Yeah, that’s in no way unique to Zip. If anything, I suspect that compression formats that require you to extract all the files at once are the exception, not the rule.
If you've ever used Scratch (the programming language), you might know that .SB3 and .SB2 files are also ZIPs. Extracting them actually has a use: checking the size of the project.json file inside (which is limited to 5MB). edit: correct error - .sb files aren't .zips
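The size check does not even require extracting anything, since the zip directory already records each member's uncompressed size. A small sketch, assuming the 5MB figure from the comment means 5 × 1024 × 1024 bytes; `project_json_size` and `PROJECT_JSON_LIMIT` are names invented for this example.

```python
import zipfile

# Assumption: "5MB" is read here as 5 * 1024 * 1024 bytes.
PROJECT_JSON_LIMIT = 5 * 1024 * 1024


def project_json_size(sb3_source):
    """Return the uncompressed size in bytes of project.json inside an .sb3."""
    with zipfile.ZipFile(sb3_source) as sb3:
        # file_size comes from the zip directory; nothing is decompressed.
        return sb3.getinfo("project.json").file_size
```

Comparing the result against `PROJECT_JSON_LIMIT` then tells you how much headroom the project has left.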
Ok... I cannot tell you how many times I need an image from a Word document, because some genius client decided to send me a word document instead of the original PNG or JPG file. And, until now, there was no easy way for me to get the *ORIGINAL* image out of that Word document... Now I know how to do that!! This is incredible!!
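For anyone who has to do this often, the extraction can be scripted. A minimal Python sketch, assuming the embedded pictures live under `word/media/` inside the .docx (which is where Word normally puts them); `extract_docx_images` is a made-up name, and the file is opened directly as a zip without renaming it.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ out of a .docx, byte for byte."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as docx:
        for name in docx.namelist():
            # Skip directory entries; keep only real files in word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out_dir / PurePosixPath(name).name
                target.write_bytes(docx.read(name))
                written.append(target)
    return written
```

Because the bytes are copied out unchanged, what you get is whatever Word stored in the document, not a re-encoded screenshot.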
@@salamiwallnut As the name suggests, a tar.gz is a set of files archived with tar and then compressed with gzip, just a way to make the file smaller. It has become so standard on Linux that a single binary now does both.
@@salamiwallnut I think he's referring to Arch pacman packages. However you are correct in pointing out that that is just a generic archive format, and that Arch pacman packages are quite literally zipped tar archives with no special file extension of any kind. As such, it wouldn't really fit in a video like this.
Re: 5:30 - Yes, the file called “minecraftpe” with no extension *is* the main executable in the case of the iOS app. It’s in the Mach-O format (just like on macOS) which never has an extension. Neither does ELF (the main Linux binary format) for that matter - only in Windows do executable files have any extension at all.
Jobs wasn't joking when he said the iPhone runs OS X - with the transition to Apple Silicon on the Mac, they even allow running iOS apps on macOS. I would assume macOS and iOS share as much technology and APIs as possible, which does show, given how optimized they are.
@@XeZrunner And both are direct descendants of NeXTStep. The .app format (itself, just a folder, not even zipped), came from that platform, and some of the utilities in MacOS are (or started out as) direct ports of ones included on the NeXT computers.
Great niche topic to cover. Most of those do indeed have file headers that begin with PK (Phil Katz). We can see this by opening in a hex editor like HxD, or even Notepad++ (if it's not massive in size). Many Windows executables begin with MZ (Mark Zbikowski) and can also be extracted. Thankfully 7zip makes it quick and easy to check.
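The same check can be done without a hex editor by reading the first few bytes of the file. A small sketch; `sniff_container` and `sniff_file` are names invented here, and the list of signatures is deliberately tiny rather than exhaustive.

```python
def sniff_container(head):
    """Guess a container type from the first bytes of a file."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # local file header of the first member
    if head.startswith(b"PK\x05\x06"):
        return "zip (empty)"  # an archive with no members is only the end record
    if head.startswith(b"MZ"):
        return "dos/pe executable"
    return "unknown"


def sniff_file(path):
    """Read just the first four bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(4))
```

Note that a self-extracting archive will report as an executable here, since the zip data sits behind the `MZ` stub rather than at the start.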
Regarding opening the minecraft jar, the `assets` folder is in fact the exact same structure as a resource pack, and the `data` folder is the exact same structure as a datapack. We often recommend that people do exactly as shown in this video to find where they need to put files and what to put in them when they are making their own resource packs and datapacks! It's very handy
I think in the video you are showing an older version of the game, in newer versions a majority of the folders shown in the video are nested inside the `data` and `assets` folders that I mentioned in my above comment
As a Linux user, I've been so confused as to why .docx and other ms office xml formats have archive manager assigned as default or one of the programs to open it. I suppose it's because the format isn't native or foss to be properly incorporated that file magic can only recognize the .zip aspect of it.
Ah yes, opening files with a file extractor. I always do this with Android APK files whenever I want to extract some fonts or *assets* files used in that app. I never knew you can also open Office files the same way, this is helpful.
Many years ago (I won't say how many lest I age myself), an indie game I loved that had a level editor had the fan community asking the dev to support custom player skins in the created levels. I looked into how hard or easy that would be, and that's when I first discovered how incredibly useful it is to just package data, images, etc. into a ZIP file and change the extension to make it look fancier 😂 It's nice to know even the big companies do the same thing and it wasn't a dumb novice-dev idea of mine.
@@IceMetalPunk Not exactly. Like Joe said in the video, changing the extension helps the OS (mainly Microsoft Windows) to open the file with a specific program (because Windows depends on the extension, not the file content). Then, how will Windows know if that .zip file belongs to Adobe Illustrator or Java Virtual Machine (Minecraft)? That's the point.
Yeah same, only with ISO files. Because archiver programs seem to associate files that don't appear to be archive files until you look for yourself. I already knew jar files were a form of archive files because it literally has archive in its name. And also from winrar associating jar files as an archive file.
I've been watching your videos for years. Most of the time, your videos are very interesting and entertaining. And most of the time I pretty much know the information that you are covering or I am aware of it. In this video, there is a first. I never knew that all of these files were actually zip or compressed files. Keep up the good work. I always enjoy your videos.
If you're using something like 7-Zip, you don't need to rename the file. Changing the extension simply tells the OS's built-in archiving software to decompress the file.
Apple platforms like iOS and macOS also hide folders like that. If you've ever installed anything on a Mac, you might know that the app comes in a .dmg file (similar to .iso) and then you're supposed to drag the [AppName].app file from there to your applications folder and run it by double clicking like an .exe. But it's not actually a file, as if you extract a DMG on Windows, the .app shows up as a folder. Inside, there is a directory called "Contents" and in it are a bunch of resource files alongside a "MacOS" directory that has the main executable.
I’m a Java developer and being able to confirm that the proper files are stored in the jar is very useful. It is also useful to be able to decompile Java .class files specifically.
BTW, JAR files are not just for servers. They can be run on client versions of Windows, macOS, and Linux, provided the correct version of the Java Runtime Environment is installed. Also, the only folder you're guaranteed to see in a JAR file is the META-INF folder.
(1:00) You should definitely uncheck hide file extensions, for security reasons. This is because people send files such as "photo.jpg.exe" which then shows up as "photo.jpg". Most people will not care that the .jpg is visible and find it strange, they'll just register that as a photo and will try to open it by running it, and now they're running malicious software. But with visible file extensions, you'll see it being photo.jpg.exe and now you can see that it's not a photo easily, and should definitely not run it.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file. If you think about it, they can't be zip files because then how would a zip program (meaning the program that zips/unzips zip files) run? How would windows even exist before zip files were invented?
6:07 this version of minecraft is pretty old. In newer versions all the image data is grouped in 1 top level assets folder, which has the same structure as a texture pack. It's basically the default texture pack.
A file is a bunch of information stored in a certain format, an archive is also a bunch of information stored in a certain format, perfectly reasonable to be interchangeable.
You can also open executable files (EXE) in a file archiver such as 7-Zip or GNOME Archive Manager. The EXE file includes the files the program needs in order to work, such as app icons, version info, and some text files filled with text that is used in the program. I should also add that you don't need to change the file type in Windows; instead, just open the folder path in 7-Zip, right-click the file, and click "Open Inside" (this does not appear in the Linux snap version of p7zip)... I may also add that most file types can be opened in any text editor to view the same information, e.g. an SVG vector file can be opened in Notepad or the Kate text editor to view the coordinates in the file...
An important thing about EXE/DLL is that they're not ZIP files, they're PE files, totally different specs. But any archiver program worth their salt understands the PE format as well as a ZIP format.
ThioJoe, thank you for this youtube channel. I love this youtube channel. Thank you for making this youtube channel about computers and the windows operating system. I've used the Windows operating system for years. I've essentially grown up with Windows as pretty much 99.999% of the whole world also has since Windows is pretty much the most used operating system of the world (the other two popular operating systems: macOS and Linux). Anyways, thank you for this video! I never knew that and was interesting to learn that Word documents are actually just .ZIP files in disguise. The More You Know! 😍😍
Suggestion: Could you also explain (as in possibly a new video) .rar, .7z, .tar.gz, tar.xz and .tar.bz2 and how are they different from .zip? I think you are one of the few people who can do it well
I can answer for the tar variants: tar once stood for "tape archive", and was a way to just pack a number of small files together to store on magnetic tape, back when Unix was king. These use no compression. The .gz, .xz, and .bz2 extensions are just different compression methods applied to a standard tar file, so unlike zip, the whole file up to and including the files you are looking to extract has to be run through the uncompressor first. Zip stores a list of the files and their positions in the zip file at the end of the zip file, and each file is compressed separately, often using different compression algorithms.
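The zip half of that description can be observed with Python's `zipfile` module. This is a rough illustration with made-up member names: it writes two members using different methods, then lists the directory and reads only the small one.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Each member picks its own method: one stored, one deflated.
    zf.writestr("meta.txt", "title: example\n", compress_type=zipfile.ZIP_STORED)
    zf.writestr("big.txt", "lorem ipsum " * 10000, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # infolist() is built from the directory stored at the end of the file.
    directory = [
        (info.filename, info.header_offset, info.compress_type, info.compress_size)
        for info in zf.infolist()
    ]
    # Only meta.txt's bytes are read here; big.txt is never decompressed.
    meta = zf.read("meta.txt").decode("utf-8")
```

With a compressed tar, by contrast, reaching a given file means running the decompressor over everything stored before it.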
JAR files are not Minecraft-specific, nor are they just for servers. The normal game on your local PC also uses them to store the game, and they are used for mods. They aren't just zip files, but they behave like zip files: they are actually executables for Java programs/installers, where the class files are stored in one big archive to (1) save disk space and (2) have fewer files lying around in an easy-to-access location. Also, you don't even have to add the .zip extension; you can just extract the data using WinZip, 7-Zip, WinRAR, or any other decompression program that supports them. But I still find your videos extremely good!
Archiving programs are agnostic to file endings. You don't need to rename the files to open any of these; just right-click the file and use the context menu to let the archiving program handle it the way you want... or open it directly from the File menu within the archiving program. Just select "All files" in the file-open dialog so you can see them in the selection window.
Why change extension? Right click and select "Open with" and select WinZip, WinRAR, 7-Zip or what application you use for un/compressing files. Make sure "Always use this app to open" is unselected because you don't want to associate it with the wrong application.
Wow that’s super useful I was literally wondering how to do this especially extracting images from the word document definitely going to keep this in mind thanks
Source Engine map files called Binary Space Partition or just ".bsp" files can be open with 7zip even though they are not zip files. It is very useful to see the content baked into the bsp files this way.
As a programmer I knew that some of these files are archives, like .jar, .exe, .ipa, .apk and .msi. But .docx, .odt and the other office files being archives was mind-blowing.
On Macs, lot of files you wouldn't expect are actually package files, but really just folders with more stuff inside. I took a MainStage concert file as an example and renamed it to .zip and it just turned into a folder, letting me see the contents. Applications are package folders as well, which is why you could just open that .app iOS app.
Honestly, packages are a brilliant concept that solves so many common problems. But since none of the other major OSes have any equivalent concept for bundling a folder into an opaque, pseudo-monolithic file, using a zip archive is a great workaround that basically achieves the same thing.
maybe someone has pointed this out already, but .ora, a layered image format like .psd, used by GIMP, Krita and some other open-source image editing software (MyPaint is the one I have on my raspberry pi, since it can't run Krita, and it uses .ora) is also just a .zip in disguise! a friend of mine took advantage of this fact to use it for sprites for a game she made. .kra is Krita's modified version of .ora and also can be opened with an archiver.
Already knew about jars, didn't know about anything else, this is interesting. Browsing Minecraft's jar for resources is super useful actually, looking at their assets is a great way to learn how to make loot tables or texture packs etc. Or if you just want one of their in game textures.
Office and LibreOffice file formats are basically a collection of standardized XML files. XML files are sort of like HTML but with custom tags and structure. Because XML files have so many repetitive keywords (like tags), zipping them makes sense and produces a smaller file. Old document formats, like those of Microsoft Office 95-2003, were pure condensed binary documents. Nobody outside Microsoft knew how to produce that format precisely as Microsoft Office does, so other apps had to reverse-engineer the binary format, with mixed results (such as inconsistent formatting). Even when Microsoft does open the specification, there are still "outside-of-specification" features that are not documented properly. With an XML-based specification, at least someone can see the clear-text tags and properties of a document even if there are "outside-of-specification" features implemented. Reverse-engineering it will be much easier. The only headache is that Microsoft will likely continue doing "out-of-spec" things with each new version of Office 365, which will make it hard for other office apps to read Office XML files properly with consistent formatting.
The difficulty was never really with the binary format as such; it’d been reverse-engineered long before the XML versions, and MS even had documentation for the binary container format itself. The problem is that to open a document correctly, you have to faithfully recreate the entire document object model, every feature, _and every bug_ in the Microsoft programs. The object model of a MS Office document is the same whether it’s in a binary or XML container. For example, Excel spreadsheets can use one of two date epochs (1900 or 1904). The 1904 epoch originated in Excel for Mac (which predates Excel on PC!!), while the 1900 epoch originated on the PC in Lotus 1-2-3. For interoperability, Excel for Mac and Windows have both long supported both epochs. But because Lotus 1-2-3 had some bugs in its date handling code, Excel has to mimic those bugs because otherwise spreadsheets that originated on Lotus 1-2-3 would break. (Lotus was the existing market leader when Excel came out, so compatibility with it was paramount.) This means that Excel spreadsheets with the 1900 epoch behave subtly differently than 1904 ones, so any competing spreadsheet has to not only recognize the different epoch, but clone the differing behavior. (To this day, in Excel document properties, you can set the 1904 epoch, which is sometimes necessary to get certain date calculations to work properly.) Similarly, if you want to open a Word document and have the formatting match up precisely, it means you have to recreate every feature that’s represented in the Word object model, and implement it identically. This is the real reason alternative office suites still struggle with round-trip compatibility, especially with complex Word documents. They don’t support the same features as Word, and/or implement them differently, so the result is inaccurate. Reading the files is not the challenge.
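As a small illustration of what "cloning the differing behavior" means in practice: the comment above does not say which Lotus 1-2-3 bug it has in mind, but the best-known one is that the 1900 date system treats 1900 as a leap year, so serial 60 is a 29 February 1900 that never existed. The sketch below assumes that is the bug in question; `excel_serial_to_date` is a name invented for the example and it only handles whole-day serials.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial, date1904=False):
    """Convert a whole-number Excel day serial to a calendar date.

    Returns None for serial 60 in the 1900 system, which Excel displays as
    the nonexistent 29 February 1900.
    """
    if date1904:
        # 1904 system: serial 0 is 1 January 1904, with no phantom day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        return None
    # 1900 system: serial 1 is 1 January 1900, and every serial after the
    # phantom leap day is shifted by one.
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)
```

The same calendar day therefore has serials that differ by 1462 between the two systems, and any program reading a spreadsheet has to know which one the file declares.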
Comic book files like CBZ, CBR and CB7 are basically this: just an archive full of pictures, mostly JPEGs or PNGs, though I've seen WebP in some. The only difference is whether they use ZIP, RAR or 7Z for compression.
6:20 It's actually the game files, but usually it has an assets folder where the grass block for example would be. I don't know why your JAR didn't have it though.
Most driver installation .exe files can be opened this way if you have a problem installing a driver that is incompatible with your version of Windows. Extract, find the .inf files, and install manually; most of the time it works.
Some of the most popular file formats for digital comic books are .CBZ and .CBR. These are just Zip and Rar files with sequentially numbered JPEG images.
They aren't really standards so much as loose agreements. You even see some with .webp images in now, because it provides much better compression - though to the annoyance of some, because not all viewers support the new format.
Really neat trick. Might use that for an application I'm working on, just so that it's easier for the OS to link the file type to the program. And I'm totally gonna try that out on other files, see what comes up.
Where do you get your cursor? Is it just windows default, because yours looks bigger, the point looks pointier, and the stub at the back looks longer. 5:58
It was kind of mind-blowing when I first discovered this about document files, but it makes sense when you realize the X in DOCX means XML. Modern office documents are basically similar to webpages, which run from a folder of HTML, CSS, and script files. In fact, EPUB literally runs on HTML and CSS, as seen in the video!
A few years ago I was doing this to Firefox extensions, so I could go into one of the files and bump up the highest version number the extension was compatible with, for extensions that were no longer being supported.
I can't thank you enough. I am 72, and just when I was going to toss this out the window, I chanced on your website. And what a difference: you explain in simple (72-year-old) language, and when I did have questions you actually took the time to answer me. Thank you again.
*.PK3 is another zip file in disguise. It's mainly used for modding. particularly DOOM/Heretic/Hexen modding. It contains assets (sprites/graphics/sounds) and code inside various code lump types (ACS/DECORATE/ZScript/TEXTURE/MAPINFO/etc).
The Office Files are actually really cool. I had a scenario where a department I was working with put data in word docs. To extract it you can write a script to make it a zip file, then parse the XML for the data, & return it. Super useful!
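A script along those lines might look like the following Python sketch. It is not the commenter's actual script: `docx_paragraphs` is a made-up name, it assumes the body text lives in `word/document.xml`, and it opens the .docx directly as a zip rather than renaming it first.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_source):
    """Return the text of each paragraph found in a .docx body."""
    with zipfile.ZipFile(docx_source) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return paragraphs
```

This ignores formatting, tables and text boxes as such and simply returns strings, which is often enough when a document is being used as a makeshift data store.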
5:30 yeah, that is the executable for sure. y'know, if you've ever unsure what a file is (missing extension, weird extension), oftentimes the `file` utility is useful. it should identify this as a Mach-O executable
More than a decade ago I opened an ISO file in WinRAR by mistake and was surprised to find all the contents of the disc. Since then I always try this first when I need to get at some asset, instead of searching for a specific unpacker.
8:46 That’s not the case, the actual extraction program is a program built-in to any Windows OS, somewhere located in Sys32. I think that is the file which contains the .exe file that is to be extracted
A little note on 02:00: I experimented in the past with storing pictures in Word documents... the maximum resolution at which you can save an image into a Word document is around 2000x2000 pixels; anything higher gets automatically downsized.
Setup.exe is actually an executable file and a zip file glued together. The executable is the installer program, while the zip contains the software to be installed. When reading a .exe file to load and run it, Windows starts reading the file at the beginning. A .zip file, on the other hand, has its table of contents at the end. The idea was the compression software would compress the files and write out where each one is in the zip once it’s finished. As far as Windows is concerned, Setup.exe is an executable with a very large data segment at the end. As far as any compression software is concerned, Setup.exe is a zip file with a bunch of garbage data at the beginning that isn’t referenced in any of the table of contents entries.
The best kept secret in the world 🤫
😂
Sure 😅
No one will ever know (besides us)! 😉😏
Shhhh. I wont tell anyone, joe, trust me
Btw wow im early
Or if you are using a decent archiving program(like 7zip) you can just extract the data of any container, so no need to change file extensions.
@@vylbird8014 Numbered files that AREN'T left-padded with zeros are an abomination! :)
@@cronchcrunchbtw The 7z format was developed by 7-Zip. A comic archive in that format can also be saved as .cb7.
Not only interesting, but very useful. Especially for pulling pictures and charts out of Office documents quickly or en masse.
Pk3s are also used in srb2/modern doom source ports
The reason why they don't do this it's simple: they just don't want you to do it!
Sorry but you would actually not really like it
i don't think the jre does any compiling at runtime, it just acts as a platform to run the bytecode generated by javac
Yup. Minecraft's classes look like that because it was obfuscated to prevent people from just opening it up and stealing the code to make a ripoff.
Comic book reader CBZ files are also zip files (CBR files are RAR archives).
Any java decompiler can decompile the class files and apk files can be decompiled too.
Damn, that's a life pro tip
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file.
PE files typically has the .text section (the program's actual machine code) and the .rsrc section (other data the program needs).
But the .rsrc section can include other files, including valid Zip files. This is what I think self-extracting exes are extracting, and what Archiver programs look for. Finding none, they'll show you the PE structures instead.
@@mfaizsyahmi It is. Basically, when you are generating resources for an executable image, you do have the option of adding files to the resource section. So at build time you can use a resource script with a user defined resource file and add an archive to the resource section.
Since this isn't useful for self extractors, since it would require you building the executable, it is also possible to edit the resources section. Windows has the UpdateResource function which facilitates this. A self extracting archive or an installer which has the files as part of the executable resources can either just enumerate the resources, or assume that a fixed resource ID will always be used.
@@mfaizsyahmi You can just attach Zip data to the end of EXE, there is no need to add it to any section. In fact, adding a big Zip to a section may be unwanted, as sections are loaded in RAM. Most self-extracting archives don't add the archive to sections.
Zip format is special, because it has a footer instead of header - archivers read it from the end of the file. That's why it can coexist with many header-based formats.
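To illustrate the point about the footer, here is a minimal Python sketch using only the standard `zipfile` module. The 1 KB "stub" is made-up filler standing in for an executable, not a real PE file; the point is only that bytes placed in front of the zip data do not stop a reader that starts from the end.

```python
import io
import zipfile

# Build a small zip archive in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("readme.txt", "hello from inside the zip")

# Hypothetical stand-in for an EXE stub: arbitrary bytes placed before the zip data.
stub = b"MZ" + b"\x00" * 1024
combined = io.BytesIO(stub + archive.getvalue())

# The reader locates the end-of-central-directory record at the tail of the file,
# so the leading bytes do not prevent listing or reading the members.
with zipfile.ZipFile(combined) as zf:
    names = zf.namelist()
    text = zf.read("readme.txt").decode()

print(names, text)
```

Whether a given self-extractor really appends its archive this way (rather than using a resource section) varies by tool, so treat this as a demonstration of the mechanism only.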
It's actually a great way to design a file format though, I've done it myself with my own coding. It means you can easily edit the file with third party software, and use existing zip libraries to pack data in and out of the file, and you get compression on top of it. It's a smart way to work.
I love how Joe uses AI generated images now instead of stock photos 😂
I'll use a mix of both
@@ThioJoe That's good to know.
Ok...
I mean, it's basically no work. just a minute or two to make one that works and you're done! no copyright or anything to worry about
@@ThioJoe which AI generator do you use?
The reason you pack it in a zip file is because every file is compressed separately. This means you can have a metadata file inside the zip and just extract this (probably quite small) text file. With other compression formats you would have to decompress the whole (possibly very large) file first to check some small file within.
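A rough Python sketch of pulling one small member out of a zip without unpacking the rest. The member names `metadata.json` and `big_payload.bin` are invented for the example.

```python
import io
import zipfile

# Build an archive with a tiny metadata file next to a large payload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("metadata.json", '{"title": "example"}')
    zf.writestr("big_payload.bin", b"\x00" * 5_000_000)

with zipfile.ZipFile(buf) as zf:
    # Only this member's compressed data is decompressed; the 5 MB entry is left alone.
    metadata = zf.read("metadata.json").decode()
    # Sizes come from the archive's directory, so no decompression is needed here either.
    sizes = {info.filename: info.file_size for info in zf.infolist()}

print(metadata, sizes)
```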
Archives where you have to extract everything are called solid archives. Most formats have the option to create solid or non-solid archives, so it isn't a particular advantage for zip files. Zip is just built into Windows, so devs* don't have to worry about including an archiver with their programs (which is a bit of a problem for slow internet). The advantage of solid archives is that they can be made smaller, but how much smaller depends on a lot of factors, and the risk of corruption is far worse (if the archive is corrupted at all, then all of the files within will likely be lost).
Source: I messed around with solid RAR archives back in the day trying to save space on my EHDD but ended up losing a bunch of data because the archive got corrupted so I started using non-solid archives exclusively and found there was very little difference in size.
Just to make sure I was remembering everything correctly I double checked and found that 7zip (the format I switched to in more recent years) uses a bit of a hybrid approach where the archives can have multiple solid blocks the size of which can be set. 7-zip also allows for the creation of completely non-solid archives. Since I've switched to internal SSDs and a NAS with high quality NAS HDDs and have multiple backups in general I'm not as worried about corruption as I used to be.
*There's an interesting story behind that and Microsoft's Embrace, Extend, and Extinguish policy of the time.
Yeah, that’s in no way unique to Zip. If anything, I suspect that compression formats that require you to extract all the files at once are the exception, not the rule.
If you've ever used Scratch (the programming language), you might know that .SB3 and .SB2 files are also ZIPs. Extracting them actually has a use: checking the size of the project.json file inside (which is limited to 5MB).
edit: correct error - .sb files aren't .zips
.sb files are not zips afaik, only sb2/3s.
wait .sb2 as well? didn't know
Ok... I cannot tell you how many times I need an image from a Word document, because some genius client decided to send me a word document instead of the original PNG or JPG file. And, until now, there was no easy way for me to get the *ORIGINAL* image out of that Word document... Now I know how to do that!! This is incredible!!
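For anyone who wants to script this, here is a small Python sketch. It assumes the usual .docx layout, where embedded pictures sit under `word/media/`; the in-memory "document" below is a fake stand-in built just for the demo, and `list_embedded_media` is a made-up helper name.

```python
import io
import zipfile

def list_embedded_media(docx_file):
    """Return the names of files stored under word/media/ in a .docx package."""
    with zipfile.ZipFile(docx_file) as zf:
        return [name for name in zf.namelist()
                if name.startswith("word/media/") and not name.endswith("/")]

# Fake stand-in for a real .docx, just to have something to run against.
fake_docx = io.BytesIO()
with zipfile.ZipFile(fake_docx, "w") as zf:
    zf.writestr("word/document.xml", "<doc/>")
    zf.writestr("word/media/image1.png", b"not really a PNG")

media = list_embedded_media(fake_docx)
print(media)
```

With a real file you would pass its path instead of the in-memory object, then call `ZipFile.extract` on each name to save the original image bytes.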
Another example is Apple's IPSW files, the firmware files for all of Apple's devices. Those are secretly ZIP files as well.
I'd love to see you go over files such as .DEB (Debian packages) and AppImage/Flatpak/Snap files as well!
And .rpm
Hello Again ._.
@@salamiwallnut As the name suggests, a tar.gz is a set of files bundled with tar and after that compressed with gzip, just a way to make the file smaller.
It has become so standard in Linux that a single binary now does both.
wanted to comment that too, i ended up with one on a non-debian system recently
@@salamiwallnut I think he's referring to Arch pacman packages. However you are correct in pointing out that that is just a generic archive format, and that Arch pacman packages are quite literally zipped tar archives with no special file extension of any kind. As such, it wouldn't really fit in a video like this.
Re: 5:30 - Yes, the file called “minecraftpe” with no extension *is* the main executable in the case of the iOS app. It’s in the Mach-O format (just like on macOS) which never has an extension. Neither does ELF (the main Linux binary format) for that matter - only in Windows do executable files have any extension at all.
Jobs wasn't joking when he said the iPhone runs OS X - with the transition to Apple Silicon on the Mac, they even allow running iOS apps on macOS.
I would assume macOS and iOS share as much technology and APIs as possible, which does show, given how optimized they are.
@@XeZrunner And both are direct descendants of NeXTStep. The .app format (itself, just a folder, not even zipped), came from that platform, and some of the utilities in MacOS are (or started out as) direct ports of ones included on the NeXT computers.
Great niche topic to cover. Most of those do indeed have file headers that begin with PK (Phil Katz). We can see this by opening in a hex editor like HxD, or even Notepad++ (if it's not massive in size). Many Windows executables begin with MZ (Mark Zbikowski) and can also be extracted. Thankfully 7zip makes it quick and easy to check.
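If you'd rather not open a hex editor, the same check can be sketched in a few lines of Python. The function names and the label strings are made up for this example, and only the two signatures mentioned above are covered.

```python
def sniff_container(first_bytes: bytes) -> str:
    """Classify a file by its first two bytes (only the PK and MZ cases)."""
    if first_bytes.startswith(b"PK"):
        return "zip-style (PK)"
    if first_bytes.startswith(b"MZ"):
        return "DOS/Windows executable (MZ)"
    return "unknown"

def sniff_file(path: str) -> str:
    """Read just the start of a file on disk and classify it."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(4))

print(sniff_container(b"PK\x03\x04"))
print(sniff_container(b"MZ\x90\x00"))
```

Note that a self-extracting archive starts with MZ even though it carries zip data further in, so a leading-bytes check alone won't spot those.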
Regarding opening the minecraft jar, the `assets` folder is in fact the exact same structure as a resource pack, and the `data` folder is the exact same structure as a datapack.
We often recommend that people do exactly as shown in this video to find where they need to put files and what to put in them when they are making their own resource packs and datapacks!
It's very handy
I think in the video you are showing an older version of the game, in newer versions a majority of the folders shown in the video are nested inside the `data` and `assets` folders that I mentioned in my above comment
As a Linux user, I've been so confused as to why .docx and other ms office xml formats have archive manager assigned as default or one of the programs to open it. I suppose it's because the format isn't native or foss to be properly incorporated that file magic can only recognize the .zip aspect of it.
Ah yes, opening files with a file extractor. I always do this with Android APK files whenever I want to extract some fonts or *assets* files used in that app. I never knew you can also open Office files the same way, this is helpful.
Many years ago (I won't say how many lest I age myself), an indie game I loved that had a level editor had the fan community asking the dev to support custom player skins in the created levels. I looked into how hard or easy that would be, and that's when I first discovered how incredibly useful it is to just package data, images, etc. into a ZIP file and change the extension to make it look fancier 😂 It's nice to know even the big companies do the same thing and it wasn't a dumb novice-dev idea of mine.
That (---) was really unnecessary and like no one cares how many years ago.
It's not to be fancy, it's to bundle all the different things needed to get things to work to come in a single file, arranged in a certain way.
@@Gunz1234 Your entire reply was unnecessary 🤷‍♂️
@@mfaizsyahmi Oh, no, I meant changing the file extension makes it look fancier than just naming it with .zip 🙂
@@IceMetalPunk Not exactly. Like Joe said in the video, changing the extension helps the OS (mainly Microsoft Windows) to open the file with a specific program (because Windows depends on the extension, not the file content). Then, how will Windows know if that .zip file belongs to Adobe Illustrator or Java Virtual Machine (Minecraft)? That's the point.
Now I know why I can open .jar files with 7zip or winrar. It makes sense
Yeah same, only with ISO files. Because archiver programs seem to associate files that don't appear to be archive files until you look for yourself.
I already knew jar files were a form of archive files because it literally has archive in its name. And also from winrar associating jar files as an archive file.
I've been watching your videos for years. Most of the time, your videos are very interesting and entertaining. And most of the time I pretty much know the information that you are covering or I am aware of it. In this video, there is a first. I never knew that all of these files were actually zip or compressed files. Keep up the good work. I always enjoy your videos.
If you're using something like 7-Zip, you don't need to rename the file. Changing the extension simply tells the OS's built-in archiving software to decompress the file.
If you are into 3D printing, 3MF files, used by many slicers to store objects, print settings, etc., are also zip files.
Awesome educational video on something I had never thought of before. More stuff like this please!
Apple platforms like iOS and macOS also hide folders like that. If you've ever installed anything on a Mac, you might know that the app comes in a .dmg file (similar to .iso) and then you're supposed to drag the [AppName].app file from there to your applications folder and run it by double clicking like an .exe. But it's not actually a file, as if you extract a DMG on Windows, the .app shows up as a folder. Inside, there is a directory called "Contents" and in it are a bunch of resource files alongside a "MacOS" directory that has the main executable.
I’m a Java developer and being able to confirm that the proper files are stored in the jar is very useful. It is also useful to be able to decompile Java .class files specifically.
BTW, JAR files are not just for servers. They can be run on client versions of Windows, macOS, and Linux, provided the correct version of the Java Runtime Environment is installed. Also, the only folder you're guaranteed to see in a JAR file is the META-INF folder.
(1:00) You should definitely uncheck hide file extensions, for security reasons.
This is because people send files such as "photo.jpg.exe" which then shows up as "photo.jpg". Most people will not care that the .jpg is visible and find it strange, they'll just register that as a photo and will try to open it by running it, and now they're running malicious software.
But with visible file extensions, you'll see it being photo.jpg.exe and now you can see that it's not a photo easily, and should definitely not run it.
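A tiny Python sketch of the same check done by a script instead of by eye. The two suffix sets are illustrative guesses, not an authoritative list, and `looks_disguised` is a made-up name.

```python
from pathlib import PurePath

# Illustrative, non-exhaustive lists (assumptions for this sketch).
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".bat", ".cmd", ".com"}
DECOY_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf", ".docx", ".txt"}

def looks_disguised(filename: str) -> bool:
    """True when a harmless-looking extension is followed by an executable one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (len(suffixes) >= 2
            and suffixes[-1] in EXECUTABLE_SUFFIXES
            and suffixes[-2] in DECOY_SUFFIXES)

print(looks_disguised("photo.jpg.exe"))
print(looks_disguised("photo.jpg"))
```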
Interesting. Before this, I only knew that APKs, ISOs, JARs and some EXEs are Zips. The rest was new to me.
Same.
EXE, DLL (and probably MSI) files shown in this video are NOT ZIP files. EXE/DLL are PE files, which stands for Portable Executable. It's the format Windows uses so that machine code, and data it uses, for a program is in one, portable, file.
If you think about it, they can't be zip files because then how would a zip program (meaning the program that zips/unzips zip files) run? How would windows even exist before zip files were invented?
I have realised that if you open it up, every file's date & time inside of it is 1/1/1980 12:00AM
Example: 2:22
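One plausible explanation, sketched in Python: zip entries store timestamps in the old MS-DOS format, whose earliest year is 1980, so a tool that never records a real time ends up with that minimum. That this is what Office or the APK tooling does is an assumption here; the snippet only shows that Python's own `zipfile` behaves this way when no date is supplied.

```python
import io
import zipfile

# A ZipInfo created without a date_time falls back to the DOS-format minimum.
entry = zipfile.ZipInfo("page.xml")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(entry, "<doc/>")

# Read the timestamp back from the finished archive.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("page.xml").date_time

print(stored)
```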
6:07 this version of minecraft is pretty old. In newer versions all the image data is grouped in 1 top level assets folder, which has the same structure as a texture pack. It's basically the default texture pack.
7:03
*The beam that it sucks in
Lmao- great video, super interesting!
Cool, didn't know office files were also structured as zip
I knew about a few of these but never thought that DOCX was also zip hahaha amazing
A file is a bunch of information stored in a certain format, an archive is also a bunch of information stored in a certain format, perfectly reasonable to be interchangeable.
2:08 wait, HOW ARE THOSE DOCUMENTS MADE IN 1980?
Joe explaining the minecraft source is like learning French in an English class
I never figured that all these files were just zip files. This sort of thing is exactly why I watch your videos
You can also open executable files (EXE) in a file archiver using a program such as 7zip or GNOME Archive Manager and the EXE file includes the relevant files necessary for the file to work such as app icons, version info and some text files will be filled with text that is used in the program.
I should also add that you don't need to change the file type in Windows, instead just open the folder path in 7zip and right-click on the file and click on "open inside" (does not appear on the Linux snap version of p7zip)...
I may also add that most file types can be opened in any text editor to view the same information e.g. an SVG Vector file can be opened in Notepad or Kate text editor and you view the coordinates of the file information...
An important thing about EXE/DLL is that they're not ZIP files, they're PE files, totally different specs. But any archiver program worth their salt understands the PE format as well as a ZIP format.
@@mfaizsyahmi What these archivers show is the section headers for almost everything, and the resource directory for the resource section.
ThioJoe, thank you for this youtube channel. I love this youtube channel. Thank you for making this youtube channel about computers and the windows operating system. I've used the Windows operating system for years. I've essentially grown up with Windows as pretty much 99.999% of the whole world also has since Windows is pretty much the most used operating system of the world (the other two popular operating systems: macOS and Linux). Anyways, thank you for this video! I never knew that and was interesting to learn that Word documents are actually just .ZIP files in disguise. The More You Know! 😍😍
.cbz and .cbr files used for comics are also just zip (or, for .cbr, RAR) archives of numbered jpegs (and sometimes a metadata xml file)!
Thank you very much.
I've been searching for something like this for a long time.
Suggestion: Could you also explain (as in possibly a new video) .rar, .7z, .tar.gz, tar.xz and .tar.bz2 and how are they different from .zip? I think you are one of the few people who can do it well
I can answer for the tar variants: tar once stood for "tape archive", and was a way to just pack a number of small files together to store on magnetic tape, back when Unix was king. These use no compression. The .gz, .xz, and .bz2 extensions are just different compression methods applied to a standard tar file, so unlike zip, the whole file up to and including the files you are looking to extract has to be run through the uncompressor first. Zip stores a list of the files and their positions in the zip file at the end of the zip file, and each file is compressed separately, often using different compression algorithms.
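The two-layer structure described above can be seen directly with Python's standard library. This is only a sketch with made-up member names: it builds a small .tar.gz in memory, peels off the gzip layer, and then reads the plain tar underneath.

```python
import gzip
import io
import tarfile

# Build a .tar.gz in memory with two members.
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:gz") as tar:
    for name, payload in [("a.txt", b"first"), ("b.txt", b"second")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Layer 1: undo the gzip compression; the result is an ordinary uncompressed tar.
plain_tar = gzip.decompress(raw.getvalue())

# Layer 2: read the tar on its own ("r:" means "no compression expected").
with tarfile.open(fileobj=io.BytesIO(plain_tar), mode="r:") as tar:
    names = tar.getnames()
    second = tar.extractfile("b.txt").read()

print(names, second)
```

To reach `b.txt` in the compressed form, the gzip stream has to be decompressed from its start, which is the contrast with zip that the comment draws.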
Jar files are not Minecraft-specific, and neither are they just for servers: the normal game on your local PC also uses them to store the game, and they are used for mods. They aren't just zip files, but they behave like zip files. They are actually executables for Java programs/installers, where the class files are stored in one big archive to 1. save disk space and 2. have fewer files lying around in an easy-to-access location. Also, you don't even have to add the .zip extension; you can just extract the data using WinZip, 7-Zip, WinRAR or any other decompression program that supports them.
but i still find your videos extremely good!
You can teach me the whole day about computers and I'll never get bored.
I miss him ever since when the triple the internet video was uploaded. Man those times 😞
@@TrojanLube69 I've got extra 64 gigs of RAM thanks to him.
Legendary UA-camr
@@just.nobody yeah, no way you can download that amount anymore 😭😩
About office files, I've known and used it for a long time :D I've been extracting raw images from ppt presentations for years - super useful
Very cool that you got official video from a 1980 timetraveler for the Microsoft portion
The "award winning photograph" is so funny 😂
Lol yep
This is super cool, now I'm gonna look through every single file on my phone and computer and see what changes I can make to them
archiving programs are agnostic to file endings you don't need to rename the files to open any of these, just right-click it and use the context menu to let the archiving program handle the file in a way you want....or open directly from the files menu within the archiving program, just select "all files" in the file opening dialog, so you can see them in the selection window.
Why change extension? Right click and select "Open with" and select WinZip, WinRAR, 7-Zip or what application you use for un/compressing files. Make sure "Always use this app to open" is unselected because you don't want to associate it with the wrong application.
Wow that’s super useful I was literally wondering how to do this especially extracting images from the word document definitely going to keep this in mind thanks
ThioJoe I love the way you explain Tech stuff on your channel!! Keep it up!!!!!!
Not sure if this was already mentioned, but Google Earth and several mapping & GIS programs make use of “.KMZ” files, which are also zip archives.
Source Engine map files, called Binary Space Partition or just ".bsp" files, can be opened with 7zip even though they are not zip files. It is very useful to see the content baked into the bsp files this way.
Fun fact: IPSW files (Used for iDevice restores) are also hidden ZIPs.
As a programmer I knew that some of the files are archives, like .jar .exe .ipa .apk .msi. But the .docx or .odt or any of the office files were mind blowing
This is the most interesting thing I heard today.
Fun fact - MediaMonkey's addon install packages (.mmip) are also just zip files
On Macs, a lot of files you wouldn't expect are actually package files, but really just folders with more stuff inside. I took a MainStage concert file as an example and renamed it to .zip and it just turned into a folder, letting me see the contents. Applications are package folders as well, which is why you could just open that .app iOS app.
Honestly, packages are a brilliant concept that solves so many common problems. But since none of the other major OSes have any equivalent concept for bundling a folder into an opaque, pseudo-monolithic file, using a zip archive is a great workaround that basically achieves the same thing.
.xpi files and .crx files can be opened in 7-zip, these types of files are firefox and chrome extension files respectively
Fascinating, thank you so much. Very useful for converting file x to file y.
I learned this a couple years ago and I was blown away 😂 told all my developer friends that it's all just zipped XML these days
Several feature phone theme files are also archives. .nth for Nokia are just ZIP files, and .thm for Sony Ericsson are TAR files.
maybe someone has pointed this out already, but .ora, a layered image format like .psd, used by GIMP, Krita and some other open-source image editing software (MyPaint is the one I have on my raspberry pi, since it can't run Krita, and it uses .ora) is also just a .zip in disguise! a friend of mine took advantage of this fact to use it for sprites for a game she made. .kra is Krita's modified version of .ora and also can be opened with an archiver.
That illustrator folder thing blows my mind
Already knew about jars, didn't know about anything else, this is interesting. Browsing Minecraft's jar for resources is super useful actually, looking at their assets is a great way to learn how to make loot tables or texture packs etc. Or if you just want one of their in game textures.
I did know about the APK files but not about the other files, nice vid as always
Office and LibreOffice file formats are basically a collection of standardized XML files. XML files are sort of like HTML but with custom tags and structure. Because XML files have so many repetitive keywords (like tags), zipping them makes sense, since it produces a smaller file.
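A quick way to see the effect of repetitive tags, sketched in Python with `zlib` (DEFLATE, the method most zip files use). The XML here is an artificial, extremely repetitive sample, so real documents will compress less dramatically than this.

```python
import zlib

# Artificial sample: the same paragraph markup repeated many times.
row = "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>\n"
xml = ("<w:document>\n" + row * 2000 + "</w:document>\n").encode()

packed = zlib.compress(xml)
ratio = len(packed) / len(xml)

print(len(xml), len(packed), round(ratio, 4))
```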
Old document formats, like those of Microsoft Office 95-2003, are pure condensed binary documents. Nobody outside Microsoft knows how to produce that format precisely as Microsoft Office does, so other apps have to reverse-engineer the binary format, with mixed results (such as inconsistent formatting). Even though Microsoft did actually open the specification, there are still "outside-of-specification" features that are not documented properly.
With an XML-based specification, at least someone can see the clear-text tags and properties of a document even if there are actual "outside-of-specification" features implemented. Reverse-engineering it will be much easier. The only headache is that Microsoft will likely continue doing "out-of-spec" things with each new version of Office 365, which will make it hard for other office apps to read Office XML files properly with consistent formatting.
The difficulty was never really with the binary format as such; it’d been reverse-engineered long before the XML versions, and MS even had documentation for the binary container format itself. The problem is that to open a document correctly, you have to faithfully recreate the entire document object model, every feature, _and every bug_ in the Microsoft programs. The object model of a MS Office document is the same whether it’s in a binary or XML container. For example, Excel spreadsheets can use one of two date epochs (1900 or 1904). The 1904 epoch originated in Excel for Mac (which predates Excel on PC!!), while the 1900 epoch originated on the PC in Lotus 1-2-3. For interoperability, Excel for Mac and Windows have both long supported both epochs. But because Lotus 1-2-3 had some bugs in its date handling code, Excel has to mimic those bugs because otherwise spreadsheets that originated on Lotus 1-2-3 would break. (Lotus was the existing market leader when Excel came out, so compatibility with it was paramount.) This means that Excel spreadsheets with the 1900 epoch behave subtly differently than 1904 ones, so any competing spreadsheet has to not only recognize the different epoch, but clone the differing behavior. (To this day, in Excel document properties, you can set the 1904 epoch, which is sometimes necessary to get certain date calculations to work properly.)
Similarly, if you want to open a Word document and have the formatting match up precisely, it means you have to recreate every feature that’s represented in the Word object model, and implement it identically.
This is the real reason alternative office suites still struggle with round-trip compatibility, especially with complex Word documents. They don’t support the same features as Word, and/or implement them differently, so the result is inaccurate. Reading the files is not the challenge.
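To make the Excel date example concrete, here is a hedged Python sketch of converting a whole-number serial to a calendar date under both epochs. It assumes the commonly documented behaviour: in the 1900 system serial 1 is 1 Jan 1900 and serial 60 is the non-existent 29 Feb 1900 carried over from Lotus 1-2-3, while in the 1904 system serial 0 is 1 Jan 1904. The function name is made up, and time-of-day fractions are ignored.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-number spreadsheet serial to a real calendar date."""
    if date1904:
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        # The phantom leap day: there was no 29 Feb 1900 in the real calendar.
        raise ValueError("serial 60 is the non-existent 29 Feb 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, every serial is one ahead of the true day count.
    return date(1899, 12, 30) + timedelta(days=serial)

print(excel_serial_to_date(1))
print(excel_serial_to_date(61))
print(excel_serial_to_date(1462), excel_serial_to_date(0, date1904=True))
```

A reader that ignores the epoch flag would be off by 1462 days, and one that ignores the phantom leap day would be off by one day for early-1900 dates, which is the kind of bug-for-bug cloning the comment describes.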
Comic book files like cbz, cbr and cb7 are basically this, just an archive full of pictures, mostly jpegs or pngs, but I've seen webp in some, and the only difference is whether they are using ZIP, RAR or 7Z for compression.
6:20 It's actually the game files, but usually it has an assets folder where the grass block for example would be. I don't know why your JAR didn't have it though.
Most driver installation .exe files can be opened this way if you have a problem installing a driver that is incompatible with your version of Windows. Extract, find the .inf files, install manually; most of the time it works.
Some of the most popular file formats for digital comic books are .CBZ and .CBR. These are just Zip and Rar files with sequentially numbered jpeg images.
They aren't really standards so much as loose agreements. You even see some with .webp images in now, because it provides much better compression - though to the annoyance of some, because not all viewers support the new format.
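Since it is only a loose agreement, making one yourself is easy. Below is a minimal Python sketch; `build_cbz` is a made-up helper, the page bytes are placeholders rather than real JPEGs, and the zero-padded names are an assumption about viewers sorting pages by filename.

```python
import io
import zipfile

def build_cbz(pages, out):
    """Write page images into a zip with zero-padded, sequential names."""
    width = len(str(len(pages)))  # e.g. 2 digits for 10-99 pages
    # ZIP_STORED: JPEGs are already compressed, so storing them as-is is enough.
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        for number, image_bytes in enumerate(pages, start=1):
            zf.writestr(f"{number:0{width}d}.jpg", image_bytes)

# Placeholder page data standing in for real image files.
fake_pages = [f"page {i}".encode() for i in range(1, 13)]
comic = io.BytesIO()
build_cbz(fake_pages, comic)

with zipfile.ZipFile(comic) as zf:
    page_names = zf.namelist()

print(page_names)
```

To produce an actual file, pass a path such as `"issue.cbz"` as `out` instead of the in-memory buffer.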
Really neat trick. Might use that for an application I'm working on, just so that it's easier for the OS to link the file type to the program.
And I'm totally gonna try that out on other files, see what comes up.
I'm a web dev and i use this technique as well for exporting/importing data across apps
Where do you get your cursor? Is it just windows default, because yours looks bigger, the point looks pointier, and the stub at the back looks longer. 5:58
It was kind of mind-blowing when I first discovered this about document files, but it makes sense when you realize the X in DOCX means XML. Modern office documents are basically similar to webpages which run from a folder of HTML, CSS, and script files. In fact, EPUB literally runs on HTML and CSS as seen in the video!
As a Minecraft expert, I can assure you that you are using very old versions for the two Minecraft folders you are looking at
A few years ago I was doing this to Firefox extensions, so I could go into one of the files and bump up the highest version number the extension was compatible with, for extensions that were no longer being supported.
Hahaha! Nice trick
Really interesting. Great job 👍
Learned something new. Thanks! :)
I can't thank you enough. I'm 72, and just when I was going to toss this out the window, I chanced upon your website. And what a difference: you explain in simple (72-year-old) language, and when I did have questions you actually took the time to answer me. Thank you again.
*.PK3 is another zip file in disguise. It's mainly used for modding. particularly DOOM/Heretic/Hexen modding. It contains assets (sprites/graphics/sounds) and code inside various code lump types (ACS/DECORATE/ZScript/TEXTURE/MAPINFO/etc).
quake engine derivative moment
The Office Files are actually really cool. I had a scenario where a department I was working with put data in word docs. To extract it you can write a script to make it a zip file, then parse the XML for the data, & return it. Super useful!
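A script along those lines might look like the Python sketch below. It assumes the standard .docx layout (main text in `word/document.xml`, WordprocessingML namespace as shown); `docx_paragraphs` is a made-up helper name, and the sample document is a stripped-down stand-in, not a file Word itself would produce. Renaming to .zip isn't needed since `zipfile` ignores the extension.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs

# Stripped-down stand-in for a real document.
sample_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Name: </w:t></w:r><w:r><w:t>Ada</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Dept: R&amp;D</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("word/document.xml", sample_xml)

result = docx_paragraphs(sample)
print(result)
```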
5:30 yeah, that is the executable for sure. y'know, if you're ever unsure what a file is (missing extension, weird extension), oftentimes the `file` utility is useful. it should identify this as a Mach-O executable
sometimes I just like to open files with 7zip, doesnt normally give much info but it's fun
Great Video as always! 👍
If you have 7zip installed, you don't even need to change the extension. You can right click and open it anyways
This might be a bit obscure for most people, but I also recently found that MATLAB also uses these for their UI applications on the .mlapp format
I’ve used this to take images out of PowerPoint. Very useful secret to know.
Good stuff to know. Now I can explain to my users why zipping the file does not make it smaller.
Fun fact: the Microsoft Office files and Android (APK) files "were" last modified in the years 1980 and 1981
Good video, bro! Keep it up!
Good topic and share, thank you.
More than a decade ago I opened an iso file in WinRAR by mistake and was surprised to find all the contents of the disk. Since then I always try this first if I need to get some asset, instead of searching for a specific unpacker.
I remember converting .jar files to .zip files on my old 2012 mac when I was modding minecraft.
8:46 That’s not the case, the actual extraction program is a program built-in to any Windows OS, somewhere located in Sys32. I think that is the file which contains the .exe file that is to be extracted
A little note on 02:00: when I experimented in the past with storing pictures in Word documents, the maximum resolution you could save an image at was around 2000x2000 pixels; anything higher got automatically downsized.
Who still remembers the good old cab files from the Windows Installation disks?✌🏻
Setup.exe is actually an executable file and a zip file glued together. The executable is the installer program, while the zip contains the software to be installed. When reading a .exe file to load and run it, Windows starts reading the file at the beginning. A .zip file, on the other hand, has its table of contents at the end. The idea was the compression software would compress the files and write out where each one is in the zip once it’s finished.
As far as Windows is concerned, Setup.exe is an executable with a very large data segment at the end. As far as any compression software is concerned, Setup.exe is a zip file with a bunch of garbage data at the beginning that isn’t referenced in any of the table of contents entries.
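The "table of contents at the end" can be located by hand. The Python sketch below searches backwards for the end-of-central-directory record and unpacks its fixed 22-byte layout; `find_central_directory` is a made-up name, and the simple backwards search assumes the signature bytes don't also appear in a trailing archive comment.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record

def find_central_directory(data: bytes):
    """Return (entry_count, directory_size, directory_offset, record_position)."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # Layout: signature, four uint16 fields, two uint32 fields, uint16 comment length.
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack("<4s4H2LH", data[pos:pos + 22])
    return entries_total, cd_size, cd_offset, pos

# Build a two-file archive to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("setup.ini", "[install]")
    zf.writestr("app.bin", b"\x01\x02\x03")

entries, size, offset, record_pos = find_central_directory(buf.getvalue())
print(entries, size, offset, record_pos)
```

For a plain zip like this one, the directory ends exactly where the record begins. In a glued-together Setup.exe the stored offset may be relative to the start of the zip data rather than the start of the file, which readers have to compensate for.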