What is a File Format?
- Published 29 Sep 2024
- Let's explore what a file format is, and provide a different view on it. We dive into polyglots, file format research and the impact on security.
Funky File Formats Talk: • Ange Albertini: Funky ...
corkami/mitra tool: github.com/cor...
Guessing vs. Not Knowing featuring Steganography: • Guessing vs. Not Knowi...
Aah yes! the Schrodinger's zip file.
I laughed at this a little too hard.
That cat example doesn't seem that random...
Awesome
@@thefridge6913 you can never laugh too hard :D
Schrödinger doesn't like this trick
Wow, you make it so easy for people to understand these complex topics!
love the example with the "Town Musicians of Bremen" :)
File formats based on extension: Windows virgin.
File format based on actual content: Unix file CHAD basedlord
MIME types: Our new web overlords
6:26 Here's a triple syntax-highlighting image that can help understand how this can be valid PHP, C and Bash at the same time i.imgur.com/f7a4Uqu.png
Nice! that should be uploaded to Wikipedia it would make it visually clear!
en.wikipedia.org/wiki/Help:Adding_image
en.wikipedia.org/wiki/Polyglot_(computing)
*.rar.jpeg was the file format monstrosity I knew about before. Considering what you showed here, I wonder how many pairs or triplets can be made with popular programs.
Great video, minus perhaps the whiff of command line evangelism in the beginning. You can easily come across scenarios where you manually override which program you want a file to be opened with, even in a GUI environment, such as when you want to edit an image instead of viewing it - dismantling the argument right away (you consciously make the file consumed by a non-default app of your choice).
Going back to CLI, consider shebangs: how often is it that you execute e.g. a shell script by passing it directly to a shell program, rather than invoking it straight away? Not so often, I'd imagine. But even beyond all this, I find the whole default program thing to not really be a barrier of any sort in understanding this. Maybe this is just me not being a computer novice though, I don't know.
Thinking about files as being "programs" however was definitely a food for thought, I quite enjoyed that.
Do you know some hex editors for Linux with decoding abilities similar to 010?
This video is pretty fascinating. With many years of experience using Linux and the command line, I am already familiar with the fact that a file format is just a trick (every file is just a binary stream anyway), but I am still surprised that you can easily craft files to be interpreted differently by multiple programs. I am not sure whether it's a good thing or a bad thing. From a developer's perspective, we want the file format to be unambiguous, because we know from experience that ambiguity is a common source of bugs and unexpected behaviors. However, sometimes we also want flexibility and tolerance. For example, we want to add more features to a file format without breaking older versions of the program, which means we shouldn't be overly strict when recognizing the format. These two design principles sometimes conflict with each other, and I think that is the main cause of the issue.
@@00O3O1B a word file is also just a zip file
this is why magic numbers exist
It also doesn't sound very efficient to put different "files" into the same file for most use cases.
Roger, LiveOverflow gone Dark Mode
Why use my name?
I was searching for this comment🤣🤣
I love dark mode. I wonder what % of 0.5M viewers are on OLED and how much energy was saved.
Black lives matter
I honestly like the light mode more. Not only it feels more reminiscent with what I associate with the channel, keeping it light at all times prevents eye whiplash like at 0:55 where the light fills the entire video instantly.
The scene with dark background, a table and a simple t-shirt makes this feel like an interrogation scene where police is asking the criminal questions.
I really like this scenario, for me doesn’t look any like that... but only a minimalist and well filmed scenario. No more that typical youtuber background bullshit with their setup behind
#hackersroom
"We know you work with the File Format Cartel! Who is your leader?!?"
Except instead, he’s answering questions nobody asked.
"grew up with a commandline" yes, you could say that i grew up when i started using linux a few years ago
LiveOverflow: „You don’t want to do PDF by hand“
Me: *cries in LaTeX*
i feel you!
What is Latex, some condom ingredient?
A typesetting tool.
LaTeX is fucking incredible. Also if you like meme ways of writing your documents check out groff/troff. Much simpler than LaTeX.
Or, just start converting markdown/emacs org-mode to LaTeX. That takes the pain out of it. I wrote all my Bio notes in org-mode then compiled it into a final LaTeX document without much trouble.
LaTeX by hand is much easier than PDF by hand.
The conversation went like this:
- WTF did you do?
- You dog...
- Zip it man!
I really like this type of video. The explanations were really good, the quality is very high, and overall I can confidently say that I've learnt something I didn't know before!
If anyone has seen this image going around on Discord; "Please don't open me in the browser".
Basically a 2-image PNG animation. Renamed to .zip, it gives an .mp3 file inside, containing some metadata about opening the audio in an image viewer.
Pretty cool, took half of the day to get to the end of it.
Yes. I'm not the only one. I just posted this video in the discord server.
Looks interesting. Can you post a link here?
@Nigel YING No. It's not quite the end. Try to do "file" on the happiness file
I saw the one which only works on VLC (not the UWP version)
11 minutes in
"I don't know exactly why the pdf isn't shown in the zip file"
Dude that's literally the only reason I watched this video.
The pdf file is probably contained in a place where the zip program doesn't check, and pdf headers don't need to start at the beginning
@@petey5009 Since the PDF is in the Zip record it is probably checked by the Zip program. But since it has no filename it is simply not displayed.
He does know why, just not precisely what the reason is in this particular case. In this particular case it could be anything, depending on the way the zip format is there might be many ways to hide the pdf. You just need to find a way to make the information redundant, like making things a comment in the earlier polyglot C php bash etc example. :)
That's because the PDF file was not zipped, its contents were just combined with the contents of the zip file in the final generated file.
So when the zip program reads the final file, it encounters the only thing that was zipped in that file: the text file.
It's not that hard to get 🙂
@@erickcardozo462 You should watch the video.
How dark should the background be?
Live overflow: YES
I remember being blown away by file extensions when I played DDLC
Dont We All...................
just monika
ngl LiveOverflow should check out/play games that have neat tricks like what ddlc does, and im pretty sure there are obscure as heck ones out there.
"YOU decide what to open the file with"
xdg-open: exists
Now that this secret is public, I want to confess. ✝️
It was my trick in all teen years to hide my private content in a shared pc with family.
Of course, we had different users but windows 98, ME and XP were so kind to let me browse other users files.
So, I imagined that maybe someone else can browse my files too...
Even if they managed to gain access to my files they didn't know if they change the file suffix/extension they see a whole new thing
I don't exactly remember which year I learned about file format, but it was between 2000 and 2002.
omggg u finally made a dark mode intro. 🙏🏿🙏🏿🙏🏿
I remember my first time learning about this file format trick was probably about 2005-2006 on 4chan of all places. Someone uploaded an image, it was the cover of a C++ textbook, or some C language. I can't quite remember, but what I do remember was you could download the image, and extract that exact document from it. They embedded the textbook within the image itself, and used any image hosting site to discreetly share it with people. I was blown away.
LiveOverFlow: hiding files in files is not fun
justCTF: yes
I always used to think that these formats are "strict" as in they wouldn't allow unknowns. Turns out they do and you can play tricks with them.
_[HTML without DOCTYPE has entered the chat]_
Hm.. I usually like your videos but this time.. dunno, this episode was lacking in important details and had too many unnecessary analogies. It did not explain what exactly different implementations of PDF readers are looking for, what the "start of a PDF file" is, what "file extensions" are and how they are handled, why programs (PDF reader or ZIP reader) just ignore the rest of the file, whether or not that would work out with more than 2 files, different file types, etc. Or "what is a file header? does a textfile have a file header? are binary files and text files the same?". Nothing answered. The content of this video pretty much boils down to "it's two files in one file because it's two files in one file and each program just picks its file" without really going into greater detail than that. Why and how exactly that works, or what the PDF and ZIP masks you apply in your 010 editor actually do and look for, is left completely unknown.
I agree, but I think he does it by questioning the basics and encouraging you to ask the right questions, not giving answers or explaining details.
LiveOverflow 2016 - finding a parser differential in loading ELF
LIveOverflow 2020 - what is a file format
just joking. top notch stuff I didn't know.
The 010 tool is pretty awesome, reminds me of something similar I made for terminals - but more advanced and with a hex editor. Thanks for sharing that!
Love the musicians of bremen image and the new videos format!
2:38
There is magic though! In the first bytes of the file
Thank you for dark mode!
I'm someone in the category of people freaking out about closed source binwalk so I see files agnostically already. But I thought - "Hey, If I give this video a chance I'm sure LiveOverFlow will teach me something new" - All I can say is WOW The idea of not encapsulating but "programming" a file into the zip format is a complete paradigm shift. I will never be able to look at files in the same way again holy shit bro you just blew my mind.
010's template feature is the best one I have seen yet in any hex editor. It's really useful for reversing proprietary file formats.
Nothing beats Hex Editor Neo. Unfortunately it's not free.
YOU CAN DRAG FILES TO NOTEPAD???!!!!
I used to do this with BMP images. Compress all your files, and combine them with a BMP image of your choice. Just a simple copy command in cmd will work and you can hide some stuff from people. copy /b image.bmp files.zip image2.bmp. I used to hide my games in school lab PC this way.
This is also exactly the way SFX archives work. Open an SFX archive (.exe file) in any Zip program and it will show the contents. Fun stuff indeed.
It should have been said that Linux has "xdg-open" and "file", so as not to scare the Windows users too much.
@@egesanl1 Pure elitism. Its the biggest problem with the linux community.
Its how they repeatedly shoot themselves in the foot while pretending they actually want it to become more mainstream so it actually gets more big company support.
@@egesanl1 Most people are dumb. Therefore most Windows users are dumb. Not knowing how to install or use Linux doesn't mean you are dumb, but generally being able to install and use Linux means you are at least smart enough to understand some basic lower level computing concepts. Windows comes on most PCs by default so there's no "filter" like there is with Linux.
Claiming that if you use Windows you are dumb is equally dumb. I run Linux or Windows depending on what I need to do - the right tool for the right job. But it's not a totally unwarranted assumption that people who only use Windows are generally less knowledgeable about computers.
@hgfd
just made a imaginary ctf
title:Downgrader
Difficulty:Easy(First Challenge)
This new router supported SSH, a web server and other cool stuff, but the manufacturer blocked them because of "Security Issues". In the early days you could downgrade the firmware to an older version by uploading a firmware file, which is a zip file named "DEBUG_FIRMWARE_OVERRIDE_123.FIRMWARE", but that is blocked now, and they locked it down even more by only allowing "valid" media and document files. Can you downgrade it?
attachment:older firmware, address to challenge server
solution: make the older firmware file a valid PDF file using a program, name it "DEBUG_FIRMWARE_OVERRIDE_123.ZIP.PDF" and upload it to the server
it's just a shitty idea not gonna lie
You can do this in windows with the copy command and the /B switch for binary
"copy /B picture.jpg+folder.zip new.jpg"
I learned this when I heard that a promotional desktop wallpaper for Portal had an Easter egg in it. If you opened it as an archive the ending song "Still Alive" mp3 was in there. This was a triumph!
If u are here from justCTF gimme a link or else.
Dark mode for intro :)
A rather long winded video to say: PDF ignores bytes until it finds headers. Zip won't show files that have zero length set in headers. So you can make a file that is both a valid PDF and a valid ZIP with another file in it.
That was very informative and entertaining. Learn something new today. Thanks :D
Files being source code is so blatantly obvious I never thought of it, but when you pointed it out it instantly made sense how one could play with the file.
and especially when you showed that ansicphpbash "file format" :)
How many of us have struggled with one piece of code trying to pass a bit of other code as a string/variable, or whatever, only to realize you forgot to reformat it so that the thing you're trying to pass is not being interpreted as actual code.
A true hacker spirit, reminds me of my youth. It pleases me to see young talents, there are so few of them, while I thought 30 years ago that there would be countless hackers far better than us in the future. It never happened, everything has gone down, so these videos are refreshing.
This is one reason why simple things like the `file` command in Linux are *so useful*
porn collectors be like: after all these years trying to hide our collections and now you show this?
the thing that went in my head when I see 6:26 is C
I didn't realise there's php and bash until you told us
I like this channel a lot :) I really liked how you explained that zip files actually program zip to make a file, rather than "contain" data. You could have brought in zip-bombs at this point, because then a 1Kb file making a 42Gb file kind of shows how it's generative. Having said that, I think you left one point unclear, which was that programs sometimes "ignore" bytes they don't understand, like scanning for a dog and not seeing the cat. PDF is weird in that it looks for its magic sequence anywhere in the file, ignoring the zip at the beginning. ZIP, for example, won't do this. Python won't do this. etc etc. Nevertheless, through use of commenting, which is like programming zip to deliberately ignore code, you make polyglots. Polyglots almost always use commenting. If commenting wasn't possible, making polyglots would be WAY harder! Programs don't typically ignore anything, unless you trick them into it.
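The "generative" point about zip bombs can be illustrated with a rough sketch using Python's zlib module, which implements the same DEFLATE compression that ZIP commonly uses. The 10 MiB size is an arbitrary choice for illustration, not a figure from the video; a real zip bomb layers this effect much further.

```python
import zlib

# Arbitrary illustration size: 10 MiB of zero bytes.
original = b"\x00" * (10 * 1024 * 1024)

# The compressed stream is closer to a tiny "program" that says
# "emit this byte again and again" than to a container holding the data.
compressed = zlib.compress(original, 9)

# Running the "program" regenerates the full output.
restored = zlib.decompress(compressed)

ratio = len(original) / len(compressed)
print(f"{len(compressed)} bytes expand to {len(restored)} bytes (about {ratio:.0f}x)")
```

The exact ratio depends on the input and the compressor; highly repetitive input is the best case.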
Also this whole video strikes at the heart of a big problem in Europe, what does data privacy/security/illegal information actually mean? What if a picture file, for example, looks like a beautiful sunset in one image viewer application, but child-pron in another. Is the *file* child-pron, or is the image-viewer *making* child-pron when it's displayed? Or both? Do you need to have both on your computer to break the law? TL;DR we all have child-pron and state-secrets on our computers, sometimes in the same file, we just don't have the software to view it.
Great video, I love thinking of zip as interpreter for .zip "source code"! I also just love the concept of weird machines
Recently, for educational purposes, I've written a couple of image file formats and also I'm writing an interpreter, so this is right up my alley :)
2:38 LOL at no magic joke 😀
For those new to this, the linux file format recognizer (the file command) is configured in a file called /etc/magic
Anyone ever use that file? I thought it just stayed empty and people used defaults, #!, /usr/share/applications/, or whatever.
The term "magic" comes from en.wikipedia.org/wiki/Magic_number_(programming)
Just checked on my system, /etc/magic does not exist. So at least for Arch Linux it is in /usr/share/file/misc/magic. It reads a compiled version (.mgc) first.
File exists in Ubuntu 😃
Pretty sure the joke was about the magic numbers like Sam said
It's the first few bytes of a file / the signature, which you can read to find out the format and other info like the version of the program used to create that file
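To make the magic number idea from this thread concrete, here is a minimal sketch of content-based detection. The signature table is a hand-picked handful of well-known values and the function names are hypothetical; the real `file` command consults a far larger magic database with much richer rules.

```python
# Hand-picked sample of well-known signatures (not the `file` database).
MAGIC_SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF8": "GIF image",
}


def sniff(data: bytes) -> str:
    """Guess a format from the leading bytes only, ignoring any file name."""
    for signature, name in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"


def sniff_path(path: str) -> str:
    """Read just enough of a file to compare against the table above."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

Note that this only looks at the start of the file, so a polyglot would be reported as whichever format happens to come first.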
4:58 yo why the virus looking like sans
What if a CTF challenge goes like this: it's a .pdf file and also is a .zip file. The .pdf gonna be something that will make people to find for a real .txt that writes "Psst, this pdf file can also be executed by zip" lol
My mind was absolutely blown away. I've never thought that the same file could be interpreted differently. This is eye-opening for me.
loving the new brand design
The thing from the thumbnail is called "Bremer Stadtmusikanten", it's a statue in Bremen (Germany) lel
I remember my frustration when first switching from Windows to Ubuntu for work projects. I didn't understand how the Ubuntu file system structure worked, how I should manage individual files, and how to work with them. I asked people questions like "Where should I install programs in Ubuntu?" and similar. At that time I thought to myself, "Gosh, Windows seems like a much cleaner system, everything is neatly organized, I have a dedicated folder for Program Files and the only thing I should do is click shortcuts". But after learning the Ubuntu FS layout, understanding how PATH actually works and what it is intended for, and a lot of other tips and tricks, Windows FS principles feel rather restrictive. Although I now daily-drive Windows for home and work stuff (Windows made MAJOR progress towards being a developer-friendly system in the last few years), I still miss some of that clean simplicity and infinite possibilities that a proper GNU/Linux system provides.
6:30
Agnosticism as a programming language
from justctf
cool stuff. This is the time to weaponize it.
7 views 22 likes, lets go
I can totally relate to this. Grew up with windows but entirely switched to Linux like 8 years ago. I have a completely different understanding for the filesystem now.
Over the last twelve years, I have tried to switch over to Linux no less than 6 times. I am driven insane by trying to deal with obscure problems, and have to turn back to windows every time. Perhaps it's just because I'm not a coder and am just a power user. But if a power user like me can't take the frustration of Linux, I can't imagine normal people ever being able to take it.
@@LeoStaley if you are a windows user who needs windows it will be difficult I guess. I'm a software engineer, so I have a big benefit from using Linux and only downsides when using windows so that made the decision easy for me.
@@LeoStaley never fully switch to linux, dont get tricked by the masochistic nerds
Don't start talking about .lnk files
Thank you for sharing this knowledge, we truly live in the age of information!
People growing up with Windows... that's terrible childhood
I do at the moment with a mixture of win and ubuntu
you must be very young to say that
@@foxinrot Ubuntu FTW! BTW I use arch now.
i only use windows because games
Yes but actually no!
Ange is French. His name is not pronounced "Ange" but "Ongsch", like here: www.pronouncenames.com/pronounce/ange
Epic dark mode intro!
Don’t listen to the weirdos. Loving the low light setup. I see you clearly and nothing else. Minimal and clean
I opened this video with Microsoft Excel.......now my Windows OS is patched and has 0 security bugs. Thanks.
I think the CTF about finding the hidden stuff in a file would be a great challenge for a stego CTF. And it is valuable experience in identifying stego. And if you're getting into information security, stego experience is very important.
Didn't have to be part of the zip structure (as long as the entire ZIP file is less than 1024 bytes). This is a side effect of the PDF format, which allows random data in the first 1024 bytes.
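The claim above can be sketched as a tiny check. This assumes, as the comment states, that a lenient reader accepts the `%PDF-` marker anywhere in the first 1024 bytes; actual readers differ in how strict they are, and the function name is hypothetical.

```python
HEADER_WINDOW = 1024  # window size taken from the comment above


def find_pdf_header(data: bytes) -> int:
    """Return the offset of b'%PDF-' inside the first 1024 bytes, or -1."""
    return data[:HEADER_WINDOW].find(b"%PDF-")


# A small blob in front of the header is tolerated under this rule...
print(find_pdf_header(b"x" * 100 + b"%PDF-1.4"))
# ...but a header pushed past the window is not found.
print(find_pdf_header(b"x" * 2000 + b"%PDF-1.4"))
```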
>redstar-winOS
i see what you did there
Often PDFs will be read from the end. PDFs are designed to be written linearly, but a linearly written file is inefficient to access in a random fashion. Thus, after writing out the contents of a PDF, the writer will usually append an index at the end. The index is placed at a known offset from the end of the file, and readers will generally access that first, and then only read the parts of the file that they actually need.
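A rough sketch of that "read from the end" step: a PDF normally ends with a `startxref` keyword followed by the byte offset of the cross-reference table (the index mentioned above). The helper name and the 1024-byte tail size are assumptions for illustration, and the sample bytes are a toy stand-in, not a valid PDF.

```python
import re


def find_startxref(data: bytes, tail_size: int = 1024) -> int:
    """Return the offset written after the last 'startxref' keyword, or -1."""
    tail = data[-tail_size:]  # only look near the end of the file
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    return int(matches[-1]) if matches else -1


# Toy stand-in for the end of a PDF; the offset 19 is made up.
sample = b"%PDF-1.4\n...body...\nxref\n0 1\ntrailer\n<<>>\nstartxref\n19\n%%EOF\n"
print(find_startxref(sample))
```

A real reader would then seek to that offset and parse the cross-reference table to locate individual objects.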
Interesting Tom Scott Video on why the dark background look jaggedy / pixelated ua-cam.com/video/h9j89L8eQQk/v-deo.html
Dark Mode Intro
What I learnt today:
- PDF files are dogs
- ZIP files are cats
2:41 this is actually wrong.
wiki.archlinux.org/index.php/XDG_MIME_Applications
basically, you do `xdg-open $FILE` and it uses the program it thinks is best
This reminds me of that time I concatenated a shell script unzipping itself with a zip file to have basic self-extracting archives.
Source code is interpreted, but needs to be compiled on forehand. So you already need a computer language to create source code.
All of history this has been done with ASM, which was used to build OS'es, languages & compilers. Let's call ASM a computer source language then? A definition that catches php, ansi c and bash from a single syntax. Maybe the most traditional and classical computerlanguage ASM even equals the definition of compiled software providing a programming language? Or binairy package processing on your system or something?
I never get it. Why don't they define IT based on integers vs. strings & glyphs? That's how you make information after all. Can't wrap my head around that.. Do people still believe computers can operate directly on the alphabet? Potential brew of blasphemy, heresy, and insanity all in one machine, if you ask me.
I'd like to point out some flaws in the video
1. The Person A and Person B analogy, rather than just "liking" it should've been "only knowing" or "ignoring except". PDF program would read the PDF code, and not the ZIP code, and the other way around for Zip programs
2. Rather than changing the file name extensions, you could probably just run the file with the program right away. Though I'm not sure since I haven't tried it, but I'm sure it would run as is.
Anyway, great video as usual, thanks for sharing this information with us!
This is great, the education was great, and the whole building up to the message about the CTFs was hilarious, but I 100% agree.
I’d say programming/code implies executable instructions (+ interspersed data) in a turing-complete programming language (or assembly / machine code); while a data-only format is “encoded”/“an encoding”, not programming.
there is a similar trick to hide files inside an image using cmd, I found it on youtube probably around 2009-2010 (I know it uses the copy function and worked on xp)
after a quick google search here is the command>> copy /b source-image.jpg + your-archive.zip target-image-file.jpg
if you use the command you end up with a file named target-image-file.jpg that can be opened as a jpg and looks like source-image.jpg
but when opened with winrar, will show the content of your-archive.zip (just tested with 7zip and still works), also I tested with png and still works
Edit: After a bit of research, it looks like the + is concatenating the files and the /b is only to indicate the files are binary
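The same concatenation can be reproduced in a few lines of Python to see why it works. This is a sketch under assumptions: the "image" is a fake stand-in (a JPEG start marker plus filler, not a viewable picture), the helper names are hypothetical, and Python's zipfile module stands in for WinRAR or 7-Zip.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build a small ZIP archive in memory from a name -> text mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def concatenate(cover: bytes, archive: bytes) -> bytes:
    """Same effect as `copy /b cover + archive target`: plain byte concatenation."""
    return cover + archive


# Stand-in for real image bytes: only the JPEG start marker plus filler.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 64
combined = concatenate(fake_jpeg, make_zip({"secret.txt": "hello"}))

# The ZIP reader still finds the archive behind the leading image bytes.
with zipfile.ZipFile(io.BytesIO(combined)) as archive:
    names = archive.namelist()
    secret = archive.read("secret.txt")

print(names, secret)
```

An image viewer reads from the front and stops where the image data ends, while the ZIP reader works from the back, so each program sees only its own part.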
This gives me new appreciation for the mantra: parse don't validate. If you just look for what you are expecting, you might admit more than you were bargaining for.
I'm here a bit late, but I'm going to explain why I think it's not showing the pdf as a file:
Basically, what I *believe* it's doing is the zip file doesn't think the file has "started" until _after_ the pdf.
What I mean by that is: zip files are able to actually have a blob of arbitrary data in front of them which they will ignore. They only read everything after a file signature, which is 0x04034b50.
I found this a while ago in a stack overflow post which basically took an executable script and a jar file and combined them. All the script did is it ran itself using the `java -jar` command. You can find it here if you want: stackoverflow.com/a/41829433
But I believe this is probably doing something similar:
the zip file doesn't "start" until the end of the pdf, so it ignores that huge blob of data it doesn't understand. And the pdf file "ends" before the zip file, so it just ignores the blob of zip file data that _it_ doesn't understand. (Which I assume would be indicated by a couple of bytes that are used to "end" the pdf.)
This way, you can contain multiple files within the same file and have them be read differently. Because each program starts reading after a certain set of bytes and stops after another set of bytes.
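A small sketch of the signatures involved. The value 0x04034b50 quoted above is the local file header signature, stored little-endian, which is why ZIP files begin with the bytes "PK". ZIP readers generally locate the end-of-central-directory record (0x06054b50) near the end of the file and follow its offsets back, rather than scanning forward; that is what makes a leading blob harmless. The prefix bytes and file name below are made up for the demonstration.

```python
import io
import struct
import zipfile

# Signatures as they appear on disk (little-endian 32-bit values).
LOCAL_HEADER = struct.pack("<I", 0x04034B50)        # b"PK\x03\x04"
END_OF_CENTRAL_DIR = struct.pack("<I", 0x06054B50)  # b"PK\x05\x06"

# Build a tiny archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("note.txt", "hi")
zip_bytes = buffer.getvalue()

# Made-up leading data that the ZIP reader does not understand.
prefix = b"arbitrary leading blob " * 4
blob = prefix + zip_bytes

first_local_header = blob.find(LOCAL_HEADER)
end_record = blob.rfind(END_OF_CENTRAL_DIR)
print(first_local_header, end_record)
```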
Ideal video to start reading my digital forensic course. As if you know I am procrastinating
Hello, do you mind if i ask you something? this is actually off-topic. I watched your previous videos about fly hack in Pwn Adventure 3, and i think it might be related to my intention to do so on another game, especially the "third-person camera" based ones. I'm also a stranger to hacking, would you mind giving me tips to somewhere i could start? Thank you :)
"I could use this to make a red herring puzzle."
>DONT let players guess what you have hidden
"Damn, he knows what I'm going to do."
Ehm.. Why does the thumbnail show the "Bremer Stadtmusikanten"? This is the symbol of my home city in Germany lol, didn't expect to ever see that on UA-cam!
PoC||GTFO PDFs are all zip polyglot, some of the PDFs are also SNES roms and boot sectors... They also have an exploration of how it is done in the journal volumes.
Highly recommended reading that journal.
This is kind of an old trick for piracy. I remember back in '99 or the 2000s downloading movies or games from eMule, which would come as a Zip file that would say corrupted when opened, but if it was a movie and you dragged it onto a video player it would play, because in reality it was an AVI with some kind of codec and the extension was just changed to .ZIP or something like that. Back then in Windows 98 I think there was even no way to hide the .xxx thing in the files like you do now :s
Ohhhhh man ...i love your talks🔥🔥🔥🔥 the last part was funny " please dont give such kinda Ctfs"
I disagree regarding guessing challenges in CTF. Guessing has value in application of skillsets and building muscle memory.
A good CTF challenge should include a balanced mixture of skill development portions, like you are advocating for and some elements of guessing.
These kinds of tricks seem very popular on Discord. People like making videos that are different every time you play them and weird things like that...
While challenges that teach how to apply steganographic techniques are great (like your example), I wonder what challenge you would create to actually teach someone how to identify stego. I agree with you that guessing at the technique seems silly, but maybe a challenge that gives you a normal looking website, with steganography applied somewhere, and you could be tasked with finding the file that contains another file. This way, you wouldn't have to guess at the method, but you could improve your analytic skills by, say, looking at the file sizes of images on the websites to see if one looks abnormally large, or if one has weird file headers like having both PK and JFIF. Maybe have a stretch goal giving a few extra points for getting the file out, but the main goal would be finding the "malicious" file in the first place. Or say "somewhere on this website there is a zip file containing a pdf hidden in one of the images". The solution to the challenge could be to scrape all the images from the site and sort them by file size to get the most likely candidates, or scraping them then writing a python script to search for the magic bytes of a zip file or something. That way you'd know ahead of time what technique you're going to be applying, and you'd learn how to identify when it is being used and gain some familiarity with looking at files hidden with this technique. Thoughts? (I've only done CTF challenges, never written any, so if this idea sucks feel free to say so!)
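The "python script to search for the magic bytes" idea could look roughly like the sketch below. It assumes the images have already been downloaded into a local folder; the function names are hypothetical, and the check is only a heuristic, since compressed image data can contain the bytes "PK\x03\x04" by coincidence.

```python
from pathlib import Path

IMAGE_MAGICS = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")  # JPEG, PNG
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_image_with_zip(data: bytes) -> bool:
    """True if the data starts like an image but also contains a ZIP signature."""
    return data.startswith(IMAGE_MAGICS) and ZIP_LOCAL_HEADER in data


def suspicious_images(folder: str) -> list:
    """Return (size, name) pairs for candidate files, largest first."""
    hits = []
    for path in Path(folder).iterdir():
        if path.is_file() and looks_like_image_with_zip(path.read_bytes()):
            hits.append((path.stat().st_size, path.name))
    return sorted(hits, reverse=True)
```

Sorting by size follows the suggestion above that an abnormally large image is the most likely candidate; each hit would still need to be confirmed by actually trying to open it as an archive.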
Oh, expected there to be an error since PDF is not using deflate like docx and odf files.
Clever way to manipulate file to run it in multiple programs successfully :)
in brazil uploading zips was forbidden in many free hosts (20 years ago)
they hid the download file inside .jpg files
if you open the file using any image viewer just show random image
if you rename .zip comes the download
its possible to do that just using a simple .BAT script
Not only is the zipfile code, but you can actually write that code legibly (a real source code) with a good macro assembler. I have done something similar for emitting object code without explicit support for the object code format in the assembler.
I am a bit late here, if programs look for the magic bytes instead of considering starting bytes as magic bytes, can't we just cat a.png >>file; cat b.pdf >> file and create the polyglot?
Like a decade ago, I stumbled on an image on 4chan with instructions on it which told you how to extract files from the image, to use in creating your own fake coupon barcodes.
I got some $0.50 monster energy drinks, and a few days later, you couldn't use coupons at self checkout anymore. I got scared, noped out, and deleted all the files, including the image.
we can go even deeper, it is possible to make file without filename and extension but with content