An example why I love the internet. There are people still exploring the 1997 game Frogger! In this video we will look at an old compression algorithm to learn how a compression works in general.
Kneesnap is the guy who abused a privilege escalation bug on public Minecraft servers in 2014/2015, including on my Minecraft server. This was done without permission and was done to destroy my server.
I can't help but think this compression format is optimized for tape storage. Write forward, then read in reverse so you don't have to rewind. Inline "dictionaries" to prevent seeking, assuming in particular that you're dealing with "big" files that don't fit in memory, and possibly streaming your output to another tape.
It was first designed in 1982 so I suppose it is likely that it could have been deisnged for cassette. Do many tape decks support playing backwards? If you turn over a cassette and begin playing it is turning backwards on the first side but you are now reading on the other side. Or do you mean some other kind of tape storage like those tape formats purposely designed for data storage?
It's actually how the fastest decompression algorithm work, I was talking about ANS in another thread, and the reason why it's decompressed backwards is because of the algorithm. The best compression ratio/decompression speed algorithms available today work like that, they are based on Asymetric Numeral Systems (ANS) and on Finite State Entropy (FSE). They're the most efficient for sth like tapes up to today where these algorithms are used in the latest games, see Oodle library, ZSTD and the like. They're all based on LZ principles for codec.
@@bangerbangerbro I was thinking of the archival tapes they used to use for mass storage. They could freely seek back and forth, but performance was limited by the physical locations of data on the tape, how much the tape needed to be advanced or rewound from its current position to reach the desired data. @MrOokaze This still doesn't alone explain why they would put data size information at the *end* of the file, where the program has to scan past all those 0's to find it. It's not really a safe assumption (especially in the 1980's!) that a big compressed file will fit entirely into memory, and so "they have to load it into memory either way" doesn't work if half the file is evicted by the time it reaches the end.
@@Keldor314 The Amiga didn't need to scan past the file size. If you couldn't safely allocate memory in AmigaDOS EXEC, you didn't get to load the file in any case, so the chances of not being able to load a compressed PP20 file that wouldn't physically fit into memory was nil. Also, there is no "scanning" needed. When you physically load the file in AmigaDOS or any AmigaDOS compliant util like Power Packer was, you would know ahead of time by interrogating the file info block as to how big that file is. As you know how big the file is, you would simply add the filesize to the location you loaded the file into memory, then subtract 5 bytes to get to the depack file size (1 byte for the $1f control byte, and 4 bytes to get to the start of the longword with the depack size). Now that you have the depack size, you would then again allocate memory in AmigaDOS using EXEC, and it would tell you if you had enough available memory to depack or not. There was absolutely NO loading of files in AmigaDOS and "hoped" they fit, Amigas had as robust a file system with file allocation to ensure you could load (or not) with absolute confidence you were not loading "blind".
Live, I faced a similar issue with compression. Bit of a funny story actually, I tried to reverse engineer a GBA game, and it had compressed elements in it as well. There were no file headers, since GBA games are built differently than PC games. I can PM you the whole story it's very interesting, but I tracked down the lead developer and he told me so much about the compression algorithm they used, as well as why they used it and why they implemented it the way they did. Fascinating stuff! They came up with their own pointer system to locate strings in compressed chunks!
@@LiveOverflow I know right? I'll email you the story. Geek out a bit. I'm not much of a writer, and also wouldn't know where to post or how interesting that would be to most people.
soo... if its a GBA game, chances are that it uses some form of the algorithm explained in the video, namely LZ77, another one of those Lempel-Ziv algorithms... The reason being that the GBA hardware came with a summary of built-in BIOS functions you can call via Software Interrupts, and 2 of them are used to decompress file formats in eighter Huffman Encoding, or LZ77, whereas the latter is widely used on different games (for example Pokémon, if thats a thing that interests you :P)
This video reminds me of my first or second semester of studying mathematics where I learned about the Huffman algorithm. I was so impressed by it (and the simplicity of the theory behind it) that I implemented it myself and experimented with tokenization for better compression and block sizes for faster compression. I also tried to optimize the binary output format to minimize seeking :)
There was this game called Knight Online (well it's there on Steam now) that I played about 10 years ago before it became shit and the data files were encrypted. These data files were tables for timers, duration, damage for spells, attacks and items which could be modified in memory after they were decrypted but after regular patching of the game causing pointer changes and what not I spent a day in a debugger to find out how the files were decrypted. The end result was a function that had about 25 lines of code but after some refactoring and heavy use of logic I got it down to 3 lines. A week after I released it to the public the game patched again but this time they used some heavy, possibly 256 bit, algo that no one would be able to decrypt. The cat was out of the bag tho and since I had the decrypted tables already I threw in a hook after the decryption routine to just copy the already decrypted files into memory overwriting what the decryption routine had just put there (altered to cheaty tables). They didn't know what to do after that...
I love this kind of bit/byte level explanation of comp sci/comp-e! It's very useful, and not enough people seem to care about this low level stuff. It's nice to find this kind of thing available.
PowerPacker was a compression format heavily used for Amiga software. Written by Nico Francois. Nico wrote a lot of Amiga FIDO BBS software (Trapdoor, I believe was written by him, although my memory is a little hazy about that now)
The Amiga and its demo scene, all my childhood, there were some pretty amazing experts in my time and even before, the quality has only regressed through the years, as people were not constrained by the hardware as we were, so don't have to think as much. I'm disappointed because back in the day, I thought I'd see more and more proficient people in CS field, but it actually has gone in reverse.
With everything backwards, instinctively it feels like all the compressed data is being done on the stack as it's growing downwards. That's just how it feels to me.
Love,love, love your outro explaining WHY this was useful for real world skills even though you're using an old video game as data - the concepts and knowledge are universally beneficial for computer scientists. Thank you kneeslap & thank you Liveoverflow - stuff like this reminds me why I LOVE this subject!!
Oh my God!!! I've been looking for this game for years as this is my childhood. I was searching for "animals jumping across road" cuz that's all I remembered. And I finally know the game name! Thank you!
Just today I was investigating the Oodle Kraken real time compression algorithm and I was studying the ANS used by it (with FSE). Very interesting and more advanced than what I coded 20+ years ago, when I applied a subset of ANS encoding known as Huffman encoding, very efficient for sentences. It was good times when I learned these things back in the day, I used this compression in one of my own software, for which I still have the code developed originally on AmigaOS, now ported to GNU Linux GTK.
Literally yesterday I got a programming project at the university - compression with the LZ78 algorithm. And today LiveOverflow uploads a video with almost the same algorithm. What a coincidence, thanks!
I grew up playing this game. I didn't realize the fan base for it was big enough that a reverse engineering of the game happened. I would love to see that with Hot Wheels Stunt Track Driver.
@@Galahadfairlight Yep. was thinking of noisetracker though. I always thought MK was for mahoney and kaktus as well but googling now says its Unknown of D.O.C.'s initials (shrug)
@@upthebuffer1921 Yeah, it would have originated from Noisetracker, but then thats where Protracker originated from. Noisetracker was a bug fixed and redone Soundtracker, and Protracker was a continuation from Noisetracker but by different people. Protracker just seems to be a generic term now for anything on Amiga that used the mod format now.
I’ve always been interested by digging around the files of games, which consequently often involves reverse engineering proprietary formats like these in order to extract something of value out.
Compression algs are usually pretty old. They haven't really evolved much. The newest significant discovery is ANS (Asymmetric Numeral Systems) in 2014
I got this video recommendation and thought "Hey, I used to reverse engineer, weird compression codes!". When watching, I kept getting weird deja-vu. I went to my repo where I had the painstakingly reverse engineered PP20 decoder from ~12years ago and the code was almost identical. So went to the Frogger repo and saw my name in the decoder file as the original author :D. Nice nostalgia attack :D. I already forgot that I was already asked to allow using the code. A note on the origin of this code. I had some Amiga mod music files (mostly game music) and working on a mod player. Some were compressed. Many using PowerPacker. The only thing I could get my hands on was the .library binary file with 68000 machine code, which I had to disassemble and convert to Java a couple lines of Java and catch all bugs testing on real files. Quite fun. Thanks for the video and a reminder of the past hobbies.
Hehe, wow. Nice work. This version of Frogger still feels like "new Frogger" to me with the early 1980s version being the original (particularly the one published for the PC by Sierra in 1983).
No, this could be a youtube automatisation translation bug, on your national/regional prefered language youtube would automatically write and translate some title on preview, but when you click the video it shows real title. I. My case, I sometimes get titles in russian on the previews, but when I click they become english. There are different variants of english in yt database, you could be getting indian english or even UK english. Yes they are technically the same language but its not what youtube thinks
This is common in UA-cam if the creator changes the title after you receive the notification - the notification simply has the title as it was when you received it
To try and answer "what is up with those 7s?", my guess is that this is a form of customizable compression. There seem to be multiple (3?4?) compression levels that each run of bits is annotated as. Each compression level could have a different max size, which would be nice if you knew your data had specific characteristics about the necessary sizes. By setting the max size, you read (and contain) only the necessary number of bits as needed. This could save on compression. 7s might just be the default bit length (thus 128 is default max size) for all compression levels. I'm sure you could just hack the first 4 bytes and bitstuff appropriately to test this idea. This all is sort of guesswork on my part. I'm not familiar with this compression algo (aside from this video). This is just "engineer's intuition".
I remember 'compression level' coming up quite a bit. I wonder if one of those 'compression levels' is simply a very heavily used minimum lengths that could be referenced in one of those references for long lengths rather than the variable compress length it did at the end. Delving into the original packer's sources or decompiled main function could be interesting.
1991 I went to an amiga computer camp where I was learning amiga basic. Each of us were developing a game with a frog who has to cross a street with cartraffic. Perspective was from up. So frogger reminds me on this long time ago. Now I know the source of the challenge. Wikipedia: Frogger, also called Frogger: He's Back, is an action video game remake of Konami's 1981
The DOS game Zone 66 uses LZW compression, and I've been trying to figure out how to unpack it for years. I'm going to have to rewatch this video a few more times
LZW is quite different from LZSS. While there are somewhat similar principles involved, learning about LZSS won't help you too much with LZW. If you do want to learn more about LZW decoding, maybe look into some GIF image decoders... that uses LZW and tutorials on that would probably be more applicable to the Zone 66 data.
Speaking of game asset file formats, the TPAC format used in Mount & Blade II: Bannerlord is still uncracked. The game is brand new and no one has figured out how the file format works. If you are interested in this type of thing, check it out. Once it's cracked there will be so many possibilities for mods.
The compression as implemented is not using the PP20 format as well as it could. It was likely programmed to conform to the reverse-engineered decoder in a too simple way. There are a couple of ways to encode the input to conform to the decoder and achieve much better compression. The original PowerPacker had settings for offset/length limits to tune the speed/effectiveness of its compression. (I do not think original PP would compress AAAAA... as repeating of A. Instead it would take more of the AAAA and produce a shorter sequence. One thing that people seem to be confused about is why it is doing the decompression in a reverse order and had some of other quirks like the "header" and offset at the end. It is designed to be **in-place** decoder, you allocate memory for the uncompressed file, load the file into the buffer and unpack in the same buffer and have assurance that you do not overwrite the source before it is processed. That is also why you can have non-compressible file encoded using the original data, so unpacking in-place is a no-op. Also if the uncompress loop output catches up to the source data, it knows that it is done, saving some cycles for hard to compress files.
funny, i also had a similar compression algorithm to look at a while back, when i wanted to look at what data was transmitted by a particular game to the game server. if anyone is interested, here is the story: at first i noticed that the json it sent was starting out as readable text, but was more and more garbled up the longer the string got. so i reverse engineered the relevant part of the game. turned out the game dev implemented their own adapted version of LZSS, so i wrote a decoder for it. to be more precise, i copied the reverse-engineered logic from the game into my code without understanding it and wrote tests for it to make sure i copied it correctly and it still behaved like the original. then i refactored my code step by step, verifying via the tests that my code still behaves exactly like the original decoder, and after a while i had some very human-readable and understandable code that was doing the exact same thing as the game and i could analyze it properly and understand what it does. in hindsight, knowing LZSS before trying to do this would have been really helpful, but i guess thats almost always the case, right? ^^ but i wasnt done yet, because for the game to still work while im reading what it sends, i had to also encode it again and send it to the server, as well as doing everythin for the server response as well. i chose the lazy approach and only encoded literal chunks. this made the "compressed" payload larger than the uncompressed json strings, since there were now all of these additional "the next block is a literal block" headers in there. but it worked and neither the game nor the server noticed that i was decompressing, reading and "compressing" their communication data in real time. it was really funny to see that i could get away with that xD this all took way too much time, but it was a really fun project to do. i often compare these kinds of project to dark souls bosses: it is a lot of work up front, but when you finally get it done and it works, you feel like a total badass, even when you cheese the second stage :D
I was just about to post that pp20 is an Amiga power packer header when you mentioned it. There's a plugin for total Commander to unpack Amiga powerpacker files.
haha I really enjoyed this as an old Amiga scene person because as soon as I saw PP20 i thought of PowerPacker.. i didn't skip ahead and got a nice surprise when it actually turned out to be that :D Memories.. thank you, i still remember... I think there's possibly a PP13B as well but its been a long time and I could be wrong.
Sounds fairly similar to Shining Force 2's character pattern compression. A 0 bit means copy the next word into the buffer. A 1 bit is followed by 11 bits of backward offset and 5 bits of repeat count. It starts that far back and copies forward from that location. The longer patterns such as the fonts are compressed in a much more complex manner involving building words a nibble at a time using a rotating stack palette and varying numbers of bits to indicate the rotation index. Map data is compressed with yet another algorithm.
I wish people would nerd out about Harry Potter & The Chamber of Secrets on Game Boy Color... It's a hidden gem RPG and I'd love to learn it's scripting language so I can port the other movies.
I really hope some good mods are made for the game. I have it on my Windows 98 build and I would love to play them on sorta accurate period correct hardware. (Most of it is tail end win98 hardware for maximum performance to play with KernelEx)
Cool. Right now trying to reverse out some compression stuffs of a PSP game which is LZSS compressed, because I need the text (to practice Japanese while translating without OCR) and music, and LiveOverflow posted this
pp20 is power packer the game may have been on amiga too because i have seen the power packer used for making the music files mod be compressed and yet playable or even used as an archiver.
He might talk about well documented algorithms, ACE entry points, or even pokered. I hope he asks Crystal_ for some information too, that guy is the god of gen1 and gen2 Pokemon.
Very informative, thanks! I'm trying to decipher the object files from Ridge Racer on PS1. Unfortunately, there are no recognisable file headers at all, most of the chunks of data have the first few bytes left blank, so hard to know where to start.
I've been wanting to get into understanding save game checksums, but I have no clue where to start as far as reversing. I am very adapt to reversing in-game memory functions but algorithms still confuse me
This Pokemon world seemed gigantic to me as a child and they still feels bigger than Skyrim , probably because I played Skyrim for much longer and nocliped over the map a few times ;)
i started to reverse some '00 good piece of electronic and this really inspiring. i thought about doing some videos like yours also but i have so much to do yet especially SW wise (vxworks stuff)
It'd be cool if one could reverse engineer the save file from Paradox Games, it could be used for cheating through, because ironman mode is the one that compresses save game data.
It's not that surprising to find old file formats still in use. tar was created in 1979 for writing to tape drives and we still use that frequently today.
The Amiga was around for quite a ways into the Windows 95/PS1 era and it was a real devs workhorse: Psygnosis for example was making the leap to PS1 and you can see the fingerprints of Amiga dev tools like Dpaint all over some of their games, in Alundra for example.
I'd be nervous making videos about modding/reverse-engineering Nintendo games, aren't they known for throwing copyright strikes left and right in these situations?
MWD means just a another version of *WAD* files. *WAD* format got know in *DOOM* Engine Files, but it was first used in *WOLFENSTEIN 3D* both by *ID Software* Not related to WAD, but related with reversing *You must take a look at **www.OpenTTD.org* history. A whole hacking community joined forces to reverse a entire game and release a opensource version of *Transport Tycoon Deluxe*
In resume, first was TTDPatch that patched the original EXE for improvements and remove some limitations. In 2003, *Ludvig Strigeus* (uTorrent and Spotify Developer) reversed it with community help and wrote the first version in C. After that, is history...
Okay, so you find an ID which is PP20, now since I'm older than you and been coding longer, I remember a list of file ids. I haven't watched enough into the video, but my guess is that it's a PowerPacker compressed file. Which was actually an Amiga compressor that wasn't as good as Crunchmania. Let's see how good my guess is.
While i do like the technical aspect of thge compression/decompression algorithm i was a bit disappointed by the fact that the "Frogger" game discussed is not the original "Frogger" game (see en.wikipedia.org/wiki/Frogger) but a modern game. I was curious to see what kind of compression they would have had in the game from the 80s.
I doubt they had much in the way of compression, at leastnot on home consoles like the atari. What they did have on the atari though was screen modes, where the the background was either repeated, or mirrored, which I guess you might be able to call that a space saving technique. They didn't have alot of of processing power to spare for compression, most of the game logic was handled between line updates, in the over scan. For the atari specifically you probably could have made a chip that did the decoding for you, they certainly did have a few cartridges with extra hardware, but they costed more to produce but it's totally possible given the stripped down bear silicon nature of it. But you're talking about the arcade game, they probably didn't need it there either, most arcade games were purpose built around the game they contained, and given their sheer cost I'm sure they could splurge a bit on tech.
No way. I got this video recommended cause I am currently getting into RE and try to reverse the save game files of Assassins Creed 4. Then I searched for some tools that can detect the compression (file was clearly compressed) so I just gave 7zip a shot and opened it to show me some info and it actually detected the file as compressed with LZMA. WTF?
An example why I love the internet. There are people still exploring the 1997 game Frogger! In this video we will look at an old compression algorithm to learn how a compression works in general.
Ok but why would yo
im working aswell on an old game but on the server side. If you have any questions feel free to message me.
Kneesnap is the guy who abused a privilege escalation bug on public Minecraft servers in 2014/2015, including on my Minecraft server. This was done without permission and was done to destroy my server.
I can't help but think this compression format is optimized for tape storage. Write forward, then read in reverse so you don't have to rewind. Inline "dictionaries" to prevent seeking, assuming in particular that you're dealing with "big" files that don't fit in memory, and possibly streaming your output to another tape.
It was first designed in 1982 so I suppose it is likely that it could have been deisnged for cassette. Do many tape decks support playing backwards? If you turn over a cassette and begin playing it is turning backwards on the first side but you are now reading on the other side. Or do you mean some other kind of tape storage like those tape formats purposely designed for data storage?
It's actually how the fastest decompression algorithm work, I was talking about ANS in another thread, and the reason why it's decompressed backwards is because of the algorithm. The best compression ratio/decompression speed algorithms available today work like that, they are based on Asymetric Numeral Systems (ANS) and on Finite State Entropy (FSE). They're the most efficient for sth like tapes up to today where these algorithms are used in the latest games, see Oodle library, ZSTD and the like. They're all based on LZ principles for codec.
Interesting, I've never considered that, but it would make a LOT of sense. Maybe someone from the era is familiar?
@@bangerbangerbro I was thinking of the archival tapes they used to use for mass storage. They could freely seek back and forth, but performance was limited by the physical locations of data on the tape, how much the tape needed to be advanced or rewound from its current position to reach the desired data.
@MrOokaze This still doesn't alone explain why they would put data size information at the *end* of the file, where the program has to scan past all those 0's to find it. It's not really a safe assumption (especially in the 1980's!) that a big compressed file will fit entirely into memory, and so "they have to load it into memory either way" doesn't work if half the file is evicted by the time it reaches the end.
@@Keldor314 The Amiga didn't need to scan past the file size. If you couldn't safely allocate memory in AmigaDOS EXEC, you didn't get to load the file in any case, so the chances of not being able to load a compressed PP20 file that wouldn't physically fit into memory was nil.
Also, there is no "scanning" needed. When you physically load the file in AmigaDOS or any AmigaDOS compliant util like Power Packer was, you would know ahead of time by interrogating the file info block as to how big that file is.
As you know how big the file is, you would simply add the filesize to the location you loaded the file into memory, then subtract 5 bytes to get to the depack file size (1 byte for the $1f control byte, and 4 bytes to get to the start of the longword with the depack size).
Now that you have the depack size, you would then again allocate memory in AmigaDOS using EXEC, and it would tell you if you had enough available memory to depack or not.
There was absolutely NO loading of files in AmigaDOS and "hoped" they fit, Amigas had as robust a file system with file allocation to ensure you could load (or not) with absolute confidence you were not loading "blind".
Live, I faced a similar issue with compression. Bit of a funny story actually, I tried to reverse engineer a GBA game, and it had compressed elements in it as well. There were no file headers, since GBA games are built differently than PC games. I can PM you the whole story it's very interesting, but I tracked down the lead developer and he told me so much about the compression algorithm they used, as well as why they used it and why they implemented it the way they did. Fascinating stuff! They came up with their own pointer system to locate strings in compressed chunks!
woooow that sounds awesome. you should write that up somewhere!
@@LiveOverflow I know right? I'll email you the story. Geek out a bit. I'm not much of a writer, and also wouldn't know where to post or how interesting that would be to most people.
could you also send it to me
soo... if its a GBA game, chances are that it uses some form of the algorithm explained in the video, namely LZ77, another one of those Lempel-Ziv algorithms... The reason being that the GBA hardware came with a summary of built-in BIOS functions you can call via Software Interrupts, and 2 of them are used to decompress file formats in eighter Huffman Encoding, or LZ77, whereas the latter is widely used on different games (for example Pokémon, if thats a thing that interests you :P)
@@1337SBird you're right! It's based on LZ77! JCALG1, in fact.
This video reminds me of my first or second semester of studying mathematics where I learned about the Huffman algorithm. I was so impressed by it (and the simplicity of the theory behind it) that I implemented it myself and experimented with tokenization for better compression and block sizes for faster compression. I also tried to optimize the binary output format to minimize seeking :)
There was this game called Knight Online (well it's there on Steam now) that I played about 10 years ago before it became shit and the data files were encrypted.
These data files were tables for timers, duration, damage for spells, attacks and items which could be modified in memory after they were decrypted but after regular patching of the game causing pointer changes and what not I spent a day in a debugger to find out how the files were decrypted.
The end result was a function that had about 25 lines of code but after some refactoring and heavy use of logic I got it down to 3 lines.
A week after I released it to the public the game patched again but this time they used some heavy, possibly 256 bit, algo that no one would be able to decrypt.
The cat was out of the bag tho and since I had the decrypted tables already I threw in a hook after the decryption routine to just copy the already decrypted files into memory overwriting what the decryption routine had just put there (altered to cheaty tables).
They didn't know what to do after that...
Did editing these tables actually impact gameplay?? That would be pretty dumb on the developer's part. World of Warcraft does not function that way.
Luigi Auriemma wrote so many awesome tools, especially the fsb extreactor and repacker made modding games so much simpler, this guy is a genius.
Don't forget QuickBMS. That is his crowning achievement right there, and his most well known tool.
I love this kind of bit/byte level explanation of comp sci/comp-e! It's very useful, and not enough people seem to care about this low level stuff. It's nice to find this kind of thing available.
PowerPacker was a compression format heavily used for Amiga software. Written by Nico Francois. Nico wrote a lot of Amiga FIDO BBS software (Trapdoor, I believe was written by him, although my memory is a little hazy about that now)
The Amiga and its demo scene, all my childhood, there were some pretty amazing experts in my time and even before, the quality has only regressed through the years, as people were not constrained by the hardware as we were, so don't have to think as much. I'm disappointed because back in the day, I thought I'd see more and more proficient people in CS field, but it actually has gone in reverse.
With everything backwards, instinctively it feels like all the compressed data is being done on the stack as it's growing downwards. That's just how it feels to me.
Hello! I'm a compression algorithm for youtube comments and I want to say that most of the comments are "Why would yo".
Underrated comment right here lol.
Wow, i played this as a kid, crazy to see there's still an active community for it!
This is great! Love how much carefully you walked through the algorithm with the data.
Superb!
Love,love, love your outro explaining WHY this was useful for real world skills even though you're using an old video game as data - the concepts and knowledge are universally beneficial for computer scientists.
Thank you kneeslap & thank you Liveoverflow - stuff like this reminds me why I LOVE this subject!!
Oh my God!!! I've been looking for this game for years as this is my childhood. I was searching for "animals jumping across road" cuz that's all I remembered. And I finally know the game name! Thank you!
Just today I was investigating the Oodle Kraken real time compression algorithm and I was studying the ANS used by it (with FSE). Very interesting and more advanced than what I coded 20+ years ago, when I applied a subset of ANS encoding known as Huffman encoding, very efficient for sentences. It was good times when I learned these things back in the day, I used this compression in one of my own software, for which I still have the code developed originally on AmigaOS, now ported to GNU Linux GTK.
Hey, I worked with this guy for a while! I was working on FrogLord for a bit before I figured out that I couldn't contribute much.
Literally yesterday I got a programming project at the university - compression with the LZ78 algorithm. And today LiveOverflow uploads a video with almost the same algorithm. What a coincidence, thanks!
Literally yesterday I was trying to scrap out scipts and game music from a PSP game and faced LZ compressions
No way! I actually met Kneesnap when he was an admin in a Minecraft server. I definitely wasn't expecting to see him here!
hello again! Which one? I still do Minecraft servers for fun :P
@@proccessingunit2337 I see! well, hi!
I'd love to see more reverse engineering of file formats. Really interesting! =)
I grew up playing this game. I didn't realize the fan base for it was big enough that a reverse engineering of the game happened. I would love to see that with Hot Wheels Stunt Track Driver.
Frogger was awesome, easily one of the best modernizations of an old arcade favourite to come out of the 90s
LZSS algorithms also used on Final Fantasy 7(1997), Super Monkey Ball 2(2002), F-ZERO GX(2003), and Legend of Zelda Ocarina of Time 3D(2011).
My brain got fried halfway through following this
can relate
Can relate, too much weed
Aww ive not seen that Powerpacker v2 header since 1991. There must be something wrong with me that I get nostalgic about 2 bytes.
4 bytes dude, "PP20" is a longword, i.e. 4 bytes ;)
@@Galahadfairlight Sir Galahad you are of course right. I am indeed nostalgic for a *longword*. "PP20" is right up there with "M.K." :))
@@upthebuffer1921 Ah, A Protracker fan I see? M.K stands for Mahoney and Kaktus ;)
@@Galahadfairlight Yep. was thinking of noisetracker though. I always thought MK was for mahoney and kaktus as well but googling now says its Unknown of D.O.C.'s initials (shrug)
@@upthebuffer1921 Yeah, it would have originated from Noisetracker, but then thats where Protracker originated from. Noisetracker was a bug fixed and redone Soundtracker, and Protracker was a continuation from Noisetracker but by different people.
Protracker just seems to be a generic term now for anything on Amiga that used the mod format now.
I’ve always been interested by digging around the files of games, which consequently often involves reverse engineering proprietary formats like these in order to extract something of value out.
Thank you! @15:48 love the "autodidact" mention!
Compression algs are usually pretty old. They haven't really evolved much. The newest significant discovery is ANS (Asymmetric Numeral Systems) in 2014
I got this video recommendation and thought "Hey, I used to reverse engineer, weird compression codes!".
When watching, I kept getting weird deja-vu. I went to my repo where I had the painstakingly reverse engineered PP20 decoder from ~12years ago and the code was almost identical.
So went to the Frogger repo and saw my name in the decoder file as the original author :D. Nice nostalgia attack :D. I already forgot that I was already asked to allow using the code.
A note on the origin of this code. I had some Amiga mod music files (mostly game music) and working on a mod player. Some were compressed. Many using PowerPacker.
The only thing I could get my hands on was the .library binary file with 68000 machine code, which I had to disassemble and convert to Java a couple lines of Java and catch all bugs testing on real files. Quite fun.
Thanks for the video and a reminder of the past hobbies.
Love the way you work through problems and love even more how you explain really complex ideas, always fascinating, always interesting!
Hehe, wow. Nice work. This version of Frogger still feels like "new Frogger" to me with the early 1980s version being the original (particularly the one published for the PC by Sierra in 1983).
@3:15: Binary date is not "jibberish"... and it does NOT indicate that the data is encrypted or compressed - it might be... but not necessarily .
Holy crap. Never in my life had I expected any sort of LiveOverflow/Druaga1 crossover.
A Druaga1 LiveOverflow crossover I AM GOING TO FREAK OUT
Confirmed, LO is a smoker
Your channel is phenomenal. Thank you for the great work, dude!
Wait what? My UA-cam notification showed 'Reverse engineering old ...' but when I clicked it the title shows 'Why would yo'. Did you change the title?
Yaa bro
Clickbait mr codedbrain hahha
Reverse Engineered old Compression Algorithm for Frogger
No, this could be a youtube automatisation translation bug, on your national/regional prefered language youtube would automatically write and translate some title on preview, but when you click the video it shows real title. I. My case, I sometimes get titles in russian on the previews, but when I click they become english. There are different variants of english in yt database, you could be getting indian english or even UK english. Yes they are technically the same language but its not what youtube thinks
This is common in UA-cam if the creator changes the title after you receive the notification - the notification simply has the title as it was when you received it
To try and answer "what is up with those 7s?", my guess is that this is a form of customizable compression. There seem to be multiple (3?4?) compression levels that each run of bits is annotated as. Each compression level could have a different max size, which would be nice if you knew your data had specific characteristics about the necessary sizes. By setting the max size, you read (and contain) only the necessary number of bits as needed. This could save on compression. 7s might just be the default bit length (thus 128 is default max size) for all compression levels. I'm sure you could just hack the first 4 bytes and bitstuff appropriately to test this idea.
This all is sort of guesswork on my part. I'm not familiar with this compression algo (aside from this video). This is just "engineer's intuition".
I remember 'compression level' coming up quite a bit. I wonder if one of those 'compression levels' is simply a very heavily used minimum lengths that could be referenced in one of those references for long lengths rather than the variable compress length it did at the end.
Delving into the original packer's sources or decompiled main function could be interesting.
I love stuff, really nice to see how things are put together and the strange but curious history of this stuff.
1991 I went to an amiga computer camp where I was learning amiga basic. Each of us were developing a game with a frog who has to cross a street with cartraffic. Perspective was from up. So frogger reminds me on this long time ago. Now I know the source of the challenge. Wikipedia: Frogger, also called Frogger: He's Back, is an action video game remake of Konami's 1981
I love this content, it hits home very well.
I'm a software engineer. I'm wasting time with a video on "Frogger". Worth it.
hey. I'm a software engineer. I'm wasting years and years reverse engineering frogger games. Worth it.
I love your enthusiasm. Instant subscribe.. Thank you for this video!
The DOS game Zone 66 uses LZW compression, and I've been trying to figure out how to unpack it for years. I'm going to have to rewatch this video a few more times
LZW is quite different from LZSS. While there are somewhat similar principles involved, learning about LZSS won't help you too much with LZW. If you do want to learn more about LZW decoding, maybe look into some GIF image decoders... that uses LZW and tutorials on that would probably be more applicable to the Zone 66 data.
Speaking of game asset file formats, the TPAC format used in Mount & Blade II: Bannerlord is still uncracked. The game is brand new and no one has figured out how the file format works. If you are interested in this type of thing, check it out. Once it's cracked there will be so many possibilities for mods.
The compression as implemented is not using the PP20 format as well as it could. It was likely programmed to conform to the reverse-engineered decoder in a too simple way. There are a couple of ways to encode the input to conform to the decoder and achieve much better compression. The original PowerPacker had settings for offset/length limits to tune the speed/effectiveness of its compression. (I do not think original PP would compress AAAAA... as repeating of A. Instead it would take more of the AAAA and produce a shorter sequence.
One thing that people seem to be confused about is why it is doing the decompression in a reverse order and had some of other quirks like the "header" and offset at the end. It is designed to be **in-place** decoder, you allocate memory for the uncompressed file, load the file into the buffer and unpack in the same buffer and have assurance that you do not overwrite the source before it is processed. That is also why you can have non-compressible file encoded using the original data, so unpacking in-place is a no-op. Also if the uncompress loop output catches up to the source data, it knows that it is done, saving some cycles for hard to compress files.
funny, i also had a similar compression algorithm to look at a while back, when i wanted to look at what data was transmitted by a particular game to the game server. if anyone is interested, here is the story:
at first i noticed that the json it sent was starting out as readable text, but was more and more garbled up the longer the string got. so i reverse engineered the relevant part of the game. turned out the game dev implemented their own adapted version of LZSS, so i wrote a decoder for it. to be more precise, i copied the reverse-engineered logic from the game into my code without understanding it and wrote tests for it to make sure i copied it correctly and it still behaved like the original. then i refactored my code step by step, verifying via the tests that my code still behaves exactly like the original decoder, and after a while i had some very human-readable and understandable code that was doing the exact same thing as the game and i could analyze it properly and understand what it does. in hindsight, knowing LZSS before trying to do this would have been really helpful, but i guess thats almost always the case, right? ^^
but i wasnt done yet, because for the game to still work while im reading what it sends, i had to also encode it again and send it to the server, as well as doing everythin for the server response as well. i chose the lazy approach and only encoded literal chunks. this made the "compressed" payload larger than the uncompressed json strings, since there were now all of these additional "the next block is a literal block" headers in there. but it worked and neither the game nor the server noticed that i was decompressing, reading and "compressing" their communication data in real time. it was really funny to see that i could get away with that xD
this all took way too much time, but it was a really fun project to do. i often compare these kinds of project to dark souls bosses: it is a lot of work up front, but when you finally get it done and it works, you feel like a total badass, even when you cheese the second stage :D
I was just about to post that pp20 is an Amiga power packer header when you mentioned it.
There's a plugin for total Commander to unpack Amiga powerpacker files.
Ah, the nostalgia... Feels like a!
I've been using QuickBMS for the mods I use in a game. It's an AMAZINGLY powerful tool!! The amount of supported softwares are gigantic.
haha I really enjoyed this as an old Amiga scene person because as soon as I saw PP20 i thought of PowerPacker.. i didn't skip ahead and got a nice surprise when it actually turned out to be that :D Memories.. thank you, i still remember... I think there's possibly a PP13B as well but its been a long time and I could be wrong.
Damn i Haven't thought about frogger in probably 20 years. Blast from the past
Lzss was very popular in the 90s and early 2000s. The ps2 also use an lzss variant in the bootrom.
The audio tracks on the disk have some serious bass!
Sounds fairly similar to Shining Force 2's character pattern compression. A 0 bit means copy the next word into the buffer. A 1 bit is followed by 11 bits of backward offset and 5 bits of repeat count. It starts that far back and copies forward from that location.
The longer patterns such as the fonts are compressed in a much more complex manner involving building words a nibble at a time using a rotating stack palette and varying numbers of bits to indicate the rotation index. Map data is compressed with yet another algorithm.
I wish people would nerd out about Harry Potter & The Chamber of Secrets on Game Boy Color... It's a hidden gem RPG and I'd love to learn it's scripting language so I can port the other movies.
I remember using quickBMS to extract assets from Star Wars: The Old Republic, but I wasn't very good at it and got a few textures at most.
never cease to amaze me
I used to love frogger. Thanks for this video
On the other end of the gaming spectrum fail0verflow did some interesting work on the PS4, I'm looking forward to people running linux on the PS5.
Best ps1 game right here and the soundtrack is great for working out
Would have liked to see compression ratios vs deflate for example.
I really hope some good mods are made for the game. I have it on my Windows 98 build and I would love to play them on sorta accurate period correct hardware. (Most of it is tail end win98 hardware for maximum performance to play with KernelEx)
Cool. Right now trying to reverse out some compression stuffs of a PSP game which is LZSS compressed, because I need the text (to practice Japanese while translating without OCR) and music, and LiveOverflow posted this
I've been watching you for years and didnt even realise i wasn't subscribed i feel like a criminal omg ive subbed im so sorry lol
PP was very popular tool back than. I already knew what "PP20" stands for.
pp20 is power packer the game may have been on amiga too because i have seen the power packer used for making the music files mod be compressed and yet playable or even used as an archiver.
Why would yo
u
@@d9zirable iy
Ironcreater2 hy
just can't wait to see what you are going to do with pkmn red!!
He might talk about well documented algorithms, ACE entry points, or even pokered. I hope he asks Crystal_ for some information too, that guy is the god of gen1 and gen2 Pokemon.
I loved this game as a kid. I didn't know anyone was still messing with it. I'm definitely going to be looking into this community
Very informative, thanks! I'm trying to decipher the object files from Ridge Racer on PS1. Unfortunately, there are no recognisable file headers at all, most of the chunks of data have the first few bytes left blank, so hard to know where to start.
I've been wanting to get into understanding save game checksums, but I have no clue where to start as far as reversing. I am very adapt to reversing in-game memory functions but algorithms still confuse me
This Pokemon world seemed gigantic to me as a child and they still feels bigger than Skyrim , probably because I played Skyrim for much longer and nocliped over the map a few times ;)
i started to reverse some '00 good piece of electronic and this really inspiring.
i thought about doing some videos like yours also but i have so much to do yet especially SW wise (vxworks stuff)
It'd be cool if one could reverse engineer the save file from Paradox Games, it could be used for cheating through, because ironman mode is the one that compresses save game data.
Omg, I know PowerPacker!! I also wrote a decompressor for it once 😆
I loved this game as a child :')
Another reason for gibberish content: Bit sized data. (for example dumoed Bitstreams)
Awesome! Maybe you can explain the Pokemon Missingno in the RED Edition?
Awww Let's get that "Froggler" boy excited guys!
I had this game as a child and completely forgot it existed
This is the weird shit why I subbed to you
Reverse engineering a legacy algorithm using IntelliJ, with a teaser for Pokemon Red? Is... is it my birthday?!
There are people who care about such a recent newschool game? I thought it was going to be about the original Frogger from 1981.
Very awesome I like this 😯
I had this game for the PS1 way back in the day I never beat it though it was tough as nails.
love your vids m'dude
It's not that surprising to find old file formats still in use. tar was created in 1979 for writing to tape drives and we still use that frequently today.
The Amiga was around for quite a ways into the Windows 95/PS1 era and it was a real devs workhorse: Psygnosis for example was making the leap to PS1 and you can see the fingerprints of Amiga dev tools like Dpaint all over some of their games, in Alundra for example.
wow thanks bro
I'd be nervous making videos about modding/reverse-engineering Nintendo games, aren't they known for throwing copyright strikes left and right in these situations?
6:21 so what happens when you compress multiple times the decompress multiple times
Druaga1 is awesome!!!! This was a great video. I learned a lot.
Why _would_ yo?
MWD means just a another version of *WAD* files.
*WAD* format got know in *DOOM* Engine Files, but it was first used in *WOLFENSTEIN 3D* both by *ID Software*
Not related to WAD, but related with reversing *You must take a look at **www.OpenTTD.org* history.
A whole hacking community joined forces to reverse a entire game and release a opensource version of *Transport Tycoon Deluxe*
In resume, first was TTDPatch that patched the original EXE for improvements and remove some limitations.
In 2003, *Ludvig Strigeus* (uTorrent and Spotify Developer) reversed it with community help and wrote the first version in C.
After that, is history...
The WAD files are definitely not the wads found in doom. A file type can be named anything, regardless of what it contains.
Okay, so you find an ID which is PP20, now since I'm older than you and been coding longer, I remember a list of file ids. I haven't watched enough into the video, but my guess is that it's a PowerPacker compressed file. Which was actually an Amiga compressor that wasn't as good as Crunchmania. Let's see how good my guess is.
While i do like the technical aspect of thge compression/decompression algorithm i was a bit disappointed by the fact that the "Frogger" game discussed is not the original "Frogger" game (see en.wikipedia.org/wiki/Frogger) but a modern game. I was curious to see what kind of compression they would have had in the game from the 80s.
I doubt they had much in the way of compression, at leastnot on home consoles like the atari. What they did have on the atari though was screen modes, where the the background was either repeated, or mirrored, which I guess you might be able to call that a space saving technique.
They didn't have alot of of processing power to spare for compression, most of the game logic was handled between line updates, in the over scan. For the atari specifically you probably could have made a chip that did the decoding for you, they certainly did have a few cartridges with extra hardware, but they costed more to produce but it's totally possible given the stripped down bear silicon nature of it.
But you're talking about the arcade game, they probably didn't need it there either, most arcade games were purpose built around the game they contained, and given their sheer cost I'm sure they could splurge a bit on tech.
Is the offset to allow long strings of zeros, by skipping bytes inside an array that is initially filled with zeros?
When I saw the pp20 I already knew. I think the author of power packer was nico Francois
>There are communities for anything you can imagine.
ANYTHING.
Even weeing on each other. (Hit me up).
I wonder at which point is it more efficient to reimplement the game rather than reverse engineer binaries 🤔
Any idea if the videos are available on any platforms like lbry?
Files can have any extension, whatever you want, if your file does not follow a known format why would you use a known extension?
No way. I got this video recommended cause I am currently getting into RE and try to reverse the save game files of Assassins Creed 4. Then I searched for some tools that can detect the compression (file was clearly compressed) so I just gave 7zip a shot and opened it to show me some info and it actually detected the file as compressed with LZMA.
WTF?
God this game always kicked my ass. But it was really fun anyways.