Recently Super Mario 64, Ocarina Of Time and even Jak & Daxter with its Lisp dialect GOAL got reverse engineered and even ported to PC. It's a work of art. For those interested: tools with disassemblers and decompilers like Ghidra or IDA are pretty helpful.
Gen 1-3 of Pokemon have been reverse engineered as well. I actually have a tutorial series on using the Gen III decomps, they are super cool. It's so easy to make romhacks now.
@itzhexen what in the world kind of point are you trying to make here? sounds like you're really cool with IP given your second comment. might want to seriously re-consider that viewpoint. IP is nothing to be proud of or to defend. it is unnatural and it is immoral. in its current form and with the trends of rights-holders continuing their attempts to expand it, it also should be ruled as unconstitutional. the way it is used now is immoral, unethical, and actually in many cases, illegal. fun fact about developers, artists, authors. aka the people actually doing the work: many of them oppose the current system. they have no meaningful rights to their work, they're treated and used like slaves. they are often not paid the royalties they're due because "production committees" are designed to run into the red. how strongly they support IP in general varies, but many of them actually don't like it. small creators don't benefit from it at all because it costs millions to enforce IP. artists and inventors are usually primarily motivated to share their ideas and creations. IP law has been restricting this for too long. it's not protecting anyone, it's exploiting creators and consumers to feed the soulless giga-corporations and their top level executives who've already accumulated what 100% counts as excess wealth. i'm done defending or making excuses for this system. it's outlived its usefulness and needs to be fully dismantled, ASAP. the concept of owning IDEAS is just asinine. also, several companies have in fact, lost source code and assets because of improper archival practices. that leaves relying on reverse-engineering or even the efforts of others having done the same. in perfect dark's case, it's more likely because RARE keeps proper archives.
Reverse engineering is a very useful skill to learn early on, but also one with a very steep learning curve. It happened to be how I learned programming back in the 80s on the C64, as useful technical literature was impossible to come by in my rural town. I spent much more time in the machine code monitor examining games and demos to slowly learn how things were done than I ever did actually playing the games themselves. Perhaps it was a happy coincidence, as I may not have gone in that direction if I could simply have learned everything from reading books. Practising reverse engineering can greatly improve your analytical skills and pattern recognition, and gives you a more in-depth knowledge of the underlying platform; skills that are, in fact, very useful in regular engineering. So learning reverse engineering enables you to become a much better engineer.
you're very right, people need to learn the very lowest level of programming (asm -> c -> c++) before moving on to much higher level languages like python or javascript, because they will have a much better understanding of, and control over, what they want to accomplish and how whatever language they use does it. an example of this is writing an algorithm that is simple in function and does not have quadratic cost: instead of letting the work and memory grow with the number of arguments passed, use pointers rather than wasteful copying, making new tables and variables, and bloating memory. but this is only my interpretation so take it with a grain of salt
One thing not mentioned in the video is that reverse engineering is more or less regulated depending on the country. For instance, my understanding is that, in the US, the person who reverse engineers some software and the person who implements a "clone" of said software need to be two different people. Furthermore, the latter must have no prior experience with the original software and must use only the specification written by the former person to write the implementation. The DMCA added further restrictions on reverse engineering. Other countries may have their own restrictions.
In the EU, reverse engineering is an inalienable right, i.e. even when the license agreement contains a prohibition on reverse engineering, that is not enforceable and you can reverse engineer anything you have a license for, or even when you don't if you do it for someone who has a license. You can implement the findings in your own program as well. However, this is limited to interoperability only, you are not allowed to use this to create a copy of the algorithms, e.g. you can use it to figure out how it defines a word so that your implementation will yield the same numbers but you can't copy the implementation of how it counts the words. Clean room reverse engineering is still useful here, if you describe the former but not the latter, it makes it unambiguous that the latter hasn't been copied.
I loved xoreaxeaxeax's talks about his movfuscator: First, he stumbles upon the fact that the MOV instruction turns out to be Turing complete. He then proceeds to write a compiler that turns any piece of code, even the compiler itself, into a program that ONLY uses MOV instructions. (At a hefty performance cost obviously.) And he then spends his time making the reverse engineered flow graphs in IDA to render a picture of himself, and some profanities. It's at a next level.
I maintain several legacy systems that were written in C. They were written in an idiosyncratic style, often by people who wanted to demonstrate to the world how smart they were. It can take a while to figure out just what the code is doing. Sometimes you get a head start: this code/data/whatever was written by the same people at about the same time, so it will resemble other things they've done.
May not be completely true, but I watched a documentary about Compaq where they actually had a software engineer de-compile the IBM BIOS, just to see how it worked. Once he did so, he reported to Compaq that "this isn't hard, there's nothing special here." But since he had seen the code, he was not allowed to work on the project anymore, since he might be influenced by the code he had seen. Compaq had to hire an entire team of programmers to look at how the BIOS calls worked, without seeing the code behind the calls, so that they could replicate the outcome without necessarily replicating the IBM code that generated the original outcome. So when IBM sued Compaq, they could legitimately claim that their compatible BIOS was not based on the code IBM had copyrighted from Microsoft. Nor was the Compaq DOS based on the code of Microsoft DOS.
Similar thing happened with Connectix and Sony. Connectix reverse engineered the BIOS of the PS1 and when Sony sued them they were able to claim that no Sony code was being distributed or used in their emulator. (Sony, of course, sued them to oblivion but they were technically in the right)
I think that when playing these kinds of games it's easier to win on a social level. Because your software can be objectively better, stolen or created, but what makes you win is the market. People.
"Nor was the Compaq DOS based on the code of Microsoft DOS." Did you mean they used MS-DOS rather than IBM's PC-DOS? DR-DOS was released after the first Compaq.
Reverse engineering also applies to archives/files, i.e. watching for patterns in data and figuring out how it is used in order to reconstruct or convert/export it. This is common practice for modifying or extending a program's assets. Most modern formats follow the structure of [header + chunk(s)], so once you find the definitions, all that is left is making use of the data in each of the chunks.
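A minimal sketch of the [header + chunk(s)] idea, in Python. The layout is made up for illustration (a 4-byte magic `DEMO`, then chunks of 4-byte tag + little-endian 32-bit length + payload); it is not the format of any particular game or tool, and `parse_chunks` is a hypothetical helper name.

```python
import struct

def parse_chunks(blob: bytes, magic: bytes = b"DEMO"):
    """Split a hypothetical [header + chunks] file into (tag, payload) pairs."""
    if blob[:4] != magic:
        raise ValueError("unexpected magic bytes")
    chunks = []
    pos = 4  # skip the 4-byte header
    while pos < len(blob):
        # Assumed chunk layout: 4-byte ASCII tag, then little-endian uint32 length.
        tag, length = struct.unpack_from("<4sI", blob, pos)
        pos += 8
        payload = blob[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated chunk")
        chunks.append((tag.decode("ascii"), payload))
        pos += length
    return chunks

# A tiny made-up file: one text chunk and one numeric chunk.
sample = (b"DEMO"
          + b"NAME" + struct.pack("<I", 5) + b"hello"
          + b"GOLD" + struct.pack("<I", 4) + struct.pack("<I", 1234))

for tag, payload in parse_chunks(sample):
    print(tag, payload)
```

In a real file the hard part is working out the tag and length fields in the first place; once those are known, the loop itself tends to stay about this small.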
I did a bunch of this several years ago with a particular game. Knowing various text & numeric values, I tore apart the data file that I knew stored that information. I never decoded all the info in the file, but I got far enough for what I wanted (a map of the locations & types of all the star systems in the game). Based on emails, there have been several people over the years since who have benefitted from my hobby.
I reverse engineer files for the fun of it 😄. The latest is the save game format for Trails Of Cold Steel, because I don't want to miss anything. I also made a tool for the Witcher to copy my inventory from one save game to an older save because I did something earlier that broke a later quest. I didn't mind going back to the older save, but I didn't want to lose all the ingredients I had picked in the meantime.. I also reverse engineered the encryption Final Fantasy X uses on its save games because I forgot I had to sponsor the traveling salesman. I changed the amount I had given him and deducted the sum from my GIL.. I may be cheating, but at least I'm honest 😋. I honestly think I've spent as much time having fun with game files as actually playing games. Nothing multi-player though. I'm not an a-hole.
A very famous, difficult and consequential case of reverse engineering: Alan Turing and his team's cracking of the Enigma machine. It's hard to think of more important technical problems to solve than that..
Though a lot of the actual reverse engineering of the Enigma machine was done by often-overlooked teams in Poland. The work of Turing and his team built heavily on this to try to find ways to exploit flaws in the design.
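The "search for values you already know" approach can be sketched in Python like this. The blob, the name `Sol` and the value 4242 are invented for the example, and it assumes the file stores numbers as little-endian 32-bit integers, which you would have to confirm for a real file.

```python
import struct

def find_u32(blob: bytes, value: int):
    """Return every offset where `value` appears as a little-endian uint32."""
    needle = struct.pack("<I", value)
    offsets = []
    start = blob.find(needle)
    while start != -1:
        offsets.append(start)
        start = blob.find(needle, start + 1)
    return offsets

def find_text(blob: bytes, text: str) -> int:
    """Return the first offset of an ASCII string, or -1 if it is absent."""
    return blob.find(text.encode("ascii"))

# Made-up data file: 3 padding bytes, a NUL-terminated name, then a number.
blob = b"\x00" * 3 + b"Sol\x00" + struct.pack("<I", 4242) + b"\xff"

print(find_text(blob, "Sol"))   # offset of the known name
print(find_u32(blob, 4242))     # offsets of the known number
```

Comparing the offsets of several known values usually starts to reveal the record layout around them.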
Most lines in these cases are boilerplate added by the compiler/standard initialization and finalization/operations and system calls required to match the executable format and platform. Equivalent programs generated from assembly language may have less of these things. GNU programs can have a lot.
I am always more scared of what these few lines might do behind the scenes if I type in any "source" data path of the code. It would probably log my attempt and send a cheeky little email to alert the owner of the software. Who knows. 😅
@@xrafter Bloated code, not bloated program. The gains from standardization and extra useful features largely compensate for the negligible overhead in a compiled C program. If you are concerned about performance at this level, you should go right to assembly solutions (or specialized alternatives).
@@hassanachek All I can do is javascript. $/~ npm i reverse-string // Reverse a string const reverse = require('reverse-string'); reverse("Engineering Computerphile"); 🤣
Been reverse-engineering for over 30 years, no university degree, and learned everything from software design patterns to how SoC bring-up is done. You learn the real way it works by reverse-engineering. I started out with the C-64 and the Super Snapshot cartridge and moved on from there to the latest ARM hardware.
Hey, how do I get into reverse engineering (that's not the main part, I want to get into hardware hacking)? Will learning a microcontroller help? Which one would you recommend for a beginner? I'm thinking of PIC18
@@swarooprajpurohit110 If you want to learn hardware hacking then learning a microcontroller will help. It's important to learn the basics like hex, dec, binary and how addressing works. Eventually for hardware hacking you need to know how the MMU, interrupt controller, CPU and fabric work in an SoC. It also helps to know how to inspect and manipulate I2C, SPI and other chip/die to chip/die communication. PIC will give you the basics but it's best to understand ARM, for example Armv8-A/R/M.
I have actually had to reverse engineer more than one industrial automation system with nothing more than a poorly drawn schematic (which wasn't up to date) and raw code. First step was identifying all I/O points, so I knew what a particular input or output did. Then based on that you could figure out a lot of the basic logic of the machine. This is a start/stop circuit. This is a closed loop control system. This is a permissive or interlock logic block. Still takes a lot of time, but it can be very rewarding when you finally get to the point where you can actually maintain and/or improve the machine automation.
@@jamesking2439 Ladder logic is not compiled. But yeah, I had to understand the code before I could make any meaningful progress. But you run into a lot of basic sequences that are almost universal.
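As an illustration of one of those near-universal sequences, here is a rough Python rendering of a start/stop (seal-in) rung. It is only a sketch of the general pattern, with made-up names (`start_stop_rung`, `simulate`), not code from any real PLC program.

```python
def start_stop_rung(start: bool, stop: bool, running: bool) -> bool:
    """One scan of a seal-in rung: the output holds itself on until stop is pressed."""
    # Stop takes priority: if both buttons are pressed, the output is off.
    return (start or running) and not stop

def simulate(events):
    """Run a list of (start, stop) button states through successive scans."""
    running = False
    history = []
    for start, stop in events:
        running = start_stop_rung(start, stop, running)
        history.append(running)
    return history

# Press start, release it, press stop, release it.
print(simulate([(True, False), (False, False), (False, True), (False, False)]))
```

Once you can spot this shape in the raw code, a lot of the I/O points around it explain themselves.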
Some of my favorite (hardware) reverse-engineering going on in recent times is the work by CuriousMarc and his cohorts at the Computer History Museum, reverse-engineering the computers and radio equipment used in the Apollo missions. Their many-part series can be found here on YouTube.
The best example of clean room reverse engineering is GTA4. One of the best trainer programs was built with the software in the next room. Engineers with GTA looked at how certain actions worked in memory and passed their findings to totally separate engineers who worked out how to interrupt and alter what it did. Rockstar tried to sue them for hacking and theft, but it failed because they hadn't built any software using any of the GTA code.
You haven't mentioned reverse engineering to bypass copy protection/licensing. I've reverse engineered a few small things that I liked but never had the opportunity to license. In one case, it was as simple as finding the reference to a string that bothered you about purchasing a license and changing the jump instruction so that it never performs the jump and proceeds as normal. In another case I reverse engineered the licensing key algorithm. Then I wrote a keygen for it. Reverse engineering is a lot of fun, sometimes hair-pullingly frustrating, but a lot of fun, and the moment you make a breakthrough you feel like you are on top of the world.
I remember the first time I changed a JNE to a JE - 0x75 to 0x74 or from *u* to *t* - and saved the binary so that it would now accept any random key. Seeing it work was exhilarating! More serious software protections and license validators require a lot more than this to break them, and it's rarely that simple.
@@seif1293 You use a disassembler like IDA shown in the video, although IDA is commercial and has only a limited free version. There are free alternatives, Ghidra is probably the most powerful (actually made by the NSA! no joke). Once you find the JNE - meaning Jump if Not Equal, i.e. jump to where it shows the error message if what you entered is not equal to the valid password/license key - the disassembler can show you the exact position in the binary where the JNE is located. Then you use a hex editor to open the file, go to that position, and change it to the binary value for a JE. So in the hex editor you'll see a *u* and you just replace it with a *t* then save, and run it again. If you got it wrong you just inverted some random condition in the code that has nothing to do with what you wanted, and this might have "interesting" side-effects.
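The byte swap described above can be sketched in Python on an in-memory byte string. The five bytes below are a made-up snippet, not taken from any real program, and `flip_jne_to_je` is a hypothetical helper; the check before patching is there because, as noted, flipping the wrong byte just inverts some unrelated condition.

```python
JNE_SHORT = 0x75  # x86 short "jump if not equal"; shows as 'u' in a hex editor
JE_SHORT = 0x74   # x86 short "jump if equal"; shows as 't' in a hex editor

def flip_jne_to_je(image: bytes, offset: int) -> bytes:
    """Return a copy of `image` with the JNE opcode at `offset` replaced by JE."""
    patched = bytearray(image)
    if patched[offset] != JNE_SHORT:
        # Refuse to patch if the disassembler's offset does not point at a JNE.
        raise ValueError(f"byte at {offset:#x} is {patched[offset]:#04x}, not a short JNE")
    patched[offset] = JE_SHORT
    return bytes(patched)

# cmp eax, ebx ; jne +5 ; nop   (illustrative bytes only)
code = bytes([0x39, 0xD8, 0x75, 0x05, 0x90])
patched = flip_jne_to_je(code, 2)
print(code.hex(" "), "->", patched.hex(" "))
```

Only the opcode byte changes; the jump distance that follows it stays the same, which is why a one-byte edit is enough in the simple case.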
I reverse engineered the codes that Lemmings and Lemmings 2 gave out so you could restart at a later level - the programs even accepted codes which were impossible to obtain from the program but were valid as far as the algorithm checking the codes was concerned. (I guess part of the code generated was ignored by the checker but was included to create multiple codes, or may have been used by the game to change the difficulty slightly.)
I am kind of surprised you didn't bring up the example of ReactOS being a reverse engineering of the Windows NT kernel. A truly massive undertaking since the NT kernel is big, complex, and probably bloated.
With the BIOS Compaq had 2 separate teams, one team analyzing the IBM BIOS and another team implementing their BIOS clone. The only communication between the teams was the documentation the analyzing team created by reverse engineering. No one from the analyzing team was allowed to speak with the implementation team or join it. IBM was never able to sue Compaq for their BIOS because it was a clean room implementation, running on standard hardware everyone can get. IBM was too slow for the consumer market, they build business machines, even today, and that's part of the story of why the IBM clones took over the market. They were compatible, cheaper, had agreements with Microsoft for MS-DOS and could adapt much faster to newer hardware while IBM worked on the PS/2 machines. I think we can clearly say: We are lucky it went this way!
Season 1 of the TV Series "Halt and Catch Fire" (available on Netflix) shows the reverse-engineering technique explained early in this video. Although not specifically calling out Compaq, it's their story without the name :). Take it with a grain of salt, though. It shows them using a volt meter and oscilloscope to come up with the BIOS code ...but the show is very entertaining overall.
3:26 Unix executables were originally "just machine code". Later on, when they wanted a format with headers and whatever, they made that format, but didn't actually change the implementation of exec. Instead, the binary started with the machine code to jump past the headers. And that's the story of the a.out magic number.
Definitely do the video on executable formats! Is there a video on how modern computers boot their operating systems? If not, a booting/BIOS video would be very much appreciated.
Step 1: Define all the things Step 2: Figure out the data structures, algorithms, logic, and control that goes into the programs Step 3: Do C things because C :) Step 4: Go to 11:06 of the video for an IDA practical example Step 5: ??? Step 6: PROFIT!1!1!
I'm a web developer and came across reverse engineering recently. I wanted to learn, but the lack of information online, courses, and such is a pain in the ass. I will try to learn it again at the weekend because it seems like a cool thing to do.
9:00 yes, waterfall is doing it wrong :) And I agree with the "show me your data structures and your program will become obvious" to a large degree. Though a lot of the interesting part of programming is how to model edge cases in the domain you are working on, and how you structure your code to handle adapting to new functionality or handling of bugs and edge cases. Programs are not (anymore) made as a single release, where a few years later you get the next major version that is made from the bottom as a different program that may be a newer and better implementation of the same high level ideas and concepts. As systems get larger, the time spent on maintenance/support and feature expansion, and the time between bottom-up full rewrites, get longer because the costs become much larger. Also, with agile development mindsets, the focus shifts to continuous development and deployment and delivering incremental value at low incremental risk of things going wrong.
back when typewriters were a thing, line feed went to the next line, carriage return went to the start of the line. you needed both to go to the start of the next line. i don't know what happened when computers were invented, but different OS writers decided they needed different combinations of those to be used
@@jkoh93 Unix and unix-like systems (including macOS) use LF only. Windows (and DOS) use CR LF, in that order. The classic Mac OS and a few others used CR only. And there were even more combinations and other character encodings used in different machines and operating systems.
@@jkoh93 so... LF moves the cursor vertically, CR moves it horizontally? Why do *NIXoids omit the CR? I'm sure one of the Professors has some interesting stories to tell why those things are the way they are, hence the separate video about spaces 🤩.
@@maxine_q Yup. Pretty much. Now editors and other simple text file readers need to accept the various combinations of line delimiters... or convert from one convention to another. As stated, it all started with mechanical printers that had CR and LF as separate physical motions.
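Converting between those conventions is short enough to sketch in Python. `normalize_newlines` is just an illustrative helper name, and the sketch assumes the input only mixes the three conventions listed above (CR LF, CR and LF).

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF (Windows/DOS) and lone CR (classic Mac OS) to LF (Unix)."""
    # Order matters: replace CR LF first, otherwise each pair becomes two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

mixed = b"windows\r\nold mac\runix\n"
print(normalize_newlines(mixed))

# For decoded text, str.splitlines() already accepts all three conventions.
print("windows\r\nold mac\runix\n".splitlines())
```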
@@Computerphile Running Fedora 36 Silverblue freshly installed, only this video seems to render with awful stutter, please consider refraining from 50fps, it might affect other linux users.
I do a little reverse engineering. Usually with mobile apps, or web code, to look for vulnerabilities. Found some interesting communications protocol in the 'Wind' scooter app (the older yellow hire scooters), no free rides, but a denial of service if you were careful. Those scooters were in Nottingham, too, the same place these guys come from.
@@filip0x0a98 For mobile apps (android only), you can find free online services to both download the .apk files and to de-compile them, but beware of the multiple pop-ups and questionable links. It usually helps to use more than one service, as sometimes one may fail due to code obfuscation attempts. Web apps are easier, as you can just use the developer console with Chrome browser (once you find a piece of code of interest, you can set breakpoints and examine variables). Other than that, it's a case of homing in on a piece of code that is of interest (syntax is similar between Java, Javascript and C++), usually something that does network access, or for the scooters, it was the part that communicates over Bluetooth Low Energy. Sometimes you find a howler of a vulnerability, or other times, not so bad.
I have a lot of experience with reverse engineering web apps. It helps me build stuff when no direct api is provided. I haven't really done much more than that. Computer science isn't exactly my field.
Have an idea for a similar episode in the future. Steve should pull out a BBC Micro, C64, ZX Spectrum or any other 8-bit machine having BASIC. Then off camera develop a program which first asks a user to input any number of words. Then the program would calculate their number and display it. Then show only the running program to the camera, and ask the cameraman to also come up with a BASIC (or any other language both can code) program that does the same. Suppose the BASIC will be different. But there you have it - Reverse Engineering.
My favourite reverse engineering task is to port a program to another programming language. If you have the source code this is a fun task, if you don't have source code then the first step is a pain, trying to disassemble binary into machine instructions and translating from there.
15:33 Library or external function calls and system calls are being conflated. Typical user code doesn't directly make system calls. These are implemented as a "magic" instruction (this is all generalized) which allows a userland program to execute a small chunk of kernel code. By magic instruction I mean "an instruction which causes the kernel to shut up and pay attention", i.e. throws an interrupt of some sort. The mnemonic "syscall" is the assembly instruction for this on several different architectures, but even on the same hardware, various operating systems (and even versions of them) can use different instructions. (Nothing is stopping you from saying "A syscall instruction is any instruction which causes a protection fault when writing to a valid syscall number." Provided your kernel can tell that happened and react to it appropriately, the mechanism doesn't matter.) While the mechanism you use to call them varies, all of them need you to tell the kernel _what_ you want it to do as well as _that_ you want it to do something. The parameters to the syscalls go in registers (or in a specific region of memory) and include the number of the system call you want to run, which tells the kernel how the arguments should be interpreted. Once you have set up the parameters, you execute your magic instruction, which traps into the kernel; the kernel looks at your request and fills in the reply. Then the process returns from kernel mode and continues. These operations are often wrapped by system library code, because different systems might use different system calls, but as long as the library call takes the same arguments, your code will still work. That's why there's a distinction between library code and system calls. The only other times you need to know what a system call is are if you are reversing a statically linked binary (which puts all the library code it is going to use directly into the program, and that can make them massive) or if you are working with shellcode (like an exploit for a vulnerability might use).
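The library-wrapper point can be seen from Python too. This is a small sketch assuming CPython, where `os.write` and `os.read` are thin wrappers over the operating system's write and read calls; the code never names a syscall number or a trap instruction, and which ones get used underneath depends on the platform.

```python
import os

def send_through_pipe(message: bytes) -> bytes:
    """Write bytes into a pipe and read them back using only library calls."""
    read_fd, write_fd = os.pipe()  # the library asks the kernel for a pipe
    try:
        # Same arguments on every platform; the wrapper picks the mechanism.
        os.write(write_fd, message)
        return os.read(read_fd, len(message))
    finally:
        os.close(read_fd)
        os.close(write_fd)

print(send_through_pipe(b"hello kernel"))
```

When reversing a dynamically linked binary you mostly see calls into wrappers like these; the raw trap instruction tends to show up only in static binaries and shellcode, as described above.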
I'm trying to reverse engineer a Z8002 system ATM, managed to get it disassembled, and can work out some of the routines. I'm not sure if it was written in assembler or C (knowing sometimes helps). Pity Ghidra doesn't support the Z8k
If you’re attempting to disassemble non x86/64 code using IDA, take a look at the pricing….I bet you’ll be surprised. To save a lookup, it’s more than $10,000 for the full package, $5,000 for just MIPS and MIPS 64….
Surely by now, a machine learning system could be trained on billions of lines of code for various kinds of programs alongside the binaries, and after training be able to take a binary as an input and generate a complete, high-level code that, when compiled, is functionally completely identical to the original binary?
Isn’t thinking of something and trying to figure out how it works (inventing it) basically reverse engineering something that hasn’t been designed yet?
I think this video is more like an introduction to the technical aspect of reverse engineering. Maybe for a "fun little YouTube video" it would have been cool to discuss the ethical/legal aspects and what is done in practice to accomplish a successful reverse-engineered solution without being exposed to legal trouble... I worked for a company that provided alternatives to IBM solutions. We made mainframe emulators, we had clean rooms and fun like that! :) We also had employees who were former IBM employees and that made "reverse" engineering (of something they knew exactly how it is coded) even more fun! :)
If the code you are analyzing is written in some high level OOP language, chances are a modern decompiler will give you at least some C equivalent, not just assembly code; in some cases it might be able to give you some snippets in the original programming language, since OOP compilers are pretty deterministic in theory. And then some languages don't even compile their code, it might not even be obfuscated, so why even bother...
Some guy I know took apart a car insurance app, it was just a web application so everything was in JavaScript; it didn't take a long time to find several vulnerabilities
I didn't come away with a good impression of how good a job reverse engineering can do in an average case. Can it understand the whole program? Does it penetrate one level deeper than assembly and then give up?
Reverse engineering is basically learning how things work. That is why all the restrictions in licenses that prohibit reverse engineering / de-compilation etc. should be ignored and made irrelevant. To bar humans from seeking knowledge and understanding is to be anti-competitive and anti-science.
I'll never forgive what Rockstar Games did to the re3/revc project because for them it was a "threat to their economy" while they were still making millions with GTA Online and released GTA The Defective Edition to earn even more money at the cost of quality control. I still believe a similar project with the same quality will appear at any moment since Vice City is one of my favourite games with a lot of potential beyond its aesthetic or gameplay. I know the re3/revc team didn't follow the clean room design to achieve a true RE project but I think with the current documentation available from GTAModding wiki and other sources it is possible to achieve the same goal without the need of Ghidra or a team to document the decompilation. As for me, I wanna get into RE but since I'm a PHP developer I have a long road in front of me to achieve something, so I wanna start with something like learning C and decompiling other games
Disassembled code is machine code (binary) converted back to human-readable assembly language (text). They usually have a one-to-one correspondence (each binary opcode is generated by a unique assembly instruction and operands combination), so disassembling is 100% deterministic. There are some tools that try to convert assembly to C (or at least to a mix of C code blocks and assembly instructions), but their efficacy is limited and the generated code may be as hard to read as the original assembly.
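To show what "converting binary back to text" looks like at its simplest, here is a toy table-driven disassembler in Python. It only knows five one-byte x86 opcodes (the table is a tiny hand-picked subset and the output format is invented), so treat it as a sketch of the lookup idea; real x86 decoding has variable-length instructions, prefixes and alternative encodings that this ignores.

```python
import struct

# opcode byte -> (mnemonic, whether an 8-bit relative operand follows)
OPCODES = {
    0x90: ("nop", False),
    0xC3: ("ret", False),
    0x74: ("je", True),
    0x75: ("jne", True),
    0xEB: ("jmp", True),
}

def disassemble(code: bytes):
    """Decode a byte string made only of the opcodes in the toy table."""
    lines = []
    pos = 0
    while pos < len(code):
        mnemonic, has_rel8 = OPCODES[code[pos]]  # KeyError on unknown bytes
        if has_rel8:
            # Signed 8-bit displacement, relative to the next instruction.
            (rel,) = struct.unpack_from("b", code, pos + 1)
            target = pos + 2 + rel
            lines.append(f"{pos:04x}: {mnemonic} {target:#06x}")
            pos += 2
        else:
            lines.append(f"{pos:04x}: {mnemonic}")
            pos += 1
    return lines

for line in disassemble(bytes([0x75, 0x01, 0x90, 0xC3])):
    print(line)
```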
So if someone makes their own offline launcher for adobe and removes the subscription DRM, is that not Reverse engineering? Could they not legally sell that then? if not why not? and how then?
moral of the story, document your ABI as well (I never do either), 'but implementation details are supposed to be hidden' lies, just one of the many lies we tell ourselves to make today's work 'more efficient' where in a number of years some poor sod is going through your module's disassembly line by line, although it's surprisingly fun to do, piecing such things together
I wanna reverse engineer a script in Lua and I don't know how, can u please explain, it's for an exploit in a game, I wanna see how it works so I can make my own
Unless you're referring to the rep/repe/repne prefixes, not directly. It's easy enough to implement a loop by a jump instruction that points to earlier code, but even then, the idea of a loop "emerges" from the behavior of the code; there's no direct "loop" instruction. Indeed, you can just as easily jump backwards without it being a loop.
@@quantumdude836 Then look what the opcodes E0 to E2 do ;) They are the loop instructions. They implicitly use the CX register to count to zero, if I remember correctly.
@@quantumdude836 No problem, modern x86 has so many instructions. I tend to remember the high level ones, because I'm impressed how high level they are. call, ret, loop, looks like concepts from regular languages. You can even say something like ret 24 to automatically clean the stack when returning.
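A rough Python model of what the plain LOOP instruction (opcode E2) does with CX, assuming 16-bit mode: decrement the counter, jump back while it is non-zero. `run_loop` is a made-up name and the "body" is just an iteration counter.

```python
def run_loop(count: int) -> int:
    """Mimic `label: <body> ; loop label` with a 16-bit CX counter."""
    cx = count & 0xFFFF
    iterations = 0
    while True:
        iterations += 1          # the loop body runs before LOOP is reached
        cx = (cx - 1) & 0xFFFF   # LOOP decrements CX (wrapping at 16 bits)...
        if cx == 0:              # ...and falls through once CX reaches zero
            break
    return iterations

print(run_loop(3))
print(run_loop(0))  # starting at zero wraps around instead of skipping the body
```

The second call shows the usual catch: because the decrement happens before the test, entering with CX = 0 goes around 65536 times.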
wait why can't we take the assembly code and pass it to a "decompiler" and generate C code from that? that'd be much easier. Like yk compilers can generate assembly from C, why can't you write a software that does the exact opposite process that a compiler does and generate C from the asm?
You can write software that decompiles a binary to C; Ghidra basically does that for you. The problem is that while the output might be valid code, it will be very different and much harder to read compared to the source code. This is because the compiler, among other things, does a lot of optimizations which can drastically change the structure of the program. That being said the decompiled code is still very useful for figuring out generally what the program is doing.
Because there is no one solution to solving problems. Software can be written in unlimited ways. That's the creative process. Imagine you have to duplicate a painting you've never seen, but you know what colors & paper it's been made with.
@@crunchyplasma1876 well we know what optimizations it does right? (idk something like loop unrolling, function inlining and stuff? lol idk I've never studied compilers) so can't we just reverse that when we see stuff that looks similar? Like you know how compilers have so many rules on how to parse human written C, can we reverse those rules and maybe add new rules in order to generate C that can be easily read by humans? "That being said the decompiled code is still very useful for figuring out generally what the program is doing." oh that's good!
@@mastershooter64 You'll never get the names of functions or variables back. Now imagine you have the source code of a program but all the functions and variable names are just labeled a, b, c, and so on. Custom data types are also completely gone.
@@mastershooter64 fundamentally, different C source files could compile to the exact same assembly. that information in the source is simply lost. it's like trying to reconstruct a word from only the first letter.
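The same effect can be shown without a C toolchain, using CPython bytecode as a stand-in for machine code (an analogy only; the function names here are invented). Two functions with different, meaningful names compile to identical instruction bytes, because the instructions refer to locals by index.

```python
def add_prices(net_price, sales_tax):
    return net_price + sales_tax

def f(a, b):
    return a + b

# The raw instruction bytes are the same for both functions.
same_bytecode = add_prices.__code__.co_code == f.__code__.co_code
print(same_bytecode)

# Python happens to keep the names in separate metadata; a stripped
# native binary normally does not, so a decompiler has to invent them.
print(add_prices.__code__.co_varnames, f.__code__.co_varnames)
```

Given only the instruction bytes, there is no way to tell which of the two sources they came from, which is the point being made above.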
"Nothing is ever a waterfall in computer science, unless you're doing it wrong." Casually roasting one of the standard software development models 😂 But I get his point, the waterfall model is just an oversimplified, idealized concept of how software development could go, which in theory looks nice, but practically never matches reality.
Pretty sure step 1 should be to write tests that define the behaviour for the original system. Once you have those then you can start thinking about writing your own program to pass the tests.
@@dayansiddiqui4426 there are no guarantees in life. though i would be confident in saying that if you don't write regression tests and then do a big refactor/rewrite you'll be far worse off
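A rough sketch of the "write tests that define the behaviour first" idea, assuming the original can only be run as a black box. `legacy_word_count` and `naive_rewrite` are hypothetical stand-ins, not anything from the video:

```python
def legacy_word_count(text):
    """Hypothetical stand-in for the original program, treated as a black box."""
    return len(text.split(" "))

def record_behaviour(black_box, inputs):
    """Run the original on sample inputs and remember what it answered."""
    return {sample: black_box(sample) for sample in inputs}

def find_mismatches(candidate, recorded):
    """Return {input: (expected, actual)} wherever the rewrite disagrees with the original."""
    mismatches = {}
    for sample, expected in recorded.items():
        actual = candidate(sample)
        if actual != expected:
            mismatches[sample] = (expected, actual)
    return mismatches

def naive_rewrite(text):
    """A first attempt at a clone; looks right, but treats repeated spaces differently."""
    return len(text.split())

inputs = ["one two", "", "a  b", " lead"]
recorded = record_behaviour(legacy_word_count, inputs)
mismatches = find_mismatches(naive_rewrite, recorded)
print(mismatches)
```

The recorded outputs act as regression tests: the rewrite only counts as done when `find_mismatches` comes back empty, quirks of the original included.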
If you're a junior dev working in an established / legacy code base, a big part of your job is reverse engineering. "Yeah some guy who used to work here wrote that in perl back in 1998. Nobody has any clue how it works but it's a vital component of our payroll system and it's broken."
Time to develop a reverse engineer coding AI - that learns the function, inputs and outputs of the target software and then writes the code for the operating environment of choice.
@@bosch5303 Yes - the AI - which understands the overall purpose or function of the app - based on its learning / 'hands on' experience could work around these issues by either recoding the entire app or parts that can't be meaningfully decompiled. I suppose with more thought - it might be quicker and easier to just get the AI to rewrite the entire app.
@@hbm293 historically: space, tab, vertical tab, form feed. With multiline flag, also carriage return and line feed. With Unicode flag, anything with the property Whitespace.
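The exact sets differ between regex flavours, so treat this only as an illustration of why "what is a space?" matters. In Python's `re` module (whose rules are not the ones listed above), `\s` on text matches Unicode whitespace by default and only `[ \t\n\r\f\v]` with `re.ASCII`. The `count_words` helper and the sample string are made up for the example:

```python
import re

def count_words(text: str, unicode_aware: bool) -> int:
    """Count words by splitting on runs of whitespace, under one of two definitions."""
    flags = 0 if unicode_aware else re.ASCII
    return len([word for word in re.split(r"\s+", text, flags=flags) if word])

# The first gap is a NO-BREAK SPACE (U+00A0), the others are ordinary spaces.
sample = "reverse\u00a0engineering is fun"

print(count_words(sample, unicode_aware=True))   # no-break space counts as a separator
print(count_words(sample, unicode_aware=False))  # no-break space is part of a word
```

Two word counters that both look correct can disagree on the same input, which is exactly the kind of detail you have to pin down when reimplementing someone else's program.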
We really need a law requiring software vendors to distribute full, annotated Source Code with everything they sell. If everybody had to do it, nobody could get away with plagiarism.
But there is a way to get pseudo source code from a program: it's called a decompiler. IDA can also do it, but it requires an add-on that costs several thousand dollars; Ghidra can do it for free.
And in the case of the binary he hexdumped, Fernflower (or any other Java decompiler) can be used, since the first four bytes are "CAFEBABE" in hex, which is the magic number for Java class bytecode.
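Checking the first few bytes against known magic numbers is easy to sketch. The table below is a tiny, hand-picked sample (real tools such as `file` know thousands of signatures), and `identify` is a name invented for this example:

```python
# A few well-known magic numbers; deliberately incomplete.
MAGIC_NUMBERS = [
    (bytes.fromhex("cafebabe"), "Java class file (also used by Mach-O universal binaries)"),
    (b"\x7fELF", "ELF executable"),
    (b"MZ", "DOS/Windows (PE) executable"),
]

def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes."""
    for magic, description in MAGIC_NUMBERS:
        if data.startswith(magic):
            return description
    return "unknown"

# First bytes of a hypothetical class file: magic, then minor/major version.
print(identify(bytes.fromhex("cafebabe00000034")))
```

Note that the same four bytes also open Mach-O universal binaries, so a magic number is a strong hint rather than proof.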
@@isse6790 if it remains as popular, it can only get better as more developers start contributing to it. seen a lot of people get by just fine with it though. the main issue honestly is the NSA developed it so it's hard to trust given their history of hacks/malware "to catch terrorists". people can always pirate IDA, too if they really just MUST use it, specifically.
@@isse6790 can just blame java on the slowness, it's easiest. decompiling is really difficult, so making something comparable to a specific software product/company that effectively has a monopoly on this would take time. not like IDA is sharing the fruits of their labor, are they?
Recently Super Mario 64, Ocarina Of Time and even Jak & Daxter with its Lisp dialect GOAL got reverse engineered and even ported to pc. It's a work of art. For those interested: tools with disassemblers and decompilers like Ghidra or IDA are pretty helpful.
Gen 1-3 of Pokemon have been reverse engineered as well. I actually have a tutorial series on using the Gen III decomps, they are super cool. It's so easy to make romhacks now.
@itzhexen what in the world kind of point are you trying to make here? sounds like you're really cool with IP given your second comment. might want to seriously re-consider that viewpoint. IP is nothing to be proud of or to defend. it is unnatural and it is immoral. in its current form and with the trends of rights-holders continuing their attempts to expand it, it also should be ruled as unconstitutional. the way it is used now is immoral, unethical, and actually in many cases, illegal.
fun fact about developers, artists, authors. aka the people actually doing the work: many of them oppose the current system. they have no meaningful rights to their work, they're treated and used like slaves. they are often not paid the royalties they're due because "production committees" are designed to run into the red. how strongly they support IP in general varies, but many of them actually don't like it. small creators don't benefit from it at all because it costs millions to enforce IP. artists and inventors are usually primarily motivated to share their ideas and creations. IP law has been restricting this for too long. it's not protecting anyone, it's exploiting creators and consumers to feed the soulless giga-corporations and their top level executives who've already accumulated what 100% counts as excess wealth. i'm done defending or making excuses for this system. it's outlived its usefulness and needs to be fully dismantled, ASAP. the concept of owning IDEAS is just asinine.
also, several companies have in fact, lost source code and assets because of improper archival practices. that leaves relying on reverse-engineering or even the efforts of others having done the same. in perfect dark's case, it's more likely because RARE keeps proper archives.
@itzhexen at least there will be 40 years of reverse engineered game to enjoy until then. Quite the backlog
Where can you get these games for PC? Or where can you get the source code for those games, and compile them myself?
Reverse engineering is a very useful skill to learn early on, but also one with a very steep learning curve. It happened to be how I learned programming back in the 80s on the C64 as access to any useful technical literature in my rural town was impossible to come by. I spent much more time in the machine code monitor examining games and demos to slowly learn how things were done than I ever did actually playing the games themselves. Perhaps it was a happy coincidence as I may not have gone that direction if I could simply have learned everything from reading books. Practising reverse engineering can greatly improve your analytical skills and pattern recognition as well as a more in depth knowledge of the underlying platform; skills that are, in fact, very useful in regular engineering. So learning reverse engineering enables you to become a much better engineer.
Agreed.
You're very right, people need to learn the very lowest level of programming (asm -> c -> c++) before moving on to much higher-level languages like python or javascript, because they will have a much better understanding of, and control over, what they want to accomplish and how whatever language they use does it. An example of this is writing an algorithm that is simple in function and does not have quadratic cost: instead of letting the work grow with the number of arguments passed, use pointers rather than wasteful copying, making new tables and variables, and bloating memory.
but this is only my interpretation so take it with a grain of salt
@@unguidedone yesn't
You are one of my heroes.
One thing not mentioned in the video is that reverse engineering is more or less regulated depending on the country. For instance, my understanding is that, in the US, the person who reverse engineers some software and the person who implements a "clone" of said software need to be two different people. Furthermore, the latter must have no prior experience with the original software and must use only the specification written by the former person to write the implementation. The DMCA added further restrictions on reverse engineering. Other countries may have their own restrictions.
Clean room reverse engineering
WineHQ uses a black-box testing approach.
It’s also mentioned a lot in user agreements that it’s not allowed.
In the EU, reverse engineering is an inalienable right, i.e. even when the license agreement contains a prohibition on reverse engineering, that is not enforceable and you can reverse engineer anything you have a license for, or even when you don't if you do it for someone who has a license. You can implement the findings in your own program as well. However, this is limited to interoperability only, you are not allowed to use this to create a copy of the algorithms, e.g. you can use it to figure out how it defines a word so that your implementation will yield the same numbers but you can't copy the implementation of how it counts the words. Clean room reverse engineering is still useful here, if you describe the former but not the latter, it makes it unambiguous that the latter hasn't been copied.
Reverse engineering is free speech !
I loved xoreaxeaxeax's talks about his movfuscator: First, he stumbles upon the fact that the MOV instruction turns out to be Turing complete. He then proceeds to write a compiler that turns any piece of code, even the compiler itself, into a program that ONLY uses MOV instructions. (At a hefty performance cost obviously.)
And he then spends his time making the reverse engineered flow graphs in IDA to render a picture of himself, and some profanities. It's at a next level.
"Nothing is ever a waterfall in Computer Science except you are doing it wrong"
100% on point
It's more like when you flush your toilet, except it's time and money going down the drain.
Same thing can be said for agile. Many small waterfalls are fine.
8:55 words to live by :D
No loops, no calls to other functions, no sense.
I maintain several legacy systems that were written in C. They were written in an idiosyncratic style, often by people who wanted to demonstrate to the world how smart they were. It can take a while to figure out just what the code is doing. Sometimes you get a head start: this code/data/whatever was written by the same people at about the same time, so it will resemble other things they've done.
Please elaborate further, that sounds extremely interesting
No kidding! How many times have you run across "Duff's Device" 😆
May not be completely true, but i watched a documentary about Compaq where they actually had a software engineer de-compile the IBM Bios, just to see how it worked. Once he did so, he reported to Compaq that "this isn't hard, there's nothing special here."
But since he had seen the code, he was not allowed to work on the project anymore, since he might be influenced by the code he had seen. Compaq had to hire an entire team of programmers to look at how the BIOS calls worked, without seeing the code behind the calls, so that they could replicate the outcome without necessarily replicating the IBM code that generated the original outcome.
So when IBM sued Compaq, they could legitimately claim that their compatible BIOS was not based on the code IBM had copyrighted from Microsoft. Nor was the Compaq DOS based on the code of Microsoft DOS.
Similar thing happened with Connectix and Sony. Connectix reverse engineered the BIOS of the PS1 and when Sony sued them they were able to claim that no Sony code was being distributed or used in their emulator. (Sony, of course, sued them to oblivion but they were technically in the right)
Halt and Catch Fire!!!
I think that when playing these kinds of games it's easier to win on a social level. Because your software can be objectively better, stolen or created, but what makes you win is the market. People.
"Nor was the Compaq DOS based on the code of Microsoft DOS."
Did you mean they used MS-DOS rather than IBM's PC-DOS?
DR-DOS was released after the first Compaq.
@@swarooprajpurohit110 yes, I read it too and remembered the series
Had to stop watching it as things got too repetitive
Reverse engineering also applies to archives/files, i.e. watching for patterns in data and figuring out how it is used in order to reconstruct or convert/export. common practice for the modification or extension of a program’s assets. Most modern formats follow the structure of [header + chunk(s)], so finding the definitions leaves only making use of the data in each of the chunks
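As a sketch of the [header + chunk(s)] layout, here is a parser for a made-up container format. Everything about it is an assumption for illustration: the `DEMO` magic, the little-endian fields, and the 4-byte chunk tags do not describe any real game or tool:

```python
import struct

# Hypothetical layout: header = magic(4s) version(u16) chunk_count(u16),
# each chunk = tag(4s) payload_length(u32) followed by the payload bytes.
HEADER = struct.Struct("<4sHH")
CHUNK_HEADER = struct.Struct("<4sI")

def build_file(version, chunks):
    """Serialize (tag, payload) pairs into the made-up format."""
    out = HEADER.pack(b"DEMO", version, len(chunks))
    for tag, payload in chunks:
        out += CHUNK_HEADER.pack(tag, len(payload)) + payload
    return out

def parse_file(data):
    """Walk the header, then each chunk in turn, returning (version, [(tag, payload), ...])."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError("unexpected magic number")
    offset = HEADER.size
    chunks = []
    for _ in range(count):
        tag, length = CHUNK_HEADER.unpack_from(data, offset)
        offset += CHUNK_HEADER.size
        chunks.append((tag, data[offset:offset + length]))
        offset += length  # the length field tells us where the next chunk starts
    return version, chunks

blob = build_file(2, [(b"NAME", b"Sol"), (b"STAR", bytes([1, 2, 3, 4]))])
print(parse_file(blob))
```

When reversing a real file you work the other way round: you guess the `struct` layouts from patterns in a hex dump and keep adjusting them until the parse walks cleanly to the end of the file.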
I did a bunch of this several years ago with a particular game. Knowing various text & numeric values, I tore apart the data file that I knew stored that information. I never decoded all the info in the file, but I got far enough for what I wanted (a map of the locations & types of all the star systems in the game). Based on emails, there have been several people over the years since who have benefitted from my hobby.
Also protocols, been dealing with a serial communications protocol for a laser engraver recently.
@@jursamaj can i ask what game that was?
I reverse engineer files for the fun of it 😄. The latest is the save game format for Trails Of Cold Steel, because I don't want to miss anything. I also made a tool for the Witcher to copy my inventory from one save game to an older save because I did something earlier that broke a later quest. I didn't mind going back to the older save, but I didn't want to lose all the ingredients I had picked in the meantime.
I also reverse engineered the encryption Final Fantasy X uses on its save games because I forgot I had to sponsor the traveling salesman. I changed the amount I had given him and deducted the sum from my GIL. I may be cheating, but at least I'm honest 😋.
I honestly think I've spent as much time having fun with game files as actually playing games.
Nothing multi-player though. I'm not an a-hole.
A very famous, difficult and consequential case of reverse engineering: Alan Turing and his team's cracking of the Enigma machine.
It's hard to think of more important technical problems to solve than that..
Though a lot of the actual reverse engineering of the enigma machine was done by often-overlooked teams in Poland. The work of Turing and his team built heavily on this to try to find ways to exploit flaws in the design.
Not to forget the Germans. If they hadn’t invented Enigma, neither the Polish nor British cryptanalysts could’ve cracked it.
@@Am6-9 not to forget Adam and Eva
People always talk of Turing and forget about the true genius, Marian Rejewski.
Let's not forget the ribs. Eve came from Adam's rib
I'm always amazed and overwhelmed by the sheer number of lines that even the smallest executable has when decompiled to assembly.
Most lines in these cases are boilerplate added by the compiler/standard initialization and finalization/operations and system calls required to match the executable format and platform. Equivalent programs generated from assembly language may have less of these things. GNU programs can have a lot.
@@leogama3422
So you are telling me GNU is bloat?
I am always more scared, what these few lines might cause in the hidden, if i type any "source" datapath of the code.
Probably it will log my try and send via a little cheeky email an attention message to the owner of the software.
Who knows. 😅
@@xrafter Bloated code, not bloated program. The gains from standardization and extra useful features largely compensate the negligible overhead in a compiled C program. If you are concerned of performance at this level, you should go right to assembly solutions (or specialized alternatives).
@@leogama3422
No, I don't have problems with this.
GCC = GNU compiler collection.
"gnireenignE" - Easy.
Not quite right you forgot the
"eliphretupmoC"
Write it with C or Assembly 😝
@@yannoone1150 Maybe I should have put "moderate" instead of 'easy'
Oooh I'm jealous I didn't think of this. Nice!
@@hassanachek All I can do is javascript.
$/~ npm i reverse-string
// Reverse a string
const reverse = require('reverse-string');
reverse("Engineering Computerphile");
🤣
Been Reverse-engineering for over 30 years, no University degree and learned everything from software design patterns to how the SoC bring up is done. You learn the real way it works by reverse-engineering.
I started out with the C-64 and the Super Snapshot cartridge and moved on from there to the latest ARM hardware.
That's awesome man, can you give some pointers for someone like me for getting started in reverse engineering.
Hey, how do I get into reverse engineering(that's not the main part, I want to get into hardware hacking)? will learning a microcontroller help? Which one would you recommend for a beginner? I'm thinking of PIC18
@@swarooprajpurohit110 If you want to learn hardware hacking then learning a microcontroller will help. It's important to learn the basics like hex, dec, binary and how addressing works. Eventually for hardware hacking you need to know how the MMU, interrupt controller, CPU and fabric work on an SoC. It also helps to know how to inspect and manipulate I2C, SPI and other chip-to-chip or die-to-die communication. PIC will give you the basics, but it's best to understand ARM, for example Armv8-A/R/M.
I have actually had to reverse engineer more than one industrial automation system with nothing more than a poorly drawn schematic (which wasn't up to date) and raw code.
First step was identifying all I/O points, so i knew what a particular input or output did. Then based on that you could figure out a lot of the basic logic of the machine. This is a start/stop circuit. This is a closed loop control system. This is a permissive or interlock logic block.
Still takes a lot of time, but it can be very rewarding when you finally get to the point where you can actually maintain and/or improve the machine automation.
So you were able to make changes before the whole program was decompiled?
@@jamesking2439 Ladder logic is not compiled. But yeah, I had to understand the code before I could make any meaningful progress. But you run into a lot of basic sequences that are almost universal.
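One of those near-universal sequences is the start/stop "seal-in" rung with an interlock. Below is a minimal sketch of it as a boolean expression evaluated once per scan; the signal names are invented for the example and are not taken from any particular machine:

```python
def motor_rung(start: bool, stop: bool, interlock_ok: bool, running: bool) -> bool:
    """One scan of a start/stop seal-in rung.

    The output stays on once started (it "seals in" through its own contact)
    until stop is pressed or the interlock/permissive drops out.
    """
    return (start or running) and not stop and interlock_ok

# Simulate a few scans: (start button, stop button, interlock healthy)
scans = [
    (False, False, True),   # idle
    (True,  False, True),   # operator presses start
    (False, False, True),   # start released, motor keeps running
    (False, False, False),  # guard door opens, interlock trips
    (False, False, True),   # interlock restored, motor stays off until restarted
]

running = False
history = []
for start, stop, interlock_ok in scans:
    running = motor_rung(start, stop, interlock_ok, running)
    history.append(running)
print(history)
```

Once you have labelled the I/O points, recognising a rung as "this shape" is most of the work; the same expression turns up with different addresses all over a program.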
Some of my favorite (hardware) reverse-engineering going on in recent times is the work by CuriousMarc and his cohorts at the Computer History Museum, reverse-engineering the computers and radio equipment used in the Apollo missions. Their many-part series can be found here on YouTube.
Nothing's ever a waterfall unless u are doing it wrong. Beautiful
The best example of clean room reverse engineering is GTA4.
One of the best trainer programs was built with the software in the next room. Engineers with gta looked at how certain actions worked in memory and passed their findings to totally separate engineers who worked out how to interrupt and alter what it did.
Rockstar tried to sue them for hacking and theft, but it failed because they hadn't built any software using any of the GTA code.
A whole video on what a space is would educate some people, that's for sure. It's practically impossible to get people to agree on it.
You haven't mentioned reverse engineering to bypass copy protection/licensing.
I've reverse engineered a few small things that i liked but never had the opportunity to license.
In one case, it was as simple as finding the reference to a string that bothered you about purchasing a license and in the jump instruction simply make it so it never performs the jump and proceeds as normal.
In another case i reverse engineered the licensing key algorithm. Then i wrote a keygen for it.
Reverse engineering is a lot of fun, sometimes hair pulling frustrating, but a lot of fun and the moment you make a breakthrough you feel like you are on top of the world.
I remember the first time I changed a JNE to a JE - 0x75 to 0x74 or from *u* to *t* - and saved the binary so that it would now accept any random key. Seeing it work was exhilarating!
More serious software protections and license validators require a lot more than this to break them, and it's rarely that simple.
@@desmond-hawkins how do you change it like that
@@seif1293 You use a disassembler like IDA shown in the video, although IDA is commercial and has only a limited free version. There are free alternatives, Ghidra is probably the most powerful (actually made by the NSA! no joke). Once you find the JNE - meaning Jump if Not Equal, i.e. jump to where it shows the error message if what you entered is not equal to the valid password/license key - the disassembler can show you the exact position in the binary where the JNE is located. Then you use a hex editor to open the file, go to that position, and change it to the binary value for a JE. So in the hex editor you'll see a *u* and you just replace it with a *t* then save, and run it again. If you got it wrong you just inverted some random condition in the code that has nothing to do with what you wanted, and this might have "interesting" side-effects.
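The byte values in this exchange are easy to check: on x86 the short-jump opcodes are 0x75 for JNE and 0x74 for JE, and those happen to be the ASCII codes for "u" and "t". Here is a small sketch of the hex-editor step on a made-up five-byte buffer (not taken from any real program); `patch_byte` is a hypothetical helper:

```python
JNE_SHORT = 0x75  # x86 "jump if not equal", 8-bit offset
JE_SHORT = 0x74   # x86 "jump if equal", 8-bit offset

def patch_byte(image: bytes, offset: int, expected: int, replacement: int) -> bytes:
    """Return a copy of image with one byte changed, refusing if the byte is not what we expect."""
    if image[offset] != expected:
        raise ValueError(f"byte at {offset:#x} is {image[offset]:#04x}, not {expected:#04x}")
    patched = bytearray(image)
    patched[offset] = replacement
    return bytes(patched)

# Toy bytes for illustration: cmp eax, ebx ; jne +7 ; nop
toy = bytes([0x39, 0xD8, 0x75, 0x07, 0x90])
patched = patch_byte(toy, 2, JNE_SHORT, JE_SHORT)

print(chr(JNE_SHORT), chr(JE_SHORT))  # how the two opcodes look in a hex editor's text column
print(toy.hex(" "), "->", patched.hex(" "))
```

The `expected` check is the guard against the failure mode described above: if the offset is wrong, the function refuses instead of silently inverting some unrelated condition.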
I reverse engineered the codes that Lemmings and Lemmings 2 gave out so you could restart at a later level - the programs even accepted codes which were impossible to obtain from the program but were valid as far as the algorithm checking the codes was concerned. (I guess part of the code generated was ignored by the checker but was included to create multiple codes, or may have been used by the game to change the difficulty slightly.)
I am kind of surprised you didn't bring up the example of ReactOS being a reverse engineering of the Windows NT kernel. A truly massive undertaking since the NT kernel is big, complex, and probably bloated.
Already back in the 90s some GPU drivers recognized benchmarking and gave "incredible results". ;-)
With the BIOS Compaq had 2 separate teams, one team analyzing the IBM BIOS and another team implementing their BIOS clone. The only communication between the teams was the documentation the analyzing team created by reverse engineering. No one from the analyzing team was allowed to speak with the implementation team or join it. IBM was never able to sue Compaq for their BIOS because it was a clean room implementation, running on standard hardware everyone can get.
IBM was too slow for the consumer market; they build business machines, even today, and that's part of the story of why the IBM clones took over the market. They were compatible, cheaper, had agreements with Microsoft for MS-DOS and could adapt much faster to newer hardware while IBM worked on the PS/2 machines. I think we can clearly say: We are lucky it went this way!
Except for Bill Gates using the money to tests vaccines on africans.
Season 1 of the TV Series "Halt and Catch Fire" (available on Netflix) shows the reverse-engineering technique explained early in this video. Although not specifically calling out Compaq, it's their story without the name :). Take it with a grain of salt, though. It shows them using a volt meter and oscilloscope to come up with the BIOS code ...but the show is very entertaining overall.
Yes, great series!
@MenaceInc was on Amazon prime. Might still be there.
3:26 Unix executables were originally "just machine code". Later on, when they wanted a format with headers and whatever, they made that format, but didn't actually change the implementation of exec. Instead, the binary started with the machine code to jump past the headers. And that's the story of the a.out magic number.
"leave that as an exercise to the viewer" is the most university professor thing to say ever! I almost got PTSD from such lecture notes or textbooks 😂
The funny thing is, this is 1 to 1 applicable in getting from idea to a product. Having this conversation with managers is important
A video on executable formats would be amazing
definitely do the video on executable formats! Is there a video on how modern computers boot their operating systems? If not, a booting/BIOS video would be very much appreciated.
Good ol' hexadecimal hacking, I love it! 😍 Great explanation of the methodology needed for reverse engineering! 👍
Step 1: Define all the things
Step 2: Figure out the data structures, algorithms, logic, and control that goes into the programs
Step 3: Do C things because C :)
Step 4: Go to 11:06 of the video for an IDA practical example
Step 5: ???
Step 6: PROFIT!1!1!
Reverse engineering is when an engineer looks at a mirror
🤣 that's a good one
I thought it was when an engineer starts working directly in the final product and delivers the broken prototype to the client at the end...
Dr. Steve has been on a roll lately. I've rather enjoyed the past few episodes that he's hosted.
Three comments saying "first"? What a dilemma
Don't even try to understand the "why," it will only depress you.
They think it is a non deterministic problem
It's a race condition problem
It's a racist problem
Trilemma.
I'm a web developer and came across reverse engineering recently. I wanted to learn, but the lack of information online, courses and such is a pain in the ass. I will try to learn it again at the weekend because it seems like a cool thing to do.
9:00 yes, waterfall is doing it wrong :)
And i agree with the "show me your data structures and your program will become obvious" to a large degree. Though a lot of the interesting part of programming is how to model edge-cases in the domain you are working on, and how you structure your code to handle adapting to new functionality or handling of bugs and edge cases. Programs are not (anymore) made as a single release, and then a few years later you get the next major version that is made from the bottom as a different program that may be a newer and better implementation of the same high level ideas and concepts. As systems get larger, maintenance/support and feature expansion time and time between bottom-up full rewrites get longer because the costs become much larger. Also, with agile development mindsets, the focus becomes on continuous development and deployment and delivering incremental value at low incremental risk of things going wrong.
I'm curious as to why that executable contains several Java class files, as shown by the magic number showing up several times.
Oh yes, please do a video on spaces. I never understood why there are line feeds and carriage returns and why those are two different things...
back when typewriters were a thing, line feed went to the next line and carriage return went to the start of the line. you needed both to go to the start of the next line. i don't know what happened when computers were invented, but different OS writers decided they needed different combinations of those to be used
@@jkoh93 Unix and unix-like systems (including macOS) use LF only. Windows (and DOS) use CR LF, in that order. The classic Mac OS and a few others used CR only. And there were even more combinations and other character encodings used in different machines and operating systems.
@@jkoh93 so... LF moves the cursor vertically, CR moves it horizontally? Why do *NIXoids omit the CR? I'm sure one of the Professors has some interesting stories to tell why those things are the way they are, hence the separate video about spaces 🤩.
@@LupinoArts Probably as simple as: we don't want to use two bytes when one is enough to indicate a new line.
@@maxine_q Yup. Pretty much. Now editors and other simple text file readers need to accept the various combinations of line delimiters... or convert from one convention to another. As stated, it all started with mechanical printers that had CR and LF as separate physical motions.
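The "accept the various combinations or convert" part can be sketched in a few lines. The function names are made up for the example, and the detection is deliberately simplistic (it reports the first convention it finds rather than handling mixed files properly):

```python
def detect_line_ending(data: bytes) -> str:
    """Name the line-ending convention used in data (CRLF is checked first, since it contains both)."""
    if b"\r\n" in data:
        return "CRLF"   # DOS / Windows
    if b"\r" in data:
        return "CR"     # classic Mac OS
    if b"\n" in data:
        return "LF"     # Unix and Unix-like systems
    return "none"

def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF. Order matters: handle CRLF before lone CR."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

windows_text = b"first\r\nsecond\r\n"
print(detect_line_ending(windows_text), normalize_to_lf(windows_text))
```

Replacing lone CR first would turn every CRLF into two line feeds, which is why the two replacements are done in that order.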
What a coincidence! I've been into cracking software lately.
Is this the first computerphile video in 50 fps?
Because I noticed and liked it.
I did a bunch of them at 1080p50 at the back end of 2016 but mostly got complaints so switched to 4k instead. Now we have both :)
@@Computerphile Running Fedora 36 Silverblue freshly installed, only this video seems to render with awful stutter, please consider refraining from 50fps, it might affect other linux users.
@@Computerphile Well, you've got at least one vote in favour of the higher frame rate, Sean. Don't forget to read the manual, though. 🙃
@@xDJKerox seems specific to you, so you should probably fix that extremely specific issue rather than asking the world to refrain from 50fps?
@@Computerphile 50fps causes a doubled frame every 6 refreshes of a standard 60hz screen. Some people don't notice, some can't help but notice.
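The arithmetic behind that: 60 refreshes per second against 50 frames per second is a 6:5 ratio, so every 6 refreshes have only 5 distinct frames to show and one has to be repeated. A quick sketch, assuming an idealised display that always shows the latest complete frame (real players and compositors may behave differently):

```python
def frames_shown(fps: int, refresh_hz: int, refreshes: int) -> list:
    """For each screen refresh, return the index of the video frame on screen."""
    return [(refresh * fps) // refresh_hz for refresh in range(refreshes)]

schedule = frames_shown(50, 60, 12)
# Refreshes at which the same frame is shown as on the previous refresh.
repeats = [i for i in range(1, len(schedule)) if schedule[i] == schedule[i - 1]]

print(schedule)
print(repeats)
```

At 60 fps, or on a 50 Hz display, the schedule has no repeats, which is why the stutter only shows up for some setups.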
I do a little reverse engineering. Usually with mobile apps, or web code, to look for vulnerabilities. Found some interesting communications protocol in the 'Wind' scooter app (the older yellow hire scooters), no free rides, but a denial of service if you were careful. Those scooters were in Nottingham, too, the same place these guys come from.
Interesting, could you recomend some techniques / programs you use ? Thanks
@@filip0x0a98 For mobile apps (android only), you can find free online services to both download the .apk files and to de-compile them, but beware of the multiple pop-ups and questionable links. It usually helps to use more than one service, as sometimes one may fail due to code obfuscation attempts.
Web apps are easier, as you can just use the developer console with Chrome browser (once you find a piece of code of interest, you can set breakpoints and examine variables).
Other than that, it's a case of homing in on a piece of code that is of interest (syntax is similar between Java, Javascript and C++), usually something that does network access, or for the scooters, it was the part that communicates over Bluetooth Low Energy. Sometimes you find a howler of a vulnerability, or other times, not so bad.
@@threeMetreJim thanks
I have a lot of experience with reverse engineering web apps. It helps me build stuff when no direct api is provided.
I haven't really done much more than that. Computer science isn't exactly my field.
Have an idea for a similar episode in the future.
Steve should pull out a BBC Micro, C64, ZX Spectrum or any other 8-bit machine having BASIC. Then off camera develop a program which first asks a user to input any number of words. Then the program would calculate their number and display it.
Then show only the running program to the camera, and ask the cameraman to also come up with a BASIC (or any other language both can code) program that does the same. Suppose the BASIC will be different. But there you have it - Reverse Engineering.
Awesome video, many thanks! Can you please make the videos you refer to throughout the video?
My favourite reverse engineering task is to port a program to another programming language. If you have the source code this is a fun task, if you don't have source code then the first step is a pain, trying to disassemble binary into machine instructions and translating from there.
15:33 Library or external function calls and system calls are being conflated.
Typical user code doesn't directly make system calls.
These are implemented as a "magic" instruction (this is all generalized) which allows a userland program to execute a small chunk of kernel code.
By magic instruction I mean "an instruction which causes the kernel to shut up and pay attention", i.e. one that raises an interrupt or trap of some sort. "syscall" is the mnemonic for such an instruction on several different architectures, but even on the same hardware, different operating systems (and even versions of them) can use different instructions. (Nothing is stopping you from saying "A syscall instruction is any instruction which causes a protection fault when writing to a valid syscall number." Provided your kernel can tell that happened and react to it appropriately, the mechanism doesn't matter.)
While the mechanism you use to call them varies, all of them need you to tell the kernel _what_ you want it to do as well as _that_ you want it to do something.
The parameters to the syscalls go in registers (or in a specific region of memory) and include the system call number you want to run, telling the kernel how the arguments should be interpreted.
Once you have set up the parameters, you execute your magic instruction, which interrupts normal execution: the kernel takes over, looks at your request and fills in the reply. Then the process returns from kernel mode and continues.
These operations are often wrapped by system library code, because different systems might use different system calls, but as long as the library call takes the same arguments, your code will still work.
That's why there's a distinction between library code and system calls.
The only other times you need to know what a system call is are if you are reversing a statically linked binary (which puts all the library code it is going to use directly into the program. And it can make them massive) or if you are working with shellcode (like an exploit for a vulnerability might use.)
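One way to see the difference between a library call and the system call underneath it is buffering. In this sketch (Python standing in for C; the file is a throwaway temp file), `os.write` is a thin wrapper that hands the bytes to the operating system straight away, while a buffered file object's `write` is library code that may not make a system call until it is flushed:

```python
import os
import tempfile

def write_sizes():
    """Return the file size after a direct write, after a buffered write, and after flushing."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"raw")               # thin wrapper around the OS write call
        os.close(fd)
        after_os_write = os.path.getsize(path)

        with open(path, "ab") as f:        # buffered file object: library-level code
            f.write(b"buffered")           # lands in a user-space buffer, not yet in the file
            before_flush = os.path.getsize(path)
            f.flush()                      # now the library issues the actual write
            after_flush = os.path.getsize(path)
    finally:
        os.remove(path)
    return after_os_write, before_flush, after_flush

print(write_sizes())
```

On Linux you could watch the same thing with `strace`: the buffered `write` produces no `write` system call until the flush. This is also why a disassembly usually shows calls into library routines rather than raw system calls.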
I'm trying to reverse engineer a Z8002 system ATM, managed to get it disassembled, and can work out some of the routines. I'm not sure if it was written in assembler or C (knowing sometimes helps). Pity Ghidra doesn't support the Z8k.
are you sure? I think I've seen Z8000 files (.cspec, .slaspec, etc). Maybe you just need to customize the implementation a bit.
@@u0000-u2x ah! So Ghidra might work? I had a look at the web site and could only see Z80 options. Thanks, I'll take a closer look 8-)
@@wktodd yeah I was mistaken sorry... indeed the Z80 is there. You could implement a custom processor though. Ghidra is extensible.
Wish I knew how to code.
The concepts just seem like logic.
Thanks
If you’re attempting to disassemble non-x86/64 code using IDA, take a look at the pricing… I bet you’ll be surprised.
To save a lookup, it’s more than $10,000 for the full package, $5,000 for just MIPS and MIPS 64….
Amazing video! Loved this topic! Thanks for sharing that
Surely by now, a machine learning system could be trained on billions of lines of code for various kinds of programs alongside the binaries, and after training be able to take a binary as an input and generate a complete, high-level code that, when compiled, is functionally completely identical to the original binary?
Isn’t thinking of something and trying to figure out how it works (inventing it) basically reverse engineering something that hasn’t been designed yet?
I think this video is more like an introduction to the technical aspect of reverse engineering. Maybe for a "fun little youtube video" it would have been cool to discuss the ethical/legal aspects and what is done in practice to arrive at a successful reverse-engineered solution without being exposed to legal trouble... I worked for a company that provided alternatives to IBM solutions. We made mainframe emulators, we had clean rooms and fun like that! :) We also had employees who were former IBM employees and that made "reverse" engineering (of something they knew exactly how it is coded) even more fun! :)
Might have been good to mention self-modifying code, and the challenge of reverse-engineering that.
Can you do a video on the Learning With Errors problem and why it's supposedly quantum-proof? Thanks.
If the code you are analyzing is written in some high level OOP language, chances are a modern decompiler will give you at least some C equivalent, not just assembly code; in some cases it might be able to give you some snippets in the original programming language, since OOP compilers are pretty deterministic in theory.
And then some languages don't even compile their code, it might not even be obfuscated, so why even bother...
Some guy I know took apart a car insurance app, it was just a web application so everything was in JavaScript; it didn't take a long time to find several vulnerabilities
JavaScript bundles are incredibly easy to reverse engineer. However, you won't be able to do it that easily with most pieces of software.
I didn't come away with a good impression of how good a job reverse engineering can do in an average case. Can it understand the whole program? Does it penetrate one level deeper than assembly and then give up?
Reverse engineering is basically learning how things work.
That is why all the restrictions in licenses that prohibit reverse engineering / de-compilation etc. should be ignored and made irrelevant.
To bar humans from seeking knowledge and understanding is to be anti-competitive and anti-science.
please make a video about what a space is :D
I'll never forgive what Rockstar Games did to the re3/revc project because for them it was a "threat to their economy" while they were still making millions with GTA Online and released GTA The Defective Edition to earn even more money at the cost of quality control.
I still believe a similar project with the same quality will appear at any moment since Vice City is one of my favourite games with a lot of potential beyond its aesthetic or gameplay. I know the re3/revc team didn't follow clean room design to achieve a true RE project, but I think with the current documentation available from the GTAModding wiki and other sources it is possible to achieve the same goal without the need for Ghidra or a team to document the decompilation.
As for me, I wanna get into RE but since I'm a PHP developer I have a long road in front of me to achieve something, so I wanna start with something like learning C and decompiling other games.
Great video as always!
Great content. And the endless paper.. great memories :) By the way, are you also doing RE of 32/64 intel on your m1? If so, how? Thank you.
i hoped he would talk more about how "disassembled" bytecode differs from machine code, or that you can also reverse assembler code back to C/C++
Disassembled code is machine code (binary) converted back to human-readable assembly language (text). They usually have a one-to-one correspondence (each binary opcode is generated by a unique combination of assembly instruction and operands), so disassembling is 100% deterministic.
There are some tools that try to convert assembly to C (or at least to a mix of C code blocks and assembly instructions), but their efficacy is limited and the generated code may be as hard to read as the original assembly.
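To make the "machine code back to text" step concrete, here is a minimal table-driven disassembler sketch. The instruction set is invented for illustration (the opcodes and mnemonics below belong to no real CPU); the point is only that decoding is a deterministic table lookup once you know where to start.

```python
# Invented instruction set: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0x01: ("NOP", 0),
    0x02: ("PUSH", 1),
    0x03: ("ADD", 0),
    0x04: ("JMP", 1),
}


def disassemble(code, start=0):
    """Decode bytes into text, one instruction at a time, from offset `start`."""
    lines = []
    pc = start
    while pc < len(code):
        mnemonic, n_operands = OPCODES.get(code[pc], (None, 0))
        if mnemonic is None:
            # Unknown opcode: emit the raw byte and move on.
            lines.append(f"{pc:04x}: .byte 0x{code[pc]:02x}")
            pc += 1
            continue
        operands = code[pc + 1 : pc + 1 + n_operands]
        text = " ".join([mnemonic] + [f"0x{b:02x}" for b in operands])
        lines.append(f"{pc:04x}: {text}")
        pc += 1 + n_operands
    return lines


program = bytes([0x02, 0x03, 0x02, 0x04, 0x03])  # PUSH 3; PUSH 4; ADD
for line in disassemble(program):
    print(line)
```

With variable-length instructions like these, the same bytes decode differently from a different starting offset: `disassemble(program, start=1)` reads the operand byte `0x03` as an `ADD`. Going from this text to C is the genuinely hard part.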
What about reversing into a pseudo-C like Ghidra? I've heard of it a bit, but not a lot of info about it
Do a video about reverse engineering of Diablo I
So if someone makes their own offline launcher for Adobe and removes the subscription DRM, is that not reverse engineering? Could they not legally sell that then? If not, why not? And how then?
that was a brilliant video !
surprised the word *debugger* wasn't mentioned
When does their stock of old printer paper run out?
moral of the story: document your ABI as well (I never do either). 'But implementation details are supposed to be hidden' is a lie, just one of the many lies we tell ourselves to make today's work 'more efficient', when in a number of years some poor sod will be going through your module's disassembly line by line. Although it's surprisingly fun to do, piecing such things together.
Where can I buy a shirt like yours? Love it!
Can it help to find who hacked your phone, or who cloned your phone or PC?
I wanna reverse engineer a script in Lua and I don't know how, can you please explain? It's for an exploit in a game, I wanna see how it works so I can make my own
13:35 x86 machine code does have loops.
Unless you're referring to the rep/repe/repne prefixes, not directly. It's easy enough to implement a loop by a jump instruction that points to earlier code, but even then, the idea of a loop "emerges" from the behavior of the code; there's no direct "loop" instruction. Indeed, you can just as easily jump backwards without it being a loop.
@@quantumdude836 Then look what the opcodes E0 to E2 do ;) They are the loop instructions. They implicitly use the CX register to count to zero, if I remember correctly.
@ huh, I completely forgot about those
@@quantumdude836 No problem, modern x86 has so many instructions. I tend to remember the high level ones, because I'm impressed how high level they are. call, ret, loop, looks like concepts from regular languages. You can even say something like ret 24 to automatically clean the stack when returning.
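For anyone who hasn't met it, the behaviour of the plain `loop` instruction (opcode E2) can be sketched in a few lines. This is only a model of the semantics with a 16-bit CX counter, not an emulator; `run_loop` and `body` are made-up names.

```python
def run_loop(count, body):
    """Mimic `label: <body> / loop label` with a 16-bit CX counter."""
    cx = count & 0xFFFF
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF  # LOOP decrements the counter first...
        if cx == 0:             # ...and falls through once it reaches zero
            break
        # otherwise: jump back to the label, i.e. go round the while again
    return iterations


print(run_loop(3, lambda: None))
```

Because the decrement happens before the test, starting with CX = 0 wraps around and runs the body 65536 times, which is the classic gotcha with this instruction. Underneath it is still "decrement, compare, jump backwards", so both views in this thread hold.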
LOVE you guys!
Brilliant video!
What if I am hiding an EXE in a picture? How would I go about finding such information? And how would I reverse engineer such a file?
Kind of left out “vulnerability research” as a reason to reverse engineer.
Summarised it as "cyber security", but it was in there.
wait why can't we take the assembly code and pass it to a "decompiler" and generate C code from that? that'd be much easier. Like yk compilers can generate assembly from C, why can't you write software that does the exact opposite process that a compiler does and generate C from the asm?
You can write software that decompiles a binary to C; Ghidra basically does that for you. The problem is that while the output might be valid code, it will be very different from, and much harder to read than, the original source code. This is because the compiler, among other things, does a lot of optimizations which can drastically change the structure of the program. That being said the decompiled code is still very useful for figuring out generally what the program is doing.
Because there is no one solution to solving problems. Software can be written in unlimited ways. That's the creative process.
Imagine you have to duplicate a painting you've never seen, but you know what colors & paper it was made with.
@@crunchyplasma1876 well we know what optimizations it does right? (idk something like loop unrolling, function inlining and stuff? lol idk I've never studied compilers) so can't we just reverse that when we see stuff that looks similar? Like you know how compilers have so many rules on how to parse human written C, can we reverse those rules and maybe add new rules in order to generate C that can be easily read by humans?
"That being said the decompiled code is still very useful for figuring out generally what the program is doing."
oh that's good!
@@mastershooter64 You'll never get the names of functions or variables back. Now imagine you have the source code of a program but all the functions and variable names are just labeled a, b, c, and so on. Custom data types are also completely gone.
@@mastershooter64 fundamentally, different C source files could compile to the exact same assembly. that information in the source is simply lost. it's like trying to reconstruct a word from only the first letter.
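A loose analogy for the "different sources, same output" point, using CPython bytecode rather than C and assembly (so take it as an illustration, not a claim about any C compiler). The two functions below are made-up examples with different names, comments and formatting.

```python
def total_price(unit_price, quantity):
    # Descriptive names and a comment, written for the human reader.
    return unit_price * quantity


def f(a, b):
    return (a * b)


# The instruction bytes refer to local variables by slot index, not by name,
# so neither the names nor the comment nor the parentheses leave a trace here.
same_code = total_price.__code__.co_code == f.__code__.co_code
print(same_code)
```

Python happens to keep the variable names in a separate table (`co_varnames`), so they can still be recovered there. A stripped native binary usually has no such table, and that is the part a decompiler cannot bring back.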
Reverse engineering: some video cards had the string "Compatibility require the string (c) IBM here"
"Nothing is ever a waterfall in computer science, unless you're doing it wrong." Casually roasting one of the standard software development models 😂 But I get his point, the waterfall model is just an oversimplified, idealized concept of how software development could go, which in theory looks nice, but practically never matches reality.
Pretty sure step 1 should be to write tests that define the behaviour for the original system. Once you have those then you can start thinking about writing your own program to pass the tests.
how do you guarantee your tests cover all the capabilities of the original program?
@@dayansiddiqui4426 there are no guarantees in life. though i would be confident in saying that if you don't write regression tests and then do a big refactor/rewrite you'll be far worse off
Can an AI reverse engineer a binary? Is there any example?
If you're a junior dev working in an established / legacy code base, a big part of your job is reverse engineering.
"Yeah some guy who used to work here wrote that in Perl back in 1998. Nobody has any clue how it works but it's a vital component of our payroll system and it's broken."
Time to develop a reverse engineer coding AI - that learns the function, inputs and outputs of the target software and then writes the code for the operating environment of choice.
Easier said than done
you don't need ai for that, it's called a decompiler lol
Thing is that one piece of bytecode can be decompiled in different ways, depending on context and where the process started
@@bosch5303 Yes - the AI - which understands the overall purpose or function of the app - based on its learning / 'hands on' experience could work around these issues by either recoding the entire app or parts that can't be meaningfully decompiled.
I suppose with more thought - it might be quicker and easier to just get the AI to rewrite the entire app.
@@RickOShay orrrr anybody with more than a braincell will know what a program is doing by looking at system calls and strings?
Could deep learning improve reverse engineering to get a high-level output?
Yep. None of the widely used disassemblers/decompilers do that yet, but there's active research in this area
Please create a video about ReactOS, an open source Windows clone.
A space is anything that matches a \s in a Perl regex :)
And how is the check for \s implemented in the code implementation of Perl regex?! 😂😂
@@hbm293 historically: space, tab, newline, carriage return, form feed. Since Perl 5.18, also vertical tab. With Unicode rules, anything with the White_Space property.
Where can I get IDA? I can't find it in the Ubuntu repo.
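Not Perl, but Python's `re` module makes the same ASCII-versus-Unicode split easy to poke at, so here is a rough analogue. The helper name `is_space` is made up.

```python
import re


def is_space(ch, ascii_only=False):
    """True if the single character `ch` matches \\s in Python's re module."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\s", ch, flags) is not None


# The six ASCII whitespace characters match in either mode.
print(all(is_space(ch, ascii_only=True) for ch in " \t\n\r\f\v"))

# A no-break space (U+00A0) only counts when Unicode matching is on,
# which is the default for str patterns.
print(is_space("\u00a0"), is_space("\u00a0", ascii_only=True))
```

So even "what is a space" depends on which mode the regex engine is in.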
It's commercial closed source, so you wouldn't.
We really need a law requiring software vendors to distribute full, annotated Source Code with everything they sell. If everybody had to do it, nobody could get away with plagiarism.
Every time I watch these videos I am like "oh yeah, this one makes sense... this one I get..." and then the hex editor opens ._.
Video about executable pls
11:34 "CAFEBABE" 😂
The Java class file magic number is, indeed, CAFEBABE. Someone thought they were very funny.
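Checking a magic number is about the first thing to do with an unknown binary, and it is only a few lines. A minimal sketch, with a deliberately tiny table; the function name `identify` is made up, and real tools such as `file` know far more signatures.

```python
# A few well-known leading byte signatures. Deliberately incomplete.
MAGIC_NUMBERS = {
    bytes.fromhex("CAFEBABE"): "Java class file",  # Mach-O universal binaries share these bytes
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/Windows executable",
    b"\x89PNG": "PNG image",
}


def identify(data):
    """Guess a file type from its first bytes."""
    for magic, kind in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return kind
    return "unknown"


print(identify(bytes.fromhex("CAFEBABE0000003D")))
```

To use it on a real file, read the first few bytes with `open(path, "rb").read(8)` and pass them in.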
If you are lucky things can be much easier... if you find a decompiler that works on that specific code.
The video looks choppy, it's not filmed for 60hz screens
I reverse-Engineer games all the time, to develop hacks for them. :)
Same for bypassing anti-cheats.
asset ripping/modding/porting for me
studying their code is also just a fun learning exercise, too
fun stuff
Same mate👍🏼
Reverse Engineering is also called dismantling~
I almost want to ask for someone to cover ECMA-48
Well done.
08:06 (referring to C) and then that high-level language...
@hk C was never a high level language. Higher than assembler. Perhaps an intermediate language at best. Just about any other language is higher.
@@billr3053 Babbage, the high level assembler for the GEC 4000, is definitely lower than C.
Ah yes, reverse engineering. A must know for game modders
But there is a way to get pseudo source code from a program; it's called a decompiler. IDA can also do it but it requires an addon that costs several thousands of dollars; Ghidra can do it for free.
And in the case of the binary he hexdumped, Fernflower (or any other Java decompiler) can be used, as the first four bytes are "CAFEBABE" in hex, which is the magic number for Java class bytecode.
free but worse
@@isse6790 if it remains as popular, it can only get better as more developers start contributing to it. seen a lot of people get by just fine with it though. the main issue honestly is the NSA developed it so it's hard to trust given their history of hacks/malware "to catch terrorists". people can always pirate IDA, too if they really just MUST use it, specifically.
@@ETXAlienRobot201 It's been open source for several years now and is still much slower and worse than IDA.
@@isse6790 can just blame java on the slowness, it's easiest.
decompiling is really difficult, so making something comparable to a specific software product/company that effectively has a monopoly on this would take time. not like IDA is sharing the fruits of their labor, are they?