@@houssem009 If I'm understanding it correctly, he was gaining access to the shell (hence why it said entering debug mode), but since the program was crashing, the shell would close before a would-be hacker could do anything. Adding the parentheses and semicolon bypasses that. I'm not too familiar with Python/Linux, but it reminds me of how the system("pause"); command in C++ keeps the terminal window open and waits for user input.
Another vuln is that your password check function assumes it is a character, and doesn't expect anything else. This will cause a vulnerability. People write code assuming the user will do as intended. If the user writes anything else that is not intended, have a check for that; if the user writes something larger than what is allocated, expect that, and have a response for it.
Great video but what I don't get is how the buffer overflows. Why does it keep loading your input into the memory if there is not enough space for it in the buffer you've declared? Is it strictly because of how the gets() function works?
Let's say you're given some input. You declare an array and use the gets() function to read a line of that input into it. At a basic level, C is a very simple language. It only knows exactly what you have told it, and gets() is only told where the array starts, not where it ends. What gets() does is keep reading bytes from standard input and writing them into the array until it reaches a special character (a newline, or end-of-file) that marks the end of the line. Because you didn't tell the function how big the destination buffer is (nor is there a way to), it will keep writing past the end of the buffer for as long as input keeps arriving without that character. This is why gets() is considered insecure/unsafe, and why, as he mentioned, the compiler warns you not to use it. The recommended alternative is fgets(), which takes a "read at most this many characters" argument. You would set this to the size of your buffer so that if it never encounters the end-of-line character, it stops before writing data past the buffer.
The 64 in "char password[64]" is only there for C to know how many bytes to subtract from the hardware stack pointer. Once compiled, your program has no idea how big your array actually is. In fact, the only information the CPU has about your array is where it begins. The data type is only "known" in the sense that any index into the array is auto-multiplied by the size of each entry at compile time, without you having to write it as such. Now, as for gets(), it takes the beginning of your array as an argument, takes the user input, and copies it verbatim to your array. At no point is there any bounds check because as I said, that info is lost.
C is a very low-level language. A 64-byte array is simply a place in memory where 64 bytes are reserved. There are no checks to ensure you don't write outside that reserved area. You're expected to do the checks yourself in whatever way is most appropriate for your program. If you write past the end of those 64 reserved bytes in memory, you start overwriting stuff you weren't meant to overwrite. It could overwrite something that decides what executes next (and now whatever you wrote to memory might get executed as code), or it might get lucky (or unlucky, depending on your perspective) and not overwrite anything important, so nothing bad happens and no one ever finds the issue.
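To make the gets() vs fgets() difference above concrete, here is a minimal Python sketch (a toy model, not real libc code). It assumes a made-up layout where a 64-byte buffer is followed directly by four bytes that must not be touched; `fresh_memory`, `unbounded_write` and `bounded_write` are hypothetical helper names, and the newline handling of the real fgets() is left out.

```python
# Toy model: a 64-byte buffer followed directly by 4 bytes we must not touch.
# The layout and the helper names are illustrative assumptions.
BUF_SIZE = 64


def fresh_memory() -> bytearray:
    """64 zeroed buffer bytes, then a 4-byte neighbouring value."""
    return bytearray(BUF_SIZE) + bytearray(b"SAFE")


def unbounded_write(memory: bytearray, line: bytes) -> None:
    """Like gets(): copy every input byte plus a NUL; the buffer size is never consulted."""
    for i, byte in enumerate(line + b"\x00"):
        memory[i] = byte


def bounded_write(memory: bytearray, line: bytes, size: int) -> None:
    """Like fgets(buf, size, stdin): keep at most size - 1 bytes, then a NUL."""
    for i, byte in enumerate(line[:size - 1] + b"\x00"):
        memory[i] = byte


if __name__ == "__main__":
    overflowed = fresh_memory()
    unbounded_write(overflowed, b"A" * 66)
    print(bytes(overflowed[BUF_SIZE:]))  # the neighbour has been clobbered

    protected = fresh_memory()
    bounded_write(protected, b"A" * 66, BUF_SIZE)
    print(bytes(protected[BUF_SIZE:]))   # the neighbour is intact
```

The only difference between the two helpers is that one of them is told the size and truncates; nothing about the buffer itself stops the first one.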
Hey, I had a question: at 5:48 you said that we had to reverse the order of the hex sequence since Intel is 'little endian', so why wasn't the hex sequence reversed in the compiled binary at 3:28?
The hex number 0x8049296 is stored as 96 92 04 08 in computer memory. But when a tool displays it as a hex number, it prints the most significant byte first (big-endian order) because that's how we read numbers. It's a bit hard to explain.
At 3:28, a tool is being used to display the binary, and it prints addresses in "human readable" form -- i.e. not just hex bytes, but the actual whole hex number in a way that makes sense to humans (like, so that 1000 is bigger than 0010).
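The two byte orders in the replies above can be checked with Python's standard `struct` module. This is just a small sketch using the address quoted in this thread; nothing here is specific to the video's binary beyond that number.

```python
import struct

address = 0x08049296  # the debug() address quoted in this thread

little = struct.pack("<I", address)  # byte order as stored in x86 memory
big = struct.pack(">I", address)     # byte order as a human reads the number

print(little.hex(" "))  # 96 92 04 08
print(big.hex(" "))     # 08 04 92 96

# Reading the little-endian bytes back gives the same number again.
assert int.from_bytes(little, "little") == address
```

So the disassembler and the payload describe the same value; only the payload has to spell it out byte by byte in memory order.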
Yes, but it's still going to be relative to main. Now, you still need its absolute address to perform this hack, since RET takes the top of the stack verbatim as the destination.
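The "relative to main" idea can be sketched as plain arithmetic. Everything numeric below is hypothetical (the static offsets and the leaked address are invented for illustration), and `rebase` is a made-up helper name; the point is only that one leaked runtime address plus the fixed distances inside the binary is enough to recover an absolute address.

```python
def rebase(static_target: int, static_known: int, leaked_known: int) -> int:
    """Runtime address of a target symbol, given one leaked runtime address."""
    slide = leaked_known - static_known  # how far the loader moved the whole image
    return static_target + slide


# Hypothetical static offsets, as a disassembler might list them for a PIE build.
STATIC_MAIN = 0x11E9
STATIC_DEBUG = 0x11B6

# Hypothetical address of main leaked from one particular run.
leaked_main = 0x565561E9

print(hex(rebase(STATIC_DEBUG, STATIC_MAIN, leaked_main)))  # 0x565561b6
```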
Because in this simple example you have the source and access to the binary you're trying to exploit, compiled with symbol information, and branching there is what you want it to do. More often, with something like this example, you'd end up digging through the assembly for somewhere useful to return to, or, if you find a way to predict the buffer address and there's no data execution protection, you could supply your own code.
Real hacking is a lot of hoping and guessing. With experience you can recognise common patterns and know where to guess first, but it often just comes down to persistence and chance.
Modern (post-90s) C compilers have flags that protect function return addresses (stack canaries, a.k.a. the stack protector). In fact, trying this on gcc version 13.2.1 with NO compiler flags gives: *** stack smashing detected ***: terminated rather than a segfault (perhaps my CPU just has better CFI?).
1. Finds the address of the debug function.
2. Reaches the debug function with the overflowing byte string, where the overflow is the debug function's address. (Piping from exploit.py to hacked.c.) This happens at gets(password).
3. Uses the file descriptor command to gain access to the system function, which gives access to the shell. (Where the password is.)
4. Cat password.
Where in the gets function is the debug function executed? In a catch block while printing?
It's not. It's using the CPU's return instruction, which is a computed GOTO to whatever is on top of the stack. gets() is what allows us to change the value of the top of the stack which we couldn't really do normally
Hey! I am loving the recent content but have to ask a question: aren't operating systems increasingly protective against buffer overflows? Like, modern Windows versions make it near impossible from what I have heard.
It's much rarer to see one quite this simple show up anymore but the basic principle still applies. Anytime data can be affected that shouldn't be, it's a potential inroad to a working exploit
I gave an example in a response below, but anytime you have user-supplied data stored before something they shouldn't be able to influence, bounds checking becomes important. They could overwrite other information your process uses that the OS or hardware don't know you care about:
bool is_root_user = some_check();
char user_supplied[8];
gets(user_supplied);
Supplying "AAAAAAAA\x01" could be a problem on little endian hardware, or even 'AAAAAAAA'+(anything non-null), depending on how the bool is evaluated later in the machine code. You can still smash all sorts of things it's not specifically trying to prevent.
To an extent, yes. For example, one common protection is making data regions such as the stack non-executable. This makes buffer overflows where malicious code is injected into the buffer impossible. Generally those work by filling the buffer with instructions and NOPs, then overrunning it and jumping back into the buffer to execute arbitrary code. Those, you can't do. But the ones shown in the video are still possible. No malicious code is executed - we're jumping to a function that already exists! We can somewhat prevent these types of exploits by shuffling memory around from run to run. But while the memory may be offset, it will never be rearranged. The structure is set at compile time. So, we can still trial-and-error our way into an exploit.
Yes, there are a few mechanisms that protect the computer:
* Randomization of where the program resides in memory. The exploit in the video requires the hacker to know the actual memory address of debug, which can be anywhere even if it's always at the same position relative to main.
* Stack canary: when a function is called, the compiler adds a hidden variable to the stack. It is checked before returning, and if it has changed, the program is aborted instead of returning.
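A stack canary can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the frame layout (buffer, then canary, then return address), the fixed example canary, the placeholder return address and the names `run_frame` and `StackSmashingDetected` are all invented for illustration. Real canaries are chosen at runtime and real frames contain more than this.

```python
import struct


class StackSmashingDetected(Exception):
    """Raised when the canary no longer holds its original value."""


def run_frame(user_input: bytes, return_address: int, canary: bytes,
              buf_size: int = 64) -> int:
    """Copy input into a toy frame, check the canary, then 'return'."""
    # Frame layout from low to high addresses: [buffer][canary][return address]
    frame = (bytearray(buf_size) + bytearray(canary)
             + bytearray(struct.pack("<I", return_address)))
    if len(user_input) > len(frame):
        raise ValueError("input larger than this toy frame")
    frame[:len(user_input)] = user_input  # unchecked copy, like gets()
    if bytes(frame[buf_size:buf_size + len(canary)]) != canary:
        raise StackSmashingDetected("*** stack smashing detected ***")
    (target,) = struct.unpack("<I", frame[-4:])
    return target


if __name__ == "__main__":
    canary = b"\x00\x5a\xc3\x7e"  # fixed here only to keep the demo repeatable
    ret = 0x080492F0              # hypothetical "return to main" address
    print(hex(run_frame(b"hunter2", ret, canary)))
    try:
        run_frame(b"A" * 72, ret, canary)
    except StackSmashingDetected as err:
        print(err)
```

The same model also shows why a leaked canary defeats the check: an input of 64 filler bytes, then the correct canary bytes, then a new address passes the comparison and still changes where the function returns.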
@@LowLevelTV can you put a readme with a quick guide on disabling protections? (I think it would be an interesting topic for a video) Anyway ty for the good content ^^
I don't think that C++ uses gets(), but arrays still forget their element count when you pass a pointer to them as a function argument. I think so, anyway.
You don't use null-terminated strings in C++. You don't pass raw pointers in C++ either. std::string_view holds the pointer and the length of the string. So in C++ you never iterate past the end of the buffer unless the buffer length variable is corrupted, which is unavoidable in any language. Also, I think a huge mistake in programs is storing the raw password in memory or in files. C++ is also largely backwards compatible with C, so you CAN use null-terminated strings, raw pointers, or even gets(), but that is just the programmer not utilising C++ correctly (it does let you write your own containers easily, though). Also, in C++ you would not use char buffer[64]. You would use std::array, and I am pretty sure std::cin with std::array would not produce a buffer overflow.
Why wouldn't your code be compiled as a PIE with ASLR and SSP? Buffer overflows are still a major problem, but deferring address resolution until runtime, with a canary to pick up accidental overruns, would lessen the number of people that could feasibly attack the process.
Because this is an illustrative example of the basic concepts. There has been tons of research put into predicting canaries and bypassing DEP, and you can always overrun things like heap buffers or buffers that are in memory before other important variables. If your memory looks like:
bool has_root;
char user_input[8];
and you don't do proper bounds checking, you could still have a very bad day. You can't rely on the compiler and hardware and ignore fundamental security concepts.
Hello, there's something I don't understand in the video in the python script at the end: in the payload we do an '\x08\x04\x92\x96'[::-1] but it's not valid in little endian because the payload comes out this way: 69x\29x\40x\80x\ and the \x is behind the shellcode bytes.
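A quick way to check the question above in an interpreter, as a small sketch: inside a normal (non-raw) literal, each `\xNN` escape is already a single byte before `[::-1]` runs, so the slice reverses four bytes, not sixteen characters. The raw string is included only to show where the `69x\29x...` result would come from.

```python
addr = b"\x08\x04\x92\x96"   # four bytes: each \xNN escape is ONE byte
flipped = addr[::-1]         # reverses the order of those four bytes

print(len(addr))             # 4
print(flipped.hex(" "))      # 96 92 04 08

# Only a raw string keeps the backslashes and 'x' characters as text.
as_text = r"\x08\x04\x92\x96"
print(len(as_text))          # 16
print(as_text[::-1])         # 69x\29x\40x\80x\
```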
Cybersecurity noob here; I have a question. You only knew the address of debug() because you had access to the compiled executable. How would someone hack in over the net not knowing the address of the function?
I hope not... I have high hopes that it was made as perfect as possible by Microsoft, making sure that the mistakes by the past programming languages will not get repeated 😌
No. It (presumably) has other weaknesses but it's much MUCH harder to screw yourself over with memory issues. Being a managed language means it incorporates checks and safeties into the memory access code, so you can't just write past the end of an allocated buffer. C expects you to create your own checks and safeties, which is impossible to do reliably and is why we have higher-level languages like C#.
@@harissalam1421 It's not drivel at all. 70% of both Microsoft's and Google's security bugs have been memory-related. That tells me that no matter how hard you try, it's impossible to be absolutely certain you haven't introduced memory bugs. Probably the really good programmers have very low incidence of memory issues, but the evidence suggests that no matter how good you are, you'll screw up eventually.
Security on computers is just like security anywhere else: it's all about how much security you need. For a lot of software, this exploit (except for the fact that it causes a crash when the user inputs a password that is too long) is not something you need to worry about. For the same reason, a padlock is fine to use on your bike. It's hard enough to discourage people from even bothering, because the benefit of breaking in is not worth the hassle of figuring out how. In my experience, buffer overflow attacks are (for any meaningful gains) usually way too hard to be a worthwhile endeavor (aside from possibly causing software to crash and using that to take a service down).
They're hard because the hardware was built to make them harder. I imagine in the days of MS-DOS they were piss easy (no ASLR, no protected regions, no segfaults) you could just JMP anywhere you wanted
For a school exercise I am trying to capture a flag by reading a non-public file inside a DB. The request handler is provided and the password uses a buffer of size 72. Whenever I try a bucrash I get this error: ERROR InvalidStackCanary ERROR SubProcessCrashed
Yep. Buffer overflow vulnerabilities can occur in almost any hardware. They're one of the most basic forms of vulnerability and have existed since the beginning of computers. Even if the code is stored in separate memory from the main RAM, there's still the potential to trigger different behaviour by changing the state of the main RAM.
Don't forget that main was called by your OS. And since main calls another function, main has to store its link register value somewhere so that when "bl gets" is run, the link register isn't clobbered. And where does LR get stored? On the stack.
Theoretically it shouldn't overwrite the return address, since the stack grows in the opposite direction from where the return address is stored. Could someone explain why this happens?
The stack grows that way when you add things to it with PUSH, but here gets() is copying the input into the buffer byte by byte (much like a REP MOVSB copy on x86) with the stack as the destination. The CPU isn't using PUSH or POP to access the stack at this point, so the buffer is written from low addresses to high ones, like any heap-allocated array would be.
@@williamdrum9899 So the byte at index 0 of an array is the last allocated byte on the stack? And the video talks about a stack attack, not the heap?
What function should we use instead of read() to avoid a buffer overflow in this situation? On Windows we have scanf_s; do we just use scanf with %[num]s, or does some other "safer" function exist on Unix?
You could use "fgets". The following should work, but not tested:
char myString[30];
fgets(myString, sizeof(myString), stdin);
Edit: However... usually for C programs I've been in the habit of recommending Rust since C doesn't have as many safeguards as C++ nowadays.
@@wrnlb666 Personally, the things keeping me from Rust are the insane levels of abstraction used, the insane overhead (i.e. overuse of macros), and the long compile times. The memory management I found quite easy since a lot of tutorials are available on YouTube, but regardless, I wouldn't recommend it if you already know safe modern C++. It's a lot easier and more straightforward to implement this feature in C++ with std::cin, std::string, and getline.
The reason gets() is unsafe is because it neither knows nor cares how big your array is. It'll blindly write user input to memory until the user is done. fgets takes the count parameter and won't write past that
I am not even remotely a coder, but I did write a horribly nested if-then piece of code in Lua in Minecraft with the ComputerCraft mod, and I have been wanting to make a mostly secure password program from scratch for myself there, too. I wonder if I can break that like this too ...
Assuming your program was some remote server, how would you get the address from it? Is that just based on hoping the program is the public build artifact?
Well, that's actually where the difficulty of exploiting remote software comes from. Usually what you would try to do is play with it and fuzz it until you leak a program's name, a version, an OS version and so on. If you know what program is running and what version, you can just download it and test on your side while emulating the same environment as your target. You could decompile it and check what interesting functions are at what memory offset.
If your vulnerable program had the +s (setuid) permission, it could be used for privilege escalation. Please make this remark in your next video, since it is important.
If you have access to the compiled binary, can't you just edit it and write a call to the debug function at the start of the program? I can't see why you would need to write a Python exploit and overflow the buffer.
Then you're just hacking your own computer. An actual hacker is interested in hacking someone else's. In theory this program is being hosted on a different computer, the python script just tells you what to write to gain access to the debug console.
When I try this in my WSL2 Ubuntu installation, I successfully build the program using the build.sh file and cause the segfault with a big input, but it doesn't say "core dumped" and the dmesg command doesn't say anything about a segfault. Is there some other stuff I need to do?
In order to let the function return to the correct place in the code when it finishes, a function's return address is pushed onto the stack when it is called. The stack is the same place that local variables go, so in a way the return address is like the "0th" local variable that a function has. Add that to the fact that new items go on the stack at lower addresses (the bottom of the stack is at its highest address), and that means that the local variables of a function will come somewhere in memory before -- and close to -- their return address. Thus, when buf is overflowed, it at some point overwrites the return address and once that is read by the CPU it just blindly follows it to somewhere it's not meant to go.
In order for the CPU to know where to return after a function call, a pointer to the line of code after the call is stored, either in a special register or in this case on the stack. The CPU doesn't know or care if this return address is correct. The CPU's return instruction sets the instruction pointer equal to this return address (effectively a goto that ignores scope), even if the address was altered during the function. C is smart enough not to store local variables over the return address, but arrays... not so much.
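The layout described in the two replies above can be simulated in Python. This is a toy model, not real x86: the 128-byte stack, the placeholder return address and the names `ToyStack` and `call_and_return` are assumptions for illustration, and a real frame usually also holds a saved frame pointer and padding, so the real distance to the return address is often a bit more than the buffer size.

```python
import struct


class ToyStack:
    """A downward-growing stack in a flat byte array (toy model)."""

    def __init__(self, size: int = 128):
        self.mem = bytearray(size)
        self.sp = size  # the stack pointer starts at the highest address

    def push32(self, value: int) -> None:
        self.sp -= 4  # pushing moves the stack pointer DOWN
        self.mem[self.sp:self.sp + 4] = struct.pack("<I", value)

    def pop32(self) -> int:
        (value,) = struct.unpack("<I", self.mem[self.sp:self.sp + 4])
        self.sp += 4
        return value


def call_and_return(user_input: bytes, return_address: int,
                    buf_size: int = 64) -> int:
    """Simulate CALL, an unchecked copy into a local buffer, then RET."""
    stack = ToyStack()
    stack.push32(return_address)   # CALL: the return address goes on the stack
    stack.sp -= buf_size           # prologue: reserve the local buffer below it
    buf = stack.sp                 # index 0 of the buffer is the LOWEST address
    for i, byte in enumerate(user_input):
        stack.mem[buf + i] = byte  # the copy walks UP, toward the return address
    stack.sp += buf_size           # epilogue: release the locals
    return stack.pop32()           # RET: jump to whatever is now on top


if __name__ == "__main__":
    ret = 0x080492F0  # hypothetical address of the instruction after the call
    print(hex(call_and_return(b"hunter2", ret)))   # returns where it should
    print(hex(call_and_return(b"A" * 68, ret)))    # 0x41414141
```

The stack grows down, but array indices grow up, which is why input that runs past the end of the buffer lands on the return address that was pushed earlier.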
I appreciate this video, however can you do an example that might be more subtle? Just something that doesn't involve gets or another already known red-flag function.
I think we should just leave memory-unsafe languages in the past. Having possible buffer overruns, bad pointers, etc. is just too dangerous in today's world. And they're easily avoidable by using a memory-safe language.
@@williamdrum9899 Bad pointers and memory bugs often come from string inputs, but not all of them. Many come from offset miscalculations, bad array indexes, use after free, double free, the list goes on. These are all bugs resulting from manual management of pointers. Especially in electronics like factory robots, these kinds of bugs can be catastrophic. So no, there is no argument for continuing to create new commercial products in C/C++ besides things like missing tools/game engines.
@@williamdrum9899 Rust is as fast as C and C++, which is the reason it is so appreciated by the community. It produces memory-safe, easy-to-read, high-level-feeling, blazingly fast code.
@@redcrafterlppa303 That's fair. I guess C is like the QWERTY Keyboard of programming languages: its purpose is no longer needed but we keep using it anyway despite that, because so many people know it and it usually gets the job done
Go check out my Medium post on the same topic: medium.com/@lowlevellearning/your-first-buffer-overflow-b44e5ba5598a
This is cool and all but won't allow you to hack any websites that you don't have physical access to.
In the article, you said this: "0x41414141 is the hex representation of the string “AAAA”, which mean’s that our string of As overflowed into the return address and the CPU tried to execute it."
What rule or mechanism causes the return address to be placed in memory immediately after the 64 bytes of storage that were allocated for the password?
@@MatkatMusic _"What rule or mechanism causes the return address to be placed in memory immediately after the 64 bytes of storage that were allocated for the password?"_ It is called a stack, and it isn't after a return address, it is immediately before the string "AAAA". And it isn't the password it is the return address of the subroutine where the password is displayed on the screen.
@@MatkatMusic That's just how the CPU works (in this case x86). It's done to allow functions to be nested arbitrarily (otherwise, assigning a variable to the return value of a function wouldn't be seamless like it is). Whenever you CALL a function the return address is placed on the stack automatically. Returning will take the value on the top of the stack and return to it- whether or not that value is in fact the place it SHOULD return to.
the Waffle House has found its new host.
A few additional notes for those who are interested:
- At 3:13, we use `objdump` to find the address of the `debug()` function. Note that this only works for binaries that are not position-independent (hence the `-no-pie` GCC flag), because ASLR cannot then relocate the executable itself. With PIE and ASLR enabled, the base address of the executable is randomized. However, even then the program is not invulnerable: some techniques allow attackers to leak the base address of the executable, after which they can still find the address of the `debug()` function by using relative offsets.
- At 4:14, we determine our initial guess for the payload size by looking into the source code for the length of the `password` buffer, which is 64 bytes. This does not mean that compiled programs with no source code available to us are not vulnerable to this attack, assuming that they also use `gets()`. First off, we can reverse engineer and statically analyze the compiled code to find that `64` constant. A much easier method would be to dynamically run the program using `gdb` and inspect its memory during runtime to figure out the length of the payload needed to overwrite the return address. Alternatively, there is also a great library called `pwntools` that can also allow us to figure out the payload length by inputting a de Bruijn sequence.
- This attack can only be performed if there are no stack canaries (hence, the `-fno-stack-protector` GCC flag). However, even with stack canaries enabled, this (again!) does not mean that programs are invulnerable. Some techniques would allow attackers to leak the value of the stack canary, which would render it useless since now, attackers can simply include the stack canary in their payload.
- What if there are no functions like `debug()` in the first place? After all, a real production-grade application would surely have no such tantalizing target functions that allow attackers to drop a shell and execute arbitrary code? Well, even without the `debug()` function being present, as long as attackers are still able to overwrite the stack, they can still perform a technique called return-oriented programming or a return-to-libc attack. The basic idea would be to "return" to and jump around between snippets of code in other functions of the binary or even in library functions (such as in `libc`). These code snippets are called "gadgets". With a sufficiently large library such as `libc` (which is definitely used by many applications worldwide), return-oriented programming is Turing-complete. Hence, attackers can eventually do whatever they want, including dropping a shell and executing arbitrary code.
The lesson to learn here is that mitigations can be bypassed and they should not be considered silver bullets in production-grade applications. Buffer overflow is also generally considered the simplest class of vulnerabilities (there are more involved ones such as double-free attacks, race condition attacks, and side-channel attacks). Try to avoid having buffer overflow vulnerabilities in the first place at all costs by reviewing your code and by using tools that could analyze your code and detect such vulnerabilities (either statically or dynamically). That is the only way to make your code truly more secure!
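The de Bruijn trick from the notes above can be sketched by hand in Python. This is a stand-in for what pwntools automates, not its API: `de_bruijn`, `cyclic` and `find_offset` are names made up for this sketch, and the offset 76 in the demo is a hypothetical value, not the real offset for the video's binary.

```python
import string


def de_bruijn(alphabet: bytes, n: int) -> bytes:
    """Standard recursive construction of a de Bruijn sequence B(k, n)."""
    k = len(alphabet)
    a = [0] * (n + 1)
    out = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in out)


def cyclic(length: int, n: int = 4) -> bytes:
    """A pattern in which every run of n bytes occurs at most once."""
    return de_bruijn(string.ascii_lowercase.encode(), n)[:length]


def find_offset(pattern: bytes, crashed_value: int) -> int:
    """Map a 32-bit value seen in the instruction pointer back to its offset."""
    return pattern.find(crashed_value.to_bytes(4, "little"))


if __name__ == "__main__":
    pattern = cyclic(200)
    # Pretend the crash report showed the four pattern bytes at offset 76.
    crashed = int.from_bytes(pattern[76:80], "little")
    print(find_offset(pattern, crashed))  # 76
```

Because no four-byte window repeats, whatever value ends up in the instruction pointer after the crash identifies exactly how many filler bytes come before the return address.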
Thanks for the great comment!!
Grateful 'cause you took the time to share your knowledge!
Isn't this exploit also hardware specific? I think this wouldn't work on ARM unless the functions involved pushed LR.
Trick is to never take user input at all 😉
I hope that YT is going to introduce the feature "save this great comment". 👍
the biggest vuln in your server is that your service says the words "welcome to" in its announcement banner, and nothing about unauthorized access being prohibited, thus ensuring that anyone who breaks in cannot be prosecuted under the precedented legal defense "It literally welcomed me in, therefore my access was not prohibited"
Does that really work in the US?
@@oliver24x You can't just invite someone to your house only to shoot them for intruding your house.
I'd argue this is wrong. If you use a fake ID to enter an office, and a receptionist welcomes you, does that suddenly give you the clear?
@@schlopping you can argue whether it's wrong or not all you want. this is established case law.
Viruses apparently are just like vampires....
In the end it's not necessarily about being a genius, it is about understanding how it works..
I think it should be a joint effort between coders to try and get rid of the illusion that you need to be a genius to code; it's simply not true.
@@JM-Games why though, it makes it so we earn more
but on the other hand it takes years to become good, you don't need to be genius , you only need to study.
@@monad_tcp It does yeah but I feel like it puts a lot of people off, I have been interested in coding since I was very young but never thought I was smart enough until I decided to give it a go regardless.
@@JM-Games what's the problem with putting people off?
It doesn't put me off, but there are a lot of things that do, like accounting, sales, management, or being a medical doctor.
Not everything is for everyone.
People only say or care about that because the barrier of entry for computing is very low, then they get disillusioned because the learning curve gets steep very fast.
It is hard, I had enough with pretending it isn't.
Even though I earn a lot of money consulting because of incompetent beginners doing shit.
keep uploading more exploit dev / binary exp videos bro, great job
thanks!
echo "Perfect video; Amazing Content; No Waste Time; Sponsor at the end, that i see entirely for respect and becouse doesn't ruin nothing; = Great Job; More More More!!!"
I appreciate that!
Great demonstration! In fact, there is yet another vuln in this code introduced purely by using the system() function at all. You can read why system() is discouraged in programs that run as any privileged user on the manpage for that Linux C function under the Notes section.
Thanks for the info!
@@LowLevelTV Gives an easy opportunity to introduce how to use fork(), the exec() family of functions, and wait() as a future video idea. Keep up the good work
This is my favourite channel about programming. Thanks a lot mate!
Happy to hear that!
Please make more videos like this! As someone who didn't understand this kind of stuff well, I was able to (kind of) understand how a buffer overflow works! I plan to take up cybersecurity as a hobby in the future, so your videos are really nice
You got it
OK, this was, obviously, way oversimplified, with much more info available to us than would be to a typical attacker, but still it illustrates the principle of buffer overrun exploit, the most common of them all, very nicely.
Wow :) very good desc of how a buffer exploit works. I knew the theory but great to see how it actually works in practice.
though i knew most of what was said in the video, it was still really well made and i enjoyed it regardless, good job :D
Glad you enjoyed it!
I didn't understand what he did at 6:35
@@houssem009 If I'm understanding it correctly, he was gaining access to the shell (hence why it said entering debug mode), but since the program was crashing, the shell would close before a would-be hacker could do anything. Adding the parentheses and semicolon bypasses that. I'm not too familiar with Python/Linux, but it reminds me of how the system("pause"); command in C++ keeps the terminal window open and waits for user input
I like how you release this video after I had a project in my computer security class on performing buffer overflow attacks
Another vuln is that your password check function assumes the input is well-behaved text and doesn't expect anything else. This will cause a vulnerability. People write code assuming the user will do as intended. If the user writes anything that is not intended, have a check for that; if the user writes something larger than what is allocated, expect that, and have a response for it.
Glad that your channel got a sponsor! Thanks for the content
Any time!
Thought you were John Hammond for a little bit and I was like "wow, he looks so different cleanshaven!"
Great video but what I don't get is how the buffer overflows. Why does it keep loading your input into the memory if there is not enough space for it in the buffer you've declared? Is it strictly because of how the gets() function works?
gets() reads a line from standard input and writes it into the array you hand it. At a basic level, C is a very simple language: it only knows exactly what you have told it, and all gets() receives is a pointer to where the array starts. It keeps reading bytes and storing them until it hits a newline (or the end of the input). Because you didn't tell the function how big the destination buffer is (nor is there a way to), it will keep writing past the end of the buffer for as long as the input keeps coming. This is why gets() is considered insecure/unsafe, and why, as he mentioned, the compiler warns you not to use it.
The recommended alternative is fgets(), which takes a size parameter meaning "store at most this many bytes, including the terminating null". You would set this to the size of your buffer so that if the newline never arrives, it stops before writing data past the buffer.
The 64 in "char password[64]" is only there for C to know how many bytes to subtract from the hardware stack pointer. Once compiled, your program has no idea how big your array actually is. In fact, the only information the CPU has about your array is where it begins. The data type is only "known" in the sense that any index into the array is auto-multiplied by the size of each entry at compile time, without you having to write it as such.
Now, as for gets(), it takes the beginning of your array as an argument, takes the user input, and copies it verbatim to your array. At no point is there any bounds check because as I said, that info is lost.
C is a very low-level language. A 64-byte array is simply a place in memory where 64 bytes are reserved. There are no checks to ensure you don't write outside that reserved area. You're expected to do the checks yourself in whatever way is most appropriate for your program. If you write past the end of those 64 reserved bytes, you start overwriting stuff you weren't meant to overwrite. It could overwrite bookkeeping data such as a saved return address (and then whatever you wrote there decides what gets executed next), or it might get lucky (or unlucky, depending on your perspective) and not overwrite anything important, so nothing bad happens and no one ever finds the issue.
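To make the replies above concrete, here is a rough Python simulation of the idea, not the real program. The frame layout (64-byte buffer, then a 4-byte saved frame pointer, then a 4-byte return address) and the original return address value are assumptions for illustration; the actual padding depends on the compiler.

```python
import struct

BUF_SIZE = 64        # char password[64]
SAVED_EBP_SIZE = 4   # assumption: one saved frame pointer sits above the buffer
ORIGINAL_RETURN = 0x080492F0  # hypothetical address back in main()

def make_frame(return_address):
    """Simulated stack frame, lowest address first: buffer, saved ebp, return address."""
    return (bytearray(BUF_SIZE)
            + bytearray(SAVED_EBP_SIZE)
            + bytearray(struct.pack("<I", return_address)))

def unchecked_copy(frame, user_input):
    """Mimics gets(): store every input byte from offset 0 upward, never checking BUF_SIZE.
    (The terminating null byte that gets() also writes is left out for brevity.)"""
    for i, byte in enumerate(user_input):
        frame[i] = byte

def read_return_address(frame):
    """Read the 4-byte little-endian return address stored above the buffer."""
    return struct.unpack_from("<I", frame, BUF_SIZE + SAVED_EBP_SIZE)[0]

safe = make_frame(ORIGINAL_RETURN)
unchecked_copy(safe, b"A" * 64)          # fits exactly, return address untouched
smashed = make_frame(ORIGINAL_RETURN)
unchecked_copy(smashed, b"A" * 72)       # 8 bytes too many, spills over the return address

print(hex(read_return_address(safe)))
print(hex(read_return_address(smashed)))
```

With 72 bytes of "A", the simulated return address becomes 0x41414141, the same value that showed up in the crash in the video.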
Absolutely love this channel :) keep up the good work!
Glad you enjoy it!
I would recommend the cyclic command (from pwntools) to find the offset at which the instruction pointer gets overwritten
"2" instead of "too" in the password would have been even cooler
damn it missed opportunity
Found this channel by pure chance and I'm glad I did!
Or was it by pure well working algorithm?
For me it was. Great channel!
this is a GOLDEN content. Please keep it up. I love your channel it's unique.
Well, this video just made me turn all notifications from this channel to on. Great vid
Welcome aboard!
Hey, I had a question: at 5:48 you said that we had to reverse the order of the hex sequence since Intel is 'little endian', so why wasn't the hex sequence reversed in the compiled binary at 3:28?
The hex number 0x8049296 is stored as 96 92 04 08 in computer memory (least significant byte first). But when a tool displays it as a single hex number, it prints the most significant digits first, because that's how we read numbers. It's a bit hard to explain.
In 3:28, a tool is being used to display the binary, and it prints addresses in "human readable" form -- i.e. not just hex bytes, but the actual whole hex number in a way that makes sense to humans (like, so that 1000 is bigger than 0010).
Because debug screens show it in the way we read numbers, not the way the computer does.
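A quick way to see both views of the same number is Python's struct module. This is only an illustration using the address quoted in this thread; "<I" means "32-bit unsigned, little-endian".

```python
import struct

DEBUG_ADDR = 0x08049296  # address of debug() as printed by the disassembler

# How the 4 bytes are laid out in memory on a little-endian machine.
in_memory = struct.pack("<I", DEBUG_ADDR)
memory_view = " ".join(f"{b:02x}" for b in in_memory)

# Reading those same bytes back as one number gives the human-readable form again.
as_number = struct.unpack("<I", in_memory)[0]

print(memory_view)     # 96 92 04 08
print(hex(as_number))  # 0x8049296
```

So the tool at 3:28 shows the number, while the exploit at 5:48 has to supply the bytes.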
Isn't the address of the debug() function going to change when it is loaded into memory? I remember something about relocatable executables, but I'm not sure.
Yes, but it's still going to be relative to main. Now, you still need its absolute address to perform this hack, since RET takes the top of the stack verbatim as the destination.
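As a sketch of what "relative to main" buys an attacker: if one runtime address leaks, the rest follow by arithmetic. The offsets and the leaked value below are made up for illustration; real ones would come from the binary being attacked.

```python
# Hypothetical offsets of the two functions from the start of the executable,
# as a disassembler might report them for a position-independent build.
MAIN_OFFSET = 0x12D4
DEBUG_OFFSET = 0x1296

def debug_address_from_leak(leaked_main_address):
    """Recover debug()'s absolute address from a leaked runtime address of main()."""
    load_base = leaked_main_address - MAIN_OFFSET
    return load_base + DEBUG_OFFSET

leaked = 0x565562D4  # hypothetical value leaked from a running process
print(hex(debug_address_from_leak(leaked)))  # 0x56556296
```

Without such a leak, the attacker only knows the distance between the functions, which is not enough for RET.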
Thanks. For DOS I think it was easier if you used a near call.
How do we know (as hypothetical hackers) that we want to look for the debug() function?
Because in this simple example you have the source and access to the binary you're trying to exploit compiled with symbol information and branching there is what you want it to do
More often with something like this example you'd end up digging through the assembly for somewhere useful to return to, or if you find a way to predict the buffer address and there's no data exec protection you could supply your own code
Real hacking is a lot of hoping and guessing. With experience you can recognise common patterns and know where to guess first, but it often just comes down to persistence and chance.
I think this is a perfect start
Thanks for practical demonstration!
Thank you!
Modern (post 90s) C compilers have flags that protect function return addresses (the stack protector, i.e. stack canaries). In fact, trying this on gcc version 13.2.1 with NO compiler flags gives: *** stack smashing detected ***: terminated
rather than a segfault. That message comes from the canary check compiled into the program, which this GCC build evidently enables by default, rather than from any CFI feature of the CPU.
1. Finds address of debug function
2. Accesses debug function with the overflowing byte string where the overflow is the debug functions address. (Piping from exploit.py to hacked.c) - happens at gets(password).
3. Use file descriptor command to gain access to the system function which gives access to the shell. (Where the password is)
4. Cat password
Where in the gets function is the debug function executed? In a catch block while printing?
It's not. It's using the CPU's return instruction, which is a computed GOTO to whatever is on top of the stack. gets() is what allows us to change the value of the top of the stack which we couldn't really do normally
@@williamdrum9899 thanks
@@williamdrum9899 thanks
Exploit dev is so cool! Keep doing these!
Where have you been all my life?
I needed this content ❤
Hey! I am loving the recent content but have to ask a question: aren't operating systems increasingly protective against buffer overflows? Modern Windows versions make it near impossible, from what I have heard.
It's much rarer to see one quite this simple show up anymore but the basic principle still applies. Anytime data can be affected that shouldn't be, it's a potential inroad to a working exploit
I gave an example in a response below, but anytime you have user supplied data stored before something they shouldn't be able to influence, bounds checking becomes important. They could overwrite other information your process uses that the OS or hardware don't know you care about:
bool is_root_user = some_check();
char user_supplied[8];
gets(user_supplied);
Supplying "AAAAAAAA\x01" could be a problem on little endian hardware, or even 'AAAAAAAA'+(anything non-null), depending on how the bool is evaluated later in the machine code
You can still smash all sorts of things it's not specifically trying to prevent
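Here is a small Python simulation of that snippet. It assumes the compiler happened to place user_supplied directly below is_root_user in memory, which is not guaranteed; the helper names are made up for the sketch.

```python
BUF_SIZE = 8  # char user_supplied[8]

def make_frame():
    """Simulated memory, lowest address first: user_supplied[8], then is_root_user (false)."""
    return bytearray(BUF_SIZE) + bytearray([0])

def unchecked_copy(frame, data):
    """Mimics gets(): no length check against BUF_SIZE.
    (The terminating null byte that gets() also writes is left out for brevity.)"""
    for i, byte in enumerate(data):
        frame[i] = byte

def is_root_user(frame):
    """Treat any non-zero byte as true, like a typical 'test and branch' in machine code."""
    return frame[BUF_SIZE] != 0

honest = make_frame()
unchecked_copy(honest, b"AAAAAAAA")        # exactly fills the buffer
attack = make_frame()
unchecked_copy(attack, b"AAAAAAAA\x01")    # ninth byte lands on the flag

print(is_root_user(honest))  # False
print(is_root_user(attack))  # True
```

No return address is touched here, so a canary placed next to the return address would not necessarily notice.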
@@charlesnathansmith thanks for the replies really helps me understand!
To an extent, yes. For example, one common protection is making the stack and data regions non-executable (NX/DEP). This makes buffer overflows where malicious code is injected into the buffer impossible. Generally those work by filling the buffer with instructions and NOPs, then overrunning it and jumping back into the buffer to execute arbitrary code. Those, you can't do. But the ones shown in the video are still possible. No injected code is executed: we're jumping to a function that already exists! We can somewhat prevent these types of exploits by shuffling memory around from run to run. But while the memory may be offset, it will never be rearranged. The structure is set at compile time. So we can still trial-and-error our way into an exploit.
Yes, there are a few mechanisms to protect the computer:
* Randomization of where the program resides in memory (ASLR). The exploit in the video requires the hacker to know the actual memory address of debug, which can be anywhere even if it's always the same relative to main.
* Stack canary: when a function is called, compiler-generated code places a hidden value on the stack between the local buffers and the return address. It is checked before returning, and if it has changed, the program aborts instead of returning.
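The canary idea can be sketched in a few lines of Python. This is a toy model, not what the compiler emits: the layout, the 4-byte canary with a zero first byte, and the return address are all assumptions for illustration.

```python
import secrets
import struct

BUF_SIZE = 64
CANARY_SIZE = 4
RETURN_ADDRESS = 0x080492F0  # hypothetical address back in main()

class StackSmashingDetected(Exception):
    """Stands in for the program aborting when the canary check fails."""

def run_function(user_input):
    """Simulate one call: place a canary, do an unchecked copy, verify before 'returning'."""
    # Assumed canary shape: a zero byte followed by random bytes.
    canary = b"\x00" + secrets.token_bytes(CANARY_SIZE - 1)
    frame = (bytearray(BUF_SIZE) + bytearray(canary)
             + bytearray(struct.pack("<I", RETURN_ADDRESS)))

    # Unchecked copy, clipped only to the size of this simulated frame.
    n = min(len(user_input), len(frame))
    frame[:n] = user_input[:n]

    # Epilogue check: an overflow must cross the canary to reach the return address.
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != canary:
        raise StackSmashingDetected("canary changed, refusing to return")
    return struct.unpack_from("<I", frame, BUF_SIZE + CANARY_SIZE)[0]

print(hex(run_function(b"A" * 64)))  # buffer exactly full, canary intact
try:
    run_function(b"A" * 72)          # overflow tramples the canary first
except StackSmashingDetected as exc:
    print("aborted:", exc)
```

An attacker who can leak the canary value can write it back in place, which is why it is a mitigation and not a fix.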
What protections did you disable for this demo?
Literally all of them lmao
@@LowLevelTV can you put a readme with a quick guide on disabling protections? (I think it would be an interesting topic for a video)
Anyway ty for the good content ^^
I have a question:
In c++, is the string standard library safe or is it vulnerable to similar attacks to what’s seen here?
string isn't a lib 💀
@@filipoda123 it’s part of the standard library
@@loganiushere well yea, but if the standard lib would be that vulnerable then that language should be already updated long ago
I don't think that c++ uses gets() but arrays still forget their element count when you pass a pointer to them as a function argument. I think so, anyway
You don't normally use null-terminated strings in C++, and you don't normally pass raw pointers either. std::string_view holds the pointer and the length of the string, so in C++ you never run past the end of the buffer unless the length variable is corrupted, which is unavoidable in any language. Also, I think a huge mistake in programs is storing raw passwords in memory or in files. C++ is almost fully backwards compatible with C, so you CAN use null-terminated strings, raw pointers, or even gets(), but that is just the programmer not utilising C++ correctly; the language also lets you write your own containers easily. In C++ you would not use char buffer[64] either. You would use std::string or std::array, and I am pretty sure reading into a std::string with std::cin would not produce a buffer overflow.
This is so freaking cool. Edit: i click on 1 video and watch all 3 related to it. Man this is so awesome.
Why wouldn't your code be compiled as a PIE with ASLR and SSP? Buffer overflows are still a major problem, but deferring address resolution until runtime, with a canary to pick up accidental overruns, would lessen the number of people who could feasibly attack the process.
Because this is an illustrative example of the basic concepts. There has been tons of research put into predicting canaries, DEP bypasses, and you can always overrun things like heap buffers or buffers that are in memory before other important variables.
If your memory looks like:
bool has_root;
char user_input[8];
And don't do proper bounds checking you could still have a very bad day
You can't rely on the compiler and hardware and ignore fundamental security concepts
This video is here to illustrate the reason those features exist
is it bad that i dont space my brackets? I have them attached to their control struct or function.
Hello, there's something I don't understand in the video in the python script at the end: in the payload we do an '\x08\x04\x92\x96'[::-1] but it's not valid in little endian because the payload comes out this way: 69x\29x\40x\80x\ and the \x is behind the shellcode bytes.
I don't see why it would come out that way, why are the nibbles of each byte reversed by the [::-1]
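A short check in Python shows where the confusion comes from. The escape sequences are only source-code notation; once the literal is parsed there are just four bytes, and [::-1] reverses those bytes, not the characters you typed.

```python
# Four bytes, written most significant first.
address_bytes = b"\x08\x04\x92\x96"
reversed_bytes = address_bytes[::-1]

# For contrast: reversing the 16 characters of the source text (a raw string,
# so the backslashes are kept literally) gives the garbled form from the question.
source_text = r"\x08\x04\x92\x96"
reversed_text = source_text[::-1]

print(len(address_bytes))  # 4
print(reversed_bytes)      # b'\x96\x92\x04\x08'
print(len(source_text))    # 16
print(reversed_text)       # 69x\29x\40x\80x\
```

The payload in the video is the first case, so it comes out as valid little-endian bytes.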
Cybersecurity noob here; I have a question.
You only knew the address of debug() because you had access to the compiled executable. How would someone hack in over the net not knowing the address of the function?
If they have access to the executable then they can look at the assembly all they want and there's a lot of different way to try to break a program
They can write their own code in the string using hex codes and then set the return address to it.
Wow. Does C# have same weaknesses?
(Perfect video, 8 min long, easy to follow and understand, well explained, not too long)
I hope not... I have high hopes that it was made as perfect as possible by Microsoft, making sure that the mistakes by the past programming languages will not get repeated 😌
No.
No. It (presumably) has other weaknesses but it's much MUCH harder to screw yourself over with memory issues. Being a managed language means it incorporates checks and safeties into the memory access code, so you can't just write past the end of an allocated buffer. C expects you to create your own checks and safeties, which is impossible to do reliably and is why we have higher-level languages like C#.
@@harissalam1421 It's not drivel at all. 70% of both Microsoft's and Google's security bugs have been memory-related. That tells me that no matter how hard you try, it's impossible to be absolutely certain you haven't introduced memory bugs. Probably the really good programmers have very low incidence of memory issues, but the evidence suggests that no matter how good you are, you'll screw up eventually.
Security on computers is just like security anywhere else: it's all about how much security you need. For a lot of software, this exploit (except for the fact that it causes a crash when the user inputs a password that is too long) is not something you need to worry about, for the same reason a padlock is fine to use on your bike. It's hard enough to discourage people from even bothering, because the benefit of breaking in is not worth the hassle of figuring out how. In my experience, buffer overflow attacks are (for any meaningful gains) usually way too hard to be a worthwhile endeavor (aside from possibly crashing the software and using that to take a service down).
They're hard because the hardware was built to make them harder. I imagine in the days of MS-DOS they were piss easy (no ASLR, no protected regions, no segfaults) you could just JMP anywhere you wanted
Not into C at all but very educational and well explained. Thank you
For a school exercise I am trying to capture a flag by reading a non-public file inside a db. The request handler is provided and the password uses a buffer of size 72. Whenever I try a bucrash I get this error: ERROR InvalidStackCanary
ERROR SubProcessCrashed
LOVE this video!!!!! Awesome!!!
Yay! Thank you!
Is it possible to use the buffer overflow on an MCU considering that MCU is running its program from the flash and the buffer, obviously, is in RAM?
Yep. Buffer overflow vulnerabilities can occur in almost any hardware. They're one of the most basic forms of vulnerability and have existed since the beginning of computers. Even if the code is stored in separate memory from the main RAM, there's still the potential to trigger different behaviour by changing the state of the main RAM.
@@clonkex Thanks for your answer.
Why is the full name of the function in the executable? That takes up space for nothing, right?
This was really interesting! Loved it, keep em coming!
More to come!
In better processors (ARM, Aurix) the return-address (of the instruction pointer) is stored in a register i.e. separated from the stack.
The caller's caller and so on are still on the stack, at least in ARM.
@@xanri7673 The caller's what?
@@zofe the caller's caller
If you call function B from function A and C from function B, you have to store the return to function A during your call to C.
Don't forget that main was called by your OS. And since main calls another function, main has to store its link register value somewhere so that when "bl gets" is run, the link register isn't clobbered. And where does LR get stored? On the stack.
so great explanation and great demonstration
keep it up bro
if you can share more in medium community it would be great
Thank you, I will
Great info! *SUBSCRIBED*
Theoretically it shouldn't overwrite the return address, since the stack grows in the opposite direction of where the return address is stored. Could someone explain why this happens?
The stack grows that way when you add things to it with PUSH, but here gets() is writing into an array that lives on the stack (a plain copy loop, like x86's REP MOVSB, behaves the same way). Array writes start at the lowest address and move upward, toward the return address. The CPU isn't using PUSH or POP to access the stack at this point, so it's written to like any heap-allocated array would be.
@@williamdrum9899 So the byte at index 0 of an array is the last allocated byte on the stack?
Isn't the video about a stack attack, not the heap?
Why would the executable file contain any information about the names of the functions?
Makes debugging easier.
what function instead of read() should we use to avoid buffer overflow from this situation. On windows, we have scanf_s, do we just use scanf with %[num]s or there exist some other "safer" function on unix?
You could use "fgets". The following should work, but not tested:
char myString[30];
fgets(myString, 30, stdin);
Edit: However... usually for C programs I've been in the habit of recommending Rust since C doesn't have as many safeguards as C++ nowadays.
@@v01d_r34l1ty rust is a good language, but it is indeed harder than c and cpp. The memory management of rust is just so different.
@@wrnlb666 Personally the things keeping me from Rust are the insane levels of abstraction used, the insane overhead (i.e. overuse of macros), and the long compile times. The memory management I found quite easy since a lot of tutorials are available on YouTube, but regardless wouldn't recommend if you already know safe modern C++. Lot easier and more straightforward to impl. this feat. in C++ with std::cin, std::string, and getline.
The reason gets() is unsafe is because it neither knows nor cares how big your array is. It'll blindly write user input to memory until the user is done. fgets takes the count parameter and won't write past that
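The effect of that count parameter can be mimicked in Python with the size argument of readline(). This is only an analogy, with a made-up helper name; it follows the fgets() convention of reserving one byte for the terminating null.

```python
import io

def bounded_read(stream, buf_size):
    """Like fgets(buf, buf_size, stream): take at most buf_size - 1 bytes,
    stopping early after a newline."""
    return stream.readline(buf_size - 1)

short_line = bounded_read(io.BytesIO(b"hello\n"), 64)
long_line = bounded_read(io.BytesIO(b"A" * 100 + b"\n"), 64)

print(short_line)      # b'hello\n'
print(len(long_line))  # 63
```

The 100-byte line is cut at 63 bytes, so a 64-byte buffer would never be overrun; the rest of the line simply stays in the stream for the next read.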
can u please tell me the distro ure using in this video
I am not even remotely a coder, but I did write a horribly nested if-then piece of code in Lua in Minecraft with the ComputerCraft mod, and have been wanting to make a mostly secure password program from scratch for myself there, too. I wonder if I can break that like this too ...
Lua is not going to act at all in the same way as C
Not in the same manner. Lua doesn't let you arbitrarily write to the stack. To be fair, most C functions don't either except the input/output ones
Very nice video; though how would you find the debug() address on windows?
Assuming your program was some remote server, how would you get the address from it? Is that just based on hoping the program is the public build artifact?
Well, that's actually where the difficulty of exploiting remote software comes from.
Usually what you would try to do is play with it and fuzz it until you leak a program's name, a version, an OS version and so on.
If you know what program is running and what version, you can just download it and test on your side while emulating the same environment as your target.
You could decompile it and check which interesting functions are at which memory offsets.
by tunneling the file contents? i dont see how remote changes the exploit.
You are doing Very great job 👍
Many many thanks
Why would a hacker have the executable file? And does this trick work on interpreted languages?
Doesn't work on interpreted languages
c: a language
python: a snake language for hackers
accurate
Brilliantly, my friend(:
I just discovered a whole new interest watching this video
Really nice👍👍
Thanks a lot 😊
Wow that's pretty cool.
amazing man !!! thank you very much !!
Right when you pressed enter at 2:21 YouTube crashed because it was updating lmao
amazing content man! thank you
Please keep uploading!
dude this was fantastic! Thanks!
Anytime
you forgot bonus points if you use alphanumeric shellcode.
ASCII SHELLCODE EZ
If your vulnerable program had the +s (setuid) permission, it could be used for privilege escalation. Please make this remark in your next video, since it is important.
If you have access to the compiled binary cant you just edit it and write a call to the debug function at the start of the program? I cant see why you would need to write a python exploit and overflow the buffer.
Then you're just hacking your own computer. An actual hacker is interested in hacking someone else's. In theory this program is being hosted on a different computer, the python script just tells you what to write to gain access to the debug console.
When i try this in my WSL2 ubuntu installation, I successfully build the program using the build.sh file and cause the segfault with a big input, but it doesn't say "core dumped" and the dmesg command doesn't say anything about a segfault. Is there some other stuff I need to do?
Can you make one of these videos with the arduino environment: methods such as Serial.read() etc ??
But why does a buffer overflow cause the instruction pointer to be overwritten?
In order to let the function return to the correct place in the code when it finishes, a function's return address is pushed onto the stack when it is called. The stack is the same place that local variables go, so in a way the return address is like the "0th" local variable that a function has. Add that to the fact that new items go on the stack at lower addresses (the bottom of the stack is at its highest address), and that means that the local variables of a function will come somewhere in memory before -- and close to -- their return address. Thus, when buf is overflowed, it at some point overwrites the return address and once that is read by the CPU it just blindly follows it to somewhere it's not meant to go.
In order for the CPU to know where to return after a function call, a pointer to the line of code after the call is stored, either in a special register or in this case on the stack. The CPU doesn't know or care if this return address is correct. The CPU's return instruction sets the instruction pointer equal to this return address (effectively a goto that ignores scope), even if the address was altered during the function. C is smart enough not to store local variables over the return address, but arrays... not so much.
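The two answers above can be modelled with a toy CPU in Python. The class and the addresses other than debug()'s are invented for illustration; the point is only that RET trusts whatever is on top of the stack.

```python
class TinyCPU:
    """Toy model of CALL/RET: a stack of return addresses and an instruction pointer."""

    def __init__(self):
        self.stack = []
        self.ip = 0

    def call(self, target, return_to):
        """CALL: push the address of the instruction after the call, then jump."""
        self.stack.append(return_to)
        self.ip = target

    def ret(self):
        """RET: pop the top of the stack into the instruction pointer, no questions asked."""
        self.ip = self.stack.pop()

DEBUG_ADDR = 0x08049296   # address of debug() from the video
AFTER_CALL = 0x08049345   # hypothetical: instruction in main() after the call
CHECK_FUNC = 0x080492C0   # hypothetical: function containing the gets() call

normal = TinyCPU()
normal.call(CHECK_FUNC, return_to=AFTER_CALL)
normal.ret()

hijacked = TinyCPU()
hijacked.call(CHECK_FUNC, return_to=AFTER_CALL)
hijacked.stack[-1] = DEBUG_ADDR   # what the overflowing input does to the saved address
hijacked.ret()

print(hex(normal.ip))    # 0x8049345
print(hex(hijacked.ip))  # 0x8049296
```

Nothing inside gets() runs debug(); the jump happens when the function that called gets() executes its own RET.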
Hack The Box speedrun incoming?
🤫
I appreciate this video, however can you do an example that might be more subtle? Just something that doesn't involve gets or another already known red-flag function.
This can also be done with printf() if you allow the user to type their own format string (something you should never allow)
THANKS@!
thanks man, more please
More to come!
Whats the point of gets?
reads an input string.
I think we should just leave memory-unsafe languages in the past. Having possible buffer overruns, bad pointers, etc. is just too dangerous in today's world. And they are easily avoidable by using a memory-safe language.
For internet apps, definitely. For electronics that don't take strings as input, not a big deal
@@williamdrum9899 bad pointers and memory bugs often come from string inputs. But not all. Many come from offset miscalculations, bad array indexes, use after free, double free, the list goes on. These are all bugs resulting from manual management of pointers. Especially electronics like robots in factories these kinds of bugs can be catastrophic. So no there is no argument to continue to create new commercial products in c/c++ besides things like missing tools/ game engines.
@@redcrafterlppa303 Sometimes you gotta go fast
@@williamdrum9899 rust is as fast as c and c++. Which is the reason it is so appreciated by the community. It produces memory safe easy to read high level feeling blazingly fast code.
@@redcrafterlppa303 That's fair. I guess C is like the QWERTY Keyboard of programming languages: its purpose is no longer needed but we keep using it anyway despite that, because so many people know it and it usually gets the job done
Please tell me how I can learn a new language without forgetting C
this guy is p cool.
low level gang 4 lyfe
#LLG
Hi I really enjoy your vids. Thank you for uploading them.
My pleasure!
Just make the string store infinite of characters. Easy Solution!
If buffer overflows are such an obvious exploit, why do we permit core dumps?
Interesting for debugging your local code, but a "how to fix this hole" closing would be better for teaching.
At 2:00 I appreciate the craft of running just the right amount of comands to reach pid 4567
C A L C U L A T E D
In no way did he start of the video by breaking bad.
You say the buffer is too small, I say your data is too greedy.
Hmmm,,, Reminds me of the book by jon erickson. Hacking Art of Exploitation.
great tutorial, but what if you don't have access to the source code?
The hacker just tries this anywhere there is user input to see if the vuln exists
What if we use strdup or fgets instead of gets, obviously gets is deprecated now.
He's showing you why gets is deprecated
gets() is a dangerous deprecated C function...
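#include <stdio.h>

/* Roughly what gets() does; note there is no size parameter at all. */
char *my_gets(char *array)
{
    int c;
    char *p = array;
    while ((c = getchar()) != EOF && c != '\n')
        *p++ = (char)c;   /* keeps writing no matter how big the array is */
    *p = '\0';
    return array;
}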
The title implies the solution is to just make the buffer longer. It should be called "what happens when you don't use fgets()"
\x90\x90\x90...
A manager 😂😂😂
XD
i never seen some one use tail linux for coding or exploit finding or hacking
I love you man
I like your funny words magic man