Be Careful When Using scanf() in C
- Published 15 Oct 2024
- Today we learn why using the scanf() function in C might cause some serious problems and how to do it properly.
I was like “oh he’s gonna use another scan function” and then you whipped out the %12s and it was magnificent
Be careful when using C.
With great power comes great responsibility
"don't do this"
C stands for Careful
@@synthesoul Tony Stark writes in C. Spiderman uses Rust.
C is God's programming language, of course humans can't make any good use of it
Writing a letter when program expects a number:
Pascal: throws an error and ends
Python: throws an error and ends
Perl: throws an error and ends
Golang: assigns 0 to the variable
C: loops itself to infinity, forcing you to use task manager to shut the program down before it sets your computer on fire.
Yeah, but you can always get the return of scanf and test it, if it's a 0, clean the stdin and try again
@@jelsonrodrigues I would even argue that you SHOULD check the return value. If you blindly trust an integer variable that should have got assigned a value by scanf, but wasn't, then maybe garbage was left in the variable...
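A minimal sketch of the "check the return value, clean stdin, try again" idea from the two comments above. The helper name `read_int` and the choice to read from a `FILE *` are assumptions made for illustration; they do not come from the video.

```c
#include <stdio.h>

/* Hypothetical helper: read one int from `in`, retrying after bad input.
 * Returns 1 on success, 0 on end of input or read error. */
int read_int(FILE *in, int *out)
{
    for (;;) {
        int rc = fscanf(in, "%d", out);
        if (rc == 1)
            return 1;            /* one conversion succeeded */
        if (rc == EOF)
            return 0;            /* end of input or read error */

        /* rc == 0: the offending characters are still in the stream.
         * Discard the rest of the line so the next attempt sees new input;
         * without this step the loop would spin forever on the same bytes. */
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
        if (c == EOF)
            return 0;
    }
}
```

Calling `read_int(stdin, &n)` in place of a bare `scanf("%d", &n)` avoids both the infinite loop and the use of an unassigned variable.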
c forces you to actually think about what exactly you are doing because you have so much more control
Ctrl+C stops any activity in any terminal
If you're a better programmer, you'd design the program to be able to handle a char when it's expecting a number.
Thank you for covering this topic. Many tutorials for beginners use scanf() or gets(). Which obviously can be really dangerous.
What should be used then
@@anibal.07 it´s not that it shouldn´t be used but the issue should be adressed
@minhhoanglazy it's so dangerous that you can basically fill all your inputs with wrong information, which at best makes the whole program not work and at worst makes it super easy for a hacker to make it do whatever he wants
@anibal.07 Use scanf, but use it the right way. For example: scanf("%12s", string) or scanf("%12[^\n]", string).
@@anibal.07 fgets(buf, sizeof(buf), stdin)
Just a reminder. As I recall, in standard C99, compilers do not allocate the extra byte for the null termination on char-arrays (strings). So char my_string[20] can be up to 19 characters long, since the last one must be a null, to indicate the end. My understanding is that this is meant to be for use-cases when you are storing small integers in an array, and want to save space. This way, it won't be adding bytes to arrays unpredictably, everything is always the size you tell it to be.
I'm sure everyone here already is well aware of this, just in case some people forgot.
If you declare the array with a length, like _char s[5] = "abcdefg";_ then that will be the number of bytes allocated. If you initialize that with a string, then only that number of characters will be copied from the string even if it is longer than the buffer. In the previous example that would be "abcde" without a terminating '\0'. If the initializer string is shorter than the declared array, then _all_ remaining elements are filled with 0 (0-padded), so for example using "ab" instead in the above would initialize the array to "ab\0\0\0". If you instead declare the array without specifying a length, then the length is calculated automatically to match the initializer string including a \0 terminator, so e.g. _char p[] = "abc";_ would make the buffer 4 bytes long ("abc\0").
@@benhetland576 the first part of your comment "even if it is longer than the buffer" is incorrect. The C standard states:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
So which part of the C standard are you getting that information from?
Yes, of course for a declaration like this there is no extra byte and it will be exactly the size you entered.
The compiler doesn't know what you want to do with the array obviously. Could be anything.
@@lennymclennington Yes, you are right! I stand corrected, and the truncation rule only applies to the NUL terminator. My experience with that behavior goes way back to even pre-standard C, but such is -- at best -- merely an implementation-specific feature today.
@benhetland576 what part of the standard adds an exception for a NUL terminator with respect to an initializer being unable to provide a value that is not contained within the object that it initializes? Or are you saying that the behaviour you were describing only works with a NUL terminator and that in itself is an implementation detail? Sorry if I'm misinterpreting your comment.
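As far as I can tell, the rule being asked about is the one on initializing character arrays from string literals (C11 6.7.9, paragraph 14): the terminating null character is copied only "if there is room or if the array is of unknown size". The sketch below shows the three cases from this thread; the variable names are made up for illustration. Note that this applies to C only, since C++ rejects the first declaration.

```c
#include <stdio.h>

/* In C, a string literal may initialise an array that has no room for
 * the terminator; the '\0' is then simply not stored. */
static const char exact[3]  = "abc";  /* 3 bytes: 'a' 'b' 'c', NOT a string */
static const char padded[5] = "ab";   /* 5 bytes: 'a' 'b' 0 0 0 */
static const char sized[]   = "abc";  /* 4 bytes: length deduced, '\0' included */

/* Print the three sizes so the difference is visible at run time. */
void print_sizes(void)
{
    printf("%zu %zu %zu\n", sizeof exact, sizeof padded, sizeof sized);
}
```

An initializer with more characters than the array can hold (not counting the terminator), such as `char s[5] = "abcdefg";`, violates the constraint quoted above; compilers that accept it with a warning are being lenient, so it should not be relied on.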
It's been almost 3 months since uni started and our teacher is giving us approximately 15 exercises a week and many of the stuff needed for these exercises to be "bug free" are not included in his lectures... Videos like this are a life saver cause now I can do my about 3/10 exercises assigned this week without worrying about them having this exact issue (which I did face and wanted to find a way to overcome...), Also, even though it's a 12min video the information presented is well structured enough that it feels as way less time has elapsed and I really appreciate that.
@@freedomgoddess Naip, this is email is a mistake but it's far too inconvenient to change it now :P
I'll give you some advice when you get a lot of exercises to code in the university. Get yourself a case of cold beer and free up your nights. Your best ideas happen after the 10th beer. If you want to be good at programming, it needs to not just be what you're studying, but also your hobby. These exercises should be fun. And I'll tell you this much, as helpful as some of these videos are, you're cheating yourself of the experience of solving them yourself.
Again back to C. We need more C tutorial in this great channel.
Explains buffer overflow without saying buffer overflow great vid
3:57
08:00
Thank you so much. C tutorials are almost always a mess as they teach what is wrong. This is superb.
This video is great, because most of the time when we start learning C, we're just taught that a way to take user input is by using scanf. If you're lucky enough or do your research, you'll probably discover that there are also other functions for that. And even though nowadays there's so much info on the internet, we're not taught so much about programming safely. So we keep learning, our programs become bigger and bigger, and eventually we find that our program crashes and we don't know why. And the bigger our project is, the harder it is to find every error.
And this is just considering we do C projects for ourselves. Imagine if this happens in a company project or stuff. Not only you'd have a potential error, you'd have fatal security issues in your system, that someone with enough knowledge could exploit.
And this doesn't only apply to C. C just doesn't provide any safety measures natively, but it can happen in other languages too! So in summary, it's always a good programming practice to introduce safety checks in your code, and if you're working in a company, this becomes a must.
Thank you very much! Im a beginner in c and my teacher told us to use the scanf() function to read in some user input. I asked him how to failcheck/make sure the user puts in the right things on scanf, but he told me you just dont. Right after that we talked about buffer overflows in another class lol. Nicely explained even for ones without great knowledge
for some reason some university professors are just bad and teach wrong. In my experience this correlates heavily with Ph.D/Full Professor = good, Master/Bachelor/Undergrad/Assist. Prof./T.A. Lecturer = bad. When signing up for lecture periods always check the academic credentials of the lecturer, and ratemyprofessor for reference. I don't think anyone gets a Ph.D in STEM without knowing shit.
@@tacokoneko it's not even in Uni, it's just regular school 😅
@@tacokoneko a lot of full professors get contemptuous--research first ,students ? Just a pain in the ass
My C programming teacher (CS Engineering school) didn't know about dynamic memory allocation, we ended up teaching him how to use the linux man
in case some of you are interested, I have C project with function that allows you to initialise a variable of any integer type safely, which means that you can’t write letters or any other symbols as input and it tracks type overflow. I can send it to you if you want. it also contains beautiful algorithm of defining min and max value of any integer type using bits operations that i could not find on the internet
Sounds really useful, may I have it ?
im very interested. could you send it to me if thats no problem, thanks ^^
if you ever upload it to the internet, would you be so kind to share it? :]
same here, very interested
POV: He doesn't have it
That should be a %11s, because scanf will take the input and add a \0 at the end of it. So, with your code, if I enter "123456789012" then it will store 13 bytes and potentially overwrite part of our password string.
test your code
Order and sanity got totally fucked up when he named the executable _main.o_ as if it were an object file... (Linux wouldn't care, but a make system will.)
That triggered me too
Can you please elaborate ?
@@KaleshwarVhKaleshwarVh He named the executable "main.o", but ".o" files are supposed to be "object" files, not binaries. Its like making a docx file, but changing the extension to "txt", sure you can use it if you open it correctly, but most people will see it and assume its a txt file.
It's 20. When declaring array you have to explicitly include the null byte. :) The string literal will be automatically null-terminated for you though.
I'm new to C/C++ and I feel lucky that I got to know this early on. Thank you so much for the heads up!
This was really good! I would love to see more C and program security videos.
Yes, extremely useful.
But I also recommend fgets().
It can also get strings with spaces, and it also avoids the scanf problem.
One thing to remember, though, is that this function also stores the newline character in the buffer.
So I usually strip that trailing newline by replacing it with '\0' (the null character).
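A small sketch of the fgets() approach described above. The helper name `read_line` is hypothetical. It uses `strcspn` to find the newline instead of blindly overwriting the last character, because the last character is not a newline when the line was truncated or when input ended without one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (at most size-1 characters)
 * and drop the trailing newline if fgets stored one.
 * Returns buf, or NULL on end of input or error. */
char *read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* no-op when there is no newline */
    return buf;
}
```

Typical use would be `char name[32]; read_line(stdin, name, sizeof name);`, which accepts spaces and can never write past the end of `name`.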
It's funny that Python was made in C, yet only C has the buffer overflow issue 😂
If myinput has a length of 12 (including the null character), then the scanf format must be %11s; otherwise, if the input is longer than 11 characters, the terminator is written into the next variable in memory (so mypassword becomes an empty string)
Actually, scanf does not write a null, so the input array would just have no null terminator. It would then read mypassword as part of the same string, you wouldn't be able to pass the check, but it's still incorrect
@@LoganDark4357 Actually %s or %11s does store a null into the receiving buffer, but %c or %11c behaves as you say. The %s variants would also stop at the first whitespace if it occurs before the length limit, so a space in the password wouldn't work correctly in this example.
@@benhetland576 If you watch the video, when he overflows the first buffer, scanf does not insert a null into the middle of the second buffer.
@@LoganDark4357 I have rewatched the video, but I could not find any case showing evidence that scanf did not add a \0 after it had filled its input into the buffer(s). Please provide a timestamp for where you saw it!
@@benhetland576 Looks like I was referring to 3:25, but upon closer inspection he didn't actually overflow it little enough for part of the original "helloguys" to show through. So I performed my own testing, and it actually seems you were correct.
0 logandark ~ cat kek.c
#include <stdio.h>
int main() {
    char first[20];
    char second[20] = "hellothisisathing";
    scanf("%s", first);
    printf("%s\n", first);
    printf("%s\n", second);
}
0 logandark ~ gcc kek.c -o kek
0 logandark ~ ./kek
hello
hello
hellothisisathing
0 logandark ~ ./kek
hellothshshdsjfjgasdkjfha
hellothshshdsjfjgasdkjfha
hellothisisathing
0 logandark ~ ./kek
sdfjhdslfjkdhsfdlksjhfdsljkfhdkljfhdslkfjds
sdfjhdslfjkdhsfdlksjhfdsljkfhdkljfhdslkfjds
jfhdslkfjds
By the way, you can also use "%ms" as format string for scanf(). In this case, scanf() will allocate a buffer that is large enough to hold the entire input string, including the \0. In turn, you are obliged to free() this buffer when you finish your business with it.
If you are telling us to use free(), does it mean by using *%ms* we are dynamically allocating memory to the buffer ?
@@sushilkatikia1384 Indeed. We are allocating memory in the heap, not the stack. So it won't be freed automatically.
This is interesting, but according to chatGPT it's not part of the C standard.
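To my knowledge the `m` modifier comes from POSIX.1-2008 rather than ISO C, so the sketch below assumes a POSIX-style C library such as glibc and may not build or behave the same elsewhere. The helper name `first_word` is hypothetical, and it uses `sscanf` on a string only to keep the example self-contained.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the first
 * whitespace-delimited word in src, or NULL if there is none.
 * Assumes the C library supports the POSIX "m" modifier.
 * The caller owns the result and must free() it. */
char *first_word(const char *src)
{
    char *word = NULL;
    if (sscanf(src, "%ms", &word) != 1)   /* note: pass the address of the pointer */
        return NULL;
    return word;
}
```

The design trade-off is that the buffer is always large enough, but every successful call now needs a matching `free()`.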
scanf is a family of library functions that aged poorly and should probably be used as little as possible in favor of (reading and) parsing a string yourself.
Not only do you run into issues like Rockstar accidentally building a quadratic-time parser because sscanf builds a string stream first which (depending on the C library) always counts its length, but it's also unreliable without checking the return value.
dude thank you! i really like how you explained this! -i've watched 6 videos and didn't get it until yours lol thank you again !
This video is quite nice for me. I'm interested in programming and I appreciate the C language, but apparently writing C is easy while writing it correctly is quite hard, and writing C correctly is not a popular video topic on YouTube. Thanks for the video!
12s is incorrect! You need 11s to make room for the null terminator!
Basically if he didn't have stack smashing detection a hacker can pretty much turn that scanf() into a goto that ignores scope. If this was an 8 bit or 16 bit cpu (which didn't have protected memory) he could execute basically any code he wanted to on the machine
That's why you should use getline. In fact the gcc documentation recommends never using a compiler that doesn't support that function
In C++ there is scanf_s for safe input using an extra size parameter. I don't know about C, but in C++, it's that.
For string input is better to just use fgets or getline functions
Hey Joelson, could you explain that better? I'm learning C and I've never heard of these commands lol, and I don't even know whether they're included in the libraries I use
@vitorfontolan8517 Both functions come with the stdio.h header. Each one is properly documented on the internet, or if you're on Linux you can use the man command in the terminal to learn more about them. They are specifically meant for reading strings. You can limit the number of characters that will be read with fgets and getline. With getline, if you don't know beforehand how many characters will be read, it allocates memory dynamically and stops reading when the user types \n; you just have to remember to use free afterwards.
@vitorfontolan8517 fgets(CharArray, sizeof(CharArray), stdin);
Very interesting.I tested the code. Success.
at 10:49 you write a lot of A's to the string but scanf only scans the first 12 right so your string is full with A's but isn't there a problem with the last byte that should be \x00?
Yes, the %12s indicates that it will read 12 characters from the input, but write 13 to the buffer (the 12 from the input, plus a null terminator). It fixes the specific issue seen here, but is still incorrect and can cause other buffer overflows
An upcoming version of Clang has -Wfortify-source, which will warn on this code:
warning: 'scanf' may overflow; destination buffer in argument 2 has size 12, but the corresponding specifier may require size 13
Alternatively, Address Sanitizer will catch this sort of thing
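The off-by-one described above can be sketched as follows, assuming the 12-byte input buffer mentioned in the warning text. `INPUT_SIZE` and `scan_word` are names invented for this example, and `sscanf` stands in for `scanf` so the behaviour is easy to check.

```c
#include <stdio.h>
#include <string.h>

#define INPUT_SIZE 12   /* buffer size assumed from the warning quoted above */

/* Hypothetical helper: copy the first word of src into dst[INPUT_SIZE].
 * The width is INPUT_SIZE - 1 = 11, which leaves one byte for the '\0'
 * that %s always appends after the characters it stores. */
int scan_word(const char *src, char dst[INPUT_SIZE])
{
    return sscanf(src, "%11s", dst);
}
```

With 20 A's as input, `dst` ends up holding 11 A's plus the terminator, which is exactly 12 bytes and stays inside the buffer.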
use scanf_s and strcmp_s to have more control with buffers xD so it doesn't go buffer overflow. strcmp_s is kinda nice. if somehow the buffer doesn't get null-terminated for some odd reason. that might be a bug in the future xD.. so it doesn't compare the whole bytes until it hits 0 somewhere else in the memory. or you can crash because you end up going into pages that have no permission to read xD
Just use the *n* functions like snprintf and strncmp!
Other functions like fgets are also safe from buffer overflows.
As someone in a class going over c this is quite interesting
Actually, the output that you are getting is not an object file. It's an ELF executable. You shouldn't use the .o extension for it, as this might confuse some people if they are trying to follow a tutorial. By default gcc outputs a file named "a.out" and it is an ELF executable. You should keep using that, or at least not use "main.o" as the file name, because that can slow your development down: you need to select the proper file starting with "main.", and doing that isn't fast. Normally you can just press tab until you see that "a.out" is the file you will execute, but with the name "main.o" you have to type the whole file name for it to work.
it doesn't really fix that much, for example program shouldn't take password123AAAAAAAA and print SUCCESS - which it does in this case. How do you deal with this?
would this matter if we are scanning a single character?
The '
switched to cmd, it didn't let me execute the python script, I used open() and wrote the A's and \x00 to the file, then fed the file as input, but it still gives me Fail! ....
That's some quality information!
But what would happen if the password is exactly 12 characters, let's say "password1234", and someone input it "password12345", and since "%12s" would make the program only read the first 12 characters, wouldn't that be a success?
Yes, it would be a success, but the program wouldn't be hacked. The real problem is when you give random input that overwrites the password buffer and still get a success.
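One possible way to avoid accepting a silently truncated input such as "password12345" is to read the whole line and reject it when it does not fit. This is only a sketch; the helper name `read_line_strict` and its return codes are assumptions, not something shown in the video.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf.
 * Returns 1 if the whole line fit (newline stripped),
 *         0 if it was too long (the rest of the line is discarded),
 *        -1 on end of input or error. */
int read_line_strict(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* complete line: strip the newline */
        return 1;
    }

    /* No newline stored: the buffer filled up or input ended. */
    int c = fgetc(in);
    if (c == '\n' || c == EOF)
        return 1;                      /* the line fit exactly */
    while ((c = fgetc(in)) != '\n' && c != EOF)
        ;                              /* drain the rest of the overlong line */
    return 0;
}
```

A password check would then compare with `strcmp` only when the function returns 1, so an overlong attempt fails instead of being cut down to a matching prefix.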
Your explanation was really fantastic
What's the proper way to get user console input? It seems to do super strange things when you type a string where an int is expected (infinite loops, printing some random stuff, or even crashing).
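One commonly suggested answer, sketched here under the assumption that the line has already been read with fgets(), is to convert it with `strtol` and check every failure mode explicitly. The helper name `parse_int` is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a whole line of text to an int.
 * Returns 1 on success, 0 if the text is not a valid in-range integer. */
int parse_int(const char *line, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);

    if (end == line)
        return 0;                               /* no digits at all, e.g. "abc" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                               /* value does not fit in an int */
    while (isspace((unsigned char)*end))
        end++;                                  /* allow trailing spaces/newline */
    if (*end != '\0')
        return 0;                               /* trailing junk, e.g. "12abc" */

    *out = (int)v;
    return 1;
}
```

Because the bad text was consumed as part of the line, a failed conversion cannot leave characters behind to cause an endless loop; the caller simply asks for another line.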
Very nice explanation. I read the entire Stack Overflow post about this topic and didn't understand a thing, but thanks to you the entire scanf vulnerability is clear to me now
Btw why didn't you use scanf_s?
Thank you very much, I was searching for an easy-to-use way to fix this.
i love the "i prefer linux for c" massively juxtaposed by it being wsl
I know this is a bit old but what editor is he using? nv? that's what he typed but i can't find an editor called nv. Maybe nv is an alias?
Very interesting video! Just a question, is there any specific reason for not using the “&” in the scanf() function? I normally see for example scanf(“%s”, &myinput)
value of array variable is address already, so you don't need to use &.
@@impossible2hn614 oh that makes sense, thanks for the explanation!
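A tiny sketch of why the `&` is unnecessary: in most expressions an array name is converted to a pointer to its first element, so `myinput` already has the `char *` that `%s` expects. `&myinput` yields the same address but with the type `char (*)[12]`, which is why compilers tend to warn about it. The function name `same_address` is invented for the example.

```c
/* Returns 1 when buf and &buf refer to the same address.
 * Only the types differ: char * (after array-to-pointer conversion)
 * versus char (*)[12] (pointer to the whole array). */
int same_address(void)
{
    char buf[12];
    return (void *)buf == (void *)&buf;
}
```

For non-array variables such as an `int`, the `&` is still required, because scanf needs an address to write to.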
This is why you should always practice clean coding. Especially with languages like C/C++ or low-level programming. Garbage collection is very important.
Very informative. Appreciated!
Heyy can you do like some C++ projects and other tips, pls :)
Any specific topic ideas?
@@NeuralNine Maybe one using OOP, that could be nice cause im still a beginner in C++
Thanks for the great video; I've had this problem using scanf() trying to get an integer ie. Any key eg. ESC will cause a Segmentation Fault. Can you recommend something for this. Thanks in advance.
Does this problem happen only when we use scanf() to input string type data?
See C can be dangerous in uncareful hands, but also it's been around long enough that all the gotchas are well known.
it doesn't give any stack smashing error and I can't write over also, I'm using dev c++ do you know why?
What compiler, operating system, and STL do you have?
How is it called the
Wow did not know about this, thanks!
Good tutorial man! Question: how did you setup Ubuntu Vim like this in Windows?
My guess is Windows Subsystem (WSL 2.0) to run command line linux. From there the steps are all the same as terminal linux, you go to ~/.config/nvim/init.vim for neovim or ~/.vimrc for vim and edit the file.
He has some additional themes on his terminal too. Eyeballing it, it seems like he is using zsh with powerlevel10k for the fancy looking terminal. Follow the steps exactly on Arch Linux's wiki, it is quite simple if you just do it step by step.
WSL, install Vim on it
Why not insert input like this, instead of python?
echo "text" | ./main.o
It's not so easy to put NULL character in bash (which is necessary to do strcmp)
@@elonmaks3117 I think there's an option echo -e which allows for backslash characters.
sorry u were right, just tested and echo -e with \0 works well!
Hey man, did you make that theme yourself or did you download it?
If you made it yourself, could you pass it on? I would love to make a VS Code theme like that.
Why did you use python to print?
How we can set up scanf format to let users input strings with spaces too?
scanf("%[^\n]") : I think this should work.
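A hedged sketch of the scanset suggestion above, with a field width added so that it cannot overflow. The 20-byte buffer and the helper name `scan_line` are assumptions, and `sscanf` is used instead of `scanf` to keep the example checkable.

```c
#include <stdio.h>

/* Hypothetical helper: copy up to 19 characters of the first line of src
 * into dst[20], spaces included. Returns 1 on success and 0 when the
 * line is empty, because %[ fails if it cannot match at least one char. */
int scan_line(const char *src, char dst[20])
{
    int rc = sscanf(src, "%19[^\n]", dst);
    return rc == 1;
}
```

Two caveats apply to the `scanf` version: the newline itself stays in the input stream after the call, and an empty line makes the conversion fail, so the return value still has to be checked.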
why does it overwrite the next array though?
what would've happened if the neighbouring array was also reading from stdin will buffer overflow still happen?
And what if the length isn’t known in compile time? Just use string concatenation to build the format string?
Use sizeof(CharrArray)
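`sizeof` cannot be written inside the format string itself, so when the size is only known at run time one option is to build the format with `snprintf`, as the question suggests. This is a sketch under that assumption; the helper name `scan_word_n` is hypothetical. fgets() is the simpler alternative, since it takes the size as an ordinary argument.

```c
#include <stdio.h>

/* Hypothetical helper: read one word into buf, whose size is only known
 * at run time, by building a "%<size-1>s" format on the fly.
 * Returns the result of fscanf, or 0 if the buffer is too small. */
int scan_word_n(FILE *in, char *buf, size_t size)
{
    char fmt[32];
    if (size < 2)
        return 0;                               /* no room for any character */
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);  /* e.g. "%7s" for size 8 */
    return fscanf(in, fmt, buf);
}
```

For a true array in scope, the call would be `scan_word_n(stdin, CharArray, sizeof CharArray)`.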
Make these types of videos 🔥 I really enjoy this 🔥
Are buffer overflows or similar user input vulnerabilities present in Java or Python?
No. At least they're not as easy to create. Java and Python have way more going on behind the scenes. C is basically just universal assembly, which is why it doesn't really protect you from things like this.
Good one... you can also use the fgets() function to be on the safer side
Great contents. Subbed.
Alright so if I create a character array like I have posted below. How many bytes should I allocate so it is safe/secure? I'm assuming the name is already 5 bytes but I also have to include another byte for the null byte so that would be 6 bytes. So would that mean the character array and the scanf function should be both 6 bytes?
char firstName[] = "Daniel";
Why such compression ???
this was a great tutorial
a=1
while (a
Error:
Undefined reference to "main"
Line 1. Undeclared variable "a"
Line 1. Expected semicolon.
Line 2. Invalid rvalue.
Line 3. Expected "{".
Line 3. Undefined reference to "print".
Line 3. Expected semicolon.
Line 4. Expected semicolon.
Process returned 1 (0x01)
20 GOTO 10
Yo what is the name of that text editor? It looks like vi but with colors to show what mode the user is on, does anyone know?
Compiling the executable with a dot o extension... yah that's totally normal.
Please create C programing language lessons tutorial..... 🌹🌹🌹🌹🌹💖💖💖💖💖
Great video 💡
Very interesting video. Thank you
Is there any page where I can practice, or detailed information on programming in C?
you can just use this:
#include <stdio.h>
int main(){
char* myinput;
char* mypassword = "mypassword";
scanf("%s", myinput);
. . .
etc.
This never throws an error!
Him not clearing the terminal is bothering me uh.......
hello Neureline how can we create a management stock in python
may I know which font you use for the terminal?
I think it is JetBrains Mono Nerd Font
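Any Koreans here?? So if you just read with %s and print, you get a stack overflow and it spits out the password, so you should use %<size>s instead, is that it??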
I thought you show us how set limit via size_t variable type.
the safest alternative to scanf() is gets()
2023 : Does not work anymore I think compilers have fixed this issue.
Why did he not put the "&" sign in the scanf?
that's the 'address-of' operator
Because arrays work like pointers. They're just pointing to the first element in an array
There is scanf_s as well right?
If you are using MSVC, yes. But keep in mind that Microsoft's implementation of Annex K is non-standard and basically none of the other major implementations supports Annex K in any reliable form. That being said, unless you specifically need a lazy method to parse an input string, don't use scanf()/sscanf(). If all you want to do is read from stdin, just use fgets() or fread().
(I would link to sources here but google's webfilter is being a pain)
What editor are you using?
I guess, It's neovim
Can I use "gets()" to take input ?
Try it and see what ld has to say about that...
@@erikkonstas
its working good.
@@codewithsmoil4098 So you're a troll... great!
@@erikkonstas
Lmao
what I learned: don’t code in c, code in C++
it is great thing to know. tnx
I mean... it's in the name of the function... scanf, scan _formatted_. User input is the least formatted thing there is.🤨
Perhaps typing a little less frantically would produce fewer typos for your viewers to watch you correct.
It's C.
Write your own scanf and then you won't run into this problem.
Wow... what an easy fix! Why the fuck do so many tutorials not include that
We need tutorial for C programming
Very cool!
thanks a billion!
4:25 19 bytes