You seem to be using “CRT” and “C standard library” interchangeably. My understanding is that the CRT is the tiny part of the libc that gets statically linked to your program (even on Linux) and handles program initialization: setting up the stack, clearing BSS, etc. You could live without it in assembly but, unlike the rest of the libc, the CRT is absolutely needed in C programs.
Yeah, I was getting very confused as he mentioned fopen() which is libc. But I've never seen someone refer to libc as the "C runtime library". Personally when I hear "runtime", I think VMs, interpreters, etc.
This was very fascinating to watch and I haven't seen something like this since the 90's when people would do coding competitions for compiling programs with the smallest footprint and that was back when it was the norm for software to be anywhere from 200k to 2MBs especially if it was a windows 95 application. It's crazy how we live in the time of Terabyte drives now and I still manage to fill a drive up to the max mostly because of installing video games. I wish more companies and developers would be mindful about this sort of thing and try to reduce the size of their software but it almost seems like a lost art from a bygone era unless it's designed for an embedded device.
I've written a bare metal kernel in c and assembly. no libraries everything implemented from scratch, also malloc, free and more. Very interesting for learning.
This video is good, but there's two things to consider: 1) In your own implementation of puts(), the parameter declaration should be "const char*" instead of just "char*" or you might get a warning or error from the compiler when passing a string literal (actually the compiler might put the string literal in the non-constant data section which is suboptimal); 2) The memset() function is declared in the "string.h" standard header. This header file declares the intrinsics the compiler uses, so you can safely include it on most platforms. There's also ZeroMemory() and FillMemory() functions (as macros) in the Windows API.
Very nice, my question is: when some external static or dynamic libraries are linked, will they force using the CRT? I'm making a compiler with plans for C interop, and I'm not sure whether it is possible to detect if the CRT is required for a particular compilation.
Yes. If any static lib is pulling in the CRT the project has to link to the CRT as well. Or at least provide missing implementations of the unresolved external symbols that occur (so you then do not pull in the whole CRT but just provide the missing function). I am not sure of a good way to check for inclusion of the CRT but I suspect you could use something like dumpbin.exe and check the external symbol names. As for dynamic libs... I am not too sure as I have not used them much in personal projects (apart from kernel32).
This is much easier to do on Windows than it is on Linux. In fact, you don't want to do this on Linux. It would mean having to implement a whole host of basic features, including many syscalls, dynamic linking and DNS hostname lookups to name a few. Linux doesn't have an intermediate "kernel" library like Windows - glibc IS that library, in fact. This gets really hairy with dynamic linking too, because glibc implements that in a very specific way and you'd basically have to copy the implementation if you want important libraries, like OpenGL or Vulkan, to load properly in your project. IMO, this is probably one of the worst things about Linux - that there is such a hard dependency on the c-standard library.
@knghtbrd While true, this can also lead to issues. Statically linking glibc doesn't also statically link all of its dependencies (it makes heavy use of dlopen). This would mean that your binaries could break if those other dependencies get updated. This is actually one of the main reasons why containers and Docker exist.
I get why you may want to do this for really tiny embedded platforms, but in any other situation, why would you want your code to *not* be portable? Disk space is cheap, but portability to other platforms (and/or compilers) is really valuable.
Thanks man this was awesome! btw I didn't find the music too loud, it is very soft so even if it reached the same loudness your voice always overpowered it. Maybe it can be a little lower and your voice a bit louder, but it wasn't obstructive. I even gave this a prelisten while driving home from work and I still heard it fine.
I still use fopen() to this day because back in the day when I used lower level functions to try to get rid of the C runtime, the speed of the read/write to file were so much slower. That means fopen() might in fact do much more to speed up random access reading. I suggest benchmarking before going through too much trouble. If you don't care about speed (like if you don't read complex files like compressed files), then you might save a lot of those kilobytes! But it is soooo slow to wait for your program to finish processing those files. Same goes with allocation: Getting rid of malloc() for some system level API might make your program slower as well, since malloc() does sub allocation much closer to your own program, which is quicker!
Wow! I really like this video. Rare opportunity to learn more advanced and low level stuff, and how it really works at the OS/compiler level. I would like more videos like this. In the end you didn't show how to call the kernel API, did you? Am I missing sth? Audio was OK for me. I like your calm voice, this is the mood I like to program in the most.
Dani shows great programming knowledge here, I appreciate that. I just don't know if it's worth avoiding the standard libraries nowadays - the popular pragma is to not reinvent the wheel. Therefore I think this may be useful in very specific cases, like when you have an extremely small amount of available memory, or you have some other serious constraints which you can't circumvent. But again, the content is great!
"don't reinvent the wheel" is a good way to be stuck working around using wooden wagon wheels, when you need a tractor tread for your tank, or sticky summer tires for your maserati. or stuck using a swiss army knife when you need a samurai sword, or a screwdriver that won't strip every 5th screw.
Please. Either lose the music or lower the volume to 1%. When the content is good and you don't have an indian accent you dont need any music. The content is great!
Ok, this is really cool! Great video. Quite involved indeed, but many things are not strange to me, except the very specific compiler settings around security and such. However, I did not expect the native Windows libraries to be so compact, because from what I know this is still just the Win32 API layer, which I even use from C# sometimes. I thought the true kernel API is different yet again and also quite undocumented. I think maybe part of the huge binary size issue is that MSVC does not separate C and C++ code very well. From my experience in the past, linking C++ results in bigger binaries than C, but that was with different compilers like GCC, which did a better job, although it still links to this same library as well. Maybe someone else knows more about this?
I was wondering how to do this in WebAssembly without the WebAssembly C Runtime Glue code, and I was too lazy to do my own research (I don't even know the C language) so now I have most of the answer... thanks ✨🤗
I was always very curious how the more involved keygens and patcher tools get their binaries so ridiculously small despite creating visually pleasing ui and even include some music. I know theoretically that they are obviously shedding a lot of statically linked dependencies as well as using packers etc. But its very interesting to see what it actually takes in practice to get an application built and running like this.
I suspect that the array initializer causes the compiler to emit a function call to memset() somewhere below the level where the builtin would normally be applied. If so, that’s arguably a bug if not documented, or a misfeature if it is.
I kept getting this error when compiling:
main.c(7): error C2220: the following warning is treated as an error
main.c(7): warning C4028: formal parameter 3 different from declaration
I don't really understand the error, but I found the fix: replace SIZE_T with uintptr_t.
@blarghblargh From an ergonomic standpoint, the best would probably be to use a macro. Something like
#define ret ExitProcess(
#define urn +0)
then you could do `ret 20 urn;` and it would use 20 as the exit code, or just `ret urn;` to return nothing, which would default to 0.
@adamp9553 true. I wasn't concerned about the performance with that comment. I was wondering whether you would ever use the result variable in that function. Would you maybe free all the left-over allocs and close file handles in there?
Once you realize all the crap under the hood is just more programs, it gets easier 🙂 still needlessly obscure though due to our modern society's infatuation with surface appearances
I love both Rust and Zig, but I also have my complaints. So I am making my own compiled LLVM-based language! It will have all the safety of Rust, but all the crazy features of Zig! Whilst also having a package manager for dependencies, but having those dependencies stored in such a way where it is fine if the package registry gets nuked.
Theoretically it should be possible to create a standard C library for WebAssembly and compile and link C code to wasm, then use such a file in the web browser as a replacement for JavaScript.
The amount of hoops people have to jump through just to get a non bloated executable is staggering. IIRC there are also a lot of hoops to jump through with the math library
@DaniCrunch I think lumberjackdreamer understood this. His question is, how about defining the variable WITHOUT initialisation (hence without ={0}) and providing the trailing zero byte (in order to have a proper zero terminated string) with data[read]=0; (after checking that read > 0).
The linker ought to only statically link code that's actually used, maybe the Microsoft compiler just isn't as good as gcc in this respect! I'm working with an embedded riscv platform and I don't really feel it's worth it to omit the libc, having printf/fprintf/sprintf is far too useful. Even with these things, size reports my final executable (compiled and linked with gcc, using newlib for the libc) with various stdio/stdlib functions used is only 8k, including the bss and data sections.
(The "music" at the b/g made me jump: I thought it was somebody's ringtone. Anyway, it makes it harder to concentrate. If some people need it, they can launch their own media player themselves.)
Why people confuse CRT and stdlib? C runtime is responsible for calling main function, while C standard library contains functions from standard like malloc, printf and others.
@user-sb5vt8iy5q it shouldn't even be 100 KB. Did he forget to add the strip-symbols flag for the compilation? (it's -s on GCC, don't know what it is on cl.exe). It could easily reduce the size from 100 KB to 20 KB
@user-sb5vt8iy5q 97% waste is bloat. modern programmers often fail to understand this, hence we're running around with supercomputers struggling to launch our applications
@user-sb5vt8iy5q It's not about how many KB it is but that this program literally does almost nothing and uses 120 KB (where 115 KB doesn't contribute to what it does in any meaningful way).
Can you please adjust the audio levels for the next video? The speech is below -19 dB, and even at maximum volume in Windows, on my speakers, and in YouTube, it's very difficult to hear.
Bruh, it's Windows: you're linking against the stable kernel32 lib, which is actually a bridge to the underlying NtKernel, and that is what actually deals with the syscalls. If Microsoft changes a syscall (which usually happens between Windows versions), they just update NtKernel and point the kernel32 entries at it, so code linked against kernel32 will continue to work. That's why Windows is more backward compatible than bare Linux syscalls
Hah! I did this in 1984 because I was developing a "bare metal" PC co-processor board with no OS of its own. I'm a bit surprised at the code size you ended up with. 4k seems high.
I don’t think I utilised all compiler flags to trim out unused Windows functionality from the final executable but not too sure. I have seen some blog posts that go way harder on trying to make it as small as possible (mostly demoscene related) but I really just wanted to focus on removing the c lib and getting it to compile in this video. I think the smallest possible windows executable is around 300 bytes. But that won’t do anything haha
On Linux there is a trim program to delete all the unused functions, which reduces the size of the executable. Is it actually useful to trim all unused CRT functions?
The point of the C runtime is to facilitate portability... if the program needs to be lean and mean I think it is more reasonable to use assembly language rather than the Windows API.
Hmm... you should look into how to write NT Native programs. Early boot processes and services do not use the C runtime. You need to use the DDK to compile these.
Really enjoyed this video, thanks! I would love to see something about how those .bat files work, perhaps in comparison to bash. I've never heard anyone speak about it
I was paying attention to the bat file. It was pretty simple to follow. %VARIABLE% is essentially equivalent to $VARIABLE in bash, otherwise it was pretty straightforward. I liked the use of pushd, I didn't know it was available for cmd.exe. That's way more switches than I usually pass to gcc though. Not going to lose sleep over a few hundred K these days.
just one tiny person's perspective but I wasn't able to watch this (and I would have loved to!) because the text is way too small. In my case I can only find time to watch stuff like this for a few minutes before I go to sleep and therefore it's too small to see on my phone. thanks anyway! hope to see more and I subscribed anyway
I see what I can do about that going forward. I can probably zoom in much more on the parts of the code I am talking about instead of trying to have the whole text editor on screen. I kind of liked it but I haven’t considered smaller screen sizes at all. Thanks for the feedback!
@DaniCrunch when I make videos for Michigan TypeScript I usually aim for between 20 and 25 vertical lines on the screen. That seems to be enough. 30 works okay too if you're in Zen mode or some editor mode that's full screen with no other stuff taking up space
@blarghblargh On Linux, it's glibc and a crt lib that starts up the program. I suspect the crt name (a name I have never understood) is the root of the msvcrt lib name, and I think it is possible that msvcrt is the startup code and the libc combined. Hard to confirm, since it is closed source. On Linux, glibc is its own project. For Windows, MinGW just uses the msvcrt lib.
I guess for web developers, this is the native equivalent of writing node scripts that is run outside of a browser 😁 But this is more just for the fun of making a tiny executable rather than a sensible optimisation strategy (outside of very special cases such as writing operating systems or embedded stuff), right? 😅 In a tiny simple app you can of course; as you demonstrate, save a bit of executable size; but surely you could also shrink it quite a lot by using a compiler that optimises away unused code in the runtime libraries? (I mean if even the build tools of the rather bloated world of javascript is somewhat able to do dead code elimination, surely a good C compiler is also able to do so?) Or is it just that this runtime is so fundamental that in practice you'd always either include it or dynamically link it since in a more substantial program the runtime size will be negligible anyway, so compilers don't bother pruning it? Also if your app uses so little of the standard runtime; would it even be much if any issue with having the correct dynamically linked library, wouldn't practically _any_ version support the same fopen and memory operations identically?
Rust needs to include its backtrace lib for nice error messages, and other "safe" wrappers. You can get rid of those and write entirely unsafe code, which is almost the same as writing plain C
Apologies for the bad audio quality. Unfortunately there is nothing I can change about that now. I would suggest watching with subtitles turned on. At least the auto generated ones seem to be decent enough. My newest video already has a lot better audio mixing and going forward I will put a lot more care into making sure that it isn't all over the place. Thanks, and have a great day
Kind of fits you though
sounds clear to me, the background music is a bit tedious but can hear every word
I think this video is interesting enough that it doesn't need any background music.
Yes. And his way of speaking is pretty calm and clear.
Idk i like his taste of music it sets up a good atmosphere for me
At least the level of the background music is too high and this way distracts from the interesting stuff.
It’s a decent touch
I think the music is nice, the video is better with it than it would be without. Also, don't think it's loud
I've written code in C for very small embedded systems (a few KB of ROM and 256 bytes of RAM). In these systems, there was no OS and no file system whatsoever. There was also no dynamic memory allocation, so no need for malloc and free. All I/O was done via the physical GPIO pins or a few peripherals (UART, ADC, etc). Needless to say, there was no runtime library, just processor-specific initialization code in assembly that zeroed out areas of the RAM and set the stack pointer before calling main(). Returning from main() would end up in an infinite loop, which would cause the watchdog timer to reboot the processor. I was very surprised when I switched to a more modern processor that had enough memory that including the runtime library allowed such luxuries as a serial debug console!
That’s what I’m doing now! Good times
Did you work on a CIA spy drone disguised as a fly? Seems like a tiny device lol
@XeenimChoorch-nx8wx The end products were simple computer peripherals like keyboards and mice. This was 20 years ago or so. Nowadays, even those simple devices run on 32-bit processors (ARM Cortex-M0 cores or similar) at speeds much greater than my first PC did!
Rare occasion to learn about Windows. Great job.
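For readers who have not seen such startup code, the sequence described at the top of this thread (zero the RAM, then call main()) can be sketched in plain C. This is only an illustration under assumptions: the names reset_handler_sketch, zero_region and demo_main are hypothetical, the RAM bounds are passed as parameters rather than taken from a linker script, and the stack pointer setup (done in assembly on the real parts) is left out.

```c
#include <stddef.h>

/* Hypothetical type for the application's entry point (the real main()). */
typedef int (*entry_fn)(void);

/* Zero the byte range [start, end), as startup code does for RAM that
 * C expects to read as 0. The volatile pointer keeps the compiler from
 * replacing the loop with a call into a runtime library. */
static void zero_region(volatile unsigned char *start,
                        volatile unsigned char *end)
{
    while (start < end) {
        *start++ = 0;
    }
}

/* Stand-in application used only to exercise the sketch. */
int demo_main(void)
{
    return 7;
}

/* Sketch of what runs before main(): clear RAM, then hand over control.
 * On the real target the code would spin in an infinite loop after the
 * call returns, until the watchdog resets the chip; here the result is
 * returned so the function can be tried on a desktop machine. */
int reset_handler_sketch(unsigned char *ram_start, unsigned char *ram_end,
                         entry_fn app_main)
{
    zero_region(ram_start, ram_end);
    return app_main();
}
```

Calling reset_handler_sketch(buf, buf + sizeof buf, demo_main) on an ordinary array clears the array and returns whatever demo_main returns.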
I do all my C++ programming directly against Win32.
I'd always heard about demosceners excluding the CRT in their perpetual quest for smaller and smaller executable sizes, but I never actually found out how they did that. So your video was an instant click for me. Thanks for putting this all together.
This will be very helpful. I've been experimenting with writing my own OS-like code that can open EXE files and get them running, so I'd always like to prevent any extra code I didn't write myself from getting in the way of me following and understanding the execution flow.
I've seen the "no default libs" option before but never knew how to get past all the missing function calls.
Glad it was of use
12:30 - 10:1 odds that your memset function's loop is being optimized by the compiler into ... a memset call. This is a common compiler optimization, because the compiler has a highly optimized variant that it can generate and include in your binary. A bit strange that it also requires an external symbol to link to.
Looping over an array and initializing its elements is a common pattern in some code bases, so the compiler may choose to translate all memset-equivalent code into `call memset`, which works great in every case except when you're implementing memset.
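One common way around the problem described in these two comments is to write the fill loop through a volatile pointer, so the optimizer cannot recognise it as a memset idiom. The sketch below is a swapped-in technique, not what the video does; the name tiny_memset is hypothetical, chosen so it does not clash with a real memset. Compiler-specific switches are another route worth checking in the respective documentation (for example `#pragma function(memset)` on MSVC, or `-fno-tree-loop-distribute-patterns` on GCC).

```c
#include <stddef.h>

/* Byte-wise fill that the compiler may not turn back into a memset call:
 * every store goes through a volatile pointer, so each one has to be
 * emitted individually. Slower than a tuned memset, but self-contained. */
void *tiny_memset(void *dest, int value, size_t count)
{
    volatile unsigned char *p = (volatile unsigned char *)dest;

    while (count--) {
        *p++ = (unsigned char)value;
    }
    return dest;
}
```

The trade-off is speed: the volatile stores rule out vectorisation, which is usually acceptable for the small buffers in a size-focused build.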
it's fairly easy in Linux just to make syscalls directly from C, so you can just remove the standard runtime library as a flag in gcc and directly make syscalls for things you need. nice thing is the man pages will tell you exactly how to do it for any architecture.
Do you also know how that is with MinGW-w64?
And what I do know is that Rust also has a no std option and that it's quite straightforward. That specific part of Rust is what I yet have to explore though.
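As a rough illustration of the direct-syscall approach mentioned at the top of this thread, here is a minimal sketch for Linux using GCC/Clang inline assembly. It assumes x86-64 or AArch64 only; the names raw_syscall3 and raw_write are hypothetical, and the syscall numbers are hard-coded for those two architectures (1 and 64 for write) instead of being taken from a header.

```c
#include <stddef.h>

#if !defined(__linux__)
#error "This sketch assumes Linux"
#endif

#if defined(__x86_64__)
#define WRITE_NR 1L   /* write(2) on Linux x86-64 */
#elif defined(__aarch64__)
#define WRITE_NR 64L  /* write(2) on Linux AArch64 */
#else
#error "This sketch covers x86-64 and AArch64 only"
#endif

/* Issue a three-argument system call without going through libc.
 * Returns the raw kernel result (negative errno on failure). */
static long raw_syscall3(long nr, long a, long b, long c)
{
#if defined(__x86_64__)
    long ret;
    /* The syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a), "S"(b), "d"(c)
                      : "rcx", "r11", "memory");
    return ret;
#else
    register long x8 __asm__("x8") = nr;
    register long x0 __asm__("x0") = a;
    register long x1 __asm__("x1") = b;
    register long x2 __asm__("x2") = c;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#endif
}

/* Write len bytes from buf to file descriptor fd. */
long raw_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(WRITE_NR, (long)fd, (long)buf, (long)len);
}
```

For example, raw_write(1, "hi\n", 3) sends three bytes to standard output. A fully runtime-free program would also need its own entry point and an exit syscall, which this sketch leaves out.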
Most compilers have the ability to strip out unused code from statically linked libraries (certainly gcc does). Because of this, I usually don't try to optimize away from using a C runtime library entirely (as you mentioned, you give up some convenience, as well as security and portability). So I include it, and get the compiler and linker to remove the parts of the runtime that's not being used. The result is a drastically thinned down executable, still written with the convenience of using the C runtime, but only actually including the parts of it you actually used. Sort of like writing your own replacement functions, without having to actually do that :-) And it can easily get "hello world" style code down to a few KB.
could you share the commands you use to achieve that, and the results in terms of KB?
gcc-strip
Dunno, but the default GCC output for such a simple console Win app is about 10 kB, and TinyC compiler output is 3.5 kB. So for me I don't think I'd bother.
That'll be because the CRT is linked dynamically, and so you end up with the dependency fun he mentioned at the start.
Wow, I wrote an article on that a decade ago. It was hell for me to get it to run. So much effort you put into this video. Good job
Great video! I'm curious if you use -Os to optimize for size. Could you get the executable even smaller?
For this I used -O2. I usually want to put focus on speed but would be indeed interesting to see how small it would get when using size optimisations.
Depending on the code being compiled and the compiler chosen, some optimizations actually _increase_ the size of machine code. This can be caused by the added performance benefit of inlining short functions among other things, which increases the binary size to the benefit of execution performance.
I once wrote a hello world program on Windows that used the NtWriteFile system call to print, which is convenient since NTDLL is guaranteed to be loaded into any process. By walking the PEB you can find its address in your address space, and walk its exports to find NtWriteFile (to avoid relying on fixed syscall numbers) and can print text without any explicit DLL imports. Got that working in 1.50KB (could probably make it smaller if you messed with alignments and such)
@declanmoore Interestingly, in NTFS each file has a 1 KB record in the MFT (Master File Table), which is used to store the content of really small files. It would be impressive to get down to a size that fits into that record and displays 0 bytes used on disk. Once the file's data no longer fits in the 1 KB record, it requires at least a single cluster on the filesystem, which usually means an additional 4 KB. Testing on a txt file, I could get a little over 700 bytes/characters before it got assigned a cluster.
I tried the example from this video: /O1 generates a 3 KB binary while /Os generates a 4 KB binary.
At the risk of being pedantic, the data[] array should be sized as DATA_SIZE + 1. Otherwise, if you read in your data file of size DATA_SIZE and it does not contain a '\0', then the puts(data) could try to read past the end of data[], potentially causing a seg fault. --That Guy
Yeah the example code used was just about good enough for what I was trying to show in the video. I knew the input file size so it just worked out haha
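The off-by-one raised above can be sketched as follows. This is only an illustration using standard C file I/O rather than the video's code; DATA_SIZE is given an assumed value of 8 and the function name read_as_string is hypothetical. The buffer has one extra byte, and the terminator is placed using the number of bytes actually read, so the buffer does not need to be zeroed first.

```c
#include <stddef.h>
#include <stdio.h>

#define DATA_SIZE 8  /* assumed value, for illustration only */

/* Read at most DATA_SIZE bytes from fp into data and always terminate
 * the result, so a later puts(data) cannot run past the buffer.
 * data must have room for DATA_SIZE + 1 bytes.
 * Returns the number of bytes read. */
size_t read_as_string(FILE *fp, char data[DATA_SIZE + 1])
{
    size_t n = fread(data, 1, DATA_SIZE, fp);

    data[n] = '\0';  /* n <= DATA_SIZE, so this index is always in bounds */
    return n;
}
```

With a file that holds DATA_SIZE bytes or more, the terminator lands in the extra last slot, which is exactly the case the original comment warns about.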
During the days of Microsoft C 6.0 this was the norm. It was well known that Microsoft assigned interns and other lightweights to write the C library code and the quality was garbage. Thus every ISV had their own highly optimized C library. 🤷🏻‍♂️
Instant subscribe. The world of these very low level innards is murky, and documentation tends to be sparse, so anything like this is most welcome to me. (I need to know these very low level innards, but that's a long story :-)
Then you should definitely check out handmade hero. Thank me later :D
good video, but very quiet! youtube reports a volume of -18.9 dB, where regular videos are around -5 dB or so
Maybe it was mentioned, but as I recall there is a Windows API for the memory setting in your example called ZeroMemory. So it could replace the memset you implemented, and the executable would have been a little smaller as well.
ZeroMemory sets the memory to zeros, while this implementation of memset sets it to an arbitrary value.
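For anyone following along, the difference is easy to see in code. This is only a minimal sketch of a byte-wise fill routine of the kind the video writes in place of memset. The names fill_bytes and zero_bytes are made up here, and this is not the video's exact code. In a real no-CRT MSVC build the function would presumably have to be named memset so that the call the compiler generates can resolve.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hand-written memset: fills `count` bytes
   starting at `dest` with the low byte of `value` and returns `dest`. */
void *fill_bytes(void *dest, int value, size_t count)
{
    unsigned char *p = (unsigned char *)dest;
    while (count--) {
        *p++ = (unsigned char)value;
    }
    return dest;
}

/* Zeroing is just the special case value == 0, which is all that a
   ZeroMemory-style call offers. */
void *zero_bytes(void *dest, size_t count)
{
    return fill_bytes(dest, 0, count);
}
```

As far as I know, ZeroMemory in the Windows SDK headers is itself a macro that ends up as a memset call with 0 as the value, so it probably would not remove the dependency on memset anyway.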
Couldn't you just avoid zeroing out the buffer and use the number of bytes read instead of strlen to find the end?
Yeah I think memory is 0 set by default in windows too
The music at the beginning was a bit too loud. Otherwise, it was a great video
Edit: As for the missing memset intrinsic function. Have you tried assembling the memset.asm file and linking it directly with the executable? It's located in: PathToVisualStudio\VC\Tools\MSVC\[version]\crt\src\x64\ folder
Edit2: I don't think it's possible because memset.asm defines some external symbols that are provided by the c runtime. You would probably have to reimplement vcruntime_startup and some other files to get it to work
I love the whole conversation you had with yourself 😂 thanks for leaving the traces though, super instructive! Thanks
That’s what programming is- conversations with oneself xD
does the memset dilemma have something to do with microsoft porting stuff from ucrt to msvcrt, but due to legacy reasons some of the functions have to remain in the ucrt?
probably they didn't test for compiling without CRT which you are not supposed to do anyways, and that's the reason this bug was never fixed.
@@maksymiliank5135 it's just a default library. the C standard doesn't say "you better compile with libc, or else!"
yes, it's expert mode, because there's a lot of features they pack in along with it, and a lot of C standard libraries that won't work without it, or without a local equivalent. but in embedded or portable programs there can be very good reasons to avoid linking the CRT
Nice video! It reminds me of when I was working on my operating system project and hadn't fully implemented the CRT, just enough to get it going :)
The implementation closely resembles opting for direct register execution over loading a library onto the Arduino. It encounters the same challenges, where including a library results in consuming more SRAM and EEPROM space. Another approach involves bare-metal coding using languages like C, C++, or Rust, manipulating PIND or DDRD registers.
You seem to be using “CRT” and “C standard library” interchangeably. My understanding is that the CRT is the tiny part of the libc that gets statically linked to your program (even on Linux) and handles program initialization: setting up the stack, clearing BSS, etc. You could live without it in assembly but, unlike the rest of the libc, the CRT is absolutely needed in C programs.
Yeah, I was getting very confused as he mentioned fopen() which is libc. But I've never seen someone refer to libc as the "C runtime library". Personally when I hear "runtime", I think VMs, interpreters, etc.
You've now got me intrigued in doing something similar in Linux, which, i feel would be much easier than doing it on windows somehow
Done: (Admittedly, not as lean as his, but no c runtime on Linux)
// compile: gcc -nostdlib -nostartfiles -static -o hello hello.c
#define SYS_WRITE 1
#define SYS_EXIT 60
#define STDOUT 1
void _start(void) {
    // static const: no stack copy of the string (x86-64 Linux only)
    static const char msg[] = "Hello, World!\n";
    long len = sizeof(msg) - 1;
    long ret;
    asm volatile (
        "syscall"
        : "=a" (ret) // Return value (rax)
        : "a" (SYS_WRITE), // Syscall number (rax)
          "D" (STDOUT), // File descriptor (rdi)
          "S" (msg), // Message to write (rsi)
          "d" (len) // Message length (rdx)
        : "rcx", "r11", "memory" // Clobbered by the syscall instruction
    );
    (void)ret;
    asm volatile (
        "syscall"
        : // No output operands
        : "a" (SYS_EXIT), // Syscall number (rax)
          "D" (0) // Exit status (rdi)
        : "rcx", "r11", "memory"
    );
    asm volatile("hlt"); // Not reached: exit does not return
}
If anyone is curious, I compiled the 'Example app with C Runtime' program on Linux (Arch BTW) using the command 'gcc src/main.cpp'.
The result is 15K.
This was very fascinating to watch and I haven't seen something like this since the 90's when people would do coding competitions for compiling programs with the smallest footprint and that was back when it was the norm for software to be anywhere from 200k to 2MBs especially if it was a windows 95 application. It's crazy how we live in the time of Terabyte drives now and I still manage to fill a drive up to the max mostly because of installing video games. I wish more companies and developers would be mindful about this sort of thing and try to reduce the size of their software but it almost seems like a lost art from a bygone era unless it's designed for an embedded device.
Modern bloatware will include several hundred megabytes of frameworks just for a single function lol
I've written a bare metal kernel in c and assembly. no libraries everything implemented from scratch, also malloc, free and more. Very interesting for learning.
You have a git repo? I’d be interested to see it. Won’t copy it I promise lol
This video is good, but there's two things to consider: 1) In your own implementation of puts(), the parameter declaration should be "const char*" instead of just "char*" or you might get a warning or error from the compiler when passing a string literal (actually the compiler might put the string literal in the non-constant data section which is suboptimal); 2) The memset() function is declared in the "string.h" standard header. This header file declares the intrinsics the compiler uses, so you can safely include it on most platforms. There's also ZeroMemory() and FillMemory() functions (as macros) in the Windows API.
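To illustrate point 1, here is a minimal sketch of the length helper a hand-rolled puts() would need, taking const char* so that string literals and read-only data can be passed without warnings. The name text_length is hypothetical and not taken from the video. The actual output call (WriteFile on Windows, presumably) is left out.

```c
#include <stddef.h>

/* Hypothetical helper: counts the bytes before the terminating '\0'.
   The parameter is const char* because the function only reads the text,
   which also lets callers pass string literals directly. */
size_t text_length(const char *text)
{
    const char *end = text;
    while (*end != '\0') {
        ++end;
    }
    return (size_t)(end - text);
}
```

A puts() replacement would then hand the pointer and text_length(text) to whatever write call the platform provides.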
Very nice. My question is: when external static or dynamic libraries are linked, will they force the use of the CRT? I'm making a compiler with plans for C interop, and I'm not sure whether it is possible to detect if the CRT is required for a particular compilation.
Yes. If any static lib is pulling in the CRT the project has to link to the CRT as well. Or at least provide missing implementations of the unresolved external symbols that occur (so you then do not pull in the whole CRT but just provide the missing function). I am not sure of a good way to check for inclusion of the CRT but I suspect you could use something like dumpbin.exe and check the external symbol names. As for dynamic libs... I am not too sure as I have not used them much in personal projects (apart from kernel32).
Excellent video, Dani. Thanks!
This is much easier to do on Windows than it is on Linux. In fact, you don't want to do this on Linux. It would mean having to implement a whole host of basic features, including many syscalls, dynamic linking and DNS hostname lookups to name a few. Linux doesn't have an intermediate "kernel" library like Windows - glibc IS that library, in fact. This gets really hairy with dynamic linking too, because glibc implements that in a very specific way and you'd basically have to copy the implementation if you want important libraries, like OpenGL or Vulkan, to load properly in your project. IMO, this is probably one of the worst things about Linux - that there is such a hard dependency on the c-standard library.
There are alternatives (much smaller) for glibc on Linux, but it only makes sense to use them if you're building a static binary.
This is just misinformation. You don't even need a C library to build a working application on Linux. I'd guess you've never heard the word syscall.
@@anon_y_mousse Did you even read my comment??? Clearly not.
@@knghtbrd While true, this can also lead to issues. Statically linking glibc doesn't also statically link all of its dependencies (it makes heavy use of dlopen). This would mean that your binaries could break if those other dependencies get updated. This is actually one of the main reasons why containers and Docker exist.
@@delicious_seabass The point is that syscalls don't require what you're claiming they do. They're extremely easy to use.
How to create one of the most unsafe programs ever written :) Joking aside, that is really great!
I get why you may want to do this for really tiny embedded platforms, but in any other situation, why would you want your code to *not* be portable? Disk space is cheap, but portability to other platforms (and/or compilers) is really valuable.
For better understanding how C compiling works ;-)
Thanks man, this was awesome! btw I didn't find the music too loud. It is very soft, so even when it reached the same loudness your voice always overpowered it. Maybe it can be a little lower and your voice a bit louder, but it wasn't obstructive. I even gave this a prelisten while driving home from work and I still heard it fine.
I still use fopen() to this day because back in the day when I used lower level functions to try to get rid of the C runtime, the speed of the read/write to file were so much slower. That means fopen() might in fact do much more to speed up random access reading. I suggest benchmarking before going through too much trouble. If you don't care about speed (like if you don't read complex files like compressed files), then you might save a lot of those kilobytes! But it is soooo slow to wait for your program to finish processing those files. Same goes with allocation: Getting rid of malloc() for some system level API might make your program slower as well, since malloc() does sub allocation much closer to your own program, which is quicker!
fopen() is just a wrapper around the system call to open()… you should be able to get the same performance just writing your own call
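One likely explanation for the slowdown described above is buffering: the stdio functions behind fopen() keep a user-space buffer, so many small reads turn into a few large OS calls. Below is a rough sketch of that idea, not how any particular C library implements it. All names (raw_read_fn, buffered_reader, memory_source and so on) are hypothetical, and the in-memory source only stands in for ReadFile() or read() so the number of raw calls can be counted.

```c
#include <stddef.h>

/* Hypothetical raw-read hook standing in for ReadFile() or read():
   copies up to `max` bytes into `dst`, returns bytes copied, 0 at end. */
typedef size_t (*raw_read_fn)(void *ctx, unsigned char *dst, size_t max);

typedef struct {
    raw_read_fn raw_read;
    void *ctx;
    unsigned char buf[4096];
    size_t pos; /* next unread byte in buf */
    size_t len; /* number of valid bytes in buf */
} buffered_reader;

void reader_init(buffered_reader *r, raw_read_fn fn, void *ctx)
{
    r->raw_read = fn;
    r->ctx = ctx;
    r->pos = 0;
    r->len = 0;
}

/* Returns the next byte as 0..255, or -1 at end of input.
   The raw read is only issued when the buffer has been used up. */
int reader_next_byte(buffered_reader *r)
{
    if (r->pos == r->len) {
        r->len = r->raw_read(r->ctx, r->buf, sizeof r->buf);
        r->pos = 0;
        if (r->len == 0) {
            return -1;
        }
    }
    return r->buf[r->pos++];
}

/* Demo source: reads from memory and counts how often it is called. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t offset;
    size_t calls;
} memory_source;

size_t memory_read(void *ctx, unsigned char *dst, size_t max)
{
    memory_source *src = (memory_source *)ctx;
    size_t left = src->size - src->offset;
    size_t n = left < max ? left : max;
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src->data[src->offset + i];
    }
    src->offset += n;
    src->calls += 1;
    return n;
}
```

Reading 10,000 bytes one at a time through reader_next_byte() costs only a handful of raw reads with the 4096-byte buffer, where an unbuffered loop would issue one raw call per byte.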
Wow! I really like this video. Rare opportunity to learn more advanced and low level stuff. How it really works at the OS/compiler level. I would like more videos like this. In the end you didn't show how to call Kernel API, did you? Am I missing sth? Audio was OK for me. I like your calm voice, this is the mood I like to program in the most.
Dani shows great programming knowledge here, I appreciate that. I just don't know if it's worth avoiding the standard libraries nowadays - the popular maxim is to not reinvent the wheel. Therefore I think this may be useful in very specific cases, like when you have an extremely small amount of available memory, or you have some other serious constraints which you can't circumvent.
But again, the content is great!
Well, modern programs also aren't very good, so doing the opposite of what everyone is doing might be a good idea.
"don't reinvent the wheel" is a good way to be stuck working around using wooden wagon wheels, when you need a tractor tread for your tank, or sticky summer tires for your maserati.
or stuck using a swiss army knife when you need a samurai sword, or a screwdriver that won't strip every 5th screw.
Please. Either lose the music or lower the volume to 1%.
When the content is good and you don't have an indian accent you dont need any music.
The content is great!
Ok, this is really cool! Great video. Quite involved indeed but many things are not strange to me, except the very specific compiler settings around security and things.
However I did not expect native Windows libraries to be so compact in nature actually, because from what I know this is still just the Win32 API layer, which I even use from C# sometimes. I thought the true kernel API is different yet again and also quite undocumented.
I think maybe part of the huge binary size issue is that MSVC does not separate C and C++ code very well. From my experience in the past, linking C++ results in bigger binaries than C, but that was with different compilers like GCC, which did a better job, although still links to this same library as well. Maybe anyone else knows more about this?
Great and informative content. Thank you for the time and effort invested 🙏
Would be cool to see the same walkthrough for a linux binary
there's similar channel which does this: Nir Lichtman
Thanks for the video. Please reduce the volume of the background music. Thankfully it ends @ ~2:25 but unfortunately, it returns later.
I was wondering how to do this in WebAssembly without the WebAssembly C runtime glue code, and I was too lazy to do my own research (I don't even know the C language), so now I have most of the answer... thanks ✨🤗
Love to see such things. Can you make it even smaller by implementing the functionality using assembly language?
I was always very curious how the more involved keygens and patcher tools get their binaries so ridiculously small despite creating a visually pleasing UI and even including some music. I know theoretically that they are obviously shedding a lot of statically linked dependencies as well as using packers etc. But it's very interesting to see what it actually takes in practice to get an application built and running like this.
I suspect that the array initializer causes the compiler to emit a function call to memset() somewhere below the level where the builtin would normally be applied. If so, that’s arguably a bug if not documented, or a misfeature if it is.
I kept getting this error when compiling
main.c(7): error C2220: the following warning is treated as an error
main.c(7): warning C4028: formal parameter 3 different from declaration
I don't really understand the error, but I found the fix.
replace SIZE_T with uintptr_t
we need to see your code
@@plato4ek I downloaded the source from the description and ran it. I got the same error. How do I send my code?
Hi, thanks for this great video!
Can't you just write "ExitProcess(main());"?
Isn't your memset implementation way slower than normal memset?
A decent compiler will optimize temporary variable usage in the same way, with likely identical code to "ExitProcess(main())".
@@adamp9553 Yeah, but it'd still be better to just make `main` the entry point and call ExitProcess() instead of return.
@@anon_y_mousse arguable
@@blarghblargh From an ergonomic standpoint, the best would probably be to use a macro. Something like #define ret ExitProcess( #define urn +0) then you could do ret 20 urn; and it would use 20 as the exit code, or just ret urn; to return nothing which would default to 0.
@@adamp9553 true. I wasn't concerned about the performance with that comment. I was wondering whether you would ever use the result variable in that function. Would you maybe free all the left-over allocs and close file handles in there?
I will never not be amazed by programmers, this is the kind of code I wish I could make
Once you realize all the crap under the hood is just more programs, it gets easier 🙂 still needlessly obscure though due to our modern society's infatuation with surface appearances
@@XeenimChoorch-nx8wx I find it unfortunate it is like that
I love both Rust and Zig, but I also have my complaints. So I am making my own compiled LLVM-based language! It will have all the safety of Rust, but all the crazy features of Zig! Whilst also having a package manager for dependencies, but having those dependencies stored in such a way where it is fine if the package registry gets nuked.
Me too, what's the name of urs?
@@cyrilemeka6987 I am still thinking.
@@cyrilemeka6987 I am still deciding.
great video, very useful. Any ideas on linux equivalents for opening files and puts? debian eg.
I'd like to see this as well.
open() and write(), but without glibc or equivalent library providing a C wrapper you may need to write architecture-specific asm to call them.
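For the curious, here is roughly what that architecture-specific part can look like. This is a sketch for x86-64 Linux only, using GCC/Clang inline asm. The wrapper names (raw_syscall3, raw_open, raw_write, raw_close) are made up, and the syscall numbers are the x86-64 ones, so other architectures need different numbers and registers.

```c
/* x86-64 Linux syscall numbers (they differ on other architectures). */
#define SYS_WRITE 1
#define SYS_OPEN  2
#define SYS_CLOSE 3

/* Hypothetical helper: issues a syscall with up to three arguments.
   On failure the kernel returns a negative errno value in rax. */
static long raw_syscall3(long number, long a, long b, long c)
{
    long result;
    __asm__ volatile (
        "syscall"
        : "=a"(result)
        : "a"(number), "D"(a), "S"(b), "d"(c)
        : "rcx", "r11", "memory" /* clobbered by the syscall instruction */
    );
    return result;
}

/* Thin stand-ins for open(), write() and close() without any libc. */
long raw_open(const char *path, long flags, long mode)
{
    return raw_syscall3(SYS_OPEN, (long)path, flags, mode);
}

long raw_write(long fd, const void *data, unsigned long size)
{
    return raw_syscall3(SYS_WRITE, fd, (long)data, (long)size);
}

long raw_close(long fd)
{
    return raw_syscall3(SYS_CLOSE, fd, 0, 0);
}
```

A puts() equivalent would then be raw_write(1, text, length) followed by a newline, with 1 being the file descriptor for standard output.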
Can you do the same on Linux?
Theoretically it should be possible to create standard c library for web assembly and compile and link c code to wasm, then use such file in web browser as replacement for JavaScript.
I think that's what emscripten does
The amount of hoops people have to jump through just to get a non bloated executable is staggering
IIRC there are also a lot of hoops to jump through with the math library
Do you really need the memset()?
You could just zero terminate the data.
data[read] = 0;
I do zero the array but the compiler is replacing that code with a call to memset during compilation.
@@DaniCrunch I think lumberjackdreamer understood this.
His question is, how about defining the variable WITHOUT initialisation (hence without ={0}) and providing the trailing zero byte (in order to have a proper zero-terminated string) with data[read]=0; (after checking that read > 0).
@@DaniCrunch
In the code you zero the whole array. That’s a waste of time, you only need to zero the last element of the array.
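A small sketch of what is being suggested in this thread, under a few assumptions: DATA_SIZE is set to an arbitrary value here (the video's value is not given in these comments), the helper name terminate_after_read is made up, and the buffer is sized DATA_SIZE + 1 as proposed in an earlier comment so the terminator always fits.

```c
#include <stddef.h>

#define DATA_SIZE 16 /* hypothetical value for illustration */

/* Writes the '\0' right after the bytes that were actually read.
   `data` must have room for DATA_SIZE + 1 bytes. Returns the resulting
   string length. No memset, and no "= {0}" initialiser, is needed. */
size_t terminate_after_read(char *data, size_t bytes_read)
{
    if (bytes_read > DATA_SIZE) {
        bytes_read = DATA_SIZE; /* defensive clamp, should not happen */
    }
    data[bytes_read] = '\0';
    return bytes_read;
}
```

The caller would declare char data[DATA_SIZE + 1]; without an initialiser, ask the OS for at most DATA_SIZE bytes, and pass the reported byte count to this helper. A count of 0 is fine too and just yields an empty string.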
thanks for this, very informative
The linker ought to only statically link code that's actually used, maybe the Microsoft compiler just isn't as good as gcc in this respect! I'm working with an embedded riscv platform and I don't really feel it's worth it to omit the libc, having printf/fprintf/sprintf is far too useful. Even with these things, size reports my final executable (compiled and linked with gcc, using newlib for the libc) with various stdio/stdlib functions used is only 8k, including the bss and data sections.
I wonder how much LTO could do here. It should also remove the dead code from the statically linked library.
(The "music" at the b/g made me jump: I thought it was somebody's ringtone. Anyway, it makes it harder to concentrate. If some people need it, they can launch their own media player themselves.)
Why people confuse CRT and stdlib? C runtime is responsible for calling main function, while C standard library contains functions from standard like malloc, printf and others.
Fantastic! Thank you very much! 🍀
I wonder if there's a compiler option that optimizes the exe for your hardware and OS. The only one I know that does something similar is -march=native
Interesting how to compile it without all the bloat. I still wonder why it's 4 KB, and not more like 400 bytes.
You can optimize it further ;-)
Finally someone removing that useless bloated c runtime
100kb is bloated kek
@@user-sb5vt8iy5q it shouldn't even be 100 KB. Did he forget to add the symbol-stripping flag for the compilation? (it's -s on GCC, I don't know what it is on cl.exe). It could easily reduce the size from 100 KB to 20 KB
@@user-sb5vt8iy5q 97% waste is bloat.
modern programmers often fail to understand this, hence we're running around with supercomputers struggling to launch our applications
Google about tinyC compiler ;-)
@@user-sb5vt8iy5q It's not about how many Kb it is but that this program literally does almost nothing and uses 120 Kb (where 115 Kb doesn't contribute to what it does in any meaningful way).
A bit offtopic, but what is your keyboard? It sounds amazing.
Can you please adjust audio levels for the next video? The speech part is below -19 dB, and even at maximum volume in Windows, on my speakers, and on YouTube, it's very difficult to hear.
And what will you do if the Linux kernel changes their API???
nothing. it's a windows program
Bruh, it's Windows. You're linking against the stable kernel32 lib, which is actually a bridge to the underlying ntdll, which is what actually deals with the syscalls. If Microsoft changes the syscalls (which usually happens between Windows versions), they just update ntdll and the kernel32 entry points keep pointing to it, so code linked against kernel32 will continue to work. That's why Windows is more backward compatible than Linux barebone syscalls
very interesting but i can’t hear you over the music at the start
Thanks for this video! I learn lot of new things!
Hah! I did this in 1984 because I was developing a "bare metal" PC co-processor board with no OS of its own. I'm a bit surprised at the code size you ended up with. 4k seems high.
I don’t think I utilised all compiler flags to trim out unused Windows functionality from the final executable but not too sure. I have seen some blog posts that go way harder on trying to make it as small as possible (mostly demoscene related) but I really just wanted to focus on removing the c lib and getting it to compile in this video. I think the smallest possible windows executable is around 300 bytes. But that won’t do anything haha
On linux there is trim program to delete all the unused functions which reduces size of executable. Is it actually useful to trim all unused crt functions?
The point of the c runtime is to facilitate portability... if the program needs to be lean and mean I think it is reasonable to use assembly language than using windows API.
Awesome material man! 👏
I just had to disable the runtime on my embedded application because their memcpy was stomping on mine! Good times
Hey, I'm not programming on (and not even using) Windows.
But the video was nice.
You seem to be a guru in the C language. Can you make more videos about your tricks in C? For example, how to read a 2D array of 2000 chars x 1800 lines.
I am flattered but I am not a guru at all. I started doing C a year ago. Before that it was mostly C#
I really liked this video
but is that really practical
what do you think?
practicality depends on the goal. removing dependencies and shrinking binary size can be an important goal for some programs.
I just love you C mans.
Hmm... you should look into how to write NT Native programs. Early boot processes and services do not use the C runtime. You need to use the DDK to compile these.
12:58 As some sort of fallback thing I reckon
thnx man! very interesting; but please 🙏 increase the audio level in your video, it is super hard to hear anything
Really enjoyed this video, thanks! I would love to see something about how those .bat files work, perhaps in comparison to bash. I've never heard anyone speak about it
I was paying attention to the bat file. It was pretty simple to follow. %VARIABLE% is essentially equivalent to $VARIABLE in bash; otherwise it was pretty straightforward. I liked the use of pushd, I didn't know it was available for cmd.exe. That's way more switches than I usually pass to gcc though. Not going to lose sleep over a few hundred K these days.
If the next video is about SDL2, we know you are building a GB emulator... Which is nice, of course, and very handy to test the assembler.
Certainly, it's only a way to use C properly.
Nice C video, although I'm not interested in Windows anymore.
Sadly my bad hearing can't differentiate between your calm voice and the music.
Love these kind of videos!
Music was louder than your voice and everything was very low volume when played on my PC. Too hard to hear what you were saying
just one tiny person's perspective but I wasn't able to watch this (and I would have loved to!) because the text is way too small. In my case I can only find time to watch stuff like this for a few minutes before I go to sleep and therefore it's too small to see on my phone. thanks anyway! hope to see more and I subscribed anyway
I see what I can do about that going forward. I can probably zoom in much more on the parts of the code I am talking about instead of trying to have the whole text editor on screen. I kind of liked it but I haven’t considered smaller screen sizes at all. Thanks for the feedback!
@@DaniCrunch when I make videos for Michigan TypeScript I usually aim for between 20 and 25 vertical lines on the screen. That seems to be enough. 30 works okay too if you're in Zen mode or some editor mode that's full screen with no other stuff taking up space
12:58, sounds like a bug
I guess if someone needed the application that small it would be worth it to just implement the functions out of the cstdlib
The background noise made it hard to listen. Moving on.
What you are calling the "C runtime library" is glibc on Linux. On Windows, it's the msvcrt library.
yup. the C runtime library is indeed what those two libraries are abstractly called
@@blarghblargh On Linux, it's glibc and a crt lib that starts up the program. I suspect the crt name (a name I have never understood) is the root of the msvcrt lib name, and I think it could be possible that msvcrt is the startup and the libc combined. Hard to confirm, since it is closed source. On Linux, glibc is its own project. For Windows, mingw just uses the msvcrt lib.
crt is C RunTime, msvcrt is MicroSoft Visual C RunTime @@scottfranco1962
@@scottfranco1962 counterpoint: musl libc
This is awesome 💯😎
the music is too loud
Voice too low, music too high.
I guess for web developers, this is the native equivalent of writing node scripts that are run outside of a browser 😁
But this is more just for the fun of making a tiny executable rather than a sensible optimisation strategy (outside of very special cases such as writing operating systems or embedded stuff), right? 😅
In a tiny simple app you can of course; as you demonstrate, save a bit of executable size; but surely you could also shrink it quite a lot by using a compiler that optimises away unused code in the runtime libraries? (I mean if even the build tools of the rather bloated world of javascript is somewhat able to do dead code elimination, surely a good C compiler is also able to do so?) Or is it just that this runtime is so fundamental that in practice you'd always either include it or dynamically link it since in a more substantial program the runtime size will be negligible anyway, so compilers don't bother pruning it? Also if your app uses so little of the standard runtime; would it even be much if any issue with having the correct dynamically linked library, wouldn't practically _any_ version support the same fopen and memory operations identically?
on linux the C library is usually dynamically linked
126kb is pretty small compared to Rust's final executable
Rust needs to include its backtrace lib for nice error messages, plus other "safe" wrappers. You can get rid of those and write entirely unsafe code, which is almost the same as writing plain C
Nice information
Amazing video