This is such a masterful explanation. Super technical, as it must be. Fully enjoyed every syllable, and the graphics... well, a gif = 1000 words. Maybe not for everyone, but my god this is amazing. Thank you Mr. Garage. ;-)
I remember my first Windows 1 application. 4 pages of code and all I got was a window saying "Hello World!". But forgot the Exit button so had to reboot the computer to get out.
Those were the days! I wasn't inducted until Windows 2.0, when you were still allowed to ship programs with a free Windows runtime. Charles Petzold wrote the bible. Looking back on it, it reminds me of Cobol: you had to write a ton of code before you got any output. It was touted as "object oriented" due to the event driven messaging: quite the claim! 16 bits, local and global heaps. Near and far pointers. Cooperative multitasking. I don't miss any of that nonsense, but it's just replaced with a ton of new nonsense! Plus ca change.
@@nezbrun872 the windows communicating with messages can rightfully be considered object orientation, just not in the way it evolved to be now. More like a primitive sort of Smalltalk or Objective-C.
People need to understand the quality and type of knowledge Dave is giving out here. This is the kind of information you would normally only ever get by walking into Dave's office, hoping he's not busy, and asking him "how does this work?" My last company called this "tribal knowledge" and every company goes to great lengths to try and extract it from old timers and inject it into newbies. In my experience, you just have to get an old timer to tell you before they retire. Dave is going way above and beyond by giving that kind of knowledge out for free for everyone to absorb.
Reminds me of a time with my late friend Paul. We worked to develop a partition manager that enabled us to remove the C: partition from view, but it was accessible by Windows because it was in memory. Helped keep it secure. We used some of the same compilers, albeit older versions. I miss my genius friend muchly… have fun hacking the sky Paul! 😊
I remember doing a challenge to make the smallest possible ELF binary that runs and exits with a particular exit code; I was able to do some tricks by setting the "start of .text" field to get it to load a portion of the ELF header itself as if it were program code, and carefully bit-stuffing opcodes into unused header bytes (and selecting whatever other options would mean the bytes could be interpreted as assembly code without crashing). Doing this, I was able to get the program down to just 52 bytes -- exactly the size of the smallest possible ELF header.
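For anyone wondering where the 52 comes from, here is a minimal Python sketch that lays out the 32-bit ELF file header with `struct` and checks its size. The field order follows the standard Elf32_Ehdr layout; the sample values (entry address, program header size) are made up for illustration and are not taken from the challenge described above.

```python
import struct

# Layout of the 32-bit ELF file header (Elf32_Ehdr), little-endian:
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)

def parse_elf32_header(data: bytes) -> dict:
    """Unpack the first 52 bytes of a 32-bit little-endian ELF file."""
    return dict(zip(FIELDS, ELF32_EHDR.unpack_from(data, 0)))

# A synthetic header; the numbers are illustrative assumptions only.
e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)  # 32-bit, little-endian
sample = ELF32_EHDR.pack(
    e_ident, 2, 3, 1,      # ET_EXEC, EM_386, version 1
    0x08048020,            # e_entry (hypothetical)
    52, 0, 0,              # e_phoff right after the header, no sections
    52, 32, 1, 0, 0, 0,
)

header = parse_elf32_header(sample)
print(ELF32_EHDR.size, hex(header["e_entry"]))
```

The tricks described above amount to choosing header field values whose bytes also decode as useful instructions, which this sketch does not attempt.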
Fun fact; the actual smallest is 45 bytes. If you truncate off the header, missing bytes are zeros. This requires cheating, and won't work on a processor that has working NX.
@@joshuahudson2170 though if you wanted the smallest executable, period, you'd probably have to look past ELF to the other type of Linux executable header: the shebang. (e.g. the file with the contents `#!/bin/true` is also a valid executable on linux)
It's very rare to find experts in their craft who also post quality YouTube videos. For me you're right up there with the Bisqwit yt channel. Thank you for all the video series. Keep doing what you're doing, Dave :)
@@empresagabriel Bisqwit has programmed professionally, you can see that if you look at his CV on his site. He's not currently employed as a programmer/engineer though.
I love just listening to people that can explain things simply. Please keep this up, Dave. Oh, one other thing: I just found your LED series, so it's time to binge watch...
There's another classic way to find KERNEL32's address without using the import table. At your program's entry point (usually in the CRT code, but since that's cut out it would be main in this example,) ESP will contain a pointer into KERNEL32 since the program's entrypoint is actually called from the function in KERNEL32 that creates the thread, so a return to that function is on top of the stack at the entrypoint (your main function.) So the first instruction of your program can store the initial value at ESP, then you need to round it down to the nearest 0x1000 bytes and search backwards (0x1000 bytes at a time) for the beginning of the DLL, by looking for valid PE Headers (using the DOS header signature, MZ.) Then you can traverse its export table. You can run into memory you don't have permission to read in this process so you'll want to set up an exception handler that basically does nothing using SEH. Manually going through the export table this way also has an interesting side effect: it bypasses Compatibility Shims, since those are usually returned by GetProcAddress
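The backwards search described above can be sketched in Python over a simulated block of memory. This is only an illustration of the scanning logic: the buffer, the addresses, and the helper names are all hypothetical, and a real implementation would read process memory from native code with an exception handler in place, as the comment says.

```python
import struct

PAGE = 0x1000        # step size used in the description above
E_LFANEW = 0x3C      # offset of the PE header pointer inside the DOS header

def looks_like_pe_image(mem: bytes, base: int) -> bool:
    """True if `base` starts with 'MZ' and its e_lfanew points at 'PE\\0\\0'."""
    if mem[base:base + 2] != b"MZ":
        return False
    if base + E_LFANEW + 4 > len(mem):
        return False
    (e_lfanew,) = struct.unpack_from("<I", mem, base + E_LFANEW)
    sig = base + e_lfanew
    return mem[sig:sig + 4] == b"PE\0\0"

def find_image_base(mem: bytes, addr: int):
    """Round `addr` down to a page boundary, then walk back page by page."""
    page = addr & ~(PAGE - 1)
    while page >= 0:
        if looks_like_pe_image(mem, page):
            return page
        page -= PAGE
    return None

# Simulated address space: a fake image at 0x2000 and a decoy 'MZ' at 0x3000.
memory = bytearray(0x5000)
memory[0x2000:0x2002] = b"MZ"
struct.pack_into("<I", memory, 0x2000 + E_LFANEW, 0x80)
memory[0x2080:0x2084] = b"PE\0\0"
memory[0x3000:0x3002] = b"MZ"    # no valid PE signature behind it

return_address = 0x4321          # stand-in for the value found at ESP
print(hex(find_image_base(bytes(memory), return_address)))
```

Checking the PE signature as well as "MZ" is what lets the scan skip the decoy page, since the two letters alone can easily appear in ordinary data.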
@@DavesGarage Also, I thought of another potential savings: you might be able to save space by searching by ordinal instead of storing all the string names of the imports you need. I don't know if the code would still be short enough to justify it, but might be worth a shot. You can store them as WORDs so they'd be significantly shorter than strings. Might even get away with BYTEs if the numbers are small enough.
@@tomysshadow No, ordinals are not stable between Windows versions or updates. It would work only on that single computer. Exported functions are identified mostly by name, sometimes by ordinal (rare). For that you need to look for that DLL's import library in all the SDKs there are (even the very old ones). You will learn that InitCommonControls was once exported by ordinal (most other functions were never exported by ordinal, only by name). Thus it needs to keep the same ordinal forever. Otherwise an old exe built with the old SDK would stop working on new Windows.
@@MarekKnapek You are right - I thought I had seen packers do this to save space before, but I must've been mistaken. I compared KERNEL32 from XP and 10 and the ordinals aren't the same
Hi Dave, I know it's a long shot that you might read this. I think it would be an interesting video if you could talk about the file systems, like FAT12/16/32 & NTFS, and their history. Like how they work, file limitations, and if Windows or Linux is better when it comes to them.
I like the idea too! If I may be so bold - the one question that came to mind during this wonderful episode was... With all these optimizations for bytes, was the startup time changed? Call me an old optimist :)
@@zgelrevol9682 Still not going to be any sort of delay noticed, as all of these programs will fit into the L1 cache in their entirety, and thus will all execute in 8 bus clock cycles, and the biggest delay will be all the calls to the L2 cache for the DLL calls, likely cached there as they would be needed for other processes all the time, plus would have been called to load the EXE itself. The biggest delay would be the tens of thousands of CPU ticks that it takes for the glacially slow (to the CPU, which would context switch after the initial burst of calls as the process is now in a wait state) IO process to both graphics memory, and to the GDI instance to draw the window on screen. Probably all versions will have the window open before you have lifted your finger off the keyboard, and before the keyboard has sent the key lift code back to the south bridge keyboard controller itself. You might see a timing difference if you used a copy of Windows 98 (likely this code would run on it, though you would have to explicitly use 16 bit code and change the linked libraries to ones that 98 worked with, and if you went further it would run on Win3.0 as well) on an original Pentium 25MHz, where you could literally see windows being drawn on screen, and Win98 would run very slowly.
If you want some real in-depth explanations of these filesystems, and how to take a closer look at their data structures (with the help of TSK, “The Sleuth Kit”), you should take a look at Brian Carrierʼs book on “File System Forensic Analysis”. While this title is now 17 years old, and its author, sadly, never published a revised edition, itʼs - apart from ReFS, which isnʼt covered at all - still the only real reference (that I know of, in book form) when it comes to “Windows filesystems”!
@@DavesGarage Thank you for the nice comment. Just a heads up, you've got a scammer on here commenting and asking people to telegram them, and their channel is using your picture. I've reported their channel to YouTube, but hopefully, you might be able to take action as well.
When I was at school, we weren't supposed to have executable files on our user areas on the brand-spanking-new Windows NT 3.1 server they'd installed just that summer. Presumably they didn't want us messing anything up, or playing games while we were supposed to be learning how to do a mail merge or whatever (or teaching the IT teacher how to do it because he couldn't find the right page in his notes) We quickly figured out that just changing the extension to something innocuous wouldn't stop our precious .exe files being detected and auto-deleted, but lo and behold, using a hex editor to change that "MZ" to something else would keep them from being spotted. Seems weird looking back on it that they'd be so precious about us running executables but they'd given us access to all kinds of programming tools. Turbo C, Pascal, ASM, we could do a lot of damage with those! But Qbasic made it easy for everyone to start making their own games, and VB Classic made it even easier for us to access all the things we weren't supposed to. I guess we did all learn a lot about computers though, so maybe they had the last laugh!
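A guess at why the hex-editor trick worked: the scanner was probably looking at the first two bytes rather than the file extension. Here is a small Python sketch of that kind of check. The function name and the sample bytes are made up, and nothing here is based on the actual tool that school used.

```python
def looks_like_exe(data: bytes) -> bool:
    """Hypothetical scanner rule: flag anything that starts with 'MZ'."""
    return data[:2] == b"MZ"

# Stand-in for the first bytes of a DOS/Windows executable.
original = b"MZ\x90\x00\x03\x00\x00\x00"

# What the hex-editor trick does: overwrite the signature, keep the rest.
disguised = b"ZZ" + original[2:]

# Restoring the two bytes brings the file back unchanged.
restored = b"MZ" + disguised[2:]

print(looks_like_exe(original), looks_like_exe(disguised), restored == original)
```

Renaming the file changes nothing a content-based check looks at, which matches the observation that changing only the extension did not help.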
Security is hard, and even harder today. Many are still naive about how easy it is to bypass many security principles and as someone once stated "the defender needs to find all the holes, the attacker needs only one".
Pure gold. I recall a senior programmer explaining that the "compiler will do all the optimization". Then a low level (ASM) programmer analyzed the time in each routine with a Program Performance Analyzer and recoded the choke points in assembly - the increased performance was substantial, as in "wow that is WAY faster". The original (high level) programmer insisted that now the code was not portable, which, given the embedded nature of the custom hardware, was a lame excuse really and never a design requirement. Oh, and it took far less ROM space at a time when ROM was a major expense in the COGS of the product. To me this illuminated the strength of a full stack programmer. From bits to drivers all the way up to presentation layer (and now network) a true programmer should know when and where to apply their toolkit to make the best solution that can be achieved. I will note the same coding god was one of the first to truly master multicore/multiprocessing. His intimacy with the hardware made for design win after design win.
You really demystified this process for me! Seeing how that works in C is actually really encouraging. I'm learning C, and making a GUI window seemed really complex and daunting, but it actually seems pretty straightforward since you just have to use functions built into Windows and respond to system messages. I was worried that I would have to write the functions that do things like tracking the mouse position.
It's massive for those starting out to learn from this style. Pause is your friend, you might use the transcript to make a checklist of sorts. I agree with gower1973 - there are sooo many functions. Never let the learning side go!
I thought your first version was masterful, it's incredible how small you have made this. I love this small Win app project of yours. It's so interesting and it explores something that many of us are interested in but don't have the time or expertise to play with. You are a master Dave, it's really incredible what you do. Thank you for taking the time and uploading these for us.
There is an article on doing something similar for Linux ELF executables, from the early 00s. The smallest "it runs" file was 46 bytes, the smallest "it doesn't break the rules" was 76. Part of the way it got the size so small (aside from not being a GUI application), was embedding the executable code itself inside the metadata header of the program. It couldn't _remove_ the header, as the ELF checker would refuse to run it, but it could take the stretches of 0s in the header and fill them with useful code. I wonder if a similar approach could shave a few dozen bytes off in this case. It does rely on a hex editor to do, though.
For me, the most important thing to learn from ASM is pointers, which are also used in C/C++ and even C#; other instructions can be mapped directly from other languages. Say in C/C++/C#:
int a = 10, b = 20, c = a * b;
In ASM it will be:
MOV EAX, 10 ; a
MOV EBX, 20 ; b
MOV ECX, EAX ; a
IMUL ECX, EBX ; ecx == 10 * 20
EAX, EBX, ECX are commonly used registers, the fastest memory. Each operation takes 1 cycle (except IMUL), i.e. if a processor is running at 4GHz, it will execute 4'000'000'000 cycles/operations per second; however, if a processor is superscalar (as is common today), then multiple instructions can be executed at once (if they are independent).
@@29Aios Thanks for the example, but is there any reason to store a in EAX then move EAX to ECX? why not directly move a to ECX and please excuse my ignorance in Assembly 😌
@@ahmad-murery This is an example, the EAX value can be used later, but if not, it can be simplified as you said. There are many optimization tricks, like if you want to move 0 to EAX, it can be done as XOR EAX, EAX, and this is a 2-byte instruction, whereas MOV EAX, 0 is 5 bytes.
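To make the size difference concrete, here is a tiny Python sketch that compares the usual 32-bit machine-code encodings of the two instructions. The byte values are the standard encodings (opcode 31 /r with ModRM C0, and opcode B8 followed by a 32-bit immediate); the variable names are just for illustration.

```python
# Standard 32-bit encodings of the two ways to zero EAX.
XOR_EAX_EAX = bytes.fromhex("31c0")        # xor eax, eax
MOV_EAX_0 = bytes.fromhex("b800000000")    # mov eax, 0 (opcode + imm32)

saved = len(MOV_EAX_0) - len(XOR_EAX_EAX)
print(len(XOR_EAX_EAX), len(MOV_EAX_0), saved)
```

Most of the MOV form is the four-byte immediate zero, which is exactly what the XOR trick avoids storing.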
@@29Aios Thanks Oleg it makes sense now, Coming from high-level programming languages makes these things look complicated. I always wanted to learn how to program my MSX1 machine but life is more complicated than assembly to me where I'm living. Thanks once again Oleg and have a nice day/evening 👍💯
If you ever want to do something like assembly programming, but simpler, to get a sort of simple idea of it, there's a fun game called Human Resource Machine. I enjoyed it a lot. It's not quite like assembler, but it's in the ballpark. If you have programming experience, it's easy to get started. There's challenges in the game to minimize for size or speed I think, which can be fun, or tedious, depending :)
I wish people like you still worked at Microsoft. I just recently installed Win10 as a second OS alongside Ubuntu. I allocated 40GB thinking that would be plenty and Windows took up 37GB by itself!!!
I smiled when you mentioned Assembler and optimization as I was an IBM Systems Programer on S/360 & S/370 specializing in optimization back in the day when we had to fit program code in 64K. I also dabbled in programs for DOS and Windows.
Me too. Abbreviated to Sysprog. Reversing a translate (TR) to re-organize memory was a cool trick. (Except we didn't use the word cool then). A 4mb upgrade to our 370 cost £2,000,000
@@PaulCotterCanada One year in 1980's the company I worked for here in New Zealand spent $145 million dollars for 3 S/370 with extra memory. Tell that to kids today and they don't believe you.
It would be amazing to hear you talk about any emergencies at MS, like a personal view of what it was like to respond to big exploits being taken advantage of.
I'll *absolutely* be linking this to others I meet who just really want to understand how some programs work in the Windows OS at the assembly level. This does a fantastic job explaining everything in a succinct way that can be searched for online later if necessary! It's also the video that made me realize that I'm starting to become comfortable reading asm, and boy does that thought make me feel really strange - but powerful ahahah~ Thanks for the great video Dave, and special thanks for making the original task manager program!~
Hi Dave! I'm a Russian programmer, and I started programming the 8086 in the '90s, but before that I was programming ZX80/Elbrus/ДВК 1,2,3/Robotron1715/БК0010/Нейрон/others. At that time there was a misconception that when code is smaller it should be faster. Actually not: say, ASM "loop label" is slower by one tick than "dec cx; jnz label" on x86. It was 30 years ago, but I still remember the goal we were trying to achieve, and it was coded in ASM.
Yes, but the idea was about code size, i.e. if it's smaller then it's faster, and in most cases that's true, but not always for x86. Let me show it in code:
 0: 66 b9 0a 00    mov  cx,0xa
 4: 66 31 c0       xor  ax,ax
00000007 :
 7: 66 40          inc  ax
 9: e2 fc          loop 7
 b: 90             nop
 c: 90             nop
 d: 66 b9 0a 00    mov  cx,0xa
11: 66 31 c0       xor  ax,ax
00000014 :
14: 66 40          inc  ax
16: 66 49          dec  cx
18: 75 fa          jne  14
Both blocks do the same thing, increment the *ax* register 10 times, but the 1st block (7-9) is only 4 bytes, while the second block (14-18) is 6 bytes. Let's find out the timings per instruction. I've googled "8086 instructions timing" and used the second link because it covers 8086-Pentium cycles.
DEC  Decrement
operand  bytes  8088   186    286    386    486   Pentium
r16      1?     3      3      2      2      1     1 UV
Jcc  Jump on condition code
operand  bytes  8088   186    286    386    486   Pentium
near8    2      4/16   4/13   3/7+m  3/7+m  1/3   1 PV
LOOP  Loop control with CX counter
operand  bytes  8088   186    286    386    486   Pentium
short    2      5/17   5/15   4/8+m  11+m   6/7   5/6 NP
So, the LOOP instruction on 8088 - 286 is a bit faster, however on 386+ "dec cx; jne label" is much faster, about 2-3 times.
@@29Aios 386 is when I was cutting my teeth on asm. Before that it was BASIC and Fortran. Padding is often overlooked; code alignment and cache misses are big factors.
@@stolenlaptop You are right. Alignment is most important for data: say you load a 32-bit register from mem [0x0]; it will be 2x faster than loading from mem [0x01], because the unaligned data spans two 32/64-bit blocks, so the processor needs to load 2 blocks instead of 1 aligned block. But cache misses, what do you mean?
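The comparison can be tallied with a short Python sketch using the cycle counts quoted above. Two assumptions are baked in and are not from the original comment: where the table gives "a/b", the second number is treated as the taken-branch cost, and the "+m" terms on the 286/386 are ignored.

```python
# Cycle counts copied from the quoted timing table.
# Assumptions: second number of "a/b" = branch taken; "+m" terms ignored.
CYCLES = {
    # cpu:      (dec r16, jcc taken, loop taken)
    "8088":    (3, 16, 17),
    "186":     (3, 13, 15),
    "286":     (2, 7, 8),
    "386":     (2, 7, 11),
    "486":     (1, 3, 7),
    "Pentium": (1, 1, 6),
}

def per_iteration(cpu: str) -> dict:
    """Cycles spent on loop control per iteration, for both variants."""
    dec, jcc, loop = CYCLES[cpu]
    return {"loop": loop, "dec+jne": dec + jcc}

def faster_variant(cpu: str) -> str:
    """Name of the cheaper loop-control sequence on the given CPU."""
    costs = per_iteration(cpu)
    return min(costs, key=costs.get)

for cpu in CYCLES:
    print(cpu, per_iteration(cpu), "->", faster_variant(cpu))
```

Under these assumptions LOOP wins by a cycle or two up to the 286, and the two-instruction sequence wins from the 386 on, with the gap widening on the 486 and Pentium.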
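The "2 blocks instead of 1" point can be checked with a few lines of Python. This is a simplified model that only counts how many aligned blocks an access overlaps; the function name is made up, and real hardware costs depend on much more than this count.

```python
def blocks_touched(addr: int, size: int = 4, block: int = 4) -> int:
    """Number of aligned `block`-byte units covered by [addr, addr + size)."""
    first = addr // block
    last = (addr + size - 1) // block
    return last - first + 1

# A 4-byte load on a 32-bit (4-byte) bus.
print(blocks_touched(0x0))            # aligned
print(blocks_touched(0x1))            # misaligned

# The same loads on a 64-bit (8-byte) bus.
print(blocks_touched(0x1, block=8))
print(blocks_touched(0x6, block=8))
```

With a wider block some misaligned addresses still fit in a single unit, so the penalty depends on where the data lands rather than on misalignment alone.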
Could tricks with overlapping header regions (like for ultra-tiny ELF files) be applied to EXE? If you search "tiniest ELF program", there's a really good article on the muppetlabs blog that gives a writeup of the technique as applied to ELF. Either way though, I'd love to see a deep dive into the internals of the EXE format, and what you'd be able to achieve in a from-scratch binary that doesn't rely on linker tools.
No. The headers of EXE files don't use offsets, but just follow right after each other. What you can do is overwrite the DOS header and the DOS stub program that is usually at the start. All windows cares about is the magic number (MZ's signature) and the offset to the PE header, which is at the end of the DOS header. Usually, this offset is set to 0x100 which is right after the stub program that tells you "this program cannot be run in ms-dos mode". You can change that to immediately follow the DOS header. Next you can set the file offset of the .code section to 0, and your complete file can be mapped as code. Set the entry address to 4 and the very first instruction can start right after the DOS signature. You just have to make sure the code jumps over the PE header. This will save you ehm... 248 bytes give or take. P
@@jbird4478 What do you mean? The MZ header does have an offset that indicates where the PE or NE or whatever header is located. In the demo scene it was common to make the MZ and PE header overlap. Just as an example, I just opened the famous ".kkrieger" by the German demo scene group "farbrausch". The first 16 bytes are 4D 5A 66 61 72 62 72 61 75 73 63 68 50 45 00 00 Which reads "MZfarbrauschPE\0\0" So the PE header offset is actually located in the PE header itself. The offset to the PE header is located in the file at 0x003C from the beginning of the file. So they cleverly shifted the PE header so the only relevant field in the MZ header for Windows (the offset to the PE header) is located at a position in the PE header that is unused, not important or the actual offset value is acceptable at this point in the PE header. Actually, since the PE header is actually smaller than the MZ header, I think the offset to the PE header is actually located behind the PE header ^^.
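The overlap can be worked out from the 16 bytes quoted above with a short Python sketch. Only those 16 bytes come from the comment; the rest is arithmetic on the fixed 0x3C position of the PE header offset field, and the variable names are illustrative.

```python
# The first 16 bytes of the file, as quoted in the comment above.
header = bytes.fromhex("4D 5A 66 61 72 62 72 61 75 73 63 68 50 45 00 00")

E_LFANEW = 0x3C   # fixed file position of the "offset to PE header" field

# Where the PE signature sits in the file.
pe_offset = header.find(b"PE\0\0")

# How far into the PE headers (counted from the signature) the 0x3C field
# lands, since the PE headers start before file offset 0x3C.
field_offset_in_pe = E_LFANEW - pe_offset

print(header[:2], header[2:pe_offset], pe_offset, hex(field_offset_in_pe))
```

So for the file to load, the four bytes at that position inside the PE headers would have to hold the value 12, the signature's own offset, while still being acceptable for whatever PE field occupies that slot.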
@@Bunny99s Yes, I mentioned that offset. The PE headers themselves don't have offsets, unlike ELF files. In PE all the headers just follow after each other. In ELF files, the different headers are located with pointers, which is why you can make them overlap in some cases.
@@jbird4478 You're right, I glanced over it :) I guess I was kinda triggered by your first statement. Yes, the additional optional headers of the PE header do not have offsets. The PE header just contains the count of headers. Most products of the demo scene only contain the absolute minimum (usually 1). Though I think that the PE header and the optional header still fits almost within the MZ header :)
@@Bunny99s The optional header of a PE file is 240 bytes, and despite the name, it is not optional. You might get away with cutting the end from that header, because that describes things that aren't always necessary, but I don't know. According to the specs and the WinNT header it is just a single fixed structure. What is optional are the sections and their headers, but you'd need at least 1 of those.
This reminds me of a 362 byte MS-DOS device driver that I wrote to correct the MS-DOS date and time after an add-on hard disk manufacturer's bios had destroyed it. It was a rush and it was originally just under 2K. The manufacturer of the add-on said that was too big - I told them it didn't matter provided it was under 2K as that was the cluster size on their HDD and once the driver had done its work it quit with its end address equal to its start address so no memory was used. But they wanted it to "look smaller" so at 3 AM that day I had it down to 362 bytes :) 8086 assembler none of this fancy 32 bit stuff!
Wow, Dave - Happy New Year! Absolutely loved this episode! I love the asm deep dive and the offer to allow us to help. Have to agree with others, it's masterful indeed! More formally however I think you touched on something to be loved - teamwork. By doing it the way, and at the level you do, it brings out the best in competition. Along the way you put a sweet sample for any watcher to get their hands dirty and learn. I didn't see the jag back to C coming, yet fully believe you will take asm across the line for the checkered flag. As you might guess, I cut my teeth on 6502 asm. Be well and rock on!
This video explains Windows development on a meta level better than so many of the books and tutorials I went through when I was younger. Amazing work!
I remember, back in the day, there was at least one packer that would overlap a lot of the DOS Header with the PE Header (possibly the optional header too). Not sure how well that holds up with the newer Windows OS versions (a lot of these packing methods start to sacrifice cross OS compatibility to eke out a few more bytes of file size savings).
As I used to mess around with asm and win32 API and spend time on small Internet forums about asm and RE and stuff back in 2000s I could've NEVER imagined that one day I'll be listening to one of the guys who used to actually develop and work with this complexity I was trying to understand back then.. Dude you are a legend
Small is beautiful! It's efficient, it's fast and light on resources too... not touched an assembler or machine code monitor since the 90's but you have given me some very warm fuzzy memories - thank you
Just had to write a small poll(2) based server for some FreeBSD systems I work on and was curious how big it was. The normal compile came in at just over 31k. Setting CFLAGS to -Os (gcc) and running strip(1) I got it down to 22k. With some other tricks - including upx - I got it down to 10k. But this is a program that does real work. It is interesting how much baggage Windows brings along even though Unix is so much older.
Great video! Brings back a lot of memories of similar optimizations I did coding Z-80, 6502 and 8086 projects over the past 4 1/2 decades. It makes me want to fire up masm just to play for a while.
@@williamdrum9899 Whenever diving into a new language/environment, I try to create MANY different tiny projects, each tackling tiny parts of what I know exists in many larger solutions that I've coded over the years. After a few months of doing this, I will then tackle larger projects that use all of the snippets I've written in the past. I'm always happy with the results. Don't give up... just learn to add more functionality to prior projects... eventually, you will have something of value.
@@wintercoder6687 That's kind of what I'm doing now in MS-DOS. I've got a printer that can do color, jump into substrings, loop, etc. All written in asm (it uses the ascii codes above 127 as control codes)
What a fantastic episode and blog! As someone who's learning to code, I never thought that so much would be going on under the hood just to link to the system libraries.
I once wrote a Windows program for a Production Citrix farm in assembly. Well, when I say "production" it only had to work for a week or so. The company was moving from "Old system" to "New system". Most people accessed the system through Citrix published apps. We didn't want to just delete the app for the old system as we would then get swamped with support calls for "app doesn't work". Instead I wrote a small program that just popped up a window and said "Remember all that training you got about the new system? How about you try the new app." - or something to that effect. On desktops we could just replace the old app with a batch file, but you can't have a batch file as a published app (or couldn't - I haven't used Citrix for some years now), so it needed to be an executable. Since software programming wasn't our thing I didn't have access to a (legal) development environment, so I was looking around for something that I could use for free to do this one time task. Stumbled across MASM and decided to write it in assembly. It worked and didn't crash the Citrix farm. After a week or two, once everyone was used to the new system we just deleted it. I was always surprised that I was able to get a working Win32 app written in assembly. It probably wasn't as neat and well formed as Dave's, but it did work for the few days we needed it. Thanks for the nostalgia from this old nerd 😃
This video took me down memory lane when I was still a student learning Assembly programming. We were just programming for the 8086 on DOS back then (1991). Thanks for this! :D
And with every dave's awesomeness video explaining how windows get drawn and resized it frustrates me anew that the company i started at has been working for years with a program that is not resizable and in a window, can only be maximized and then puts the program in a corner at the same size and fills the rest with white, and nobody has enough reach to do anything about it lol
Just wanted to say that I have a ton of respect for you, given that you are producing these videos now that you are retired. It's really quite neat remembering some of the stuff I learned back in the day. We are a diminishing breed, people who program in assembly, know the win32 api...
crikes! I wish I learned this stuff years ago (engineering, electronics and chemistry were fun and all, but...). I remember back in 1980 as a kid, wondering how to get the information for programming this kind of stuff - but, back then, "kids" simply weren't allowed to use computers in schools (mine would literally expel a student if they touched any computer - really). Fortunately, I had a rich friend whose parents could afford a machine and he actually let me use it and showed me the ropes. Unfortunately, getting decent documentation on any computer programming from the library (besides punched cards and teletype machines) was difficult or impossible (you needed to be in college to even get your hands on any juicy bits of information - forget getting anything from IBM or any big company at the time - trade secrets and all, especially as a teenager!). Glad to see the world has opened up since then.
Is it possible to get a job as an assembly engineer nowadays? I'm asking because I don't want to memorize libraries that someone else created; I want to write my code in C and ASM, but it looks very hard to find a job with basic programming languages.
Back in the 90s when I was a young hacker wannabe, I would open exe files in DOS Edit and think, how does anyone program? One would need a crazy keyboard. I did take note that the starting 2 bytes were MZ. Later in my hacking career I downloaded the virus workshop. Before you could use the program, it would prompt you to enter the first 2 bytes of an exe file to test whether you knew what you were doing. I remember being so excited that I knew what to enter. Good times!
I used to love writing assembly (for x486 under MS-DOS) for writing code for my own purposes. I rarely had much opportunity to use it at work, except occasionally embedded in a C function. Thanks for shouting out Richard Feynman, too... he's one of my all time heroes.
Thanks Dave. This was a cool exercise. Brought me back to the 1980s. I worked writing programs for neuroscience. Trying to get them to run in 64K. MS-DOS, Early windows and PDP-11s . I just created a java program that is 50mg. I wonder if I can get it to 45mg. :)
It seems to me you could save several bytes by making the window struct part of the program and simply loading constant parameters as part of the code; thus, you don't need to load them into registers and store them into the struct...
@@Crecross I mean... Dave literally says to give the other options a look and try and make it smaller. I'm not good enough with Assembly to contribute much, but the above approach *would* shave a handful more bytes.
Back in high school I was messing around with Flat Assembler and managed to make a functional yet tiny 1.5 KB Windows application that had all sections (code, data, and import tables) merged into one. Good to know it's possible to go even lower than that.
Throwback to a time when every problem wasn't solved by throwing in another hundred megabytes (or so) of additional npm modules, and some custom tracking js on top of that for good measure.
My best efforts at hacking were to build a Batch file that converts a text file of hex characters (eg, "A0B1C2", ignoring spaces and newlines) into a binary file. I used this to create executables on systems where the creation of executables were somehow prevented. I challenge Dave to do that.
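For comparison, here is the same conversion sketched in Python rather than Batch (a deliberate swap, since doing it in a Batch file is the hard part). It takes text made of hex digit pairs, ignores spaces and newlines, and produces the raw bytes. The function name is made up for illustration.

```python
def hex_text_to_bytes(text: str) -> bytes:
    """Convert hex text such as 'A0B1C2' to bytes, ignoring all whitespace."""
    digits = "".join(text.split())   # drop spaces, tabs and newlines
    return bytes.fromhex(digits)     # raises ValueError on bad input

sample = "A0 B1\nC2"
print(hex_text_to_bytes(sample).hex())
print(hex_text_to_bytes("4D 5A"))
```

Writing the result to disk is then a single `open(path, "wb").write(...)` call, which is exactly the step a locked-down system would try to prevent.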
Like much of Windows 11, there are some nice features, but man are they poorly/lazily coded. We all knew code would become less optimised as systems became more powerful, but such basic things should not be lagging on hardware that could only be dreamt of back in the 90s and even 00s.
Well, it occupies 644 bytes. :-) One of my most favorite things is, in Linux, if you look at the man pages for /bin/true and /bin/false, the description is: "Does nothing, successfully." and "Does nothing, unsuccessfully."
Fun little history fact. Word Perfect was written in ASM, and this is why it was so fast. Up to Word Perfect 7 for Windows. But when Corel bought Word Perfect, they did not do their due diligence and had a horrid surprise when they realized that Word Perfect 6 for Windows 3.1 was actually ALSO written in Assembly. So, Word Perfect 7 for Windows 95 was a complete re-write with some 16bit components because, well, they couldn't re-write it all in two years!
Which is why my sister loved it, as it was fast enough on a 286/16 PC to keep up with her typing speed. Otherwise with other word processing she would regularly out type the keyboard buffer and lose characters. Only with a 100MHz Pentium and Win98 did Word finally get fast enough that she could not out type the buffer and she finally, after a decade or more of WP, changed to Word, as the company did not want to pay the cost of Corel's upgrade.
@@nickwallette6201 Because they had to re-write the whole thing, V7 for windows 95 was super buggy. It destroyed the reputation and ultimately led to business issues within Corel. The decision to make WP6 for Win3.1 in assembly was the first mistake, the second mistake is buying the company without due diligence about the state of the code base (or at least knowing about it and going forward). The third was veering away from their core business, which was Graphics stuff. They were a direct competitor to Adobe for years, and would probably have been to this day if they played their cards right. Adobe has a quasi-monopoly now on graphics software, Canva filled the gap that Corel had.
This tickled parts of my brain that have gone unmolested for too long. Coded for decades at a bank, a car, and aerospace companies. Sadly, I all too quickly moved into management and lost the hands-on experience. Watching this made me feel young again!
I always find that modern programs really need the optimization, speed and size that Assembly can provide. If you want to have some fun, just take a look at the .kkrieger demo game. It's a full 3D shooter that fits under 97 KB and runs on Windows.
I went deep down this sort of rabbit hole back in the 2000s. Imagine how much easier it was before ASLR... Malware and binary protection systems alike used techniques like that of removing and/or messing up the import tables so that people could not as easily depack executables and modify out protection. Lots of reversing tools of the era: LordPE, ImportREC, etc. that helped in that regard.
fantastically interesting and educational. I think I once used a Petzold book to do something basic, and sorta hated WIN32 type programming, but I still found this interesting. I guess technical videos like this will never get a ton of views, as the audience is small. But you deserve more views!
For whatever it's worth, I made a console hello world in C and compiled with GCC (windows) which even with flags got stuck around 21k. I switched to the TCC compiler and used its -O2 flag and the same source compiled to 2k. I'm sure a real programmer (not me) could get that down into bytes, but I feel like I got something done today!
An enjoyable episode for the incurable tinkerer :) Nice to hear the tip of the hat to Mr Steve Gibson. Would actually be a fun episode to watch you guys talk low level assembly coding tricks.
@@DavesGarage Yes indeed, I imagine the same :) I'm a fellow developer, mostly working on backend code for the energy sector in C# and various other languages, and I truly appreciate this kind of content where we go back to the roots of how things actually work behind the scenes. Newer generations of developers can easily forget the levels of abstraction that we just take for granted in our daily work. I feel the same about the demoscene in general, where coding can be appreciated as a form of art and not just as a means to a business end :)
@@29Aios Exactly - as a kid I did primitive Z80 and 8086 dev and grasping even just the basic hardware focused concepts (interrupts, stack management, pointers, and the like) is huge, just to respect what is going on under the hood
@@stevepoythress4678 +1! Ah, interrupts :) They have all but disappeared from everyday programming today, and so have the LGDT/LIDT instructions, which were used to remap the 0000:0000 memory space to any 24-bit address in the 286's 16 MB space. I have a story about interrupts. In the 90s I wrote a resident program named S&R that took a memory snapshot of any running program and restored it on demand. Its primary purpose was defeating floppy copy protection (save the state once the floppy has been checked, then restore it on any PC), but in my local area it was mostly used for games: people saved a snapshot before an important event and just restored it if things went wrong. You could press F11/F12 to save/load a snapshot at any time, and the LIDT instruction let me intercept keyboard interrupt 0x9 without worrying that someone else had already hooked it. Btw, the Mem386 utility ran in protected mode and didn't allow any other program to execute privileged instructions like LIDT/LGDT. There I could still get at the keyboard by hooking the BIOS clock once per second and reading port 60h to see which key was pressed, but I couldn't read or write the protected memory. So only real-mode games could be saved 😒
I share that view. I would've expected things to be more optimized by now, instead it keeps getting worse in certain cases. I think the only device that really encourages innovation by devs and also shows the power of optimization is the Nintendo Switch. Its processing power isn't really great by modern standards, yet games like BOTW still have something to show against more powerful systems
@@MartinDerTolle Yeah, it's only natural though - as long as there isn't any *need* to optimise we can spend the same time making new features - i.e. things that sell. At the end of the day, it's all about making money, not making the best (optimised) product.
What I'm trying to learn is how to write software that takes advantage of the different opcodes of each CPU generation and stepping to get the most out of the h/w, e.g. SIMD/AVX instructions. With libraries, you can call a function and have the function determine the h/w it's running on and choose the version of the function optimized for the local h/w transparently. But in the main block of code, CPU optimizations are manual and have to be accounted for there.
First of all, absolutely wonderful and impressive work, that executable is SMALL! In another direction though, I noticed that tiny.exe's memory usage is rather large. ~1.3k memory (private working set), ~10k working set, but more curiously ~131K Commit size! I would be curious to see the same exercise but in the direction of reducing the runtime memory usage instead. With today's ever so popular Electron apps eating up absolutely absurd amounts of memory at runtime, I'd love to see a trend of people pushing for lower runtime RAM footprint =)
I learnt to program Windows with Charles Petzold’s amazing ‘Programming’ series of books. When I moved from Win16 to Win32 his books made the transition so easy.
I tried every book back in the day to code to traditional Windows using the books available, and I could not make any sense of it, until I got this book: "Introduction to Mfc Programming With Visual C++" by Richard M. Jones (2000). This book did the trick. I just code for fun and now I'm into playing chess more rather than programming, having done pretty much everything I wanted to do in programming, primarily using C#, including Azure cloud and web functions and writing a chess program. At some point I might learn assembly language just for fun, after I get the GM title, lol.
Hello! In my opinion, writing in assembly makes sense for a cheap MCU like the STM8S, because the Cosmic and Raisonance C compilers cost $1000. But when you have libraries and a pure architecture, maybe it can make sense for a PC or a server too. The TCP and TLS protocols should be written in assembly for the best performance. The same applies to an RDBMS.
The option that is most likely to occur (WM_PAINT in this example) should come first in the decision chain. That saves CPU cycles. The more general rule is to sort the options by how frequently they occur.
One peephole optimization that comes to mind is to skip the line to fetch the virtual address of the PEB, from the PEB. Just index FS directly to get the loader struct pointer. That shaves an entire instruction from the code! Additionally, all those push/call pairs are embedded structural information. Better to strip that stuff out and do a loop over the hashes. Use an escape value to switch the dll to look for.
FS:0 points at the TEB, which has a pointer to itself (0x18) and to the PEB (0x30). You can't directly offset from FS to the Ldr pointer, only to fields in the TEB.
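To make the two-step lookup concrete, here is a minimal sketch that models it on a toy byte array instead of real process memory. The TEB offsets 0x18 and 0x30 come from the comment above; the 0x0C offset of the Ldr field inside the 32-bit PEB, the helper names, and the toy addresses are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit layout: TEB self pointer and PEB pointer as quoted in the thread;
   PEB_LDR (0x0C) is the assumed offset of the loader data pointer. */
enum { TEB_SELF = 0x18, TEB_PEB = 0x30, PEB_LDR = 0x0C };

static uint32_t read_u32(const uint8_t *mem, uint32_t addr) {
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v);
    return v;
}

static void write_u32(uint8_t *mem, uint32_t addr, uint32_t v) {
    memcpy(mem + addr, &v, sizeof v);
}

/* fs_base stands in for the FS segment base, so fs:[off] is mem[fs_base + off].
   FS only reaches TEB fields, so the PEB pointer must be loaded first. */
static uint32_t find_ldr(const uint8_t *mem, uint32_t fs_base) {
    uint32_t peb = read_u32(mem, fs_base + TEB_PEB); /* like: mov eax, fs:[0x30] */
    return read_u32(mem, peb + PEB_LDR);             /* like: mov eax, [eax+0x0C] */
}

/* Hypothetical toy address space: TEB at 0x100, PEB at 0x200, Ldr at 0x300. */
static uint32_t demo_find_ldr(void) {
    static uint8_t mem[0x400];
    const uint32_t teb = 0x100, peb = 0x200, ldr = 0x300;
    memset(mem, 0, sizeof mem);
    write_u32(mem, teb + TEB_SELF, teb);
    write_u32(mem, teb + TEB_PEB, peb);
    write_u32(mem, peb + PEB_LDR, ldr);
    assert(read_u32(mem, teb + TEB_SELF) == teb); /* self pointer sanity check */
    return find_ldr(mem, teb);
}
```

Calling `demo_find_ldr()` walks TEB to PEB to Ldr in the toy memory; the point is only that the second load cannot be folded into the FS-relative one.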
Has anyone heard of, or used, a project from around 2000 called SpAsm or SpAsm32 - it's a self compiling Windows Assembler with a built in Editor, where the programs sources are stored at the end of the executable - yes it's a _single file EXE+SRC_
This is such a masterful explanation. Super technical, as it must be. Fully enjoyed every syllable, and the graphics... well, a gif = 1000 words.
Maybe not for everyone, but my god this is amazing. Thank you Mr. Garage. ;-)
Mr. Garage never disappoints with great programming content.
I remember my first Windows 1 application. 4 pages of code and all I got was a window saying "Hello World!". But forgot the Exit button so had to reboot the computer to get out.
Those were the days! I wasn't inducted until Windows 2.0, when you were still allowed to ship programs with a free Windows runtime. Charles Petzold wrote the bible.
Looking back on it, it reminds me of Cobol: you had to write a ton of code before you got any output.
It was touted as "object oriented" due to the event driven messaging: quite the claim!
16 bits, local and global heaps. Near and far pointers. Cooperative multitasking.
I don't miss any of that nonsense, but it's just replaced with a ton of new nonsense! Plus ca change.
@@nezbrun872 the windows communicating with messages can rightfully considered to be object orientation, just not in the way it evolved to be now. More like a primitive sort of Smalltalk or Objective-C.
Ohh the memories!
😂🤣Nice one👍 but this one is to do with chip assembler though, low level core stuff, geniuses these guys is👍😎✌.
Those are the golden era.
People need to understand the quality and type of knowledge Dave is giving out here. This is the kind of information you would normally only ever get by walking into Dave's office, hoping he's not busy, and asking him "how does this work?" My last company called this "tribal knowledge" and every company goes to great lengths to try and extract it from old timers and inject it into newbies. in my experience, you just have to get an old timer to tell you before they retire.
Dave is going way above and beyond by giving that kind of knowledge out for free for everyone to absorb
Thanks!
Reminds me of a time with my late friend Paul. We worked to develop a partition manager that enabled us to remove the C: partition from view, but it was accessible by Windows because it was in memory. Helped keep it secure. We used some of the same compilers, albeit older versions. I miss my genius friend muchly… have fun hacking the sky Paul! 😊
🤗🤗🤗
I remember doing a challenge to make the smallest possible ELF binary that runs and exits with a particular exit code; I was able to do some tricks by setting the "start of .text" field to get it to load a portion of the ELF header itself as if it were program code, and carefully bit-stuffing opcodes into unused header bytes (and selecting whatever other options would mean the bytes could be interpreted as assembly code without crashing). Doing this, I was able to get the program down to just 52 bytes -- exactly the size of the smallest possible ELF header.
Fun fact; the actual smallest is 45 bytes. If you truncate off the header, missing bytes are zeros. This requires cheating, and won't work on a processor that has working NX.
@@joshuahudson2170 though if you wanted the smallest executable, period, you'd probably have to look past ELF to the other type of Linux executable header: the shebang.
(e.g. the file with the contents `#!/bin/true` is also a valid executable on linux)
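The 52-byte figure can be checked by writing out the 32-bit ELF file header fields and letting the compiler add them up. This is a sketch: the struct below is a hand-written mirror of the standard layout rather than the system's own `<elf.h>` definition, and the names `Elf32Header` and `elf32_header_size` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written mirror of the 32-bit ELF file header. */
typedef struct {
    uint8_t  e_ident[16];  /* magic, class, data encoding, padding */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;      /* entry point address */
    uint32_t e_phoff;      /* program header table offset */
    uint32_t e_shoff;      /* section header table offset */
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf32Header;

/* 16 + 2 + 2 + 4*5 + 2*6 bytes, with no padding under natural alignment. */
static size_t elf32_header_size(void) {
    return sizeof(Elf32Header);
}
```

The entry point field sits at `offsetof(Elf32Header, e_entry)`, which is the field the comment describes pointing back into the header itself.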
It's very rare to find experts in their craft who also post quality youtube videos.
For me you're right up there with the Bisqwit yt channel. Thank you for all the video series
Keep doing what you're doing, Dave :)
Thanks for that!
Bisqwit is a very rare bird.
It's crazy that Bisqwit is such an amazing programmer without ever programming professionally.
@@empresagabriel Bisqwit has programmed professionally, you can see that if you look at his CV on his site. He's not currently employed as a programmer/engineer though.
I love just listening to people that can explain things simply..
Please keep this up Dave...
Oh, one other thing: I just found your LED series, it's time to binge watch...
developers in 2023: "I'm gonna bundle a full web browser with my application!"
There's another classic way to find KERNEL32's address without using the import table. At your program's entry point (usually in the CRT code, but since that's cut out it would be main in this example,) ESP will contain a pointer into KERNEL32 since the program's entrypoint is actually called from the function in KERNEL32 that creates the thread, so a return to that function is on top of the stack at the entrypoint (your main function.) So the first instruction of your program can store the initial value at ESP, then you need to round it down to the nearest 0x1000 bytes and search backwards (0x1000 bytes at a time) for the beginning of the DLL, by looking for valid PE Headers (using the DOS header signature, MZ.) Then you can traverse its export table.
You can run into memory you don't have permission to read in this process so you'll want to set up an exception handler that basically does nothing using SEH.
Manually going through the export table this way also has an interesting side effect: it bypasses Compatibility Shims, since those are usually returned by GetProcAddress
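The backwards scan described above can be sketched on a simulated memory buffer. This is not the real thing: it searches a plain byte array rather than process memory, so it needs no SEH handler, and the names `looks_like_pe_image`, `find_image_base`, and the demo layout are hypothetical. The check goes one step beyond "MZ" by following the header offset at 0x3C to a "PE\0\0" signature, which is an added assumption about what "valid PE headers" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 0x1000u
#define E_LFANEW_OFFSET 0x3Cu   /* DOS header field holding the PE header offset */

/* True if mem[base..] starts with "MZ" and its 0x3C field points at "PE\0\0". */
static int looks_like_pe_image(const uint8_t *mem, size_t mem_size, size_t base) {
    uint32_t e_lfanew;
    if (base + E_LFANEW_OFFSET + 4 > mem_size) return 0;
    if (mem[base] != 'M' || mem[base + 1] != 'Z') return 0;
    memcpy(&e_lfanew, mem + base + E_LFANEW_OFFSET, sizeof e_lfanew);
    if (e_lfanew > mem_size || base + e_lfanew + 4 > mem_size) return 0;
    return memcmp(mem + base + e_lfanew, "PE\0\0", 4) == 0;
}

/* Round addr_inside down to a page boundary, then step back one page at a
   time until an image header is found. Returns (size_t)-1 if none is found. */
static size_t find_image_base(const uint8_t *mem, size_t mem_size, size_t addr_inside) {
    size_t base;
    assert(addr_inside < mem_size);
    base = addr_inside & ~(size_t)(PAGE_SIZE_BYTES - 1);
    for (;;) {
        if (looks_like_pe_image(mem, mem_size, base)) return base;
        if (base < PAGE_SIZE_BYTES) return (size_t)-1;
        base -= PAGE_SIZE_BYTES;
    }
}

/* Toy layout: a fake image at 0x1000, a stray "MZ" at 0x2000, and a
   "return address" at 0x2ABC that is supposed to lie inside the image. */
static size_t demo_find_image_base(void) {
    static uint8_t mem[4 * PAGE_SIZE_BYTES];
    const size_t image = 0x1000;
    const uint32_t e_lfanew = 0x80;
    memset(mem, 0, sizeof mem);
    memcpy(mem + image, "MZ", 2);
    memcpy(mem + image + E_LFANEW_OFFSET, &e_lfanew, sizeof e_lfanew);
    memcpy(mem + image + e_lfanew, "PE\0\0", 4);
    memcpy(mem + 0x2000, "MZ", 2);   /* decoy without a PE signature */
    return find_image_base(mem, sizeof mem, 0x2ABC);
}
```

The decoy page shows why checking only the two "MZ" bytes would be fragile; in a real process each probe could also fault, which is where the do-nothing exception handler mentioned above comes in.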
Thanks, I'll give that a shot!
@@DavesGarage Also, I thought of another potential savings: you might be able to save space by searching by ordinal instead of storing all the string names of the imports you need. I don't know if the code would still be short enough to justify it, but might be worth a shot. You can store them as WORDs so they'd be significantly shorter than strings. Might even get away with BYTEs if the numbers are small enough.
@@tomysshadow No, ordinals are not stable between Windows versions or updates. It would work only on that single computer. Exported functions are identified mostly by name, sometimes by ordinal (rare). For that you need to look for that DLL's import library from all the SDKs there are (even the very old ones). You will learn that InitCommonControls was once exported by ordinal (most other functions were never exported by ordinal only by name). Thus it needs to have the same ordinal forever in the future. Otherwise old exe built by the old SDK would stop working on new Windows.
@@MarekKnapek You are right - I thought I had seen packers do this to save space before, but I must've been mistaken. I compared KERNEL32 from XP and 10 and the ordinals aren't the same
Hi Dave,
I know it's a long shot that you might read this. I think it would be an interesting video if you could talk about the file systems, like FAT12/16/32 & NTFS, and their history. Like how they work, file limitations, and if Windows or Linux is better when it comes to them.
Thanks for the suggestion, I'll put it on my TODO list!
I like the idea too! If I may be so bold - the one question that came to mind during this wonderful episode was... With all these optimizations for bytes, was the startup time changed? Call me an old optimist :)
@@zgelrevol9682 Still not going to be any sort of delay noticed, as all of these programs will fit into the L1 cache in their entirety, and thus will all execute in 8 bus clock cycles, and the biggest delay will be all the calls to the L2 cache for the DLL calls, likely cached there as they would be needed for other processes all the time, plus would have been called to load the EXE itself. The biggest delay would be the tens of thousands of CPU ticks that it takes for the glacially slow (to the CPU, which would context switch after the initial burst of calls as the process is now in a wait state) IO process to both graphics memory, and to the GDI instance to draw the window on screen.
Probably all versions will have the window open before you have lifted your finger off the keyboard, and before the keyboard has sent the key lift code back to the south bridge keyboard controller itself. You might see a timing difference if you used a copy of Windows 98 (likely this code would run on it, though you would have to explicitly use 16 bit code and change the linked libraries to ones that 98 worked with, and if you went further it would run on Win3.0 as well) on an original Pentium 25MHz, where you could literally see windows being drawn on screen, and Win98 would run very slowly.
If you want some real in-depth explanations of these filesystems, and how to take a closer look at their data structures (with the help of TSK, “The Sleuth Kit”), you should take a look at Brian Carrierʼs book on “File System Forensic Analysis”.
While this title is now 17 years old, and its author, sadly, never published a revised edition, itʼs - apart from ReFS, which isnʼt covered at all - still the only real reference (that I know of, in book form) when it comes to “Windows filesystems”!
@@DavesGarage Thank you for the nice comment. Just a heads up, you've got a scammer on here commenting and asking people to telegram them, and their channel is using your picture. I've reported their channel to YouTube, but hopefully, you might be able to take action as well.
Love the Steve Gibson name drop!
More of a tip of the hat than a name drop really.
@@SteveMasonCanada True, I think a lot of the younger viewers wouldn't know about Steve.
When I was at school, we weren't supposed to have executable files on our user areas on the brand-spanking-new Windows NT 3.1 server they'd installed just that summer. Presumably they didn't want us messing anything up, or playing games while we were supposed to be learning how to do a mail merge or whatever (or teaching the IT teacher how to do it because he couldn't find the right page in his notes)
We quickly figured out that just changing the extension to something innocuous wouldn't stop our precious .exe files being detected and auto-deleted, but lo and behold, using a hex editor to change that "MZ" to something else would keep them from being spotted.
Seems weird looking back on it that they'd be so precious about us running executables but they'd given us access to all kinds of programming tools. Turbo C, Pascal, ASM, we could do a lot of damage with those! But Qbasic made it easy for everyone to start making their own games, and VB Classic made it even easier for us to access all the things we weren't supposed to.
I guess we did all learn a lot about computers though, so maybe they had the last laugh!
the easiest way to teach someone something is to tell them to not do it, and let them teach themselves to spite you
Security is hard, and even harder today.
Many are still naive about how easy it is to bypass many security principles and, as someone once stated, "the defender needs to find all the holes, the attacker needs only one".
Pure gold. I recall a senior programmer explaining that the "compiler will do all the optimization". Then a low level (ASM) programmer analyzed the time in each routine with a Program Performance Analyzer and recoded the choke points in assembly - the increased performance was substantial, as in "wow that is WAY faster". The original (high level) programmer insisted that now the code was not portable which given the embedded nature of the custom hardware was a lame excuse really and never a design requirement. Oh, and it took far less ROM space at a time when ROM was a major expense in the COGS of the product. To me this illuminated the strength of a full stack programmer. From bits to drivers all the way up to presentation layer (and now network) a true programmer should know when and where to apply their toolkit to make the best solution that can be achieved. I will note the same coding god was one of the first to truly master multicore/multiprocessing. His intimacy with the hardware made for design win after design win.
You really demystified this process for me! Seeing how that works in C is actually really encouraging. I'm learning C, and making a GUI window seemed really complex and daunting, but it actually seems pretty straightforward since you just have to use functions built into Windows and respond to system messages. I was worried that I would have to write the functions that do things like tracking the mouse position.
It’s a massive api that’s been around for thirty years, just read the docs there’s a function for everything 😂
It's massive for those starting out to learn from this style. Pause is your friend, you might use the transcript to make a checklist of sorts. I agree with gower1973 - there are sooo many functions. Never let the learning side go!
Program on Linux and let the GUI games begin!
I thought your first version was masterful, it's incredible how small you have made this. I love this small Win app project of yours. It's so interesting and it explores something that many of us are interested in but don't have the time or expertise to play with. You are a master Dave, it's really incredible what you do. Thank you for taking the time and uploading these for us.
There is an article on doing something similar for Linux ELF executables, from the early 00s. The smallest "it runs" file was 46 bytes, the smallest "it doesn't break the rules" was 76. Part of the way it got the size so small (aside from not being a GUI application), was embedding the executable code itself inside the metadata header of the program. It couldn't _remove_ the header, as the elf-checker would refuse to run it, but it could take the stretches of 0s in the header and fill them with useful code. I wonder if a similar approach could shave a few dozen bytes off in this case. It does rely on a hex editor to do, though.
Wow, for years I used to think that Assembly is hard but after this video I think it's very very hard 😎
Thanks Dave
For me, the most important thing to learn from ASM is pointers, which are also used in C/C++ and even C#; other instructions can be mapped directly from other languages.
Say in C/C++/C#:
int a = 10, b = 20, c = a * b;
In ASM it will be:
MOV EAX, 10 // a
MOV EBX, 20 // b
MOV ECX, EAX // a
IMUL ECX, EBX // ecx == 10 * 20
EAX, EBX, ECX are commonly used registers, the fastest memory there is. Each operation takes 1 cycle (except the multiply), i.e. a processor running at 4 GHz executes 4'000'000'000 cycles/operations per second. However, if the processor is superscalar (as is common today), multiple instructions can be executed at once (if they are independent)
@@29Aios Thanks for the example,
but is there any reason to store a in EAX then move EAX to ECX?
why not directly move a to ECX
and please excuse my ignorance in Assembly 😌
@@ahmad-murery This is just an example; the EAX value can be used later, but if not, it can be simplified as you said. There are many optimization tricks: for example, if you want to move 0 to EAX, it can be done as XOR EAX, EAX, which is a 2-byte instruction, whereas MOV EAX, 0 is 5 bytes
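For reference, the size difference can be seen by writing out the 32-bit machine code of both forms as byte arrays. This is only an illustrative sketch; the array names and the `bytes_saved` helper are made up, and 0x31 0xC0 is one of the two equivalent encodings of the XOR form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xor eax, eax: opcode 0x31 plus a ModRM byte selecting eax, eax. */
static const uint8_t XOR_EAX_EAX[] = { 0x31, 0xC0 };

/* mov eax, 0: opcode 0xB8 followed by a 4-byte immediate. */
static const uint8_t MOV_EAX_0[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* Bytes saved each time the XOR form replaces the MOV form. */
static size_t bytes_saved(void) {
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}
```

One difference to keep in mind: unlike MOV, the XOR form also modifies the flags, so the two are not interchangeable in the middle of a flag-dependent sequence.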
@@29Aios Thanks Oleg it makes sense now,
Coming from high-level programming languages makes these things look complicated.
I always wanted to learn how to program my MSX1 machine but life is more complicated than assembly to me where I'm living.
Thanks once again Oleg and have a nice day/evening 👍💯
If you ever want to do something like assembly programming, but simpler, to get a sort of simple idea of it, there's a fun game called Human Resource Machine. I enjoyed it a lot. It's not quite like assembler, but it's in the ballpark. If you have programming experience, it's easy to get started. There's challenges in the game to minimize for size or speed I think, which can be fun, or tedious, depending :)
I wish people like you still worked at Microsoft. I just recently installed Win10 as a second OS alongside Ubuntu. I allocated 40GB thinking that would be plenty and Windows took up 37GB by itself!!!
Great to see you touch base with the demoscene! :D
Size demos were always lots of fun for me.
I smiled when you mentioned Assembler and optimization as I was an IBM Systems Programmer on S/360 & S/370 specializing in optimization back in the day when we had to fit program code in 64K. I also dabbled in programs for DOS and Windows.
Me too. Abbreviated to Sysprog. Reversing a translate (TR) to re-organize memory was a cool trick. (Except we didn't use the word cool then). A 4mb upgrade to our 370 cost £2,000,000
@@PaulCotterCanada One year in the 1980s the company I worked for here in New Zealand spent $145 million on 3 S/370s with extra memory. Tell that to kids today and they don't believe you.
It would be amazing to hear you talk about any emergencies at MS, like a personal view of what it was like to respond to big exploits being taken advantage of
I'll *absolutely* be linking this to others I meet who just really want to understand how some programs work in the Windows OS at the assembly level. This does a fantastic job explaining everything in a succinct way that can be searched for online later if necessary! It's also the video that made me realize that I'm starting to become comfortable reading asm, and boy does that thought make me feel really strange - but powerful ahahah~
Thanks for the great video Dave, and special thanks for making the original task manager program!~
This was exciting and fun, both for the history and the challenge. Absolutely love this and will be looking at source code to learn! Thank you.
Hi Dave !
I'm a Russian programmer. I started programming the 8086 in the 90s, but before that I programmed the ZX80/Elbrus/ДВК 1,2,3/Robotron1715/БК0010/Нейрон/Other
Back then there was a common confusion that if code is smaller then it must be faster. Actually not: for example, "loop label" is one tick slower than "dec cx; jnz label" on x86.
It was 30 years ago, but I still remember the goal we were trying to achieve, and it was coded in ASM.
In asm I always used " sub reg, value" or "dec reg" then jz or jnz for speed.
Yes, but the idea was about code size, i.e. if it's smaller then it's faster. In most cases that's true, but not always for x86,
let me show in the code:
0: 66 b9 0a 00 mov cx,0xa
4: 66 31 c0 xor ax,ax
00000007 :
7: 66 40 inc ax
9: e2 fc loop 7
b: 90 nop
c: 90 nop
d: 66 b9 0a 00 mov cx,0xa
11: 66 31 c0 xor ax,ax
00000014 :
14: 66 40 inc ax
16: 66 49 dec cx
18: 75 fa jne 14
Both blocks do the same thing, incrementing the *ax* register 10 times, but the 1st block (7-9) is only 4 bytes, while the second block (14-18) is 6 bytes.
Let's find out the timings per instruction. I googled "8086 instructions timing" and used the second link because it covers 8086-Pentium cycles.
DEC Decrement
operand  bytes  8088  186   286    386    486  Pentium
r16      1?     3     3     2      2      1    1 UV
Jcc Jump on condition code
operand  bytes  8088  186   286    386    486  Pentium
near8    2      4/16  4/13  3/7+m  3/7+m  1/3  1 PV
LOOP Loop control with CX counter
operand  bytes  8088  186   286    386    486  Pentium
short    2      5/17  5/15  4/8+m  11+m   6/7  5/6 NP
So, the LOOP instruction is a bit faster on the 8088 - 286, however from the 386 on "dec cx; jne label" is faster, up to about 2-3 times on the Pentium
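The comparison can be worked through per CPU with the cycle counts quoted above. This sketch rests on two assumptions that are not stated in the thread: in an "a/b" entry the second number is the cost when the jump is taken, and the "+m" term is dropped because it appears on both sides for the same CPU. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles per loop iteration while the branch is taken, from the quoted rows. */
typedef struct {
    const char *cpu;
    int dec_r16;     /* DEC r16 */
    int jcc_taken;   /* Jcc near8, taken (assumed: second number of "a/b") */
    int loop_taken;  /* LOOP short, taken (assumed: second number of "a/b") */
} Timing;

static const Timing TIMINGS[] = {
    { "8088",    3, 16, 17 },
    { "186",     3, 13, 15 },
    { "286",     2,  7,  8 },
    { "386",     2,  7, 11 },
    { "486",     1,  3,  7 },
    { "Pentium", 1,  1,  6 },
};

/* Cost of one iteration of "dec cx; jne label". */
static int dec_jcc_cycles(size_t i) {
    return TIMINGS[i].dec_r16 + TIMINGS[i].jcc_taken;
}

/* Cost of one iteration of "loop label". */
static int loop_cycles(size_t i) {
    return TIMINGS[i].loop_taken;
}

/* Non-zero when LOOP is the cheaper choice on CPU number i. */
static int loop_is_faster(size_t i) {
    return loop_cycles(i) < dec_jcc_cycles(i);
}
```

Under those assumptions LOOP wins on the 8088, 186 and 286 (17 vs 19, 15 vs 16, 8 vs 9 cycles) and loses from the 386 on (11 vs 9, 7 vs 4, 6 vs 2), with the Pentium showing the 3x gap.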
@@29Aios The 386 is when I was cutting my teeth on asm. Before that it was BASIC and Fortran. Padding is often overlooked; code alignment and cache misses are big factors.
@@stolenlaptop You are right. Alignment matters most for data: say you load a 32-bit register from mem [0x0], it will be 2x faster than loading from mem [0x01], because the unaligned data spans two 32/64-bit blocks, so the processor needs to load 2 blocks instead of 1 aligned one. But cache misses, what do you mean?
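The "2 blocks instead of 1" point is simple address arithmetic, sketched below. The function name `blocks_touched` and the idea of a fixed block size are illustrative assumptions; real hardware differs in block width and in how much a straddling access actually costs.

```c
#include <assert.h>
#include <stdint.h>

/* Number of aligned blocks of `block` bytes that an access of `size` bytes
   starting at `addr` overlaps. */
static unsigned blocks_touched(uint32_t addr, uint32_t size, uint32_t block) {
    assert(size > 0 && block > 0);
    uint32_t first = addr / block;              /* block holding the first byte */
    uint32_t last  = (addr + size - 1) / block; /* block holding the last byte */
    return (unsigned)(last - first + 1);
}
```

For example, `blocks_touched(0, 4, 4)` is 1 while `blocks_touched(1, 4, 4)` is 2, which matches the aligned versus unaligned 32-bit load in the comment.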
This was absolutely wonderful!!! Thank you Dave!!!!
Could tricks with overlapping header regions (like for ultra-tiny ELF files) be applied to EXE? If you search "tiniest ELF program", there's a really good article on the muppetlabs blog that gives a writeup of the technique as applied to ELF. Either way though, I'd love to see a deep dive into the internals of the EXE format, and what you'd be able to achieve in a from-scratch binary that doesn't rely on linker tools.
No. The headers of EXE files don't use offsets, but just follow right after each other. What you can do is overwrite the DOS header and the DOS stub program that is usually at the start. All windows cares about is the magic number (MZ's signature) and the offset to the PE header, which is at the end of the DOS header. Usually, this offset is set to 0x100 which is right after the stub program that tells you "this program cannot be run in ms-dos mode". You can change that to immediately follow the DOS header. Next you can set the file offset of the .code section to 0, and your complete file can be mapped as code. Set the entry address to 4 and the very first instruction can start right after the DOS signature. You just have to make sure the code jumps over the PE header. This will save you ehm... 248 bytes give or take. P
@@jbird4478 What do you mean? The MZ header does have an offset that indicates where the PE or NE or whatever header is located. In the demo scene it was common to make the MZ and PE header overlap. Just as an example, I just opened the famous ".kkrieger" by the German demo scene group "farbrausch". The first 16 bytes are
4D 5A 66 61 72 62 72 61 75 73 63 68 50 45 00 00
Which reads "MZfarbrauschPE\0\0"
So the PE header offset is actually located in the PE header itself. The offset to the PE header is located in the file at 0x003C from the beginning of the file. So they cleverly shifted the PE header so the only relevant field in the MZ header for windows (the offset to the PE header) is located at a position in the PE header that is unused, not important or the actual offset value is acceptable at this point in the PE header. Actually, since the PE header is actually smaller than the MZ header, I think the offset to the PE header is actually located behind the PE header ^^.
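The overlap can be checked directly on the 16 bytes quoted above. This is a small sketch with made-up helper names; it only inspects the quoted bytes, so the value actually stored at file offset 0x3C is not visible here and is assumed to be the offset where the signature is found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E_LFANEW_OFFSET 0x3C  /* file offset of the "where is the PE header" field */

/* The first 16 bytes of the file, as quoted in the comment. */
static const uint8_t KKRIEGER_HEAD[16] = {
    0x4D, 0x5A, 0x66, 0x61, 0x72, 0x62, 0x72, 0x61,
    0x75, 0x73, 0x63, 0x68, 0x50, 0x45, 0x00, 0x00
};

/* True if the buffer starts with the DOS "MZ" signature. */
static int has_mz(const uint8_t *p, size_t n) {
    return n >= 2 && p[0] == 'M' && p[1] == 'Z';
}

/* Offset of the first "PE\0\0" signature in the buffer, or -1 if absent. */
static long find_pe_signature(const uint8_t *p, size_t n) {
    for (size_t i = 0; i + 4 <= n; i++) {
        if (memcmp(p + i, "PE\0\0", 4) == 0) return (long)i;
    }
    return -1;
}

/* How far into the PE headers the 0x3C field lands for a given PE offset. */
static long lfanew_position_in_pe(long pe_offset) {
    return E_LFANEW_OFFSET - pe_offset;
}
```

Here the signature sits at offset 12 (0x0C), so the field at 0x3C falls 0x30 bytes after it. With a 4-byte signature and a 20-byte COFF file header, that is 24 bytes into the optional header rather than behind the headers.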
@@Bunny99s Yes, I mentioned that offset. The PE headers themselves don't have offsets, unlike ELF files. In PE all the headers just follow after each other. In ELF files, the different headers are located with pointers, which is why you can make them overlap in some cases.
@@jbird4478 You're right, I glanced over it :) I guess I was kinda triggered by your first statement. Yes, the additional optional headers of the PE header do not have offsets. The PE header just contains the count of headers. Most products of the demo scene only contain the absolute minimum (usually 1). Though I think that the PE header and the optional header still fits almost within the MZ header :)
@@Bunny99s The optional header of a PE file is 240 bytes, and despite the name, it is not optional. You might get away with cutting the end from that header, because that describes things that aren't always necessary, but I don't know. According to the specs and the WinNT header it is just a single fixed structure. What is optional are the sections and their headers, but you'd need at least 1 of those.
I love these kind of videos, great work.
Finding and watching one of your videos always makes my day better!
Thanks Dave!
My pleasure!
This reminds me of a 362 byte MS-DOS device driver that I wrote to correct the MS-DOS date and time after an add-on hard disk manufacturer's bios had destroyed it. It was a rush and it was originally just under 2K. The manufacturer of the add-on said that was too big - I told them it didn't matter provided it was under 2K as that was the cluster size on their HDD and once the driver had done its work it quit with its end address equal to its start address so no memory was used. But they wanted it to "look smaller" so at 3 AM that day I had it down to 362 bytes :) 8086 assembler none of this fancy 32 bit stuff!
Wow, Dave - Happy New Year! Absolutely loved this episode! I love the asm deep dive and the offer to allow us to help. Have to agree with others, it's masterful indeed! More formally however I think you touched on something to be loved - teamwork. By doing it the way, and at the level you do, it brings out the best in competition. Along the way you put a sweet sample for any watcher to get their hands dirty and learn. I didn't see the jag back to C coming, yet fully believe you will take asm across the line for the checkered flag. As you might guess, I cut my teeth on 6502 asm. Be well and rock on!
This is the content I subscribed for!!!
This video explains Windows development on a meta level better than so many of the books and tutorials I went through when I was younger. Amazing work!
Nice, it works. My first assembly GUI; I thought the day would never come. Thank you!
So beautifully epic, thank you so very much for existing! ... And making this video, of course... And all the others!
I remember, back in the day, there was at least one packer that would overlap a lot of the DOS Header with the PE Header (possibly the optional header too). Not sure how well that holds up with the newer windows OS versions (a lot of these packing methods start to sacrifice cross OS compatibility to eke out a few more bytes of file size savings)
As I used to mess around with asm and win32 API and spend time on small Internet forums about asm and RE and stuff back in 2000s I could've NEVER imagined that one day I'll be listening to one of the guys who used to actually develop and work with this complexity I was trying to understand back then.. Dude you are a legend
Small is beautiful! It's efficient, it's fast and light on resources too... not touched an assembler or machine code monitor since the 90's but you have given me some very warm fuzzy memories - thank you
I am taking assembly at the community college this semester. I love this topic and will be replaying this video as more of it begins to make sense.
Just had to write a small poll(2) based server for some FreeBSD systems I work on and was curious how big it was. The normal compile came in at just over 31k. Setting CFLAGS to -Os (gcc) and running strip(1) I got it down to 22k. With some other tricks - including upx - I got it down to 10k. But this is a program that does real work. It is interesting how much baggage Windows brings along even though Unix is so much older.
Great video! Brings back a lot of memories of similar optimizations I did coding Z-80, 6502 and 8086 projects over the past 4 1/2 decades. It makes me want to fire up masm just to play for a while.
Right!
I've tried learning it for 2 years now and I haven't been able to finish anything... did make a pretty cool demo though
@@williamdrum9899 Whenever diving into a new language/environment, I try to create MANY different tiny projects, each tackling tiny parts of what I know exists in many larger solutions that I've coded over the years. After a few months of doing this, I will then tackle larger projects that use all of the snippets I've written in the past. I'm always happy with the results. Don't give up... just learn to add more functionality to prior projects... eventually, you will have something of value.
@@wintercoder6687 That's kind of what I'm doing now in MS-DOS. I've got a printer that can do color, jump into substrings, loop, etc. All written in asm (it uses the ascii codes above 127 as control codes)
@@williamdrum9899 That's great! I have done similar with both HP PCL and Epson Esc/p control codes. Fun stuff!
What a fantastic episode and blog! As someone who's learning to code, I never thought that so much would be going on under the hood just to link to the system libraries.
I once wrote a Windows program for a Production Citrix farm in assembly. Well, when I say "production" it only had to work for a week or so. The company was moving from "Old system" to "New system". Most people accessed the system through Citrix published apps. We didn't want to just delete the app for the old system as we would then get swamped with support calls for "app doesn't work". Instead I wrote a small program that just popped up a window and said "Remember all that training you got about the new system? How about you try the new app." - or something to that effect. On desktops we could just replace the old app with a batch file, but you can't have a batch file as a published app (or couldn't - I haven't used Citrix for some years now), so it needed to be an executable. Since software programming wasn't our thing I didn't have access to a (legal) development environment, so I was looking around for something that I could use for free to do this one-time task. Stumbled across MASM and decided to write it in assembly. It worked and didn't crash the Citrix farm. After a week or two, once everyone was used to the new system we just deleted it. I was always surprised that I was able to get a working Win32 app written in assembly. It probably wasn't as neat and well formed as Dave's, but it did work for the few days we needed it. Thanks for the nostalgia from this old nerd 😃
Great concept and implementation on this video Dave. Very enjoyable to watch. Thank you!
This video took me down memory lane when I was still a student learning Assembly programming. We were just programming for the 8086 on DOS back then (1991). Thanks for this! :D
With every awesome Dave video explaining how windows get drawn and resized, it frustrates me anew that the company I started at has been working for years with a program that is not resizable in a window; it can only be maximized, which puts the program in a corner at the same size and fills the rest with white, and nobody has enough reach to do anything about it lol
Great video Dave, really enjoyed that. It’s been a while since I did any Windows C programming, that was a trip down memory lane 👍
Glad you enjoyed it!
Just wanted to say that I have a ton of respect for you, given that you are producing these videos now that you are retired. It's really quite neat remembering some of the stuff I learned back in the day. We are a diminishing breed, people who program in assembly, know the win32 api...
Crikes! I wish I learned this stuff years ago (engineering, electronics and chemistry were fun and all, but...). I remember back in 1980 as a kid, wondering how to get the information for programming this kind of stuff - but, back then, "kids" simply weren't allowed to use computers in schools (mine would literally expel a student if they touched any computer - really). Fortunately, I had a rich friend whose parents could afford a machine and he actually let me use it and showed me the ropes. Unfortunately, getting decent documentation on any computer programming from the library (besides punched cards and teletype machines) was difficult or impossible (you needed to be in college to even get your hands on any juicy bits of information - forget getting anything from IBM or any big company at the time - trade secrets and all, especially as a teenager!). Glad to see the world has opened up since then.
Beautifully explained. As a 14 year assembly software engineer I really appreciate the detail here. Thank you! Looking forward to your next video. 🙂
Is it possible to get a job as an assembly engineer nowadays? Asking because I don't want to memorize libraries that someone else created; I want to write my code in C and asm, but it looks very hard to find a job with basic programming languages.
@@jackalturk6491 You might try looking at companies building products that operate closely with the operating system such as BMC, PKWare, Dell/EMC.
I like the Steve Gibson reference! I've been using his utilities for about as long as I can remember.
Back in the 90s when I was a young hacker wannabe, I would open exe files in DOS edit and think, how does anyone program? One would need a crazy keyboard. I did take note that the starting 2 bytes were MZ. Later in my hacking career I downloaded the virus workshop. Before you could use the program, it would prompt you to enter the first 2 bytes of an exe file to test if you knew what you were doing. I remember being so excited that I knew what to enter. Good times!
I've been searching for a video like this for literally years omg thx ♥️♥️♥️
I used to love writing assembly (for x486 under MS-DOS) for writing code for my own purposes. I rarely had much opportunity to use it at work, except occasionally embedded in a C function.
Thanks for shouting out Richard Feynman, too... he's one of my all time heroes.
I find asm more readable than C, especially when interacting with hardware
Thanks Dave. This was a cool exercise. Brought me back to the 1980s. I worked writing programs for neuroscience. Trying to get them to run in 64K. MS-DOS, early Windows and PDP-11s. I just created a Java program that is 50 MB. I wonder if I can get it to 45 MB. :)
This is my fav vid of yours yet. Love Assembler.
...I was just about to say that!
You Mr. Dave are a God to me. Just awesome thanks
Brilliant. Wish I'd known all of this 20 years ago.
thank you for this great presentation. clear & concise. it was really interesting for me, as I was a Windows developer for many years.
Nice nod to "The Friendly Giant"
It seems to me you could save several bytes by making the window struct part of the program and simply loading constant parameters as part of the code; thus, you don't need to load them into registers and store them into the struct...
Try it out and report back.
He's clearly smarter than you. Don't try to contest.
@@Crecross I mean... Dave literally says to give the other options a look and try and make it smaller. I'm not good enough with Assembly to contribute much, but the above approach *would* shave a handful more bytes.
Back in high school I was messing around with Flat Assembler and managed to make a functional yet tiny 1.5 KB Windows application that had all sections (code, data, and import tables) merged into one. Good to know it's possible to go even lower than that.
I would press the like button twice if I could. It is so interesting listening to you Dave!
Throwback to a time when every problem wasn't solved by throwing in another hundred megabytes (or so) of additional npm modules, and some custom tracking js on top of that for good measure.
i laughed way too hard at the addition of "your token gray bearded wizard"
Super interesting. I wish I knew this decades ago. Thank you.
I love how XOR reg1, reg1 is the same as a MOV reg1, 0 instruction, and OR reg1, reg1 is just an alternate NOP instruction
XOR reg,reg is not the same as mov reg,0. XOR updates the flags. Likewise OR reg,reg is not a NOP. It updates the flags without changing a register.
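The difference described in this reply can be shown with a toy model. This is only a Python sketch of one register plus the zero and carry flags, not an emulator; the class and method names are made up. It relies on the documented x86 behaviour that MOV leaves the flags alone while XOR and OR clear CF and set ZF from the result.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Toy model: one 32-bit register and two of the x86 flags."""
    eax: int = 0
    zf: bool = False   # zero flag
    cf: bool = True    # carry flag, preset so a clear is visible

    def _logic_flags(self, result: int) -> None:
        # Logical ops (XOR/OR/AND) set ZF from the result and clear CF.
        self.zf = (result == 0)
        self.cf = False

    def mov_imm(self, value: int) -> None:
        # MOV changes the register and leaves every flag untouched.
        self.eax = value & 0xFFFFFFFF

    def xor_self(self) -> None:
        self.eax ^= self.eax           # always 0
        self._logic_flags(self.eax)

    def or_self(self) -> None:
        self.eax |= self.eax           # register value unchanged
        self._logic_flags(self.eax)


a = Cpu(eax=5); a.mov_imm(0)   # eax=0, flags as they were
b = Cpu(eax=5); b.xor_self()   # eax=0, ZF set, CF cleared
c = Cpu(eax=5); c.or_self()    # eax=5, ZF clear, CF cleared
print(a, b, c, sep="\n")
```

So XOR reg,reg and MOV reg,0 agree on the register but not on the flags, and OR reg,reg is a flag test rather than a NOP.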
More of this kind of content please
Reminiscing about .kkrieger ...
Thanks Mr.
love small code and win32 programming and optimization. thank you task manager man!
"I'll do it all in notepad and assemble it from the command line".
I like the cut of your jib, sir!
My best efforts at hacking were to build a Batch file that converts a text file of hex characters (eg, "A0B1C2", ignoring spaces and newlines) into a binary file. I used this to create executables on systems where the creation of executables were somehow prevented. I challenge Dave to do that.
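The conversion described here is simple to state outside of Batch. The sketch below is in Python rather than Batch, so it illustrates the transformation only and is not the commenter's script; `hex_text_to_bytes` is a made-up name.

```python
def hex_text_to_bytes(text: str) -> bytes:
    """Turn text like 'A0 B1\\nC2' into the bytes A0 B1 C2.

    Spaces and newlines are dropped first, as in the comment above;
    anything that is not a hex digit raises ValueError.
    """
    compact = "".join(text.split())   # remove all whitespace
    return bytes.fromhex(compact)


sample = "A0B1 C2\n4D 5A"
print(hex_text_to_bytes(sample).hex())   # a0b1c24d5a
```

Writing the result out is then one call, for example `open("out.bin", "wb").write(hex_text_to_bytes(sample))`, where the file name is again only a placeholder.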
I miss the old task manager... the new one looks better but it is a laggy nightmare.
They probably need to rewrite it from scratch to fix all the problems. I'm not sure they still have someone capable, though.
Well it's Windows, so ... ;^)
Like much of Windows 11, there are some nice features, but man are they poorly/lazily coded.
We all knew code would become less optimised as systems became more powerful, but such basic things should not be lagging on hardware that could only be dreamt of back in the 90s and even 00s.
The new one takes days to even start
@@halano Are you running on Core 2 Duos still? Even on a Ryzen 5600G it opens instantly.
Excellent. So much good stuff here
Love this stuff, thank you.
"I've made a fully functional Windows program that fits in 644 bytes!"
"Wow, that's tiny, what does it do?"
"...What do you mean, _do?_ "
Well, it occupies 644 bytes. :-)
One of my most favorite things is, in Linux, if you look at the man pages for /bin/true and /bin/false, the description is: "Does nothing, successfully." and "Does nothing, unsuccessfully."
Fun little history fact. Word Perfect was written in ASM, and this is why it was so fast. Up to Word Perfect 7 for Windows. But when Corel bought Word Perfect, they did not do their due diligence and had a horrid surprise when they realized that Word Perfect 6 for Windows 3.1 was actually ALSO written in Assembly.
So, Word Perfect 7 for Windows 95 was a complete re-write with some 16-bit components because, well, they couldn't re-write it all in two years!
Which is why my sister loved it, as it was fast enough on a 286/16 PC to keep up with her typing speed. Otherwise with other word processing she would regularly out type the keyboard buffer and lose characters. Only with a 100MHz Pentium and Win98 did Word finally get fast enough that she could not out type the buffer and she finally, after a decade or more of WP, changed to Word, as the company did not want to pay the cost of Corel's upgrade.
I bet that was a fun codebase to inherit...
@@nickwallette6201 Because they had to re-write the whole thing, V7 for Windows 95 was super buggy. It destroyed the reputation and ultimately led to business issues within Corel. The decision to make WP6 for Win3.1 in assembly was the first mistake, the second mistake was buying the company without due diligence about the state of the code base (or at least knowing about it and going forward). The third was veering away from their core business, which was graphics stuff. They were a direct competitor to Adobe for years, and would probably have been to this day if they played their cards right. Adobe has a quasi-monopoly now on graphics software, Canva filled the gap that Corel had.
This tickled parts of my brain that have gone unmolested for too long. Coded for decades at a bank, a car company, and an aerospace company. Sadly, I all too quickly moved into management and lost the hands-on experience. Watching this made me feel young again!
Glad you found it interesting! It was my first real x86 asm since working on MS-DOS back in the early 90s!
I always find that modern programs really need the optimization, speed and size that Assembly can provide. If you want to have some fun, just take a look at the .kkrieger demo game. It's a full 3D shooter that fits under 97 KB and runs on Windows.
I love the way you explain 'things'. Anyway I am happy to know the artist who has created taskmgr, the probably most stable windows application.
I went deep down this sort of rabbit hole back in the 2000s. Imagine how much easier it was before ASLR...
Malware and binary protection systems alike used techniques like that of removing and/or messing up the import tables so that people could not as easily depack executables and modify out protection. Lots of reversing tools of the era: LordPE, ImportREC, etc. that helped in that regard.
I've been a Linux guy since about 1995 so have to sneak in the back but really enjoy these super technical videos. Thanks!
fantastically interesting and educational. I think I once used a Petzold book to do something basic, and sorta hated WIN32 type programming, but I still found this interesting. I guess technical videos like this will never get a ton of views, as the audience is small. But you deserve more views!
For whatever it's worth, I made a console hello world in C and compiled with GCC (Windows) which even with flags got stuck around 21k. I switched to the TCC compiler and used its -O2 flag and the same source compiled to 2k. I'm sure a real programmer (not me) could get that down into bytes, but I feel like I got something done today!
An enjoyable episode for the incurable tinkerer :) Nice to hear the tip of the hat to Mr Steve Gibson. Would actually be a fun episode to watch you guys talk low level assembly coding tricks.
Steve is a true wizard compared to me, as I haven't worked in x86 for almost 30 years, and I imagine he lives and breathes it every day!
@@DavesGarage Yes indeed, I imagine the same :) As a fellow developer working with backend code for the energy sector, mostly in C# and various other languages, I truly appreciate this kind of content where we go back to the roots of how things actually work behind the scenes. Newer generations of developers can easily forget the levels of abstraction that we just take for granted in our daily work. I feel the same about the demoscene in general, where coding can be appreciated as a form of art as much as a means to a business end :)
Yay look forward to watching this later
Great idea for content - I've always held that every serious developer should take at least an intro ASM course
Totally agree!
Actually, knowing the x86, or even Z80 architecture is enough to get the idea
@@29Aios Exactly - as a kid I did primitive Z80 and 8086 dev and grasping even just the basic hardware focused concepts (interrupts, stack management, pointers, and the like) is huge, just to respect what is going on under the hood
@@stevepoythress4678 +1 !
Ah, interrupts :)
They have totally disappeared today, as have the LGDT/LIDT ASM instructions, which were used to remap the 0000:0000 memory space to any 24-bit address in the 286's 16 MB space.
Have a story about interrupts.
In 90' I created a resident program named S&R to make memory snapshots of any program and then restore them on demand. It primarily defeated floppy protection (save the state once the floppy has already been checked, then restore the file on any PC), but in my local area it was widely used to play games. Fellows saved their games before an important event and, in case of failure, just restored them.
You could press F11/F12 to save/load a snapshot anytime, and the LIDT instruction helped to intercept the keyboard interrupt 0x9 without worrying that it had already been intercepted by someone else.
Btw, in case of Mem386 utility, which was working in protected mode, and didn't allow any other program to execute high-privileged instructions like LIDT/LGDT, I still could intercept the interrupt by intercepting BIOS Clock once per second, and read the 60h port to know which key is pressed, but anyway, I couldn't read/write the protected memory. So only real mode games could be saved 😒
Awesome stuff. I miss the days of keeping things small. Maybe when the hardware finally stops advancing apace, we as an industry will return to it.
I share that view. I would've expected things to be more optimized by now, instead it keeps getting worse in certain cases.
I think the only device that really encourages innovation by devs and also shows the power of optimization is the Nintendo Switch. Its processing power isn't really great by modern standards, yet games like BOTW still have something to show against more powerful systems
@@MartinDerTolle Yeah, it's only natural though - as long as there isn't any *need* to optimise we can spend the same time making new features - i.e. things that sell. At the end of the day, it's all about making money, not making the best (optimised) product.
@@dgkimpton You do realise that if there is no competition, there is no need to improve?
What I'm trying to learn is optimization of how to write software which takes advantage of different opcodes of the CPU generation and stepping to get the most out of the h/w. e.g. SIMD/AVX instructions.
With libraries, you can call a function and have the function determine the h/w it's running on and choose which version of the function optimized for the local h/w to use, transparently. But in the main block of code CPU optimizations are manual and have to be accounted for there.
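The library-side dispatch described here can be sketched in a few lines. The Python below is only a shape illustration: the feature set is passed in as a plain set of strings instead of being read from CPUID, and `sum_scalar`, `sum_wide`, `CANDIDATES`, and `pick_sum` are hypothetical names, with `sum_wide` standing in for a real SIMD kernel.

```python
def sum_scalar(values):
    """Baseline version that works on any CPU."""
    total = 0
    for v in values:
        total += v
    return total


def sum_wide(values):
    """Stand-in for a SIMD kernel: consumes 8 elements per step."""
    total = 0
    for i in range(0, len(values), 8):
        total += sum(values[i:i + 8])
    return total


# Ordered from most to least specialised; None means "no requirement".
CANDIDATES = [("avx2", sum_wide), (None, sum_scalar)]


def pick_sum(cpu_features):
    """Return the first implementation whose requirement is satisfied."""
    for required, impl in CANDIDATES:
        if required is None or required in cpu_features:
            return impl
    raise RuntimeError("no usable implementation")


# The caller resolves the function once, then calls it like any other.
fast_sum = pick_sum({"sse2", "avx2"})
print(fast_sum.__name__, fast_sum(list(range(20))))
```

The selection happens once, up front, which is why it looks transparent from the caller's side; code outside such a library has to make the same check itself.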
First of all, absolutely wonderful and impressive work, that executable is SMALL!
In another direction though, I noticed that tiny.exe's memory usage is rather large. ~1.3k memory (private working set), ~10k working set, but more curiously ~131K Commit size!
I would be curious to see the same exercise but in the direction of reducing the runtime memory usage instead.
With today's ever so popular Electron apps eating up an absolutely absurd amount of memory at runtime, I'd love to see a trend of people pushing for a lower runtime RAM footprint =)
I learnt to program Windows with Charles Petzold’s amazing ‘Programming’ series of books. When I moved from Win16 to Win32 his books made the transition so easy.
one of your best videos!
I tried every book available back in the day to code for traditional Windows, and I could not make any sense of it until I got this book: "Introduction to MFC Programming With Visual C++" by Richard M. Jones (2000). This book did the trick. I just code for fun and now I'm into playing chess more rather than programming, having done pretty much everything I wanted to do in programming, primarily using C#, including Azure cloud and web functions and writing a chess program. At some point I might learn assembly language just for fun, after I get the GM title, lol.
13:20 - I would suggest a great program that could show Import Tables and so much more ;-)
I'm talking about DiE - "Detect it Easy".
@lornasandra And why should I send a message to a fake empty channel? ;-)
Hello! In my opinion, for a cheap MCU like the STM8S, writing in assembly makes sense, because the Cosmic and Raisonance C compilers cost $1000. But... when you have libraries and pure architecture, maybe it also makes sense for a PC or a server. The TCP and TLS protocols should be written in assembly for the best performance. This also applies to RDBMSs.
Great video, pretty good demo of why people don't write in assembler if they can avoid it, so hard! C forever!
Oh boy... I loooove this channel :)
100 kb and it only opens the window
ElectronJs out there: Hold my 200 MB
The option most likely to occur (WM_PAINT in this example) should come first in the decision chain. That saves CPU cycles. The more general rule is to sort the options by their probability / frequency of occurrence.
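A rough way to see the effect of that ordering is to count how many comparisons a compare-and-branch chain performs for a given mix of messages. The Python sketch below uses the real values of WM_DESTROY, WM_SIZE, and WM_PAINT, but the 8:1:1 traffic mix and the function names are invented for illustration.

```python
from collections import Counter

WM_DESTROY = 0x0002
WM_SIZE = 0x0005
WM_PAINT = 0x000F


def comparisons_needed(order, messages):
    """Count the compares a linear if/else-if chain performs."""
    total = 0
    for msg in messages:
        for position, candidate in enumerate(order, start=1):
            if candidate == msg:
                total += position      # matched on the Nth compare
                break
        else:
            total += len(order)        # fell through to the default case
    return total


def frequency_order(messages):
    """Sort the distinct messages, most frequent first."""
    return [msg for msg, _ in Counter(messages).most_common()]


# Hypothetical traffic: mostly paints, one resize, one destroy.
traffic = [WM_PAINT] * 8 + [WM_SIZE] + [WM_DESTROY]

print(comparisons_needed([WM_DESTROY, WM_SIZE, WM_PAINT], traffic))  # 27
print(comparisons_needed([WM_PAINT, WM_SIZE, WM_DESTROY], traffic))  # 13
```

Whether WM_PAINT really dominates depends on the application, so the actual order is best taken from measured message counts.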
One peephole optimization that comes to mind is to skip the line to fetch the virtual address of the PEB, from the PEB. Just index FS directly to get the loader struct pointer. That shaves an entire instruction from the code! Additionally, all those push/call pairs are embedded structural information. Better to strip that stuff out and do a loop over the hashes. Use an escape value to switch the dll to look for.
FS:0 points at the TEB, which has a pointer to itself (0x18) and to the PEB (0x30). You can't directly offset from FS to the Ldr pointer, only to fields in the TEB.
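The two-step lookup in this reply can be modelled with a small fake memory map. The offsets below are the 32-bit ones mentioned above (TEB self pointer at 0x18, PEB pointer at 0x30), plus the loader data pointer at PEB+0x0C; every address is made up, and this Python sketch only models the pointer chasing, it does not touch a real process.

```python
TEB_SELF = 0x18   # TEB field: pointer to the TEB itself
TEB_PEB = 0x30    # TEB field: pointer to the PEB
PEB_LDR = 0x0C    # PEB field: pointer to the loader data

# Hypothetical addresses, for illustration only.
TEB_BASE = 0x7FFDE000
PEB_BASE = 0x7FFDF000
LDR_BASE = 0x00251EA0

memory = {
    TEB_BASE + TEB_SELF: TEB_BASE,
    TEB_BASE + TEB_PEB: PEB_BASE,
    PEB_BASE + PEB_LDR: LDR_BASE,
}


def read_fs(offset):
    """Model `mov reg, fs:[offset]`: FS-relative reads only reach the TEB."""
    return memory[TEB_BASE + offset]


def read(address):
    """Model an ordinary `mov reg, [address]`."""
    return memory[address]


def find_ldr():
    peb = read_fs(TEB_PEB)        # mov eax, fs:[30h]
    return read(peb + PEB_LDR)    # mov eax, [eax+0Ch]


print(hex(find_ldr()))
```

The second load cannot be folded into the first because the Ldr field lives in the PEB, which is a separate structure from the one FS addresses.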
Has anyone heard of, or used, a project from around 2000 called SpAsm or SpAsm32 - it's a self-compiling Windows Assembler with a built-in Editor, where the program's sources are stored at the end of the executable - yes, it's a _single file EXE+SRC_