What's inside a .EXE File?

Inkbox

Published 24 Nov 2024

COMMENTS • 557

@Sparkette 1 year ago ⁺⁷⁷⁵
Actually, Windows hasn't been DOS at its core since Windows Me. Windows XP and later are based on Windows NT, which doesn't use DOS.
@FatManJerry 1 year ago ⁺³¹
Windows XP still incorporated 9x for compatibility reasons, but it was based on NT.
@amcfunk 1 year ago ⁺⁶⁶
Indeed. The purpose of the little DOS program at the beginning is to provide a nice error for anyone trying to run it from DOS.
@quickhakker 1 year ago ⁺²⁴
And yet the files introduced in Windows 10/11 (Edge) still have the "this can't be run in DOS mode" thing.
@progect3548 1 year ago ⁺³
@@earthblob2058 y'know, that would probably explain why ME was so buggy
it might've been an NT OS rushed into being ported to 9x
@fraznofire2508 1 year ago ⁺¹²
@@quickhakker amcfunk already explained why this is: it's to ensure proper error handling for DOS-based operating systems that try to run it.
@Bunny99s 1 year ago ⁺⁷⁵⁸
This wasn't bad, but it missed or simplified a lot about the actual EXE content. EXE files (or PE files) are organised in sections. There are different sections, and usually only one contains your code. Other sections may contain resources, text or, much more importantly, the import and export tables. While an EXE file usually does not have an export section, it usually has an import section. Its content is essentially a special "contract" between the OS and your application. When the OS starts your program, it creates a process and loads your file into that process's memory. The OS then scans the import table, looks up the shared libraries and imported function names, dynamically loads those DLLs into your process, and resolves the requested functions. That way your program has access to functions that are either part of the OS or of some other utility library. The export table usually only exists when compiling a DLL file, which internally is also a PE file. The export section serves the opposite purpose of the import section: the OS can look up a function or other symbol that the library exports when loading the DLL for an application.
You can actually trim out a lot of the unnecessary code from a PE file. In the past I used a very small assembler called Flat Assembler (FASM). There you could even create your own MZ stub without all that message stuff that nobody needs anymore. In the demoscene it was even common to have the MZ and PE headers overlap. The MZ header contains a special value that determines the position of the PE header. By cleverly offsetting the PE header you could (re)use otherwise unused or irrelevant bytes. I created key hook DLLs that were only 2 KB in size. Unfortunately Windows expects sections to be placed at a certain alignment, so you cannot shrink the file too much, and there is some empty space between sections. The demoscene, though, usually makes use of almost every byte: since Windows does not care about the content of that alignment padding, you can fill it with your own data.
The fun thing about FASM is that its source code is written in its own assembly dialect, so it can compile its own source code to produce itself. Of course it's open source. Just google for flat assembler or FASM.
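To make the layout described above concrete, here is a minimal Python sketch. It builds a toy byte string shaped like the start of a PE file (it is not a loadable program) and walks it the way a loader would begin to: the MZ header's field at offset 0x3C gives the position of the PE header, and the section table follows the headers. The helper names `build_toy_pe` and `list_sections` and all section values are made up for illustration.

```python
import struct

def build_toy_pe() -> bytes:
    """Build a toy byte string laid out like the start of a PE file (not runnable)."""
    dos_header = bytearray(64)
    dos_header[0:2] = b"MZ"
    # e_lfanew at offset 0x3C: file position of the PE header.
    struct.pack_into("<I", dos_header, 0x3C, len(dos_header))

    def section(name, vsize, vaddr, raw_size, raw_ptr):
        # One 40-byte section table entry; unused fields are left at zero.
        return struct.pack("<8sIIIIIIHHI", name, vsize, vaddr,
                           raw_size, raw_ptr, 0, 0, 0, 0, 0)

    sections = (section(b".text", 0x100, 0x1000, 0x200, 0x200)
                + section(b".idata", 0x80, 0x2000, 0x200, 0x400))
    # COFF header: machine, section count, timestamp, symbol table pointer,
    # symbol count, optional header size (0 in this toy), characteristics.
    coff = struct.pack("<HHIIIHH", 0x8664, 2, 0, 0, 0, 0, 0x0002)
    return bytes(dos_header) + b"PE\0\0" + coff + sections

def list_sections(data: bytes):
    """Return (name, virtual address, file offset) for each section."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    _machine, count, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    table = pe_offset + 4 + 20 + opt_size  # signature + COFF header + optional header
    result = []
    for i in range(count):
        name, _vsize, vaddr, _raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        result.append((name.rstrip(b"\0").decode("ascii"), vaddr, raw_ptr))
    return result

sections = list_sections(build_toy_pe())
```

Pointing `list_sections` at the bytes of a real EXE should work the same way, since only the MZ field, the COFF header and the section table are touched; the import table itself lives inside one of the listed sections and is not parsed here.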
@tf_d 1 year ago ⁺³²
Thank you for taking the time to write this out, I'll check out FASM.
@diobrando7642 1 year ago ⁺⁵
I have a simple question, wouldn't it be simpler to use syscalls/interrupts to call for OS functionalities?
@the_7th_sun 1 year ago
Came here to say this, well said
@thediaclub4781 1 year ago ⁺⁷
@@diobrando7642 Windows doesn't support syscalls by user programs, but they are theoretically possible iirc. But you cannot replace whole libraries with some syscalls.
@龗 1 year ago ⁺⁴
@@diobrando7642 you shouldn't use direct syscalls, the syscall numbers change between Windows versions
@rsa5991 1 year ago ⁺³²⁹
I did actually handcraft an EXE file. I did that as a part of writing a simple compiler for a stack-based language.
The hardest part was making the correct header. That took 6 hours, mostly because Windows never tells you what went wrong, it just refuses to load the file. But once I got that figured out, it was pretty smooth sailing.
@user-mc4rr9fe6y 1 year ago ⁺⁷⁰
the real strategy is just to make your own OS that has its own header format for executables so you know the header
@gabrielschilive7675 1 year ago ⁺⁹
If I may ask, how did you do that? Like, what resources did you use?
@rsa5991 1 year ago ⁺⁴⁵
@@gabrielschilive7675 I tried to answer you, but my comment got deleted for links.
I used "PE Format" documentation from Microsoft Learn.
I also used "Tiny PE" research by Alex Sotirov. I didn't use most tricks from his research, but I used it as a guide, which fields are important, and which I can ignore.
@anon_y_mousse 1 year ago ⁺⁹
@@gabrielschilive7675 You could also look for a copy of Inno Pascal. It's open source and a very simple Pascal compiler that directly generates PE executable files. It doesn't have every feature of Pascal, but it's super easy to understand the code.
@gabrielschilive7675 1 year ago ⁺³
@@anon_y_mousse Thank you. I had not thought about a compiler or linker before. Good idea!
@steamrangercomputing 9 months ago ⁺⁸⁶
Rarely these days do you hear people refer to C as high level, but I'm always glad when it is.
@Joker-fj8hg 8 months ago ⁺⁶
The definition I learnt from school: 1st generation is machine code, 2nd generation is assembly language, and both are low level. Newer generations including C are high level
@steamrangercomputing 8 months ago ⁺⁸
@@Joker-fj8hg Same. But now C is said to be low level compared to things like Python, which I suppose it is.
@YaySyu 7 months ago ⁺²
@@steamrangercomputing As someone learning Python who just glanced at some C, yeah. There's a learning curve...
@steamrangercomputing 7 months ago ⁺⁶
@@YaySyu If you're educated on the basics of how programming languages and computers work, C is actually easier to learn than Python, as it straight up has fewer features, and thus less to learn.
What makes it so difficult for many people to learn is that the functionality it does have is a lot more powerful and closer to the hardware than Python's, meaning people have to learn about things like memory management.
@pai64 6 months ago ⁺²
It's merely a spectrum. Assembly is a high-level language compared to binary code, and Python is a high-level language compared to those two.
@SilasonLinux 1 year ago ⁺⁶⁰⁷
I don't think it's true that Windows is still running on DOS nowadays, though. That's my only critique. It's running on the NT kernel now and has for a long time. I think that message about not running in DOS mode was made for the time before home versions of Windows used the NT kernel, so pre-Windows XP.
@andrewcrook6444 1 year ago ⁺⁹⁷
No, what he meant was that if you run the PE EXE under DOS, it runs a DOS part of the EXE to display an error message; if the same EXE is run under Windows NT, it jumps to the Windows part of the EXE. He didn't mean that Windows NT runs on DOS, which it doesn't. On a side note, NT's DOS support is via a virtual machine called NTVDM, and 16-bit Windows apps run via NTVDM and an extension called WOW. 64-bit Windows runs 32-bit Windows apps under WOW64. Windows 10 disabled NTVDM/WOW by default and Windows 11 removed it, so if you want DOS and 16-bit Windows support you have to use a virtual machine running an older version of Windows, or something like DOSBox or Wine.
@mix3k818 1 year ago ⁺²³
@@somacruz8272 Parallel to DOS. Windows 9x is an extension of Windows/386 and Windows 3.x, which were a layer on top of the MS-DOS kernel. Windows NT on the other hand has always been its own kernel.
@andrewcrook6444 1 year ago ⁺¹⁴
@@somacruz8272 If you look at any book on the internals of NT, a good book on operating systems with examples, or even Wikipedia, you'll see that NT was a completely new architecture, with the lowest API layer being the Hardware Abstraction Layer (HAL). Only companies really used NT; it was for servers and powerful workstations. It wasn't until Windows XP replaced MS-DOS and Windows ME that Windows was built completely on the NT/2000 architecture and became mainstream for home users, until it became the de facto standard.
@malibuclassic77 1 year ago ⁺²
@@somacruz8272 No it wasn't. Win95 was, but Win2000 only shares the UI.
@Bunny99s 1 year ago ⁺⁵
@@andrewcrook6444 Right, almost all "newer" operating systems switch the processor into protected mode very early during boot and use it, well, more properly :) Windows 95 and 98 also switched to protected mode (in which normal x86 real mode code would no longer work), but they didn't really care much about proper separation between kernel and userspace. It was sort of there, but they still supported v86 mode to run 16-bit DOS applications natively. Those essentially breached almost all security measures of the CPU.
With WinXP (which was built on NT) things got much better security-wise, though a lot of low-level drivers could still easily bring down the whole system. BSODs were much rarer on XP, but still possible if some driver went haywire. Even with WinXP we still got DOS support, but only in the 32-bit version. In the 64-bit version the support for 16-bit binaries was dropped, which was a huge deal for me at the time as I was still occasionally using my good old Borland Pascal 7 ^^. Things have changed a lot since then. It was quite a journey.
@apo11ocat 1 year ago ⁺³⁰⁶
7:06 A common misconception about Python is that each line is read and executed in real time. What actually happens is that the interpreter first compiles the source to bytecode, which is kept in memory and then executed by the Python Virtual Machine.
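A small sketch of that two-step process, using only the standard library: the built-in `compile()` turns source text into a code object holding bytecode, `dis` lists the bytecode instructions, and `exec()` hands the code object to the virtual machine. The source string and variable names are placeholders chosen for the example.

```python
import dis

SOURCE = "total = 1 + 2\n"

# Step 1: compile the whole source text to a code object (bytecode) up front.
code = compile(SOURCE, "<example>", "exec")

# The bytecode is plain bytes; dis decodes it into named instructions.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Step 2: the virtual machine executes the already-compiled bytecode.
namespace = {}
exec(code, namespace)
```

Calling `dis.dis(code)` prints the full listing; the exact instruction names differ between Python versions, so none are shown here.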
@jeremiefaucher-goulet3365 1 year ago ⁺⁴⁴
It's all a question of perspective. That bytecode is still "interpreted" in real time by the PVM.
@malikcurriah241 1 year ago ⁺¹²
@@jeremiefaucher-goulet3365 I think the point is that at run time it's not being translated from Python source, but from a more easily translated, compiled intermediate form, like Java is.
@jeremiefaucher-goulet3365 1 year ago ⁺¹⁶
@@malikcurriah241 Exactly like Java bytecode yes. The JVM still needs to interpret and execute that intermediary bytecode at run time.
@TheOzumat 1 year ago ⁺¹³
well, a .py file is still a .txt file with a fancy hat on.
@anon_y_mousse 1 year ago ⁺⁴
@@TheOzumat You mean with a fancy skin on. :P
@fedotttbv 1 year ago ⁺¹²²
Okay, I just want to say that one of the reasons for the big size of the .exe file is the compile mode: Debug. You can basically see three int 3 (breakpoint interrupt) instructions right after the end of the "main" function. They are inserted by the compiler to stop execution from running past the end of the function (if you, for example, forgot to write the "ret" instruction). Debug mode generates a terrible amount of auxiliary code, which can help you in debugging. All your actions, even in assembly, are checked by debugging instruments at run time to help you find mistakes. So for pure research you should disable all of the debug utilities in the project settings (some of them are still used even in "Release" mode). But even with that, this video was interesting, thank you for your work.
@Gwarks337 1 year ago ⁺²¹
An EXE file also contains icons, bitmaps, cursors and dialog definitions. The function LoadBitmapA, for example, loads a bitmap from inside the current EXE file. Many of these resources can be viewed (and sometimes edited) with PE Explorer or similar programs.
@tomysshadow 1 year ago ⁺⁴¹
I'm sure several people have pointed it out by now, but the extra code you were seeing is from the CRT (C Runtime), since despite being written in assembly, you were compiling your program as a C program.
Before main is called, there has to be code to do things like, take in the command line and split it up into argc/argv, set up thread local storage, set up floating point numbers, etc. On Windows, this stuff is done by the executable itself, not the system. The code to do it is inserted by the compiler in a way that's transparent to programmers. You can turn it off, but then you'll have to implement those features yourself if you want to use them.
@williamdrum9899 1 year ago
So it's basically the "startup sequence" that most game console devs had to use before running their actual game logic
@snippykeegan 1 year ago ⁺¹⁵
This is one time I wish I could double like a video.
It's a bit oversimplified for more advanced computer users, but for the layman just wanting to learn more this is fantastic.
@shackamaxon512 1 year ago ⁺¹⁸
I remember when MS DOS had a debugger. It was fun to start the debugger and tell it to just "go". Debug would dutifully attempt to execute whatever the IP register was pointing to. The machine would jump off a cliff if it could and you told it to
@glitchy_weasel 1 year ago ⁺⁷²
For those interested, I find Dave's Garage "The World's Smallest Windows App" video a fantastic explanation of how you can take out everything but the bare minimum from a PE.
Very interesting video by the way, although I feel like the viewer is left with more questions than answers. Anyways, keep up the good work and I hope to see your channel grow.
@zilog1 1 year ago ⁺³
saaame. i love that channel ^~^, also buizel is cute af. best pokemon.
@nicknorthcutt7680 8 months ago ⁺¹
Dave is a genius. The time it took him to write that program blew my mind!
@mattgio1172 1 year ago ⁺⁴⁰
Amazing video! I never really thought about exe files that way before - you explain it so well! I always learn something cool from your channel - thank you!
@mkd1964 1 year ago ⁺¹³
the MZ at the beginning of DOS executables stands for "Mark Zbikowski"... who was one of the main developers responsible for developing the file format.
@Finkelfunk 1 year ago ⁺³⁵
CS student so I have a few notes on this:
6:29 When I first learned about Assembly I thought the same. This is NOT true however. Assembly is extremely hard to grasp on a physical scale. It is only when you get into the meat and nitty-gritty details of how a processor _actually_ functions that you realize just how close Assembly code actually is to pure machine code. All Assembly effectively does is take a command (like 'mov') and translate it into 1s and 0s. There is a 5060-page "Intel 64 and IA-32 Architectures Software Developer's Manual" for x86 Assembly detailing what exactly each instruction means, but basically "mov eax, 0x5" gets translated _directly_ into the bytes "b8 05 00 00 00" in hexadecimal, with b8 being the opcode that means 'move the following 32-bit value into the eax register'. The instructions that are read are directly sent to something like the processor's ALU and directly fed into the connected multiplexer. So the "add" instruction you put in actually controls that specific multiplexer in that specific register.
Now while this is not punching in bits into a machine by hand, you are really not gonna come any closer to controlling the pure bare bones hardware than this.
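As a rough illustration of how direct that mapping is, the sketch below encodes one instruction form by hand: in 32-bit x86 code, `mov r32, imm32` is the opcode byte B8 plus the register number, followed by the immediate as four little-endian bytes. The function name `encode_mov_r32_imm32` is made up for this example, and only this single instruction form is handled.

```python
import struct

# Register numbers as used in the x86 "opcode + register" encoding.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def encode_mov_r32_imm32(register: str, value: int) -> bytes:
    """Encode `mov <register>, <value>` for 32-bit x86 code."""
    opcode = 0xB8 + REGISTERS.index(register)
    # The immediate is stored as 4 bytes, least significant byte first.
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)

machine_code = encode_mov_r32_imm32("eax", 0x5)
print(machine_code.hex(" "))  # b8 05 00 00 00
```

A real assembler additionally picks between several encodings of `mov` depending on operand types and sizes, which this sketch does not attempt.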
7:36 I presume you are referring to Python in this case because believe me when I say that every single one of us sucks at Assembly compared to the magic a compiler performs. A compiler is capable of spitting out insanely optimized Assembly code to the point where the only people on this planet capable of writing faster Assembly code than it are the people that actually program the damn things. Compilers do things like higher polynomial functions and division by invariant multiplication to make your code _way_ faster than you could ever do. And those are just some of the incredibly genius ways your code can be improved upon. To _really_ understand the full math a compiler uses to fold and optimize your code you basically need a PhD in Math and Computer Science.
All in all that topic is a thing you can really sink time into. :)
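For the "division by invariant multiplication" trick mentioned above, here is a minimal sketch of the idea for one concrete case, unsigned 32-bit division by 3. Instead of a slow divide, the value is multiplied by a precomputed constant and shifted right. The constant 0xAAAAAAAB is ceil(2^33 / 3); the function name `div3` is just for this example, and Python integers stand in for the 64-bit intermediate a CPU would use.

```python
MAGIC = 0xAAAAAAAB  # ceil(2**33 / 3)
SHIFT = 33

def div3(x: int) -> int:
    """Compute x // 3 for an unsigned 32-bit x using multiply and shift only."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * MAGIC) >> SHIFT

samples = [0, 1, 2, 3, 4, 5, 100, 12345678, 2**31, 2**32 - 1]
results = [div3(x) for x in samples]
```

Other divisors need their own constant and shift amount, which a compiler works out at compile time.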
@InkboxSoftware 1 year ago ⁺⁸
One thing I glossed over in the video that I should have gone into more detail on is that the compiling process is still doing a lot of work even from the assembly level. For example, when writing assembly code you can still have variables with their own names, and that's an abstraction that will be dealt with by the compiling process. Even the MOV instruction in assembly has a couple of different machine code equivalents depending on the addressing mode and what it wants to do. So I agree that assembly is close to machine code, but it isn't always a 1:1 translation process. And then, as I mentioned with the program I made, the linker added all the code for creating the window and turned my short program into a 48 KB program. And I think that is the point I was trying to make: not that assembly is significantly different from the machine code of the final executable, but that the compiling process, something that most programmers probably never give a second thought to, is doing a significant amount of work to deliver that final EXE.
@Smaxx 1 year ago ⁺⁴
And don't forget to add to this that most (C) compilers are built in a two-step process, first compiling ("bootstrapping") itself using whatever tools are available, and then in a second pass, compiling itself using itself, because of well-known own optimizations etc. (and to verify everything is working, too).
@Finkelfunk 1 year ago ⁺⁴
@@InkboxSoftware Oh yes absolutely. I mean there's several mov instructions depending on register and type of value you are handling. But at the end of the day your instruction really does get a 1:1 translation into a final binary instruction which makes this incredibly cool to use. Given the context it's very true though that the .asm file you put into the compiler is not 100% just what you will get out of it (as you rightly pointed out). The way this statement sounded to me was giving me more of a "Assembly is basically a more complex form of C" type of vibe (I hope you get what I mean by that haha). But in the context that the .asm file is not 100% all you are getting it is very much true. :)
@Bunny99s 1 year ago ⁺⁸
@@InkboxSoftware Finkel - Funk is actually right. Assembly is a 1:1 translation to machine code. Yes, most assembly dialects support certain macros or some simple simplifications, but those are merely syntactic sugar. What you've seen in your disassembly is just boilerplate code that was generated by your linker, since you actually use C++. Try using an actual assembler that directly spits out the exe file. Another thing, which I mentioned in another comment, is that the PE file format has a lot of additional headers and sections that are / need to be initialized as well. Those are not really machine code as it's just part of the actual PE format. Call it metadata. This metadata can't be interpreted by the CPU but by your OS. When you start an application, there's a lot going on on the OS side before the actual execution of your code starts. Though that's all part of the OS.
Classical COM files under DOS only contained raw machine code from the very first byte. So you can write a program with just a few bytes and it would work (under DOS). COM files were always loaded at offset 0x0100 within their segment, so absolute memory references were actually possible that way. An address was literally the position in the file + 256 (== 0x100).
I can recommend looking at FASM which is a very slim low level assembler which directly outputs whatever you want (MZ, PE, COFF, ELF).
Note: What most assemblers do for you is convert relative memory addresses or the addresses of labels. That's where the source differs from the actual output. In the end, though, the position of a label simply denotes an address, so the compiled code just contains a numeric offset at that place. Some decompilers will actually create fake labels for those, but of course they can't reconstruct the label (or variable) names, as they don't exist in the compiled code. Of course we are talking about native x86 / x64 code here and not IL code (intermediate language), which is generated by .NET and can only run with the .NET framework, which does the final JIT compilation on the target system.
So Finkel is right. Most decompilers usually show the actual bytes that make up that opcode right next to the actual instruction in assembly. It's a literal 1:1 mapping. There are "high-level" assemblers which give you support for some high level features like if statements, loops and simple data structures. Though those do not represent the actual hardware assembly language.
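The COM file layout described above is simple enough to build by hand. The sketch below assembles a tiny DOS COM program as raw bytes: it prints a string with DOS function 09h and exits with `int 20h`. Because the file is loaded at offset 0x100, the address of the message is just its file offset plus 0x100, which is exactly the label resolution an assembler would do. The function name `build_com` is made up for this example, and the output is only meaningful under DOS or an emulator such as DOSBox.

```python
import struct

COM_LOAD_OFFSET = 0x100  # DOS loads a .COM file at this offset in its segment
CODE_LENGTH = 9          # total size of the four instructions below

def build_com(message: bytes) -> bytes:
    """Return the bytes of a .COM program that prints `message` and exits."""
    # The message sits right after the code, so its address is
    # file offset + 0x100 (what a label would resolve to).
    message_address = COM_LOAD_OFFSET + CODE_LENGTH
    code = (
        b"\xB4\x09"                                  # mov ah, 09h  (print string)
        + b"\xBA" + struct.pack("<H", message_address)  # mov dx, message_address
        + b"\xCD\x21"                                # int 21h      (call DOS)
        + b"\xCD\x20"                                # int 20h      (terminate)
    )
    assert len(code) == CODE_LENGTH
    return code + message + b"$"  # function 09h expects a '$'-terminated string

program = build_com(b"Hi")
```

Writing `program` to a file named, for instance, `HI.COM` gives a 12-byte executable with no header at all.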
@Siissioe123 1 year ago ⁺¹
In fact, assembly is just machine code represented by words; each word (operation) has a value in 1s and 0s. I could be wrong, but I believe that the first 5 characters are related to the operation to do, then there is the address and then the numbers to do the operation with. Python is different, however: the Python compiler compiles the file to bytecode, then the bytecode gets run by the Python VM.
@GS12478 1 year ago ⁺³⁰
That "Gesundheit" killed me🤣🤣
@puppergump4117 1 year ago ⁺¹
Is that a tf2 reference
@progamer3000-uz7pj 7 months ago
@@puppergump4117 no
@iWhacko 1 year ago ⁺¹³
You can create much smaller executables if you use MASM, for instance. It doesn't add all the "unnecessary" stuff if you don't need it, and you can set the data blocks yourself, optimising your executable.
@dingokidneys 8 months ago ⁺¹
There also used to be a tool called EXE2BIN which would strip a lot of the stuff from an .EXE file and generate a .COM file which was much smaller. I don't know if you can still build and run .COM files in Windows. I haven't used Windows on the regular for years, but back when I was learning to code in 16-bit assembler that was how I made tools that I could add to a 1.44 MB bootable diskette for troubleshooting.
@TheMilli 1 year ago ⁺²⁸
I just want to say that this was an amazing video. You could have just stopped after the theoretical first section, like most other videos do, but you went the extra mile and showed how it works in practice. Honestly, if the rest of your work is just half as good as this one, you've got potential for blowing up!
@carloslecina9029 1 year ago ⁺¹²
And this is why we need the community to release more things like dev tools instead of production apps: to better understand how things work internally, and to improve them.
@bayurukmanajati1224 1 year ago ⁺²
Well, he said it near the end of the video.
"Computers are so fast today, we don't even need to optimize it to nth levels the further we are from the raw materials."
Well, if someone somewhere had the capability and time to tweak everything, then dev tools would be common for sure.
@racsonp 1 year ago ⁺¹
One of the best videos that I've ever seen about "how stuff works".
@UKGeezer 1 year ago ⁺¹⁸
If I remember correctly, in DOS you could also create COM files as well as EXE. I think these were basically executables for small programs like command line utilities.
@jeremiefaucher-goulet3365 1 year ago ⁺¹⁴
Yep. COM files were just pure machine code and data. No header, nothing.
@maxmuster7003 1 year ago ⁺⁴
I like to create tiny COM files with a little help from debug, and I put all the instructions to build a routine into batch files. Most of my batch files have to be started with one or more parameters attached to build the routine.
@mattrogers6646 1 year ago ⁺¹
COM files are 16 bit only. I used to write x86 assembly to test antivirus heuristics. The average computer user really does not know how lucky we are to have so many hardware protections like NX bit, ASLR, and the move to NT kernel with Windows XP and later prevented so many virus opportunities that existed in 9X.
@maxmuster7003 1 year ago
@@mattrogers6646 In DOS we can switch from 16-bit mode into 32-bit mode and 64-bit mode with a COM file, and we can start up all cores of a multicore system. I like to use the undocumented 16-bit "big real mode" (unreal mode) with a segment size of 4 GB for the DS, ES, FS and GS segments and 64 KB for the CS and SS segments, and to open the 21st address line (A20) to write directly into the linear framebuffer in VBE graphics modes, using address size prefixes on 80386+ CPUs.
@maxmuster7003 1 year ago
@@mattrogers6646 If we boot MS DOS from a self made CD ROM a virus can’t infect our system files.
@Matojeje 1 year ago ⁺¹
I really like the accompanying visuals you included at the end!
@Amonimus 6 months ago ⁺²
Surprised there's no mention that most of the time you can open an exe as a zip archive and see it broken down into smaller pieces.
@uaman11 1 year ago ⁺¹
yo i luv the europe analogy it made so much sense 😩👏
@Slurkz 1 year ago ⁺⁵
Stellar analysis! I learned a lot. 💜 Thanks.
@held2053 1 year ago ⁺²
"Gesundheit", that really caught me off guard as a German. But it is the most realistic reply, just "Bless you".
@rickintexas1584 8 months ago
I have been writing code since the late 70s. Back then we had to be super efficient with our logic because computers were so slow and limited. Modern languages allow me to focus on the problem I’m solving, almost ignoring the computer resources. Modern computers and IDEs are simply amazing.
@Smaxx 1 year ago ⁺⁴
Was interesting and entertaining to watch, even though I knew what's "inside" and had my expectations about the video. 🙂
@alexandrosweeb8059 1 year ago ⁺¹⁵
I would still really like to know, What's inside a .EXE file!
@rsa5991 1 year ago ⁺⁵
Inside an EXE are:
- a header, that tells the required CPU type, minimal version of Windows and a position of the first instruction.
- a section table, that tells which parts of the file should be loaded in memory, and where.
- an import table, that tells the names of DLL files that must be loaded, and the names of the functions that your program needs. Windows will create an array of function pointers, that point to those functions.
- your machine code, your constants, and the initial values of the global variables.
And that's basically it
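A few of the header fields from the list above can be read with very little code. The sketch below builds a toy header (not a loadable program) and reads back the CPU type, the PE32/PE32+ flavour and the address of the first instruction. The function names `build_toy_headers` and `describe` and the chosen values are made up for illustration; the field offsets follow the published PE format.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
FORMATS = {0x10B: "PE32", 0x20B: "PE32+"}

def build_toy_headers(machine=0x8664, magic=0x20B, entry_rva=0x1000) -> bytes:
    """Build MZ header + PE signature + COFF header + start of optional header."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, len(dos))  # e_lfanew: where the PE header starts
    # First 24 bytes of the optional header: magic, linker version, code/data
    # sizes, AddressOfEntryPoint, BaseOfCode.
    optional = struct.pack("<HBBIIIII", magic, 14, 0, 0x200, 0, 0, entry_rva, 0x1000)
    coff = struct.pack("<HHIIIHH", machine, 0, 0, 0, 0, len(optional), 0x0002)
    return bytes(dos) + b"PE\0\0" + coff + optional

def describe(data: bytes) -> dict:
    """Read the CPU type, format flavour and entry point from PE headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    optional = pe_offset + 4 + 20  # skip signature and 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, optional)
    (entry_rva,) = struct.unpack_from("<I", data, optional + 16)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "format": FORMATS.get(magic, hex(magic)),
        "entry_point_rva": entry_rva,
    }

info = describe(build_toy_headers())
```

The entry point is a relative virtual address: an offset from wherever Windows maps the image in memory, not a position in the file.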
@MrCacoGames 1 year ago ⁺¹
Roller Coaster Tycoon was coded in assembly, which made it run on every computer back then. I put this as a W for assembly language
@0x150 1 year ago ⁺⁸
7:00 Python is actually compiled before being executed, just not into actual assembly, but into an intermediary bytecode format that the Python interpreter then runs. This is what's inside the .pyc files you sometimes see in the __pycache__ directory: just Python bytecode.
A better example would be JavaScript, which is partially compiled and partially interpreted.
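To see what such a .pyc file holds, the following sketch compiles a throwaway source file with the standard `py_compile` module and inspects the result. It assumes Python 3.7 or newer, where a .pyc starts with a 16-byte header (a version-specific magic number plus flags and source metadata) followed by the marshalled code object. The file names and the helper `compile_to_pyc` are placeholders.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

def compile_to_pyc(source_text: str) -> bytes:
    """Write source to a temp file, byte-compile it, return the .pyc bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        source_path = Path(tmp) / "hello.py"
        source_path.write_text(source_text)
        pyc_path = py_compile.compile(
            str(source_path), cfile=str(Path(tmp) / "hello.pyc"), doraise=True)
        return Path(pyc_path).read_bytes()

pyc = compile_to_pyc("answer = 6 * 7\n")

# The first 4 bytes identify the bytecode version of this interpreter.
magic = pyc[:4]

# Assumption: Python 3.7+ layout, where the code object starts at byte 16.
code = marshal.loads(pyc[16:])
namespace = {}
exec(code, namespace)
```

Because the magic number changes between Python versions, a .pyc written by one version is generally rejected by another, which matches the remark that the source is more portable than the bytecode.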
@puppergump4117 1 year ago
Is that just to make it portable while reducing the size
@0x150 1 year ago ⁺¹
@@puppergump4117 No, python source code is generally more cross version than the bytecode format, and the compiled .pyc file is often several times bigger than the source file. It's just to precompile everything, so you don't have to do JIT compilation
@katieheart6156 1 year ago ⁺³
Python is actually compiled... in a way. While the entry file is only compiled in-memory, the imported libraries are (if the source has a newer date than the saved byte-code (if any), it will recompile it). They are compiled in a similar fashion to Java Executables, but in this instance, as platform-specific, Python byte-code (on the Windows installer, there is an option to precompile the standard library).
@Siissioe123 1 year ago ⁺¹
Yeah, the Python interpreter converts your code to bytecode, then it gets executed by the Python VM.
@ReSuKi. 1 year ago ⁺⁴
wow u deserve way more visibility, this is a really great video thx!
@pabblo1 1 year ago ⁺⁸
7:41 It's surprising how fast computers are nowadays. Back in the 1980s, assembly code would've been the only option for programmers to code fast programs, as the processors inside 80s computers like the Commodore 64, ZX Spectrum or Atari XL/XE (maybe even the Atari ST or Commodore Amiga) were simply too slow to make a program run in an interpreted language (most often BASIC), and compilers were scarce for these computers.
@matrix01234567899 1 year ago ⁺¹⁰
Nowadays CPUs are so complicated that if you are not experienced in assembly, your assembly code will probably be slower than compiled C++.
@puppergump4117 1 year ago ⁺¹
@@matrix01234567899 Not just the CPUs, but the compilers as well. There's a limit to what a CPU by itself can optimize, but if everything is lined up for it perfectly it can eat at the code like no other.
@teeesen 1 year ago ⁺¹
This is not how I remember the 1980s. I wrote in assembly in exactly two circumstances. First for learning assembly programming (CDC 7600 and PDP-11). Second for programming a microcontroller (Intel 8048). Other than that, it was Fortran, Cobol, C, C++, Pascal, LISP, APL, Prolog, Turing, etc., etc. This was on mainframes, minis, workstations, and microcomputers. Understanding assembly was a great help in understanding how machines and compilers work. But actual use of assembly in university or industry was fairly rare.
@jbird4478 1 year ago
@@matrix01234567899 Almost certainly. In the past, optimizing assembly was mostly a matter of various tricks in arithmetic (like bit shifting instead of multiplying) and optimizing memory usage. Those things really do not matter anymore, and optimization is a matter of streamlining instructions so they can be better executed in parallel (even in single threads, processors can do multiple instructions simultaneously), optimizing memory usage not in terms of space but cache access, and various other aspects that can only be done by rigorous calculation and not by reasoning like in the old days.
@williamdrum9899 1 year ago ⁺¹
Assembly was mostly used for 80s computer games since you needed that extra speed
@JimCoder 1 year ago ⁺¹
This was so much simpler in the 1980s. The .EXE was just a series of records, each with a memory address to which the machine code was to be loaded, a length of the record and the machine code to be loaded. I believe there was an entry point address too to indicate where the CPU should start executing. That was it. That's all that was necessary.
For a real challenge, analyze the .OBJ format. It's way WAY more complex.
I am a recovering bit twiddler. 😊
@ThunderBlastvideo 1 year ago
This is a good introduction to computer architecture
@blainegwen4858 1 year ago
I wrote a lot in asm and hex a long time ago. This is a great video to see
@Janokins 1 year ago ⁺⁴
Cool
I'd like to add that while you *can* get faster code by writing it in assembly, you should have some faith in the compiler, they're very smart these days. And I'm pretty sure the people who wrote them are smarter than me too.
@softwarelivre2389 1 year ago ⁺³²
Pretty cool. Would be nice to see what's inside of a .deb package as well. Pretty interesting stuff
@Aura_Mancer 1 year ago ⁺¹²
Afaik that's a tar.xz with a different name? Maybe not exactly that, but it was a compressed tar of some sort.
@thepiratepeter4630 1 year ago ⁺¹⁹
The Linux equivalent of an EXE file is an ELF file.
@tilsgee 1 year ago
@@thepiratepeter4630 wait. So not .run / .sh file?
@adversemiller 1 year ago ⁺¹⁶
@@tilsgee a .sh file is just a shell script, it's not a binary
@Username-xr5bx 1 year ago ⁺⁷
It is just a Linux executable file packed with some meta data like the software repository of that executable
@billwall267 1 year ago ⁺¹
Interesting to note that the PE executable format is almost identical to the Unix COFF format, which is the predecessor to the modern ELF format used in Linux and many other operating systems today. PE is in fact sometimes known as PE/COFF.
@stickguy9109 1 year ago
This video is criminally underrated
@tf_d 1 year ago ⁺⁷
2:38 "Yes, Windows is just DOS at the core still."
This isn't true in any sense. The MZ followed by "This program cannot be run in DOS mode" is there strictly for compatibility and does not affect the function of the program whatsoever.
@ravhi1000 1 year ago
What if you delete that MZ header manually? It still runs, right?
@tf_d 1 year ago
@@ravhi1000 You have to do some additional configuring, but yeah, it can be removed.
@locobob 8 months ago
For everyone correcting the video saying that “windows is no longer based on DOS”: true, but, he didn’t say windows is based on DOS. He said the EXE file format hasn’t changed since the DOS days.
@UFO_researcher Рік тому ⁺⁷
An exe file is actually an archive format similar to .apk or .zip, this is demonstrated by opening a .exe file on a linux filesystem, it will display the contents as if it were an archive, and there you will find the icon, binary, etc.
@fllthdcrb Рік тому ⁺³
Opening it in what? Different tools will do different things with it. If you cat it on the command line, you'll see what looks like garbage (and likely also mess up terminal settings), because it's binary data. There aren't any core *nix tools I'm aware of that analyze .exe files, aside from the superficial analysis done by the "file" command (whose purpose is to identify many different types of files based on their content).
@UFO_researcher Рік тому
@@fllthdcrb I don't know, I just double clicked in ubuntu.
@fllthdcrb Рік тому ⁺⁴
@@UFO_researcher So it's one of the graphical file managers. Well, I figured that much. It's almost certainly based on a file association, which can change if, for instance, you install new applications. If you were to install Wine, for instance, double-clicking might allow you to run such executables instead of just analyzing them. The point is, you can't assume everyone has the same setup. Just talking about "opening" a file, by itself, isn't as helpful as you assume. You could, however, try to see the name of the application that opens.
@龗 Рік тому ⁺¹
im pretty sure that was an abstraction
@ieatthighs Рік тому ⁺²
you dont know what you are talking about
@michaelbyron1166 Рік тому
Ahhhhh.. the Altair 8800 .I remember those days. Simple and direct.
@brnsl420 8 місяців тому
Got this recommended. I'm currently working on a presentation about how a computer works. I think this may be a good visualisation to show the difference between "code types"
@matrix01234567899 Рік тому ⁺⁷
If you create a console application, it works by sending text through streams (stdin, stdout and stderr), and rendering the console window is done by the operating system (typically by conhost.exe).
Also, disassembly doesn't show you what is inside the exe file, but what is in RAM.
If you really want to see what is inside this exe, open it using 7-zip
@ufufuawa401 Рік тому ⁺¹
> Also, disassembly doesn't show you what is inside the exe file, but what is in RAM.
There are two types of analysis, static and dynamic.
A disassembler could produce misleading disassembly because code may change at runtime.
> If you really want to see what is inside this exe, open it using 7-zip
7-zip is not for analyzing executables
@UFO_researcher Рік тому ⁺¹
Yes, I forgot about the legendary 7-zip.
@matrix01234567899 Рік тому
@@ufufuawa401 I meant the disassembly in Visual Studio he used in the video
No, 7-zip isn't, but it is still better than just guessing
@Siissioe123 Рік тому ⁺²
1:07 "C is high level." No, it isn't. If you compare it to assembly or machine code it is, but it is in fact a low-level programming language
@MaxCE 9 місяців тому
in terms of operating systems it's high level
@Siissioe123 9 місяців тому
@@MaxCE Yes, it is. It’s low level compared to python, and it’s high level compared to operating systems. We’re both right
@hugoboyce9648 Рік тому ⁺²
Neat video! Now i'm looking for one that explains exactly the same thing but for Linux machines ^^
@rursus8354 Рік тому
Your knowledge is excellent, but I have to point out that there's a hole in your story: 1. first: an exe is a file, 2. yes, it contains a header and the machine code, 997: but then the multitude of exe formats? The hole is how the operating system starts the program: 3a. first the OS gets a command to start the program, 3b. it loads the program into memory, then it finds the addresses in the header and translates those to physical addresses (more or less relocation, and similar address translations), 3c. it looks up the program's library requirements (DLLs), checks whether those are loaded into memory, loads them if not, then finds the correct physical addresses of those DLLs and writes them into the program at the appropriate locations, 4. it finds the program entry point and starts executing machine code from there. The exe file variants emerge from there.
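Step 3c above (look up each required library, load it if needed, then write the resolved addresses into the program) can be illustrated with a toy model. This is only a sketch: a real loader works on memory-mapped images, while this one uses dictionaries. The `ToyLoader` class, the base addresses and the export offsets are all invented for the example; only the DLL and function names are real.

```python
class ToyLoader:
    """Toy model of import resolution; not how a real OS loader is implemented."""

    def __init__(self, libraries_on_disk):
        # library name -> {function name: offset from the library's base address}
        self.libraries_on_disk = libraries_on_disk
        self.loaded = {}              # library name -> base address once loaded
        self.next_base = 0x10000000   # made-up address for the first library

    def load_library(self, name):
        # Load each library only once; later requests reuse the same base.
        if name not in self.loaded:
            self.loaded[name] = self.next_base
            self.next_base += 0x100000
        return self.loaded[name]

    def resolve(self, import_table):
        """Turn (library, function) pairs into absolute addresses."""
        resolved = {}
        for library, function in import_table:
            base = self.load_library(library)
            offset = self.libraries_on_disk[library][function]
            resolved[(library, function)] = base + offset
        return resolved


loader = ToyLoader({
    "kernel32.dll": {"ExitProcess": 0x1000, "WriteFile": 0x2000},  # offsets are made up
    "user32.dll": {"MessageBoxA": 0x1500},
})
table = loader.resolve([
    ("kernel32.dll", "ExitProcess"),
    ("user32.dll", "MessageBoxA"),
    ("kernel32.dll", "WriteFile"),
])
for (library, function), address in table.items():
    print(f"{library}!{function} -> {address:#x}")
```

The filled-in table plays the role of the locations the loader patches in the program, so that calls to imported functions land in the right library.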
@dread1089 Рік тому
Really interesting, looking forward for more.
@leowribeiro Рік тому ⁺¹
Short answer: "A bunch of 1s and 0s that represent micro processor instructions", simple as that.
@smc415 2 місяці тому ⁺¹
Slightly less short answer: "A bunch of 1s and 0s that represent micro processor instructions, that also aren't allowed to run in DOS mode"
@leowribeiro 2 місяці тому
@@smc415 Touché my friend.
@GameCyborgCh Рік тому
Creates a program that adds 2 hard-coded numbers together, gets an executable that wouldn't fit on an NES cartridge. What a time we live in
@MotownBatman Рік тому
New Sub! Dryden, Michigan
I solely sub'd for your Effortless ADHD Transition at "4:09"
About to turn 40, never treated for my extreme adhd growing up, That is exactly how I learned PCs in the 80/90s.
Learning how to replace the SOL.EXE icon in Windows 3.0 MME somehow turns into finding EVERY MsDos manual to teach myself QBasic within the same 60min lol
@laenprogrammation Рік тому
Just a little thing about the conclusion: most compilers optimize code way beyond our level of knowledge. Handcrafting assembly code that is actually faster requires extensive knowledge of the targeted instruction set(s), so the best way to optimise is actually to ask the compiler to do it and then maybe optimise the generated code. Great video though
@jozsiolah1435 Рік тому
Lots of secrets are in these files, still unexplored. For example, I found a secret in chgcolor, that is a monitor driver for Dos. When all colors are defined by the user’s decision, games may have strange colours. When the reset is chosen then a restart, the b/w laptops can recognize 14 colours instead of the default 6. 2 colours remain missing, the lcd doesn’t recognize it in Dos mode. The 256 colour games will look much better, and more details can be seen using 256 kb video ram. Windows users need the wdl disks, 16 grays driver can recognize 16 different colours in Windows.
@Justinjaro Рік тому ⁺¹
Awesome video!
@maxmuster7003 Рік тому ⁺⁵
DOS executable com files do not need more than machine code, but the file size is limited to 64 kb.
@Bunny99s Рік тому ⁺²
Right, com files did not have any header whatsoever. However they were always 16-bit DOS applications, so they are no longer supported on 64 bit systems. WinXP (32 bit version) did still support the execution of DOS exe and com files.
@matrix01234567899 Рік тому ⁺¹
@@Bunny99s all 32 bit windows supported running 16-bit coms. 32-bit windows 10 supports it, but 64-bit windows xp not
@Bunny99s Рік тому ⁺¹
@@matrix01234567899 Yes, that's true, but running Win 10 on a 32 bit machine would be pure madness ^^. 32 Bit systems can only address 4GB of ram. That's nowadays barely an option anymore. Windows alone would chew that up :D
But yes, you're right. Almost all 32 and 64 bit CPUs (with the exception of AMD Ryzen) when running in 32 bit mode do still support the virtual 8086 mode. Though how well the support is depends on the actual application. Certain exotic hardware stuff may break old code. The best solution is usually to just use DOSBox and emulate a machine.
@matrix01234567899 Рік тому ⁺¹
@@Bunny99s On win10 even less than 4GB is not madness if you don't run a web browser or other modern demanding software; the OS itself (even win11) is ok with this amount of RAM.
To be correct, it was a decision made by Microsoft to stop supporting 16-bit apps on 64-bit systems; the CPU itself doesn't block this option. When running 32 bit apps on a 64 bit OS, or 16-bit apps on a 32 bit OS, the CPU changes modes many, many times a second as the OS does context switching.
@declanmoore Рік тому ⁺¹
PEs don't really do fat binaries (since the header only allows specifying one machine type), the portable just meaning that the format itself is processor agnostic
@GameInterest Рік тому ⁺²
Mac file forks are awesome in the way they work to get around this. Or at least they used to. I haven't messed around since OS 7.5 really. Also, Visual Basic 4.5 was the only Visual Basic to include a compile to .exe built in.
The more you know 🌈
@floorpizza8074 Рік тому ⁺¹
Hey! Nice to see a fellow old Mac fan. I programmed a bit on pre-OSX Mac operating systems starting at System 7.0, and ending with the release of OSX. And yes, I completely agree... the Mac Resource fork and Data Fork paradigm was *amazing* and way ahead of its time. Remember using ResEdit??? :D Gooood times!
Unfortunately, Resource/Data forks as they existed pre-OSX aren't implemented in OSX.
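The earlier remark in this thread that a PE header specifies exactly one machine type can be checked by reading the 2-byte `Machine` field that follows the `PE\0\0` signature. The sketch below assumes the documented values 0x014C (x86), 0x8664 (x64) and 0xAA64 (ARM64). The names `pe_machine` and `fake_pe` are invented here, and the test buffer is synthetic.

```python
import struct

# A few well-known Machine values from the COFF header; the list is not exhaustive.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64", 0xAA64: "ARM64"}


def pe_machine(data: bytes) -> str:
    """Return the single target architecture recorded in a PE header."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        raise ValueError("not a PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # The COFF header starts right after the 4-byte signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown ({machine:#06x})")


def fake_pe(machine: int, pe_offset: int = 0x80) -> bytes:
    """Synthetic header carrying only the fields read above (hypothetical helper)."""
    image = bytearray(pe_offset + 24)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    struct.pack_into("<H", image, pe_offset + 4, machine)
    return bytes(image)


print(pe_machine(fake_pe(0x8664)))
print(pe_machine(fake_pe(0x014C)))
```

Because there is only one such field per file, a single PE cannot carry code for several architectures the way a fat binary does.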
@InssiAjaton 8 місяців тому
I remember days, when we had just .COM files (CP/M era!). Then that became too limiting, being basically tied to just one hardware. And too small. So, .EXE was introduced, to allow choices in linking. Then more and more libraries to be linked, until the different versions of the .EXE were required. I essentially stopped bothering after MS-DOS 6.2. Still have Microsoft Macro Assembler 5.0, though.
@royz_1 Рік тому
Fascinating stuff.. thanks for sharing!
@SilverBullet93GT Рік тому ⁺⁵
all my EXEs live in TXTas
@moccaloto Рік тому
Remember the good old COM files for dos? 64kb of raw machine code with no header
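As an illustration of "raw machine code with no header", the following sketch assembles a tiny DOS .COM image by hand. It assumes the classic conventions: the image is loaded at offset 0x100 of its segment, INT 21h function 09h prints a `$`-terminated string, and function 4Ch exits. The function name `build_com` and the message text are made up for the example, and the image has not been run here.

```python
COM_LOAD_OFFSET = 0x100  # DOS places a .COM image right after the 256-byte PSP


def build_com(message: bytes) -> bytes:
    """Return the bytes of a .COM program that prints `message` and exits."""
    code = bytearray([
        0xB4, 0x09,        # mov ah, 09h    ; DOS "print $-terminated string"
        0xBA, 0x00, 0x00,  # mov dx, <addr> ; address patched in below
        0xCD, 0x21,        # int 21h
        0xB8, 0x00, 0x4C,  # mov ax, 4C00h  ; DOS "exit", return code 0
        0xCD, 0x21,        # int 21h
    ])
    # The string sits directly after the code, so its address is known up front.
    text_address = COM_LOAD_OFFSET + len(code)
    code[3:5] = text_address.to_bytes(2, "little")
    image = bytes(code) + message + b"$"
    # The whole program must fit in one 64 KB segment alongside the PSP.
    if len(image) > 0x10000 - COM_LOAD_OFFSET:
        raise ValueError("too large for a .COM file")
    return image


image = build_com(b"Hello from a .COM file!\r\n")
print(len(image), "bytes:", image.hex())
```

Writing `image` to a file such as `hello.com` and running it in an emulator like DOSBox would be the way to try it; the first byte of the file is already the first instruction.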
@emilyow7956 Рік тому ⁺³
Python is indeed somewhat "compiled" before being interpreted.
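The compile step mentioned above can be observed from within Python itself. This short sketch uses only the built-in `compile` and `eval` plus the standard `dis` module; the expression and variable names are arbitrary. The exact bytecode listing differs between Python versions, so no expected listing is shown.

```python
import dis

source = "a + b"

# Step 1: the source text is compiled to a code object holding bytecode.
code_object = compile(source, "<demo>", "eval")
print(type(code_object).__name__, "with", len(code_object.co_code), "bytes of bytecode")

# Step 2: the interpreter executes that bytecode, not the original text.
dis.dis(code_object)  # human-readable listing of the bytecode instructions
result = eval(code_object, {"a": 2, "b": 3})
print(result)
```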
@Nick12_45 9 місяців тому ⁺²
3:55 the captions say toes
@rubabmubarrat Рік тому
youtube : What's inside a .exe file?
_Me at 3 am: Lets find out!_
@drakopensulo Рік тому
You may be interested in "A smallest PE executable (x64) with every byte executed" . It has only 268 bytes.
@jbird4478 Рік тому
Apart from assembly really not being needed for performance anymore, most programmers will also fail at attempts to write assembly by hand that would outperform the optimizations done by modern compilers. Outside of embedded work, there really is no reason to write anything in assembly anymore. What's added to your EXE in this video is the C++ runtime code. You get a version tailored to console usage, but none of that is needed to run your sample program. If you would use an assembler (like NASM) instead of Visual Code, it will run just fine using only your code translated to machine language (well, and the PE header)
@Grendal62 Рік тому
HAHA! the ending,😂 very interesting i had no idea about decompiling, now i'm down the rabbit hole
@m0ment219 Рік тому ⁺³
Next time someone in Germany sneezes, I won't say "Gesundheit" but "$A9 $38 $8D $00".
@MangoNutella Рік тому
😂
@ChrisM541 Рік тому ⁺³
Heads up - no one...NO ONE, EVER, wrote programs in machine code. Why? simple - each instruction code's mnemonic was known at the CPU design stage, and remember, those mnemonics (and operands) formed the 1:1 machine code:assembly language instruction set. Assembly language = 1:1 human-readable (mnemonic) version of machine code CPU instructions and operands. Understand that assembly language mnemonics were constructed at the same time as the instruction set was constructed - the designers never, ever expected programmers to memorise the numeric equivalents when those much, much easier to remember mnemonics were also available.
So, in those 'old days', every machine code program was actually written on paper in assembly language, and beside each assembly code line, the equivalent machine code instruction/opcode was written. Hand-written labels for branches/jumping/data was also used, obviously. Then, when it came to the time when the program would be entered into the computer, that was when the machine code equivalent was used...entering in all those numbers.
I know this because I used to do it many years ago, and if you think about it, it makes 100% sense ;)
@palmercolson7037 Рік тому
So, what was the first assembler written in? It had to be written in machine code. The programs for the first computers were written in machine code with Assembly language developed later on because machine code was too difficult to write long programs in. A prime example was the Univac computer. Programs were written in machine code in the 1950s and no assembler was created until 1960.
@ChrisM541 Рік тому ⁺¹
@@palmercolson7037 You have failed to comprehend 100% of my post - that's quite an impressive achievement.
Your reply is somewhat confused...
"The programs for the first computers were written in machine code with Assembly language developed later on because machine code was too difficult to write long programs in. "
--> It's the other way round! Assembly languages are orders of magnitude 'easier' to write in than pure machine code, for self-evident reasons.
I'm also not talking about feeding assembly language into those early computers.
You also asked "So, what were the first assemblers written in?" My post perfectly answers this (hint: assembly language)...
Please re-read my post. Google anything you're not sure about.
@jbird4478 Рік тому ⁺¹
@@ChrisM541 The very first assemblers were humans (of the female kind oddly enough). You are correct though. They would use tables on paper to correlate numbers to the mnemonics written on paper; basically the same thing as what assemblers still do.
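The paper-table lookup described in this thread is essentially what a minimal assembler does. The sketch below assumes a 6502-style CPU purely for illustration (the thread does not name one) and covers only three instructions with their well-known opcodes. The function names and the tiny program are invented for the example.

```python
# Lookup table: (mnemonic, addressing mode) -> opcode byte, as on a paper reference card.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("STA", "absolute"): 0x8D,
    ("RTS", "implied"): 0x60,
}


def assemble_line(mnemonic, operand=None):
    """Translate one instruction into its machine code bytes."""
    if operand is None:
        return bytes([OPCODES[(mnemonic, "implied")]])
    if operand.startswith("#$"):
        # Immediate operand: one literal byte follows the opcode.
        return bytes([OPCODES[(mnemonic, "immediate")], int(operand[2:], 16)])
    if operand.startswith("$"):
        # Absolute address: stored low byte first, then high byte.
        address = int(operand[1:], 16)
        return bytes([OPCODES[(mnemonic, "absolute")], address & 0xFF, address >> 8])
    raise ValueError(f"unsupported operand: {operand}")


def assemble(program):
    return b"".join(assemble_line(*line) for line in program)


machine_code = assemble([
    ("LDA", "#$38"),   # load the value $38 into the accumulator
    ("STA", "$0200"),  # store it at address $0200
    ("RTS",),          # return
])
print(" ".join(f"{byte:02X}" for byte in machine_code))
```

Each line maps to its bytes independently, which is the 1:1 relationship between mnemonics and machine code that the comments above describe.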
@amghd5 4 місяці тому ⁺¹
a thing that windows s mode can't run but windows e mode , windows and wine can run
@SonicEXEProductions Рік тому
Literally every .exe creepypasta: main character evil lol
Me after watching this: wait, it’s all just data?
Other .exe files: always had been
@sofiaknyazeva Рік тому
Do note that "writing in ASM makes it faster" is just a myth. Most hand-written ASM isn't as efficient as the programmer might want. It's not portable either. And after all, you still need to rely on an assembler and a linker to turn the ASM code into a binary, which might add some overhead. Most modern linkers are extremely powerful nowadays; however, in general ASM isn't practical because it sacrifices portability and is just a burden on the programmer.
Meanwhile, C and C++ compilers have become so good that their output does beat hand-written ASM and those old NASM toolchains. People might also argue about the binary size, but this isn't the '90s, and having 2 to 4 TB of disk space is normal nowadays, while the binary will be just under 500 KB (without stripping debug info).
If you, for some reason, want to write ASM, the better approach is to embed ASM inside a C or C++ program (inline ASM). However, do so only if you know what you're doing, as it might not be the best possible way to achieve performance.
@ivirius.parody Рік тому ⁺¹
Actually, it all depends on the platform
@tomrow32 Рік тому ⁺¹
All Windows versions from XP onwards are not based on DOS. However, since Windows 95, 98 and Me are run on DOS, This "This program cannot be run in DOS mode." error handler needs to be there to stop DOS from trying to run invalid code and crashing. This error handler even exists in UEFI boot files, since they are based on the PE format.
Another strange similarity UEFI has to Windows is part of the EFI shell many PCs have built-in, to troubleshoot and provide basic functionality when the PC has no operating system installed.
If you type a command incorrectly in the EFI shell, often the error message that appears is a near verbatim copy of the error you get in Windows' legacy CMD command prompt, "???? is not recognized as an internal or external command, operable program or script file."
@HtcMega Рік тому ⁺¹
How much i read this "This program cannot be run in the DOS Mode" 😂
@test-rj2vl Рік тому ⁺¹
With right compiler settings I managed to get exe size to 2kb and on linux 0.5 kb binary. so yes, you have a lot of noise there. I don't think anyone writes exes manually however modifying exe with hex editor is not uncommon.
@AjinkyaMahajan Рік тому ⁺³
Interesting video.
Thanks
@ccmps Рік тому
To clear up Windows vs. DOS versions, here is a summary (from wiki)
Windows: Windows 1.0, 2.0, 3.x, 4.x (95, 98, Me) - boots DOS before Windows
Windows_NT: 3.x, 4.x, 2000, XP, Vista, 7, 8.x, 10, 11 - boots straight into Windows. It does not contain any DOS code, save perhaps in the NTVDM component. The notion that Windows_NT has any DOS code at its core is simply not true.
@zzco Рік тому ⁺¹
"Yes, Windows is still just DOS at the core." -- no. That's just backwards COMPATIBILITY with MS-DOS.
That doesn't mean it still IS MS-DOS at its core.
@fuh_koff Рік тому
bro imagine having to put every byte of data into your code. thank god I wasn't alive trying to do this back then smh.
@eldorado3523 Рік тому ⁺¹
You're forgetting that while running stuff on an operating system you are never really programming the CPU directly, so you'll never be able to create your program without all the OS related fluff for it to work in that environment.
@bengt-goranpersson5125 Рік тому
4:00 "...I don't know 64-bit assembly so I don't think that I'll be [hard cut] So first I had to get familiar with x64 assembly ..."
Gave me a good laugh. :)
@brunoruchiga22 Рік тому ⁺¹
great video!
@louisrobitaille5810 Рік тому
4:11 I'm glad I learned Assembly in college and didn't have to search stuff online on my own the first time I learned it 😅. It's been ~5 years since I've last touched an assembly program (in Linux too, not even Windows 🥲) and I can't even find anything remotely close to the pdfs and ppts that were shared in the class. I regret not saving all the documents they gave us somewhere on my computer 😫. Good job figuring out how to make it work, but I noticed that your code looks very different from what I learned 🤔. Probably because ASM Linux and Windows are that different 🤷‍♂️.
@LoFiAxolotl Рік тому
last time i concerned myself with .exe files was when security in games was so bad that you could just add a jump command and crack the game...
@arkiromi_is_awsome216 Рік тому ⁺¹
inside of a .exe file is a demonic hedgehog demon
@Ilovecheems891 11 місяців тому
Bruh
@arkiromi_is_awsome216 10 місяців тому
its a joke
@smc415 2 місяці тому ⁺¹
"demonic hedgehog demon" implies that there are also non-demonic hedgehog demons, meaning there is such a thing as a non-demonic demon
@MLGJuggernautgaming 9 місяців тому
Outside of OS differences, the main difference between any binaries would be the intended architecture. I’m sure it’s more complicated but that’s the basic difference, so fat binaries aren’t very common.
@vladislavkaras491 6 місяців тому
Thanks for the video!
@Noritoshi-r8m 7 місяців тому
Great video. Talking about super low level code, have you seen how the game Roller Coaster Tycoon was made in assembly? Could do a great video.
@yumyum7196 8 місяців тому
The exe (PE or PE+) may not have any machine (unmanaged) code in it actually. It could have zero machine code and instead have intermediary language (IL) code that targets the common language runtime (CLR) converting it into managed code that gets compiled just in time.
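Whether a PE file carries managed (IL) code can be estimated by looking at the data directory slot reserved for the CLR runtime header. The sketch below assumes the documented layout: a 20-byte COFF header after the signature, optional header magic 0x10B (PE32) or 0x20B (PE32+), data directories starting 96 or 112 bytes into the optional header, and the CLR entry at index 14. The names `has_clr_header` and `fake_pe` are invented, and the buffers are synthetic rather than real assemblies.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # data directory slot used for the CLR runtime header


def has_clr_header(data: bytes) -> bool:
    """Report whether the PE headers point to a CLR header (managed code)."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        raise ValueError("not a PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    optional = pe_offset + 4 + 20  # skip the signature and the COFF header
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:      # PE32
        dirs_offset = 96
    elif magic == 0x20B:    # PE32+
        dirs_offset = 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes sits directly before the data directory array.
    (count,) = struct.unpack_from("<I", data, optional + dirs_offset - 4)
    if count <= CLR_DIRECTORY_INDEX:
        return False
    entry = optional + dirs_offset + 8 * CLR_DIRECTORY_INDEX
    rva, size = struct.unpack_from("<II", data, entry)
    return rva != 0 and size != 0


def fake_pe(magic: int, clr_rva: int, clr_size: int) -> bytes:
    """Synthetic headers with only the fields read above (hypothetical helper)."""
    pe_offset = 0x80
    optional = pe_offset + 24
    dirs_offset = 96 if magic == 0x10B else 112
    image = bytearray(optional + dirs_offset + 16 * 8)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    struct.pack_into("<H", image, optional, magic)
    struct.pack_into("<I", image, optional + dirs_offset - 4, 16)
    struct.pack_into("<II", image, optional + dirs_offset + 8 * 14, clr_rva, clr_size)
    return bytes(image)


print(has_clr_header(fake_pe(0x20B, 0x2008, 0x48)))  # managed-looking header
print(has_clr_header(fake_pe(0x10B, 0, 0)))          # native-looking header
```

A non-empty CLR directory only indicates that the file targets the runtime; a mixed-mode file may still contain native machine code as well.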
@redstonewizard08 8 місяців тому
Iirc C and C++ dont actually use ASM as an intermediary step. Most compilers like GCC and Clang will translate it into their respective IR which is then processed by the backend (libgcc or llvm) and then transformed directly into an object. I think MSVC does this too. Don't quote me on that, but iirc thats how it works.
@desupernoodle Рік тому
Its funny to think that Windows 10 is just built up on DOS. It's like a turtle trying to carry a skyscraper.
@Cyberfishofant Рік тому
it's not actually. windows 10/11 are both based on NT
@realGamebreaker 4 місяці тому
That „Gesundheit“ got me laughing
@Fine_Mouche Рік тому ⁺¹
5:10 : is the size of the .exe also related to the cluster size of the file system or not?
@ltecheroffical Рік тому
also exes contain some metadata, like an icon assigned to the program, a description of the program, and more, all stored in the exe
@awesomereview2358 8 місяців тому
You’re mostly right, but you failed to mention how most exes need to talk to dynamic link library files, or DLLs, for specific functions or functionality, and these files cannot be run on their own in Win32 mode, so in reality exe and dynamic link library files often work in conjunction
@BeginningTry3200 Рік тому
i subscribed just for the end