I started designing a 64-bit architecture about 6 months ago. This is the culmination of nearly 40 years of experience programming many different processors at the assembly and C language levels, as well as 20 years of electronic design. Last year I bought a large FPGA developer's kit and I plan to implement it using Verilog. This design is very innovative in many ways and I am hoping that I haven't bitten off more than I can chew. It is a CISC processor that is more orthogonal than AMD64 and even ARM, because it will not have the severe restrictions on the sizes of immediate values and offsets that ARM has. Using state machines instead of microcode, it should get better performance than many other CISC processors from the past, and should get 2x to 4x better code density than AArch64 (i.e. ARM64) while having twice the number of GPRs. It might even be the only 64-bit CISC processor ever designed.
Outstanding video! I designed an instruction set that uses 20-bit instructions: 4 bits for the opcode, and 16 for the address or register arguments. Yes, that means only 16 instructions, but I do have a 1-position "stack", and do my register and other transfers by pushing and popping it. Mine are: NOP, ADD, AND, NOT, OR, SHR, SUB, XOR, LDA#, PSHr, POPr, RDA, STA, JC, JN & JZ. I don't have a CMP (compare), but that's nothing but a subtraction anyway, except it doesn't keep the answer, just the flags. I have a shift-right (SHR), but no shift left. For that, all you have to do is have the number you want to shift in A, then PSH A, POP B, and ADD (ADD adds A & B, with the result in A). That doubles A, which effectively shifts the bits one place to the left. All good wishes!
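The doubling trick above generalises to any shift distance; here is a minimal Python sketch of the idea (the function name and the 16-bit register width are illustrative assumptions, not part of the commenter's design):

```python
# Shift left by n bits using only addition, as described above:
# each PSH A / POP B / ADD round computes A = A + A, i.e. one left shift.
def shl_via_add(a: int, n: int, width: int = 16) -> int:
    mask = (1 << width) - 1          # keep the result inside the register width
    for _ in range(n):
        a = (a + a) & mask           # doubling == shifting one bit left
    return a

assert shl_via_add(0b0011, 2) == 0b1100
```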
4 bits for the opcode doesn't imply only 16 operations, because in some cases you can use other bits of the instruction too. Read the Hennessy and Patterson book about the MIPS processor: professional MIPS chips use 6 opcode bits yet have more than 70 operations. This ISA is really a waste of bits, but I think that is because the next videos will explain how to make a compiler and an emulator, and this approach makes things easier from the programmer's point of view.
Is any source code available for your instruction set?
@clintonowino2619 No. I'm talking about the MIPS ISA. The MIPS R2000 uses fixed-length instructions. I agree with you about simplicity, but it is not efficient. Read the last sentence of my post. Thanks.
Great idea Gary. In Information Science you usually only learn how to DEVELOP your own programming language, not a CPU instruction set...
That's one of the main differences between CS and CE. In CS you mainly learn how to program in a variety of languages. In CE you learn 1-2 languages to get the basics and then dive into how the hardware works to get a better understanding of it. As I'm a CE student and have many friends in CS, it's very interesting to see the differences.
@@andrew7720 Depends on the CS program. I'm CS and in machine-level programming right now, and we take an intro to digital logic design and a follow-up computer architecture course. But yes, they are different degrees obviously 😄
U H89 I couldn't disagree with you more. In fact, one of the most brilliant minds of computer science, Donald Knuth, invented hardware-neutral assembly languages, MIX and MMIX, to teach algorithms: en.wikipedia.org/wiki/MMIX
@@esra_erimez Basically you can invent anything logical... but the problem is: can you get it into efficient hardware? Is it cheap enough, with as little complexity as possible? Is it energy efficient?
Basically your hardware design is tied to physics (what is possible on the die), right?
@@U_H89 I see your point, but this is educational and I think most people will gain a lot of value from Gary's series. Exposure to op codes and mnemonics might spark the next Donald Knuth. If Gary tries to mass produce silicon, then I agree, he must be stopped :-)
What a cool idea! I would never have come up with it: using a scripting language to build a compiler, let alone a CPU with your own ISA running byte code. This makes it really accessible. I'm looking forward to the other two videos!
This is pure gold to come up in my search, since I'm literally designing a fundamentally different processor and processing philosophy right now for non-arithmetic, flexile bit-count based (stronger) AI.
You'd want a load based on a register value as well.
Example: Load from memory location R6 into R3.
Very useful.
I have a load, you need more face time
What a great little project. I wrote assembler in high school all those years ago; it was hard but fun. I'll have a go at this.
Also used a hex editor to hack into programs when I was a teen, it's all coming back to me.
Awesome video! You helped me understand CPU architecture for my exam. It's very interesting.
Watching this makes me realise more and more how much I took for granted every phone and every computer I've used.
Don't think I've ever been so excited for a video series, would love to see you do a course that goes in depth with the core of computer science.
Like this is a fantastic step but would love to see this expanded to further include the basics of a simple CLI operating system written for our own architecture, would happily pay for that sort of knowledge.
First time watching a video of yours. Gary explains very well :)
Thanks for watching!
Creating a CPU instruction set was already being done back in the 1970s minicomputer (midrange system) era. The Pick operating system from that era is a machine-independent operating system and database management system with its own virtual CPU instruction set and virtual Pick assembly language. The Pick operating system was ported to computers from ones as large as an IBM mainframe down to ones as small as a microcomputer (PC).
It took a while but it's finally here. Please do more videos of this kind besides the next two in this series.
I'm so hyped for your next episodes, loved this one already.
gary made 2nd ep??
Keep going but pace yourself. You are doing a good job, be aware of burnout. Really love the videos. Thanks Gary.
Will there be Verilog and an FPGA too? 😀
I want to program an FPGA to act like an FPGA, and I wonder how many times I can nest this until the hardware just can't keep up and becomes unbearably slow.
I felt like I'd run into a fellow countryman abroad :)
Nice video.
Every programmer needs to know assembly and what actually goes on inside; so many programmers today write things like 246 * 2 where older programmers know that 246 << 1 does the same job.
I think that is the least of the problems; how about those 'apps' which are 100+ MB each to download on your smartphone. LOL
And for things like that, current optimizing compilers handle it automatically. You may write A * 2, and the compiler under the covers implements A << 1.
John Cochran I know that modern compilers do a great job these days; I was using it as a simple example. But you can get into situations where one could structure their code around how the CPU works to produce a better-optimised result. Also, one may be using a Z80 C compiler that is not very good, so there is still an advantage in understanding assembly, as it makes a programmer think differently from non-assembly-experienced programmers.
@@insoft_uk Definitely it's quite useful to know assembly. Preferably more than one, with radically different architectures. I myself have used Z80, 6502, 680x0, and S/370. Knowledge of those languages and architectures has, in my opinion, made me a better programmer and quite aware of portability issues. But my own experience has also taught me that going straight to assembly is generally a waste of time.

One example that comes to mind is a program I wrote many years ago where I figured that one frequently-used section was going to be the bottleneck, so I wrote that section in assembly and spent quite a bit of time optimizing it to remove every possible clock cycle (the rest of the program was in C). When I ran it, the program took a lot more time than expected. So I optimized that section of code even more, and the program was still quite slow. I did a bit of searching, found a profiler for my system, and ran it against my program. In doing so, I actually found the true hotspot in my program: it turned out to be the malloc() function in the C library. I disassembled that section of object code from the library, analyzed it, then replaced it with my own code that used a far more efficient algorithm. That one library function change made my program run in only 20% of its original time.

That experience definitely taught me a lesson. Namely, attempting to predict a priori the hotspot of a program is an exercise in futility, and it's far better to simply write the program in your language of choice. If the resulting execution time is acceptable, then you're done. If not, THEN run a profiler against the program to determine the true hotspot. Once that is known, check if a different algorithm can be used (changing the Big O performance of a routine has a much larger effect on program performance than any micro optimization ever will). Only after you've done that should you revert to assembly.
In a nutshell, the real area for improvement is selecting a better algorithm instead of merely micro-optimizing the code you have. Current compilers are extremely good at micro optimizations, and current compilers are incapable of determining better algorithms for your code. So focus your time where it is better utilized.
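The "measure first" workflow described above can be tried in a few lines; here is a minimal sketch using Python's built-in cProfile (the commenter's program was in C, so this only illustrates the method, and the function names are made up):

```python
import cProfile
import pstats

def suspected_hotspot(data):
    return sorted(data)                    # what we *guess* is slow

def hidden_hotspot(n):
    return [str(i) for i in range(n)]      # what profiling might actually reveal

def main():
    suspected_hotspot(hidden_hotspot(200_000))

# Measure first, then read the top entries instead of guessing.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
```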
John Cochran I agree. I always avoid assembly myself but find it makes me think about what's involved for the CPU, which I think most modern new programmers could do with. Nice to see Bender's brain, the 6502, in the list 👍. One CPU I would have liked to program back in the day, but I was a Speccy user, then 68K, then x86; yes, I went to hell, cast out of 68K heaven.
Sweet I love your videos!! Thanks mate.
Looks great!!!
I did the same thing, but mine was more of a 32-bit register-based CPU.
I implemented bit shift, multiplication, and division, including a jump instruction.
Language: C++ [std=11]
Love it, just commenting because it helps give visibility!
Yes! Back to Gary Lectures FeelsGoodMan
Hi professor! Would like to see some machine learning and computer vision videos by you. Keep explaining stuff and have a good day.
Thank you so much, I was waiting for someone to make it in 2019
This is going to be an awesome series. You know how to explain simple things simply!
I'm watching because Gary is awesome. I don't know about programming, I'm a complete beginner, but hey Gary, you are putting in a lot of effort, so I love you 🥰❤️😍😘
Very nice, mate! Thanks a lot!
CS engineer here, never learned this stuff ...
Exactly what I was looking for, for a long time now. Thanks!
I love videos that explore computer architecture.
Thanks.
This was amazingly well explained. Thank you very much!
Great video! I will definitely watch parts 2 and 3.
Can't wait for the next videos in this series; thank you for the elucidation, Gary!
Also, you missed a great opportunity at 08:50 to say Eff Eff Eff Eff Eff Eff Eff Eff Garry Linneker :D
Keep up the good work. You have a great talent for explaining, and I'm sure your channel is going to grow a lot in short order. :-)
One comment though: You've made the unstated assumption to make a big-endian CPU (which probably makes a lot of sense for an ISA with the purpose of being easy to decode by hand). You might have explicitly stated that in the video. (Or it might have just added confusion.) I'm sure you're going to mention that at some later stage where you're actually implementing it.
I am glad you like the channel. 😊 It is true, I did pick the endian-ness of the ISA without comment. I will make sure to talk about that at some point.
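For anyone curious what that choice means in practice, here is a small Python sketch of how the same 16-bit value lands in memory under each byte order (a generic illustration, not code from the video):

```python
import struct

value = 0x1234
big    = struct.pack(">H", value)   # b'\x12\x34': most significant byte first
little = struct.pack("<H", value)   # b'\x34\x12': least significant byte first
print(big.hex(), little.hex())      # 1234 3412
```

Big-endian makes hand-decoding easier because the bytes appear in the same order you would write the number.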
Amazing job Gary!
Good video! Expecting more videos like this. Thank you.
*GARY!!!*
*Good Morning Professor!*
*Good Morning Fellow Classmates!*
MARK!!!
MARK!!!
Servus
Morning Mark
@@nodaryuya6726 What is MARK?
Could someone explain the "ADDR H" and "ADDR L" columns and what those hex numbers refer to?
H is for high and L is for low, because it is 16-bit addressing but shown using 8-bit bytes.
This reminds me of when I designed an ALU in Minecraft way long ago. The actual file has been lost to time and all I have is a poor quality screenshot. Watching this kind of makes me want to revive that based on what I can remember of my original plans. From what I do remember, it was significantly more complex than any person's first ISA should be.
You should definitely revive it! It'd be so cool!
Professor Gary is back ☺️☺️
Super cool idea!
I have stuff to do, but this seems like a reaaaally cool side project!!!
I’ll do my best to resist, but by video 2, you may won me over! 😃😃😃
Thank you. Is the 0x prefix counted as a byte of data by the computer?
No, 0x is just for us humans to know that a number is hex not decimal. 10 is different to 0x10, but both fit in one byte.
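A quick Python demonstration of the point (illustration only):

```python
# The 0x prefix exists only in source text; it never reaches the CPU.
print(10, 0x10)                      # 10 16 -> different numbers, both fit in a byte
print(int("10", 10), int("10", 16))  # the same digits parsed in two bases
```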
Wouldn't you need a linker as well as an assembler to be able to link your programs into one binary image?
Cool. What's the difference between microcoding and assembly? And why does one have to microcode if assembly is faster than high-level languages?
What do you mean by the word "microcoding"?
Hi. I came across microcoding in this interview (1 & 2): ua-cam.com/video/vwCdKU9uYnE/v-deo.html
As per your other comment. The Xerox Alto was a unique computer. That isn't how CPUs work today. While microcode still exists in some CPUs (mainly Intel), it isn't something that is exposed directly to programmers. You might find my video on RISC vs CISC enlightening about microcode: ua-cam.com/video/g16wZWKcao4/v-deo.html
Big shoutout to Gary for making this video possible!👊🏽
I can't find the third video. Really enjoyed the first two.
I am really glad you enjoyed it. Unfortunately I never made part 3, but it is on my TODO list!
@@GaryExplains I'm glad it's not fallen off the list. I've made a Ben Eater SAP1 computer and wanted to make an assembler for it. Your video was the top result on Google, so that's exciting!
This was a long time ago, but what does storing to memory using the high part or low part mean?
Since it is an 8-bit architecture, you can't directly load or store a 16-bit word. To load or store a 16-bit word, you split it into two 8-bit bytes called low and high.
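The split and the recombination are each one mask and one shift; a generic sketch in Python:

```python
word = 0xABCD
high = (word >> 8) & 0xFF         # 0xAB: the top eight bits
low  = word & 0xFF                # 0xCD: the bottom eight bits
assert (high << 8) | low == word  # recombining restores the 16-bit word
```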
There is one aspect of computer architecture that has been given short shrift: operand reference modes. In particular, with the IBM 360 and its successor generations, base plus displacement, with its puny 4096-value range and no indirect addressing options. Addressability for operands is established through one of sixteen 32-bit general purpose registers. The range of a single operand reference is the value in the register indexed by the displacement, so the total range addressable at one time is limited to the number of available registers, each with its maximum displacement range. One has heard of disk drive thrashing in virtual memory systems where real memory is small compared to the configured virtual memory space. I assert that the IBM base-plus-displacement design has the potential to lead to address reference thrashing for very large unsegmented procedures, with all instruction memory consumed by storage of base address values and load-from-memory-to-register instructions! One workaround would be an interpreted meta machine language that does nothing more than artificially create a pseudo operand architecture providing a bank of base address references and an indirect addressing mode with both pre- and post-indexing options. The performance hit would be massive, of course. For quantum computer architectures with near-infinite storage capacity, it is critical that attention be paid to operand access design! I do not currently have the ability and training to work within the realm of the higher hieroglyphics of abstract mathematics to construct a proof of this assertion!
I'm currently developing a scripting language that generates generic ASTs. I think I'm going to target your CPU just for fun :)
Cheers
🤓
Why does this only have 1 comment
I love how he explains... I really wish he was my teacher
"He" is very happy you like how "he" explains! 😜 Thx Gary
So you showed what some instructions for the CPU could be. But how would you tell the CPU that when it sees this number, it is an instruction, and what that instruction does? I'm looking to make an 8-bit CPU out of discrete components. How would I set up opcodes for a CPU like that?
You don't "define"opcodes in this case. The inner structure of the cpu, how it is build up logically, determines the functions and therefor the opcodes.
Hi there! I believe the instruction to run is decoded by certain control logic (a decoder). So to build a computer you need to build that control logic (I think you have to encode your opcode and the other parts of the instruction). I watched a video about this: Core Dumped, CRAFTING A CPU TO RUN PROGRAMS. Hope it helps.
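In software, the decoder is just a dispatch on the opcode byte; here is a toy Python fetch/decode step (the opcodes and encoding are made up for illustration, not taken from the video):

```python
def step(mem, regs, pc):
    op = mem[pc]
    if op == 0x01:                           # hypothetical: LOAD reg, [hi:lo]
        reg, hi, lo = mem[pc + 1], mem[pc + 2], mem[pc + 3]
        regs[reg] = mem[(hi << 8) | lo]      # the "control logic" for LOAD
        return pc + 4
    if op == 0x00:                           # hypothetical: NOP
        return pc + 1
    raise ValueError(f"unknown opcode {op:#04x}")
```

In hardware, that if/else chain is what the decoder's gates implement.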
When will you try speed test g on an iPhone?
Did you read the FAQ garyexplains.com/faq
@@GaryExplains No, I hadn't seen that thanks! I really love Speed Test G by the way. Keep up the great work!
Sir, can I change the instruction set of a processor?
Did you make an assembly language in this video? Please reply (am I right or not?).
Eh? You already watched the video where I wrote the assembler for this instruction set.
@@GaryExplains So in that assembler video you made an assembly language using this instruction set (am I right)?
Why don't you just watch the video rather than asking questions about it?
@@GaryExplains Sorry sir if you are getting fed up with my questions, but when I give my time to something I want a full understanding. What I didn't get is: in what language do we write our instruction set, and where do we write it, in Notepad? How do I write my own assembly language, and how do I make an assembler for it, and where do I write that assembler, in Notepad or some other editor? These questions are confusing me. Again, sorry if you are fed up.
So the opcode provides context?
Hi sir,
Where can I learn to design my own processor?
The best videos on the subject are made by Ben Eater ua-cam.com/users/BenEater
Super interesting, thanks!!!! Sorry for the rest, I'm kind of fed up with the phone speed measurements...
Always trying to strike a balance, bro!
@@GaryExplains I realised that, and I know people like them, lol. It was supposed to be a compliment for a good video... sorry for the rant, sir! Your content is awesome and I'm a big fan and a programmer; day to day I write lots of backend code, and I recently started looking into assembly, IDA, and reverse engineering. So this video was a great refresher for me, many thanks... for ALL you do, even if I personally don't care much about speed tests...
I took it as a compliment! :-) Thanks for the positive words. 😁
Start your online course, you are so good at low-level programming. Please start it. I'm ready to buy your course. Also, please suggest some books on the topic of each video you upload. 🙏
I have an online Android development course here dgitacademy.com
Did you intentionally make instructions orthogonal by using the first nibble for action and the second for addressing mode?
How does your CPU understand hex?
It doesn't. A CPU only understands binary. An assembler or compiler can understand whatever numbers in whatever notation in whatever base that the assembler writer wants.
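For example, a tiny assembler-side conversion in Python (a generic sketch; Python's int() with base 0 auto-detects 0x/0b/0o prefixes):

```python
for token in ["42", "0x2A", "0b101010"]:
    print(token, "->", int(token, 0))   # all three parse to 42
# Whatever the notation in the source, only the encoded binary reaches the CPU.
```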
I don't understand anything, but I want to learn 😐
Same here, welcome to class.
@Reflector I knew some pedantic fool would complain, and what do we have here? An immature imbecile! BOOM! 😮
What school did you go to Tommy? I wonder if you even had a girlfriend...
"I not understand nothing". What did you mean by that? You understood something from this video?
Rajesh Thomas yeah, I understand that when she says "I can't feel it honey" it means I need to kick in the "High Performance Cores" 😓
Hey Gary, the Redmi Note 7 Pro has just launched in India with the new SD 675. I want to see how it runs on Speed Test G.
You should call it the Gaia/GAIA instruction set - Gary's Amazing Instruction set Architecture :)
LOL 😂
GAISC
Gary's amazing instruction set computer
Art school dropout here! I just found your channel and I love it. I'm working on a project but I'm kind of stuck... stuck in the sense of wanting to make the project far more complicated than needed haha.
Ok. I'm taking a MIDI signal, breaking each note out into its own channel, then sending those channels to a DAC. From there I can control the voltages of my eurorack. I'm using an Arduino to do the logic (MIDI to notes to channels) which is all very straightforward.
My question is, is there a way to setup a custom CPU with my own CPU instructions that handles nothing but the logic without needing the C/C++ libraries and all the Arduino overhead? I mean, I know there's a way... but what kind of CPU would I use? ARM? Atmega? Something smaller designed for fast simple logic? CPU instructions are pretty new to me. Had I picked engineering like a sane person (instead of the Arts), I suspect I would already know the answer.
Also: if we take the most commonly used instructions and map them to commands that are mostly zeros, could that in-theory keep the chip cooler?
Great video, but I do have a question. You say that the arch is 16-bit, with 65536 addressable memory locations. But what is the word length of the memory: 16-bit or 8-bit? Is your memory 8*65536 = 524288 bits long or 16*65536 = 1048576 bits long? In short, what does op 0x02rrabcd actually do when the register is 16 bits? Does it load from address $0xabcd into a register, which becomes 0x00yy; or does it load from addresses $0xabcd and $0xabcd+1 into a register, which becomes 0xyyzz; or does it load 16 bits from address $0xabcd into a register, which becomes 0xyyyy? Thanks for explaining.
Thank you.
Hey Gary, is there a way to make your own x86 CPU run with ARM instructions? Let's say an FX-8370 or i5-6600K, just to see how badly the performance would degrade and which programs would be a no-go? :D
I couldn't find your assembly language video!
Sorry, I forgot to add the playlist to the description. It is here: ua-cam.com/play/PLxLxbi4e2mYGvzNw2RzIsM_rxnNC8m2Kz.html
@@GaryExplains thanks... Keep them videos coming Gary , kudos!
very nice
I'd suggest not using 0x00 as a LOAD opcode. Reserve 0x00 as NOP or similar because buffer overflows, null terminated strings and the fun to be had with that.
That would only be the case if code and data aren't separated.
A Harvard architecture separates instruction memory from data memory and thus prevents such things from happening.
Even strictly von Neumann architectures such as Intel x86 or AMD64 feature NX bits marking data as non-executable.
Now, this still won't prevent buffer overflows, but that's mainly because out-of-bounds memory access is a separate problem.
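A toy illustration of the original suggestion: if the program counter runs off the end of the code into zero-filled memory, a 0x00 = NOP encoding idles harmlessly instead of decoding stray instructions (a Python sketch with made-up code bytes):

```python
mem = bytes([0x13, 0x37]) + bytes(14)     # two real code bytes, then zeroed RAM
pc = 2                                    # PC has run past the real program
while pc < len(mem) and mem[pc] == 0x00:  # every 0x00 decodes as NOP
    pc += 1
print("idled harmlessly to", pc)          # nothing was loaded or stored
```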
Hey Gary! Big fan! Please, please make a video on whether task killer apps work. Do we need apps like RAM boosters, or can the processor handle all this by itself?
When will the next episode come?
The assembler one has already been published.
Saying a processor has a 16-bit architecture always refers to its data width, not its address space.
Well done, boss!
When is the third video coming?
Too late! I already took my microprocessor exam last year!
waiting for next video
Here's something truly sad. I only just saw this video today. I already have this:
Main opcode halfword:
First format:
00[Op (4 bits)][Size 2 bits][Operand type 2 bits][Operand 6 bits]
Operations:
LD 0000
ST 0001
ADD 0010
ADC 0011
SUB 0100
SBB 0101
CMP 0110
AND 0111
OR 1000
XOR 1001
TEST 1010
SHL 1011
SHR 1100
ROL 1101
ROR 1110
XCHG 1111
Size:
Half-word 00
Word 01
Double-word 10
Quad-word 11
Operand type:
R0, Rn 00
Rn, 01
Rn, [abs mem] 10
Rn, imm 11
Second format:
010000000000[Op (1 bit)][Size 2 bits][Operand type 1 bit]
Operations:
MUL 0
DIV 1
Size:
Half-word 00
Word 01
Double-word 10
Quad-word 11
Operand type:
R0, 0
R0, [abs mem] 1
Third format:
0100000001[Op (3 bits)][Size 2 bits][Operand type 1 bit]
Operations:
FADD 000
FSUB 001
FMUL 010
FDIV 011
FCMP 100
Size:
Word 01
Double-word 10
Quad-word 11
Operand type:
R0, 0
R0, [abs mem] 1
Fourth format:
010000001[Op (2 bits)][Size1 2 bits][Size2 2 bits][Operand type 1 bit]
Operations:
FCVRT 00
FCVRTR 01
FICVRT 10
FICVRTR 11
Size1:
Word 01
Double-word 10
Quad-word 11
Size2:
Half-word 00
Word 01
Double-word 10
Quad-word 11
Operand type:
R0, 0
R0, [abs mem] 1
Fifth format:
0100000100[Op (4 bits)][Operand type 2 bits]
Operations:
JMP 0000
CALL 0001
JZ 0010
JNZ 0011
JC 0100
JNC 0101
JBE 0110
JNBE 0111
JL 1000
JLE 1001
JO 1010
JNO 1011
JPE 1100
JPO 1101
Operand type:
relative halfword 00
full address 01
10
[abs mem] 11
Sixth format:
0100000000010[Op (3 bits)]
Operations:
PUSHF 000
POPF 001
STC 010
CLC 011
BRK 100
INT imm 101
RET 110
Extended operand halfword:
First format:
0000000[Target type (3 bits)][n (6 bits)]
Target type:
Rn 000
[Dn] 001
[(-)Dn] 010
[Dn(+)] 011
[Dn + rel16] 100
[Dn + abs64] 101
Second format:
01[Target type (4 bits)][n (5 bits)][m (5 bits)]
Target type:
[Dn + Dm] 0000
[Dn + (Dm * 2)] 0001
[Dn + (Dm * 4)] 0010
[Dn + (Dm * 8)] 0011
[Dn + Dm + rel16] 0100
[Dn + (Dm * 2) + rel16] 0101
[Dn + (Dm * 4) + rel16] 0110
[Dn + (Dm * 8) + rel16] 0111
[Dn + Dm + abs64] 1000
[Dn + (Dm * 2) + abs64] 1001
[Dn + (Dm * 4) + abs64] 1010
[Dn + (Dm * 8) + abs64] 1011
Where is Part 2 please?
You should make opcode 00 a NOP. That way, if your memory is empty, it does nothing.
That is assuming that unused memory gets initialised to 0.
How did you know that 0x00 is LOAD? In fact, how does the CPU understand that 0x00 means LOAD?
Because that is what I decided, it is an arbitrary decision by me, because I am the one designing the instruction set.
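On the assembler side, that decree is literally just a lookup table the designer wrote down; a minimal Python sketch (only LOAD = 0x00 is from the video; the other values are placeholders):

```python
OPCODES = {
    "LOAD":  0x00,  # the designer's arbitrary choice, as discussed above
    "STORE": 0x01,  # placeholder values for illustration
    "ADD":   0x02,
    "JMP":   0x03,
}

def encode(mnemonic: str) -> int:
    return OPCODES[mnemonic]

print(f"{encode('LOAD'):#04x}")  # 0x00, because that is what was decided
```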
I want to understand how a CPU is designed with a pen on white paper: the link to the silicon, not the maths calculations.
😨 My Core2Quad is missing instruction sets! It won't boot Apex Legends because of that! Nooooo! 😱
@Reflector ☹️☹️
Here's my answer to the bloat of Intel, ARM, RISC-V and even Agner Fog's efforts... This is a very quick rough-draft brainstorm, so it is not written up properly and is probably missing a few things. It maps to a single-core x86 + rolled-out vector operation loops + virtual machine, all the way up to a fully blown, highly integrated APU.... 2-byte instructions. The 'immediate values' are confusing because they are preloaded into registers after a procedure call; only im8 is a true inline immediate value. The compiler would deal with this pseudo-immediate value referencing..... CISC on RISC, highly extendable yet very compact... Typed registers and virtual registers with automatic type conversion.
rg7: Register 1..128, st7: Stack 1..128, iv7: 'immediate' value (ref) 1..128, ip7: 'immediate' pointer (ref) 1..128, im8: immediate byte value
Data Types: x16
b8, b16, b32, b64, b128, b256
i8, i16, i32, i64, i128, i256
f32, f64, f128, f256
Registers: x128 .. each type has 8x registers (ex. b32_1, b32[1], b32[b8_2])
Stacks: x128 .. each type has 8x 2^1..8 sized vector stacks (ex. b32x4, b32x8[2], b32x16[b8_1], b32[1,2], b32[b8_1, b8_2])
Immediate Values: a method's immediates are pre+parallel loaded into a register file with 8 registers for each type (16 types) and 128 address pointers
Virtual Registers:
A,B,C,D .. A op (B) (op C) (op D) into A | D
PA, PB, PC, PD .. memory pointers
Instructions: 2 byte op codes
// 59 // total: 232
16, assign P(A|B|C|D) to rg7|iv7|ip7|st7
1, load|save|copy from|to * P(A|B|C|D) * P(A|B|C|D)
4, load|save rg7 * at PA|PB|PC|PD
8, load|save st7 * push/pop? * at PA|PB|PC|PD
16, copy rg7|iv7|ip7|st7 * into P(A|B|C|D)
4, copy im8 to A|B|C|D
8, push|pop st7 * into|from P(A|B|C|D)
1, push im8 to main stack
2, push|pop main stack to/from rg7|st7
1, push|pop P(A|B|C|D) to/from main stack
// 28
8, add|sub|mul|div A * st7 * pop? * into A|D
4, add|sub|mul|div A * rg7 * into A|D
4, add|sub|mul|div A * iv7 * into A|D
4, add|sub|mul|div A * ip7 * into A|D
8, add|sub|mul|div A * im8 * into A|D
// 16
2, and A * st7 * pop? * into A|D
2, or A * st7 * pop? * into A|D
2, xor A * st7 * pop? * into A|D
3, and A * rg7|iv7|ip7 into A|D
3, or A * rg7|iv7|ip7 into A|D
3, xor A * rg7|iv7|ip7 into A|D
1, and|or|xor * P(A|B|C|D) * P(A|B|C|D)
// 12
1, shift L|R * A|B|C|D by A|B|C|D | iv3
2, shift L|R * st7 * pop?
1, shift L|R * rg7
2, shift L|R * im8
1, rotate L|R * A|B|C|D by A|B|C|D | iv3
2, rotate L|R * st7 * pop?
1, rotate L|R * rg7
2, rotate L|R * im8
// 87
36, do if A|B|C|D * ge|g|e|le|l|ne * rg7|iv7|ip7
24, do if A|B|C|D * ge|g|e|le|l|ne * pop? * st7
24, do if A|B|C|D * ge|g|e|le|l|ne * im8
2, do if P(A|B|C|D) * ge|g|e|le|l|ne * P(A|B|C|D)
1, do if not * Z|S|O|C|Cached
// 18
1, jump if Z|S|O|C * to|by * P(A|B|C|D)
3, jump by i8|(forward|back) by b8
4, jump to|by rg7|imm
2, jump to|by st7 * pop?
2, jump to|by P(A|B|C|D) * plus A|B|C|D * shift ev3
1, jump by P(A|B|C|D) * shift ev5
1, call rg7|ip7
1, call st7 * pop?
1, call P(A|B|C|D) * plus A|B|C|D * shift ev3
1, call relative, jump by P(A|B|C|D) * shift ev5
// 12
1, tiny op * 256: no op, return
1, P(A|B|C|D) op Self into Self * 32 ops: not, neg
1, P(A|B|C|D) op Self into A * 32 ops
1, P(A|B|C|D) op Self into D * 32 ops
2, A op into A|D * 256 ops
2, A op B into A|D * 256 ops
2, A op B op C into A|D * 256 ops
2, A op B op C op D into A|D * 256 ops
Only one opcode is followed by a variable number of bytes:
1, Register Immediate Value ir8, Immediate Value // Must be preceded by a Call or another RIV instruction
Speed Test G: Kirin 970 vs Snapdragon 675
Can I play Apex Legends with this?
Mmm, this sounds like programming your own multi-platform technology 😆
1:39 someone forgot to charge his phone
Designing a circuit
What is this supposed to be? I am confused... I mean, you show us what you want to make, but you don't show us how you do it or why. It is the equivalent of a "design your own car" video where the tutor says things like "I want to have an engine, then I want to connect the engine to the wheels and the steering wheel." Well, obviously... "How" and "why" are what matter, though. E.g. what tolerances are needed for the axles, how you will connect them to the engine's output, what the design of the engine is, why those design decisions were made, etc.
I thought the video was about designing an instruction set and that is exactly what I show. The next video shows how to write an assembler for that instruction set and the code is in GitHub. What do you think is missing?
Yo, you should make a smartphone
Pretty inefficient. 1-byte opcodes are enough for RISC, optionally with 1- or 2-byte operands per opcode.
I'm gonna make a 1024-core CPU now with 2048 threads.... huh
Jk haha
The more i learn about CPU architecture, the more i realise how silly complaining about the Technical Debt in PHP is. It's turtles all 0x00 the way 0x00 down.
If I built a computer, the CPU instruction set would have instructions for GPU rendering.
Interesting idea. The problem is that you would then need to implement those instructions in hardware, meaning you would need to merge CPU hardware and GPU hardware into the same design. Those are two very different things, and getting them to work in one design would be hard and ultimately inefficient and ineffective.
What's wrong with the ONE INSTRUCTION CPU? You don't even need an opcode for the instruction.
So you watched my video about OISC and then came to another video of mine about designing a CPU instruction set to ask why am I doing it?
Yes, I want to know why you didn't choose the easiest instruction set ever for a CPU.
LOL, Subleq is easy to implement, but it is hard to program and very inefficient. I thought that would be obvious!
You should have explained all the weak points of OISC and why no computer manufacturer implemented it in real life. It is a computer system that is totally dependent on main memory access and the CPU will only run as fast as the main memory. You will need a huge cache to speed up the CPU.
Since the video doesn't suggest that Subleq is a viable option for real world processors, I didn't feel I needed to spell out the advantages or disadvantages. Subleq is purely an exercise in computer theory. Again, I thought that was obvious.
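For the curious, the whole ISA does fit in a few lines; a minimal Subleq interpreter sketch in Python (word-addressed memory, halting on a negative branch target):

```python
def run_subleq(mem, pc=0):
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                   # subtract...
        pc = c if mem[b] <= 0 else pc + 3  # ...and branch if the result <= 0
    return mem

# Clear cell 3 by subtracting it from itself, then halt (branch to -1).
print(run_subleq([3, 3, -1, 42]))          # -> [3, 3, -1, 0]
```

Implementing that in hardware is trivial; expressing ordinary programs as chains of subtractions is the painful part.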
Hiiiiii
Designing a processor isn't hard. Making a C compiler, that is beyond me.
*Making a *bad* processor isn't hard.
I have been working on one and I am on something like the 10th+ iteration. I am working on it down to the logic gates while trying to make it fast (the ultimate goal is to run DOOM on it), and it is a whole other beast than what is shown here.
You have to consider things like the efficiency of the ISA (how much data needs to be read to do any common computation; a balancing act between how much data an instruction needs and how much compute it does), the slowest instruction (a basic RISC CPU's clock is decided by the slowest instruction, excluding I/O), and how the ISA will work with pipelining. And that is just the ISA; I haven't even got to things like the memory controller or floating-point ISA extensions.
My point being, what he is doing is more like making a compiler for something like an esoteric language. Once you move to advanced development, both are rabbit holes that no one person could fully understand.
I did, they stole it.
Who?
Do it yourself... Noooo....
Copy and paste?
Microsoft recently got CISC emulation working on ARM. IF performance is not a criterion and only compatibility and "native" support are needed, would it be possible to use that Microsoft technology to emulate graphics chips from AMD, Nvidia, Matrox, and others? IF the idea is remotely plausible... who will do it themselves? Hmmmm... Yoda is still bouncing around in Star Wars...
Well done for missing the point. 🤦♂️