Address for A, B and branch address => 3 addresses total => a single address is typically 42-bit (for 4 TB RAM) => 126-bit encoding for a single instruction. For comparison, RISC uses 16-bit (ATMEL MCUs, ARM Thumb) or 32-bit encoding (every high-performance RISC, including the new ARMv9), and CISC x86 uses variable-length encoding ranging from 8 bits to 120 bits (with an average instruction length of about 30 bits). So that's why nobody uses this single-instruction set: it needs about 4x more memory just to encode a single instruction (so it's a no-go for an MCU with tiny program space), and a typical operation needs 2 instructions plus one housekeeping instruction = 3 instructions total => about 12x more memory overall compared to a standard 32-bit RISC. BTW you can add vector computing (or multiple other operations) to make it A, B, BranchAddr, VecA, VecB, etc. You could fold the whole x86 instruction set into one single mega-instruction and get full modern PC functionality that way; it would just result in an instruction several kilobits long. So this is a good mental exercise in what to avoid when designing a good instruction set (or extension). Sad how few people have realized this.
Given the basic assumption that you have addition defined in memory and/or an adder memory-mapped to the CPU, along with a way to jump (there are many ways to do this), that is certainly possible.
@@platin2148 XOR EA, EA (zero out) MOV (value assignment to memory), OK - but you're still missing one important instruction: branch
@@webgpu That was proved by Stephen Dolan at the Computer Laboratory, University of Cambridge (www.cl.cam.ac.uk/~sd601/papers/mov.pdf), and there's a compiler named movfuscator that compiles valid C code into only MOVs (or only XOR/SUB/ADD/XADD/ADC/SBB/AND/OR/PUSH/POP/1-bit shifts/CMPXCHG/XCHG): github.com/xoreaxeaxeax/movfuscator
I'm reminded of an old saying that, every piece of software has inefficiencies, meaning some instructions can be taken out without changing the functionality. And every piece of software has bugs in it, bits and parts that don't function as required. So by inference one can conclude that every piece of software can ultimately be reduced to a single instruction that doesn't work.
The earliest 6800 chips (not the 68000, that came much later) were sort of one-instruction in that if you executed it, the chip would never execute another instruction again. The assembler mnemonic was HCF: Halt and Catch Fire (and some really sarcastic developer really did put that in their assembler software). Basically there was a screw-up where it set one accumulator, reset the other, and then dumped them both simultaneously onto the internal data bus. All in one clock cycle, too. Quite efficient.
The 6502 also had a few HCF instructions that would activate a debug feature which just cycled the address bus through all possible values repeatedly. And the Pentium had its famous HCF instruction, which locked the data bus and then requested an interrupt acknowledge, which of course would never arrive.
This brings me back down memory lane, to when I was writing code *on paper* for educational purposes. I took computer classes, and for like a whole semester we never even touched a computer!
In the olden days you programmed punch cards. CPU time was more expensive than human brain power, so you triple-checked your work! Often the code worked the first time; otherwise it was a huge waste of money. My uncle used to program an old DEC VAX. I saw it when I was a kid. I am 56 now. LOL
@@guym6093 For reference, I'm 62. DEC was a cool computing universe, until the PC changed the game. My most primitive experience was writing pseudo-code, converting it to assembler, then using the reference manual to convert that to machine code, then punching the machine code onto paper tape, then giving a command to the operating-system-less single-board 8086 to load the paper tape to an address and run it. Debug, rinse, repeat. Thus I became a firmware engineer. Thank goodness it wasn't a single-instruction CPU, and that it wasn't snowing and uphill from my dorm to the lab.
I remember many years ago someone saying you could implement a cpu with just a subtract instruction. Thanks for finally completing my knowledge! I can now die happy :)
A friend in college actually built a one-instruction computer (not sure if it completely worked) as an electronics project that he did somewhere around late high school (I think). He described that one instruction as "subtract, and branch if negative." Somehow that sounds similar to what I think "SUBLEQ" stands for.
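That is exactly what SUBLEQ does ("subtract and branch if less than or equal to zero"). A minimal interpreter makes the idea concrete; this is a sketch in Python, where the memory layout, the scratch zero cell, and the "negative jump target halts" convention are my own assumptions rather than anything from the video:

```python
def subleq(mem, pc=0):
    """One instruction = three cells (a, b, c): mem[b] -= mem[a];
    if the result is <= 0, jump to c, otherwise fall through.
    A negative jump target halts the machine (a common convention)."""
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3

# ADD written in SUBLEQ: B = B - (0 - A).
# Data lives after the code: A at cell 9, B at cell 10, Z (a zero) at cell 11.
mem = [
    9, 11, 3,    # Z = Z - A  -> Z now holds -A, continue at 3
    11, 10, 6,   # B = B - Z  -> B = B + A, continue at 6
    11, 11, -1,  # Z = Z - Z  -> clear Z, then halt (jump to -1)
    7, 10, 0,    # A = 7, B = 10, Z = 0
]
subleq(mem)
print(mem[10])   # 17
```

Note that the "branch if negative" variant the friend built is equivalent in spirit; only the comparison in the jump condition changes.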
This only works because the instruction itself is doing multiple things at the same time: loading from two different addresses, storing into another, and branching. It's kind of like saying "this knife can open wine bottles and also provide light" and learning that it's a Swiss Army knife with a bottle opener and a mini-flashlight as part of its base.
Your comparison doesn't fit. This instruction always does the same thing, because it's just a single instruction. With your Swiss Army knife you choose the right tool for the job; this instruction always does the same thing. All inputs are treated equally, no more, no less. You don't press the flashlight button when you open a wine bottle. The Swiss Army knife is an example of a CPU that has different instructions for different situations. This example is more like a hammer: you can use it to drive a nail, to hammer a screw into a wall, to stop a housebreaker, to test a patient's knee reflex, or to break the glass in an emergency. It's always the same instruction: hit x with strength y.
@@jondo7680 Even a hammer generally has a claw on the back you can use to remove nails as well as pound them in. It's one tool that can be used in more than one way.
Generally it's NOR that is used when only one logic operation will be implemented - not sure why - the earliest integrated CPU designs frequently used just one kind of transistor for the bulk of the die area. Might have something to do with the cost of early NOR-gate lithography vs NAND.
@@styleisaweapon I guess it's to get a better logic-0 level. Back then they didn't use CMOS, but NMOS. In an NMOS NOR gate, ground is just one transistor away from the output; that's why. Now that CMOS is used, NAND is better, because stacking PMOS transistors (as in a NOR) would slow down performance.
I tend to think that an arbitrary unit vector is Turing complete... How and why? Any and all unit vectors have all branches of mathematics embedded within its properties. From a unit vector you can construct the unit circle! Once you have the Unit Circle you have algebra, geometry, trigonometry, and calculus... Even the simplest of all mathematical equations 1+1 = 2 satisfies the Pythagorean Theorem... Simply because the equation of the Unit Circle fixed at the origin (0,0) is a single form or specific set to the Pythagorean Theorem. From the Unit Vector you can construct any other value, digit or number, every linear equation and all polynomials, every trigonometric function, and all geometrical shapes from angles between two vectors with infinite area to a triangle all the way to a full circle. Once you have your polynomials, you can then also define your derivatives and integrals thus giving you calculus. Just follow the properties of vector arithmetic... Also vector arithmetic can be defined as a subset of Lambda Calculus which in itself has already been declared as Turing Complete thus making the Unit Vector Turing Complete. Now, as for how many instructions, that's dependent on the end user for which operations - linear and or affine transformations they chose to apply to the unit vector.
Well, you could just consider a group of many instructions to be one instruction where the opcode is considered an argument to the one instruction. Of course, it wouldn't be a very simple instruction.
Exactly dude.
11:20 Doing some figuring on paper leads me to believe that maybe address -1 is the output. This is what I wrote when trying to decode it: a=0, Z=0-p (p=[H]), a=0-(0-p), a:0 (-1). If [H]==100 then a=0, Z=0-100=-100, a=0-(-100)=100, a:100 (-1). This shows me that (if I'm interpreting this right) the value at address 100, which is the letter H, is being subtracted from the value at address -1. So that leads me to believe that address -1 is reserved for screen output. Thanks for taking the 30 sec to explain that.
Because you always change the sign (due to using subtract), if you need to add you just subtract from 0. This means addition takes 2 operations and 1 extra memory location (to store the 0) compared to also having an addition operator. Because you need extra entropy in the form of data storage to pull this off, the subtraction-only computer is actually an addition-and-subtraction computer, but one of the ops is stored with the data itself (assuming two's complement). You can't pull this off without the second operation being in the data itself, because ultimately negative values don't really exist in binary. Negative numbers are simply positive numbers (or rather, vectors, with positive and negative being the direction of the vector) with a yet-to-be-executed operator encoded along with them. -2 is "subtract 2's worth of value". +2 is "add 2's worth of value". 2 is a number. -2 is nearly a number, but convention says if the second value isn't supplied, use 0. Fractions are the same: two numbers yet to be divided. They aren't values in their own right yet. It's a beautiful example of how the line between operations and data is way more blurred than one might think. Addition of 1 also needs a "1" to be stored somewhere. This requires another bit of memory, and so in totality the subtraction-addition-and-increment machine is as entropy-dense as a computer that has three opcodes: addXY, addXY (but sign-flip Y), and add1. They are the same amount of "complex"; it is only that the former has more constants and the latter more operations. TL;DR: It's only a single-operation computer because of how the term "operation" is defined, and really when one says "special subtraction" one really means "the default is subtraction, but constants stored in data can also modify its behaviour".
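The "subtract from 0 to get addition" point above fits in a couple of lines; this is just an illustration in Python, with the helper name being mine:

```python
def add_via_sub(a, b):
    # Addition built from subtraction plus one stored constant
    # (the 0 the comment mentions): a + b == b - (0 - a).
    zero = 0
    return b - (zero - a)

# The "operator stored with the data" view: in an 8-bit two's-complement
# word, -2 is stored as the bit pattern for 254 (== 256 - 2).
print(add_via_sub(2, 3))   # 5
print(-2 & 0xFF)           # 254
```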
I'm not trying to nit-pick or be a dick or anything, I'm just trying to explain how you can't actually get Turing completeness with 1 operation; you have to be a little crafty with the definition of "operation".
@@nathangamble125 Yeah, that's two's complement :) The first bit defines the sign and, in this instance, defines whether you're going to be doing addition or subtraction.
I think it's crazy that _just yesterday_ I added a comment to a library I've been writing, which reads exactly "//..because machine words have no sign", and today I am reading your youtube comment. My library comment is about the efficiency of a function on signed vs unsigned values. Not all programmers may realize conversion from signed to unsigned is a completely free operation because it's abstract; it may still lead to more or less efficiency because the abstract idea decides which future instructions to use (on AMD64 there are more forms of the signed multiply instruction than there are for unsigned multiply, even a 3-operand form), but that's a whole 'nother kettle of fish.
I was foolish enough to believe it literally. So I paused the video and attempted to implement addition using subtraction (easy), and then multiplication and division follow. I could implement inverting using subtraction, but I realized it is impossible to implement AND and OR using subtract alone. Thus, no Turing completeness. The fact that it is a single assembly instruction is true, but it is never a single data operation. The LEQ part implements all the branching and looping necessary, which is fundamentally based on the three basic boolean gates.
Both NAND and NOR can be used. I can never remember which is which but one was best for TTL and the other best for NMOS (which has been replaced by CMOS now anyway). You can make a basic 2 input NOR with three resistors and an NPN transistor that more or less works.
Indeed. Would there somehow be a link between the two? I forget the name of the mathematical rule for manipulating NAND and OR. There was something with a double negation and swapping OR and AND.
@@jeroenstrompf5064 I don't know about the double negation (rule), but the swapping, if I recollect correctly, is given by De Morgan's theorem. There are two De Morgan rules or theorems: (1) two separate terms NOR'ed together is the same as the two terms inverted (complemented) and AND'ed, for example: not(A+B) = not A . not B; (2) two separate terms NAND'ed together is the same as the two terms inverted (complemented) and OR'ed, for example: not(A.B) = not A + not B. (Here "+" means OR and "." means AND.)
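Both De Morgan rules can be checked exhaustively over all truth values; a quick sketch:

```python
from itertools import product

for A, B in product([False, True], repeat=2):
    # (1) not(A + B) = not A . not B   (NOR = AND of complements)
    assert (not (A or B)) == ((not A) and (not B))
    # (2) not(A . B) = not A + not B   (NAND = OR of complements)
    assert (not (A and B)) == ((not A) or (not B))
print("De Morgan holds for all inputs")
```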
When I clicked on this and saw you starting to explain the subleq instruction, I immediately imagined something like what you have described. Maybe because in my time as an IT engineering student, I learned to program for the URM machine. The Unlimited Register Machine has only 4 instructions. All registers start at zero. Instruction Z(r) resets/zeroes the register; instruction S(r) puts in the register the successor of its value; instruction T(r1, r2) transfers/copies from register r1 to register r2; and J(r1, r2, i) jumps to instruction i if registers r1 and r2 are equal. So one can create programs to add and to subtract, then with those we can create multiplication and division, and so on. Curiosity: all programs for the URM are enumerable, that is, there is a way to transform any program into a single number, or to transform a number (no matter how big it is) back into a program. As for the SUBLEQ, you didn't tell us how to set values in memory. Will 7, 5, 1, 25, or 173 magically appear in memory if we have to use them? However, this OISC is awesome!
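For anyone curious, those four URM instructions are easy to simulate. Here's a minimal sketch; the 0-based instruction numbering, the tuple encoding, and the list of registers are my own choices, not part of the formal URM definition:

```python
def run_urm(program, regs):
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == 'Z':                      # Z(r): zero register r
            regs[args[0]] = 0
        elif op == 'S':                    # S(r): successor, r += 1
            regs[args[0]] += 1
        elif op == 'T':                    # T(r1, r2): copy r1 into r2
            regs[args[1]] = regs[args[0]]
        elif op == 'J':                    # J(r1, r2, i): jump to i if equal
            if regs[args[0]] == regs[args[1]]:
                pc = args[2]
                continue
        pc += 1
    return regs

# Addition: r0 = r0 + r1, counting up in r2; J(3, 3, 1) is an
# unconditional jump since r3 always equals itself.
add = [('Z', 2), ('J', 1, 2, 5), ('S', 0), ('S', 2), ('J', 3, 3, 1)]
print(run_urm(add, [3, 2, 0, 0])[0])   # 5
```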
Well, they wrote a C compiler for x86 that only emits MOV instructions (it turned out that MOV is Turing complete by itself). It's called the movfuscator.
Hey Gary! Another neat One Instruction Set Computer design is a Transport-Triggered Architecture, where you only have MOV, but by MOVing data into special registers you can perform arbitrarily defined operations on that data; you might have two registers, ACC and SUB, and to compute a minus b you MOV a ACC, MOV b SUB, where the SUB register takes in data and subtracts it from ACC at every clock cycle.
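That scheme can be modelled in a few lines; the register names and the "compute on write to the trigger register" rule below follow the comment's hypothetical ACC/SUB setup, and the result register RES is my own addition:

```python
class TTA:
    """Toy transport-triggered machine: the only operation is a move,
    and writing the SUB trigger register computes ACC - value."""
    def __init__(self):
        self.reg = {'ACC': 0, 'SUB': 0, 'RES': 0}

    def mov(self, value, dst):
        self.reg[dst] = value
        if dst == 'SUB':                   # trigger register: fires the ALU
            self.reg['RES'] = self.reg['ACC'] - value

m = TTA()
m.mov(10, 'ACC')   # MOV a ACC
m.mov(4, 'SUB')    # MOV b SUB -> triggers the subtraction
print(m.reg['RES'])   # 6
```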
You should be able to design a CPU with only one instruction and two registers: one register contains the set of bits to be changed, the other contains a set of bits that defines which bits are to be flipped. It would be a nightmare to design the rest of the computer around the CPU, and I wouldn't want to program it, but you could do it. It would be up to the programmer to decide which bits need to flip in any particular situation, making it not fun to program, but you could do it, and it would be a very fast CPU. Because it's just the one instruction, it would be relatively simple to design the system for as many bits as you wanted; you could design the machine to use 256-bit registers, 1024-bit registers, whatever. Actually getting it to accomplish anything would be a nightmare, but if someone were willing to tackle making a usable programming language for it, it would be very, very fast.
@Richard Vaughn I can see why you would want to add that instruction, but it shouldn't be necessary as long as it's pre-established where the CPU is getting and putting data and the programmer designs the program accordingly. As long as you know beforehand where the CPU is going to get and put data in a given cycle, you can pre-initialize data in the appropriate memory locations. It would be a nightmare to sort out, but it SHOULD be possible.
No, because this has no options, it does the same thing every time. Adding qualifiers or options is just disguising multiple instructions with the same op code. That is why MOV on x86 doesn't qualify for this. Notice there is no instruction code in these programs.
No, sorry I couldn't. My bad. But in my defense: A) There were no emojis to suggest you were being funny. B) I get plenty of stupid comments all day by people who are being serious but they are completely wrong about what they write.
requires being able to mov into the instruction pointer, an operation that while on the surface looks like any other mov instruction, is a separate instruction, with a separate opcode, and only a single operand. I am referring to that compiler you mention. You will find that not all assemblers use such ambiguous syntax as the standard "intel syntax" indicating that the syntax does not define the instruction. Even among the 64-bit general purpose registers, there are 4 register-to-register mov opcodes in play. (legacy to legacy, legacy to r8+, r8+ to legacy, and r8+ to r8+)
If you are referring only to compiler, ignore the following. iirc, some architectures have the instruction pointer in memory. So in that sense, it is still valid. The intention here is more towards making a new CPU and instruction set that is Turing complete, and not really to find an existing instruction that makes all existing CPU's Turing complete on its own.
Cool! Reminds me of a textbook I read in university called Nand to Tetris, which went through the steps of building an entire VM with a playable version of Tetris, starting from an initial NAND gate.
All a computer does is essentially add numbers together anyways. It does this billions of times a second. From addition, all other mathematical operations can be formed such as division, multiplication and subtraction. From there, further layers of abstraction can be built upon to form a fully functional computer. Anyone interested in learning more should study discrete mathematics and take an introduction to computing systems.
That depends on how you define "instruction" in the context of the Turing architecture. A Turing machine certainly can be modelled as a single-instruction machine whose single instruction takes the state transition table as input.
Excellent video! Wow, i would hate to program that processor, though. Writing a language like Sweet16 would be laborious. I designed a CPU instruction set with only 16 instructions, so that we could keep the opcode to 4 bits. My instruction set is: Add, AND, NOT, OR, SHR (shift right), SUB, XOR, LDA #, PSH, Pop, RDA (read memory into A), STA (store), JC (jump if carry set), JN (jump if negative), JZ (jump if zero)... That's it. I used to program in Z-80 assembly, so this is what I thought I could live with if I had to write a language in it. I'm amazed you have done that! I have subscribed.
SUBLEQ is actually a complex instruction, as it contains a branching instruction. So your computer is actually a CISC machine. Period. Proof: if I follow your logic, I can then make a "single-instruction" CPU with an instruction like SUBLEQSINJMPRET... Technically it would be a "single instruction", with a lot of implicit logic behind the scenes.
Yes, that would be more efficient, but the point of the demonstration was to show that if you want to add a number, in this case 1, then you start with the number you want to add and then turn it negative. It would be the same if you wanted to add 7. You want +7, but to do that you need to make it -7. If you were adding a + b then you would need the negative value of b, etc.
It's kind of funny that the ultimate RISC is also the ultimate CISC -- the main distinguishing feature between RISC/CISC is not really fewer instructions, but restriction of source/destination operand types to exclude memory access in RISC, so they used to be called "load-store" architectures because you had to use instructions like LW and SW to transfer data between memory and registers. Although MOV on x86 is even more bloated than SUBLEQ, allowing for all sorts of crazy operations (like copying a word between two memory locations and incrementing index registers at the same time in one instruction).
@@alexandruilea915 Research that can lead to innovation. Why would anyone have spent their time researching quantum physics when classical physics worked so well for so long? Consider that without quantum physics the computing revolution wouldn't have happened. Research can sometimes be without a clear practical real world advantage as its end, and still be extremely valuable later on.
I designed a mov computer in 1990. It has registers only, many of which are specialized and are ports to functional units. The functional units are ALU, stack, jumping, memory, etc. Constants could be taken from the next instruction. It can be extremely small, or do multiple instructions in parallel. I mainly focused on the latter, and it seemed to work well for digital signal processors. Instruction size of 16 bits, combined into words of 32 or 64 bits; DSPs often use wide instructions. Moves can also happen while the functional unit is still working, like memory access. This allows very simple parallelism if organized correctly. Never made it to hardware, though. Later I added a few condition opcodes for moves to the same place. This can be used when you want an extremely small instruction size of 8 bits with only 16 registers/ports.
Wonderful! I heard about the MOVE processor in the '90s. Nice to see the concept in practice now. And although I haven't done machine code since halfway through the '90s, your video feels like slipping on an old glove :)
It is called the movfuscator, but it uses different types of move (to address, from value, etc.), so technically it is actually 3 or 4 instructions (I haven't checked exactly how many).
Amazing, I wouldn't have even thought of this! If I had to guess, I would've thought maybe 10 instructions was the minimum possible, and I wouldn't have even known which instructions were necessary. Never would've guessed just 1 instruction, and an implied instruction at that, so no real opcode for it. I wonder if people are going to use this one-instruction instruction set as a proof of concept for future processors?
It’s often forgotten that the x86 is a RISC design with one instruction only. That instruction is PLEASE, which is often omitted because programmers are rather rude. Joke aside, this is an awesome video.
That reminds me of the "Computer '74", a project of the Dutch magazine Elektuur. The name not only stands for the year, but also for the piles of 74xx chips used to build that monster. It was programmed in octal, and it had no opcodes: every command consisted of two addresses, from and to. It also had a hardware multiplier and a hardware divider!! It had not only memory addresses, but also special addresses for the adder, the divider, and of course an address for the program counter. Fun fact: it already had a diode matrix for the microcode.
In this way the execution of a program can be visualized in the square address x address as a temporal progression from one point to another. An instruction is a point in the space address x address x address. The program itself is a discrete sequence of points in this space.
Great video. There is also a really nice talk here on YouTube by Christopher Domas on how you can actually get away with only using the MOV instruction on x86. Branching is the biggest issue there, so you basically have to loop through all your code for every branch. But it's really neat, and a cool obfuscation method :) Look up "the movfuscator" on youtube.
All boolean operations can be made from just a series of NAND gates. For example, tie the inputs of a NAND gate together and you get a NOT gate. Put that NOT gate on the output of another NAND gate and you now have an AND gate. Combine NANDs and NOTs appropriately and you can create OR/NOR, and with a bit more you can create XOR. You now have the six basic boolean operators from which you can design any digital circuit, including a CPU. A NAND gate is among the simplest gates to make, requiring only two transistors in some logic families (four in CMOS).
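The construction above is easy to check in code; here's a sketch using Python booleans as wires (function names are just the gate names):

```python
def NAND(a, b):
    return not (a and b)

def NOT(a):              # inputs tied together
    return NAND(a, a)

def AND(a, b):           # NAND followed by NOT
    return NOT(NAND(a, b))

def OR(a, b):            # De Morgan: invert both inputs, then NAND
    return NAND(NOT(a), NOT(b))

def NOR(a, b):
    return NOT(OR(a, b))

def XOR(a, b):           # the classic four-NAND arrangement
    n = NAND(a, b)
    return NAND(NAND(a, n), NAND(b, n))

# Exhaustive check against Python's own boolean operators.
for a in (False, True):
    for b in (False, True):
        assert AND(a, b) == (a and b)
        assert OR(a, b) == (a or b)
        assert XOR(a, b) == (a != b)
```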
Why not copy a to b like this: b = z - a b = z - b Assuming z is always 0 then this should work just fine, but takes half as many operations. Edit: Looked it up, and it doesn't work because the instructions can't be represented in subleq. All instructions must be precisely on the form: x = x - y
@Richard Vaughn I'm referring to z in the way the video uses inc1, that's it. It assumes inc1 is 1, I assume z is 0. If you have an issue with that then would you say that the add example in the video is incorrect?
A concept we discussed at school way back was a single-instruction CPU whose only instruction was MOVE: take data from one location in memory and write it to another location. Now, it was cheating somewhat, since it relied on the existence of coprocessors which would watch certain memory locations, do arithmetic operations on them, and put the result into another location. But while the SUBLEQ CPU is essentially an academic toy, as pretty much any calculation is extremely inefficient, the MOVE CPU is much closer to how a real microcontroller works. And it was still a single-instruction CPU in the sense that all instructions were move instructions.
Hi Gary! Interesting, but I am not sure the reduction in the number of instructions really gives a massive improvement in performance, even if we eliminate instruction decoding. I read about a single-instruction-set CPU developed at Leiden University that I thought was very interesting, because it only has a load instruction, with a very large number of registers that are inputs or outputs of arithmetic or logical units. That allows the compiler to organize the code into multiple instruction pipelines, so the parallel performance can be scaled by adding additional arithmetic/logical registers to the CPU. I thought that was a very interesting feature of that design. As for SUBLEQ, it really only has utility in illustrating feasibility, similar to the original Turing machine. Sure, you can write universal code with it, but it will be a colossal pain and probably not very fast.
@@insoft_uk The word "reduced" does not refer to the number of instructions. It refers to the complexity of the instructions. The instruction he describes, once it gets the full instruction from the instruction memory, is then required to go out to memory, again, twice, to get the two operands and then decide where to go next. That by definition is a complex instruction. The fact that there is only one of them is irrelevant.
I always read the "Reduced instruction set" in RISC as a reduced set of instructions, not a set of reduced instructions. Do you have a quote or a link that would substantiate your interpretation?
Bogy Wan Kenobi Surely "Reduced," refers to the instruction set, not the instruction. So, by definition, a simple instruction set is one with few instructions.
@@110110010 Sure. Just do a google search for ARM, or for that matter a youtube search for the same thing. It is not an interpretation. It is the definition. The fact that you don't know this is testimony to either your youth or lack of professionalism. In fact, google "arm vs. intel". As for me... I am 35 years a computer hardware engineer.
At 5:56 you assume there is a 1 in memory, which cannot result from the subtract instruction itself. You need at least a second instruction to set a value in memory.
There are other one-instruction sets as well. I've built a couple in FPGAs. My latest is a transport-triggered architecture, where the one instruction is a move from one register to another. All functional units (think your ALU, load/store unit, etc.) have known addresses on the bus, and the instruction might also specify a sub-address which allows the functional unit to perform multiple operations. One register connected to each functional unit is the "trigger" register, such that whenever an instruction is decoded and executed, if it moves data into a functional unit's trigger register, the specified operation is performed. I.e., if I say to move the value `5` into the ALU's trigger register and specify the ALU's operation as addition, then it'll use the other input as the current value on the data bus, and the execution of the CPU happens as a side effect of moving data between registers. These are also known as exposed-datapath computers. Really neat, and they solve the VLIW register-file pressure problem.
Yes, but you need software cycles to simulate other instructions; in other words, you need resources to emulate more complex instructions. This is why a big instruction set, when used correctly, can run software faster and reduce its memory usage at the same clock speed. That (along with the licensing story of the x86 world) is part of why you are using an ARM SoC in your phone. But this thing sometimes gets out of control.
Haaa!!! That's *two* instructions in one. Not only a SUB, but also a BRANCH instruction fused into one. It is kind of a misleading video... *yet* i found the information in it AMAZING !!! (i just edited this comment lowercasing it:)
@@GaryExplains Ok, you could treat this question with two answers. Using a broad, abstract definition, it is "whatever you program the logic gates to do". In this case you could even build a "whole" program into one instruction and give it a suitable name, like "CalculateTodaysWorldsMeanTemperature", passing as the only operand a pointer to a table of values relevant to the calculation. The second answer could be: "one processing unit chosen from the minimal instruction set possible: load, store, logic op, jump". These could be regarded as "one instruction" since they perform just one single, indivisible operation (explicitly given by its name). Regarding the second answer, the SUB instruction here does load, subtract, and branch.
@@webgpu I don't know if I understood your question correctly, so I'll answer this way: I certainly can add two numbers using my four types of instructions: load, store, logic op, jump. In this case, only three types of instructions are needed: load, logic op, store. First number is at addr, second number at addr+1, result at addr+2, in z80 style: LD a,(addr); // load LD b,a; // load (reg.) LD a,(addr+1); // load ADD a,b; // logic op LD (addr+2),a // store
@@webgpu I'm talking about the CalculateTodaysWorldsMeanTemperature instruction. My point being that while you can design an instruction that can do anything you want, be it an entire program, most of them don't allow you to build a general purpose processor on their own. Which mean that however complex SUBLEQ is, it's still one instruction.
5:30 I'm interested in seeing how you can get a "1" from scratch, when all memory locations contain random bits at the start. Generate the standard Fibonacci sequence in consecutive memory locations: how few primitive operations can you get away with, if SUBTRACT is not sufficient?
@@GaryExplains I know. But I'm interested in figuring out the minimum instructions needed when memory is randomized at startup. Having a pre-loaded constant with part of the program is no different from having another kind of instruction; you are not, in fact, providing only a list of one kind of thing. Relying on numbers being part of the encoding of other instructions is, I think, cheating; or at least it is another "thing" you are counting on. Try it with a Harvard architecture, or an encrypted instruction, so you can't depend on some instruction containing a specific number within its encoding. The idea of "minimal" is a bit more slippery and subtle than "one instruction" leads us to believe. That one instruction does different things, conditionally. Imagine a VLIW control word: that's one instruction, a uniform format for everything the CPU does, but it is far from minimal. Your "one instruction" combines the actions of three simpler instructions, so it is a chameleon if you ignore the effects other than the one you want. Why not an instruction that does 20 different things, all of which can be easily ignored? That is, the fact that there is only one instruction is not a true measure of austerity. It also relies on self-modifying code, and on the ability to read constants as well as abstract instructions.
@@JohnDlugosz If you allow 1 to be available at a hardcoded memory address (which would be a simple bit of electronics and nothing to do with RAM etc) then the answer is still 1 subleq instruction because you can create every other number by incrementing from 0.
This is both a RISC and CISC architecture, since the latter is about the complexity of the individual operations. And I consider this minus one to be quite complex (involving two addresses and a conditional jump). Additionally, it is counter to the idea behind RISC -- to only have simple instructions -- but since the formal definition of RISC deals primarily with the cardinality of the set, it is indeed RISC.
Well, that's hardcore RISC. I didn't know it was possible to have one instruction. I heard in my youth that the simplest machine required a NAND, a NOT, and an XOR or something like that. Don't remember now. You should have demonstrated how you can do bit manipulations such as these. Absolutely necessary!
They may have been explaining that all logic can be built from a small number of logic gates. Apparently, you can build all digital logic from NAND, or NOR gates. In other words, choose one of the two, and then use those gates to build your logic. Sounds inefficient though.
Binary subtraction is actually a form of *addition.* It works like this: 1. Each bit of the _subtrahend_ ( i.e., the number that is to be subtracted from the other number, called _minuend_ ) is _inverted_ (i.e., if it's a one, it's changed to a zero, and vice versa), giving the _ones' complement_ of the subtrahend. 2. The ones' complement is then incremented by one to give the _twos' complement_ of the subtrahend. 3. The twos' complement is then *added* to the minuend, to give the result that includes a carry (i.e., an extra one to the left of the result, which is always discarded). At the hardware-level, computers always add; they *never* subtract.
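The three steps described above can be checked in a few lines of Python (a sketch assuming an 8-bit word size; real hardware does this in gates, not code):

```python
def subtract_via_addition(minuend, subtrahend, bits=8):
    """Compute minuend - subtrahend using only addition, as the hardware does."""
    mask = (1 << bits) - 1
    ones_complement = subtrahend ^ mask             # step 1: invert every bit
    twos_complement = (ones_complement + 1) & mask  # step 2: add one
    result = (minuend + twos_complement) & mask     # step 3: add; the mask discards the carry
    return result

print(subtract_via_addition(9, 5))   # 4
print(subtract_via_addition(5, 9))   # 252, i.e. -4 in 8-bit two's complement
```

The discarded carry in step 3 is exactly the "extra one to the left" the comment mentions.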
@@GaryExplains We are talking about two different things. You are talking about microcode, and I am talking about what happens at the hardware-level (i.e., logic-gates). Computers _never_ subtract at this level; there's no such thing as a binary subtracter circuit. An adder circuit is used for subtraction. And to perform the operation of negation of any positive binary number actually means to take the twos' complement of that number.
I thought this might be bbj (byte byte jump; it's on esoteric langs) but I've realised I might be misremembering it. Thank you for a tech video without binary; the abstraction is nice.
You can add one by storing -1 instead of 1 in your "inc1" memory address, saving two steps: a = a - inc1, which is equivalent to a = a - (-1) = a + 1. Let me know if this is slower than the example in the video for some reason.
6:01 -- This is what is being done: (1) *inc1 = 1* (2) *z = z - inc1 ; z = 0 - 1 = -1* (3) *a = a - z ; a = 7 - -1 = 8*. In (1), if inc1 is set to -1, it would allow skipping (2), where z is being set to -1. However, the way this system works (with so many steps needed because of the lack of proper instructions): if you did this 100 times, you would do 200 sub-steps instead of 300 -- a saving of 100 sub-steps. E.g. 3N - N = 2N; you would save at most N sub-steps. Performance-wise this would still be O(N), even though doing inc=1 or inc=-1 will save you N steps. It is RISC-style, so clearly all the optimization (if any) would be done by the compiler. However, the compiler doesn't really have much to work with in terms of optimization (normally a register set along with load/store instructions, for starters). Because of that, the boost in performance from the above will be largely insignificant.
Yes, that would be more efficient, but the point of the demonstration was to show that if you want to add a number, in this case 1, then you start with the number you want to add and then turn it negative. It would be the same if you wanted to add 7. You want +7, but to do that you need to make it -7. If you were adding a + b then you would need the negative value of b, etc.
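For anyone who wants to play with this, here is a minimal SUBLEQ interpreter in Python that runs the a = a + 1 sequence discussed above. Operand order and the halt convention vary between SUBLEQ dialects; this sketch assumes `subleq A, B, C` means `mem[B] -= mem[A]`, branching to C when the result is <= 0, with a negative C halting:

```python
def subleq(mem, pc=0):
    """Run a SUBLEQ program held in mem. Each instruction is three cells:
    A, B, C  ->  mem[B] -= mem[A]; jump to C if the result <= 0.
    A negative C halts the machine (one common convention)."""
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3

# a = a + 1, done as in the video: Z = Z - inc1 (Z becomes -1),
# then a = a - Z (a becomes a + 1), then restore Z and halt.
# Layout: cells 0..8 are code, 9 = a, 10 = inc1, 11 = Z.
A, INC1, Z = 9, 10, 11
mem = [INC1, Z, 3,    # Z -= inc1 -> Z = -1; result <= 0, jump to 3 (fall-through anyway)
       Z, A, 6,       # a -= Z    -> a = a + 1
       Z, Z, -1,      # Z -= Z    -> Z = 0 again; result <= 0, jump to -1 (halt)
       7, 1, 0]       # a = 7, inc1 = 1, Z = 0
subleq(mem)
print(mem[A])  # 8
```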
This SUBLEQ is more of a 1 instruction CISC Computer. RISC Architecture is more about getting things done in 1 or the fewest possible clock cycles with optimized hardware.
I bought a CPU on ebay that had only one instruction implemented. I believe it was Halt and Catch Fire.
lol
@@sql64 dude, your name...
@@dummypg6129 i agree
Dont forget kernel panic and 100% idle load
Must have had Windows on it! ;-)
I've seen a talk where a guy built a compiler which compiled C to just mov instructions. He called it the movfuscator.
I'd love to see a link for that if you (and Gary of course!) don't mind? Thanks!
suvetar just search movfuscator, there's a black hat presentation on it. Adiabatic computing is also interesting.
@@suvetar This is the talk (he mentions it there) ua-cam.com/video/HlUe0TUHOIc/v-deo.html
@@fss1704 Wow - thank you! That was incredible!
@@suvetar ua-cam.com/video/F_Riqjdh2oM/v-deo.html
ua-cam.com/video/vFpNNrt-baE/v-deo.html
Listen
I learned this 50 years ago in a computer class. The instruction was subtract and store.
Address for A, B, and branch address => 3 addresses total => a single address is usually 42-bit (for 4 TB RAM) => 126-bit encoding for a single instruction.
For comparison, RISC uses 16-bit (ATMEL MCUs, ARM Thumb) or 32-bit encoding (every high-performance RISC, including the new ARMv9).
For comparison, CISC x86 uses variable encoding ranging from 8-bit to 120-bit (with an average instruction length of about 30 bits).
So that's why nobody uses this single-instruction set - it needs 4x more memory just to encode a single instruction (so it's a no-go for an MCU with tiny program space), and it needs 2 instructions plus one housekeeping instruction = 3 instructions total => 12x more memory overall compared to standard 32-bit RISC.
BTW you can add vector computing (or multiple other operations) to make it A, B, BranchAddr, VecA, VecB, etc. You could fold the whole x86 instruction set into one single mega-instruction and get full modern PC functionality this way. It would just result in an instruction several kilobits long.
So this is a good mental exercise in what you should avoid when designing a good instruction set (or extension). Sad how few people have realized this.
MOV is also Turing-complete.
Given the basic assumption that you have addition defined in memory and/or an adder memory-mapped to the CPU, along with a way to jump (there are many ways to do this), that is certainly possible.
xoreaxeaxeax's movfuscator...
Platin 21 brainfuck is way more interesting...
@@platin2148 XOR EA, EA (zero out) MOV (value assignment to memory), OK - but you're still missing one important instruction: branch
@@webgpu that has been proved by Stephen Dolan
in Computer Laboratory, University of Cambridge www.cl.cam.ac.uk/~sd601/papers/mov.pdf
and there's a compiler named movfuscator that compiles valid C code into only MOVs (or only XOR/SUB/ADD/XADD/ADC/SBB/AND/OR/PUSH/POP/1-bit shifts/CMPXCHG/XCHG) github.com/xoreaxeaxeax/movfuscator
I'm reminded of an old saying: every piece of software has inefficiencies, meaning some instructions can be taken out without changing the functionality. And every piece of software has bugs in it, bits and parts that don't function as required. So by inference one can conclude that every piece of software can ultimately be reduced to a single instruction that doesn't work.
"anything can be implemented in hardware" is a short version of what you said
although true, it's not practical
The instruction set is bloat
-Luke Smith, probably
He's probably already in the forest talking about it.
"Instruction sets are fuckin' gay bruh" - Albert Hitler
The world needs a HISC, i.e., half instruction set computer.
Juuso Alasuutari you mean semicomputers.
- Linus
The earliest 6800 chips (not 68000 they were way later) were sort of 1 instruction in that if you executed it, it would never execute another instruction again. The assembler mnemonic was HCF . Halt and Catch Fire (and some really sarcastic developer really did put that in their assembler software). Basically there was a screw up where it set one accumulator, reset the other, and then dumped them both simultaneously onto the internal data bus. All in one clock cycle too. Quite efficient.
The 6502 also had a few HCF instructions that would activate a debug functionality that just cycled the address bus through all possible values repeatedly. And the Pentium had its famous HCF instruction, which locked the data bus and then requested an interrupt handle, which of course would never arrive.
"You had ONE instruction"
Fewer transistors, made as fast as possible, less heat, more reliable, and more working chips per wafer, at lower cost. Or more cores per chip.
@@larrybeckham6652 but far more complicated programming of compilers...
@@Syndikalisten True. But get the compilers working and you have future savings that pays off.
@@larrybeckham6652 Except it needs way more instructions to do the same things as a normal CPU, so effectively it's slower
@@TheDoh007 life is of tradeoffs
This brings me back down memory lane, to when I was writing code *on paper* for educational purposes. I took computer classes, and for like a whole semester we didn't even touch a computer!
In the olden days you programmed punch cards. CPU time was more expensive than human brain power. You triple-checked your work! Often the code worked the first time. Otherwise it was a huge waste of money. My uncle used to program an old DEC VAX. I saw it when I was a kid; I am 56 now. LOL
@@guym6093 For reference, I'm 62. DEC was a cool computing universe, until the PC changed the game.
My most primitive experience was writing pseudo-code, converting to assembler, then using the reference manual to convert to machine code, then punch the machine code onto papertape, then give a command to the operating-system-less single board 8086 to load the paper-tape to an address and run it. Debug-Rinse-Repeat.
Thus I became a firmware engineer.
Thank goodness it wasn't a single instruction CPU, and that it wasn't snowing and uphill from my dorm to the lab.
I remember the good old days of making Minecraft computers. I also wrote binary code *on paper*.
I remember many years ago someone saying you could implement a cpu with just a subtract instruction. Thanks for finally completing my knowledge! I can now die happy :)
Live long and prosper, though ;)
Next we need to build this CPU with just NAND gates. :D
yes, like the GIGATRON
The 1 instruction is HALT.
LOL, no. 🙄
HALT or HLT
all you really need is a way to execute a lambda function....
@@GaryExplains Why does qed exist then? It uses the original universe bootstrap; everything does.
HCF
One instruction with a complex inner instruction set ;)
But still one instruction that needs no op codes because the same thing happens every time.
Machine code: Wtf
"On the other hand there is an assembler"...
Assembler: Wtf
Aristotle writes in his Physics that we only need three principles to describe the whole world. Here we have one solution for how.
It is doing 3 things in 1 instruction: loading from memory, subtracting one value from another, and jumping to another address.
@@JamilKhan-hk1wl Yes! I thought the same thing when I heard about the one-instruction-code.
@@WerIstWieJesus It's like the knight's move in chess: it looks weird (L-shaped), but that one move can get it to all squares.
@@JamilKhan-hk1wl You are a great philosopher.
A friend in college actually built a one-instruction computer (not sure if it completely worked) as an electronics project that he did somewhere around late high school (I think). He described that one instruction as "subtract, and branch if negative." Somehow that sounds similar to what I think "SUBLEQ" stands for.
That would be SUBLT (subtract and branch if less than)
Apart from SUBLEQ, SUBGEQ, SUBLT, SUBGT, what other single instruction can we have?
And SUBCMP with two jumping addresses (one for LT, one for GT).
@@kephalopod3054 subLGBT opcodes are gay😅
on the other hand: there is a glimpse of how emulators are created.
What do you mean?
@@RussellTeapot "Creating equivalent code using different instructions" is what an emulator does, and it's similar to what's done here.
@@kilrahvp oh, I see
Only in so far as it's describing machine code, where emulators take the target machine's machine code and read it as data.
This only works because the instruction itself is doing multiple things at the same time: loading from two different addresses, storing into another, and branching. It's kind of like saying "this knife can open wine bottles and also provide light" and learning that it's a Swiss Army knife with a bottle opener and a mini-flashlight as part of its base.
I kinda agree, but SUBLEQ isn't an unusual instruction. There are CPUs that have it as part of their instruction set.
Sorry for necropost
Your comparison doesn't fit. This instruction always does the same thing, because it's just a single instruction. With your Swiss army knife you choose the right tool to do the job; this instruction always does the same thing. All inputs are treated equally. Not more, not less. You don't press the flashlight button to open a wine bottle. The Swiss army knife is an example of a CPU that has different instructions for different situations. This example is more like a hammer: you can use it to drive a nail, to hammer a screw into a wall, to stop a housebreaker, to test a patient's knee, or to break the glass in an emergency. It's always the same instruction: hit x with strength y.
@@jondo7680 Even a hammer generally has a claw on the back you can use to remove nails as well as pound them in. It's one tool that can be used in more than one way.
nope
@@leexabyz Well, as long as you get "necro-readers" like me, it's fine ;P
... and now build this CPU with only NAND gates ;-)
Silly me - I kind of did that but with xor gates. ua-cam.com/video/ISuV82p2vck/v-deo.html
generally it's NOR that is used when only one logic operation will be implemented - not sure why - the earliest integrated CPU designs frequently used just one kind of transistor for the bulk of the die area. Might have something to do with the cost of early NOR gate lithography vs NAND.
@@styleisaweapon - good catch - I meant nor and not xor. (nor and nand are the two universal gates that can produce any other gate)
It's nice to know that my memory is not playing tricks on me by remembering this useless fact correctly, thanks for the confirmation.🤪
@@styleisaweapon I guess it's because it gives a better logic-0 level. Earlier they didn't use CMOS, but NMOS. In NMOS NOR gates, the ground voltage is just 1 transistor away from the output; that's why.
Now that CMOS is used, NAND is better, because the stacking of PMOS transistors (in a NOR gate) would slow down performance.
I, for one, would love to see more detail about the Assembler, pretty please!
SUBLEQ is Turing-complete. Awesome!
I tend to think that an arbitrary unit vector is Turing complete... How and why? Any and all unit vectors have all branches of mathematics embedded within its properties. From a unit vector you can construct the unit circle! Once you have the Unit Circle you have algebra, geometry, trigonometry, and calculus... Even the simplest of all mathematical equations 1+1 = 2 satisfies the Pythagorean Theorem... Simply because the equation of the Unit Circle fixed at the origin (0,0) is a single form or specific set to the Pythagorean Theorem. From the Unit Vector you can construct any other value, digit or number, every linear equation and all polynomials, every trigonometric function, and all geometrical shapes from angles between two vectors with infinite area to a triangle all the way to a full circle. Once you have your polynomials, you can then also define your derivatives and integrals thus giving you calculus. Just follow the properties of vector arithmetic... Also vector arithmetic can be defined as a subset of Lambda Calculus which in itself has already been declared as Turing Complete thus making the Unit Vector Turing Complete. Now, as for how many instructions, that's dependent on the end user for which operations - linear and or affine transformations they chose to apply to the unit vector.
Well, you could just consider a group of many instructions to be one instruction where the opcode is considered an argument to the one instruction. Of course, it wouldn't be a very simple instruction.
Exactly dude.
This is supreme!!! ♥ Waiting for the next one in the series!!!
11:20 Doing some figuring on paper leads me to believe that maybe address -1 is the output. This is what I wrote when trying to decode this:
a=0
Z=0-p (p=[H])
a=0-(0-p)
a:0 (-1)
If H==100 then
a=0
Z=0-100=-100
a=0-(-100)=100
a:100 (-1)
This shows me that (if I'm interpreting this right) the value at address 100, which is the letter H, is being subtracted from the value at address -1. So that leads me to believe that address -1 is reserved for screen output.
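That deduction matches a convention many SUBLEQ implementations use: address -1 as memory-mapped output. A minimal sketch of one such dialect in Python (the exact I/O rules here are an assumption, not taken from the video):

```python
def subleq_io(mem, pc=0):
    """SUBLEQ with one memory-mapped convention: if B is -1,
    the value at A is printed instead of being subtracted."""
    out = []
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        if b == -1:
            out.append(chr(mem[a]))   # write mem[a] to the "screen"
            pc += 3
        else:
            mem[b] -= mem[a]
            pc = c if mem[b] <= 0 else pc + 3
    return ''.join(out)

# Print the character stored at cell 6 ('H'), then halt.
mem = [6, -1, 3,     # output mem[6]
       0, 0, -1,     # mem[0] -= mem[0] -> 0 <= 0 -> jump to -1 (halt)
       72]           # cell 6 holds ASCII 'H'
print(subleq_io(mem))  # H
```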
Thanks for taking the 30 sec to explain that.
I cannot believe the quality of your channel content. You are AWESOME!
Because you always change the sign (due to using subtract), if you need to add you just subtract from 0. This means addition takes 2 operations and 1 extra memory location (to store the 0) compared to also having an addition operator.
Because you need extra entropy in the form of data storage to pull this off, the subtraction-only computer is actually an addition and subtraction computer, but one of the ops is stored with the data itself (assuming two's complement). You can't pull this off without the second operation being in the data itself, because ultimately negative values don't really exist in binary. Negative numbers are simply positive numbers (or rather, vectors - positive and negative being the direction of the vector) with a yet-to-be-executed operator encoded along with them. -2 is "subtract 2's worth of value". +2 is "add 2's worth of value". 2 is a number. -2 is nearly a number, but convention says if the second value isn't supplied, use 0. Fractions are the same: two numbers yet to be divided. They aren't values in their own right yet. It's a beautiful example of how the line between operations and data is way more blurred than one might think.
The addition of 1 also needs a "1" stored somewhere. This requires another bit of memory, and so in totality the subtraction-addition-and-increment machine is as entropy-dense as a computer that has three opcodes: addXY, addXY (but sign-flip Y), and add1. They are the same amount of "complex"; it is only that the former has more constants, and the latter more operations.
TL;DR: It's only a single-operation computer because of how the term "operation" is defined, and really when one says "special subtraction" one means "the default is subtraction, but constants stored in data can also modify its behaviour". I'm not trying to nit-pick or be a dick or anything, I'm just trying to explain how you can't actually get Turing completeness with 1 operation; you have to be a little crafty with the definition of operation.
Depends on format. You can just have a +/- bit.
@@nathangamble125 Yeah, that's two's complement :) The first bit defines the sign, and in this instance, defines whether you're going to be doing addition or subtraction
I think it's crazy that _just yesterday_ I added a comment to a library I've been writing which reads exactly "//..because machine words have no sign", and today I am reading your YouTube comment - my library comment is about the efficiency of a function on signed vs unsigned values - not all programmers may realize conversion from signed to unsigned is a completely free operation because it's abstract - it may still lead to more or less efficiency because the abstract idea decides which future instructions to use (on AMD64, there are more forms of the signed multiply instruction than there are for unsigned multiply, even a 3-operand form...), but that's a whole 'nother kettle of fish.
@@styleisaweapon Haha, well, those who forget the past are doomed to refactor code in the future or something :)
I was foolish enough to believe it literally. So I paused the video and attempted to implement addition using subtraction (easy); then multiplication and division follow. I could implement inverting using subtraction, but I realized it is impossible to implement AND and OR using subtract alone. Thus, no Turing completeness. The fact that it is a single assembly instruction is true, but never a single data operation. The LEQ part implements all the branching and looping necessary, which is fundamentally based on the three basic boolean gates.
This reminds me of the NAND gate, one of the universal gates.
My thoughts exactly
Both NAND and NOR can be used. I can never remember which is which but one was best for TTL and the other best for NMOS (which has been replaced by CMOS now anyway). You can make a basic 2 input NOR with three resistors and an NPN transistor that more or less works.
Yeah, NAND and NOR are just dual -- isomorphic with respect to swapping high and low.
Indeed. Would there somehow be a link between the two? I forget the name of the mathematical rule for doing stuff with NAND and OR. There was something with a double negation and swapping OR and AND.
@@jeroenstrompf5064 I don't know about the double negation (rule), but the swapping, if I recollect correctly, is given by De Morgan's theorem. There are two "De Morgan's" rules or theorems:
(1) Two separate terms NOR'ed together is the same as the two terms inverted (complemented) and AND'ed, for example: not(A+B) = not A . not B
(2) Two separate terms NAND'ed together is the same as the two terms inverted (complemented) and OR'ed, for example: not(A.B) = not A + not B
or=+
and=.
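Both rules are easy to verify exhaustively in a few lines of Python:

```python
# Check De Morgan's theorems over every combination of inputs.
for A in (False, True):
    for B in (False, True):
        assert (not (A or B)) == ((not A) and (not B))   # rule 1: NOR = inverted inputs AND'ed
        assert (not (A and B)) == ((not A) or (not B))   # rule 2: NAND = inverted inputs OR'ed
print("De Morgan's theorems hold for all inputs")
```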
When I clicked on this and saw you starting to explain the subleq instruction, I immediately imagined something like what you have described.
Maybe because in my time as an IT engineering student, I learned to program for the URM machine. The Unlimited Register Machine has only 4 instructions. All registers start at zero. Instruction Z(r) resets/zeroes the register, instruction S(r) puts in the register the successor of its value, instruction T(r1, r2) transfers/copies from register r1 to register r2, and J(r1, r2, i) jumps to instruction i if registers r1 and r2 are equal. So one can create programs to add and to subtract, then with those we can create multiplication and division, and so on.
Curiosity: all programs for the URM are enumerable, that is, there is a way to transform any program into a single number, or to transform a number (no matter how big it is) back into a program.
As for SUBLEQ, you didn't tell us how to set values in memory. Will 7, 5, 1, 25, or 173 magically appear in memory if we have to use them?
However this OISC is awesome!
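The URM described above is small enough to sketch as an interpreter; the tuple encoding and 0-based jump targets here are my own choices, not part of the original definition:

```python
def urm(program, registers):
    """Interpreter for the Unlimited Register Machine described above.
    Instructions: ('Z', r), ('S', r), ('T', r1, r2), ('J', r1, r2, i).
    Jump targets are 0-based; falling off the end halts."""
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == 'Z':                      # zero the register
            registers[op[1]] = 0
        elif op[0] == 'S':                    # successor: increment
            registers[op[1]] += 1
        elif op[0] == 'T':                    # transfer/copy r1 -> r2
            registers[op[2]] = registers[op[1]]
        elif op[0] == 'J':                    # jump if equal
            if registers[op[1]] == registers[op[2]]:
                pc = op[3]
                continue
        pc += 1
    return registers

# r0 = r0 + r1: repeatedly increment r0 and r2 until r2 == r1.
add = [('J', 1, 2, 4),   # if r1 == r2, we're done (jump past the end)
       ('S', 0),
       ('S', 2),
       ('J', 3, 3, 0)]   # r3 == r3 always: an unconditional jump back to 0
print(urm(add, [3, 4, 0, 0]))  # [7, 4, 4, 0]
```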
Now imagine being the guy tasked to write a C compiler to this instruction set
I think it is easy if you just map each assembly instruction to the corresponding long list of SUBs.
Well, someone wrote a C compiler that only uses MOV instructions for x86 (it turns out that MOV is Turing complete by itself). It's called the movfuscator.
Check out movfuscator. This is rule 34 for compilers: if you can think it, it probably exists already.
The link shared in the video description contains a C compiler for SUBLEQ (though only for a subset of the C language).
Writing a compiler isn't hard - I've done it several times in my life - it really, really isn't. What's hard is doing it well.
Hey Gary! Another neat One Instruction Set Computer design is a Transport-Triggered Architecture, where you only have MOV, but by MOVing data into special registers you can perform arbitrarily defined operations on that data; you might have two registers, ACC and SUB, and to compute a minus b you MOV a ACC, MOV b SUB, where the SUB register takes in data and subtracts it from ACC at every clock cycle.
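A toy version of that idea in Python (the ACC/SUB register names follow the comment; a real transport-triggered architecture would of course be hardware, so this is only a behavioural sketch):

```python
class TTA:
    """Toy transport-triggered machine: the only operation is MOV,
    but moving data into the SUB 'port' triggers ACC = ACC - value."""
    def __init__(self):
        self.regs = {'ACC': 0, 'SUB': 0}

    def mov(self, value, dest):
        if dest == 'SUB':
            self.regs['ACC'] -= value   # writing this port triggers the ALU
        else:
            self.regs[dest] = value

m = TTA()
m.mov(10, 'ACC')   # a = 10
m.mov(3, 'SUB')    # a - b, triggered by the move itself
print(m.regs['ACC'])  # 7
```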
You should be able to design a CPU with only one instruction and two registers: one register contains the set of bits to be changed, the other register contains a set of bits that defines which bits are to be flipped. It would be a nightmare to try to design the rest of the computer around the CPU, and I wouldn't want to program for it, but you could do it. It would be up to the programmer to decide which bits needed to flip in any particular situation, making it not fun to program, but you could do it, and it would be a very fast CPU. Because it's just the one instruction, it would be relatively simple to design the system for as many bits as you wanted. You could design the machine to use 256-bit registers, 1024-bit registers, whatever. Actually getting it to accomplish anything would be a nightmare, but if someone were willing to tackle making a usable programming language for it, it would be very, very fast.
@Richard Vaughn I could see why you would want to add that instruction, but it shouldn't be necessary, as long as it's pre-established where it's getting and putting data and the programmer designs the program accordingly. As long as you know beforehand where the CPU is going to get and put data in a given cycle, you can design the program accordingly and pre-initialize data in the appropriate memory locations. It would be a nightmare to sort out, but it SHOULD be possible.
How to create a one instruction computer: Take ANY instruction set, redefine every instruction as EXEC with options. Job done!
No, because this has no options, it does the same thing every time. Adding qualifiers or options is just disguising multiple instructions with the same op code. That is why MOV on x86 doesn't qualify for this. Notice there is no instruction code in these programs.
@@GaryExplains Well of course. Couldn't you tell I wasn't being serious?
No, sorry I couldn't. My bad. But in my defense: A) There were no emojis to suggest you were being funny. B) I get plenty of stupid comments all day by people who are being serious but they are completely wrong about what they write.
@@GaryExplains It's my dry sense of humour. Cheers!!
As it turns out, the move instruction is also Turing complete. I remember someone made a compiler that uses only move instructions a few years ago.
requires being able to mov into the instruction pointer, an operation that, while on the surface it looks like any other mov instruction, is a separate instruction, with a separate opcode, and only a single operand. I am referring to that compiler you mention. You will find that not all assemblers use syntax as ambiguous as the standard "Intel syntax", indicating that the syntax does not define the instruction. Even among the 64-bit general-purpose registers, there are 4 register-to-register mov opcodes in play (legacy to legacy, legacy to r8+, r8+ to legacy, and r8+ to r8+).
If you are referring only to compiler, ignore the following.
iirc, some architectures have the instruction pointer in memory. So in that sense, it is still valid.
The intention here is more towards making a new CPU and instruction set that is Turing complete, and not really to find an existing instruction that makes all existing CPU's Turing complete on its own.
Stellar video! Very nice. Loved every bit of it.
Cool! Reminds me of a textbook I read at university called Nand to Tetris, which went through the steps of building an entire VM with a playable version of Tetris, starting from an initial NAND gate.
rough size of that build?
I bet it would be huge.
All a computer does is essentially add numbers together anyways. It does this billions of times a second. From addition, all other mathematical operations can be formed such as division, multiplication and subtraction. From there, further layers of abstraction can be built upon to form a fully functional computer. Anyone interested in learning more should study discrete mathematics and take an introduction to computing systems.
yes, its basically a calculator
That's an impressive idea, given that a textbook turing machine, while only slightly more complex, is NOT a single instruction machine... XD
That depends on how you define "instruction" in the context of the Turing architecture. A Turing machine certainly can be modelled as a single-instruction machine whose single instruction takes the state transition table as input.
Subtract, check value, and conditional jump in a single instruction!
My brain has only one instruction, and it's NOP.
Haha, here take a like.
Hahahaha 😂🤣
Excellent video! Wow, I would hate to program that processor, though. Writing a language like Sweet16 would be laborious. I designed a CPU instruction set with only 16 instructions, so that we could keep the opcode to 4 bits. My instruction set is: ADD, AND, NOT, OR, SHR (shift right), SUB, XOR, LDA #, PSH, POP, RDA (read memory into A), STA (store), JC (jump if carry set), JN (jump if negative), JZ (jump if zero)... That's it. I used to program in Z-80 assembly, so this is what I thought I could live with if I had to write a language in it. I'm amazed you have done that! I have subscribed.
SUBLEQ is actually a complex instruction, as it contains a branching operation. So your computer is actually a CISC machine. Period.
Proving it: if I follow your logic, I can then make a "single-instruction" CPU with an instruction like SUBLEQSINJMPRET... Technically it would be "single instruction", with a lot of implicit logic behind the scenes.
That depends on what we define as an instruction. It always does the subtraction, after all
@@varkokonyi my instruction also does the subtraction, always
This is exactly what I was thinking. SUBLEQ is neat, but it's a little bit disingenuous to consider it a true single instruction machine code.
@@fuckoffpleaseify Yeah, exactly this. It's not truly a single instruction. Everything after the subtract is just more instructions.
The machine code at 10:40 will keep looping until address 4 underflows into positive numbers, at which point it will do weird things.
Yes, that is correct. The code is given just as a example of how the machine works. It isn't, as you say, actually useful.
would be curious to program an FPGA to do just that !
@Gary Explains at 6:30 why not just set inc1=-1 and just a=a-inc1? You don't need the zero register so you halve the number of needed instructions.
Yes, that would be more efficient, but the point of the demonstration was to show that if you want to add a number, in this case 1, then you start with the number you want to add and then turn it negative. It would be the same if you wanted to add 7. You want +7, but to do that you need to make it -7. If you were adding a + b then you would need the negative value of b, etc.
Nice video, it'd be good to see that CPU implemented in logic.
And not just VHDL or simulated logic... Using some old fashioned NAND gates.
So what's stopping you?
I want to see how all that works. Please make more videos about it!
Now implement it with NAND-gates!
Funny that you mention it. I knew a guy in who did just that in 1981.
Two levels of NAND-gates can express any sum of products equation. For you minimalists, any multibit computer can be emulated with a 1 bit computer.
It's kind of funny that the ultimate RISC is also the ultimate CISC -- the main distinguishing feature between RISC/CISC is not really fewer instructions, but restriction of source/destination operand types to exclude memory access in RISC, so they used to be called "load-store" architectures because you had to use instructions like LW and SW to transfer data between memory and registers. Although MOV on x86 is even more bloated than SUBLEQ, allowing for all sorts of crazy operations (like copying a word between two memory locations and incrementing index registers at the same time in one instruction).
What are the advantages of using this type of CPU?
None
@@GaryExplains So why would anyone spend their time building a language for it? I am the type of guy that says if it is not broken, don't change it.
@@alexandruilea915 Research that can lead to innovation. Why would anyone have spent their time researching quantum physics when classical physics worked so well for so long? Consider that without quantum physics the computing revolution wouldn't have happened.
Research can sometimes be without a clear practical real world advantage as its end, and still be extremely valuable later on.
@@alexandruilea915 Words fail me.
Extremely simple, and so ultra low power and cheap. The disadvantage is that it will be slow (as slow as it gets).
That's a clever, esoteric piece of software. I like it!
If we were able to transform such code efficiently to the assembler of every CPU, we would have a platform-independent assembler.
I designed a MOV computer in 1990. It has registers only, many of which are specialized and act as ports to functional units. The functional units are the ALU, stack, jumping, memory, etc. Constants could be taken from the next instruction.
It can be extremely small, or do multiple instructions in parallel. I mainly focused on the latter and it seemed to work well for digital signal processors. Instruction size of 16 bits, combined in words of 32 or 64 bits. DSPs often use wide instructions. Moves can also happen while the functional unit is still working, like memory access. This allows very simple parallelism if organized correctly.
It never made it to hardware though. Later I added a few conditional opcodes for moves to the same place. This can be used when you want an extremely small instruction size of 8 bits with only 16 registers/ports.
Having multiple instructions is bloat.
You can still have different mnemonics on different use of the SUBLEQ instruction. That is often done in ordinary architectures, like MIPS and ARM.
There are no mnemonics, since there is only one instruction with no variations.
I'm gonna write Xilinx RTL for this, I think.
Wonderful! I've heard about the MOVE-processor in the 90's. Nice to see the concept now in practice. And although I haven't done machine code since halfway the 90's, your video feels like slipping on an old glove :)
It is called the movfuscator, but it uses different types of move (to address, from value, etc.), so technically it is actually 3 or 4 instructions (I haven't checked exactly how many).
Position Z is acting as a register: it holds a value, helps in processing, and is returned to 0.
It's a register, for all intents and purposes.
It's a memory address; that's what all memory does.
Amazing, I wouldn't have even thought of this! If I had to guess, I would've thought maybe 10 instructions was the minimum possible, and I wouldn't have even known which of those instructions were necessary. Never would've guessed just 1 instruction, and an implied instruction at that, so no real opcode for it. I wonder if people are going to use this one-instruction instruction set as a proof of concept for future processors?
Terry Davis would have liked this...
It’s often forgotten that the x86 is a RISC design with one instruction only. That instruction is PLEASE, which is often omitted because programmers are rather rude. Joke aside, this is an awesome video.
mov in x86 is Turing complete. Just saying. There is a movfuscator, a single instruction compiler for x86.
Indeed it is. It uses lookup tables for arithmetic, which is an interesting solution!
You basically take control of load, store, sub and jlz instructions in one instruction. Nice.
I implemented a SUBLEQ machine as my 20% project at Google
In hardware ? or emulated ?
@@BokoMoko65 Emulated, in Java
@@StefanReich I did an emulated version today in JavaScript, and it was pretty easy
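A full SUBLEQ emulator really is only a few lines; here is a minimal Python sketch. It uses the convention from the video (first address is the destination, three cells per instruction) and halts on a negative branch address -- operand order and halt conventions vary between SUBLEQ presentations.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Minimal SUBLEQ interpreter. Each instruction is three cells
    (A, B, branch): mem[A] -= mem[B]; if the result is <= 0, jump to
    the branch address. A negative branch address halts the machine."""
    for _ in range(max_steps):
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[a] -= mem[b]
        pc = c if mem[a] <= 0 else pc + 3
        if pc < 0:
            return mem
    raise RuntimeError("step limit reached")

# Tiny example program: compute mem[6] = mem[6] - mem[7] (4 - 3), then halt.
program = [6, 7, 3,    # mem[6] -= mem[7]; branch to 3 if result <= 0
           8, 8, -1,   # mem[8] -= mem[8] gives 0, so always branch to -1: halt
           4, 3, 0]    # data: mem[6]=4, mem[7]=3, mem[8]=scratch
result = run_subleq(program)
```

After the run, `result[6]` holds 1.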
Would love the follow up video mentioned!
2:19 And I was thinking who is Link in description.
lol
If it hasn't been added, it would just be OCD :)
Glad that we figured out that to multiply or divide by 2 you just shift the bits once.
*GARY!!!*
*Good Morning Professor!*
*Good Morning Fellow Classmates!*
Morning Mark
MARK!!!
@@GaryExplains GARY!!!
MARK!!
MARK!!!
That reminds me of the "Computer '74", a project of the Dutch magazine Elektuur. The name not only stands for the year, but also for the piles of 74xx chips used to build that monster. It was programmed in octal; it had no opcodes, but every command consisted of two addresses: from and to. It also had a hardware multiplier and a hardware divider!! It had not only memory addresses, but also special addresses for the adder, divider, and of course an address for the program counter. Fun fact: it already had a diode matrix for the microcode.
Sorry, forgot the link: archive.org/details/Computer74
Simple; Power On, Power Off.
In this way the execution of a program can be visualized in the plane address × address as a temporal progression from one point to another. An instruction is a point in the space address × address × address. The program itself is a discrete sequence of points in this space.
Great video. There is also a really nice talk here on UA-cam by Christopher Domas on how you can actually get away with only using the MOV instruction on x86. Branching is the biggest issue there, so you basically have to loop through all your code for every branch. But it's really neat, and a cool obfuscation method :) Look up "the movfuscator" on youtube.
Good lord, don't let me fall into the hands of these kinds of motherfuckers. Imagine if they polymorphize the shit out of it.
All Boolean operations can be made with just a series of NAND gates. For example, tie the inputs of a NAND gate together and you get a NOT gate. Put that NOT gate on the output of another NAND gate and you now have an AND gate. Arrange NAND and AND gates in parallel and you can create OR/NOR, and with a bit more you can create XOR. You now have the six basic Boolean operators from which you can design any digital circuit, including a CPU. A NAND gate is the simplest gate to make, as it only requires a handful of transistors.
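The constructions in that comment are easy to verify: every gate in the sketch below is built purely from `nand` (the four-NAND XOR is the classic construction).

```python
def nand(a, b):
    """The only primitive: NAND over bits 0/1."""
    return 1 - (a & b)

def not_(a):        # tie both NAND inputs together
    return nand(a, a)

def and_(a, b):     # NAND followed by NOT
    return not_(nand(a, b))

def or_(a, b):      # De Morgan: a OR b = NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))

def nor_(a, b):     # OR followed by NOT
    return not_(or_(a, b))

def xor_(a, b):     # classic four-NAND XOR
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))
```

Running the truth tables confirms each derived gate behaves as expected.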
Why not copy a to b like this:
b = z - a
b = z - b
Assuming z is always 0, then this should work just fine, but takes half as many operations.
Edit: Looked it up, and it doesn't work because the instructions can't be represented in SUBLEQ. All instructions must be precisely of the form: x = x - y
@Richard Vaughn z is a constant 0, similar to inc1. It only has to be set once, it's not part of the copy operation.
@Richard Vaughn I'm referring to z in the way the video uses inc1, that's it. It assumes inc1 is 1, I assume z is 0. If you have an issue with that then would you say that the add example in the video is incorrect?
@Richard Vaughn Yes you can assume it's zero, it's done in the following example, it's done on the wiki page.
A concept we discussed at school way back was a single-instruction CPU whose only instruction was MOVE: take data from one location in memory and write it to another location. Now, it was cheating somewhat, since it relied on the existence of coprocessors which would watch certain memory locations, do arithmetic operations on them, and put the result into another location. But while the SUBLEQ CPU is essentially an academic toy, as pretty much any calculation is extremely inefficient, the MOVE CPU is much closer to how a real microcontroller CPU works. And it was still a single-instruction CPU in the sense that all instructions were move instructions.
It must have been bad at listening to instructions. An extra stupid CPU.
Hi Gary!
Interesting, but I am not sure the savings from eliminating instruction decoding really make up for the massive increase in the number of instructions needed.
I read about a single-instruction-set CPU developed at Leiden University that I thought was very interesting, because it only has a load instruction, with a very large number of registers that act as inputs or outputs for arithmetic or logical units.
That allows the compiler to organize the code into multiple instruction pipelines, so the parallel performance can be scaled by adding additional arithmetic/logical registers to the CPU.
I thought that was a very interesting feature of that design.
As to SUBLEQ, it really only has utility in illustrating feasibility, similar to the original Turing machine. Sure, you can write universal code with it, but it would be a colossal pain and is probably not very fast.
What you are describing is NOT a RISC machine. It may well be a one-instruction machine, but it is a complex instruction.
Bogy Wan Kenobi, RISC is reduced instruction set computer, so I'd say Gary's use of the term RISC is correct.
@@insoft_uk The word "reduced" does not refer to the number of instructions. It refers to the complexity of the instructions. The instruction he describes, once it gets the full instruction from the instruction memory, is then required to go out to memory, again, twice, to get the two operands and then decide where to go next. That by definition is a complex instruction. The fact that there is only one of them is irrelevant.
I always read the "Reduced instruction set" in RISC as a reduced set of instructions, not a set of reduced instructions. Do you have a quote or a link that would substantiate your interpretation?
Bogy Wan Kenobi Surely "Reduced," refers to the instruction set, not the instruction. So, by definition, a simple instruction set is one with few instructions.
@@110110010 Sure. Just do a Google search for ARM, or for that matter a YouTube search for the same thing. It is not an interpretation. It is the definition. The fact that you don't know this is testimony to either your youth or lack of professionalism. In fact, google "arm vs. intel". As for me... I am 35 years a computer hardware engineer.
That "special subtract", given the implicit move and branch, is a complex instruction
Indeed it is, but nonetheless it is one instruction.
At 5:56 you assume there is a 1 in memory, which cannot result from the subtract instruction itself. You need at least a second instruction to set a value in memory.
Those values are loaded into the memory, just like the program itself is loaded into the memory. You don't need a second instruction.
There are other one-instruction sets as well. I've built a couple in FPGAs. My latest is a transport triggered architecture, where the one instruction is a move from one register to another. All functional units (think your ALU, load/store unit, etc.) have known addresses on the bus, and the instruction might also specify a sub-address which allows the functional unit to perform multiple operations. One register connected to each functional unit will be the "trigger" register, such that whenever an instruction is scheduled, decoded, and executed, if it moves data into a functional unit's trigger register, the specified operation is performed. I.e., if I say to move the value `5` into the ALU's trigger register and specify the ALU's operation as addition, then it'll use the other input as the current value on the data bus, and the execution of the CPU happens as a side effect of the moving of data between registers. These are also known as exposed datapath computers. Really neat, and they solve the VLIW register file pressure problem.
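A toy version of that idea can be sketched in a few lines. The register map and single ALU below are hypothetical; real exposed-datapath machines have many buses and functional units, but the principle -- computation as a side effect of a move into a trigger register -- is the same.

```python
class ToyTTA:
    """Toy transport-triggered machine: the only operation is 'move'.
    Writing into the ALU's trigger register fires the ALU as a side effect."""
    ALU_IN, ALU_TRIGGER, ALU_OUT, R0, R1 = range(5)

    def __init__(self):
        self.regs = [0] * 5
        self.alu_op = "add"  # the "sub-address" selecting the ALU operation

    def move(self, src, dst):
        value = self.regs[src]
        self.regs[dst] = value
        if dst == self.ALU_TRIGGER:  # side effect: execute the selected op
            a = self.regs[self.ALU_IN]
            self.regs[self.ALU_OUT] = a + value if self.alu_op == "add" else a - value

# r0 + r1 with two moves; the second move triggers the addition.
cpu = ToyTTA()
cpu.regs[ToyTTA.R0], cpu.regs[ToyTTA.R1] = 2, 3
cpu.move(ToyTTA.R0, ToyTTA.ALU_IN)       # operand A
cpu.move(ToyTTA.R1, ToyTTA.ALU_TRIGGER)  # operand B; fires the add
```

After the two moves, the sum sits in the ALU's output register, ready to be moved wherever it is needed next.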
Thank you, you are always thought-provoking!
Yes, but you need software cycles to simulate other instructions; in other words, you need resources to generate "more complex instructions". So this is why a big instruction set, when used correctly, can run software faster and reduce its memory usage at the same clock speed.
This is one of the reasons (along with the x86 licensing story) you are using an ARM SoC in your phone.
But this thing sometimes goes out of control.
Absolutely, nobody said this was efficient, this is an exercise in computer theory.
Haaa!!! That's *two* instructions in one. Not only a SUB, but also a BRANCH instruction fused into one. It is kind of a misleading video... *yet* I found the information in it AMAZING !!! (I just edited this comment to lowercase it :)
How do you define what "one instruction" means?
@@GaryExplains Ok, you could treat this question with two answers: Using a broad, abstract definition, it is "whatever you program the logic gates to do". In this case you could even build a "whole" program into One instruction, and give it a suitable name, like "CalculateTodaysWorldsMeanTemperature" passing as the only operand, a pointer to a table of relevant values to the calculation. The second answer could be, "one processing unit chosen from the minimal instruction set possible: , , , " these could be regarded as "one instruction" since they perform just one single, indivisible operation (explicitly given by its name). Regarding the second answer, the SUB instruction does , , and
The point of it is to build a general purpose processor using only one instruction. You can't add two numbers together with yours.
@@Waouben I don't know if I understood your question correctly, so I'll answer this way:
I certainly can add two numbers using my four types of instructions ,,, .
In this case, only three types of instructions are needed ,,.
First number is in addr, second number in addr+1, result in addr+2, in Z80 style:
LD A,(addr)    ; load
LD B,A         ; load (reg.)
LD A,(addr+1)  ; load
ADD A,B        ; logic op
LD (addr+2),A  ; store
@@webgpu I'm talking about the CalculateTodaysWorldsMeanTemperature instruction. My point being that while you can design an instruction that can do anything you want, be it an entire program, most of them don't allow you to build a general purpose processor on their own. Which means that however complex SUBLEQ is, it's still one instruction.
5:30 I'm interested in seeing how you can get a "1" from scratch. All memory locations contain random bits at the start. Generate the standard Fibonacci sequence in consecutive memory locations.
How few primitive operations can you get away with, if SUBTRACT is not sufficient?
The constants are loaded into memory as part of the program. Look at the example with the "7 7 7" at address 3 to 5.
@@GaryExplains I know. But I'm interested in figuring out the minimum instructions needed when memory is randomized at startup. Having a pre-loaded constant alongside the program is no different than having another kind of instruction -- you are not, in fact, providing only a list of one kind of thing. Relying on numbers being part of the encoding of other instructions is, I think, cheating; or at least it is another "thing" that you are counting on. Try it with a Harvard architecture, or an encrypted instruction, so you can't depend on some instruction containing a specific number within its encoding.
The idea of "minimal" is a bit more slippery and subtle than "one instruction" leads us to believe. That one instruction does different things, conditionally. Imagine a VLIW control word -- that's one instruction, a uniform format for everything the CPU does. But it is far from minimal.
Your "one instruction" combines the actions of three simpler instructions, so it is a chameleon if you ignore the effects other than the one you want. Why not an instruction that does 20 different things, all of which can be easily ignored? That is, the fact that there is only one instruction is not a true measure of austerity.
It also relies on self-modifying code, and on the ability to read constants as well as abstract instructions.
@@JohnDlugosz If you allow 1 to be available at a hardcoded memory address (which would be a simple bit of electronics and nothing to do with RAM etc) then the answer is still 1 subleq instruction because you can create every other number by incrementing from 0.
This is both a RISC and a CISC architecture, since the latter is about the complexity of the individual operations. And I consider this "subtract" to be quite complex (involving two addresses and a conditional jump).
Additionally, it is counter to the idea behind RISC -- to only have simple instructions -- but since the formal definition of RISC deals primarily with the cardinality of the set, it is indeed RISC.
My first thought was to use add, but subtract is far better I guess :D
Well that's hardcore RISC. I didn't know it was possible to have one instruction. I heard in my youth that the simplest machine required a NAND, a NOT, and an XOR, or something like that. I don't remember now. You should have demonstrated how you can do bit manipulations such as these. Absolutely necessary!
They may have been explaining that all logic can be built from a small number of logic gates. Apparently, you can build all digital logic from NAND, or NOR gates. In other words, choose one of the two, and then use those gates to build your logic. Sounds inefficient though.
I really like this! This can be very efficient if pipelined and jumps are better managed.
Binary subtraction is actually a form of *addition.* It works like this:
1. Each bit of the _subtrahend_ (i.e., the number that is to be subtracted from the other number, called the _minuend_) is _inverted_ (i.e., if it's a one, it's changed to a zero, and vice versa), giving the _one's complement_ of the subtrahend.
2. The one's complement is then incremented by one to give the _two's complement_ of the subtrahend.
3. The two's complement is then *added* to the minuend to give the result, which may include a carry (i.e., an extra one to the left of the result, which is discarded).
At the hardware-level, computers always add; they *never* subtract.
There is also a version based on addition, it is called Addleq, but according to Olaf Mazonka it is much harder to program in Addleq than in Subleq.
@@GaryExplains We are talking about two different things. You are talking about microcode, and I am talking about what happens at the hardware level (i.e., logic gates). Computers _never_ subtract at this level; there's no such thing as a binary subtracter circuit. An adder circuit is used for subtraction. And to perform the operation of negation of any positive binary number actually means to take the two's complement of that number.
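The three steps described above can be checked directly. A sketch for 8-bit values (the mask stands in for the fixed register width; the discarded carry falls out of the final masking step):

```python
def sub_via_add(minuend, subtrahend, bits=8):
    """Subtract using only addition: invert the subtrahend, add one
    (two's complement), add to the minuend, and discard any carry
    out of the top bit."""
    mask = (1 << bits) - 1
    ones_complement = subtrahend ^ mask             # step 1: invert every bit
    twos_complement = (ones_complement + 1) & mask  # step 2: add one
    return (minuend + twos_complement) & mask       # step 3: add, drop carry
```

For example, `sub_via_add(9, 4)` gives 5, and `sub_via_add(4, 9)` gives 251, which is -5 in 8-bit two's complement.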
I'm guessing there's no silicon out there that is actually an OISC, right?
I would guess... NAND, where you specify operand locations, destination, and bit length.
It's been a while I've watched Gary's videos. Good to be back 😌😌
This was very cool! Thanks for sharing.
I thought this might be bbj (byte byte jump, it's on esoteric langs) but I've realised I might be misremembering it.
Thank you for a tech video without binary; the abstraction is nice.
You can add one by storing -1 instead of 1 in your "inc1" memory address, taking away two steps
a = a - inc1
Which is equivalent to
a = a - (-1) = a + 1
Let me know if this is slower than the example in the video for some reason
6:01 -- This is what is being done.
(1) *inc1 = 1*
(2) *z=z-inc1 ; z = 0 - 1 = -1*
(3) *a=a-z ; a=7- -1 = 8*
In (1) if inc1 is set to -1 it would allow skipping (2) where z is being set to -1.
However, the way this system works (with so many steps needed because of the lack of proper instructions):
If you did this 100 times, you would do 200 sub-steps instead of 300 -- a saving of 100 sub-steps.
E.g., 3N - N = 2N; you would save at most N sub-steps.
Performance wise this would still be O(N).
Even though doing inc=1 or inc=-1 will save you N steps.
It is RISC-style, clearly all the optimization (if any) would be done by the compiler.
However, the compiler doesn't really have much to work with in terms of optimization (normally it would have a register set along with load/store instructions, for starters).
Because of that....the boost in performance from the above will be largely insignificant.
Yes, that would be more efficient, but the point of the demonstration was to show that if you want to add a number, in this case 1, then you start with the number you want to add and then turn it negative. It would be the same if you wanted to add 7. You want +7, but to do that you need to make it -7. If you were adding a + b then you would need the negative value of b, etc.
@@GaryExplains I see. I guess it was the fact that you referred to it as a constant that threw me off. Thanks for the answer though and awesome video!
Damn! Thanks for this video, amazing video, like always, thanks for the time and effort!
Gary... You are awesome. Good explanation.
This SUBLEQ is more of a one-instruction CISC computer. RISC architecture is more about getting things done in one, or the fewest possible, clock cycles with optimized hardware.
Amazing video!
Okay, now set the initial conditions for this code to generate the entire universe.