Decoding ALU Micro-Ops - Superscalar 8-Bit CPU #33

Fabian Schuiki


Published Jun 27, 2024
My homebrew CPU can already do a lot of interesting instructions with its ALU: ADD, ADDC, SUB, SUBC, NEG, SHLL, SHLC, SHRL, and SHRC. But not all potential ALU operations are accessible as instructions. Due to the limited number of instruction bits we have available, operations like NOT or SHRA are not accessible from a program. In this video, we are going to fix this issue by introducing a decoder for ALU micro-operations. By using a Programmable Logic Device, we can take a few instruction bits, interpret them as an ALU opcode, and decode them into all the control signals we could ever need in the ALU. This finally allows the CPU to execute NOT and SHRA instructions, and paves the way for AND, OR, and XOR in the future.
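The decoding idea can be modeled in a few lines of Python. This is a minimal sketch built on assumptions. The control signals (`invert_a`, `invert_b`, `zero_b`, `carry_in`, `shift`, `shift_in`) are invented for illustration, and carry = 1 is assumed to mean "no borrow" for subtraction. The real ALU's signals, datapath, and opcode encoding are not given in the description, so the table only shows how NOT and SHRA fall out of control-signal combinations that the same hardware already supports.

```python
from dataclasses import dataclass

MASK = 0xFF  # 8-bit datapath


@dataclass(frozen=True)
class AluControl:
    """Hypothetical ALU control signals (illustrative, not the real CPU's)."""
    invert_a: bool = False  # complement operand A before the adder
    invert_b: bool = False  # complement operand B before the adder
    zero_b: bool = False    # replace operand B with zero
    carry_in: str = "0"     # adder carry-in: "0", "1", or "c" (carry flag)
    shift: str = "none"     # "none", "left", or "right"
    shift_in: str = "0"     # bit shifted in: "0", "c" (carry flag), or "msb" (sign bit)


# Hypothetical decode table standing in for the PLD equations.
# A real decoder is indexed by opcode bits, not by mnemonic strings.
DECODE = {
    "ADD":  AluControl(),
    "ADDC": AluControl(carry_in="c"),
    "SUB":  AluControl(invert_b=True, carry_in="1"),
    "SUBC": AluControl(invert_b=True, carry_in="c"),
    "NEG":  AluControl(invert_a=True, zero_b=True, carry_in="1"),
    "NOT":  AluControl(invert_a=True, zero_b=True),
    "SHLL": AluControl(shift="left"),
    "SHLC": AluControl(shift="left", shift_in="c"),
    "SHRL": AluControl(shift="right"),
    "SHRC": AluControl(shift="right", shift_in="c"),
    "SHRA": AluControl(shift="right", shift_in="msb"),
}


def alu(op, a, b=0, carry=0):
    """Return (result, carry_out) after decoding `op` into control signals."""
    ctl = DECODE[op]
    a &= MASK
    b &= MASK
    if ctl.shift == "none":
        x = ~a & MASK if ctl.invert_a else a
        y = 0 if ctl.zero_b else b
        y = ~y & MASK if ctl.invert_b else y
        cin = {"0": 0, "1": 1, "c": carry}[ctl.carry_in]
        total = x + y + cin
        return total & MASK, total >> 8
    fill = {"0": 0, "c": carry, "msb": a >> 7}[ctl.shift_in]
    if ctl.shift == "left":
        return ((a << 1) | fill) & MASK, a >> 7  # old MSB goes to carry
    return (a >> 1) | (fill << 7), a & 1          # old LSB goes to carry
```

For example, `alu("NOT", 0x0F)` gives `(0xF0, 0)` and `alu("SHRA", 0x80)` gives `(0xC0, 0)`. NOT is just NEG without the carry-in, and SHRA is SHRL with the sign bit fed back in. That is why a decoder, rather than new datapath hardware, is enough to expose them.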
This video series explores the concepts and techniques that make modern computer processors so incredibly fast and powerful. I build my very own 8-bit processor from individual logic gates and gradually evolve it to become a superscalar out-of-order machine. Along the way, we take a deep dive into contemporary computer architecture in a hands-on fashion and rediscover some of the foundations of modern computing.
Previous Video: • Adding Bit Shift Instr...
Series Playlist: • Build a Superscalar CPU
ALU Playlist: • Homebrew Arithmetic Lo...
GitHub Repository: github.com/fabianschuiki/supe...
- Programmable Logic Devices (PLDs): en.wikipedia.org/wiki/Program...
- Programmable Array Logic (PALs): en.wikipedia.org/wiki/Program...
- ATF16V8B: www.microchip.com/en-us/produ...
- Galette compiler: github.com/simon-frankau/galette
- Programmer (TL866II+): www.autoelectric.cn/en/tl866_m...
Disjunctive Normal Form:
- • How Computers Add Numb...
- en.wikipedia.org/wiki/Disjunc...
Two's Complement:
- • How Computers Fake Neg...
- en.wikipedia.org/wiki/Two%27s...
00:00 - Intro
01:42 - Available Operations
04:53 - 16V8 PLD Chip
08:57 - Breadboard
12:26 - PLD Programming
19:15 - ALU Micro-Operations
26:36 - Decoder Testing
29:20 - Integration into CPU
30:23 - Updating the ISA
31:47 - Updating the Assembler
33:47 - New Program
38:10 - Testing
43:00 - Recap
44:36 - Outro
#alu #homebrew #8bit #breadboard #superscalar #computer
Science & Technology

COMMENTS • 33

@dmoisset 21 hours ago ⁺¹
A cool detail about ATF16V8s is that they have internal pull-ups on their pins, so it's actually correct to leave inputs unconnected. The pins will float high and won't pick up noise that randomly toggles your FETs and eats power. This is described in section 7 of the datasheet.
@fabianschuiki 18 hours ago
Great point! Feels a lot like the 74LS series of logic with all the high-side pullups 🙂. I'm still not entirely sure whether I want to have the operand data buses pulled low by default, which would mean that I'd have to add a buffer in front of the ALU inputs. Otherwise I could also let the ALU pull RD1 and RD2 high through the ATF16V8's pullups.
@Artentus 3 months ago ⁺⁶
Negating with carry is actually a useful, although probably rare, operation. It allows chaining negations just like additions and subtractions.
@fabianschuiki 3 months ago ⁺⁵
Oh, that's an excellent point. That would be a useful thing to have for wide negations. Luckily, with the ALU op decoder, it's pretty easy to go and add such an op 🙂. I'll probably have to upgrade to 5-bit ALU opcodes soon enough 😁
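Here is a minimal sketch of the chained negation described in this thread. It assumes a hypothetical NEGC that computes the inverted operand plus the incoming carry, which is the same as subtracting from zero when carry = 1 means "no borrow". Plain NEG is then that operation with carry-in forced to 1, and a multi-byte negation is NEG on the lowest byte followed by NEGC on each higher byte.

```python
def neg_byte(x, carry_in):
    """Hypothetical NEG/NEGC step: invert an 8-bit value and add the carry-in."""
    total = (~x & 0xFF) + carry_in
    return total & 0xFF, total >> 8  # (result byte, carry out)


def negate_wide(value, nbytes):
    """Two's-complement negation of an nbytes-wide value, one byte at a time."""
    carry = 1  # lowest byte: plain NEG, i.e. ~x + 1
    result = 0
    for i in range(nbytes):
        byte = (value >> (8 * i)) & 0xFF
        out, carry = neg_byte(byte, carry)  # higher bytes: NEGC with the chained carry
        result |= out << (8 * i)
    return result


print(hex(negate_wide(0x1234, 2)))  # 0xedcc
```

The carry only propagates out of a byte when that byte was zero, which is exactly the case where the +1 from two's complement has to ripple into the next byte.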
@TheMason76 3 months ago ⁺¹
Great video. Can't wait to see the next one(s)
@fabianschuiki 3 months ago
Thanks! 🙂
@OscarSommerbo 3 months ago ⁺³
The strange/useless operations eliminated here are what were known as "illegal op-codes" on many 80s 8-bit micros. Most were useless, but some were useful and could shave off a clock cycle when used creatively. Fabian's approach is of course valid and "more correct", but the inclusion of illegal op-codes in early home computers is an interesting anecdote.
@fabianschuiki 3 months ago ⁺³
Yes we definitely miss out on those quirky but useful undocumented instructions like this 🙁. You might still get them if you don't use a ROM for decoding, but some logic instead, because these ops often hide in optimizations to the decoder logic. The problem with my approach is that I'm using an ALU decoder, plus an instruction decoder later. So the illegal ops would have to survive two steps of decoding, which gets very unlikely. I was planning to have the instruction decoder throw an Illegal Instruction exception for every unknown bit pattern, but you're making me reconsider that 😃
@OscarSommerbo 3 months ago ⁺²
@@fabianschuiki Stopping the user from accessing undefined op-codes is a very modern idea, and is a way to ensure uniformity. I believe that the home computers simply skipped that logic to save on logic gates. Either way you go will be interesting, but I would probably include the function to hinder access to the undefined op-codes.
@fabianschuiki 3 months ago ⁺¹
👍 It would be cool if a small operating system would abort your process if it encounters any illegal instructions 😏
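The "Illegal Instruction for every unknown bit pattern" plan from this thread can be pictured as a table-driven decoder that only accepts the patterns it defines. The 4-bit opcode assignment and the `IllegalInstruction` name below are hypothetical. A gate-level decoder without such a check would instead produce some arbitrary mix of control signals for undefined patterns, which is where undocumented ops tend to come from.

```python
class IllegalInstruction(Exception):
    """Raised when the decoder sees a bit pattern it does not define."""


# Hypothetical 4-bit ALU opcode assignment (illustration only).
ALU_OPCODES = {
    0b0000: "ADD",  0b0001: "ADDC", 0b0010: "SUB",  0b0011: "SUBC",
    0b0100: "NEG",  0b0101: "NOT",
    0b1000: "SHLL", 0b1001: "SHLC", 0b1010: "SHRL", 0b1011: "SHRC",
    0b1100: "SHRA",
}


def decode(opcode):
    """Map a 4-bit opcode to its mnemonic, trapping on undefined patterns."""
    mnemonic = ALU_OPCODES.get(opcode)
    if mnemonic is None:
        raise IllegalInstruction(f"undefined ALU opcode {opcode:#06b}")
    return mnemonic
```

A small operating system could catch `IllegalInstruction` and abort the offending process, as suggested above. Leaving out the check would instead expose whatever the decoding logic happens to do for the unused patterns.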
@eryksoowiej4427 2 months ago ⁺²
@@fabianschuiki I have an interesting idea that solves this problem quite gracefully. What if you were to add a special instruction that saves a byte of data from program memory to an internal (read-only) register (or a FIFO queue of them) to store one or more ALU "user functions", and then the programmer can use another new instruction to bypass the ALU decoder and use the contents of that register to control the ALU instead? That way, the program has minimal overhead when executing these functions, but the programmer has access to *ALL* of the ALU functionality (even if not to all of it at the same time).
@fabianschuiki 2 months ago ⁺¹
Oh, that is a clever idea. So you'd essentially let the programmer create their own equivalent of x86 instruction prefixes: they push state into that ALU queue, and then issue a generic opcode that pops from the queue and executes the user-defined function. That's a cool idea.
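Below is a rough model of the user-function queue idea. Everything in it is invented for illustration, including a simplified adder datapath driven directly by a raw control byte:

- bit 0 inverts A
- bit 1 inverts B
- bit 2 is the carry-in
- bit 3 zeroes B

A small FIFO is filled by one hypothetical instruction and drained by another. The example control byte computes B - A (reverse subtract), which is not among the ALU instructions listed in the video description.

```python
from collections import deque


class UserAluFunctions:
    """Hypothetical FIFO of raw ALU control bytes that bypasses the ALU decoder."""

    def __init__(self, depth=4):
        # With maxlen set, appending to a full deque silently drops the oldest entry.
        self.queue = deque(maxlen=depth)

    def push(self, control_byte):
        """Models a hypothetical 'load user function' instruction."""
        self.queue.append(control_byte & 0xFF)

    def pop(self):
        """Models a hypothetical 'execute user function' instruction."""
        return self.queue.popleft()


def raw_alu(control, a, b):
    """Drive a simplified 8-bit adder datapath directly from control bits."""
    x = ~a & 0xFF if control & 0b0001 else a      # bit 0: invert A
    y = 0 if control & 0b1000 else b              # bit 3: zero B
    y = ~y & 0xFF if control & 0b0010 else y      # bit 1: invert B
    return (x + y + ((control >> 2) & 1)) & 0xFF  # bit 2: carry-in


funcs = UserAluFunctions()
funcs.push(0b0101)                  # invert A, carry-in 1: computes B - A
print(raw_alu(funcs.pop(), 3, 10))  # 7
```

The trade-off shows up directly in the model: the raw byte reaches every control combination, but each use costs an extra queue push, and the queue depth limits how many user functions are "live" at once.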
@newklear2k 3 months ago ⁺¹
You have big Ben Eater vibes. That's a compliment.
@fabianschuiki 3 months ago ⁺¹
Thanks 🙂
@schrodingerscat1863 3 months ago ⁺³
In simple mode the PLD is similar to an EPROM and you could have used an EPROM for implementation. I remember using similar PLDs back in the 80's when I was at university, back then they were very new and rarely used.
@fabianschuiki 3 months ago ⁺¹
Yes I agree, an EEPROM would have definitely worked here. And it would be strictly more powerful, because the EEPROM can store any distinct bit pattern for every input. That allows it to represent *any* boolean function, not just the ones which have a DNF with a limited number of terms.
The reasons I went with the PLD were that they are a lot smaller (I can't find any DIP EEPROMs that aren't huge 30+ pin wide DIPs), and I wanted to try this particular breed of programmable logic 🙂. Going to use an EEPROM for the instruction decoding for sure, though 😀
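The lookup-table-versus-sum-of-products difference can be checked in Python with 5-input parity (the XOR of all inputs). Its canonical DNF has 16 minterms, and no two of them differ in a single bit, so none can be merged: any shorter product term would also cover a neighboring input where parity is false. The minimal sum of products therefore needs all 16 terms. Assuming one 16V8 output offers 8 product terms (as in simple mode; check the datasheet), that function would not fit in a single output, while a 32-entry EEPROM stores it trivially.

```python
from itertools import product


def minterms(func, n):
    """Canonical DNF: one product term per input combination where func is true."""
    return [bits for bits in product((0, 1), repeat=n) if func(*bits)]


def mergeable(terms):
    """True if two minterms differ in exactly one bit, i.e. could be combined."""
    return any(
        sum(x != y for x, y in zip(s, t)) == 1
        for i, s in enumerate(terms)
        for t in terms[i + 1:]
    )


def eeprom_image(func, n):
    """A ROM just stores the output bit for every one of the 2**n addresses."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=n)]


def parity(*bits):
    return sum(bits) % 2


terms = minterms(parity, 5)
print(len(terms), mergeable(terms))  # 16 False
print(len(eeprom_image(parity, 5)))  # 32
```

Most decoder functions are much friendlier than parity, which is why a PLD works well for the ALU op decoder. Parity is the textbook worst case for two-level logic.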
@schrodingerscat1863 3 months ago ⁺²
@@fabianschuiki Yes, EPROMs are much larger packages for sure, and much slower too. I have used write-once PROMs in the past for very fast operation, but they are of course not reprogrammable, so not ideal for tinkering. PLDs are worth experimenting with: they can replace a lot of discrete logic, and they are cheap and fast.
@fabianschuiki 3 months ago
One downside they have is a lack of miniaturization. They do exist in some SMD packages, but I don't think you can do proper in-system programming with them 😕. So you're stuck with the DIP or maybe the square-ish bent-pin package.
@schrodingerscat1863 3 months ago ⁺²
@@fabianschuiki They are not programmable in-system at all, much like EPROMs; they are from an era when the idea of in-situ programming or remote updates wasn't a thing at all. For more modern packaging, CPLDs are your only option, but these are far more complex devices that are much more difficult to program and much more expensive.
@fabianschuiki 3 months ago
Yeah... At that point you could just build a big CPU in an FPGA and call it a day. But where's the fun and blinking LEDs in that? 😉
@mrengstad 3 months ago ⁺²
Arithmetic right shift of negative numbers isn't exactly like division by 2. Shift -1 and you get -1; shift -3 and you get -2, not -1. It always rounds down, so -0.5 -> -1 and -1.5 -> -2. This can be fine, and often is, but it is something to be aware of.
@fabianschuiki 3 months ago ⁺¹
Yes, that is a great point! It always rounds down, towards negative infinity, while you would likely expect it to round towards zero, seeing that 1/2 goes to 0, as you mention. Thanks! 🙂
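The rounding behavior described in this exchange is easy to check in Python. `sra8` models an 8-bit arithmetic shift right: bit 7 keeps its value and is also shifted into bit 6. `div2_toward_zero` shows one common fix, assuming you want C-style truncating division: add 1 to negative values before shifting. That bias step is a standard compiler trick, not something the video implements.

```python
def to_signed(x):
    """Interpret an 8-bit pattern as a two's-complement value."""
    return x - 256 if x & 0x80 else x


def sra8(x):
    """8-bit arithmetic shift right: the sign bit is preserved (floor division by 2)."""
    x &= 0xFF
    return (x >> 1) | (x & 0x80)


def div2_toward_zero(x):
    """Halve with rounding toward zero by biasing negative inputs by +1 first."""
    x &= 0xFF
    if x & 0x80:
        x = (x + 1) & 0xFF
    return sra8(x)


print(to_signed(sra8(-1)), to_signed(sra8(-3)))  # -1 -2
print(to_signed(div2_toward_zero(-3)))           # -1
```

Whether to round down or toward zero is an ISA-level choice. Plain SHRA gives floor semantics for free, and the bias costs one extra add on negative inputs.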
@JaenEngineering 3 months ago ⁺²
Cool little device. Definitely more elegant than using an EEPROM. I'm guessing the plan next will be to replace the XOR based inverter circuit with a full multiplexer based Logic Unit?
@fabianschuiki 3 months ago ⁺¹
Yes exactly! I haven't really found a better approach than the implementation with muxes. Which is sad, because the muxes are very wasteful due to them having two 4-way muxes sharing one set of control lines. You could use more of these 16V8 PLDs, and handle maybe 2 bits per package, but you'd still need 4 chips, and they are about 1.5x the size of a mux chip. Doesn't feel like a real improvement.
@JaenEngineering 3 months ago ⁺²
Not to mention cost. Those PALs are quite expensive compared to a cheap muxer. It does also open up the possibility of performing two bitwise logic operations simultaneously by feeding each of the 2 sets of inputs a different "truth table". Not sure if that could be of any use...?
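Here is a behavioral sketch of the mux-based logic unit discussed in this thread. It assumes each result bit comes from a 4:1 mux whose select lines are the two operand bits and whose four data inputs carry a 4-bit truth table. If two muxes share their select lines (as in a 74xx153-style dual 4:1 mux, an assumption about the part), feeding them two different truth tables yields two bitwise results from the same operand bits.

```python
def logic_unit(a, b, table, width=8):
    """Per bit, a 4:1 mux selected by (a_i, b_i) picks one bit of `table`."""
    result = 0
    for i in range(width):
        select = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        result |= ((table >> select) & 1) << i
    return result


# Truth tables indexed by select = 2*a + b (bit k of the table = output for select k).
AND, OR, XOR, NOT_A = 0b1000, 0b1110, 0b0110, 0b0011


def dual_logic(a, b, table_1, table_2):
    """Two muxes per bit sharing select lines but fed different truth tables."""
    return logic_unit(a, b, table_1), logic_unit(a, b, table_2)


print(bin(logic_unit(0b1100, 0b1010, XOR)))                   # 0b110
print([bin(r) for r in dual_logic(0b1100, 0b1010, AND, OR)])  # ['0b1000', '0b1110']
```

Any of the 16 two-input boolean functions is just a different 4-bit constant on the data inputs. That generality is what makes the mux approach attractive despite the chip count.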
@fabianschuiki 3 months ago
Haha, great point: you could derive two separate results from the same input bits. I can't think of any use for this on the spot, but maybe there is one!
@eryksoowiej4427 3 months ago ⁺²
@@fabianschuiki I might be a bit late to say this, but you mustn't have read that datasheet too carefully, as you totally *CAN* fit *4* *BITS* worth of muxing (for the logic unit) inside of *ONE* ATF16V8B! In "Simple Mode" the IC treats pins 1 (I/CLK) and 11 (I9/*OE) as regular inputs, and pins 12-14 and 17-19 are *BIDIRECTIONAL*. So, for example, you could use pins 1-4 for 4 bits of LHS input, pins 5-8 for 4 bits of RHS input, pins 11-14 for 4 bits of control signals (since they are shared between muxes in the original design anyway), and last but not least pins 15-18 for the 4-bit result. You would still have pins 9 (input only) and 19 (input/output) free for expansion, with all bits of each 4-bit nibble neatly arranged in order, next to one another!
@fabianschuiki 3 months ago
@@eryksoowiej4427 This is a brilliant idea. 🎉 It hadn't crossed my mind that I could trade some of the output pins for additional input pins. As you suggest, I could handle 4 bits with a single PLD chip, such that 2 chips could do it all. And the best part is that I don't really need all 16 possible logic functions, but only 6-8 of them. So instead of accepting a 4 input truth table, I could make the PLDs accept a 3 bit opcode instead. That would save me an output at the ALU decoder PLD, which simplifies a few things further down the road. Thanks for the exciting idea! 🎉🥳😃
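Here is a sketch of the two-chip layout proposed here, with the 3-bit opcode refinement. Each PLD handles one 4-bit nibble, and both chips receive the same 3-bit logic opcode instead of a full 4-bit truth table. The opcode assignment and the choice of six functions below are hypothetical.

```python
# Hypothetical 3-bit logic opcodes; the video does not define this encoding.
LOGIC_OPS = {
    0b000: lambda a, b: a & b,  # AND
    0b001: lambda a, b: a | b,  # OR
    0b010: lambda a, b: a ^ b,  # XOR
    0b011: lambda a, b: ~a,     # NOT A
    0b100: lambda a, b: a,      # pass A
    0b101: lambda a, b: b,      # pass B
}


def nibble_slice(op, a, b):
    """One PLD's worth of logic: 4-bit operands, 3-bit opcode, 4-bit result."""
    return LOGIC_OPS[op](a & 0xF, b & 0xF) & 0xF


def logic_8bit(op, a, b):
    """Two identical slices side by side, both driven by the same opcode."""
    return (nibble_slice(op, a >> 4, b >> 4) << 4) | nibble_slice(op, a, b)


print(hex(logic_8bit(0b011, 0x5A, 0)))  # 0xa5
```

Each slice needs 4 + 4 + 3 = 11 inputs and 4 outputs, which fits the pin budget sketched in the comment above with room to spare. Because every function here is bitwise, the two slices never need to communicate.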
@tmbarral664 3 months ago ⁺¹
Tiny thing I noticed, which I used to do myself until a prof caught me :)
It's an op with 4 bits, but a 4-bit bus. The key is the hyphen: it makes "4-bit" a single word (an adjective), so there's no need for a plural ;)
@fabianschuiki 3 months ago
😀👍
