Doing Boolean Algebra on My CPU - Superscalar 8-Bit CPU #34

Fabian Schuiki

Published 27 Sep 2024

COMMENTS • 35

@alexw5093 5 months ago ⁺³
People like you, Ben Eater, Great Scott, and more motivate me to build something and make a series myself. You have an inspirational piece of work going here and I greatly appreciate the time and quality you put into these!
@fabianschuiki 5 months ago
Thank you very much for the kind words! 😃
@dmoisset 3 months ago ⁺¹
Hi! I'm slowly catching up, so sorry for the bunch of comments appearing on old videos. Still loving the series!
A minor detail about what you said at 7:30: pins 15 and 16 can actually be used as inputs just fine. What's missing is the "feedback loop", which means that when you're using pins 15 and 16 as outputs, you cannot also use that output as an input. With the other outputs you can, essentially creating loops! That allows you, for example, to define something like an SR latch in the PLD using the typical pattern of two gates looped together. But for this application, where there are no loops and everything is combinatorial, you can use as many pins as inputs as you like (you can even use CLK and OE as inputs, given that you don't need tristating for your outputs).
@fabianschuiki 3 months ago
I'm very happy to read all your comments and feedback 🙂! Excellent point about the two somewhat special pins on the output side.
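The "two gates looped together" pattern mentioned above can be illustrated with a small Python model. This is only a sketch of cross-coupled NOR gates, not a PLD equation file; the function names `nor` and `sr_latch` are made up for this example, and the loop simply re-evaluates both gates until the outputs stop changing.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q=0, qn=1):
    """Settle a cross-coupled NOR latch.

    Each gate output is fed back into the other gate's input, which is
    the kind of loop that needs an output with a feedback path.
    (Hypothetical model: both gates are updated at the same time.)
    """
    for _ in range(4):  # a few rounds are enough for valid inputs
        new_q = nor(r, qn)
        new_qn = nor(s, q)
        if (new_q, new_qn) == (q, qn):
            break  # outputs are stable
        q, qn = new_q, new_qn
    return q, qn


# Set the latch, then release both inputs: the state is held.
q, qn = sr_latch(s=1, r=0)
q, qn = sr_latch(s=0, r=0, q=q, qn=qn)
print(q, qn)
```

Without the feedback path, `q` and `qn` could not appear on the right-hand side of their own equations, so a purely combinatorial design like the logic unit is unaffected.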
@OscarSommerbo 6 months ago ⁺⁴
I remember xoring registers with themselves to zero them. Ah… The joys of 6510 assembler.
@fabianschuiki 6 months ago
Haha indeed. Also a common pattern on x86 😀
@moshixmainframechannel 6 months ago ⁺⁵
great series!! thanks!
@fabianschuiki 6 months ago
Thanks! 😃
@eryksoowiej4427 5 months ago ⁺²
I know I'm a bit late and jumping the gun a little, and maybe we're using these PLDs a few too many times, but implementing flags on one of them just seems too perfect, especially because we can use their "Register" mode for the first time while using all of their pins (well, the output enable is tied to ground at all times, but there is no way around it).
First I will start with the simplest setup. You can fit the logic to implement and latch *5* *flags* in *ONE* ATF16V8! Since we know we want the zero and sign flags, we will need to connect all 8 bits of the ALU result (to the input-only pins). This leaves us with just 6 more pins. If we connect the 2 sign bits of the two adder inputs, we can then calculate the overflow flag, since we already have the result sign bit connected. After that we can pass in the carry flag to remove that D-latch chip from the breadboard, leaving us with one additional bi-directional pin which could be used, for example, to implement a parity flag (either just latching the first bit or also xoring it with the sign bit).
The main problem with this solution is that all flags must always be latched at the same time. So we could either update them during any instruction that writes to the register file, but lose our carry and overflow flags, or only latch them when an ALU instruction is executed and be forced to test any value that we just wrote/moved to/between registers. However, we could isolate those flags to an independent dual D-latch chip, which has independent clocks and could therefore be triggered only at the right time. This solution has the added benefit of freeing two more bi-directional pins on the PLD, as the carry flag was routed through it to get it latched; these could be used for even more flags. The overflow flag can still be evaluated in the PLD, but we can just set that output logic block to the "combinatorial" configuration.
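The flag logic proposed above can be sketched in Python to show which signals each flag depends on. This is a rough model of an 8-bit add, not PLD equations; the function name `alu_flags` and the dictionary keys are hypothetical, and the "parity" entry uses the simplest variant from the comment (just the lowest result bit).

```python
def alu_flags(a, b, carry_in=0):
    """Add two 8-bit values and derive five flags from the result.

    Assumed inputs to the flag logic, as in the comment above: the
    8 result bits, the sign bits of both operands, and the carry out.
    """
    total = a + b + carry_in
    result = total & 0xFF

    sign_a = (a >> 7) & 1
    sign_b = (b >> 7) & 1
    sign_r = (result >> 7) & 1

    flags = {
        "zero": int(result == 0),      # all 8 result bits are 0
        "sign": sign_r,                # copy of result bit 7
        "carry": (total >> 8) & 1,     # carry out of the adder
        # Signed overflow: operands agree in sign, result does not.
        "overflow": int(sign_a == sign_b and sign_r != sign_a),
        "parity": result & 1,          # simplest variant: lowest bit
    }
    return result, flags


result, flags = alu_flags(0x7F, 0x01)
print(hex(result), flags)
```

Note that the overflow flag only needs three sign bits, which is why two extra input pins are enough once the result bits are already connected.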
@fabianschuiki 5 months ago ⁺¹
That's a neat idea! The PLDs are almost too handy -- a lot of stuff starts to look like yet another thing that would be nicely solved by a PLD. I also don't think the separate updating of individual flags is going to be a problem: you can just define the flags to always update together, and only on a few instructions. In the superscalar/out-of-order world they are anyway more of a nuisance than anything else, so every step you can take to rein them in and make them as simple as possible is great. They end up being just an implicit register operand on any ALU and branch instruction, so the fewer weird things they support, the better.
@lawrencemanning 3 months ago ⁺¹
Have you done any tests to determine the current fMax? I’m curious. 😊
Very neat design, though the downside of GALs (etc) is they obscure what’s going on. I’ll let you off with the Boolean ops. 😂 I liked James’s solution too and it would have been a bit repetitive to solve it the same way!
@fabianschuiki 3 months ago ⁺¹
I haven't done any tests yet. It might be worth statically computing the timing of the CPU, comparing it to the actual hardware, and then figuring out where to place registers to cut the long paths.
@albinaberg 6 months ago ⁺³
Will you be rewriting the assembler at some point?
Feels like it could be improved quite significantly.
@fabianschuiki 6 months ago ⁺²
Yes, I totally agree. I'm planning to do another assembler episode at some point to do some refactoring and get rid of some of the repetition and copy-paste. And a few new features such as jump labels and simple constant evaluation would be great!
@ke9tv 6 months ago ⁺³
I'd have probably moved all of the outputs on pins 11-18 one pin higher, to have pin 11 available as an output enable. (I know your current CPU doesn't need one, but I'd have a nagging feeling that sooner or later I'd have an ALU outputting to a shared bus.)
It almost seems as if you could cram a full 4-bit ALU (similar to the classic 74181) onto a single PLD. I'm tempted to run the 74181 truth table through something like Espresso to see if it'd fit. Not tonight!
Nice job!
@fabianschuiki 6 months ago ⁺²
That's a cool idea! Yeah you might be able to squeeze all of that in here, and then have two 4 bit slices combine into a full 8 bit ALU. That's definitely worth a shot! 😃
You're totally right by the way: the ALU will definitely output to a bus. Not with the logic operator on the RHS path though, but some buffer at the result. I'm contemplating using some tristate multiplexer, to also have the ability to do flag register swapping/reading/writing.
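The idea of combining two 4-bit slices into an 8-bit ALU can be sketched as follows. The model covers only addition with a ripple carry between the slices, not the full 74181 function set, and the names `alu_slice4` and `add8_from_slices` are invented for this illustration.

```python
def alu_slice4(a, b, carry_in):
    """One 4-bit adder slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8_from_slices(a, b, carry_in=0):
    """Chain two 4-bit slices: the low slice's carry out feeds the
    high slice's carry in."""
    low, carry_mid = alu_slice4(a & 0xF, b & 0xF, carry_in)
    high, carry_out = alu_slice4(a >> 4, b >> 4, carry_mid)
    return (high << 4) | low, carry_out


print(add8_from_slices(0x3C, 0xC5))  # 8-bit sum and carry out
```

In hardware both slices would hold the same logic, which is what makes a per-nibble PLD attractive; the cost is the extra delay of the carry passing from one chip to the next.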
@eryksoowiej4427 6 months ago ⁺¹
That wouldn't work for two reasons:
1. In "Simple Mode", pins 15 & 16 are output only, and pin 14 is used as an input; so it would have to be moved to pin 19, or preferably shift pins 2-9 to pins 1-8 and just relocate pin 11 to pin 9, as it is unused and is a lot closer to pin 11 than pin 19 is to pin 15.
2. The output enable special function of pin 11 only applies when the PLD is programmed in "Register Mode", and then only the cells in the "Registered" configuration use this pin; other cells CAN'T access it AT ALL (or at least not without connecting the same signal to a regular input pin; the same goes for pin 1).
What one should do if one wants to have an output enable pin is to just program the chip in "Complex Mode". It allows you to repurpose the 0th of the 8 product terms to enable / disable the output, and that term can be programmed to be ANY of the inputs or a product of them. However, we can't forget that the output-only pins have moved again; this time pins 12 & 19 are output only, which makes it impossible to have 4 4-bit nibbles on consecutive pins, and it also splits the output-only pins apart.
@fabianschuiki By the way, are you planning to select the fastest chips for the PCB version? If so, I wonder if you will go to the extra trouble of dealing with PLCC chips (for whatever reason they're always the fastest versions of these PLDs, specifically the ATF16V8C-5JX), or would just using SMD versions of the ATF16V8C (the ATF16V8C-7SU) be enough? The only difference between the C and B versions is that the C versions, unlike the B versions, have what they call pin-keeper circuits instead of input pullups; they make an input keep its state when left unconnected. Going from the ATF16V8B to the ATF16V8C is a 2x speed-up, and going from PDIP or SOIC to PLCC is a 1.5x speed-up, and these stack for a grand total of 3x speed.
@fabianschuiki 5 months ago
Excellent point about the speeds. I haven't really decided on what exactly to go for there. My first goal is to get an almost single-cycle CPU going, with whatever speed results from that. Once that works, I plan to undertake some measurements and modeling to figure out what the actual critical paths are and which parts of the CPU need balancing and pipelining. Since I'm planning ahead for OoO and superscalar execution, everything will be prepared for almost arbitrary pipelining and delaying.
@eryksoowiej4427 5 months ago ⁺¹
@fabianschuiki Actually, after thinking about this idea a bit more, I came to the conclusion that *IF* one were to attempt this, not only would doing it with ATF16V8s be impossible (pin-wise there are really only enough pins to handle 3 bits at a time, therefore requiring at least 3 PLDs, which would probably be slower than the current implementation; and the complexity of making this idea work will most likely require more product terms than these PLDs can provide), but even if we were to use their "big brothers" (the ATF22V10s) we would still need to program the high and low halves differently, and all of this is due to the shifting unit (and most importantly the "*ARITHMETIC* *RIGHT* *SHIFT*").
While Atmel decided not to draw how exactly each "OUTPUT LOGIC" block of an ATF22V10 is constructed, it is probably quite similar to the ATF16V8 ones when programmed in "Register" mode, but all of them have an independent output enable term regardless of whether the output is latched or not, a common clock connected to pin 1 (which seems to be always available as a logic input, unlike on the ATF16V8s), as well as anywhere between 8 and 16 product terms.
I came up with a reasonable pinout for the dual ATF22V10 version:
Pin : label ; note
1 : OE / CLK ; output enable for result outputs and either output enables or clock for flags
2 : An+4
3 : An+3
4 : An+2
5 : An+1
6 : An
7 : An-1
8 : Bn+3
9 : Bn+2
10 : Bn+1
11 : Bn
12 : GND
13 : Ctrl0
14 : Ctrl1
15 : Ctrl2
16 : Ctrl3
17 : NC / OF ; overflow flag for high half
18 : CO / CF ; carry out / carry flag
19 : Qn+3
20 : Qn+2
21 : Qn+1
22 : Qn
23 : CI ; carry in
24 : VCC
n is either 0 for the low half or 4 for the high half
A-1 to A8 is the shift unit input (where A-1 and A8 need to be connected to the carry flag)
B0 to B7 is the second logic unit input
Ctrl0 to Ctrl3 is the ALU opcode input
Q0 to Q7 is the tri-state result output
The reason that the ALU output is backwards has to do with the fact that the center bi-directional pins have more product terms than outside ones, so the carry out/flag and MSB of the result have the most complexity available (even then I'm not sure that will be enough).
While it is possible to latch the flags locally, if you plan to share flags between different circuitry (like the once mentioned multiplier circuit) it is probably best to just create a "flags bus" and then latch them somewhere else more central (like close to the CPU instruction decoding or the write port, as more flags should be acquired there).
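The reason each half sees the six shift inputs An-1 to An+4, and why the arithmetic right shift forces the two halves to differ, can be sketched in Python. This is an interpretation of the pinout above, not a tested PLD design; the names `shift_slice` and `shift8`, the operation strings, and the choice to feed the carry flag into both A-1 and A8 are assumptions made for the example.

```python
def shift_slice(window, op, is_high_half):
    """Compute 4 result bits Q(n)..Q(n+3) of one half.

    window holds the 6 input bits [A(n-1), A(n), ..., A(n+4)].
    """
    out = []
    for i in range(4):
        if op == "left":
            bit = window[i]          # Q(n+i) = A(n+i-1)
        elif op == "right":
            bit = window[i + 2]      # Q(n+i) = A(n+i+1)
        elif op == "asr" and is_high_half and i == 3:
            bit = window[4]          # Q7 repeats the sign bit A7
        elif op == "asr":
            bit = window[i + 2]      # same as a plain right shift
        else:
            bit = window[i + 1]      # pass through: Q(n+i) = A(n+i)
        out.append(bit)
    return out


def shift8(value, op, carry=0):
    """Build the extended input A(-1)..A8 and run both halves."""
    # ext[j] holds A(j-1); A(-1) and A8 are tied to the carry flag.
    ext = [carry] + [(value >> k) & 1 for k in range(8)] + [carry]
    low = shift_slice(ext[0:6], op, is_high_half=False)   # A(-1)..A4
    high = shift_slice(ext[4:10], op, is_high_half=True)  # A3..A8
    return sum(bit << k for k, bit in enumerate(low + high))


print(hex(shift8(0x96, "asr")))
```

Only the topmost result bit behaves differently for the arithmetic right shift, which is why the low and high halves cannot share one identical program in this model.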
@fabianschuiki 5 months ago
I have never worked with the ATF22V10 before, but I should definitely go and have a look. What you are proposing is really clever. It would be awesome to have pretty much the entire ALU crammed into a few PLD chips. Pretty flexible. Especially when you want to focus on things beyond just basic old ALUs.
@hwmland 6 months ago ⁺¹
Great video once again. When are you going to switch from FLASH to RAM? Would be nice to be able to read/write data...
@fabianschuiki 6 months ago ⁺¹
Thanks! 🙂 That switch is coming up pretty soon, after the ALU and some lightweight decoding.
@JaenEngineering 6 months ago ⁺¹
Those PALs are nifty little devices. I can see them coming in quite handy for other parts of the build.
Also, curious why you did the subtract as a NOT+ADD(+ADD) rather than just a straight SUB? Did I miss something or did you just fancy a change? Lol! 😃
@fabianschuiki 6 months ago ⁺¹
😀 That was just to test and showcase the `not` instruction 😏
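The NOT+ADD(+ADD) sequence works because of the two's complement identity a - b = a + NOT(b) + 1 modulo 256. A short Python check of that identity, with hypothetical helper names standing in for the CPU's `not` and add instructions:

```python
def not8(x):
    """8-bit bitwise NOT."""
    return ~x & 0xFF


def add8(a, b):
    """8-bit add, discarding the carry out."""
    return (a + b) & 0xFF


def sub_via_not_add(a, b):
    """Subtract using NOT, then ADD, then ADD of the constant 1."""
    inverted = not8(b)            # NOT
    partial = add8(a, inverted)   # ADD
    return add8(partial, 1)       # ADD 1 completes the negation


print(sub_via_not_add(10, 3))
```

A dedicated SUB can fold the final "+1" into the adder's carry input, which is why it needs only one pass through the ALU.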
@JaenEngineering 6 months ago ⁺¹
Gotcha. Makes sense.
Also, what program are you using to write the code in? I really need to stop putting off learning to code and having an environment to play in seems like a good place to start.
@fabianschuiki 6 months ago ⁺¹
@JaenEngineering I'm using Sublime Text 🙂
@JaenEngineering 6 months ago ⁺¹
Awesome. Will definitely check that out. 😊
@WaldoHazeleger 5 months ago ⁺¹
Too bad you use GALs to make a logic unit. You could use 8x 153s to make logic functions. See what they did on the Gigatron CPU. Nice series though. Discovered it today!
@fabianschuiki 5 months ago ⁺¹
Thanks! 😃 Yeah the 8x 74HC153 multiplexers were my initial plan. But it's pretty cumbersome in terms of wiring and has been done really well by a few other people. Since I'm trying to get to out of order execution, I didn't mind skipping a whole bunch of wiring and a high chip count. I might still switch to the mux-based approach for the PCB though, if it saves on area. One downside of the PLDs is that you can't really make them SMD without losing the ability for reprogramming. So this comes down to whether I can fit 4 SMD mux chips in the same area as 1 DIP PLD chip 😏. But I agree, the PLDs feel a bit like cheating.
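One common way to build a logic unit from 4-to-1 multiplexers is to let the two operand bits drive the select lines while a 4-bit truth table drives the data inputs. The Python sketch below models that idea per bit; the names `mux4`, `logic_unit`, and the truth-table constants are hypothetical, and no claim is made that this matches the exact wiring of any particular build.

```python
def mux4(data, s1, s0):
    """4-to-1 multiplexer: pick data[2*s1 + s0]."""
    return data[(s1 << 1) | s0]


def logic_unit(a, b, table, width=8):
    """Apply a 2-input boolean function to every bit pair of a and b.

    table is a 4-bit truth table: bit (2*a_bit + b_bit) is the output
    for that input combination. One mux per result bit.
    """
    data = [(table >> i) & 1 for i in range(4)]
    result = 0
    for k in range(width):
        a_bit = (a >> k) & 1
        b_bit = (b >> k) & 1
        result |= mux4(data, a_bit, b_bit) << k
    return result


# Truth tables for a few operations (bit index = 2*a_bit + b_bit).
AND, OR, XOR = 0b1000, 0b1110, 0b0110

print(hex(logic_unit(0xF0, 0xCC, XOR)))
```

Because the truth table is just four control bits, all 16 two-input boolean functions are available without changing any wiring, at the cost of one mux per result bit.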
@WaldoHazeleger 5 months ago ⁺³
@fabianschuiki If you are cheating, you could also have made the entire ALU in PLDs. I did the 74181 ALU once in an EEPROM.
@fabianschuiki 5 months ago ⁺¹
Yeah, very true indeed 🙂. Maybe I could do a simplification pass over the design as soon as two ALUs are needed.
@WaldoHazeleger 5 months ago ⁺¹
@fabianschuiki You already have a second ALU in the program counter 🙂
@fabianschuiki 5 months ago ⁺¹
Haha 😅 But that's a 16 bit ALU that can only add numbers 😁. But yeah, there's often a compact design for small processors where the PC is just an incrementer circuit, and you use the ALU to compute jump destination addresses. But that somewhat relies on your ALU, registers, and memory bus having the same bit width.
