Much appreciated vid and new Minimal! Yeah, that was amazing, thanks. Knew you really rock 👍.
In the words of ProjectFarm, “Very impressive!”
That's absolutely nuts! People just casually designing TTL CPUs. I want to be that good!
I might have made it look too easy - as if I was designing this stuff "casually" but actually I am not 🙂 It took me many many hours to reach this goal.
Always fun to see your videos! :) Entertaining as well as educational!
Waw, thank you very much!
Btw, you can run multiple 8-bit CPUs side by side like a GPU.
2 days ago I decided on making my own CPU, and yesterday I was thinking about how to ensure and measure the efficiency of my instruction set. Today I stumbled across this video. Thanks YT.
I suppose that the best way to measure efficiency is to have the CPUs perform a fixed task at fixed clock speeds and measure the time it takes to complete.
To have a fair comparison of architectures/instruction sets, compilation should happen with the default compiler at default optimization settings.
The task would probably best be a single C program with many calculations, branching, looping and function calls.
The 2 important measurements would be binary file size and program duration.
Would you perhaps have any good resources on (amateur) CPU development? (I am going to use an FPGA for this)
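A minimal sketch of the "fixed task, measure the time" idea from the comment above, written in Python for illustration rather than the C program the comment proposes. The Q8 fixed-point format, the 32x22 grid, the coordinate window and every function name here are assumptions for this sketch and not taken from any benchmark definition.

```python
import time

FRAC_BITS = 8                 # assumed Q8 fixed-point format (hypothetical choice)
ONE = 1 << FRAC_BITS          # fixed-point representation of 1.0


def mandel_iters(cr, ci, max_iter=16):
    """Count iterations until |z|^2 > 4 for c = cr + i*ci (fixed-point ints)."""
    zr = zi = 0
    for n in range(max_iter):
        zr2 = (zr * zr) >> FRAC_BITS
        zi2 = (zi * zi) >> FRAC_BITS
        if zr2 + zi2 > 4 * ONE:
            return n
        zi = ((2 * zr * zi) >> FRAC_BITS) + ci
        zr = zr2 - zi2 + cr
    return max_iter


def render(width=32, height=22, max_iter=16):
    """The fixed task: iterate every pixel of re in [-2, 1), im in [-1, 1)."""
    total = 0
    for y in range(height):
        ci = -ONE + (2 * ONE * y) // height
        for x in range(width):
            cr = -2 * ONE + (3 * ONE * x) // width
            total += mandel_iters(cr, ci, max_iter)
    return total              # checksum, so different runs can be compared


def benchmark():
    """Return (checksum, elapsed seconds) for one run of the fixed task."""
    start = time.perf_counter()
    checksum = render()
    return checksum, time.perf_counter() - start
```

The checksum only guards against comparing runs that did different amounts of work; on real hardware the elapsed time would come from the target machine's own clock, not from a host timer.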
The Agon is at a disadvantage because it has to do graphics to its display over a serial link to an ESP32 microcontroller. So even though it has an enhanced Z80-compatible CPU running at 18MHz, its CPU prowess will be humbled by its display bottleneck.
I wonder what happens if I run Minimal 64 on Scratch
😁
Would be a blast to try out a really optimized 8-bit FPGA-based platform. For example, the well-known PicoBlaze Xilinx/AMD 8-bit mini-controller is very nice and I did a lot of damage with that one. I don't own one of the later FPGA development boards, but I think it ran at 100MHz even on the old ones, and each instruction takes exactly 2 CCs => 50 MIPS. It only holds 1024 bytes of RAM (I think it was 512 instructions max), otherwise you have to introduce bank switching, which brings down performance. I wonder if I can fit the program into that space.....
I actually started with implementing this using an assembler/emulator solution. Currently I'm just outputting 32x22 chars/pixels, increasing the value for every line => takes around 1ms in total. A wild guess would be something like 1000 CC (500 instructions) per pixel, which will end up at around 1s for the whole picture.
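The throughput arithmetic in the two comments above can be written out as a small sketch. It uses only the figures quoted there (100MHz clock, 2 clock cycles per instruction, a guessed 1000 cycles per pixel); the function names are made up for this example.

```python
def mips(clock_hz, cycles_per_instruction):
    """Million instructions per second at a fixed cycles-per-instruction."""
    return clock_hz / cycles_per_instruction / 1e6


def instructions_for(cycles, cycles_per_instruction):
    """How many instructions fit into a given cycle budget."""
    return cycles / cycles_per_instruction


print(mips(100e6, 2))             # 50.0 MIPS at 100MHz with 2 CC per instruction
print(instructions_for(1000, 2))  # 500.0 instructions in a 1000 CC pixel budget
```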
Another great achievement, I always love your videos. My hacked CSCvon8+VGA can do the Mandelbrot in about 2 seconds at ~6.3MHz, but it's got 'hardware' multiply (8x8 bits). It would be interesting to see how fast a 6809-based computer can go. I would be very interested in an Ultra version on PCB; if you could make that happen, I would definitely build one.
Hi David, no hardware multiplication on board here ;-) just a *very* efficient software implementation. Actually, I am currently not planning to release a PCB version of the Ultra since I fear that manufacturing tolerances of parts may come into play. This is really a "close to the limit" proof-of-concept design. But something with VGA and PS/2 may follow at a later stage.
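For readers wondering what multiplication without a hardware multiplier can look like, here is a sketch of the generic textbook shift-and-add method for an 8x8-bit product. This is an assumption chosen for illustration only; the thread does not say which algorithm the Minimal actually uses, and the function name is hypothetical.

```python
def mul8_shift_add(a, b):
    """Multiply two unsigned 8-bit values using only shifts and adds.

    Generic textbook method, NOT the routine used on the Minimal.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be unsigned 8-bit values")
    product = 0
    for _ in range(8):        # one pass per bit of the multiplier b
        if b & 1:             # lowest bit set: add the shifted multiplicand
            product += a
        a <<= 1               # multiplicand moves up one bit position
        b >>= 1               # next multiplier bit
    return product & 0xFFFF   # the result always fits in 16 bits
```

On an 8-bit CPU each pass costs a handful of instructions, which is why an 8x8 hardware multiplier, or a cleverer software routine, makes a visible difference in a Mandelbrot loop.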
This is fantastic. I wonder tho, why does no one make 8-bit CPUs with single-cycle instructions? Seeing the CPI set to one would be satisfying.
A next project perhaps? Wink wink*
I see that in the 2.0 version the memory-to-PC logic changed. Can you describe the changes? Maybe another video about the Minimal's evolution?
Thanks for your work.
Thanks for following along. Soon there will be a new Minimal 64x4 coming out displaying the full feature set (4x processing power of a C64). Some explanation will follow...
Do the rules prohibit using a multi-processor or ultra parallel GPGPU-like system where every pixel is calculated in parallel with its own processor?
I haven't stated it explicitly, but I think Matt uses his benchmark for retro-style 8-bit systems. I hijacked it for DIY TTL computers. If you can build a TTL version of what you propose above that'll count as well.
A comparison to a 68000 or 68008 (to stay 8 bit) based system would be interesting...
... but I will leave that to the "owner" of the "8-Bit Battle Royal".
Can you make your new versions modular 10x10cm for easy/cheaper reproduction? Or a 1:1 replacement for the C64 or ZX Spectrum, or mITX format?
Hm, I am not a fan of modular design of that (rather small) size. It drives the cost up, actually. And why make it fit into a C64? You would need a different keyboard anyway...
Good modular design is easy for beginners to troubleshoot, and 10x10 is cheaper to order. Yep, the keyboard is different, it was just an idea to make a childhood dream come true: understand the computer, make the same. But what about mITX format? People could simply put them in an available PC case for a complete device. @slu467
So, even using the not so powerful SAP CPU h/w architecture, you've managed to develop a very efficient CPU instruction set? Did you perform some simulations/benchmarks while designing the CPU?
Please make a video comparing instruction sets and explaining why your Mandelbrot implementation outperforms the 6502 & Z80!!! Please please
Although technically my design is of the SAP type, it took much cumulative work to get it to this level. "Minor" details become quite important at 10MHz, with constant simulations/benchmarks for many months 🙂 Maybe more about it in the future, but I guess most people will find this boring...
Guys, let's vote for an instruction set explanatory/comparison video!
Of course your DIY h/w is brilliant. But the instruction set is a different and very interesting topic. You've probably invented a very powerful 8-bit instruction set, which is even more important than just one working and fast home computer. @slu467
Excellent video and great achievement in optimization 👍. Can the current Minimal 64 PCB be tweaked for higher performance, with more efficient microcode and/or a higher clock rate? 🙂 (Tip: if you plan on a next PCB design, I am interested. Make it so that standard casings can be used. 😉)
Hi Lucas,
oh, yes, the Minimal 64 can be "tweaked" to run twice(!) as fast with what I have learned here. That is what I am currently having fun with... watch out for the... Minimal 64x2 ;-)
But there is still some work to be done... Can you point me to "standard casings" you intend to use?
Hello Slu467,
The PCB can conform to mini-ITX, flex-ATX or micro-ATX, to be used in standard PC casings. Another (more DIY) option is to make it fit in widely available containers that can serve as a casing for the Minimal. For example: the 212x212x35mm (transparent) Ferrero Rocher gift box; Mepal 195x125x20/52mm Lunchbox "Take a break" flat/large; IKEA 240x350x60mm HARVMATTA, 310x230x70mm HÄSTVISKARE or 460x250x30mm ELLOVEN (this last one has room for a keyboard underneath). Or something similar. (IKEA measurements are estimated inner dimensions. The FR gift box and Mepal are exact inner dimensions.) 😀 @slu467
The I/O must play a significant role here. The UART CPU has instructions just for the UART, while that must take many cycles on other computers. Is there a way to do all the calculation in memory only, so that the I/O is not included?
Actually the graphical output requires only a very small fraction of compute time here and does not play a significant role.
I wonder how George Foot's 35MHz 6502 would rate? Extrapolating from the other 6502-based figures, I guess it would be roughly 0.74s with 14.8 MIPS!
I think it is one thing to have a fast CPU and another to have a fast CPU interacting in a meaningful way with RAM and peripherals and stuff. For the Minimal, the bottleneck is RAM/FLASH access times.
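The 35MHz extrapolation above presumably assumes that run time scales inversely with clock speed. A small sketch of that assumption, using only the 0.74s and 35MHz figures quoted in the comment (the function name and the 1MHz reference point are chosen for illustration):

```python
def scale_time(known_time_s, known_clock_hz, target_clock_hz):
    """Rescale a run time to another clock, ASSUMING time ~ 1 / clock.

    Memory or I/O wait states break this assumption, as the reply notes.
    """
    return known_time_s * known_clock_hz / target_clock_hz


# Run time at 1MHz that the quoted 35MHz estimate would imply.
implied_1mhz = scale_time(0.74, 35e6, 1e6)
print(round(implied_1mhz, 1))  # 25.9
```

If RAM/FLASH access times are the real limit, the measured time would fall short of this linear estimate at higher clock rates.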
Very nice