CGA Graphics Programming: Even faster circles!
- Published 20 Apr 2024
- We figure out how to draw really fast circles on the IBM CGA adapter, totally smashing our record from the last video.
Code for this episode:
github.com/wbhart/PCRetroProg...
MartyPC (by dbalsom aka GloriousCow):
github.com/dbalsom/martypc
Fond memories of early graphics and assembler :)
Thanks. Optimizing assembly code for speed is my passion.
YouTube's algorithm at its best - subscribe ! :)
Brought back so many memories of my first steps in programming - on IBM/Apple clones in the early 90's (eastern europe thing);
I miss those (failed) teenage attempts at 3D rasterization on an 80286/CGA... and 30 years later, I still look at the assembly output of my code and itch for (micro)optimizations.
Thanks, I will have fun watching your videos!
Yeah the algorithm did surprisingly well finding people to watch this one. It's by far the largest number of viewers for a video on this channel.
Me clicking on this vid thinking CGA stood for "conformal geometric algebra" 😭
Same, it didn't help that there were circles here too
Really been having fun watching this series. Seeing it in the hands of a pro makes all the work I put into MartyPC feel worth it. I have a number of features in mind that might be helpful for benchmarking routines.
Thanks for the very kind comment and for all your hard work on MartyPC!
Straight-up facts! Good video, thnx.
Awesome
oh that prompt ... i spent a dozen years looking at "c:\>" (until win95 -- win 3.1 still required DOS), so many days dealing with autoexec.bat & config.sys .... (my trauma from "command line" days are so huge i never touched linux OS...)
What language is that example code in? Why do you have multiple 1-bit shifts in a row in the assembly instead of combining them?
That is the language Julia. I use it because it is pretty close to pseudocode and readable enough. Note that in some videos I use a drawpixel function, which is not an actual command in Julia. I just made that up for the presentation.
To combine multiple 1-bit shifts on the 8086/8088, one had to put the shift count in the CL register. There was no multibit shift by an immediate value. The problem with using CL is that it takes up an 8 bit register which we are using for other things, and it wasn't really faster anyway. So typically, unless you want to shift by a variable number of bits, instead of a constant number of bits, you use individual shifts. Each shift by a bit can then be counted as 4 cycles, typically.
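As a rough illustration of the trade-off described in this reply, here is a small Python cost model. It is only a sketch: the clock counts for `mov cl, n` and `shl reg, cl` are assumptions recalled from Intel's published 8086 timings, the 4 cycles per single shift is the rule of thumb quoted above, and instruction fetch time (which can dominate on the 8088) is ignored.

```python
# Assumed clock counts (not measured, and not taken from the video).
MOV_CL_IMM = 4            # mov cl, n
SHIFT_BY_CL_BASE = 8      # shl reg, cl: fixed part
SHIFT_BY_CL_PER_BIT = 4   # shl reg, cl: extra cost per bit shifted


def single_shifts_cycles(n, per_shift=4):
    """Cost of n separate `shl reg, 1` instructions at per_shift cycles each."""
    return n * per_shift


def cl_shift_cycles(n):
    """Cost of `mov cl, n` followed by a single `shl reg, cl`."""
    return MOV_CL_IMM + SHIFT_BY_CL_BASE + SHIFT_BY_CL_PER_BIT * n


# Compare the two approaches for small constant shift counts.
for n in range(1, 5):
    print(n, single_shifts_cycles(n), cl_shift_cycles(n))
```

Under these assumed numbers the CL form never wins for a constant count, and it also ties up CL. The `per_shift` parameter can be raised to see how the picture changes if each single shift is costed more pessimistically.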
@@pcretroprogrammer2656 If you're stuck with an 8086, could you perhaps use some of the segment registers to store data? Only problem is that you make multiple function calls, so it'd be difficult to use ss, and you appear to have data all over making ds and cs difficult.
@@pcretroprogrammer2656 usually it's more like 8 cycles, because each opcode byte fetch takes 4
Have you looked into Bresenham's algorithms?
Sure. I implemented Bresenham's line drawing one on this channel.
The midpoint circle algorithm, which I here generalise for ellipses, is itself a generalization of Bresenham's line drawing algorithm.
There are very many versions of it on the web.
I haven't looked into any other Bresenham algorithms other than his line drawing routine and the various generalisations to circles and ellipses though. I'm aware there are generalisations to general conics, but these are pretty incomplete as far as actually usable scan conversion goes.
Do you have some specific Bresenham algorithm in mind?
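For readers who have not met the midpoint circle algorithm mentioned above, a minimal Python sketch follows. It is a common textbook form for circles only, using integer arithmetic and eight-way symmetry; it is not the code from the video, it does not cover the ellipse generalisation, and the function name `midpoint_circle` is made up for this illustration.

```python
def midpoint_circle(r):
    """Return the set of integer points on a circle of radius r centred at (0, 0)."""
    points = set()
    x, y = r, 0
    d = 1 - r  # decision variable: sign tells us whether to step x inwards
    while x >= y:
        # Mirror the computed octant point into all eight octants.
        for px, py in ((x, y), (y, x)):
            points.update({(px, py), (-px, py), (px, -py), (-px, -py)})
        y += 1
        if d < 0:
            d += 2 * y + 1          # midpoint inside the circle: keep x
        else:
            x -= 1
            d += 2 * (y - x) + 1    # midpoint outside: step x inwards
    return points


pts = midpoint_circle(3)
print(sorted(pts))
```

Only additions, subtractions and doublings appear in the loop, which is why this family of algorithms maps well onto a CPU like the 8088 that has no fast multiply.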
@@pcretroprogrammer2656 Those are the ones. I implemented versions in ASM (with Turbo Pascal calling convention) in the 90s (based on descriptions in Richard Ferraro's book "Programmers Guide to the EGA and VGA cards", Chris D. Watkins' code and descriptions [various books] ) and they were quite fast compared to the standard graphics library that came with TP 5.5. Of course I am going by how I experienced it ~30ya, would probably feel slow today.
We had to do bit plane switching for some [4 bit EGA/VGA] modes which complicated things a bit (you needed to know if you are in packed display or bit plane mode) beyond the bit masking required by having multiple pixels per byte. Of course you don't want to address a single pixel at a time (masking & switching) when drawing the horizontal runs (top and bottom 1/4).
This was the first video of yours I've watched, made me a bit nostalgic.
One of the things I really like about this code is the fact of the havoc it wrecks on the PC because of disabling the dram refresh, and it only keeps its code alive because it's the only thing run by the processor! Could you squeeze in some logic in that refreshed area to do a fancy demo?
I think this comment ended up on the wrong video, as this code doesn't do anything fancy like turn DRAM refresh off.
But to answer your question, yes you can definitely do a fancy demo effect with DRAM refresh off. There is plenty of room to do all sorts of cool things. I intend to do an example of this on the channel fairly soon.
@@pcretroprogrammer2656 Ah, I thought this code also messed with the dram because you were only using registers, but I did get confused with the previous video. But in any case, your reply is what I was looking for. Hoping to see that in the future!
poor havoc. The reason to use registers is because they're always the fastest.
I don't have enough brain bandwidth for this... 🤪
Multiplication by 50 via LEA could be faster ?
LEA doesn't have the multiplications by a constant that the 386 has for example. But you can use it for adding various registers. So far I never found a situation where I could get anything more out of it, but it is presumably possible.
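Without a scaled LEA, the usual way to multiply by a small constant is a sum of shifts. The Python sketch below shows the idea under the assumption that the constant is decimal 50 (32 + 16 + 2); the helper names are hypothetical, and nothing here is claimed to be what the video's code does.

```python
def shift_amounts(k):
    """Return the bit positions set in k, i.e. the shifts whose sum rebuilds k."""
    return [bit for bit in range(k.bit_length()) if (k >> bit) & 1]


def mul_by_shifts(value, k):
    """Multiply value by the constant k using only shifts and additions."""
    return sum(value << s for s in shift_amounts(k))


print(shift_amounts(50))       # which shifts make up 50
print(mul_by_shifts(7, 50))    # same result as 7 * 50
```

On the 8086 each term would cost a run of single-bit shifts plus an add, so whether this beats MUL depends on how many bits are set in the constant and how far apart they are.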
could something like GlaBIOS' CGA optimizations make things even faster?
I don't believe so. Drawing pixels is the only thing it really does faster, but using the BIOS to draw pixels is the last thing you want to do if you want high performance code. It is always going to be faster to write directly to video RAM and to simply update the information for drawing pixels as you go, rather than recomputing it every pixel (which is what the BIOS basically has to do).
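To make the point about updating pixel information as you go concrete, here is a Python sketch that simulates CGA video RAM as a `bytearray`. It assumes the 320x200 4-colour mode layout (80 bytes per scanline, four pixels per byte with the leftmost pixel in the top two bits, odd scanlines in a second bank at offset 0x2000). The function names are invented for this example, and it says nothing about how any particular BIOS is implemented.

```python
ROW_BYTES = 80         # bytes per scanline in 320x200 4-colour mode
BANK_OFFSET = 0x2000   # odd scanlines are stored in a second bank


def pixel_location(x, y):
    """Full address computation: byte offset and bit shift for pixel (x, y)."""
    offset = (y & 1) * BANK_OFFSET + (y >> 1) * ROW_BYTES + (x >> 2)
    shift = 6 - 2 * (x & 3)
    return offset, shift


def put_pixel(vram, x, y, colour):
    """Recompute everything for each pixel, as a general-purpose routine must."""
    offset, shift = pixel_location(x, y)
    vram[offset] = (vram[offset] & ~(3 << shift) & 0xFF) | ((colour & 3) << shift)


def hline_incremental(vram, x0, x1, y, colour):
    """Compute the location once, then step the offset and shift per pixel."""
    offset, shift = pixel_location(x0, y)
    for _ in range(x0, x1 + 1):
        vram[offset] = (vram[offset] & ~(3 << shift) & 0xFF) | ((colour & 3) << shift)
        if shift == 0:
            shift = 6      # moved past the last pixel in this byte
            offset += 1
        else:
            shift -= 2


vram = bytearray(0x4000)
hline_incremental(vram, 2, 9, 4, 1)
print(vram[160:163].hex())
```

Both routines produce the same bytes; the incremental one simply avoids redoing the multiply, bank test and mask setup for every pixel, which is the kind of saving the reply above is describing.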
I bet it would be interesting to see the difference in speed between direct writes and using the BIOS routines to draw pixels. That and benchmarking the different BIOS ROMs doing it would be neat too.
@@Hiphopasaurus From experience, the standard BIOS routine is VERY slow, and the accelerated routines are only a couple of times faster.
OSU
Ellipses are like communist rectangles