Join us on Discord: discord.gg/jmf6M3z7XS
Follow me on Twitter: twitter.com/WeirdBoyJim
Support the channel on Patreon: www.patreon.com/JamesSharman
I've been following this project from the start. When I look at your doomed demo, it reminds me how amazing this project really is. The whole thing built from discrete logic components - you sometimes forget this - and you've got this 3D game running over VGA?!? Bonkers. So much kudos to you, James.
Thanks! Hopefully I’ll impress you a couple more times by the time I’m done. 😅
Without you mentioning it, I wouldn't have recognized the corruption at all. Nice hunt, and congrats on the success.
Thanks Peter! When I started editing I was worried I had wasted my time with the whole process as it looked like nothing was changing in the video captures!
Yeah. The screen capture *really* highlights it. Was that a tremendous effort to edit into the video? It would be helpful in future scenarios like this. (The glitches are crystal clear, even watching this episode on a mobile.)
And a really cute transition to move the picture-in-picture crop of the silver display back into the scene too 😀
eh it's not too much effort but it's more than none ;) @@wallyhall
Nice to see more frequent uploads from you again. Enjoying following along. Amazing work James.
Thanks Marc! Feel like I’m on the home stretch of this build now!
@@weirdboyjim what's in the pipeline when the build is complete? (See what I did there? 🤭)
You're getting there, James! Merry Christmas and a Happy New Year to you
Thanks, you too!
Another great video James! It's always interesting watching the troubleshooting side of things and the problems you come across.
The project has come along so far and it's absolutely amazing what you've achieved!
Bonus points for the smooth video transition of your monitor at 9:17 😊
Hah! Glad you appreciated that. I was using the screen portion of the second feed, but it looked really weird when I faded to that feed full frame. That was the only solution that felt right.
Love watching your material. I give a thumbs up to every video.
Thanks Alex, good to hear you are enjoying!
Without question, hands down, the best series of this type. Sorry other makers. Love this one.
Glad you enjoy it twobob! Very kind words!
It's so accessible and real. Some makers create, sure, amazing 8-bit systems but skip the guts and glory; others make systems that are perhaps beautiful or structurally tiny or indeed just nostalgic.
You strike a good balance between showing every single step and thought and just doing a montage. In addition, whilst using ostensibly archaic and limiting packages, the drive to use newer architectural and programmatic paradigms really shines in your results @@weirdboyjim
It's win win for me.
and no Im not fixing the typos :P
A wonderful post as always! Love to see the progress. The way you walk through your problem solving is wonderful motivation for me to work on my own projects :)
Thanks! Good to hear you are enjoying it!
As always James... this project is very inspiring. I'm mid-journey through my own 8-bit project. Mine isn't pipelined, but it does have interrupts and port selects. You could speed up the CPU by using faster UV EPROMs... Jameco has some 28-pin DIPs rated at 70ns access time. These are twice as fast as the typical EEPROM rated at 150ns. Keep up the great work. It's a very beautiful project and I look forward to the game development.
I have some UV EPROMs with a 55ns access time but the pinout is different. Atmel did do some OTP ones with this pinout that were faster but they haven’t been in stock for a while.
Or copy ROM contents to SRAM before the CPU comes out of reset. Even 10ns SRAM is pretty cheap. Thus starts the task of bottleneck chasing though, and that is something you can never totally finish.
@@Zadster Copying the microcode during a reset would require a mess of hardware that is way out of scope. But they do make NVRAMs... they have the speed of static RAMs but also have 10-year minimum memory retention, though the best they can do is 70ns, like the Analog Devices DS1230AB-70IND. Mouser has these in stock and they are JEDEC-standard 28-pin DIP packages. Or go with the SST39SF010A 128k x 8 Flash from Microchip at 55ns; they also conform to JEDEC-standard pinouts for x8 memories, but those are 32-pin DIPs... the JAM1 is using 28-pin.
In his older video he did scope out what was preventing the CPU from running faster, and the ROM was running at 100ns or less in practice, and wasn't the first limiting factor, yet.
@@mikafoxx2717 The EPROMs are not rated faster than 150ns in his case. CPU propagation delay from instruction fetch to output has the EPROMs as the biggest limitation to speed. Every IC in the chain adds a minimum of 10ns, so there will be a limit. My guess is that 5 MHz is the fastest it can go with the fastest EPROMs available (55ns) before it gets unstable. It is a remarkable build and has pushed the limits of what a discrete logic chip CPU can do. It already has vastly more processing power than the 8-bit 6502-based Ricoh CPU used by the Nintendo NES, which maxed out at 1.8 MHz.
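Purely to make that timing guess concrete, here's a back-of-the-envelope sketch in C. The chip count and per-chip delay in the fetch-to-output chain are assumptions for illustration, not measurements of the actual JAM-1 critical path.

```c
/* Back-of-the-envelope critical-path estimate for the comment above.
 * The chip count and delays are illustrative assumptions, not measured
 * values from the real build. */
#include <stdio.h>

int main(void) {
    const double t_eprom_ns = 55.0;  /* assumed fastest available EPROM */
    const double t_chip_ns  = 10.0;  /* ~10ns minimum per 74-series IC */
    const int    n_chips    = 10;    /* assumed ICs in the fetch-to-output chain */

    double t_path_ns = t_eprom_ns + n_chips * t_chip_ns; /* total propagation delay */
    double f_max_mhz = 1000.0 / t_path_ns;               /* 1 / t, expressed in MHz */

    printf("critical path ~%.0f ns -> max clock ~%.1f MHz\n", t_path_ns, f_max_mhz);
    return 0;
}
```

With those assumed numbers the budget works out to roughly 155ns, i.e. a ceiling somewhere in the 5-6 MHz region, which is in line with the guess above.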
Nice little trick to stretch the display of the last pixel while updating the framebuffer. I don’t know how close you are to your memory bandwidth limit, but a somewhat less complex solution than a write FIFO for updates would be a data and address latch for write operations from the CPU to the framebuffer, which the display circuit then uses to interleave the "cached" write accesses between the read accesses. I’ve done it with 10ns cache SRAMs as well, simply doubling the bandwidth by doubling the data width to 16 bits and reading 16 bits at a slower rate to interleave writes with 55ns SRAMs in a half-resolution SVGA output.
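A tiny C model of that single write latch, just to make the interleaving concrete; the framebuffer size and function names are invented for illustration and aren't taken from the actual circuit.

```c
/* Sketch of a single-entry write latch: the CPU parks a pending write,
 * and the display circuit retires it in the gap between pixel reads.
 * All names and sizes here are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

#define FB_SIZE 0x4000
static uint8_t framebuffer[FB_SIZE];

static struct {
    bool     pending;
    uint16_t addr;
    uint8_t  data;
} write_latch;

/* CPU side: drop the write into the latch instead of hitting RAM directly.
 * A real design would have to stall the CPU if the latch is still pending. */
void cpu_write_pixel(uint16_t addr, uint8_t data) {
    write_latch.addr    = addr;
    write_latch.data    = data;
    write_latch.pending = true;
}

/* Display side: called once per pixel slot. Read the current pixel, then
 * retire the latched write before the next read needs the RAM again. */
uint8_t video_pixel_slot(uint16_t read_addr) {
    uint8_t pixel = framebuffer[read_addr];
    if (write_latch.pending) {
        framebuffer[write_latch.addr] = write_latch.data;
        write_latch.pending = false;
    }
    return pixel;
}
```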
I might have to do a whole video on all the things that could be done. Past a certain point I’d just redesign the circuit. The mod I present here improves things with just one AND chip.
@@weirdboyjim And that’s the beauty of this simple fix, but it only goes so far. While it works here because you have a large, somewhat homogeneous image, it doesn’t work so great for games like Pac-Man, pretty much any platformer, or any screen where you have high contrast between small foreground objects and single-color flat backgrounds, where you can see those stuck colors that run multiple pixels past where they’re supposed to end. Otherwise I can only agree, it’s a simple and quick solution in this specific case. 👏
@@0toleranz Ahh, but Pac-Man I can easily code to update during blanking and not have any issues; that was what I was expecting to do a lot of.
I appreciate that you didn't want such a complex task for your build at the time, and back then the doomed demo wasn't written, so the trade-off was reasonable. However, now that the problem is visible, it might be time to reassess whether it is possible to crowbar in the sync circuitry. As always, it is your project and I am just along for the ride, so that is your choice, obviously.
It's not so much an issue of it being a "complex" task; if I wanted arbitrary writing I would have designed the circuit differently. There will be a future VGA build where I build it to work like that from the ground up.
Thanks James. Happy Holidays. All the best for you and your family in 2024.
Same to you! Jerril!
Nicely done, James! Loving the project, and the Demo looks great! Happy New Year 🎉
Happy new year George! I'm hoping this will be the year I finally finish Jam-1!
@@weirdboyjim Yes, it will be great to see it finished! Been following the project for a long time! I don't know how you find the time for it!
Keen to see what you've got in store for the JAM-2 😂
Once again a great video!
Thanks Chris!
Man I love raycasters. Nice work, James!
This one isn’t actually a ray caster, but I may turn it into one to make the levels simpler to design.
Hey James, thanks again for another great video! Really loving this series. Do you plan to go into detail on how you've implemented the "doom" code? I'd be really interested in that too. I felt the same about some of the effects you've programmed, like the 3D looking bar that bounced up and down the screen when you first got vertical scrolling going. You said you'd used a sine (?) lookup table … I'd love to know _how_ that works. A deeper dive on the approach/design/techniques you're using would be greatly instructive. 🙏 It's one thing to know _that_ it works, it's another to know _how!_ Also, how you map that technique (like the bar) to your hardware … 🙏 Thanks again! Lovin' it! 😀
When I showed the first version of the demo I did show this on the extras channel: ua-cam.com/video/sC3Issh5cPQ/v-deo.htmlsi=w_Q08MN_uNstZwHE
Ahh, thanks @@weirdboyjim. I don't know why (other than that I haven't yet looked) but I hadn't even realised there was another channel! Thanks!
Lookup tables work in a similar way to the old log tables used in schools back in the day. For a given value you can calculate an offset in the table very quickly and retrieve the value. Old games used this technique for many things allowing expensive calculations to be replaced with simple and fast lookups.
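For anyone who wants to see the shape of it, here's a minimal sine lookup table in C, the kind of thing old games and demos used; the table size and scaling are arbitrary choices, and on real retro hardware the table would simply be baked into ROM.

```c
/* Minimal sine lookup table: one index calculation and one memory read
 * replaces a slow runtime sin() call. Size and scaling are arbitrary. */
#include <math.h>
#include <stdint.h>

#define LUT_SIZE 256
static int8_t sine_lut[LUT_SIZE];

/* Build the table once (a retro system would store this in ROM instead). */
void init_sine_lut(void) {
    const double pi = 3.14159265358979323846;
    for (int i = 0; i < LUT_SIZE; i++) {
        sine_lut[i] = (int8_t)lround(127.0 * sin(2.0 * pi * i / LUT_SIZE));
    }
}

/* Runtime lookup: the 8-bit phase wraps naturally, no expensive maths. */
int8_t fast_sin(uint8_t phase) {
    return sine_lut[phase];
}
```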
Thanks @@schrodingerscat1863, I do know how lookup tables work; I was using that as an example. What I was really trying to ask was what he's doing _with_ those values that results in the behaviour in the video. IIRC, he said he was doing it with vertical scrolling. That wasn't enough information (for me) to know _how_ he was manipulating the vertical scrolling register to achieve the effect (and to be clear here, I don't mean storing the value in the register … I know he was doing that too). For example, at what time was he changing the register value (I assume on a particular scan line)? How did he know, in software, which scan line was being affected? (I don't remember him saying there was a register available to the CPU for that, but I may have just missed it.) How did he make sure that it happened in the front/back porch so that it didn't take effect in the middle of a scan line and corrupt the display? And so on.
@@rogerramjet8395 If you go back through the series, all of this is discussed in detail. The demo being shown here is not a traditional ray caster; there are some other tricks being used. Again, I think he goes through this on his second channel. As far as corruption is concerned, he is keeping track of writing to the screen in software. Because this demo is demanding he isn't able to do that, though, which is why there is some screen corruption. The fix basically detaches the video buses while the processor has control of them, meaning the existing pixel is repeated on the output until control is returned to the video address generator. Several 8-bit systems back in the 70s/80s employed tricks like this to reduce hardware complexity.
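A minimal sketch of the "sync your writes to blanking in software" approach described above, in C; the vblank_active() flag and write_vram() helper are hypothetical stand-ins, not necessarily how the JAM-1 exposes its video hardware.

```c
/* Software-synced screen update: wait for vertical blanking, then do all
 * framebuffer writes while the display isn't reading. The helpers below
 * are hypothetical placeholders. */
#include <stdbool.h>
#include <stdint.h>

extern bool vblank_active(void);                  /* assumed readable blanking flag */
extern void write_vram(uint16_t addr, uint8_t v); /* assumed framebuffer write */

void update_during_blanking(const uint8_t *pixels, uint16_t base, uint16_t count) {
    while (!vblank_active()) { /* spin until the display enters vertical blank */ }
    for (uint16_t i = 0; i < count; i++) {
        write_vram(base + i, pixels[i]);          /* safe: no contention with reads */
    }
}
```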
Awesome progress! ...cheers.
Thanks Andymouse!
Auto sub because I really love videos like this. Nerdy but very fun and enjoyable.
Welcome! Hope you continue to enjoy!
I realize that's just a demo, but that looks so much faster than I remember my 386SX-25 running Wolfenstein 3D.
I do my best, but I imagine your 386 was running at a higher resolution?
@@weirdboyjim Yeah, I think it was running at standard VGA resolution, and of course it was generating enemies for me to kill, but all things considered yours is still impressive, and the only homemade pipelined CPU I know of.
For the NES/SNES, I believe the CPU writes to work RAM and this is then DMAed to the VRAM during the vertical blank. There are still timing constraints, but it may give your CPU more time. Indeed, it is a way more complicated circuit than the one-chip solution, but perhaps you just need another RAM chip and some counters to do the DMA?
That sounds a lot like the FIFO solution I discussed in an earlier video. I need to reiterate that it's not something I feel the need to do. The final form of this will have several games and demos that don't show any issues (they will run at 60fps and have all their updates synced with blanking in software), plus one demo/game based on the doomed work that will show some issues. It will be an interesting discussion point both then and in the future builds where I do a more sophisticated memory interface.
@@weirdboyjim Fair enough, it is amazing you can get it to do a doom renderer.
Just thinking: if your CPU speed is bottlenecked by the access speed of the signal ROM, then you could use a similar technique where you replace the ROM with fast RAM. During the reset sequence you can copy, via DMA counters, from the slower ROM into the fast-access RAM.
I have an unrelated question: after the palette memory you have both a 541 and a 574. Why do you need the 541 if the 574 can latch the palette colour while the palette RAM is being written to?
If the memory can run at a higher clock then double porting by multiplexing the address and data lines might be possible. Some buffers, a clock divider and "done".
Unfortunately it’s not fast enough for that. But I think that route is the right way to go in future builds.
Still amazing how powerful this is compared to the popular processors of the time.
When you do move on to the next project, RISC-V really does look like it has quite simple instruction decoding; maybe you could implement the RV32E core with its 16 registers more easily. It would allow for easily compiled code, but it might get difficult to get all those registers and bits going. You would need to go for more advanced chips, implement registers in fast RAM, or such.
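To illustrate why RV32 decoding is considered simple: every 32-bit instruction keeps its register fields in fixed bit positions, so decode is just shifts and masks. A minimal C sketch for the R-type format (RV32E uses the same encoding, just limited to registers x0..x15):

```c
/* RV32 R-type field extraction: fixed bit positions make decode trivial. */
#include <stdint.h>

typedef struct {
    uint8_t opcode, rd, funct3, rs1, rs2, funct7;
} rv_rtype;

rv_rtype decode_rtype(uint32_t insn) {
    rv_rtype d;
    d.opcode =  insn        & 0x7F;  /* bits  6:0  */
    d.rd     = (insn >> 7)  & 0x1F;  /* bits 11:7  */
    d.funct3 = (insn >> 12) & 0x07;  /* bits 14:12 */
    d.rs1    = (insn >> 15) & 0x1F;  /* bits 19:15 */
    d.rs2    = (insn >> 20) & 0x1F;  /* bits 24:20 */
    d.funct7 = (insn >> 25) & 0x7F;  /* bits 31:25 */
    return d;
}
```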
I'm very pleased with how it's turned out! Actually implementing a basic RISC-V core would not be that difficult, but doing a pipelined one would become a very large build just from making everything four times wider.
@@weirdboyjim Yeah, that's very true. Either way I loved that you followed through with a full pipeline for this build. It taught me more about the ways modern processors optimize than anything else. Makes everything so tangible. It would be fun to find out which processor finally matches yours for performance per clock.
Nice hack, very cool. Probably out of scope, but how about an extra bank of RAM (if you can face the extra bus required)?
That would be one way but it would require a lot of extra supporting chips to make it work.
Would it be possible to detect when there's a conflict and then bypass the RAM by using an AND gate on the video and CPU address lines? Then using two 74LS245 chips you could have the AND gate disable output from the RAM onto the video data bus and enable the CPU's output onto the video data bus. The video chip can then read directly from the CPU. You could take it one step further, if timing is an issue, by having the CPU always write video data to a latch chip, which is then copied to RAM. Allow the video chip to read RAM when there is no conflict and read from the latch chip when there is a conflict. If the latch chip is ready to write to RAM then the data is stable and could be read safely by the video circuit.
I was going for a minimal tweak to notice the situation in this case, but remember that the circuit was not designed to do this. I have plans for a future build that will be more flexible in this regard.
Extending the pixels is a clever way to make the graphics look cleaner. The difference is very noticeable. I rewatched the frame buffer video and realised that my solution wouldn't have worked anyway because all RAM addresses have contention. It's such an interesting problem to solve. @@weirdboyjim
I can't help but wonder if there is a more elegant way to delay a signal than how we have done it across the build. It absolutely works, but I do wonder.
Nothing wrong with using the 574 like a shift register (like I did here). That’s creating a delay of an exact number of clocks. Having the chained gates I did in bus control is something I’d rather design better in future.
My question: is it possible to have a write buffer and a VBlank buffer, where the VBlank buffer is written only on the rising edge of the write, sourced from the write buffer rather than the signal directly, and clocked by the VGA clock rather than the CPU clock?
Yes! That was one of the possibilities I’ve talked about before, but there would be quite a few components to that. The goal would be to queue all the writes; you could flush most of them out in hblank.
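A minimal C sketch of that write queue: pushed by the CPU, drained into video RAM during horizontal blanking. The depth and the two helper functions are hypothetical placeholders, not part of the actual build.

```c
/* Bounded FIFO of (address, value) writes, flushed while the display is
 * in horizontal blanking. Names, depth and helpers are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 16
static struct { uint16_t addr; uint8_t data; } fifo[FIFO_DEPTH];
static uint8_t head, tail;

extern bool hblank_active(void);                  /* assumed blanking flag */
extern void write_vram(uint16_t addr, uint8_t v); /* assumed VRAM write */

/* CPU side: queue a write; returns false if the FIFO is full. */
bool queue_write(uint16_t addr, uint8_t data) {
    uint8_t next = (head + 1) % FIFO_DEPTH;
    if (next == tail) return false;               /* full: caller must wait */
    fifo[head].addr = addr;
    fifo[head].data = data;
    head = next;
    return true;
}

/* Drain side: retire queued writes only while the display is idle. */
void flush_in_hblank(void) {
    while (hblank_active() && tail != head) {
        write_vram(fifo[tail].addr, fifo[tail].data);
        tail = (tail + 1) % FIFO_DEPTH;
    }
}
```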
@@weirdboyjim The system writes slower than the VGA can read it. In theory you can store the address and the value for one CPU clock and read it with the faster VGA clock, which can then reset this address-and-value store. This only works when the CPU is slower than the VGA. It is a few more components than delaying the VGA clock, but it makes the reads and writes more predictable?
@@PpVolto Yeah, not something I can do as I’m already driving the RAM close to its limit.
I imagine if you implement sprites that can be as tall as the screen, you could then render doom using them as big vertical tiles and avoid the corruption, if the sprite definition memory uses dual-ported RAM :)
Nice idea but you would need as many sprites on a scan line as you want horizontal resolution.
@@weirdboyjim Well, if they were 8 pixels wide each you’d only need horizontal res / 8. Pretty sure the old Neo Geo did this for its backgrounds. Whatever works though!
Had you considered just interleaving the CPU and VGA memory access as was done on the C-64?
That’s a common way of doing it, but I’m running those chips close to their max rate. I’d need to degrade the video to make it work.
I think a set-reset latch would have been better, as it adapts if the timing goes off for some reason. It creates about the same clock suppression but is more exact in timing to what it needs to be to get rid of any effect. Also, it is hardly any more complex than adding a delay, because the AND gate was not enough on its own.
Not sure I want to flip-flop on the issue 🤣
Is double buffering not possible? Using two physically separate banks of video memory with two buses, so there is no contention. Alternatively, it shouldn't need a massive buffer to write changes, just alternating access to the video RAM in a similar way to how the BBC Micro did it. It would need a synchronising circuit, and both the write and read channels would need single-layer buffers, which does get rather involved, so I can understand why you wouldn't want to go down that rabbit hole.
The tweak I do here only adds one AND chip to the build. Double buffering with separate bus driving from video & CPU is a reasonable design if that’s your goal, but it wouldn’t be my preference if I were designing a circuit from scratch to have arbitrary memory access.
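For completeness, a minimal C sketch of the double-buffering idea from the question above: the CPU draws into the back bank while the display scans the front one, and they swap during vertical blank. The bank-select and blanking helpers are hypothetical.

```c
/* Two full framebuffer banks with a swap during vertical blank.
 * set_display_bank() and vblank_active() are hypothetical placeholders. */
#include <stdbool.h>
#include <stdint.h>

#define FB_BYTES 0x4000
static uint8_t bank[2][FB_BYTES];
static int front = 0;                          /* bank the display is scanning */

extern bool vblank_active(void);               /* assumed blanking flag */
extern void set_display_bank(int b);           /* assumed hardware bank select */

/* CPU draws only into the hidden bank, so there is never contention. */
void draw_pixel(uint16_t addr, uint8_t v) {
    bank[front ^ 1][addr] = v;
}

/* Swap banks while the display is idle, then start drawing the next frame. */
void present(void) {
    while (!vblank_active()) { }
    front ^= 1;
    set_display_bank(front);
}
```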
Great vid! I am trying to build a 6502 computer.
Thanks!
cool vid, thanks!
Glad you liked it!
It looks like you have a classic synchronous-logic cross-clock-domain issue, don't you?
Well, that and no spare bandwidth to memory.
Will you still be doing sprites?
Yes, I will be. The sprite hardware will be 3 modules, the last of which I'll add to the VGA.
Wouldn't dual port memory make this easier? It turns out such memory is available.
Available but very expensive! Suitable parts are about 20x the price of the RAM chips I'm using here.
Almost no chip included two-ported memory (for cache) even though the designers are free to do what they want. And I wonder what generic two-ported memory looks like! Memory arranges cells in a matrix, so for two ports you would basically keep all the bit and word lines, but half the bit cells. So two addresses can lead to the same bit.
Why not just buy video RAM? A normal matrix of DRAM cells; only the SRAM behind the sense amps is duplicated (and it's 1D anyway).
@@ArneChristianRosenfeldt Two port memory was used for video display, not cache. This allowed the video display hardware to access the video RAM without having to halt the CPU.
@@richardkelsch3640 I only found one datasheet for two-port memory. It was used to interface between two clock domains with random access. A collision on a cell could happen, and it is your responsibility to avoid this.
Video RAM only has a single clock for the DRAM part. Every DRAM has internal SRAM to be able to refresh a row. Video RAM has two SRAMs on chip, but each of them only has a single port. Cost always weeded out large true dual-port memory. I mean, put some video RAM on a breadboard and show me read-after-write consistency. I think Michael Abrash did that for self-modifying code to measure the instruction queue on 8086 derivatives.
@@ArneChristianRosenfeldt That's too bad, as dual-port memory in the 1980s had two address buses and two data buses. They allowed for easy interfacing of display memory for computers on a local bus. There were some limitations, but not so bad as to require halting the CPU.
One bus (let's call it primary) allowed full control of the memory cells, both read and write. This was typically what you connected to the CPU side. The other bus (secondary) had unlimited read access, regardless of the primary bus state. Some devices were secondary read only and some had write semaphore access.
Electronically, think of the secondary bus as merely an additional transistor tap off of the memory cell to read data. Write devices had to have an additional secondary bus output to induce a wait state should the primary bus be writing to the chip.
Most display hardware of the time were not intelligent and usually just framebuffers, so secondary bus read-only was reasonable for the display hardware to use.
At the time these chips were sold, they were expensive as they required many extra pins to have isolated bus signals and a slightly larger die size.
Why do I think that in the foreseeable future we will have a VGA board v2.0? ...Why, why, I wonder... ;)
There will be a new VGA board developed alongside the sequel cpu but not until this project is fully finished.
@@weirdboyjim A sequel‽ :O
Now would it be an incremental step like from the NES to the SNES or 8080 to the 8086, or might it go for a completely different architecture, like, say RISC-V perhaps? (To be completely honest, I've wanted to see a series in this style that uses a RISC-V architecture since I first learned about this series.)
@@angeldude101 Jam-2 will be its own thing; it will explore some new architectural concepts and will be much more powerful!