I had a MicronPC 486 that I put an AMD 5x86 in. It had 15ns DIP cache chips. It would *almost* run at a 50 MHz bus speed; with the L2 disabled, it would run. But even with the AMD CPU 'hotwired' to force the 4x multiplier, it was really slow at 200 MHz without L2 cache. So I set the bus speed to 40 MHz for a 160 MHz CPU speed with L2. The 10ns cache chips it needed for a 50 MHz bus were too expensive. The 4x multiplier hack for the AMD 5x86 in a 486 socket was very simple. The multiplier pin is right at an edge, and there is only one pin between it and a Vcc pin. I'd take a short piece of fine wire, wrap it around the 4x pin, bend it out and around the next pin so it didn't touch, then wrap it around that Vcc pin. That particular PC was the only 486 I ever had with a 50 MHz bus speed option, so I never had another chance to try for a 200 MHz AMD 5x86 before I moved to Socket 7 systems. It also had three VLB slots. The 5x86 was supposedly a 3.3V CPU, but I ran many of them on 5V without any problems, as long as I used a good heatsink with a fan and white thermal compound. I would also file the bottoms of the extruded aluminum heatsinks flat. Temperatures were much lower than with the literally blisteringly hot Cyrix 5x86: I once got an instant blister on a fingertip when I barely brushed against one of those Cyrix CPUs' heatsinks.
I was wondering if you ran CACHECHK after installing the COAST and compared the results. Also, the motherboard may need a BIOS update, if there ever was one available for it. The issue with the "faster/fastest" settings is a flag that something is amiss.
Love the Acorn Electron on the shelf. From the onset of the video I was wondering if, with modern chips and the sponsor PCBWay, we'll get hand-made modern cache modules that are a lot, lot bigger. But 512KB is probably the max.
At the time, my friends and I had 486 100MHz machines and Quake ran very slowly. One of my friends bought a Pentium 66 and had an FPS advantage. I installed Linux on my 486 and, noticeably, Quake ran as well as on the Pentium.
If I recall correctly, most boards can't actually use both onboard and COAST cache together; once you insert the COAST module, the onboard cache may be disabled. It would be good if you had shown whether the board actually uses 512K or not. However, I think you would benefit from a 512K stick, and then maybe you could have used the fastest setting in the BIOS too.
The addition of cache to processors bringing a big performance jump from 486 to Pentium is quite interesting. But increasing the external cache from the 256K to the 512K COAST surprisingly didn't scale. My guess is that the tests performed did not make full use of the 512K cache. From a different perspective, it's fascinating how much more reliant CPUs have become on cache since the Pentiums of the 90s. Watching some of Phil's videos where he disables the CPU cache (e.g. on a Core 2 Duo or Athlon 64) to get performance closer to a Pentium or 486 really reveals how crippled CPUs are with more limited amounts of cache, especially when the fastest L1 cache is messed with. While watching the video I remembered that the external cache was eliminated with the Pentium II, since Intel put the cache chips beside the CPU inside the cartridge, and it was later integrated into the CPU itself with later Pentium IIIs, much like modern CPUs do it. I do wonder if a compression/decompression test with 256K vs 512K external cache would produce a more noticeable difference, like how AMD's Ryzen X3D CPUs get a big performance boost from the extra L3 cache in compression/decompression workloads.
Along with compression/decompression, compilation workloads may also scale better, like they do with more modern CPUs. The extra cache helping those two workloads was why I went from a CPU with 16MB of L3 cache to one with 32MB, and the performance boost was quite nice, along with improved system responsiveness. Anyways, that's it for me and my rambles haha.
Oh, I hated those. Between compatibility issues and bad slots... well, not really a bad slot. But I remember more than one machine that would lock up or BSOD. Tap around the board and, boom, it was the cache on the stick. Luckily, after a short period of time, the cache was soldered to the motherboard. I remember reading that was one of the reasons why they did that.
I think the reason the COAST didn't help is because you were looking at things that were more CPU or GPU dependent rather than memory dependent. Something like CAD software or other heavily memory bottlenecked programs could show a much bigger improvement, assuming the rest of your system isn't presenting a bottleneck. In Quake, it's clearly the rest of the system that's the bottleneck
So, for the general public that purchased these machines to run games and office applications and to delve into the internet on the first Pentiums, that extra cache was indeed a waste of money better spent elsewhere: more RAM, a faster GUI accelerator, a bigger HDD, a faster CD-ROM, a better sound card, hell, even a MIDI module. Maybe you would need to get into very expensive CAD/CAM or database stuff to realize any gains from a bigger cache. I have read that for a time Pentium Pros with 1MB weren't replaced with PII 300s, as those had only 512KB of cache running at a slower speed.
@@RetroTinkerer More or less. However, Quake indicated that it was heavily CPU-limited on the test system. GLQuake would probably have run faster than software rendering with a Voodoo GPU installed. Once those things were addressed, the COAST module might have made a bigger difference, maybe.
@@briangoldberg4439 Maybe, but I wonder if people installed a Voodoo 1 in sub-166MHz Pentiums or MMXs. I think I had a Cyrix 6x86 PR150, but I upgraded my CPU on that platform so frequently back then that I'm not sure how much time passed until I got my Pentium Pro 200 (all with the same Verite V1000 + Voodoo1). Most gaming builds were using Pentium MMXs. It would be an interesting experiment nonetheless.
I have the ASUS P/I-P55TP4XE motherboard with 256K of soldered cache. I also found a COAST module for it, but the price seems a bit expensive (180€ NOS). I use 64M of 60ns EDO RAM. Would I benefit from using 512K of cache vs 256K with 64M of RAM? How about if I install only 32M of RAM? What is the relation between the amount of installed RAM and the amount of cache needed for optimal performance? I searched but got conflicting information. Edit: the small difference between 256K and 512K of cache is mostly due to the small amount of RAM you have. For those 16M of RAM, maybe 128K of cache would give similar results to those you got with 256K. Try maxing out the RAM and then see the difference (redo the 512K cache tests). Not sure about your motherboard, but some maxed out at 64M, although most Socket 7 motherboards I know supported up to 128M of EDO RAM.
Interesting! I think I have a similar slot on the Apricot VS340. I suspect it may be for this purpose. Looking at the lack of performance boost you got I don't think I'll bother hunting down a compatible cache stick 😄
COAST modules always seemed a bit disappointing compared to motherboards with DIP SRAM chips. You'd think if they were going to introduce a new form factor, and the modules were going to be rather expensive, they'd have worked harder to optimise the speed. When the PII came out they managed to get the off-die cache to run at half the speed of the CPU, at the cost of having to package the whole lot in a slot rather than a socket. Interestingly, the Pentium Pro had separate CPU and cache dies packaged in an oversized socket, and the cache ran at the CPU clock speed. I guess the PII's PCB was cheaper than the slightly exotic multi-chip module the PPro came in.
Nice reminder of how bad Quake was on a 486 (486 "DX2" 100 in my case, 2x50MHz, as I was experimenting with low-budget overclocking in my poor times; AFAIR it was ~10-11 FPS in my case).
This matches my own benchmarking. Back when I was a kid I lamented not having maxed out L2 cache. Now I'm glad my family didn't waste money on it. It's really not a noticeable difference except in very specific cases.
Back in the day, as far as I can remember, it was a better deal to get a newer and more powerful Pentium CPU and sell your old one used on a local BBS than to buy a cache module.
Indeed, I've certainly seen all sorts of instability attributed to dodgy modules in the past - not to mention the fact that using the wrong one could fry the motherboard 😅
If you want to see a fun thing with cache, take a Pentium 166MHz and compare it with a Pentium MMX 166MHz (which doubled the L1 cache); it made a decent uplift.
I have a few hunches here: (1) most games optimized for the 8K L1 the Pentiums had; (2) if (1) is wrong, then they optimized for smaller L2 caches, and thus wouldn't be hitting 512K. (3) The motherboard's cache implementation may be rather stupid and directly mapped to physical pages, so unless you have the board completely maxed out... it probably doesn't even use the COAST module. This seems the most likely case. (4) Your GPU might be holding you back here. Try something with really good VGA and OpenGL performance? But realistically, test to see if it's the bottleneck. The performance in Quake is what gives me the hunch on this; Quake should have scaled better.
I remember that time, and when I went to buy my last Pentium I system, the seller made clear that unless I installed an exaggerated amount of RAM for the time, the additional cache module was not justified at all. I'm not surprised by the result at all. Perhaps in a system like the latest MMX running a true 32-bit OS like Windows 2000 things would change, but in Win95/98/Me that stick is a waste of money. At that time the real bottleneck was the horrible chipsets sold on the cheapest motherboards, like the infamous PCChips ones, which slowed down any processor you put in them.
I've noticed that your video has multiple audio tracks for different languages. Did you add them yourself, or is it some AI doohickey YouTube has forced in?
Yeah, that's a new YouTube feature. Not sure how I feel about it to be honest, but I've left it enabled for now as I've heard that the audio tracks are pretty accurate, at least.
As a computer scientist (at least that's what my 20-plus-year-old degree says) who mostly works in some form of IT, I can say that was a damn fine high-level explanation of cache and other memory. I tip my hat to you :)
That motherboard was bad, but at least it wasn't using fake cache, which was common. There was no way to tell it apart except by benchmarking, and it fooled a lot of people. Having no ability to disable the onboard cache and run with just the COAST module at its fastest BIOS settings is bad, though. Also, Quake really is just an FPU demo, which doesn't benefit much from a large cache. Prime95 with small FFTs really is the bench to go to.
Too bad you lack the education to know how to test COAST properly. You should have done some I/O tests, like a copy from one physical disk to another physical disk. Knowing the difference in purpose between L1 and L2 cache would have gone a long way toward helping you understand what you were even testing.
One thing kind of outside of the scope of this video but worth a mention: COAST modules could be a bit of a minefield wrt compatibility, with different revisions and different manufacturer-specific implementations, so if you're in the market for one for an older system like this please do consult the motherboard manual. I've heard that in extreme cases using the wrong one can damage your motherboard, although in practice I've never come across this.
Remember the boards with fake onboard cache chips and a COAST slot?
Did you try a coast module alone, w/o the onboard enabled cache?
Yep, the Socket 3 ECS M919 has a proprietary slot that only accepts a special module. I have heard there are several versions of the motherboard and modules, each incompatible with the others 😂
I actually did have a motherboard become damaged due to an incompatible COAST module I installed in it, several years back. Sadly I don't recall any specifics as it was all hardware I was playing around with at the office.
The onboard cache memory speed is different from the COAST module's: 8 ns versus 15 ns. This could affect the performance.
The lack of a significant difference in the actual benchmarks (as compared to the tests with Doom and Quake) when comparing 256k and 512k of cache actually makes a lot of sense, and kind of highlights why benchmarks need to be tuned properly for what they’re trying to simulate. It’s very likely that 100% of the code and most of the data being operated on fit comfortably in less than 256k of memory, in which case 100% of it would fit in the 256k of cache. And if that’s the case, then you would logically expect to not see any practical improvement with 512k of cache, because the extra resources would never get used by those benchmarks.
Of course, that’s not a level of detail most prospective PC users in the mid 90's would know or care about, but it is an effect that’s still visible today in some cases (usually when testing high-performance code on CPUs with significantly different L1 cache geometry) and can really surprise programmers who somehow don’t have a good low-level understanding of how a computer works.
The other thing to take into consideration is that the tests and games are non-multitasking loads. As you stated above, most everything is likely sitting inside the cache. However, under loads with a great deal of context switching, the more cache the better, in quite noticeable ways.
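Both points above can be illustrated with a toy direct-mapped cache simulator: a single program whose working set fits in 256K gains nothing from 512K, while two tasks taking turns can push the combined working set past 256K. This is a hypothetical sketch, not a model of this board: it assumes 32-byte lines, a direct-mapped cache with no L1 in front of it, and two buffers placed 256 KB apart (an arbitrary choice that makes the conflict easy to see).

```python
"""Toy direct-mapped cache model: does a bigger cache help a given working set?

Simplifying assumptions (not a model of any real chipset): 32-byte lines,
direct mapping, every access is a read, no L1 in front of the L2.
"""

LINE = 32  # bytes per cache line (assumed)
KB = 1024


def hit_rate(cache_bytes: int, trace: list) -> float:
    """Replay a list of byte addresses through a direct-mapped cache."""
    sets = cache_bytes // LINE
    tags = [None] * sets              # one stored tag per set
    hits = 0
    for addr in trace:
        line = addr // LINE
        index, tag = line % sets, line // sets
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag         # miss: fetch the line, evicting the old one
    return hits / len(trace)


def sweep(base: int, size: int, passes: int) -> list:
    """Sequentially touch every line of a buffer `passes` times."""
    return [base + off for _ in range(passes) for off in range(0, size, LINE)]


# One program with a 200 KB working set, swept 10 times.
solo = sweep(0, 200 * KB, passes=10)
# Two 200 KB "tasks" taking turns (a crude stand-in for context switches).
task_a = sweep(0, 200 * KB, passes=1)
task_b = sweep(256 * KB, 200 * KB, passes=1)
multitask = (task_a + task_b) * 10

for name, trace in [("solo", solo), ("multitask", multitask)]:
    for size in (256 * KB, 512 * KB):
        print(f"{name:9s} {size // KB}K cache: {hit_rate(size, trace):.1%}")
```

With these inputs the single-program trace hits 90% of the time with either cache size (only the first pass misses), while the alternating trace misses on every access with 256K, because both buffers map onto the same cache lines, and recovers to 90% with 512K.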
I guess that for the 16M of RAM he has installed, 128K, 256K or 512K of cache would give the same-ish results. The amount of cache needed is related to the amount of RAM installed, but I did not find a reliable relation between the two. Tag memory is also involved in the equation.
512KB might've also been a target for more professional workstation/research/"big" data kinda stuff at the time, whereas most consumer/end-user applications would likely have been optimizing for considerably smaller caches. So yeah, you probably do see a meaningful difference in workloads and in which tiers different things were intended for. But yeah, if you were a heavy multitasker or you were working in a DAW or with larger photos or even video at the time, the extra cache probably could've helped.
The COASTs were meant to be used alone, not mixed with the cache on the motherboard. Also, the lack of cache is very noticeable on faster Pentiums. I vividly remember a client who had a Pentium 200 on a PCChips M519 (Opti Viper chipset) with no cache in 2002. It was DOG slow; once I added the 256K it was like twice as fast for everything. It was maybe the best-case scenario for this, as I never saw such a difference again.
Indeed, I always thought the same but there don't seem to be any options here for specifically disabling the motherboard cache, either in the BIOS or as a jumper. Disabling the external cache in the BIOS also disables the COAST module when installed. So either this board is a suboptimal design (which might explain the results) or it's handling this automatically somehow in the background.
@@ctrlaltrees There is a third option, which is your COAST module is a big 'ol fake.
@@ctrlaltrees try putting the maximum amount of RAM in then re-run the tests that showed the most improvement with cache enabled VS disabled. Is there a 512K COAST?
True. Also, it was a cheap way (for manufacturers, of course) to replace classic SRAM L2 (DIP ICs) with innovative pipeline burst SRAM. There are different types/constructions of COAST sticks. I have a few of them and none is able to work with onboard cache; they usually switch it off completely.
It is possible that a 512K cache allows more RAM to be cached than 256K (e.g. 64MB instead of 32MB). Try installing 64 or 128MB of RAM and running CACHECHK.
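A rough way to see the "more cache, more cacheable RAM" relationship suggested above: in a direct-mapped L2, each address splits into an index (which selects a cache line and is bounded by the cache size) and a tag stored in the tag SRAM. The tag width caps how many distinct memory blocks can share each line, so the cacheable range is roughly cache size × 2^tag_bits. This is a simplified sketch; the tag widths below are illustrative assumptions, not values taken from this board, and a real chipset can impose a lower hard limit regardless of cache size.

```python
def cacheable_mb(cache_kb: int, tag_bits: int) -> int:
    """Upper bound on RAM (in MB) that a direct-mapped L2 can cover.

    The index bits address cache_kb worth of lines; each tag bit doubles the
    number of memory blocks that can share a line. Simplified model: assumes
    every tag SRAM bit holds address tag (no dirty bit) and ignores any
    separate chipset limit.
    """
    return cache_kb * 2 ** tag_bits // 1024


# Illustrative tag widths only; check the board manual for the real figures.
for cache_kb, tag_bits in [(256, 7), (512, 7), (256, 8), (512, 8)]:
    print(f"{cache_kb}K cache, {tag_bits}-bit tag -> "
          f"{cacheable_mb(cache_kb, tag_bits)} MB cacheable")
```

With these assumed widths, doubling the cache from 256K to 512K doubles the cacheable range (32 MB to 64 MB with a 7-bit tag), which would only show up in benchmarks once more RAM is installed than the smaller cache can cover.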
I'm pretty sure you're aware of this, but the main reason Quake ran so abysmally slow on non-Pentium CPUs was not that the other CPUs were really that much slower (the later 486s from Intel, and the 486DX4/100 in particular had Pentium-like features integrated already), but the fact that Quake was heavily optimized for Pentiums. I remember Quake running just as badly on (somewhat) later CPU offerings from Intel's competitors, even though these CPUs were definitely faster than the original Pentiums in most ways.
It was a nice way for Intel to consolidate its grip on the x86 CPU market, but it was pretty much based on a lie.
This is very wrong. Quake did target the Pentium FPU, because it had to. The Pentium FPU was that much better than any 486 FPU; 486 FPUs really were that much slower. The P5 stomped any 486 in terms of raw FPU. Before the Pentium/K5, FPUs were a bit of an afterthought.
@@humidbeing I never said the Pentium was bad or slow, only that Quake was heavily optimized for the Pentium CPU in its entirety, not just the FPU. And that is not necessarily a bad thing.
However, had it not been optimized for a single class or generation of CPUs as much as it was, the playing field would have been much more level and the differences in performance would not have been as great.
@@humidbeing No, damouze is right. There was an oddity in the Pentium FPU that, up until that point, all the other FPU manufacturers thought was pointless. It may have been register renaming; I can't remember now, it's been years since I coded in x86 assembly, let alone x87! I seem to remember all the other FPUs took an extra T-state to swap FPU registers or something. But even if Quake hadn't used this feature strangely often, the fact is the Pentium FPU was a weird but very fast design, using 3-state logic rather than binary. That was the cause of the infamous FDIV problem in the P60 & P66 CPUs. 1 was effectively 0, 0 was effectively -1 and 2 was effectively 1. So when clearing registers ready for FDIV, a mistake was made in a few of the bits (I'm not calling them by their actual name for fear of the YouTube algorithm). As a result, the FDIV bug. But this 3-state FPU design was about 50% faster than contemporary binary systems. So, hats off to Intel for their weird but fast-as-duck design.
@@martinbingham-l5m Are you concussed? I'm the one that said the P5 fpu was really fast and not just marketing.
@@humidbeing Concust no. Drunk however? Maybe. I got the names wrong way around. Sorry about that. Happy new year!
The real rarities that I remember from the 90s were the machines with 1MB of cache. It was useless for almost every use case, but the folks who maxed out their systems loved to make sure they could get a system with it. I only remember seeing it on DIP boards. On COAST boards we would go with 512KB instead, an easy upsell for someone who wanted to get the last bit of performance for relatively little money. You always knew someone was looking for the best when an order came in for a system with SCSI hard drives and the extra COAST stick.
I remember one late 486 motherboard allowing as much as 1 MB of external cache to be installed, but it was really just about bragging rights, because as you found out, it made virtually no difference versus the 256K external cache that most other boards supported.
I'm not entirely surprised by the result with 512KB, but I am surprised that not having any L2 at all makes such a small impact, even though according to the cachechk shot you included it has over 60% more bandwidth, and system RAM has almost 70% higher latency. Very interesting.
Maybe the difference is small because the CPU is too slow to expose that bottleneck; cache becomes more important once the CPU clock speed gets much faster than RAM. CPUs increased their clock speeds faster than RAM did, and there is a paper about that called "It's the Memory, Stupid!".
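The standard average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty, makes the point above concrete: losing the L2 only affects accesses that already missed L1, and the L2 matters more as the CPU gets faster while memory does not. A minimal sketch; every latency and hit rate below is an invented illustrative number, not a measurement from the video (RAM is simply set about 70% slower than L2, echoing the CACHECHK comparison mentioned earlier in the thread).

```python
def amat_ns(l1_ns: float, l1_hit: float, l2_ns: float, l2_hit: float,
            ram_ns: float) -> float:
    """Average memory access time for an L1 -> L2 -> RAM hierarchy.

    l1_hit: fraction of accesses served by L1.
    l2_hit: fraction of L1 misses served by L2 (use 0.0 to model "no L2").
    l2_ns / ram_ns: extra time paid on top of the L1 lookup at each level.
    """
    l1_miss = 1.0 - l1_hit
    miss_penalty = l2_hit * l2_ns + (1.0 - l2_hit) * ram_ns
    return l1_ns + l1_miss * miss_penalty


# Invented figures: 95% L1 hits, 90% L2 hits, 60 ns to L2, 100 ns to RAM.
for label, l1_ns in [("slow CPU", 10.0), ("2x faster CPU", 5.0)]:
    with_l2 = amat_ns(l1_ns, 0.95, 60.0, 0.90, 100.0)
    no_l2 = amat_ns(l1_ns, 0.95, 60.0, 0.0, 100.0)
    print(f"{label}: {with_l2:.1f} ns with L2, {no_l2:.1f} ns without "
          f"({no_l2 / with_l2 - 1:.0%} slower)")
```

With these made-up numbers, removing the L2 costs roughly 14% on the slower CPU (13.2 ns vs 15.0 ns) and about 22% on one twice as fast, because the 95% of accesses that hit L1 never reach the L2 at all, while the fixed memory latency becomes a larger share of each access as the CPU speeds up.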
I had similar results on my P75 after I put a COAST module in it back in the day. I always figured I did something wrong while trying to enable it.
It would be very interesting to compare compiling speed for the three cases.
Agreed. Game devs and graphics libraries were probably targeting the most popular configs to hit the market sweet spot. Compilation, finite element analysis, audio/signal processing(?), database workloads might uncork the full potential. Linux should still boot on those things right? Coding up a simple benchmark of probes/second into increasingly larger hash tables should show the benefit.
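A rough sketch of the probes-per-second idea from the comment above, written in Python for brevity, with Python's built-in dict standing in for a hand-written hash table. Interpreter overhead will swamp much of the cache effect, so on period hardware you would want the same loop in C. The table sizes, probe count and random seed are arbitrary choices, not anything from the video.

```python
import random
import time


def probes_per_second(n_keys: int, n_probes: int = 200_000, seed: int = 1) -> float:
    """Time random lookups into a dict holding n_keys integer keys."""
    rng = random.Random(seed)
    table = {k: k for k in range(n_keys)}
    # Pre-generate the probe keys so the timed loop only does lookups.
    probes = [rng.randrange(n_keys) for _ in range(n_probes)]
    start = time.perf_counter()
    hits = 0
    for k in probes:
        hits += k in table
    elapsed = time.perf_counter() - start
    assert hits == n_probes  # every probed key was inserted
    return n_probes / elapsed


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} keys: {probes_per_second(n):,.0f} probes/s")
```

If cache size matters for this access pattern, the rate should drop once the table outgrows each level of cache; with a larger cache, that drop-off should move to bigger table sizes.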
Thanks Rees! I was always intrigued by these back in the day! Seems 256KB was optimal for anything beyond the norm for gaming, etc. It's probably different for RAM intensive programs but I wasn't really doing much of that when I was 13/14 :)
COAST has got to be one of the greatest acronyms of all time
Lovely looking illustrations/animations with good timing on sounds 👍
I think there’s a decimal point on the 3DBench mark. 102.3 rather than 1023 and 105.6 instead of 1056.
Oops. You're correct! Looks like I misread that. 🙂
@@ctrlaltrees Not surprising, it's very hard to see. 😊
Nice. I very vaguely remember how people used to swear by cache when I had my 386, but my cheapo machine of course had none 😅. When I upgraded from there to a Cyrix P200+ around '98, I don't remember much talk about cache anymore. It was all about 3D cards. I try to think of a really slow machine, a 386 with a slow Conner or (even worse, at that time at least) Tandon HDD, maybe even a pretty bad mainboard and the cheapest RAM, and wonder what difference it would have made on that. But I'll probably never go back to revisit my 386 days hardware-wise anyway 😅. Too content with my Pentium 133 for all my DOS gaming needs. Cheers!
I'd love to see how it moves around large datasets (databases, modifying a number of large tables, etc.) vs. games. Pull a large table from the drive, then modify it a few times and see if the modifications run significantly faster than without the cache. Or work with a large image in something like Aldus PhotoStyler, pre-Photoshop.
The sticker on the cache module: SNI PC Abg = Siemens Nixdorf PC Augsburg. I did purchase these cache modules back in the day 😁
I have a 512KB module on my Skywell i430 board. It was already installed when my father first got the machine.
Increasing the amount of cache will expand the maximum amount of memory that can be cached on your board, so if you have a modest amount of RAM, it makes sense that there would be no difference in performance. However, since the cacheable area is at the bottom of memory and Windows uses memory at the top, you should see a performance hit with a small cache and a ton of RAM.
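For a direct-mapped L2, the reason a bigger cache raises the cacheable limit is simple arithmetic: each cache line can hold any one of 2^(tag bits) memory lines sharing its index, so cacheable range = cache size × 2^(tag bits). A minimal sketch, assuming an 8-bit tag RAM (real boards vary, and the chipset may impose its own ceiling regardless of cache size):

```python
def cacheable_bytes(cache_bytes: int, tag_bits: int) -> int:
    """Upper bound on cacheable memory for a direct-mapped L2.

    Each cache line can hold any one of 2**tag_bits memory lines that share
    its index, so the tag RAM width multiplies the cache size.
    """
    return cache_bytes * (1 << tag_bits)


if __name__ == "__main__":
    MiB = 1024 * 1024
    for cache_kib in (256, 512):
        limit = cacheable_bytes(cache_kib * 1024, tag_bits=8)  # 8-bit tag is an assumption
        print(f"{cache_kib} KiB L2 with 8-bit tags: {limit // MiB} MiB cacheable")
```

Under that assumption, 256 KiB covers 64 MiB and 512 KiB covers 128 MiB. If the installed RAM is already below the smaller limit, the extra cache can't help by this route.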
Curious as to whether the COAST SRAM is as quick as the onboard SRAM, and whether mixing the two is why the gains were so underwhelming in terms of return on investment. Interesting video all the same.
That's a very good question, so I went back to the CACHECHK results to check. Seems both the onboard 256K and the COAST module clock in at a reported 97.4MB/s. Of course this could be a limitation of how the cache is structured on this system vs. how CACHECHK measures it, I suppose.
@ctrlaltrees I wonder if there's a way of disabling the onboard 256KB to compare against the 256KB COAST module on its own. A jumper, perhaps?
I do remember seeing this slot back in the day and asking a computer expert about it with them saying it improves performance but is incredibly hard to find.
My first Pentium 133 system has a motherboard with a COAST slot. It's a dreaded VX board, so even though it can take 128MB of RAM, only 64MB is cached, and anything over 64MB comes with a performance penalty.
I still have the system, but it has been upgraded (200MHz MMX, 64MB RAM; I still haven't installed anything in the COAST slot).
I had a MicronPC 486 that I put an AMD 5x86 in. It had 15ns DIP cache chips. It would *almost* run at a 50MHz bus speed; if I disabled the L2, it would run. But even with the AMD CPU 'hotwired' to force a 4x multiplier, it was really slow at 200MHz without L2 cache. So I set the bus speed to 40MHz for a 160MHz CPU speed with L2. The 10ns cache chips it needed for a 50MHz bus speed were too expensive.
The 4x multiplier hack for the AMD 5x86 for the 486 socket was very simple. The pin for the multiplier is right at an edge and a Vcc pin has only one pin between it and the multiplier pin. I'd take a short piece of fine wire, wrap it around the 4x pin, bend it out and around the next pin so it didn't touch, then wrap it around that Vcc pin.
That particular PC was the only 486 I ever had with a 50MHz bus speed option, so I never had another chance to try for a 200MHz AMD 5x86 before I got to Socket 7 systems. It also had three VLB slots.
The 5x86 was supposedly a 3.3V CPU, but I ran many of them on 5V without any problems, as long as I used a good heatsink with a fan and white thermal compound. I would also file the bottom of the extruded aluminum heatsinks flat. Temperatures would be much lower than the literally blisteringly hot Cyrix 5x86. I once got an instant blister on a fingertip from the heatsink of one of those Cyrix CPUs when I barely brushed against it.
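The clock arithmetic in that comment can be sketched like this: the core runs at bus × multiplier, and each step up in bus speed shrinks the cycle the L2 SRAM has to respond within. Treating the bus period as the SRAM's whole budget is a rough simplification (real designs also lose time to address decode and setup, which is why a 15ns part in a 20ns cycle gets marginal), not a datasheet calculation. The helper names are hypothetical.

```python
def core_mhz(bus_mhz: float, multiplier: float) -> float:
    """CPU core clock = bus clock x multiplier."""
    return bus_mhz * multiplier


def bus_period_ns(bus_mhz: float) -> float:
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / bus_mhz


if __name__ == "__main__":
    for bus in (40, 50):
        print(f"{bus} MHz bus x4 = {core_mhz(bus, 4):.0f} MHz core, "
              f"{bus_period_ns(bus):.0f} ns per bus cycle")
```

That gives 160MHz with a 25ns bus cycle versus 200MHz with a 20ns cycle, so the 15ns chips keep about 10ns of slack at 40MHz but only about 5ns at 50MHz.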
This is my boomstick?
I was wondering if you ran cachechk after installing the COAST, and compared the results.
Also, the motherboard may need a BIOS update, if one was ever released for it. The issue with the "faster/fastest" settings is a sign that something is amiss.
Should have mentioned that COAST was simply a new form factor for L2 cache, replacing the DIP chips that i486 mobos used previously.
Love the Acorn Electron on the shelf. From the start of the video I was wondering if, with modern chips and the sponsor PCBWay, we'd get hand-made modern cache modules that are a lot, lot bigger. But 512KB is probably the max.
At the time, my friends and I had 486 100MHz machines, and Quake ran very slowly. One of my friends bought a Pentium 66 and had an advantage in FPS. I installed Linux on my 486 and, noticeably, Quake ran as well as on the Pentium.
If I recall correctly, most boards can't actually use both onboard and COAST cache together; once you insert the COAST module, the onboard cache may be disabled. It would be good if you showed whether the board actually uses 512K or not. However, I think you would benefit from a 512K stick, and then maybe you could have used the fastest setting in the BIOS too.
Nicely explained 👍
It's quite interesting how much adding cache to processors contributed to the big performance jump from the 486 to the Pentium.
But increasing the external cache from the 256k to the 512k COAST surprisingly didn't scale. My guess is that the tests performed did not maximize the 512k cache.
From a different perspective, it's fascinating how much more reliant CPUs have become on cache since the Pentiums of the 90s. Watching some of Phil's videos where he disables the CPU cache (e.g. on a Core 2 Duo or Athlon 64) to get performance closer to a Pentium or 486 really reveals how crippled CPUs are with more limited amounts of cache, especially when the fastest L1 cache is messed with.
While watching the video I remembered that external motherboard cache was eliminated by the Pentium II, since Intel put the cache chips beside the CPU inside the CPU cartridge; later, the cache was integrated into the CPU itself with later Pentium IIIs, much like modern CPUs do it.
I do wonder if a compression/decompression test with 256k vs 512k of external cache would produce a more noticeable difference, like how AMD's Ryzen X3D CPUs get a big performance boost from the increased L3 cache in compression/decompression workloads.
Along with compression/decompression, compilation workloads may also scale better like how they do with more modern CPUs.
The increase in cache helping those two workloads was why I went from a CPU with 16MB of L3 cache to one with 32MB, and the performance boost was quite nice, along with improved system responsiveness.
Anyways that's it for me and my rambles haha.
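The "tests didn't fill the 512k" guess can be illustrated with a tiny cache simulator. This is a sketch under simplifying assumptions: a direct-mapped L2 with 32-byte lines (the Pentium's line size), no L1, and a workload that just sweeps its working set in order a few times. `hit_rate` and `sweep` are hypothetical helpers, not tools from the video.

```python
from typing import Iterable, Iterator, List, Optional


def hit_rate(cache_bytes: int, addresses: Iterable[int], line_bytes: int = 32) -> float:
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    num_lines = cache_bytes // line_bytes
    tags: List[Optional[int]] = [None] * num_lines
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % num_lines, line // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag  # miss: fetch the line, evicting whatever was there
        total += 1
    return hits / total if total else 0.0


def sweep(working_set_bytes: int, passes: int, line_bytes: int = 32) -> Iterator[int]:
    """Touch every cache line of a working set in order, several times over."""
    for _ in range(passes):
        yield from range(0, working_set_bytes, line_bytes)


if __name__ == "__main__":
    KiB = 1024
    for ws in (200 * KiB, 400 * KiB):
        small = hit_rate(256 * KiB, sweep(ws, passes=4))
        large = hit_rate(512 * KiB, sweep(ws, passes=4))
        print(f"{ws // KiB} KiB working set: 256 KiB L2 {small:.2f}, 512 KiB L2 {large:.2f}")
```

With a 200 KiB working set both caches hit 75% of the time (only the first pass misses), so doubling the cache buys nothing. At 400 KiB the 256 KiB cache keeps evicting lines it is about to need and drops to 21%, while 512 KiB still holds everything. Real games have messier access patterns, but the principle is the same.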
Oh, I hated those. Between compatibility issues and bad slots... well, not really a bad slot. But I remember more than one machine that would lock up or BSOD. Tap around the board and, boom, it was the cache on the stick. Luckily, after a short period of time, the cache was soldered to the motherboard. I remember reading that was one of the reasons why they did that.
I think the reason the COAST didn't help is because you were looking at things that were more CPU or GPU dependent rather than memory dependent. Something like CAD software or other heavily memory bottlenecked programs could show a much bigger improvement, assuming the rest of your system isn't presenting a bottleneck. In Quake, it's clearly the rest of the system that's the bottleneck
So, for the general public who purchased these machines to run games and office applications and to explore the Internet on the first Pentiums, that extra cache was indeed a waste of money, better spent elsewhere: more RAM, a faster GUI accelerator, a bigger HDD, a faster CD-ROM, a better sound card, hell, even a MIDI module.
Maybe you would need to get into very expensive CAD/CAM or database work to realize any gains from a bigger cache. I have read that for a time, Pentium Pros with 1MB weren't replaced with PII 300s, as these had only 512KB of cache that ran at a slower speed.
@RetroTinkerer More or less. However, Quake indicated that it was heavily CPU-limited on the test system. GLQuake would probably have run faster than software rendering with a Voodoo GPU installed. Once those things were addressed, the COAST module might have made a bigger difference, maybe.
@briangoldberg4439 Maybe, but I wonder if people installed a Voodoo 1 on sub-166MHz Pentiums or MMX. I think I had a Cyrix 6x86 PR150, but I upgraded my CPU so frequently back then on that platform that I'm not sure how much time passed until I got my Pentium Pro 200 (all with the same Verite V1000 + Voodoo1). Most gaming builds were using Pentium MMXs.
It would be an interesting experiment nonetheless.
I have the ASUS P/I-P55TP4XE motherboard with 256K of soldered cache. I also found a COAST module for it, but the price seems a bit high (180€ NOS). I use 64M of 60ns EDO RAM. Would I benefit from 512k of cache vs 256k with 64M of RAM? How about if I install only 32M of RAM?
What is the relationship between the amount of installed RAM and the amount of cache needed for optimal performance? I searched but found conflicting information.
Edit: the small difference between 256k and 512k of cache is mostly due to the small amount of RAM you have. With those 16M of RAM, maybe 128k of cache would give results similar to those you got with 256k. Try maxing out the amount of RAM and then see the difference (redo the 512k cache tests). Not sure about your motherboard, but some maxed out at 64M, although most Socket 7 motherboards I know supported up to 128M of EDO RAM.
Interesting! I think I have a similar slot on the Apricot VS340. I suspect it may be for this purpose. Looking at the lack of performance boost you got I don't think I'll bother hunting down a compatible cache stick 😄
To calculate the fps of Doom, did you use the doomfps utility that's in Phil's DOS Benchmark Pack?
A fourth test I thought would have been good: test the stick without the onboard cache.
Would be interesting to see 32, 64 and 128 modules, to see if there is any scaling.
COAST modules always seemed a bit disappointing compared to motherboards with DIP SRAM chips. You'd think if they were going to introduce a new form factor, and the modules were going to be rather expensive, they'd have worked harder to optimise the speed. When the PII came out, they managed to get the off-die cache to run at half the speed of the CPU, at the cost of having to package the whole lot in a slot rather than a socket. Interestingly, the Pentium Pro had separate CPU and cache dies packaged in an oversized socket, and the cache ran at the CPU clock speed. I guess the PII's PCB was cheaper than the slightly exotic multi-chip module the PPro came in.
But why was, your Tiny computer, so large?
Nice reminder of how bad Quake was on a 486 (a 486 "DX2" 100 in my case, 2x50MHz, as I was experimenting with low-budget overclocking in my poor times; AFAIR it was ~10-11 FPS).
This is how I actually played it back in 1996! Well, briefly, until I begged my parents for a new PC... 😅
DX2 would have been 50 or 66MHz, not 100. 100 would be a DX4.
@ghostdog662 I called it "DX2" as a joke; in fact it was an Am486Dx4VT8 (3x33), but I was able to run it at 2x50MHz.
This matches my own benchmarking. Back when I was a kid I lamented not having maxed out L2 cache. Now I'm glad my family didn't waste money on it. It's really not a noticeable difference except in very specific cases.
Did they ever make a 1MB cache module?
Ah that module should help you COAST through Doom in playable FPS... Maybe
I remember it well
Is this async or pipelined burst cache? Pipelined burst cache should give a better result. In the video nothing is said about that, unfortunately.
I wonder if the CPU core became the next bottleneck, so the extra cache made no difference. Maybe try the same setup with a 200MHz Pentium.
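Why pipelined burst should do better: Pentium-era L2 fills a 32-byte line as four 8-byte transfers, and the commonly quoted timings are 3-2-2-2 bus clocks for async SRAM versus 3-1-1-1 for pipelined burst. A minimal sketch of the peak line-fill bandwidth those timings imply, assuming a 66MHz bus and back-to-back fills with no other overhead (`line_fill_mb_s` is a hypothetical helper; measured throughput will be lower):

```python
def line_fill_mb_s(timing: tuple, bus_mhz: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth of back-to-back cache line fills for a burst timing.

    timing lists bus clocks per transfer, e.g. (3, 1, 1, 1): 3 clocks for the
    first 8-byte transfer, then 1 clock for each of the remaining three.
    """
    clocks = sum(timing)
    bytes_per_burst = bus_bytes * len(timing)
    seconds = clocks / (bus_mhz * 1e6)
    return bytes_per_burst / seconds / 1e6


if __name__ == "__main__":
    for name, timing in (("async SRAM", (3, 2, 2, 2)), ("pipelined burst", (3, 1, 1, 1))):
        print(f"{name:>15} {timing}: {line_fill_mb_s(timing, 66.0):.0f} MB/s peak")
```

Under those assumptions, async SRAM needs 9 clocks per line (about 235MB/s) and pipelined burst needs 6 (about 352MB/s).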
Conventional wisdom is that you have to increase speed by 50% or more to notice a speed difference between systems.
Back in the day, as far as I can remember, it was a better deal to get a newer, more powerful Pentium CPU and sell your old one used on a local BBS than to buy a cache module.
‘I boost my speed with cache on a stick’ - Bart Simpson, probably
*polite clapping*
Sorry to say, COASTs were shit; the number of failures caused by them getting loose, the contacts becoming tarnished, etc. was unreal.
Indeed, I've certainly seen all sorts of instability attributed to dodgy modules in the past - not to mention the fact that using the wrong one could fry the motherboard 😅
If you want to see a fun thing with cache, take a Pentium 166MHz and compare it with a Pentium MMX 166MHz (which doubled the L1 cache); it made a decent uplift.
I have a few hunches here: (1) Most games were optimized for the 8K L1 the Pentiums had. (2) If (1) is wrong, then they were optimized for smaller L2 caches, and thus wouldn't be hitting 512K. (3) The motherboard's cache implementation may be rather stupid and directly mapped to physical pages, so unless you have the board completely maxed out, it probably doesn't even use the COAST module. This seems the most likely case. (4) Your GPU might be holding you back here. Try something with really good VGA and OpenGL performance? But realistically, test to see if it's the bottleneck. The performance in Quake is what gives me this hunch; Quake should have scaled better.
I remember that time, and when I went to buy my last Pentium I system, the seller already clarified that unless I put in an exaggerated amount of RAM for the time, the additional cache module was not justified at all. I'm not surprised by the result at all. Perhaps in a system like the latest MMX running a true 32-bit system like Windows 2000 things would change, but in Win95/98/Me that stick is a waste of money. At that time the real bottleneck was the horrible chipsets sold on the cheapest motherboards, like the infamous PC Chips boards, which slowed down any processor you put on them.
1990s HDDs were loud asf
I'd be interested to see the results of removing the 256k
Needs a k6-2 400...
I used to build these in 96 😂😂
I've noticed that your video has multiple audio tracks for different languages. Did you add them yourself or is it some AI doohickey YouTube has forced in?
Yeah, that's a new YouTube feature. Not sure how I feel about it to be honest, but I've left it enabled for now as I've heard that the audio tracks are pretty accurate, at least.
As a computer scientist (at least that's what my 20-plus-year-old degree says) who mostly works in some form of IT, I thought that was a damn fine high-level explanation of cache and other memory. I tip my hat to you :)
That motherboard was bad, but at least it wasn't using fake cache, which was common. There was no way to tell it apart except by benchmarking, and it fooled a lot of people. Not being able to disable the onboard cache and run with just the COAST module at its fastest settings in the BIOS is bad, though. Also, Quake really is just an FPU demo, which doesn't benefit much from a large cache. Prime95 with small FFTs really is the benchmark to go to.
How does a computer operate? It's simple really. Just a bunch of 0s and 1s. Hehehe....
I'll show myself out now. 😉
TIL external cache existed lol
Isn’t this an L3 cache?
FPM or EDO? The old story was that EDO without cache was supposed to be as fast as FPM with cache.
I think you don't understand what cache memory is and how it's used on a PC
Hello, I am not happy with this explanation for any reason.
So, the original ReadyBoost
Too bad you lack the education to know how to test COAST properly. You should have done some I/O tests, like a copy from one physical disk to another physical disk. Knowing the difference in purpose between L1 and L2 cache would have gone a long way toward helping you understand what you were even testing.