As others have already pointed out, running 500 instances of Chrome was the actual stressful factor. Not the copies of Doom.
Yes, I'm aware :-). And it's Chrome that ultimately depleted whatever it was running out of, not Doom itself. But since this version of Doom requires Chrome, it's a built-in cost.
So what is the point of re-stating what others have already stated?
@@Katchi_ Catharsis.
@@DavesGarage I can't believe you haven't heard of GZDoom, where have you been?
@@demo9750 There is more in this world beyond what you could possibly imagine. And of what you could imagine, you will only have time to meaningfully understand a fraction of the topics, and of the topics you do explore and gain understanding in, you will only gain reasonable mastery of a few.
And you know what? It's awesome.
So instead of "Where have you been?", try something like: "Hey, there's a GZDoom version that you might want to check out for tests like this." Comes across a lot better.
A hundred instances of Chrome might stress the machine more than a hundred instances of Doom....
TBH even 1 instance of Chrome might do that !
It’ll melt
Maybe he should have tried dosbox instead?
Recreating the infamous LTT experiment "How many Chrome tabs can you open with 2TB of RAM?" in 2023 would be very interesting
Might?
Considering Doom was designed to run on a 30 MHz CPU from three decades ago, I'd be astonished if the hardware wasn't powerful enough to run over a *thousand* copies of Doom simultaneously.... if it wasn't bogged down by the web browsers and virtualization...
if all resources were utilized to their fullest, this could probably run a hundred thousand copies before it starts to drop frames.
Yeah, I normally like Dave's videos, but tbh this is a non-starter. It's like trying to drive a rally car over a bog and then saying it's not actually a fast car.
@@GraveUypo Using a modern GPU to offload work from a modern 192-thread CPU to run vanilla Doom is kek, and deserves its own video.
“Can it run Crysis?”
Actually Doom required a lot more than 30 MHz. 30 MHz didn't exist, but 33 MHz did. A 486 DX 33 with a PCI graphics card could run it at decent frame rates.
CPU benchmarking videos sure are improved by a lack of obnoxious ads and no annoying host.
poor linus just got burned 🔥🚒
i actually thought he meant gamers nexus xD @@xxPow3rslave
@@DiverseGreen-Anon I was actually thinking precisely of both.
@@icarvs_vivit valid
Can I please add that Jay bloke?
You could probably run a lot of instances of Doom by running it in a source port. Chocolate Doom is very close to DOS Doom and runs on modern OSes. You could probably spin up tons of containers running Chocolate Doom inside them if it doesn't like multiple instances running on the host machine.
yeah I am pretty sure you can run more than 100 instances of Doom, 'cause the CPU is running so fast plus has 96 cores
More fun to run them under one OS to see how the scheduler handles it. You could probably pin each instance to one core (or thread), and if you rely on the software renderer, it'll be the true test (rough sketch below).
Also still curious as to how the Crysis software mode runs nowadays. I know LTT did a video on a Threadripper years ago but they've grown up since then.
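Something like this could handle the spawning and pinning, as a rough sketch only: it assumes psutil is installed and that a chocolate-doom binary and DOOM1.WAD are reachable from the working directory (both names are placeholders for whatever your setup actually uses).

```python
# Rough sketch: spawn N Chocolate Doom instances and pin each one to its own
# logical core. Assumes psutil is installed (affinity works on Linux/Windows)
# and that "chocolate-doom" and DOOM1.WAD are placeholders for your setup.
import subprocess
import psutil

N_INSTANCES = 96          # roughly one per physical core; tweak as needed
DOOM_CMD = ["chocolate-doom", "-iwad", "DOOM1.WAD", "-window", "-nosound"]

procs = []
cpu_count = psutil.cpu_count(logical=True)
for i in range(N_INSTANCES):
    p = subprocess.Popen(DOOM_CMD)
    # Round-robin each instance onto a single logical CPU.
    psutil.Process(p.pid).cpu_affinity([i % cpu_count])
    procs.append(p)

for p in procs:
    p.wait()
```

On Linux you could get the same effect with taskset, but doing it from Python keeps the spawn-and-pin loop in one script.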
I wouldn't put it past it to run 100 Chocolate Dooms per core
@@affegpus4195 what about 2500 per core?🤣
My i7-9700KF processor from 2019 needed around 60-70% to run 70 Chocolate Dooms tiled using the original 320x200 resolution and software rendering. There's a vid on my channel and a python script if you want to try it yourself. :)
We had the Raymond interview AND we get a Threadripper review? What a week!
Please keep up the FANTASTIC content coming Dave (and thanks!)
For the curious, I also tried with chocolate doom and one other, and got better results with chrome... but am still looking for something lightweight enough to run more of them!
What gpu does this very nice hp workstation have in it
For lightweight source ports, there's FastDoom which can run in glorious 80x200 resolution on a 386.
If you want something goofier, there's also PhoenixDoom and PsyDoom which are PC ports of the 3DO and PlayStation versions
First time seeing your channel and all I Gotta say is sick. Computers/Code/Tech and cars. Oh yeah
Maybe try with the Chrome flags "--in-process-gpu --process-per-tab" to avoid Chrome serializing GPU access from multiple tabs pointed at the same server. Note that Chrome will probably eat even more RAM than usual, so 128 GB may not be enough for 500 windows. Another fun test would be to add "--disable-gpu" to force the CPU to set the visible pixels on screen, which should push CPU load up quite a bit.
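If anyone wants to try that, here's a minimal sketch: the flags are the ones from the comment above (their behavior varies by Chrome version), and the Chrome path and Doom server URL are just placeholders.

```python
# Rough sketch: open N Chrome windows pointing at the Doom server with the
# flags suggested above. Flag behavior varies by Chrome version; the binary
# path and the server URL are placeholders for whatever your setup uses.
import subprocess

CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
DOOM_URL = "http://localhost:8080/"   # hypothetical Doom server address
N_WINDOWS = 100

FLAGS = [
    "--new-window",
    "--in-process-gpu",     # flags suggested in the comment above
    "--process-per-tab",
    # "--disable-gpu",      # uncomment to force software rendering instead
]

procs = [subprocess.Popen([CHROME, *FLAGS, DOOM_URL]) for _ in range(N_WINDOWS)]
print(f"launched {len(procs)} window requests")
```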
Viva Firefox
Sheesh, that's crazy powerful! Cinebench R23 score is 10x of my R5 3600's score. Insane.
The 96 core beast costs a LOT more than 10x R5 3600s.
@@pompeymonkey3271 Yeah, I looked at the price... I can buy a decent motorcycle for that amount....
@@pompeymonkey3271 Eh, it's just 13k USD here, chump change lol
Actually a lot more than 10x, because that's without PBO on. The 7980X/7985WX can get 115k-120k with PBO on with simple water cooling. For the 7995WX with PBO on, it will get something like 140-150k depending on how good your cooling is. With LN2 the world record is 200k with the 7995WX (6ghz on all 96 cores). Threadripper overclocks like a beast.
Seeing it run every major version of Windows at the same time would be fun!
The web browser and OS instances are probably adding a huge amount of overhead. I'd use a simple source port like PrBoom.
I'd use DOSBox, personally.
@@CptJistuce That's going to add more overhead than just running 100 instances of PRBOOM
@@mrturret01 No question, but running the OG code with emulated sound hardware etc. would be pretty cool in my books. Depends on what you want to test, I guess; either way it'd be a lot more interesting than Chrome.
DOSBox does use just one thread... but I thought it needed plenty from that one thread? But then, Doom was never heavy... *reaches back and remembers computers it was heavy for*... LOL! @@CptJistuce
Running Doom in Task Manager, on the CPU utilization tiles, would have been more fun lol
@DavesGarage that is the challenge... :D
Superb episode! Feeling lucky you decided to start the channel!
The Doom test brought back some memories: my first PC was a 386SX, clocked at 16 MHz. On it, I had Windows 3.1 and an early version of Linux (1.0 kernel, I think). Once I installed the X Window System, I discovered that the shareware version of Doom was available. Out of curiosity, I tried running multiple instances in windows. Two was tolerable, three was unusable.
On the same machine, building the Linux kernel took the better part of an afternoon!
This is so awesome, Dave! I sincerely hope that your DoomBench becomes an industry standard for high-performance machine characterization.
My favorite so far was a take on 'can it run crysis' where someone used the vram of a 3090 as a ram drive while also playing the game. I think it was a 3090, maybe ti, it was one of the 24GB cards of late. I thought it was cheeky
I remember playing Doom on a 486 when it came out, on my friend's computer. I would have called you insane if you'd told me that one day you could easily run 200 simultaneous Dooms on one computer :D Thanks Dave!
I remember it being sluggish on my low-end 486 back then. So I was thinking the same.
Same thought here. And I think 99.9% of people don't even realize what's happening here in terms of CPU power. Man, if you had told people around 1995 that this would be possible, they would have gone crazy and it would have been world news. Now it's 'just' a video, and people who happen to pass by this channel probably take it for granted.
28 seconds(!) I remember building new kernels over night back in the 90’s 😅🔥
Reason? Because you can! :)
I remember optimizing my system to run a single copy of Doom properly back in the day. Threadrippers and EPYC CPUs have come a long way.
Obtaining the CPU in a workstation setup like this might be the best way to actually get a system like that. With Threadripper sockets, tightening the socket down and placing the cooler is a bit of a nerve-wracking experience.
Should've tested with PBO on. The 7980X/7985WX can get 115k-120k with PBO on in Cinebench R23 with simple water cooling. For the 7995WX with PBO on, it will get something like 130-140k depending on how good your cooling is and this is without tweaks. With LN2 the world record is 200k with the 7995WX (6ghz on all 96 cores). Threadripper overclocks like a beast.
15 years later Dave's garage:
We played 100 copies of Crysis with 512 core threadripper CPU.
Most people focus on it being 500 instances of Chrome, but actually even if they're 500 windows they're likely actually running under the same core process.
It _may_ be threading each tab separately, but it also may not as I've seen it grouping multiple tabs into a single process.
You'd need a browser that actually allows for multiple instances (or multiple browsers? maybe just multiple exe names would do the trick?) in order to run the game. Yes, that does mean having the actual overhead of 500 whole instances, but then it also means that the single instance that would be the limiting factor would be eliminated.
Actually, simpler. Maybe not giving each one its own instance, but still spreading it out: make the code open it on various browsers at the same time. Say you have 5 distinct browsers defined, so setting 100 would open 5 different browsers times 100, giving us 500 copies spread across 5 core processes. Even if they're naturally opening round-robin, they'll still slot into their core process properly (rough sketch below).
(or you could just... y'know, run a source port lol)
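A minimal sketch of that round-robin idea, assuming the browsers simply accept a URL as their first argument; the executable paths, the server URL, and the copy count are all placeholders.

```python
# Rough sketch of the round-robin idea: spread N Doom windows across several
# distinct browser executables so no single core browser process becomes the
# bottleneck. Paths, URL, and count are placeholders for your own setup.
import subprocess

BROWSERS = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
    r"C:\Program Files\Microsoft\Edge\Application\msedge.exe",
    # ...add more browsers (or renamed copies) here
]
DOOM_URL = "http://localhost:8080/"   # hypothetical Doom server address
COPIES = 500

procs = []
for i in range(COPIES):
    browser = BROWSERS[i % len(BROWSERS)]   # round-robin over the browsers
    procs.append(subprocess.Popen([browser, DOOM_URL]))

print(f"launched {len(procs)} window requests across {len(BROWSERS)} browsers")
```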
With copy on write, multiple copies of the same pages don't take any extra memory, the pages just get mapped to the same place, so it's not nearly as bad as folks imagine...
On the dedicated server end of things, this bodes pretty well. Would love to have a system with that many cores, but we would need probably 2 to 4 TB of RAM. That'll probably be available soon.
Already is with epyc.
When Dave said "Thirty two megabytes of memory total" I perked up. But no, gigabytes. For a second there I was going to nominate JC for a noble prize.
If you award a noble prize, make it novel as well.
At the end of the 100-copy run, I (and the subtitles) thought you said 68 percent; you in fact said 6 to 8 percent. Insane.
the biggest bottleneck is chrome eating up all of your ram
easy fix, just slap 256 GB of RAM into that machine, 'cause I have 128 gigs of RAM so he should have more since he's got more cores than me🤣🤣🤣
@@SaraMorgan-ym6ue Did you watch the video? He was at 32GB of RAM used
@@ChrisSmith-rm6xl They just wanted to mention they have 128gbs of ram i think.
i love how dave's house is so high tech he had to upgrade the workstation
Haha, get that pleb 10gbps e-waste outta here!
That's a really nice setup. I have a dual Xeon E5-2699a v4 with 1TB of DDR4 RAM to play with. For the high number of cores you have, I'd highly suggest increasing your RAM from 128GB to at least 256GB or even 512GB to take full advantage of all of those cores.
2gb per thread would probably be optimal
Those Xeon E5 V3s and V4s have become cheap as hell... but I lost track of the subsequent Xeon generations... gold... silver... platinum... very confusing... I'm tempted to go back to a dual-socket setup, but the V4s are too old...
I love your video, this is what I was looking for: a high-end AMD Threadripper in a pre-built workstation desktop. I'm glad I found your video.
I have a realtime multi-threaded FFT/spectrogram program that will almost peg any CPU (at certain settings). As far as raw single-precision floating point performance goes, Apple does really well pushing floats around compared to Zen 3 cores.
Yo, what do you do with the FFT spectrogram stuff? Sounds really interesting.
@@Zangetsu_X2 On the private side, I use it to validate my math for high order filters, normalized lattice filters, half bands, IR capture, etc.
On the public side, I made a bunch of videos testing guitar pedals during its development. There is a free version available for Mac/Linux on the channel, which I put up 6 months ago, but the spectrogram stuff was not ready at the time so it's not included.
It is basically something like 'Smaart' or 'REW' (room equalization wizard) but I do all the math 60 times a second rather than once when a button is pressed.
The channel only got 91 subscribers in about 2 years, so it's abandonware, or something to occasionally scratch my programming itch.
Thanks for win95!! That 640k problem became a thing of the past.
People don't know...Having to make several boot floppies for different games and programs.
The best rundown of the 7995wx! Good stuff 😊
I think I'm seeing yet more anecdotal evidence not to worry too much about the automagic threading/hyperthreading optimization issue in video game land. Thank you!
@Dave's Garage Love the choice of tyre on those rims. Michelin Pilot Sport. Perform almost as well as the 7995WX Pro
I loved this. I make Tool-Assisted Speedruns for Doom and a large part of that process is brute forcing large input ranges to test for specific outcomes in order to make certain movement tricks work.
Modern source ports and tools are amazing and can do amazing things. Doom is a single-threaded task, as are its source ports and the tools based on them, and it's possible to break up a large brute force range into many smaller ones - on my 7950x (16c/32t) system, running 32 processes in parallel scales beautifully and has been an incredible leap forward in productivity.
The time taken for the brute force processes depends on the complexity of the map. Recently I made a very small test map to verify if an idea might work. 32 parallel brute forcing tasks each ran through 165,683,700 input possibilities spanning 32 frames, with the longest taking 19811 seconds. This means that just shy of 170 years of in-game time was simulated in just over 5 and a half hours. It's crazy to think about what this Threadripper could do.
That sounds EXTREMELY cool. I'd love to see a video explaining what you wrote from start to finish😊
dave, man, im happy that today i found your channel!
Now that was one very informative video. I kinda want one of these on my workstation now!
Doom was the first game I had to upgrade for. I bought a Cyrix chip. Doom was the first game I upgraded twice just to play.
Watching 200 instances run - in a browser, is quite the stark contrast.
You could probably do 90 - 95 single core virtual machines each running a full instance of windows 95 and doom95. It would only take about a hundred years to set up.
Could probably set it up in a few minutes with batch files~
I thought we'd at least be talking about the 2016 Doom version 😅 but very nice video anyway 👍 you gained another sub, sir!
How about loading it up with DOSBox, FractInt version 20+, edit the virtual video resolution to 20,480x20,480 pixels, then generate a highly detailed fractal with and without "co-processor" use? You can press the TAB key to see how long the render is between manual save operations. Then using batch mode, follow their instructions for auto generation of a deep zoom up to 1 trillion zooms deep. Throw in some color cycling for good measure. Maybe play around with the other methods of rendering the fractal in the same deep zoom at maximum virtual video resolution. Then show a comparative chart for the results.
You know, I haven't found good fractal software like FractInt for Windows. There are a few that are OK, but none are as good, have as many features, and are free.
I have a couple of old HP desktop workstations. Nothing special specs-wise, but the build quality of the hardware was a real step up from the over-the-counter cases I had tried before: very nicely thought out, with well-pressed metal, all nicely put together.
I despise HP, but I still use a 2540p EliteBook from more than a decade ago. An absolute tank; it bounces around in the passenger's seat of my cars more often than not, running tuning software with its little fan just screaming, trying to eat the passenger's seat. OG battery, and it will still last more than 30 minutes like that!
Anything consumer grade? Pure trash.
Replace old battery or it might explode
I was proud of my first ever Ryzen 6c/12 thread cpu, it looked awesome in task manager :) I've never seen that many. 192 threads .. wow. Just wow.
6:20 - "And we're at about 32MB of memory total". It's an old guy thing, Dave. I feel ya, bro. 🙂
Back from the days when we didn't even dare to dream about 1GB of memory :o
Hi Dave! Well, it seems you might want to work on a fun Doom project. As others mentioned, there are other versions of Doom that might allow much more hardcore Doom testing, and I really hope you'll have the time and the will to give us a deeper dive into multiple Doom runs. Thank you for everything that comes into your mind and ends up on our 'tele-vision'! :)
Wow what an insane chip! Thanks for testing both fun and serious workloads!
13:30 I've been a Linux user since 1995 and can remember when a kernel compile took about 24 hours. IIRC that was on a 386SX-16 with 1MB of RAM. My how times have changed!
I see it still took 24 minutes of CPU time in those 26 seconds (roughly a 55x effective speedup). So if it were single-core, it would still take 24 minutes, but that's not bad either!
Obviously it can. Each thread is 100x (?) faster than the 386 / 486 that ran original doom.
Dave is too smart to even assume it would have a problem.
Running it with a software 3d accelerator with software raytracing would have been quite interesting.
6:20 "32 Megabytes" 🤣 Glad to see I'm not the only one that does that while reminiscing the golden days.
Kernel build
2:30 there are multiple great source ports which run perfectly on windows 11 eg. chocolate doom, woof! or gzdoom
Hmm, instead of running doom on this pc... here's a new challenge.
Write a custom version of Conway's Game of Life and within Conway's Game of Life build an actual Intel 386-486 CPU that can run Dos 6.0. Then have it install and run Doom within the CPU/PC that is within Game of Life while trying to have it render at about 30-60 frames per second. Don't forget about the audio too... If it can do that, then it's a beast!
Do you remember that video where we played movies in Task Manager? It’s time.
Your system for running and automating the Doom benchmark was super impressive! I wonder if it would've been possible to use instances of DosBox running Doom instead, and how that would affect the benchmark? I might have to try it out myself sometime
As in multiple VMs? The RAM would have been swamped --> I have 128GB of RAM in my workstation; running too many VMs concurrently is its Achilles heel 🧐🤔🧐
@@chionyenkwu2253 128 gigs is plenty for some MS-DOS VMs. I can't imagine there's over 64 megs of overhead in each DOSBox instance so that gives you 2048 copies if I'm not mistaken.
As always, superb .. good show !!
Reminds me of the demo for one of the early commercial Atari 800 emulators on windows. They basically selected a folder with hundreds of floppy images, and launched the emulator for every disk image and had them all tiled.
A native port of the open-source version of Doom would probably be better than streaming it via a memory-hungry browser. Or even 1000 copies of DOSBox, each running a copy.
This is the computer Gentoo users need to have to make software install feel like the other distributions.
Too niche.
This might seem crazy, but there are actually speedrunners doing a lot of simulations of Doom to see what is possible in-game, so this use case makes perfect sense ;)
ZeroMaster's infamous hundred-percent-kills tool-assisted run of the Nuts Doom game map springs to mind -- at one point he was down to one frame every few seconds in real-time because there were so many sprites on the screen. As the MadBoy himself said in the uploaded video's comments -- "This can be done non-TAS (tool-assisted), you just need an actual super computer optimized to play doom and about 6 hours."
Come on, you missed a great opportunity here Dave! Spawn 100 Dooms, then, via Python, make it so that each window of Doom actually renders its own part of the screen, tie it all together so that your keyboard input gets piped into all 100 instances, and see how that looks :). Perhaps the cropping of each window isn't possible, and the input to all 100 instances probably wouldn't be synced, but nonetheless, it would be cool.
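The tiling half of that is at least easy to sketch. This assumes Chrome's --app, --window-position and --window-size switches behave as expected on the target machine; the server URL and screen size are placeholders, and broadcasting synced keyboard input to all 100 windows is the hard part, which isn't attempted here.

```python
# Rough sketch of the 10x10 tiling idea: open 100 app-mode Chrome windows,
# each positioned in its own cell of the screen grid. The server URL and
# screen size are placeholders; broadcasting keystrokes to every window is
# deliberately left out.
import subprocess

CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
DOOM_URL = "http://localhost:8080/"       # hypothetical Doom server address
SCREEN_W, SCREEN_H = 1920, 1080
GRID = 10                                 # 10 x 10 = 100 windows
CELL_W, CELL_H = SCREEN_W // GRID, SCREEN_H // GRID

procs = []
for row in range(GRID):
    for col in range(GRID):
        procs.append(subprocess.Popen([
            CHROME,
            f"--app={DOOM_URL}",                          # chromeless window
            f"--window-position={col * CELL_W},{row * CELL_H}",
            f"--window-size={CELL_W},{CELL_H}",
        ]))

print(f"requested {len(procs)} tiled windows")
```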
The last time I did something like this was when I got an original Amiga (used in 1987 upon graduating from HS) and wanted to just see its pre-emptive multitasking feature in action (having only owned a Commodore 64 up to that point in time). I launched as many copies of the analog clock that I could fit visibly on the screen to watch them slow down their update cycles to multiple second intervals as the 8MHz 68000 began to struggle to meet realtime performance.
The extrapolation I am implying by sharing this anecdotal experience is that getting a Threadripper for Dave is just as exciting as a teenager getting a pre-emptive multitasking OS to use at home in the late 80s. LOL
A machine like this is perfect for doing scientific work, such as MPAS and WRF numerical weather prediction models - they are scalable from supercomputers down to a desktop, via OpenMP as an example.
Nice one Dave; very techi-licious vid 😉🧐🧐😉
00:14 🎮 The Threadripper Pro 7995WX with 96 cores and 192 threads can run 100 copies of Doom simultaneously on the HP Z6 g5a workstation, handling it with ease.
02:28 🛠 Setting up the server side involved using Ubuntu under Hyper-V to run the Doom server, spawning multiple browser windows pointing to it.
05:01 🕹 The Threadripper 7995WX managed 200 copies of Doom well, running at reasonable frame rates with 10% CPU usage and around 20% GPU.
06:36 🚫 Pushing it to 500 copies, the system struggled, experiencing issues with graphics rendering and potentially running out of system resources.
07:48 📊 The benchmarking involved Geekbench 6, Cinebench R23/R24, showcasing the 7995WX's impressive single-core and multi-core performance, surpassing previous processors in tests.
11:04 🌡 Under max load, the CPU maintained temperatures around 72-80°C, handling a 350W power draw while maintaining clock speeds of 3,000-3,200 MHz.
12:38 ⚙ Compiling the Linux kernel on the 7995WX demonstrated exceptional speed, completing the task in 28 seconds, more than twice as fast as the 3970X.
13:48 💻 For serious professional work, the RTX A4000 GPU proved capable, sitting between an RTX 3070 and 3080 in performance, ideal for tasks like video editing.
Loser.
Fun times as always... Another retro software on modern hardware idea that's been percolating in my head --- and I suspect you might be one of the best equipped to pull it off if it's actually possible... Windows 95 only needs 4MB of RAM to run, which is well below the L3 cache numbers for just about every processor out there. I've been wondering what it'd take to coax it to run on a system with no installed RAM ;)
I'd start by considering what it would take to pull the RAM sticks after boot. I doubt you can get it booted without them.
@@Azeazezar I don't think the motherboard chipset will run without RAM, though you could take a RAM module and update its ID chip to report only 16M, which would allow the PC to boot. You would likely have to write a custom BIOS, basically making an old 1990s-era BIOS run on a modern multicore processor, which means adding the core operating logic to set up all the registers on the north and south bridge the way they need to be, then running the old BIOS as a direct task on top of that and booting from there. You'd need an old PCI graphics card, and also a PCIe-to-PCI adapter (or even ISA and a PCIe-to-ISA adapter), so the VGA routines in the video card ROM could run; modern graphics cards likely no longer support the older VESA standards. It would be fast, though.
I did think, when Pentium processors got past the 1GHz mark, that you could put a simple program into a ROM, boot off it, and use that fast bus speed to directly synthesise FM radio carriers, audio modulation and all. A dual-core could run fast enough to have one core do the DSP to read in the audio and the other core do DDS to generate the RF waveform directly, modulation included; just a simple low-pass filter on the output would be needed. 300MHz processors could easily do AM radio as well, with a simple resistor DAC and a latch, the enable being simply a write to any high-order address line. After all, with 32 address bits and only 12 needed for the ROM, you can be very wasteful with address decoding if you need speed, and you really only have 4 actual peripherals to interface.
It depends on what you mean by no installed RAM. Intel Xeon Max CPUs have 64 gigs of HBM and will boot no fuss without any DIMMs installed. If you install DIMMs the HBM can be used as an extra layer of cache.
When I started programming it took 17 seconds to redraw the screen when I moved a textbox 1 pixel to the right.
I'm convinced that even using software rendering and a good lean engine you could probably run thousands of copies of Doom. I'd LOVE to see how far you could push it.
In its original all-software form DOOM runs great on a 100MHz CPU. If you have almost 100 cores, each at 2.5GHz that comes to 2500 instances. When you consider logical cores and IPC improvements I'd expect the equivalent to actually be more like 20k copies.
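For anyone who wants to fiddle with the guesses, here's the same back-of-envelope estimate as a few lines of Python; every number in it is an assumption, not a measurement.

```python
# Back-of-envelope version of the estimate above (all numbers are rough
# guesses): scale purely by clock speed, then by logical cores and an
# assumed IPC uplift.
CORES = 96
THREADS_PER_CORE = 2
ALL_CORE_CLOCK_MHZ = 2500      # rough sustained all-core clock, assumed
DOOM_HAPPY_CLOCK_MHZ = 100     # "runs great on a 100MHz CPU"
IPC_UPLIFT = 4                 # assumed modern-vs-90s IPC factor, pure guess

by_clock = CORES * ALL_CORE_CLOCK_MHZ / DOOM_HAPPY_CLOCK_MHZ
with_smt_and_ipc = by_clock * THREADS_PER_CORE * IPC_UPLIFT

print(f"clock-only estimate:    {by_clock:,.0f} copies")        # ~2,400
print(f"with SMT + IPC guesses: {with_smt_and_ipc:,.0f} copies") # ~19,200
```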
There's still a lot of other I/O going on too. Doom itself will run on 486/66 (or you could even slum it on a 33 at lower frame rates), but then you have all of the drawing to VGA etc.@@eDoc2020
The difference between men and boys is the price of their toys. And what a nice toy. The compile test was enlightening as it shows off the new system in a real world usage case. What happens with the old system?
What lens are you using?
That DOF is crazy shallow.. Very cinematic. 🎥
Nice bit of performance there! That seems designed for doing ML type stuff. Bet it’s an awesome bit of kit as a daily driver.
Duke Nukem 3d is a similar game, and there is a reengineered version that runs on modern Windows directly. Edit: it also includes map editor, and given that behavior of in-game characters is also editable, it makes it possible to create demos that run automatically
100 copies of Crysis might melt a hole into the desk...
Doom ran like a dream on my 486 DX/33MHz with 4MB of RAM! I think I had a Trident 1MB SVGA video card too. So I am sure this could smash it.
Amazing that you can make a video with this machine when I can't even order one from HP. Can't imagine what it cost since all I can find on the interwebs is the probable cost of a 7995WX by itself is about 10 grand.
I love the scientific test at the start with DOOM, good video! ^_^
Please do a follow up running as many copies of doom as you can... but natively.
Very neat. Maybe someday an audio performance test will be run on this chip as well with something similar to Dawbench. That would require a decent audio interface as well, since measurements are done at different buffer sizes (2^x samples usually starting at 32 samples). If you have some time for it, great. If not, oh well. Someone will try it I hope.
"it can run 200 instances no problem
it can't run 500 instances" /end of test
if only all benchmarks were this rigorous and precise.
That's one hell of a Doom machine.
Crazy. Great for cheaper inference on big LLM’s.
I wonder if a triple-slot RTX 4090 would fit in the HP case. The power supply connectors and max wattage on the graphics rail would also be a consideration.
I'm pretty sure you can not. All those OEM workstations are normally following the official PCIe specs. Same with my workstation (Dell 7865).
The fastest graphics card I could fit was a RTX 4070 FE. And that is very, very tight. Everything else is way too big.
There is a pretty small OEM 4090 from Dell that might fit. But you cannot buy it separately, only within an Alienware gamer PC, as far as I know.
I marvel at a system like this and have absolutely no idea what I would use it for that would actually make good use of it.
If you have to ask then TR isn’t for you. /s
The rest of us use it for compiling, multithreaded programming, rendering, simulation, etc.
that's some really nice lighting, man
I'm impressed with how quickly it compiled, when my poor old HP c8000 takes 24 hours to compile a Linux kernel. Goes to show how much more powerful things are 20 years later. My c8000 plays Doom, so I guess we should set up a 'death match' at some point.
24 hours? IIRC my Celeron 466 took around 45 minutes. It's probably a matter of version differences and/or config options.
Gotta see some POV-Ray tests on that beast
damn that compile time for the linux kernel hardly gives you time for a slurp of tea....
PrBoom (Project Boom) is the source port closest to the original in terms of compatibility; it can use software rendering and can run in a window.
This guy writes clean + readable code. It's like he had a coding background or something. :-)
And that's my python... I don't even know python :-)
we are very quickly approaching the limits of silicon cpu/gpu
I enjoyed seeing some Michelin Pilot Sport Cup tires at 00:02:05! What do they go to...I wonder...
It might be interesting to see how many VMs could be run efficiently at once, a la clusterDoom™
It'd be interesting to see something like 3DMark 2000 run in a virtual machine on this setup
Wow, how ironic. Just a couple nights ago, I grabbed a Delphi implementation of DOOM. And hit a brick wall when it came to unicode support, or the lack thereof. I was literally looking to include it as a random part of my game collection.
Why ironic?
Dave, please do a video on John Carmack! You guys were contemporaries dealing with some of the same issues from different sides; I'm curious about your thoughts.
Would be great to see benchmarks per CCD, and also the ability to test groups of CCDs. Possibly running some workloads in a checkerboard pattern with much higher clocks could be interesting, though I suspect AMD already found the sweet spot.
The Cinebench score seems quite low compared to other reviewers'. I think that thing needs some more cooling, Dave. Also, let's have a look at the BIOS on that bad boy.
Yeah there's some HP OEM gimping happening here for sure
I remember the days with ntstress, adding a few more instances of vmstress and a couple of instances of wolf3d... those days...
This is some mad scientist stuff. Crazy CPU
Since a 16MHz or 20MHz 386 kind of struggled to run Doom (one had to reduce the size of the rendered window within the game to get any sort of reasonable performance out of it), and it ran pretty smoothly on a 33MHz 486, this benchmark is actually a fairly entertaining way to determine how much more powerful a modern system is relative to an early-to-mid-90s higher-end desktop system. Shall we call it DoomBench, which measures performance in DoomMarks? :P