This is one of my favorite channels on YouTube, bar none.
When I was young, I never really considered that the games I played were made by real people. It's awesome to see one of the creators actually want to talk about some of what went into these projects.
I never thought I'd get to see essentially "bonus features" behind the scenes for some of my favorite games. Now if only I could find someone from Whoopee Camp to talk about Tomba....
This appears to be their official twitter page (and the only place someone might still be able to reach them, I think) - twitter.com/WhoopeeCamp
'One of your favourites bar none' is probably best described as your 'favourite'. 😉
These videos also explain why Nintendo keeps falling behind with their consoles compared to Sony and Microsoft, and why SEGA fell out of the market. When the hardware is limiting, the studios that make games tend to avoid those platforms, because they spend just as much time wrestling with the hardware as they do making the game itself. From what I heard, the Dreamcast and the Saturn were both very limiting and needlessly complicated compared to the other consoles, and only hastened SEGA's departure from the console war.
@@KryyssTV Just the Saturn. The Dreamcast was really easy - in fact, so easy that it was the fastest console to get hacked lol
Rodrigo Vázquez some people still make games for the Dreamcast
I wonder if you've ever sat down and had a beer with any Naughty Dog employees, swapping war stories.
I've had lunch with Neil Druckmann. He was cool...
Friend, you are in for a treat. Google Andy Gavin Crash Bandicoot and set aside a few evenings to read through it. Amazing stuff.
@@GameHut wow what an honor!!
I come from the future and let me tell you, people don't think he's that cool anymore
@@GameHut Wow... that's amazing... you should take a look at what he's doing today.... he's kinda gone off the deep end
Incidentally, this is similar to how Knuckles' Chaotix implements its particle system:
• Lookup table with pre-calculated random starting states;
• Master particle object stores a random starting offset into the lookup table, the current phase of the particle system and its lifespan;
• Each particle child of this master particle object is assigned an index number based on its creation order;
• Positions of the particles are adjusted per frame based on their index number and the current phase, as in the following pseudo-code (a fuller C sketch follows after this list):
position_x = particle_obj_x + lut[(offset+index)*2] * phase;
position_y = particle_obj_y + lut[(offset+index)*2+1] * phase;
• Particles are just pixels which are blitted directly to the 32X display in the appropriate scanline;
• The color for this pixel is looked up directly from the appropriate 32X VRAM pattern for this particle object, based on the particle's index number.
The result is a somewhat efficient 2D particle system for a 1994 console.
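For anyone who wants to see the scheme above in code form, here's a rough C sketch of it - the table size, struct layout and the plot/colour helpers are illustrative guesses, not taken from the actual game:

#include <stdint.h>

#define LUT_SIZE 256                   /* illustrative table size */
extern const int8_t lut[LUT_SIZE * 2]; /* pre-calculated random (dx, dy) pairs */

typedef struct {
    int16_t x, y;      /* emitter position */
    uint8_t offset;    /* random starting offset into the LUT */
    uint8_t phase;     /* current phase of the system */
    uint8_t lifespan;  /* frames remaining */
    uint8_t count;     /* number of child particles */
} ParticleMaster;

/* illustrative stand-ins for the real blit/colour routines */
extern void plot_pixel(int16_t x, int16_t y, uint8_t colour);
extern uint8_t pattern_colour(const ParticleMaster *m, uint8_t index);

void update_particles(ParticleMaster *m)
{
    if (m->lifespan == 0)
        return;

    for (uint8_t index = 0; index < m->count; index++) {
        /* each child derives its motion purely from (offset + index) and phase */
        uint16_t i = (uint16_t)((m->offset + index) % LUT_SIZE);
        int16_t px = m->x + lut[i * 2]     * m->phase;
        int16_t py = m->y + lut[i * 2 + 1] * m->phase;
        plot_pixel(px, py, pattern_colour(m, index));
    }

    m->phase++;
    m->lifespan--;
}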
That's... So smart. I find it amazing just how innovative you had to be back then. Incredible solution!
slowpoke now imagine what'd be possible with this kind of creativity and modern hardware...
Ikey Ilex the problem is, nowadays with the hardware we have there's no need for this level of ingenuity, so it's rarely seen
TheStiepen I don't think you give modern game programmers enough credit. they put a lot of effort into optimization still.
Ikey Ilex Optimization's always nice, but now that it's not needed as badly sometimes it's just better to add more stuff.
TheStiepen it's probably this kind of creativity that made DOOM on Switch possible
Measuring performance by the scanline.. now THIS is podracing!
S e b u l b a
On c64 we would display the scan line count of a function by changing the border colour.
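For anyone who hasn't seen that trick: the VIC-II border colour register lives at $D020, so you bracket a routine with two writes and the coloured band on screen shows how many scanlines it costs. A minimal sketch in C (cc65-style memory-mapped access; the profiled routine is made up):

#include <stdint.h>

#define BORDER_COLOUR (*(volatile uint8_t *)0xD020)  /* VIC-II border colour register */

extern void my_expensive_routine(void);  /* hypothetical routine being profiled */

void profiled_call(void)
{
    BORDER_COLOUR = 2;        /* red while the routine runs */
    my_expensive_routine();
    BORDER_COLOUR = 0;        /* back to black */
    /* the height of the red band on screen = scanlines consumed */
}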
this is what we wanted from the previous video :P
Exactly! I even had to downvote that because I was disappointed
Kim you chose to. you most certainly didn't have to
I remember transformers... That was your most impressive game IMO. So many explosions and particles effects happening in a large world with many, many enemies on screen
As a student learning computer graphics, it amazes me how little people currently care about optimisation when it comes to their code, but all it takes is a little out-of-the-box thinking to achieve much better results. Great video!
The problem is that optimisation often sacrifices code clarity, which is usually much more important than raw performance. After all, developer time is expensive, computing power is not.
(Which is not to say that you should never optimize. It *is* important to be aware of potential bottlenecks and address them as soon as they become relevant.)
Unclear code can quite easily be understood as long as it is well documented
It can be understood with enough dedication, but the hour or so that it takes, as well as any possible mistakes made because the new coder didn't fully understand how it worked sometimes aren't worth the few extra frames.
The fun part of programming, though, like Ras mentioned, is that you're always stuck between a rock and a hard place with catch 22s like this.
MyThirdLeg35
"well documented"
*looks at own code, sees it's nothing more than spaghetti and commented-out trial-and-error*
Yes. If only that were an actual thing. My game code is currently held together by stubbornness and ill-placed hope.
Honestly, commenting is a skill on its own.
I'm not too bad a programmer, but I got a lot of marks off on my commenting when I took my first actual C# class.
I love seeing stuff like this! Not many people stop to think about all the ins and outs/behind-the-scenes of coding, it’s awesome to see it explained!
AriWolfPup and it makes you admire the game even more.
LUTs are a godsend. They made so many 3D games possible back in the day; Doom and Quake come to mind.
coffee115 That's so amazing.
They're even used in code itself - a.k.a. the jump table. Not the exact same thing, but the principle is the same :)
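In case the term is new to anyone: a jump table is just an array of code addresses indexed by a value, so dispatch becomes one lookup instead of a chain of compares. A tiny C sketch (handler names are made up):

#include <stdio.h>

typedef void (*handler_fn)(void);

static void op_move(void)  { puts("move");  }
static void op_shoot(void) { puts("shoot"); }
static void op_idle(void)  { puts("idle");  }

/* the jump table: opcode -> handler, one array lookup instead of a compare chain */
static handler_fn handlers[] = { op_move, op_shoot, op_idle };

void dispatch(unsigned opcode)
{
    if (opcode < sizeof handlers / sizeof handlers[0])
        handlers[opcode]();   /* indirect call through the table */
}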
Indeed. Even my stupid 2d engines are full of lookup tables...
Or at least they were 20 years ago - you simply couldn't justify calculating something like sin or cos directly... So even though I was using pre-rotated sprite graphics... I still had to calculate trigonometric functions anyway to determine the rotations, and... Lookup tables saved the day.
That's a really simple example, but there's so many others.
Thinking of the 8-bit home computer sitting on the desk next to me, I realised that even with its measly 1.8 MHz CPU I can probably implement a raycasting engine on it.
All well and good, but it has 4-colour graphics in its primary modes - that's 2 bits per pixel.
What's wrong with that? Erm... Well, Computers don't like dealing with things that aren't multiples of whole bytes...
We'd lose a lot of performance trying to draw individual pixels, so again, lookup table to cover the possible combinations... Lookup tables for trigonometry.
Lookups to speed up multiply/divide (since we lack any hardware multiply/divide), lookup tables for lots of things...
Trade memory for processing time. That's what it comes down to...
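The classic concrete case of that memory-for-time trade: compute sine once at startup (or bake it into ROM) and replace every sin() call with an array read. A rough sketch, assuming 8.8 fixed point and a 256-step circle:

#include <math.h>
#include <stdint.h>

#define SIN_TABLE_SIZE 256

static int16_t sin_table[SIN_TABLE_SIZE];   /* 8.8 fixed-point sine values */

void init_sin_table(void)
{
    const double two_pi = 6.283185307179586;
    for (int i = 0; i < SIN_TABLE_SIZE; i++)
        sin_table[i] = (int16_t)lround(sin(i * two_pi / SIN_TABLE_SIZE) * 256.0);
}

/* angle runs 0..255 for a full circle; one array read replaces a sin() call */
static inline int16_t fast_sin(uint8_t angle) { return sin_table[angle]; }
/* cos(x) = sin(x + 90 degrees), i.e. 64 steps further round the table */
static inline int16_t fast_cos(uint8_t angle) { return sin_table[(uint8_t)(angle + 64)]; }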
Still applies in embedded systems. ARM Cortex M for example doesn't excel at floating point math either (at least at the lower end), so sine tables (etc.) are still quite popular.
randomguy8196 also need to consider what fits into various parts of memory; lookup tables might suddenly become much slower when they no longer fit in cache and need fetching from RAM.
See, that's the kind of stuff I never would've thought of on my own - the idea to calculate it from scratch based on age.
I may have had the idea by working with it a lot, after a while, but I never would've thought of it just on it's own.
Funny thing is I'm working with something very similar to this now :D
You gradually learn to spot these kinds of solutions. Remember: "The fastest code is the one which doesn't run". I.e. when you don't have the computational power to do something, don't. Yeah, it's simple to say and hard in practice :) . You'll notice that they solved their problem by restricting their features to what was possible to express simply as a function of time and what could fit as a lookup table on the VPU. Also note that you have to process particles in batches of 32, and if you have too many different kinds of particle systems with different behaviours, each with fewer than 32 particles, you won't be able to saturate the VPU's throughput. Often you have to make these compromises - to achieve the best possible result, restrict your features to only what is fast.
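To make the batching point concrete, here's the general shape of it: group particles of the same behaviour together and round the final group up to a full batch rather than issuing half-empty ones. Purely illustrative C - the batch size and submit_batch function are stand-ins, not the real VPU interface:

#include <stddef.h>

#define BATCH 32   /* illustrative: the unit processes particles 32 at a time */

typedef struct { float age; /* ... other per-particle data ... */ } Particle;

/* hypothetical stand-in for "send one full batch to the vector unit" */
extern void submit_batch(const Particle *batch, size_t count);

void submit_particles(const Particle *particles, size_t n)
{
    static Particle padded[BATCH];  /* scratch space for the final partial batch */

    size_t full = n / BATCH;
    for (size_t b = 0; b < full; b++)
        submit_batch(&particles[b * BATCH], BATCH);

    size_t rest = n % BATCH;
    if (rest) {
        /* pad the tail with dead particles so the unit still gets a full group */
        for (size_t i = 0; i < BATCH; i++)
            padded[i] = (i < rest) ? particles[full * BATCH + i]
                                   : (Particle){ .age = -1.0f };
        submit_batch(padded, BATCH);
    }
}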
well, in this case a lot of it is down to knowing the quirks of the hardware...
For instance, recalculating most of the properties of a particle every single frame intuitively seems like a bad idea.
But, in the end, the old saying applies - 'only as strong as your weakest link'.
What I can infer from this solution is that the Vector units had processing power to spare, but that the bandwidth between the CPU and VPU is limited.
Thus, indeed, when you realise that's the bottleneck, it becomes apparent that it's your bandwidth that you need to optimise, not your amount of VPU calculations...
Then again, if you take your bandwidth optimisations too far you may create another bottleneck...
Optimisation is a weird game at times...
What do you mean, Zintom?
Modern GPUs have had a similar bandwidth restriction for several years now, so techniques like those in the video are at least still relevant.
It looks like a great screensaver.
Mr. Grongy My favourite Windows 98-NT era screensaver was called Particle Fire. I have no idea how complex its internals were, but it could do several hundred particles with physics and color changes with nearly instant startup time. Miss having it...
Aerin Ravage Oh? I never saw that!
So you basically used a trade off. You didn't have enough memory to store everything in the VPU, and you didn't have enough speed to continuously send it data from the EE - and you solved it by making the VPU do ten times more work to calculate all the needed info dynamically every frame. But that was okay because the VPU was designed to be able to do that math very very fast, so doing this was still much faster than using the EE (plus it saved up the EE to do everything else). Nice.
Knowing your hardware is always a bonus. Compute shaders are a thing that lets you use the GPU for general-purpose work, the GPU being optimised for a particular kind of work.
yeah, this is very useful for modern games - if you offload more of the work to the GPU instead of the CPU you can get way higher framerates
It doesn't really have to do any more maths though
@@circuit10 Indeed. The equation to advance the position of a particle one frame is the same as the one used to calculate its position after 100 or 1000 frames, all that changes is the deltaT value.
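Right - with constant acceleration the position is a closed-form function of elapsed time, so evaluating it at frame 1 or frame 1000 costs the same. A plain C sketch of that idea (names illustrative, not from the video):

typedef struct { float x, y, z; } Vec3;

/* p(t) = p0 + v0*t + 0.5*a*t^2 : evaluate at any age directly,
   no per-frame state carried between calls */
Vec3 particle_position(Vec3 p0, Vec3 v0, Vec3 accel, float t)
{
    Vec3 p;
    p.x = p0.x + v0.x * t + 0.5f * accel.x * t * t;
    p.y = p0.y + v0.y * t + 0.5f * accel.y * t * t;
    p.z = p0.z + v0.z * t + 0.5f * accel.z * t * t;
    return p;
}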
I think rather than just a trade-off, it was a matter of transforming the calculation from a frame-by-frame basis into a deterministic formula (or function, if you will) of time.
The point isn't just that the VPU was fast at maths. Every dev used them for these kinds of calculations. The trick is structuring the data in a way that is both easily ingestible by the VPU and doesn't require it to keep track of the current state of the particles.
The VPU probably didn't end up making that many more calculations than it would doing this on a per-frame basis with a more conventional method.
They were removing randomness from the system by seeding it once and then providing lookup tables (trading some memory for computing speed).
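And the same "seed once, stay deterministic" idea also works without a table: hash the emitter's seed with the particle index and you get per-particle "random" values that can be recomputed identically every frame. A small sketch using a generic integer mix (an alternative illustration, not the method from the video):

#include <stdint.h>

/* cheap integer mix: the same (seed, index) pair always yields the same value */
static uint32_t mix(uint32_t seed, uint32_t index)
{
    uint32_t h = seed ^ (index * 2654435761u);  /* Knuth multiplicative constant */
    h ^= h >> 15;
    h *= 2246822519u;
    h ^= h >> 13;
    return h;
}

/* stable pseudo-random float in [0, 1) for particle 'index' of this emitter */
float particle_random(uint32_t emitter_seed, uint32_t index)
{
    return (mix(emitter_seed, index) >> 8) * (1.0f / 16777216.0f);
}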
So simply put, you did your best to use a functional, as opposed to an imperative, approach for the particle attributes, as well as some clever grouping. Very neat.
Yeah, this system wouldn't allow particle collision, but I guess they didn't need that.
collision with other particles maybe not, but in another comment he explained how they handled collision with other things.
yess. too many people here are like “oh, i see. so you let the fast vpus do more work. clever.”
nice concise way to sum it up
@@jc_dogen How?
He did Sega Saturn assembly programming! The PS2 was a piece of cake for Travelers Tales!
PS2: I feel a force...
Travelers' Tales: Hello sweetheart, ready for games?
Saturn: *whimper*
Love how all the more complicated stuff, like colour, transparency, jitter, etc., was all just pre-calculated. Kinda like baking lightmaps in a game to prevent them from slowing down the engine.
Very cool setup.
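Same spirit as baked lightmaps in miniature: evaluate the colour-over-lifetime curve once into a small ramp, and at runtime a particle's normalised age just indexes it. A tiny illustrative C sketch (ramp size and struct are assumptions):

#include <stdint.h>

#define RAMP_STEPS 64

typedef struct { uint8_t r, g, b, a; } Colour;

static Colour colour_ramp[RAMP_STEPS];  /* filled once, e.g. from an artist's curve */

/* age01 is the particle's age normalised to 0..1; no per-frame colour maths needed */
Colour particle_colour(float age01)
{
    int i = (int)(age01 * (RAMP_STEPS - 1));
    if (i < 0) i = 0;
    if (i >= RAMP_STEPS) i = RAMP_STEPS - 1;
    return colour_ramp[i];
}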
Reminds me of how we treat the vertex shader stage when it comes to skinning and mesh deformations
Not only is this guy amazing at the coding itself and at coming up with ideas, he's also ridiculously good at explaining it. This is the sort of person who should be a teacher - not only does he enjoy his work, he's also skilled at it.
Crazy impressive. I loooove these videos. Not enough technical deep dives into gaming graphics for actual shipping products.
Coming from an age where cycle counting and scanline counting were a thing, I love these videos. Modern methods just seem to be "let's build a layer on a layer, and the user can buy better/newer/faster hardware".
I love it when developers figure out new ways to push the hardware farther than the designers had imagined.
So you basically went from an "updating function" to a "solution function". Basically trading a bit of space on the disk for faster computation.
It is simply genius. It's too bad that computers are so powerful and fast nowadays that we are not even trying to improve the efficiency of our code.
Love your videos, they are really insightful.
Videos like this make me appreciate just how technical engine programmers need to be, and why so many studios these days choose to use pre-existing graphics engines like Unreal and Unity rather than doing all this from scratch. Not to mention that doing everything themselves would also mean developing the tooling, and therefore spending even more of the budget just creating the tools needed to create the assets needed to create the game.
Batch, batch, batch! I think I might put this on the graphics playlist for my next WebGL class. I may even try coding something inspired by it to see if it's good fodder for an optimization & performance testing lab.
The thing about a particle system like this was that it was specifically designed with the PS2’s hardware in mind: the PS2 had fast particle processing, but low memory to store said particles which just so happens to perfectly cater to this kind of system as it frees up the CPU to do other things.
Wow, that's amazing stuff right there. Using the hardware in ways it wasn't designed for to get the best results. That's what it's all about.
Exactly what I was hoping for from the last video. Thanks!
This is one of the most amazing explanations. It's very eloquent and self-contained. Bravo.
What I love about your channel, Jon, is the level of complexity and fluid thinking behind each solution. At the time these games were released, we all thought "hey, wow, this looks great" - then that moment fades, and with it the chance to appreciate the deep complexities behind such wizardry, because it was all behind locked doors! Unless you were on the dev team in 1997 and were on speaking terms with the geniuses behind all this, you'd never know, or be able to find out.
Your insight into the tasks at hand for that given era of gaming is an absolutely stunning look into a world which most of us never knew.
This was probably the best Coding Secrets yet. I really loved the deep dive into how the data was passed around and the various optimizations. Bravo!!! I hope more videos like this are coming out!!
I know nothing about coding but I must say, what an ingenious and novel approach. Well done!
I have no programming knowledge other than editing bits of code with a walk-through for modding, but I really enjoy the insight into the making of my childhood games! Thank you! Well-made videos :)
I'm amazed at how you make it seem so simple, while it is so clever
As a nostalgic gamer and a programmer, unveiling implementation details like this only makes me appreciate these games more!
Please keep up sharing!
Your career is so exciting to me!
You know content is good when you hardly understand a third of the things said, and yet you watch every single minute of it.
GameHut you are awesome.
This just shows how much effort and dedication can go into something viewed as so small. You guys did some amazing work.
Ugh... this is AMAZING. Thank you so much for sharing. I can only imagine how excited you guys were to get the performance you were getting when it all came together...
Now this is the type of content I'd have liked to have seen from the previous particle vid. Great stuff.
I love that you are taking the time to make these videos. It's very interesting to see what goes on behind the game.
If only there were a video this in-depth about the development of Fantavision. As always, your content is top-tier, fascinating and explained in a way that even a plebeian like me can understand more than half of what you're saying.
It's always amazing to see developers that have pushed the hardware to its limits or done things people said couldn't be done.
The original Crash on the PS1 is a great example of that as well with the way it pushed the hardware to its limits (I believe Sony wasn't exactly happy with the way Naughty Dog bypassed the Sony libraries and accessed the hardware directly but by the time they found out about it, the game was very far along and Sony liked the game so much they allowed Naughty Dog to keep doing what they were doing)
With the exception of the separate hardware, that's very similar to how I did my game items and collision for Android. It runs super fast, and the size of it doesn't affect performance at all. The particles I created are ridiculously lightweight and made it possible to run in an iPhone browser with WebGL. I still can't thank the guy enough who put me on the right path for it.
Guess Prime and Crash didn't feel so good.
This is the kind of channel that makes me want to do some programming outside my office hours.
This channel is a gold mine, thank you for being part of a lot our childhood
There was a talk recently at Unite Berlin 2018 where a dev rediscovered this "stateless" particle effect for mobile phones with OpenGL 2 - old is new again.
That dev can't be that clever then. This system also comes with obvious limitations.
@@DasAntiNaziBroetchen that's a stupid comment though
I love this channel.
So many game programming gems explained AND shown in action!
These coding secrets videos are pretty interesting. Although I'm not a coding guy I like to know how the games work. It's like taking a peep in the kitchen of your favourite restaurant. Thank you Jon.
This is exactly how I wanted to calculate the position of a planet around a binary star system in C++ for a project I was doing in school, but I couldn't integrate the damn acceleration differential and had to end up triple feeding values back from acceleration to velocity to position, and the calculations absolutely ate my CPU apart for hours. It's cool to see the approach I tried to use actually used successfully in game design. ;D
One of my favourite vids so far.
You worked with some incredibly talented people.
Your videos are just excellent. I get such satisfaction from watching them. I don't know if you have seen Pannenkoek's videos, but it does the same thing. Really getting under the hood of the systems that run great games.
just explaining how it all works like that has helped me code a particle system, thanks!
These are easily the best videos of the ones you put out.
SDK: Use particles, but 100 of them make the product lose FPS. Also 50 MB footprint for that particle engine.
GameHut: Hold my beer.
Amazing how well explained this was. I expected not to understand a word, but you made it look easy!
Or as Wayne said, "We're not worthy" - not then, when we played the games, nor now, when we get them explained. Cool stuff! As an old 68k geek I miss the hands-on, down-and-gritty coding, where every scanline was accounted for. And 50 vs 60 Hz made a huge difference.
Hahahaha that Billy Mays meme though!
Thanks for the excellent insight, as usual
Would it be possible for you to explain why the PS2 had trouble with 480p content? I've heard several things over the years, such as it needing to use up the GPU, but yet God of War was 480p and could have passed as an early Wii game. Other times the PS2 would go beyond 480p: Sonic Mega Collection Plus is apparently 525p, according to the start screen, and Gran Turismo 4 is 1080i. It just doesn't make sense to me why it was a problem. The PSP technically had 480p for all its games, if you had a 2000/3000 model with component cables, and it's weaker but had similar visuals - a good example again being the two God of War games for the PSP.
Actually the PSP was more like 272p since its resolution was 480x272, which is kind of an odd number now that I think about it. I suspect Sony was trying to go for a cinematic aspect ratio since they were trying to sell movies on UMD discs for a short while.
The PS2 GPU didn't have enough VRAM to do a 480px tall "screen" - not if you use it as a traditional rasterizer, where textures and two framebuffers (one for display, one you are drawing into) are all held in VRAM. Well, it did have enough, but then you have very, very little space for textures. Early games used traditional code to store everything in VRAM, but in order to have good quality textures, they had to reduce the screen size.
Later games figured out that you can actually stream textures very, very fast into VRAM. It involved a lot of micromanagement, but you could pretty much texture from system RAM by sending a texture to VRAM, having it drawn, deleting it from VRAM, and starting over. This was because the video memory was some 2048-bit RDRAM with speeds rivalling those of next-gen consoles. So texture micromanagement, and efficient use of textures by using 4-bit palettes, saved a lot of VRAM space that could then be used to make 480p screens.
Some games, while keeping 480p output, used fewer horizontal pixels too, like Odin Sphere using 512x480 from the top of my head. The 1080i in Gran Turismo was of course 540 lines in every frame, so it wouldn't be much different from that either. This took some time to figure out, hence why it was uncommon for most early titles. I believe this is how it worked but I might be wrong about specifics.
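Rough numbers to go with that, assuming full 32-bit colour and Z buffers (many games dropped to 16-bit precisely to shrink these) and the GS's 4 MiB of VRAM:

#include <stdio.h>

int main(void)
{
    const int bpp = 4;                       /* bytes per pixel, assuming 32-bit      */
    long full  = 640L * 448 * bpp;           /* one full-height NTSC buffer: ~1.09 MiB */
    long field = 640L * 224 * bpp;           /* one field-rendered buffer: ~0.55 MiB   */

    /* front buffer + back buffer + Z buffer, out of the GS's 4 MiB of VRAM */
    printf("448-line buffers: %.2f MiB used\n", 3 * full  / (1024.0 * 1024.0)); /* ~3.3 MiB */
    printf("224-line buffers: %.2f MiB used\n", 3 * field / (1024.0 * 1024.0)); /* ~1.6 MiB */
    return 0;
}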
zyrobs I appreciate a lot your explanation.
Ram huh... That makes sense.
One of the reasons you can't do fullscreen effects (aside from the CPU just being too slow) on systems like the Mega Drive and SNES is that you can't manage double buffering.
Even taking into account 4 bit per pixel graphics, 320x224 = ~35k
You only have 64k total, so already you can't fit two whole frames into VRAM, but then given how the systems worked you'd lose 2+k minimum to a tilemap, even though you're not making any use of the features a tilemap provides.
Similarly, why the n64 got all those 'high resolution' mode things with its RAM expansion.
Unified memory architecture, sure. But the system typically renders 24 bit images...
By default it has 4.5 megabytes of RAM (that sounds weird but it's 9 bit ECC RAM with the 9th bit hacked to provide extra RAM instead of error correction.), that .5 megabytes is actually only usable by the Z-buffer...
But let's think about what that means; If you were to output an interlaced PAL image of 640x576 which is about the highest supported resolution... Yes, that's interlaced, but you'd still need full double buffering of both fields unless you can render at the field rate. (50 hz for PAL).
If you can render at the field rate you could get away with 288 line buffers, but otherwise you need the full 576 lines...
So... Double buffered, 640x576 at 24-bit = ~1.1 megabytes x 2 = 2.2 megabytes. The Z-buffer, if used, is another 370k.
Ouch. And we have 4 total - and although ROM is fast, we still can't read from it directly, so that 4.5 megabytes has to contain all the active audio, textures, game code, variables, etc.
For the 4.5 megabyte unexpanded system that's about 2.6 of 4.5 megabytes used up just on the framebuffers. More than half of all RAM.
On the 9 megabyte upgraded system, it's just over 25%
And that certainly explains a lot...
And the PS2? Well, obviously it has a lot more RAM than an n64...
Except... It's not a Unified memory architecture... And the GPU only has 4 megabytes of VRAM....
Which is about the same as what the n64 had in total, but the PS2 has to deal with vastly more texture data and other bits and pieces, (thankfully it doesn't have to share that 4 megabytes with anything else, but still...)
It's surprising sometimes how quickly just a basic framebuffer can chew up your VRAM...
I remember dealing with early SVGA cards that had 1 megabyte or sometimes even only 512k of VRAM...
Sure, you could do 1024x768...
... In theory...
In practice, there just isn't enough VRAM for that kind of framebuffer. Especially in high or true colour modes.
That kinda explains why N64 games always used such poor (and very obviously tiled) textures. Not enough memory to go nuts on the texture detail when half your RAM is used up just for the frame buffer.
Pretty simple and interesting concept, and still pretty usable today.
Thanks for the wonderful insight into how this was achieved, tricks with the hardware like this are always awesome to see!
You, sir... are awesome. Best explanation. And the best algorithm for efficiently drawing lots of data - it even surpassed professional companies' AAA titles. And I think this technique can be useful nowadays as well.
Bravo
Looooved this video!
Sounds like this system works kinda like how we use shaders now, trying to do as much of the graphics calculations on the GPU as possible, while sending it the least amount of data. Very clever use of hardware :)
It's so great that we kept using this for the next 17 Bloody years without changing anything
That's some badass architectural-fu. Well done.
Thanks! Those ps2 tidbits are truly amazing
Yesss, I was waiting for this!
Thanks for explaining this! I hope you explain the crepuscular window rays too at some point, even though I expect that's much simpler.
That just goes to show how much it pays to know your hardware.
I don't know the slightest thing about the PS2 hardware, so I had no real point of reference for whether any of this made sense or not. (especially since it seemed to rely on knowing how the vector processors in the PS2 worked, and how they interacted with the CPU), but it does seem well thought out, assuming the hardware limitations involved...
With the Mega Drive stuff I had a better sense of what was going on, since I actually know something about the hardware architecture. (not as well as I know the SNES, but still...)
For later systems I know a decent amount about the n64, and a handful of architectural things about the Gamecube (I guess you can spot a trend here. ;p), though not in all that much detail. (I do know the TEV is a strange beast compared to PC hardware, that's for sure.), of course, knowing the Gamecube also implies knowing the Wii, since it's very similar, but that's neither here nor there...
I know the general principles of 3d rendering and particle systems though, so that helps quite a bit in making sense of what you did here...
Still, calculating the particles from scratch based on their age is a counter-intuitive approach that I likely wouldn't have thought of (or at least, not initially), since in my previous coding experience recalculating the same thing over and over is not usually thought of as being good for performance...
Again though, it shows how critical it is to know your hardware - it makes sense if there's a bandwidth issue between the CPU and Vector units that at some point the performance loss from recalculating things over and over ends up being less important than the performance loss for constant bus transfers...
Ah, so much to think about here... XD
Love the videos. Your videos have made me rethink how I do many of the solutions in my non game related applications. Thanks for the info.
Awesome vid.
I love seeing how programmers made the most of each different hardware platform.
Freaking clever approach. It's beautiful
Personally, on PS2, I managed to display only 120 000/130 000 triangles per frame (they are textured 3D models), of course I can probably still optimize.
For the VU, you said we could not send all the data to the VU, but that's true for everything, even for 3D models, the goal being that the VU's memory serves as a memory cache. DMA transfers do not block, so you can send your data to the VU and do your calculations at the same time (CPU or VU).
The biggest optimization is the one where you do parallelism (don't wait for the data); the other is to optimize the assembly code so that there is no pipeline stall.
PS: Sorry if my English is bad :)
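That overlap pattern in very rough pseudo-C - none of these calls are the real PS2 DMA/VU API, it's just the shape of double-buffering so the transfer of the next batch hides behind the compute of the current one:

enum { NUM_BUFFERS = 2 };   /* two halves of the unit's data memory, used as a ping-pong pair */

/* hypothetical stand-ins for the real kick/wait primitives */
extern void dma_start_upload(int buffer, const void *src, unsigned size); /* non-blocking */
extern void dma_wait(int buffer);
extern void vu_run(int buffer);   /* kick the microprogram on that half, non-blocking */
extern void vu_wait(int buffer);  /* assumed to return immediately if never kicked */

void draw_batches(const void *const *batches, const unsigned *sizes, int count)
{
    if (count == 0) return;

    dma_start_upload(0, batches[0], sizes[0]);              /* prime the first buffer */

    for (int i = 0; i < count; i++) {
        int cur  = i % NUM_BUFFERS;
        int next = (i + 1) % NUM_BUFFERS;

        dma_wait(cur);                                       /* data for this batch is in place */
        vu_run(cur);                                         /* compute on it */

        if (i + 1 < count) {
            vu_wait(next);                                   /* other half finished its last run */
            dma_start_upload(next, batches[i + 1], sizes[i + 1]); /* upload overlaps the compute */
        }
    }
    vu_wait((count - 1) % NUM_BUFFERS);                      /* let the final batch finish */
}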
Cool tricks is what got me interested in software development as a kid!
Finally, another impossible subject! Seriously though, I love your content. Really great insight, thank you.
these videos, along with those of pannenkoek2012, motivated me to start programming my own games by myself
Thanks for the relatively deep dive. This was super interesting and I would love to see more content at this level in the future!
Dude, y'all are insanely smart. Kind of makes me worried about getting a job.
A job in?
These videos are so soothing
The math formula at 6:00 suggests to me why physics is so often tied to frame rates. I've noticed a lot of physics formulas require a time delta, and frames are a convenient source of one in games.
I mean, I can't calculate position from velocity if I don't know how long something was moving at it.
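That's the crux of it: if you integrate per frame and quietly assume the frame time is constant, the simulation speed changes with the frame rate; pass a real delta and it doesn't. A tiny contrast in C (illustrative only):

typedef struct { float pos, vel; } Body;

/* frame-rate dependent: 'dt' is implicitly one frame, so 30 fps vs 60 fps
   produces different motion per second of real time */
void step_per_frame(Body *b, float accel)
{
    b->vel += accel;       /* units are "per frame" */
    b->pos += b->vel;
}

/* frame-rate independent: pass the real elapsed time in seconds */
void step_with_dt(Body *b, float accel, float dt)
{
    b->vel += accel * dt;  /* units are "per second" */
    b->pos += b->vel * dt;
}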
It is for videos like this one that I subscribed for in the first place.
Very good technique. Of course the limitation is that it would make complex particle behaviour like collisions impossible.
Great explanation, thanks for this episode.
Thanks for the insight in the classic playstation and hardware.
This was a great explanation, and a brilliant solution. Thank you!
It is very cool what you can do when you program down to the hardware level
I very much enjoyed the more technical insight, thank you for sharing :)
Nice! I did something similar for a PyWeek entry. The thing there is that (almost) everything has to be coded in Python. I say almost, because we can use OpenGL shaders. So for particles, I did all the computations of position, rotation, scaling, color, etc. within a vertex shader. The only thing the CPU had to do was load the data for the particular particle system, prepare a buffer of triangles and call the shader, passing it only the age of the system. (In case you're curious, the entry was called "not your data")
Just amazing.
That was awesome. I'd love a video on how the VPU is so "good at maths". Given its name, I assume it's capable of doing many calculations at once?
It's interesting how much of this is still applicable today, but with the GPU !
Fascinating insight!
Now do this on a modern system.
*BILLIONS, EVEN TRILLIONS OF PARTICLES PER SECOND*
“Pretty complex”? I was surprised when the video ended - it felt like that was only the intro/preface part.
Sad I can only click "like" once :(
These deeper tech dives are orgasmic, please keep em coming!
I love this stuff! :D "Just jam in there!" Crazy, guys :)
This is awesome.
This is what happens when you have a huge OCD, a bit of time and a lot of competence, which is how everything should be done in programming. AWESOME
Neat video. Always nice to see the amazing tricks used.
I love these videos! I'd love to hear more about the practical implementation of these systems, especially with the Crash/Madagascar PS2 examples. The explanation of the systems is clear, but how are you able to design these systems to create parallax effects (God rays from stained glass) or other effects in game? I'd also love to hear about how the design team balances the hardware and software limitations to create a great-looking frame/render, while at the same time handling the other logic behind the scenes.
Hi, very interesting, and great use of the hardware. How do you deal with porting or cross-platform development if you use such target-specific approaches? It would be great to get some insights into how this is done. I guess nowadays most developers use separate engines which make development hardware-independent (Unity, Unreal Engine, ...), but was this always the case?
He's said in the past that each generation he basically had one system he used/worked with and he just trusted the other developers in his team to make the other systems work. So he probably doesn't know much about what tricks were used for the other consoles at the time.
Gregory Norris ah thanks. I might have missed that part/video.
I worked on the first Xbox ports of that engine - it just means inserting the abstraction layer yourself, so that game can make the same calls on any platform and the engine handles them in the most efficient way for the hardware. We struggled to make the original Xbox match that particle performance...the PS2 could really shift triangles using Dave's microcode. And there were occasional hardware tricks like setting negative fog values on PS2 to create an X-ray effect - but other platforms would clamp the fog to zero. Of course, the Xbox made up for it with these new-fangled "shaders"... :)
Chris Payne hi Chris. Thanks for your reply and the insights. So it's similar to modern game development, except that the 'engine' or abstraction layer would be developed in-house rather than taken out of a box. Sounds quite challenging, but nice to get some insight into how it was done back in the day.
I just looked this guy up on Wikipedia, and he has worked on some epic games!